
2 posts tagged with "AI"


Chris Navrides · 3 min read

Robot looking at a plank

The Rise of Super Models

With the recent launch of Meta's Segment Anything Model, we see a new image super model that can segment a photo into every object on screen. Even when 50+ objects start to blend together, the model can still separate them. The craziest part is the speed at which it runs: it is so fast that it is already being used on video. This is a huge step forward for computer vision!

In the photo below you can see the model segment out VCRs, TVs, speakers, even the shelf. It feels like a super model that can do everything. Segment Anything example photo

Super Models are Still Limited

While there are tons of examples on their website, and you can play with it yourself, it is still not perfect. A common use case for developers would be to use this model within the context of a webpage. Let's take the New York Times homepage:

NyTimes Homepage

With this model, you can see that it misses large chunks of the page, including article titles, dates, and even menu items. This is a huge limitation for developers wanting to use this model within their automation framework.
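To make that gap concrete, here is a minimal sketch (not a measurement the team actually ran) of how you could quantify how much of a screenshot the masks cover, assuming the list-of-dicts shape that SAM's automatic mask generator returns, where each entry carries a boolean `"segmentation"` array:

```python
import numpy as np

def mask_coverage(masks, height, width):
    """Fraction of a (height x width) screenshot covered by at least one mask.

    `masks` mirrors the output format of SAM's automatic mask generator:
    a list of dicts, each with a boolean "segmentation" array.
    """
    covered = np.zeros((height, width), dtype=bool)
    for m in masks:
        covered |= m["segmentation"]  # union all masks together
    return float(covered.mean())      # 1.0 would mean every pixel is segmented

# Synthetic example on a 100x100 "page": a header bar and a sidebar block
header = np.zeros((100, 100), dtype=bool)
header[0:10, :] = True                # 1000 px
sidebar = np.zeros((100, 100), dtype=bool)
sidebar[5:20, 0:50] = True            # 750 px, overlapping the header by 250 px
coverage = mask_coverage([{"segmentation": header},
                          {"segmentation": sidebar}], 100, 100)
# union = 1000 + 750 - 250 = 1500 px of 10000 -> 0.15
```

A coverage well below 1.0 on a text-heavy page is exactly the "large chunks missed" problem described above.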

Specialized Models are Still Valuable

Despite being less generalizable, specialized models are still going to be very valuable for developers. While super models are great and a tremendous step forward for the field, they will not be able to handle every use case. That is where specialized models come in: they handle the edge cases that super models can't.

Returning to the New York Times homepage example: a specialized model trained by Dev Tools AI on hundreds of apps & webpages can segment all the core elements: Dev Tools AI Segmentation

With a specialized model, it is able to get all the objects on the page and in the images. It can also detect which elements are buttons, such as the search icon and the menu. This can be used both for automation and for accessibility in apps that aren't inherently accessible (missing alt tags, DOMs that screen readers can't parse, etc.).
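As a sketch of that accessibility idea, here is a hypothetical helper that turns visual detections into ARIA-style role/name hints for elements that lack labels. The detection format and field names are illustrative assumptions, not Dev Tools AI's actual API:

```python
def accessibility_annotations(detections):
    """Map visual detections to ARIA-style role/name hints.

    `detections` is an illustrative format: each entry has a "kind"
    (e.g. "button", "image", "text"), a bounding "box", and optionally
    an existing accessible "label". Entries that already carry a label
    are skipped, since the app is already accessible there.
    """
    role_map = {"button": "button", "image": "img", "text": "paragraph"}
    hints = []
    for d in detections:
        if d.get("label"):  # already has an accessible label, leave it alone
            continue
        hints.append({
            "role": role_map.get(d["kind"], "generic"),
            "name": d.get("predicted_name", "unlabeled element"),
            "box": d["box"],
        })
    return hints

# An unlabeled search button gets a hint; the labeled logo image does not.
hints = accessibility_annotations([
    {"kind": "button", "box": (0, 0, 24, 24), "predicted_name": "search"},
    {"kind": "image", "box": (10, 10, 50, 50), "label": "logo"},
])
```

A screen reader integration could then announce the predicted name and role where the DOM provides nothing.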


While this is just one example from the computer vision field, the same pattern will apply to all of the new super models. Super models are good at many things at once, but are ultimately generic. There will be a specialized model that can outperform GPT on medical or legal questions, for example, but at the cost of not answering questions on art history as well. Specialized models, applied to specific domains when needed, will be the key to unlocking the full potential of AI.

Dev Tools AI will continue to lead innovation in computer vision for web & mobile apps, and will only continue to improve. To try it out, sign up today!

4 min read

Two intelligent life forms connecting

The human intelligence layer

As we move toward a world where more and more content, in particular code, is generated by AI, we want to think about what successful tech companies will look like. We think they will be the companies that empower, refine, and measure the feedback loop between humans and AI.

AI and humans

The strength of AI is its ability to leverage very large corpora of data and knowledge and to recognize patterns. It is also starting to develop reasoning faculties that let it derive non-trivial insights and solve complex problems.

Humans, on the other hand, have access to sensory input and the larger context of how the real world relates to the product being built. They are also great at identifying, analyzing, and summarizing a problem, specifying it in terms the AI can understand, assessing the quality of AI output, and guiding the AI in the direction the project should go.

When we think about a tech company today, the code is the ground level. It is an artifact that is the result of a cumulative series of functional specifications, design principles and implementation choices.

We are already observing the trend where a decent percentage (30+%) of code is AI generated today, through tools like GitHub Copilot or ChatGPT.

We are also witnessing AI content becoming commoditized. Compute prices are going down, and AI models are being open sourced, for instance by Stability AI. If these two trends continue, the price of AI-generated content will most likely fall to the marginal cost of electricity and hardware.

So then, if code is not expensive, where is the value captured in the tech company of the future?

Capturing value from AI generated content

We believe the value is going to be in what we call the “human intelligence layer”. It is the human part of the feedback loop between AI and humans that will hold the most value: the feedback given to the AI, the orientation given to development, and the constraints, requirements, and specifications provided to the AI.

The way the human brain works and organizes the work of AI is highly valuable for two reasons.

First, the laws of the natural world evolve slowly, mathematical principles or philosophical insights from Ancient Greece still work today. Developing a new technology or product is in great part a work of information collection and distillation. So capturing that process and how it interfaces with AI is going to be valuable.

Second, humans still have an advantage in creativity, innovation, and the ability to imagine something “out of nothing”. This spark of creation in the human mind is also very valuable. We do not know what we do not know, but we like to imagine what it might be.

Example of the human intelligence layer in Midjourney

Let me solidify that thesis with an example. When we think about Midjourney AI and what they have built, we do not think the valuable asset of the company is the generated artwork. Sure, it is nice to look at, but what we think is really valuable is the millions of prompts that were entered into the system.

Think about it: Midjourney has released five models in one year, and every time the output got better. The natural behavior is to rerun old prompts to see how they improve, and I think that is proof that the valuable asset here is the prompts, not the renderings.

On top of that, Midjourney tracks reactions to the image output (in the form of emojis) as well as the sequence of prompts that progressively refine the initial prompt.

This asset of human input, human feedback, and human refinement is an example of the intelligence layer for Midjourney.
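As a rough sketch of what one record in such a layer might look like, and how the refinement sequence could be recovered from it, here is a minimal data model. The field names and schema are assumptions for illustration, not Midjourney's actual data model:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class PromptEvent:
    """One step in a user's prompt-refinement chain (illustrative schema)."""
    prompt: str
    parent: Optional[str] = None  # the previous prompt this one refines, if any
    reactions: Dict[str, int] = field(default_factory=dict)  # emoji -> count

def refinement_chain(events: List[PromptEvent], last_prompt: str) -> List[str]:
    """Walk parent links backwards to recover the full refinement sequence."""
    by_prompt = {e.prompt: e for e in events}
    chain, cur = [], last_prompt
    while cur is not None:
        event = by_prompt[cur]
        chain.append(event.prompt)
        cur = event.parent
    return list(reversed(chain))

# A user refines a prompt twice, reacting along the way
events = [
    PromptEvent("a robot"),
    PromptEvent("a robot, watercolor", parent="a robot",
                reactions={"heart": 3}),
    PromptEvent("a robot, watercolor, 4k", parent="a robot, watercolor"),
]
chain = refinement_chain(events, "a robot, watercolor, 4k")
```

Stored at scale, these chains and reactions are exactly the human input, feedback, and refinement described above.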

What we are building

So we have that insight of the human intelligence layer, and we want to apply it to the process of creating a technology or product through code.

We are starting simple with code reviews to which you can reply or react, but we want to keep getting deeper and help companies capture the value created from the interaction between AI and humans.

In the future you could imagine that each company will have its own dataset made of decisions and insights from all its current and former employees that reflects the values and context of that company in their industry.