October 03 2024

OpenAI's Movement in AI Agent Products

Guantum, @PuppyAgent founder

OpenAI, a frontrunner in large language models, has also pioneered agent-based products, unveiling several of them over the past year. In this article, I examine how OpenAI's agent products have progressed and what influence they have had.

Plugin Store

The earliest commercial application of agents was OpenAI's Plugin store, released in early April 2023. Users could specify up to three plugins that the agent could access while chatting with them.

The Plugin store was initially regarded as the next-generation app store. However, subsequent user numbers fell short of expectations, and it was eventually shut down and replaced by GPTs and the All-tools agent introduced in November 2023.

GPTs

GPTs launched as an agent store at Dev Day in November 2023. The product focused on letting users quickly build and deploy their own agents through natural-language dialogue, targeting consumer (ToC) scenarios, and it initially met with high expectations.

However, it later became clear that GPTs could not support in-depth development or handle complex scenarios, while simple scenarios could be handled entirely by the All-tools agent.

In the end, GPTs became more of a channel for startups to attract users to their own products than the cornerstone of a thriving ecosystem.

All-tools Agent

The All-tools agent was also launched at Dev Day in November 2023. It integrated three tools: web search, code interpreter, and DALL·E 3. It also implicitly included web browsing and local RAG.

By shifting the cost of configuring and selecting tools onto the model, these integrations remove the need for users to set up their own tools manually and objectively reduce user effort.

However, in mid-2024 OpenAI quietly removed the web search tool from the All-tools agent for a subset of users. This A/B test was so subtle that many people did not notice it.

The change was possibly made because using search where it is not needed can lower response quality. Moreover, recognizing from a user's intent whether a question should be answered with search proved challenging. OpenAI's product approach to this issue therefore remains ambiguous.

This may suggest a split between two kinds of products: AI search engines and chatbots.
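Recognizing search intent is where simple rules break down. The toy router here is a hypothetical keyword-and-year heuristic (the cue list, the `should_search` name, and the assumed training cutoff are all illustrative, and this is not how OpenAI decides). It gets obvious cases right but misfires on a creative prompt that merely mentions "today" and on a news question with no explicit cue.

```python
import re

# Hypothetical cue words suggesting an answer depends on fresh, external facts.
FRESHNESS_CUES = ("latest", "today", "current", "news", "price", "score", "this week")
YEAR_PATTERN = re.compile(r"\b20\d{2}\b")


def should_search(query: str, training_cutoff_year: int = 2023) -> bool:
    """Toy router: decide whether to call web search before answering."""
    q = query.lower()
    if any(cue in q for cue in FRESHNESS_CUES):
        return True
    # A year after the (assumed) training cutoff implies the model cannot know the answer.
    return any(int(year) > training_cutoff_year for year in YEAR_PATTERN.findall(q))


for query in [
    "What is the latest iPhone price?",    # search: correct
    "Explain recursion in Python",         # no search: correct
    "Who won the Euro 2024 final?",        # search: correct
    "Write a poem about how today feels",  # search: probably unnecessary
    "Who won the election?",               # no search: probably wrong
]:
    print(should_search(query), query)
```

Each misfire can be patched with another rule, but every patch creates new edge cases, which is why the decision tends to end up with a learned classifier or with the user.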

Code interpreter (Data analysis)

Code interpreter was launched in early July 2023. It lets ChatGPT automatically run the code it has just written. If the code raises an error, ChatGPT generates new code based on the error and runs it again; after three or more consecutive errors, it replies that it is "Unable to complete the task". Because it can debug its own code to some extent, Code interpreter is quite practical.
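The generate, run, and retry behaviour described above can be sketched as a small control loop. This is a minimal sketch under stated assumptions, not OpenAI's implementation: `solve`, `run_snippet`, and `fake_model` are hypothetical names, the generated code is assumed to store its answer in a variable called `result`, and `exec()` stands in for what would be a sandboxed execution environment in a real system.

```python
from typing import Callable, Optional, Tuple

# The article reports that Code interpreter gives up after three consecutive errors.
MAX_CONSECUTIVE_ERRORS = 3


def run_snippet(code: str) -> Tuple[bool, str]:
    """Run generated code; return (succeeded, result or error message).

    Hypothetical convention: the generated code stores its answer in `result`.
    exec() is NOT a sandbox; a real system would isolate this step.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        return True, str(namespace.get("result"))
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"


def solve(task: str, generate: Callable[[str, Optional[str]], str]) -> str:
    """Generate code, run it, and feed any error back into the next attempt."""
    error: Optional[str] = None
    for _ in range(MAX_CONSECUTIVE_ERRORS):
        code = generate(task, error)  # the model sees the previous error, if any
        ok, output = run_snippet(code)
        if ok:
            return output
        error = output  # retry with the new error message
    return "Unable to complete the task"


def fake_model(task: str, error: Optional[str]) -> str:
    """Hypothetical stand-in for the LLM: the first draft is buggy, the retry is fixed."""
    if error is None:
        return "result = 10 / 0"
    return "result = 10 / 2"


print(solve("divide ten by two", fake_model))  # prints 5.0
```

The key design point is that the retry budget and the error feedback live in ordinary code, not in the prompt: the model only ever sees "here is the task and the last error", while the loop decides when to stop.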

Code interpreter was later renamed Data analysis, and OpenAI made many user-experience improvements for data-analysis scenarios. For example, figures generated during data processing can be enlarged and centered, with the chat bar moved to a sidebar so that users can keep chatting about the figure.

OpenAI-o1

OpenAI-o1 is a model rather than a product. Released in September 2024, it aims to solve complex problems by spending more compute at inference time on Chain of Thought (CoT) reasoning. Because it thinks before answering, deciding what to think about next based on its earlier thoughts, it can be regarded as a form of agent. This step-by-step reasoning lets it tackle problems that were previously difficult for models to solve.

Summary

OpenAI has tried many agent scenarios. The Plugin store and GPTs produced mixed results and fell short of success, whereas Code interpreter and the All-tools agent have proven considerably useful in practice.

Plugin store VS All-tools agent

An agent that relies on tools must have enough model-layer data about those tools, meaning the model itself has learned how to use them. Simply describing tools in prompts and asking the model to call them, as the plugin store did, is not yet effective given current model capabilities.
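To make "using prompts to call tools" concrete, here is a minimal sketch of prompt-only tool calling. Everything in it is hypothetical and simplified: the `PLUGINS` descriptions stand in for real plugin manifests, and the JSON reply format is an assumed convention rather than the plugin store's actual protocol. The only thing the code borrows from the article is the three-plugin limit. It shows where the fragility lies: the model's sole knowledge of each tool is a line of text, and any reply that drifts from the requested format silently becomes "no tool call".

```python
import json
from typing import Dict, List, Optional

# Hypothetical, simplified tool descriptions; in a prompt-only design these
# strings are the model's only knowledge of each tool.
PLUGINS: Dict[str, str] = {
    "weather": "weather(city: str) -> current weather for a city",
    "flights": "flights(origin: str, dest: str) -> available flights",
    "calculator": "calculator(expression: str) -> numeric result",
}
MAX_PLUGINS = 3  # the plugin store allowed up to three plugins per chat


def build_prompt(user_message: str, enabled: List[str]) -> str:
    """Describe the enabled tools in plain text and ask for a JSON call."""
    if len(enabled) > MAX_PLUGINS:
        raise ValueError(f"at most {MAX_PLUGINS} plugins may be enabled")
    tool_lines = "\n".join(f"- {PLUGINS[name]}" for name in enabled)
    return (
        "You can call these tools:\n"
        f"{tool_lines}\n"
        'To call one, reply ONLY with JSON like {"tool": "name", "args": {...}}.\n'
        f"User: {user_message}"
    )


def parse_tool_call(reply: str, enabled: List[str]) -> Optional[dict]:
    """Return the call if the reply is well-formed JSON naming an enabled tool, else None."""
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return None  # the model answered in prose instead of following the format
    if not isinstance(call, dict) or call.get("tool") not in enabled:
        return None
    if not isinstance(call.get("args"), dict):
        return None
    return call


enabled = ["weather", "calculator"]
print(build_prompt("What's the weather in Paris?", enabled))
print(parse_tool_call('{"tool": "weather", "args": {"city": "Paris"}}', enabled))
print(parse_tool_call("Sure! Let me check the weather in Paris.", enabled))  # None
```

A model trained on tool use (the model-layer approach) learns both when to call a tool and how to format the call, instead of depending on a text instruction that it may or may not follow.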

All-tools agent (with web search) VS All-tools agent (without web search)

Even a pioneer like OpenAI struggles to decide when a search engine should support an answer in general scenarios. For now, OpenAI has partly handed this decision to users, who choose between ChatGPT and SearchGPT, which may also open opportunities for AI search startups.

Code interpreter VS GPTs

A practical agent requires more than prompt-level programming. It needs many decision-making mechanisms, which are usually implemented in code rather than in natural-language prompts. Ironically, OpenAI's own agent-building platform cannot create an agent as capable as Code interpreter.