What is AutoGPT? What are its application scenarios? - Cykapu


Over the past few days, Auto-GPT, an application that lets the most powerful language model, GPT-4, complete tasks autonomously, has driven the entire AI community crazy.

The one real inconvenience of ChatGPT, which exploded in popularity before it, is that a human has to write every prompt.

Auto-GPT's major breakthrough is that the AI can prompt itself; in other words, this AI barely needs us humans at all.

In just seven days, it racked up an astonishing number of stars on GitHub (now past 50,000) and attracted the attention of countless open-source developers.

Project address: https://github.com/Torantulino/Auto-GPT

How popular is Auto-GPT? In just a few days, its star count caught up with that of the Bitcoin project, which has been on GitHub for 11 years.

But amid the carnival around Auto-GPT, it is worth stepping back to examine its potential shortcomings and discuss the limitations and challenges facing this "AI prodigy".

How does Auto-GPT work?

It must be said that Auto-GPT has made huge waves in the AI field. It is as if GPT-4 were given memory and a body, letting it tackle tasks on its own and even learn from experience to keep improving its performance.

To understand how Auto-GPT works, let's break it down with some simple metaphors.

First, imagine Auto-GPT as a resourceful robot.

Every time we assign a task, Auto-GPT will give a corresponding solution plan. For example, if it needs to browse the Internet or use new data, it will adjust its strategy until the task is completed. It's like having a personal assistant that can handle various tasks like market analysis, customer service, marketing, finance, etc.

Specifically, Auto-GPT relies on the following four components:

Architecture:
Auto-GPT is built on the powerful GPT-4 and GPT-3.5 language models, which act as the robot's brain, helping it think and reason.

Autonomous iteration:
It's like a robot's ability to learn from its mistakes. Auto-GPT can look back at its work, build on previous efforts, and use its history to produce more accurate results.

Memory management:
Integration with a vector database, a memory-storage solution, enables Auto-GPT to preserve context and make better decisions. It's like equipping the robot with a long-term memory that remembers past experiences.

Multifunctionality:
Auto-GPT's capabilities, such as file manipulation, web browsing, and data retrieval, make it versatile. It's like giving the robot multiple skills to handle a wider range of tasks.
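The four components described above can be pictured as one simple loop. The sketch below is purely illustrative; the names and structure are invented for this article and do not match Auto-GPT's actual source code:

```python
# Illustrative sketch of an Auto-GPT-style agent loop.
# All names here are hypothetical, not Auto-GPT's real identifiers.

def call_llm(prompt: str) -> dict:
    # Stand-in for a GPT-4/GPT-3.5 call (the "brain"/architecture).
    # Returns a canned decision so the sketch runs without an API key.
    return {"thought": "task looks done", "command": "finish", "args": {}}

class Agent:
    def __init__(self, goal: str):
        self.goal = goal
        self.memory = []  # long-term memory (a vector DB in the real system)

    def step(self) -> dict:
        # Memory management: feed past results back into the prompt.
        prompt = f"Goal: {self.goal}\nHistory: {self.memory}"
        decision = call_llm(prompt)             # reasoning
        result = self.execute(decision)         # multifunctionality
        self.memory.append((decision, result))  # autonomous iteration
        return decision

    def execute(self, decision: dict) -> str:
        # Dispatch to tools: web search, file I/O, code execution, ...
        return f"executed {decision['command']}"

agent = Agent("write a market analysis")
done = False
while not done:
    decision = agent.step()
    done = decision["command"] == "finish"
```

Each pass through the loop is one "thought" in the chain; the agent keeps iterating until the model decides the goal is reached.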

However, these tantalizing prospects may not yet match what Auto-GPT can actually deliver.

Sky-high costs

If you want to use Auto-GPT in a real production environment, the first obstacle you face is its high cost.

Since each task is completed through a series of thought iterations, and the model needs rich context for better reasoning and prompting, each step typically uses up the full token window.

However, GPT-4 tokens are not cheap.

According to OpenAI, the GPT-4 model with an 8K context window charges $0.03 per 1,000 tokens for the prompt and $0.06 per 1,000 tokens for the completion.

And 1000 tokens can be converted into about 750 English words.

Let's break down the cost of each step in the chain of thought, assuming each action uses a full 8,000-token context window, of which 80% is prompt (6,400 tokens) and 20% is completion (1,600 tokens).

Prompt cost: 6,400 tokens x $0.03 / 1,000 tokens = $0.192

Completion cost: 1,600 tokens x $0.06 / 1,000 tokens = $0.096

Therefore, the cost per step is: $0.192 + $0.096 = $0.288

On average, Auto-GPT takes 50 steps to complete a small task.

Therefore, the cost of completing a single task is: 50 steps x $0.288/step = $14.40
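The back-of-the-envelope math above can be checked in a few lines. The prices are the GPT-4 8K rates quoted earlier; the 80/20 prompt/completion split and the 50-step count are the article's own assumptions:

```python
# GPT-4 (8K context) pricing quoted above, in USD per 1,000 tokens.
PROMPT_RATE = 0.03
COMPLETION_RATE = 0.06

context_window = 8000
prompt_tokens = int(context_window * 0.8)       # 6,400 tokens
completion_tokens = int(context_window * 0.2)   # 1,600 tokens

prompt_cost = prompt_tokens / 1000 * PROMPT_RATE              # $0.192
completion_cost = completion_tokens / 1000 * COMPLETION_RATE  # $0.096
cost_per_step = prompt_cost + completion_cost                 # $0.288

steps = 50  # the article's estimate for a small task
task_cost = steps * cost_per_step
print(f"${task_cost:.2f} per task")  # $14.40 per task
```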

Take VueGPT as an example: an agent created with Auto-GPT, designed to build website applications with Vue.js. Let's look at one step in its chain of thought:

VUEGPT THOUGHTS: Let's start by checking if there are any updates to VueJS. If there are, we can update to the latest version and proceed. Otherwise, we can move on to creating the TODO list website application.
REASONING: Starting with the most updated and stable version of the framework will ensure our project has the latest features, bug fixes and is properly supported. Creating the TODO list website application is our primary goal, so we can move on to that if there are no updates.
- Check for VueJS updates
- Update to latest version if there are updates
- If no updates, move on to create the TODO list website application
NEXT ACTION: COMMAND = google ARGUMENTS = {'input': 'VueJS latest version update'}
Enter 'y' to authorize command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for VueGPT...

And that is assuming the result is produced in a single pass; if any steps have to be regenerated, the cost climbs even higher.

From this perspective, Auto-GPT is currently unrealistic for most users and organizations.

Development vs. production

At first glance, spending $14.40 on a complex task might not seem unreasonable.

As an example, we first let Auto-GPT make a Christmas recipe. Then, ask it for a Thanksgiving recipe, and guess what?

That's right: Auto-GPT walks through the very same chain of thought all over again, meaning we pay another $14.40.

But in reality, there is only one "parameter" that differs between these two tasks: the holiday.

Having already spent $14.40 to develop a method for creating a recipe, it is plainly illogical to spend the same amount again just to tweak a parameter.

Imagine playing Minecraft and having to build everything from scratch every single time. Obviously, that would make the game very boring.

And this exposes a fundamental problem with Auto-GPT: it cannot distinguish between development and production.

When Auto-GPT achieves its goal, the development phase is complete. Unfortunately, there is no way to "serialize" the resulting series of operations into a reusable function for production.

Therefore, users have to start development from scratch every time they want to solve a problem, which is not only time- and labor-intensive, but also expensive.

This inefficiency raises questions about the usefulness of Auto-GPT in real-world production environments and highlights the limitations of Auto-GPT in providing sustainable, cost-effective solutions to large-scale problem solving.
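What a development/production split might look like can be sketched in a few lines. This is purely hypothetical; Auto-GPT offers no such facility, and the recorded steps below are invented for illustration:

```python
# Hypothetical sketch: "serializing" a solved chain of thought into a
# reusable, parameterized plan. Auto-GPT has no equivalent of this.

def develop_recipe_plan(holiday: str) -> list:
    # Imagine the expensive $14.40 chain of LLM calls ran once,
    # and we recorded the resulting steps instead of discarding them.
    return [
        ("search", f"{holiday} dinner ideas"),
        ("draft", f"{holiday} recipe"),
        ("format", "markdown"),
    ]

# Development: pay the full cost once, for Christmas.
christmas_plan = develop_recipe_plan("Christmas")

# Production: substitute the single differing parameter instead of
# re-deriving the entire chain of thought for Thanksgiving.
thanksgiving_plan = [
    (command, argument.replace("Christmas", "Thanksgiving"))
    for command, argument in christmas_plan
]
```

With something like this, the second recipe would cost a parameter substitution rather than another full run of the agent.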

The quagmire of loops

Still, if $14.40 actually got the job done, it might be worth it.

But the problem is that when Auto-GPT is actually used, it often falls into an endless loop...

So, why does Auto-GPT get stuck in these loops?

To understand this, we can think of Auto-GPT as relying on GPT to solve tasks using a very simple programming language.

The success of solving a task depends on two factors: the range of functions available in the programming language and the divide and conquer capability of GPT, i.e. how well GPT can decompose the task into a predefined programming language. Unfortunately, GPT falls short on both of these points.

The limited functionality provided by Auto-GPT can be observed in its source code. For example, it provides functionality for searching the web, managing memory, interacting with files, executing code, and generating images. However, this restricted feature set narrows the range of tasks that Auto-GPT can effectively perform.
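The restricted "programming language" that GPT must express every task in can be pictured as a small dispatch table. The command names below are illustrative stand-ins, not Auto-GPT's exact identifiers:

```python
# Illustrative: the narrow command vocabulary an Auto-GPT-style agent
# can use. Names are invented for the sketch, and each handler is a
# stub rather than a real web search or file write.

COMMANDS = {
    "google":         lambda args: f"searched for {args['query']}",
    "write_file":     lambda args: f"wrote {args['path']}",
    "execute_code":   lambda args: f"ran {args['code']!r}",
    "generate_image": lambda args: f"generated image of {args['prompt']}",
}

def dispatch(command: str, args: dict) -> str:
    # Any task GPT cannot decompose into this vocabulary is simply
    # out of reach, no matter how well it reasons.
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    return COMMANDS[command](args)

print(dispatch("google", {"query": "VueJS latest version"}))
```

The success of the whole system hinges on GPT mapping arbitrary goals onto this handful of primitives, which is exactly where the divide-and-conquer burden falls.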

In addition, the decomposition and reasoning capabilities of GPT are still limited. Although GPT-4 has a significant improvement over GPT-3.5, its reasoning ability is far from perfect, further limiting the problem-solving ability of Auto-GPT.

The situation is similar to trying to build a game as complex as StarCraft using Python. While Python is a powerful language, breaking down StarCraft into Python functions is extremely challenging.

Essentially, the combination of the limited feature set and GPT-4's constrained reasoning creates the looping quagmire described above, leaving Auto-GPT unable to achieve the desired result in many cases.

The difference between humans and GPT
Divide and conquer is the key to Auto-GPT. Although GPT-3.5/4 has significantly improved over its predecessors, its reasoning ability still cannot reach human level when using divide and conquer.

Insufficient decomposition of the problem:
The effectiveness of divide and conquer depends heavily on the ability to decompose a complex problem into smaller, manageable sub-problems. Human reasoning can often find multiple ways to break down problems, while GPT-3.5/4 may not have the same level of adaptability or creativity.

Difficulty in identifying suitable base cases:
Humans can intuitively choose appropriate base cases for efficient solutions. In contrast, GPT-3.5/4 may struggle to determine the most efficient base case for a given problem, which significantly affects the overall efficiency and accuracy of the divide-and-conquer process.

Insufficient understanding of the problem background:
While humans can use their domain knowledge and background understanding to better deal with complex problems, GPT-3.5/4 is limited by its pre-trained knowledge and may lack the background information needed to effectively solve some problems with divide and conquer.

Handle overlapping subproblems:
Humans can often recognize when they are solving overlapping subproblems and strategically reuse previously computed solutions. GPT-3.5/4, by contrast, may lack that awareness and redundantly solve the same subproblem multiple times, resulting in less efficient solutions.
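The overlapping-subproblem point is the classic motivation for memoization. The toy example below (Fibonacci, chosen here only as an illustration) contrasts re-deriving the same subproblem every time, as a chain of LLM calls does, with caching and reusing it, as a human solver would:

```python
from functools import lru_cache

calls = {"plain": 0, "memoized": 0}

def fib_plain(n: int) -> int:
    # Re-solves identical subproblems over and over, like an agent
    # that re-derives the same reasoning step each time it appears.
    calls["plain"] += 1
    return n if n < 2 else fib_plain(n - 1) + fib_plain(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    # Recognizes repeated subproblems and reuses their solutions.
    calls["memoized"] += 1
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

fib_plain(20)  # tens of thousands of redundant calls
fib_memo(20)   # exactly 21 calls, one per distinct subproblem
print(calls)
```

Both compute the same answer; the difference is entirely in how much work is repeated, which is the inefficiency the paragraph above attributes to GPT.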

Vector DB: An Overkill Solution
Auto-GPT relies on vector databases for faster k-nearest neighbor (kNN) searches. These databases retrieve previous chains of thought and incorporate them into the context of the current query in order to provide GPT with a sort of memory effect.

However, considering Auto-GPT's constraints and limitations, this approach has been criticized as excessive and needlessly resource-hungry. The main argument against using a vector database stems from the cost constraints associated with Auto-GPT's chain of thought.

A 50-step thought chain will cost $14.40, while a 1,000-step chain will cost more. Consequently, memory sizes or the length of thought chains rarely exceed four digits. In this case, an exhaustive search of nearest neighbors (i.e. the dot product between a 256-dimensional vector and a 10,000 x 256 matrix) proved to be efficient enough, taking less than a second.
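At these sizes, the exhaustive search described above really is a one-liner. Here is a sketch in NumPy with random data matching the dimensions quoted (10,000 stored embeddings of 256 dimensions; the embeddings and k are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
memory = rng.standard_normal((10_000, 256))  # 10,000 stored "memories"
query = rng.standard_normal(256)             # current chain-of-thought embedding

# Exhaustive kNN: one matrix-vector product plus a partial sort.
scores = memory @ query                       # dot-product similarity
k = 5
top_k = np.argpartition(scores, -k)[-k:]      # k best, in arbitrary order
top_k = top_k[np.argsort(scores[top_k])[::-1]]  # sorted, most similar first

print(top_k)
```

On any modern machine this runs in well under a second, supporting the point that a dedicated vector database buys nothing at this scale.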

In comparison, each GPT-4 call takes about 10 seconds to process, so it is GPT, not the database, that is actually limiting the system's processing speed.

Although vector databases may have some advantages in certain scenarios, implementing a vector database in an Auto-GPT system to speed up kNN "long-term memory" searches seems like an unnecessary luxury and an overkill solution.

The Birth of the Agent Mechanism

Auto-GPT introduces a very interesting concept that allows generating agents to delegate tasks.

Admittedly, this mechanism is still in its infancy, and its potential has not been fully tapped. Still, there are ways to enhance and extend the current agent system, opening up new possibilities for more efficient and dynamic interactions.

Significant efficiency gains can be achieved with asynchronous agents

A potential improvement is to introduce asynchronous agents. By adopting the async-await pattern, agents can operate concurrently without blocking one another, significantly improving the overall efficiency and responsiveness of the system. The concept is inspired by modern programming paradigms that use an asynchronous approach to manage multiple tasks simultaneously.
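What that async-await pattern could look like is sketched below with Python's asyncio. The agents here just sleep to simulate LLM latency; names and delays are invented for the example:

```python
import asyncio
import time

async def agent(name: str, delay: float) -> str:
    # Stand-in for an LLM-backed agent working on a delegated sub-task;
    # the sleep simulates waiting on a slow model call.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main() -> list:
    # Three agents run concurrently instead of blocking one another,
    # so total wall time is roughly max(delays), not sum(delays).
    start = time.perf_counter()
    results = await asyncio.gather(
        agent("research", 0.3),
        agent("summarize", 0.2),
        agent("format", 0.1),
    )
    elapsed = time.perf_counter() - start
    print(f"finished in {elapsed:.2f}s")
    return results

results = asyncio.run(main())
```

With sequential execution the three sleeps would add up to 0.6 seconds; run concurrently, the wall time is close to the longest single delay, which is the efficiency gain the paragraph above describes.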