Agents Need “State” Instead of “Memory”
As I mentioned in this article, when we design an agent, what we ultimately want is an omnipotent super function: given an input, it produces an output, generating some side effects along the way.
But as programmers, we all know that functions are a relatively low-level abstraction, not expressive enough for business scenarios. One of the main reasons is that a function does not maintain state itself; it can only change external state through side effects. Modern software engineering offers two solutions to this problem: one is continuations (or coroutines), and the other is objects (OOP). The latter is clearly more popular because it maps more closely onto business modeling.
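To make the contrast concrete, here is a minimal sketch (the names `step`, `Robot`, and `world` are illustrative, not from any particular library):

```python
# A plain function: no internal state. Progress can only be
# communicated by mutating something external (a side effect).
def step(world: dict, command: str) -> str:
    world["log"].append(command)  # side effect on external state
    return f"executed {command}"

# An object: the same behavior, but the state lives inside it.
class Robot:
    def __init__(self) -> None:
        self.log: list[str] = []  # internal state, no external mutation needed

    def step(self, command: str) -> str:
        self.log.append(command)
        return f"executed {command}"
```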
So how do we upgrade the agent from "super function" to "super object"? Readers familiar with LLM applications may have already thought of the answer: add memory to the agent.
Before settling on the word memory, it is worth asking: what is actually worth remembering? As an end user, I may want my chatbot to remember every detail of every conversation between us; that is one kind of memory. But if my agent is handling a specific task, calling tools one after another, and it uses memory to record the raw output of every tool call, is that valuable?
From this perspective, tool output is more like a log to the agent, and the answers the agent actually needs are buried inside that log. What the agent needs to retain internally is only the information that will affect its next round of actions.
Therefore, a better word than memory is state. In each round, the agent first makes decisions and calls tools based on its current state, then updates that state from the tool output, and then enters the next round. It is like a pathfinding robot that needs to know its position in the maze and the routes it has already explored, and makes its next move accordingly.
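A minimal sketch of this loop, assuming hypothetical `decide`, `call_tool`, and `update_state` helpers (the placeholder bodies below stand in for an LLM policy and a real tool; none of this comes from an actual framework):

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    # Designed state: only what influences the next decision,
    # not a transcript of every raw tool output.
    goal: str
    known_facts: dict[str, str] = field(default_factory=dict)
    done: bool = False

def decide(state: AgentState) -> str:
    # Placeholder policy: a real agent would call an LLM conditioned on the state.
    return "search" if "answer" not in state.known_facts else "finish"

def call_tool(action: str) -> str:
    # Placeholder tool: returns a verbose "log line" the agent must digest.
    return "...3 pages of search results... answer=42 ...more noise..."

def update_state(state: AgentState, observation: str) -> None:
    # Distill the log into state: keep only what affects the next round.
    if "answer=42" in observation:
        state.known_facts["answer"] = "42"
        state.done = True

def run(state: AgentState, max_rounds: int = 10) -> AgentState:
    for _ in range(max_rounds):
        if state.done:
            break
        action = decide(state)            # decide based on current state
        observation = call_tool(action)   # raw tool output: just a log
        update_state(state, observation)  # update state, discard the log
    return state

print(run(AgentState(goal="find the answer")))
```

Note that the raw observation never survives past the round: only its distilled effect on `AgentState` does, which is exactly the log-versus-state distinction above.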
What the word memory describes is treating the agent's entire observed history as its "state". This is indeed the most general framing when we talk about AGI agents operating in an open world.
But if what we want is a problem-solving "industrial robot" agent, this kind of memory is neither consistent with the software philosophy we value nor technically sound. What an "industrial robot" agent needs is a designed state.
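To put the two framings side by side in code (again an illustrative sketch, reusing the maze robot from earlier as the example task): a memory-style agent appends everything it observes, while a state-style agent keeps a structure designed for the task.

```python
# "Memory" framing: the state is the entire observed history.
class MemoryAgent:
    def __init__(self) -> None:
        self.history: list[str] = []  # grows without bound

    def observe(self, event: str) -> None:
        self.history.append(event)    # every byte of every tool output

# "State" framing: the state is designed around the task,
# e.g. a pathfinding robot in a maze.
class MazeAgent:
    def __init__(self) -> None:
        self.position: tuple[int, int] = (0, 0)
        self.explored: set[tuple[int, int]] = set()

    def observe(self, new_position: tuple[int, int]) -> None:
        self.explored.add(self.position)
        self.position = new_position  # only what the next move needs
```

The first design defers all interpretation to read time; the second forces the agent to decide, at write time, what its next action actually depends on.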