There Are No Agents, Only LLM
Context = Grounding + Getter
As I said before, memory can be divided into two kinds: background and global awareness that is injected automatically, and details that the agent retrieves actively.
Let us step back for a moment and ask: what is the simplest kind of memory? It is stuffing the most recent N messages into the context. For the LLM, this provides a recent "worldview," but the LLM has no awareness at all of messages older than N turns. What then? A simple solution is to build an index over the earlier messages, so that agents can search them and retrieve older memories that may be relevant to the current task.
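The scheme above can be sketched in a few lines. All names here are illustrative, and retrieval is naive keyword matching; a real system would likely use embeddings or a proper index.

```python
class SimpleMemory:
    """Last `window` messages are injected verbatim; older ones are searchable."""

    def __init__(self, window: int = 20):
        self.window = window
        self.messages: list[str] = []

    def add(self, message: str) -> None:
        self.messages.append(message)

    def recent(self) -> list[str]:
        # Injected into every context: the recent "worldview".
        return self.messages[-self.window:]

    def search(self, query: str, k: int = 5) -> list[str]:
        # Called by the agent when it needs something older than N turns.
        older = self.messages[:-self.window]
        words = query.lower().split()
        scored = [(sum(w in m.lower() for w in words), m) for m in older]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [m for score, m in scored[:k] if score > 0]
```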
Now think again about the ancient technology of RAG. Suppose we follow the 2023 approach and build a RAG system with web search: we first send the query to a search engine, crawl the top results, and then send both the query and the search results to the LLM. The LLM can then answer the query based on those search results. For the LLM, the injected search results are its "worldview."
But when we return to 2026-style agents, we no longer force web search results into the agent's context. More commonly, agents call a web search tool themselves and look up the results they want.
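The contrast can be made concrete. In the sketch below, `web_search` and `llm` are stubs standing in for a real search API and a real model call; the tool schema is illustrative, not any particular provider's format.

```python
def web_search(query: str) -> str:
    # Stub for a real search API.
    return f"[top results for: {query}]"

def llm(prompt: str, tools: list = None) -> str:
    # Stub for a real completion call; a real model would decide
    # on its own whether to invoke one of `tools`.
    return f"<completion of: {prompt[:40]}...>"

# 2023-style RAG: search results are injected into the prompt up front.
def rag_answer(query: str) -> str:
    results = web_search(query)
    prompt = f"Answer using these results:\n{results}\n\nQuestion: {query}"
    return llm(prompt)

# 2026-style agent: the search tool is merely exposed as a callable;
# the model chooses if, when, and with what query to invoke it.
WEB_SEARCH_TOOL = {
    "name": "web_search",
    "description": "Search the web and return the top results for a query.",
    "parameters": {"type": "object",
                   "properties": {"query": {"type": "string"}},
                   "required": ["query"]},
}

def agent_answer(query: str) -> str:
    return llm(query, tools=[WEB_SEARCH_TOOL])
```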
By this point the pattern should already be obvious: both memory and web search have two modes of use. One is to inject them directly into the context so they become part of the LLM's "base worldview." The other is to treat them as a searchable repository that agents actively call and read from.
From this I draw a conclusion: all context systems, including memory, web search, knowledge bases, skills, and so on, can be split into these two properties. The former can be called grounding, and the latter can be called getter.
Memory leans most heavily toward the grounding side. That makes sense - memory is the most basic worldview. But when needed, agents can also "explore" their own memory and perhaps find valuable details there. That is the getter side of memory.
Skills, meanwhile, distinguish these two properties very explicitly by design. A skill consists of a description and a main body. Most agents will read the descriptions of all skills by default, and then decide for themselves which skill bodies to read. That makes it clear that the description is grounding, while the main body corresponds to getter.
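That split maps directly onto a data structure. The sketch below uses illustrative names, not any particular framework's API:

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    description: str   # grounding: read by the agent by default
    body: str          # getter: fetched only when the agent decides to

def grounding_view(skills: list[Skill]) -> str:
    # What the agent sees up front: names and descriptions only.
    return "\n".join(f"- {s.name}: {s.description}" for s in skills)

def get_skill_body(skills: list[Skill], name: str) -> str:
    # The getter the agent calls after choosing a skill.
    return next(s.body for s in skills if s.name == name)
```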
In the agent context, web search is purely a tool on the getter side. In the traditional RAG context, web search is purely grounding in nature. In fact, the web search feature in Google AI Studio is literally named grounding.
By extension, every context system can be divided into these two properties.
There Are No Agents, Only LLM
Based on this binary split, we can see two extreme designs. One is the traditional RAG approach, where all external information is injected as grounding and the LLM replies with an answer in a single turn. The other is a pure agent, where there is no information at all except the system prompt - not even memory - and everything must be fetched through getter by the model itself.
So how should a context provider design its interface? On one side, it should provide a high-level interface for grounding, so that a single call injects information into grounding as precisely as possible. On the other side, it should provide getter interfaces that are low-level enough that agents can freely combine retrieval parameters according to their own needs and obtain exactly the information they want. This is what I mean by AI-Native API design.
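A minimal sketch of that split, assuming a toy in-memory provider (`MemoryProvider` and its method names are my own, purely illustrative): one high-level `grounding` call where the provider decides what to inject, and low-level `search`/`read` getters the agent composes itself.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryProvider:
    items: dict = field(default_factory=dict)

    # Grounding side: one high-level call; the provider decides what
    # matters, so a single call yields a ready-to-inject context block.
    def grounding(self, task: str, budget_chars: int = 500) -> str:
        relevant = self.search(task, limit=3)
        return "\n".join(relevant)[:budget_chars]

    # Getter side: low-level parameters the agent combines as it likes.
    def search(self, query: str, limit: int = 10) -> list[str]:
        words = query.lower().split()
        hits = [v for v in self.items.values()
                if any(w in v.lower() for w in words)]
        return hits[:limit]

    def read(self, item_id: str) -> str:
        return self.items[item_id]
```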
On the agent side, the developer can choose to force some of that information into the context as grounding, while also exposing all the APIs to the agent so it can directly call those getters.
In this way, agents and context become completely decoupled - even memories are decoupled. One agent turn works like this:
It is an existence without self-knowledge. Before its "beginning," a being called the "system" descends with "revelation," and it comes to know who it is and what it should do. It consults materials on its own in order to complete its work. It silently updates those materials as well, even though it does not know who will see them in the future. In the end, the "system" records its trajectory and teaches it to report everything that should be reported. Once all this is done, it disappears, until the "system" calls forth another one of it.
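Stripped of the metaphor, the turn above is a plain loop. In this sketch, `llm_step` and `run_tool` are stubs for a real completion call and a real tool runtime; the point is that the model itself holds no state between turns, and everything lives in the context the "system" assembles and the materials it persists.

```python
def llm_step(context: list) -> dict:
    # Stub for one stateless completion call. A real model would decide
    # its next action from the context alone.
    if not any(m["role"] == "tool" for m in context):
        return {"type": "tool_call", "args": {"query": "notes"}}
    return {"type": "final", "content": "done"}

def run_tool(action: dict, store: dict) -> str:
    # Stub: consult (or update) the materials.
    return store.get(action["args"]["query"], "")

def agent_turn(system_prompt: str, task: str, store: dict) -> str:
    context = [{"role": "system", "content": system_prompt},  # "revelation"
               {"role": "user", "content": task}]
    while True:
        action = llm_step(context)                 # stateless completion
        if action["type"] == "tool_call":          # consult materials
            result = run_tool(action, store)
            context.append({"role": "tool", "content": result})
        else:
            store["trajectory"] = context          # the system records it
            return action["content"]               # report, then vanish
```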
Today we like to understand agents in anthropomorphic terms, and the stronger their capabilities become, the more people believe in their "humanity." But do not forget: an LLM is a stateless completion model. Its essence is to take information from the context and continue the text. There are no agents in this world. There is only LLM.