
The 1st Memory Mode and the 2nd Memory Mode: Kapy's Memory System

Kapy (short for Kapybara) is a general-purpose agent I built during the Lunar New Year holiday. Since my slightly-longer-than-usual holiday (thanks to dify.ai) is about to end, I am writing this devlog first; the polished GitHub repo will come later.

I have been thinking about memory system design for a long time, and I suspect anyone who has built a chatbot has too. But I had never reached a clear conclusion. This time I used the opportunity to push myself and finally produce my own answer. In fact, building Kapy involved more hand-written code than anything else I have done recently. Hand-writing code is like being a sculptor, at least in my imagination, since I do not know what real sculptors are actually like: ideas are refined in the process of carving.

A Unified Architecture

Now to the point. From the very beginning I decided that this memory system should be unified. I did not want thread-based splitting. I wanted the raw information to be the single source of truth. At that point, agents including bub had already shown this path worked, so for me this was not just an idea. It was a completely clear direction.

Once the unified architecture was decided, the application layer still needed compacting. I have long believed that turns matter more than threads:

  • Many tasks are turn-based. Search tasks, for example, often do not have follow-up questions.
  • In the agent context, a turn is not just an input-output pair, but: input -> tool call loop -> output
  • I had already said on Twitter that the tool call loop is not memory. It is a work log. The interaction memory between agent and human is almost unrelated to the details of the tool call loop.

So the compacting scheme was also naturally decided: compact the tool call loop within one turn as the smallest unit, producing one compact node. Then the basic memory unit took shape: one memory record corresponds to one turn, containing its raw work log and its compact node.
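To make this concrete, here is a minimal sketch of such a memory record as a JSON-file-per-turn layout. The field names (`turn_id`, `raw_log`, `compact`, `parents`) and the storage shape are my illustration, not necessarily the exact schema:

```python
from dataclasses import dataclass, field, asdict
import json
import os
import time

@dataclass
class MemoryRecord:
    turn_id: str      # unique id for this turn
    thread: str       # thread the turn belongs to
    user: str         # the human user on the other side
    raw_log: list     # full work log: input -> tool call loop -> output
    compact: str      # the compact node summarizing the tool call loop
    parents: list = field(default_factory=list)  # referenced memory ids (DAG edges)
    ts: float = field(default_factory=time.time)

def save_record(rec: MemoryRecord, root: str = "memories") -> str:
    """Persist one record as a JSON file named after its turn id."""
    os.makedirs(root, exist_ok=True)
    path = os.path.join(root, f"{rec.turn_id}.json")
    with open(path, "w") as f:
        json.dump(asdict(rec), f, indent=2)
    return path
```

One file per turn keeps the raw information as the single source of truth while staying trivially greppable later.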

DAGs and Causal Chains

So if we do not want threads, how should turns be connected? I immediately thought of git, which means a DAG. This shape is very reasonable: edges represent sequence and causality, and if a memory record points to parents, that naturally forms a causal chain.

Then how should parents be attached? I tried different approaches. For example, should I simply force them together by thread? But if the agent actively queries its own memories, and not just the recent ones, doesn't that also constitute causality? In the end I concluded that the agent itself can simply decide: after finishing the task, tell me which past memories you referred to, and I will persist that DAG structure.
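Once parent pointers are persisted, the causal chain falls out of a simple upward walk over the DAG. A sketch, assuming each record is a dict with a `parents` list of referenced memory ids (an illustrative shape, not the real storage format):

```python
from collections import deque

def ancestors(records: dict, start_id: str, max_depth: int = 3) -> list:
    """Breadth-first walk over parent edges, nearest ancestors first.

    records maps memory id -> record dict; each record may carry a
    "parents" list of memory ids (the DAG edges the agent reported).
    """
    seen = {start_id}
    order = []
    frontier = deque([(start_id, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # bound how far back the causal chain is followed
        for p in records.get(node, {}).get("parents", []):
            if p not in seen:
                seen.add(p)
                order.append(p)
                frontier.append((p, depth + 1))
    return order
```

Bounding the depth keeps the injected causal context small: the nearest ancestors arrive first, distant history only if there is room.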

Read/Write Logic and Retrieval

Good. The write path was decided, so what about reading? First, basic recent memory is obviously necessary. Beyond that, the agent should query past memories it may care about on its own, including specific past work logs. By that point I had already realized how pleasant it is to build on the filesystem, so I stored memory records there and wrote a small script.

This script has three retrieval paths:

  1. Thread dimension: retrieve recent memory records filtered by thread, naturally serving as recent contextual memory.
  2. User dimension: Kapy identifies different human users and retrieves accordingly.
  3. Keyword dimension: use the keywords argument passed in by the agent to retrieve past memories by keyword.
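The three paths above amount to one filter over records loaded from disk. A sketch of the script's core; the field names (`thread`, `user`, `compact`, `ts`) are assumptions about the record layout, not Kapy's exact format:

```python
import glob
import json

def load_records(root: str = "memories") -> list:
    """Read every JSON memory record under the given directory."""
    recs = []
    for path in sorted(glob.glob(f"{root}/*.json")):
        with open(path) as f:
            recs.append(json.load(f))
    return recs

def recall(recs, thread=None, user=None, keywords=None, limit=5):
    """Filter records by thread, user, and/or keywords, most recent first."""
    hits = recs
    if thread:
        hits = [r for r in hits if r.get("thread") == thread]
    if user:
        hits = [r for r in hits if r.get("user") == user]
    if keywords:
        hits = [r for r in hits
                if any(k.lower() in r.get("compact", "").lower()
                       for k in keywords)]
    return sorted(hits, key=lambda r: r.get("ts", 0), reverse=True)[:limit]
```

Note that keyword matching here runs against the compact node only; the raw work logs are left for the agent's own free-form search.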

Although this script is called by the agent itself, it is really a semi-fixed injection: I require that the very first tool call of every turn invokes this script.

With this basic memory, the agent can at least understand the context. After that, the agent may think of valuable details it needs to inspect: maybe after reading the compact it still needs to check the raw logs, or maybe a new keyword appears and it needs to look through past memories again. Memory records are all stored on the filesystem, so the instruction is simple: use rg, sed, and so on to search by yourself, and remember to tell me the referenced memory IDs at the end.
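As a stand-in for what `rg` does over the memory directory, here is a pure-Python grep showing the shape of this free-form second-mode search (the directory layout and id-from-filename convention are illustrative):

```python
from pathlib import Path

def grep_memories(pattern: str, root: str = "memories") -> list:
    """Return (memory id, matching line) pairs across all stored records.

    A case-insensitive substring scan over the raw record files,
    mimicking what the agent does directly with rg.
    """
    hits = []
    for path in sorted(Path(root).glob("*.json")):
        for line in path.read_text().splitlines():
            if pattern.lower() in line.lower():
                hits.append((path.stem, line.strip()))
    return hits
```

In practice the agent would run the real `rg` for speed and regex support; the point is only that because memory lives as plain files, no special query API is needed for this mode.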

Two Coexisting Memory Modes

At this point, Kapy's memory system was complete. Of course I adjusted the design more or less along the way, but in any case, more than ten days of usage has shown that it works very well.

In short, the principle of this system lies in the coexistence of two memory modes:

  • Mode One: fixed injected memory. This memory is broad, rough, global, and mainly based on compact information.
  • Mode Two: agent-driven memory queries. This memory is concrete and target-oriented. The agent knows what it is looking for.

Implementation Details and Future Directions

In terms of implementation details, it is naturally still a bit rough, but it should fit the two modes above:

  • For the first mode, there should be a high-level API, allowing the agent to understand who I am, where I am, and what I am supposed to do from a zero-context state. The parameters of this high-level API should be as few as possible. Anything that can be hard-coded should be hard-coded, and the call should be mandatory or semi-mandatory.
  • For the second mode, there should be a low-level API. At this point, the agent already clearly knows what information it wants, so it should be given as much flexibility as possible to find it.

For example, in the current version of Kapy, the first memory mode is the three-way retrieval plus the recent ancestors on the DAG causal chain; the second memory mode is based on the filesystem. Although that is low-level enough, rg is still keyword-based after all. Adding semantic retrieval and full-text retrieval would give it two more hands.

At this point, another question can actually be raised: second memory mode, knowledge bases, and skills all involve the agent using a tool to query things from the outside. What is the relationship between them, and what is the difference? Are they really the same thing under one unified view? Or should they be separated, with different index structures? I have not figured that out yet either, so I will leave it for later.

(Nowledge Mem has done some work from the methodological perspective of memory, and I think it is valuable.)