Will Advances in Model Capability Crush All LLM Application Engineering?
The Constantly Challenged “Value of Applications”
This view is actually not uncommon. In the past, friends often asked me: “As model context windows keep getting longer, does RAG still have any value?”
With the rapid rise of coding agents over the past year, led by Claude Code, this doubt no longer comes only from “outsiders.” More and more people inside the LLM application industry now share it.
Starting from “Abstraction Leakage” in CSAPP
I want to begin from an angle that seems unrelated.
One friend complained to me about how unreasonable prefix caching is. Caching, they said, should be an optimization handled by model providers themselves. Why should application engineers be asked to care about it? Isn't this textbook “abstraction leakage”?
Yes. From an engineer's sense of aesthetics, it is abstraction leakage. It is ugly.
But I was immediately reminded of the opening of Computer Systems: A Programmer's Perspective: why should software engineers learn computer architecture?
The book's answer is that software engineers need to understand how the machine works in order to write efficient software: code that works with the underlying system rather than against it.
That answer can feel unconvincing. In fact, most programmers today do not think about branch prediction or page faults at all. Compilers, operating systems, and the hardware itself have long since taken care of these concerns. If anything, manual optimization by programmers often becomes baggage.
LLMs Have Not Reached a “Steady State”
The reason for this gap is not hard to guess: computer architecture has already become extremely stable.
Outside of embedded systems or supercomputers, general-purpose computing devices such as servers, desktops, and phones have almost no meaningful difference in their underlying logic. So middle layers can provide strong abstraction, and programmers no longer need to care about architectural details.
LLMs are not like that.
- On one hand, LLM inference is still far too slow and expensive to be spent as casually as we spend hardware performance.
- On the other hand, the underlying structure of LLMs is still changing rapidly. Model intelligence and architecture alike may change at any time.
So the situation CSAPP describes does apply to LLM applications: to design efficient applications, programmers need to understand how the underlying LLM stack works. Prefix caching is a clear example. Providers can reuse the computation for the part of a prompt that exactly matches the beginning of an earlier request, so the application has to keep stable content at the front and changing content at the end. An application that does not respect this can only accept being slow and expensive.
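To make the prefix caching point concrete, here is a minimal sketch in Python. It only measures how much of two consecutive prompts is identical from the first character onward, which is a rough stand-in for what a prefix cache could reuse. Everything in it is an assumption made for illustration: the prompt contents, the function names, and the use of characters instead of tokens. Real providers match on tokens, and the exact rules (block sizes, minimum lengths, explicit cache markers) vary by provider.

```python
# Hypothetical prompt pieces, invented for this illustration.
SYSTEM = "You are a support assistant for ExampleCorp.\n"
DOCS = "Reference manual section. " * 200  # long and rarely changes


def cache_friendly(question: str, timestamp: str) -> str:
    """Stable content first, volatile content (time, question) last."""
    return SYSTEM + DOCS + f"\nTime: {timestamp}\nQuestion: {question}\n"


def cache_hostile(question: str, timestamp: str) -> str:
    """Same content, but a changing timestamp sits at the very front."""
    return f"Time: {timestamp}\n" + SYSTEM + DOCS + f"\nQuestion: {question}\n"


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that are identical in a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def shared_prefix_ratio(a: str, b: str) -> float:
    """Fraction of the longer prompt covered by the shared prefix."""
    return shared_prefix_len(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    requests = [
        ("How do I reset my password?", "2025-01-01T10:00:00"),
        ("Where is my invoice?", "2025-01-01T10:00:05"),
    ]
    for build in (cache_friendly, cache_hostile):
        first, second = (build(q, t) for q, t in requests)
        ratio = shared_prefix_ratio(first, second)
        print(f"{build.__name__}: {ratio:.1%} of the prompt is a reusable prefix")
```

The two builders send exactly the same information. The only difference is ordering, and that ordering decides whether almost all of the prompt or almost none of it matches the previous request from the start. This is the kind of detail the abstraction leaks to the application engineer.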
The Nature of Engineering Is Solving Present Problems
Then naturally someone will ask: what about the coupling caused by abstraction leakage? If one day diffusion architectures rise and nobody uses autoregressive Transformers anymore, wouldn't prefix caching become irrelevant?
Yes. That would be another case of model capabilities crushing application engineering. The answer is that there is no better option: pivot quickly and embrace the new architecture.
In fact, every engineering practice will eventually be overturned by changes in the underlying technology. Masonry gave way to reinforced concrete. Even the Colosseum, after standing for nearly two thousand years, is now only a tourist sight.
But what is engineering? Isn't it simply solving problems under existing technological conditions? Technology may move quickly or slowly, but you still have to solve problems with the technology that exists now.
From Engineer to Hacker
Recently people often mention the idea of “low-hanging fruit.” LLMs can solve many valuable problems, and for engineers some of them can be done almost effortlessly. When does this kind of low-hanging fruit appear? It appears when underlying technology is updating rapidly.
Picking up new technology quickly and solving a problem before others even realize the problem exists: that is exactly what hackers did in earlier eras of rapid technological change.
LangChain, the first-generation celebrity, and OpenClaw, the second-generation “little lobster,” are recent examples. An engineer's strength is solving problems steadily and responsibly. A hacker has to do that too, and also move a bit faster.