Organic LLM human-computer interaction: more than just dialog boxes
Dialog box: stateful, in-depth interaction
The first form, of course, is the dialog box. It is the most common form of LLM interaction and has been familiar since ChatGPT popularized it. Its scenarios include, but are not limited to, question answering, information retrieval, conversational exchange, and word games. Both its input and output are text or rich text.
The core advantage of a dialog box is its statefulness: it remembers the conversation history and supports multiple rounds of in-depth interaction. Users can keep discussing the same topic and gradually complete complex tasks. However, this form also has its limitations: users need to express their needs clearly, and an open-ended interface can leave people unsure where to start.
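The statefulness described above is typically implemented client-side: the application keeps the full message history and resends it on every turn, so the model appears to "remember". A minimal sketch, where `call_llm` is a hypothetical stand-in for any chat-completion API:

```python
def call_llm(messages):
    # Placeholder: a real implementation would call a chat model here.
    return f"(reply to: {messages[-1]['content']})"

class Dialog:
    """Stateful multi-turn conversation: the message history is the only state."""

    def __init__(self, system_prompt="You are a helpful assistant."):
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, user_text):
        self.messages.append({"role": "user", "content": user_text})
        reply = call_llm(self.messages)  # every prior turn is sent again
        self.messages.append({"role": "assistant", "content": reply})
        return reply

dialog = Dialog()
dialog.ask("What is PEP 751 about?")
dialog.ask("Summarize that in one sentence.")  # "that" resolves via history
```

The search box discussed next is the opposite design: each query would be a single `call_llm` with no accumulated history.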
Search box: focus on clear query requirements
Compared with the openness of the dialog box, the search box is more concise and focused, covering two distinct usage scenarios: Q&A and retrieval.
In the Q&A scenario, the user enters an explicit question, usually starting with a question word, such as "When is PyCon China this year?" or "How does Python's dependency lock mechanism work?". In the information retrieval scenario, users enter keywords or phrases for an index-based search, such as "PyCon China 2025" or "PEP 751", hoping to find specific web pages or resources.
In contrast to the stateful dialog box, the search box adopts a stateless interaction mode: each query is independent, which makes it simple and efficient. However, when users need to follow up on the search results, this interaction often naturally evolves into a dialog box.
Integrated LLM applications: seamless integration into workflow
From these two forms of direct human-computer interaction, we move to another type of application, in which the LLM is no longer an independent interaction interface but is embedded as part of the software's functionality.
Pan-Copilot: reshaping the professional creative process
When we integrate LLM into professional creation software and allow users to seamlessly call LLM for auxiliary work during the creation process, we form what I call a "pan-Copilot" interaction model.
The defining feature of this type of interaction is that it barely changes the user's existing workflow; instead, it quietly replaces individual steps with LLM calls. For example, LSP-based code completion in code editors has been upgraded to LLM-based intelligent completion, and text editors now offer LLM-based grammar checking and text polishing.
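The completion upgrade can be sketched concretely. Where an LSP server returns symbol-based candidates, an LLM-backed editor splits the buffer at the cursor and asks a model to fill the middle. This is a minimal sketch, assuming a hypothetical `complete_with_llm` function in place of a real fill-in-the-middle model call:

```python
def complete_with_llm(prefix, suffix):
    # Placeholder for a model call; returns the text to insert at the cursor.
    # Real systems use fill-in-the-middle-trained models with special
    # prefix/suffix tokens.
    if prefix.rstrip().endswith("def add(a, b):"):
        return "\n    return a + b"
    return ""

def editor_completion(buffer, cursor):
    """Split the buffer at the cursor and ask the model to fill the middle."""
    prefix, suffix = buffer[:cursor], buffer[cursor:]
    return complete_with_llm(prefix, suffix)

code = "def add(a, b):"
suggestion = editor_completion(code, len(code))
```

The workflow is unchanged from the user's perspective: the suggestion still appears inline at the cursor, only the backend producing it differs.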
This model achieves deep integration of the LLM into the user's content-production process, making AI capabilities a natural extension of creative tools.
Text processing applications: from "unusable" to "easy to use"
Another area worth attention is text processing applications. With LLM support, many such applications have improved qualitatively, making the critical leap from "unusable" to "easy to use".
Translation software typifies this change. After integrating an LLM, translation software is no longer limited to an auxiliary role in processing short passages; it has become a more trustworthy professional tool capable of translating entire articles or even entire books. More surprisingly, in specialized fields such as programming, law, medicine, and finance, LLM translation not only performs well, but its accuracy may even surpass that of human experts.
Language learning software has also undergone significant changes. Traditional language learning tasks such as conversation practice, oral practice, and essay correction are all well implemented in LLM-based software.
In addition, tasks that were relatively cumbersome in the past, such as email classification and summarization, or RSS feed subscription and organization, are now within reach with LLM support.
Native LLM applications: a new form that relies entirely on AI capabilities
The so-called "native" refers to applications built entirely on LLM's ability to produce text, sequences, or data.
Agent: From simple instructions to complex tasks
Agent is undoubtedly one of the hottest concepts this year. In my view, it refers to a general-purpose assistant program: the user enters only a simple instruction, and the agent completes a complex task accordingly. By task type, agents fall into two categories: agents that produce content and agents that perform tasks.
Among agents that produce content, the most typical examples are deep research and coding agents. Although deep research still focuses on text tasks, it can produce comprehensive reports through multiple iterations, sometimes even formatted as HTML to improve readability. Coding agents focus on code production: although the user's instructions are simple, they can independently explore a huge codebase and make precise modifications that meet the user's needs.
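The "multiple iterations" and "independent exploration" above share one underlying shape: a loop in which the model picks a tool, observes the result, and decides whether it is done. A minimal sketch, where `choose_action` is a hypothetical stand-in for the LLM planning call and the tools are toy stubs:

```python
def read_file(path):
    # Toy tool: a real coding agent would read from the actual codebase.
    return {"main.py": "print('hi')"}.get(path, "")

TOOLS = {"read_file": read_file}

def choose_action(goal, observations):
    # Hypothetical planner: a real agent asks the LLM to pick the next step
    # given the goal and everything observed so far.
    if not observations:
        return ("read_file", "main.py")
    return ("finish", f"Explored {len(observations)} file(s) for goal: {goal}")

def run_agent(goal, max_steps=5):
    observations = []
    for _ in range(max_steps):  # bound the loop so the agent cannot run forever
        action, arg = choose_action(goal, observations)
        if action == "finish":
            return arg
        observations.append(TOOLS[action](arg))
    return "step budget exhausted"
```

The step budget is the design choice worth noting: it is what keeps an autonomous loop from running indefinitely when the planner never decides to finish.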
Agents that perform tasks are different; an agent that orders takeout is a good example. The value of this type of agent lies not in the polish of its text output, but in its ability to complete real-world tasks as the user expects.
It is worth noting that although agent applications are powerful, most still use dialog boxes as the front-end interaction form.
More native application possibilities: beyond text commands
Some people believe that dialog boxes will replace the GUI or CLI/TUI as the new mainstream mode of interaction. But note that entering text commands (even by voice) is actually rather tedious: you must first compose the words in your head, then physically perform the input. In contrast, the core interaction of short videos requires only a simple swipe to consume content, and common UI components such as checkboxes, buttons, and sliders represent similarly simple forms of interaction.
So, if we abandon text command input and adopt a simpler input method, while still focusing on the core ability of LLM to produce text or data, what kind of native applications will be born?
The simplest example is the "one-click X" function, such as one-click figure-style image conversion or one-click translation. These applications treat the LLM as a "magic button": the user only needs to click to trigger the LLM to complete the corresponding task.
Recently I designed a small toy called "WikiSurfing: Infinite Virtual Wikipedia". Its interactive core uses the Web's most basic mechanism, the hyperlink, to build a virtual encyclopedia: when the user clicks a link, they jump to a page generated on the spot by the LLM. The design aims to recreate the old fun of "surfing the Internet".
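The source does not describe WikiSurfing's implementation, but the hyperlink-driven generation idea can be sketched as follows, with a hypothetical `generate_article` standing in for the LLM call that returns a page containing further links:

```python
import re

def generate_article(title):
    # Placeholder: a real version would prompt an LLM for an encyclopedia-style
    # page whose body links to more not-yet-existing articles.
    return (f"<h1>{title}</h1><p>About {title}. "
            f'See also <a href="/wiki/{title}-history">{title} history</a>.</p>')

CACHE = {}  # generated pages are cached so revisiting a link stays consistent

def serve(path):
    """Route /wiki/<title> to a cached or freshly generated page."""
    title = path.removeprefix("/wiki/")
    if title not in CACHE:
        CACHE[title] = generate_article(title)
    return CACHE[title]

page = serve("/wiki/Python")
links = re.findall(r'href="([^"]+)"', page)  # clicking a link = another serve()
```

Every link targets a page that does not exist until it is clicked, so the "encyclopedia" is effectively infinite; the cache is what makes it feel like a stable site rather than a one-off generation.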
There is also huge potential in gaming, especially role-playing games. The core of an RPG lies in the interaction between the player and the game world, and an LLM can provide players with a virtual, effectively infinite world while creating highly anthropomorphic NPCs. If NPCs are further given the same behavioral capabilities as players, they will truly stand on equal footing with players.
Companion: AI existence with equal status
If NPCs in RPGs stand on equal footing with players, this may give rise to a new game form similar to an MMORPG (massively multiplayer online role-playing game). But if the concept of equal-status NPCs were extended to the real world, what shape would it take? It could be called a companion.
I think companions represent a rather special type of interaction. Their defining trait is not any particular function, but genuine psychological interaction with the user. This first requires the user to believe in the companion's personified existence, which is the essence of the "Turing test"; and the prerequisite for passing the Turing test is that it be functionally rich and capable enough. The birth of LLMs makes all this possible: from basic conversational communication, to various behaviors in the digital world, and finally to actual actions in the physical world.
Summary: Towards organic human-computer interaction
This article has systematically surveyed the forms of LLM human-computer interaction: from the most basic dialog boxes and search boxes, to deeply integrated pan-Copilot and text processing applications, to native applications such as agents that rely entirely on LLM capabilities, and finally to the prospect of personified companions.
These forms of interaction show a clear evolution: from simple text input and output, to deep integration into user workflows, to the creation of entirely new interactive experiences. Each form has its own value and applicable scenarios; their relationship is not one of simple substitution, but of a complementary ecosystem.
Truly "organic" LLM human-computer interaction should flexibly choose the most appropriate interaction form according to specific scenarios, allowing technology to be naturally integrated into people's lives and work. As LLM capabilities continue to improve, we have reason to expect the emergence of more innovative forms of interaction, ultimately realizing the harmonious coexistence of humans and AI.