To understand why modern agents take their present form, and where their core design philosophies originate, this chapter traces back through history: starting from the classical era of artificial intelligence, exploring how the earliest "intelligence" was defined within rule systems of logic and symbols; then witnessing the major shift from single, centralized models of intelligence to distributed, collaborative thinking; and finally understanding how the "learning" paradigm completely transformed the way agents acquire capabilities, giving birth to the modern agents we see today.
<div align="center"> <p>Figure 2.1 The evolutionary ladder of AI agents</p> </div>

As shown in Figure 2.1, each new paradigm emerged to solve the core "pain points" or fundamental limitations of the previous generation. While each new solution brought a leap in capability, it also introduced new limitations that were difficult to overcome at the time, which in turn laid the groundwork for the birth of the next paradigm. Understanding this problem-driven iterative process helps us grasp more deeply the reasons and historical inevitability behind modern agent technology choices.
Early explorations in the field of artificial intelligence were deeply influenced by mathematical logic and fundamental principles of computer science. In that era, researchers generally held a belief: human intelligence, especially logical reasoning ability, could be captured and reproduced by formalized symbolic systems. This core idea gave birth to the first important paradigm of artificial intelligence—Symbolicism, also known as "Logic AI" or "Traditional AI."
In the view of symbolicism, the core of intelligent behavior is operating on symbols according to a set of explicit rules. An agent can therefore be viewed as a physical symbol system: it represents the external world through internal symbols and plans actions through logical reasoning. The "wisdom" of agents in this era came entirely from knowledge bases and reasoning rules pre-coded by designers, rather than being acquired through autonomous learning.
The theoretical foundation of the symbolicism era was the Physical Symbol System Hypothesis (PSSH)<sup>[1]</sup>, jointly proposed by Allen Newell and Herbert A. Simon in 1976. These two Turing Award winners provided theoretical guidance and criteria for implementing general artificial intelligence on computers through this hypothesis.
The hypothesis contains two core assertions:

- Sufficiency: a physical symbol system has sufficient means for general intelligent action.
- Necessity: any system that exhibits general intelligence must be a physical symbol system.
A physical symbol system here refers to a system that can exist in the physical world, composed of a set of distinguishable symbols and a series of processes that operate on these symbols, with constituent elements as shown in Figure 2.2. These symbols can be combined into more complex structures (such as expressions), while processes can create, modify, copy, and destroy these symbol structures.
<div align="center"> <p>Figure 2.2 Constituent elements of a physical symbol system</p> </div>

In short, PSSH boldly declared: the essence of intelligence is the computation and processing of symbols.
This hypothesis had far-reaching influence. It transformed the study of the vague and complex philosophical problem of human mind into a concrete problem that could be engineered and implemented on computers. It instilled strong confidence in early artificial intelligence researchers that as long as we could find the right way to represent knowledge and design effective reasoning algorithms, we could definitely create machine intelligence comparable to humans. Almost all research in the symbolicism era, from expert systems to automated planning, was conducted under the guidance of this hypothesis.
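As a toy illustration (not from Newell and Simon), the following sketch treats tuples of tokens as symbol structures and a function as a process that transforms them, which is the kind of machinery PSSH claims is sufficient for intelligent action:

```python
# Toy sketch of a "physical symbol system": tuples of tokens act as symbol
# structures, and functions act as processes that create and transform them.
# The block-world content is invented for illustration.

# An expression: a symbol structure standing for "block A is on block B"
expr = ("ON", "BLOCK-A", "BLOCK-B")

def substitute(expression, old, new):
    """A process that creates a new expression with one symbol replaced."""
    return tuple(new if symbol == old else symbol for symbol in expression)

# Processes operate on the structure; the structure designates a world state
moved = substitute(expr, "BLOCK-B", "TABLE")
print(moved)  # ('ON', 'BLOCK-A', 'TABLE')
```

The point of the sketch is the PSSH framing itself: nothing here "understands" blocks; intelligence, on this view, is exactly such creation and transformation of symbol structures.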
Under the direct influence of the physical symbol system hypothesis, Expert Systems became the most important and successful application achievement of the symbolicism era. The core goal of expert systems was to simulate the ability of human experts to solve problems in specific domains. By encoding expert knowledge and experience into computer programs, they could provide conclusions or recommendations comparable to or even surpassing human experts when facing similar problems.
A typical expert system usually consists of several core components including a knowledge base, inference engine, and user interface, with a general architecture as shown in Figure 2.3.
<div align="center"> <p>Figure 2.3 General architecture of expert systems</p> </div>

This architecture clearly embodies the separation of knowledge from reasoning, an important design characteristic of symbolic AI.
Knowledge Base and Inference Engine
The "intelligence" of an expert system comes mainly from its two core components: the knowledge base and the inference engine.
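A minimal sketch of this separation, with invented facts and rules: the knowledge base is pure data, and the inference engine is a generic procedure (here, simple forward chaining) that repeatedly fires any rule whose conditions are satisfied:

```python
# Hedged sketch of the knowledge-base / inference-engine split.
# Facts and rules below are invented, loosely medical-flavored examples.

facts = {"fever", "infection_site_blood"}

# Knowledge base: (IF these conditions all hold, THEN conclude this)
rules = [
    ({"fever", "infection_site_blood"}, "possible_bacteremia"),
    ({"possible_bacteremia"}, "recommend_blood_culture"),
]

def forward_chain(facts, rules):
    """Inference engine: fire applicable rules until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

derived = forward_chain(facts, rules)
print("recommend_blood_culture" in derived)  # True
```

Note that adding domain knowledge means editing only the `rules` list; the engine never changes, which is exactly the design philosophy the architecture diagram expresses.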
Application Case and Analysis: MYCIN System
MYCIN is one of the most famous and influential expert systems in history, developed by Stanford University in the 1970s<sup>[2]</sup>. It was designed to assist doctors in diagnosing bacterial blood infections and recommending appropriate antibiotic treatment plans.
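One concrete mechanism MYCIN is known for is certainty factors: each rule concluded with a confidence value rather than a hard truth, and evidence from independent rules was combined. The sketch below shows the standard combination formula for two positive certainty factors; the numeric values are invented for illustration:

```python
# MYCIN-style certainty factors (CF): a rule's conclusion carries a
# confidence in [-1, 1]. When two rules with positive CFs support the same
# conclusion, their evidence is combined as below. Example values are made up.

def combine_cf(cf1, cf2):
    """Combine two positive certainty factors supporting one conclusion."""
    return cf1 + cf2 * (1 - cf1)

# Two independent rules suggest the same organism with different confidence
cf = combine_cf(0.6, 0.4)
print(round(cf, 2))  # 0.76
```

The combined value is always higher than either input but never exceeds 1, capturing the intuition that independent supporting evidence strengthens, but does not guarantee, a conclusion.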
If expert systems demonstrated the "depth" of symbolic AI in professional domains, then the SHRDLU project<sup>[3]</sup> developed by Terry Winograd between 1968 and 1970 achieved a revolutionary breakthrough in "breadth." As shown in Figure 2.5, SHRDLU aimed to build a comprehensive intelligent agent that could interact fluently with humans in natural language within the micro-environment of the "blocks world": a simulated three-dimensional virtual space containing blocks of different shapes, colors, and sizes, together with a virtual robotic arm that can grasp and move them. Users issue commands or ask questions in natural language, and SHRDLU executes actions in the virtual world or responds in text.
<div align="center"> <p>Figure 2.5 SHRDLU's "blocks world" interaction interface</p> </div>

SHRDLU attracted widespread attention at the time mainly because it was the first system to integrate multiple independent artificial intelligence modules (such as language parsing, planning, and memory) into a unified whole and make them work collaboratively:
It could not only execute simple commands (e.g., "Pick up a big red block.") but also handle far more complex instructions, such as:

- "Find a block which is taller than the one you are holding and put it into the box." Here the system must resolve the reference "the one you are holding" to the object currently grasped by the robotic arm.
- A user could say "Grasp the pyramid.", then ask "What does the box contain?", and the system could answer by connecting the conversational context.
- "Is there a large block behind a pyramid?"
- "Did you touch any pyramid before you put the green one on the little cube?"
- When asked "Why did you pick up the red block?", SHRDLU could answer: "BECAUSE YOU ASKED ME TO."

SHRDLU's historical status and influence are mainly reflected in three aspects:
Despite significant achievements in early projects, starting from the 1980s, symbolic AI encountered fundamental difficulties inherent in its methodology when moving from "micro-worlds" to the open, complex real world. These difficulties can mainly be summarized into two major categories:
(1) Common-sense Knowledge and Knowledge Acquisition Bottleneck
The "intelligence" of symbolic agents depends entirely on the quality and completeness of their knowledge bases. However, how to build a knowledge base that can support real-world interaction has proven to be an extremely arduous task, mainly reflected in two aspects:
(2) Frame Problem and System Brittleness
In addition to knowledge-level challenges, symbolicism also encountered logical dilemmas when dealing with a dynamically changing world.
After exploring the theoretical challenges of symbolicism, in this section we will intuitively experience how rule-based systems work through a specific programming practice. We will attempt to reproduce ELIZA, an extremely influential early chatbot in the history of artificial intelligence.
ELIZA was a computer program released in 1966 by MIT computer scientist Joseph Weizenbaum<sup>[6]</sup>, one of the famous early attempts in the field of natural language processing. ELIZA was not a single program but a framework that could execute different "scripts." Among them, the most widely known and successful script was "DOCTOR," which imitated a Rogerian non-directive psychotherapist.
ELIZA's working method was extremely clever: it never directly answered questions or provided information but identified keywords in user input, then applied a set of preset transformation rules to convert user statements into open-ended questions. For example, when a user said "I am sad about my boyfriend," ELIZA might identify the keyword "I am sad about..." and apply a rule to generate the response: "Why are you sad about your boyfriend?"
Weizenbaum's design philosophy was not to create an agent that could truly "understand" human emotions; on the contrary, he wanted to prove that through some simple sentence transformation techniques, machines could create an illusion of "intelligence" and "empathy" without understanding the conversation content at all. However, to his surprise, many people who interacted with ELIZA (including his secretary) developed emotional dependence on it, deeply believing it could understand them.
The practical goal of this section is to reproduce ELIZA's core mechanism to deeply understand the advantages and fundamental limitations of this rule-driven approach.
ELIZA's algorithm flow is based on Pattern Matching and Text Substitution, which can be clearly decomposed into the following four steps:
1. Keyword scanning: scan the input for keywords (e.g., mother, dreamed, depressed). When the input contains multiple keywords, the program selects the rule corresponding to the keyword with the highest priority.
2. Decomposition: apply the rule's decomposition pattern, which uses wildcards (*) to capture the rest of the sentence. For example, the pattern "* my *" decomposes "My mother is afraid of me" into ["", "mother is afraid of me"].
3. Reassembly: fill the captured fragments into a preset reassembly template, such as "Tell me more about your family."
4. Pronoun transformation: swap first- and second-person terms in the captured fragments (I → you, my → your) to maintain conversation coherence.

The entire workflow can be represented by a simple pseudocode sketch:
```
FUNCTION generate_response(user_input):
    // 1. Split user input into words
    words = SPLIT(user_input)

    // 2. Find the highest-priority keyword rule
    best_rule = FIND_BEST_RULE(words)
    IF best_rule is NULL:
        RETURN a_generic_response()  // e.g., "Please go on."

    // 3. Decompose user input with the rule's decomposition pattern
    decomposed_parts = DECOMPOSE(user_input, best_rule.decomposition_pattern)
    IF decomposition_failed:
        RETURN a_generic_response()

    // 4. Perform pronoun conversion on the decomposed parts
    transformed_parts = TRANSFORM_PRONOUNS(decomposed_parts)

    // 5. Use reassembly rules to generate the response
    response = REASSEMBLE(transformed_parts, best_rule.reassembly_patterns)
    RETURN response
```
Through this mechanism, ELIZA successfully simplified the complex natural language understanding problem into an operable, rule-based pattern matching game.
Now, we will transform the technical principles described in the previous section into a simple, runnable Python function. The following code implements a mini version of ELIZA, which contains a small set of rules but is sufficient to demonstrate its core working mechanism.
```python
import re
import random

# Rule base: pattern (regular expression) -> list of response templates
rules = {
    r'I need (.*)': [
        "Why do you need {0}?",
        "Would it really help you to get {0}?",
        "Are you sure you need {0}?"
    ],
    r'Why don\'t you (.*)\?': [
        "Do you really think I don't {0}?",
        "Perhaps eventually I will {0}.",
        "Do you really want me to {0}?"
    ],
    r'Why can\'t I (.*)\?': [
        "Do you think you should be able to {0}?",
        "If you could {0}, what would you do?",
        "I don't know -- why can't you {0}?"
    ],
    r'I am (.*)': [
        "Did you come to me because you are {0}?",
        "How long have you been {0}?",
        "How do you feel about being {0}?"
    ],
    r'.* mother .*': [
        "Tell me more about your mother.",
        "What was your relationship with your mother like?",
        "How do you feel about your mother?"
    ],
    r'.* father .*': [
        "Tell me more about your father.",
        "How did your father make you feel?",
        "What has your father taught you?"
    ],
    r'.*': [
        "Please tell me more.",
        "Let's change focus a bit... Tell me about your family.",
        "Can you elaborate on that?"
    ]
}

# Pronoun conversion rules (first person <-> second person)
pronoun_swap = {
    "i": "you", "you": "i", "me": "you", "my": "your",
    "am": "are", "are": "am", "was": "were", "i'd": "you would",
    "i've": "you have", "i'll": "you will", "yours": "mine",
    "mine": "yours"
}

def swap_pronouns(phrase):
    """Perform first/second person conversion on pronouns in the phrase."""
    words = phrase.lower().split()
    swapped_words = [pronoun_swap.get(word, word) for word in words]
    return " ".join(swapped_words)

def respond(user_input):
    """Generate a response based on the rule base."""
    for pattern, responses in rules.items():
        match = re.search(pattern, user_input, re.IGNORECASE)
        if match:
            # Capture the matched part (some patterns have no capture group)
            captured_group = match.group(1) if match.groups() else ''
            # Strip trailing punctuation so it does not leak into the template
            captured_group = captured_group.strip(' .!?')
            # Perform pronoun conversion
            swapped_group = swap_pronouns(captured_group)
            # Randomly select a template and fill it in
            return random.choice(responses).format(swapped_group)
    # Fallback (unreachable in practice: the wildcard rule matches everything)
    return random.choice(rules[r'.*'])

# Main chat loop
if __name__ == '__main__':
    print("Therapist: Hello! How can I help you today?")
    while True:
        user_input = input("You: ")
        if user_input.lower() in ["quit", "exit", "bye"]:
            print("Therapist: Goodbye. It was nice talking to you.")
            break
        response = respond(user_input)
        print(f"Therapist: {response}")
```
A sample session:

```
Therapist: Hello! How can I help you today?
You: I am feeling sad today.
Therapist: How long have you been feeling sad today?
You: I need some help with my project.
Therapist: Are you sure you need some help with your project?
You: My mother is not happy with my work.
Therapist: Tell me more about your mother.
You: quit
Therapist: Goodbye. It was nice talking to you.
```
Through the above programming practice, we can intuitively summarize the fundamental limitations of rule-driven systems, which are direct confirmations of the theoretical challenges of symbolicism discussed in Section 2.1.4:
For example, an input such as "I am not sad" still matches the I am (.*) rule and yields a semantically incorrect response, because the system cannot understand the role of the negation word "not." Despite these obvious defects, however, ELIZA produced the famous "ELIZA effect" at the time: many users were convinced that it understood them. This illusion of intelligence stemmed mainly from its clever conversation strategies (playing a passive questioner, using open-ended templates) and from humans' innate tendency toward emotional projection.
ELIZA's practice clearly revealed the core contradiction of the symbolicism approach: the system's seemingly intelligent performance depends entirely on rules pre-coded by designers. However, facing the infinite possibilities of real-world language, this exhaustive method is destined to be unscalable. The system has no true understanding, only executing symbol operations, which is the root of its brittleness.
The exploration of symbolicism and ELIZA's practice jointly pointed to a problem: a single, centralized reasoning engine built through preset rules seems difficult to lead to true intelligence. No matter how large the rule base, the system always appears rigid and brittle when facing the ambiguity, complexity, and infinite changes of the real world. This dilemma prompted some top thinkers to reflect on the most fundamental design philosophy of artificial intelligence. Among them, Marvin Minsky did not continue trying to add more rules to a single reasoning core but proposed a revolutionary question in his book "The Society of Mind"<sup>[7]</sup>: "What magical trick makes us intelligent? The trick is that there is no trick. The power of intelligence stems from our vast diversity, not from any single, perfect principle."
From the 1970s to the 1980s, the limitations of symbolicism became increasingly apparent. Although expert systems achieved success in highly vertical domains, they could not possess child-like common sense; although SHRDLU could perform excellently in a closed blocks world, it could not understand anything outside that world; although ELIZA could imitate conversation, it knew nothing about the conversation content itself. These systems all followed a top-down design approach: an omniscient central processor that processes information and makes decisions according to a unified set of logical rules.
Facing this universal failure, Minsky began to raise a series of fundamental questions:
These questions directly addressed the core drawbacks of single holistic intelligence models. Such models attempt to solve all problems with a unified representation and reasoning mechanism, but this is far from how we observe natural intelligence (especially human intelligence) operating. Minsky believed that forcibly cramming diverse mental activities into a rigid logical framework was the root cause of early artificial intelligence research stagnation.
Based on this reflection, Minsky proposed a subversive conception: he no longer viewed the mind as a pyramid-like hierarchical structure but saw it as a flattened "society" full of interaction and collaboration.
In Minsky's theoretical framework, the definition of an agent differs from the modern agents we discussed in Chapter 1. Here, an agent refers to an extremely simple, specialized mental process that is itself "mindless." For example, a LINE-FINDER agent responsible for identifying lines, or a GRASP agent responsible for grasping.
These simple agents are organized to form more powerful Agencies. An agency is a group of agents working together to complete a more complex task. For example, a BUILD agency responsible for building blocks might be composed of multiple lower-level agents or agencies such as SEE, FIND, GET, and PUT. They influence each other through decentralized activation and inhibition signals, forming dynamic control flow.
Emergence is key to understanding the society of mind theory. Complex, purposeful intelligent behavior is not pre-planned by some high-level agent but spontaneously arises from local interactions among numerous simple bottom-level agents.
Let's use the classic "building a block tower" task as an example to illustrate this process, as shown in Figure 2.6. When a high-level goal (such as "I want to build a tower") appears, it activates a high-level agency called BUILD-TOWER.
- The BUILD-TOWER agency does not know how to execute specific physical actions; its only role is to activate subordinate agencies such as BUILDER.
- The BUILDER agency is also very simple; it may contain only loop logic: as long as the tower is not finished, activate the ADD-BLOCK agency.
- The ADD-BLOCK agency coordinates more specific subtasks; it sequentially activates three sub-agencies: FIND-BLOCK, GET-BLOCK, and PUT-ON-TOP.
- The GET-BLOCK agency activates the SEE-SHAPE agent in the visual system and the REACH and GRASP agents in the motor system.

In this process, no single agent or agency holds a global plan for the task. GRASP is only responsible for grasping and does not know what a tower is; BUILDER only loops and does not know how to control the arm. Yet when this society of countless "mindless" agents interacts through simple activation and inhibition rules, a seemingly highly intelligent behavior, building a block tower, naturally emerges.
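The decentralized activation described above can be sketched in a few lines of Python. The agent and agency names follow the text; the implementation details (a shared activation log, a fixed goal height) are invented for illustration:

```python
# Illustrative "society of mind" sketch: each agent is a mindless callable
# that only records its own activation; agencies merely activate sub-agents.

def make_agent(name, log):
    def agent():
        log.append(name)  # the agent's entire "behavior"
    return agent

log = []
see = make_agent("SEE", log)
find_block = make_agent("FIND-BLOCK", log)
reach = make_agent("REACH", log)
grasp = make_agent("GRASP", log)

def get_block():
    # Agency: activates perception and motor agents in turn
    see(); reach(); grasp()

def put_on_top():
    log.append("PUT-ON-TOP")

def add_block():
    # Agency coordinating three subtasks in a fixed order
    find_block(); get_block(); put_on_top()

def builder(height, goal=3):
    # Loop logic only: keep adding blocks until the tower is done
    while height < goal:
        add_block()
        height += 1
    return height

height = builder(0)
print(height, log[:5])
```

No component holds a global plan, yet the activation log shows a coherent, repeated build sequence emerging from purely local calls.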
The most far-reaching influence of the society of mind theory is that it provided an important conceptual foundation for Distributed Artificial Intelligence (DAI) and later Multi-Agent Systems (MAS). It prompted researchers to think:
If intelligence within a mind emerges through collaboration of numerous simple agents, then can more powerful "collective intelligence" also emerge through collaboration among multiple independent, physically separated computational entities (computers, robots)?
This question directly shifted the research focus from "how to build an omnipotent single agent" to "how to design an efficiently collaborating group of agents." Specifically, the society of mind inspired MAS research in the following aspects:
It can be said that Minsky's "society of mind" theory provided an important analytical framework for AI researchers to understand the internal structure of "collective intelligence." It provided later researchers with a completely new perspective to explore complex systems composed of independent, autonomous, socially capable computational agents, formally opening the prelude to multi-agent system research.
The "society of mind" theory discussed earlier pointed the way for collective intelligence and decentralized collaboration at the philosophical level, but the implementation path remained unclear. Meanwhile, the fundamental challenges exposed by symbolicism in dealing with real-world complexity also indicated that truly robust intelligence could not be built solely on pre-coded rules.
These two threads jointly pointed to a question: If intelligence cannot be completely designed, can it be learned?
This question opened the "learning" era of artificial intelligence. Its core goal was no longer to manually encode knowledge but to build systems that could automatically acquire knowledge and capabilities from experience and data. This section will trace the evolution of this paradigm: from the learning foundation laid by connectionism, to interactive learning achieved by reinforcement learning, to modern agents driven by large language models today.
As a direct response to the limitations of symbolicism, Connectionism re-emerged in the 1980s. Unlike symbolicism's top-down design philosophy relying on explicit logical rules, connectionism is a bottom-up approach inspired by mimicking the neural network structure of biological brains<sup>[8]</sup>. Its core ideas can be summarized as follows:
Under this paradigm, agents are no longer passive logical reasoning machines executing rules but adaptive systems capable of self-optimization through experience. As shown in Figure 2.7, this represents a fundamental shift in the core idea of building agents. Symbolicism attempted to explicitly encode human knowledge to machines, while connectionism attempted to create machines that could learn knowledge like humans.
<div align="center"> <p>Figure 2.7 Comparison of symbolicism and connectionism paradigms</p> </div>

The rise of connectionism, especially the success of deep learning in the 21st century, endowed agents with powerful perception and pattern-recognition capabilities, enabling them to understand the world directly from raw data (such as images, sounds, and text), something unimaginable in the symbolicism era. However, enabling agents to learn optimal sequential decisions through dynamic interaction with the environment required supplementation from another learning paradigm.
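The contrast with rule-based systems can be made concrete with a minimal perceptron, in the connectionist spirit. This is a sketch, not a historical implementation: it learns the toy AND function by adjusting numeric weights from examples instead of following hand-written rules:

```python
# Minimal perceptron sketch: "knowledge" lives in numeric weights adjusted
# from labeled examples, not in explicit symbolic rules. Toy task: logical AND.

def train_perceptron(samples, epochs=10, lr=0.1):
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x0, x1), target in samples:
            pred = 1 if w0 * x0 + w1 * x1 + b > 0 else 0
            err = target - pred          # learning = correcting prediction errors
            w0 += lr * err * x0
            w1 += lr * err * x1
            b += lr * err
    return w0, w1, b

samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w0, w1, b = train_perceptron(samples)
print([1 if w0 * x0 + w1 * x1 + b > 0 else 0 for (x0, x1), _ in samples])
# [0, 0, 0, 1]
```

Nobody wrote a rule saying "output 1 only when both inputs are 1"; that behavior was distilled from data into the weights, which is the bottom-up idea in miniature.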
Connectionism mainly solved perception problems (for example, "What's in this picture?"), but the more core task of agents is decision-making (for example, "What should I do in this situation?"). Reinforcement Learning (RL) is precisely the learning paradigm focused on solving sequential decision problems. It does not directly learn from labeled static datasets but learns how to maximize its long-term benefits through direct interaction between agents and the environment, learning through "trial and error."
Taking AlphaGo as an example, its core self-play learning process is a classic embodiment of reinforcement learning<sup>[9]</sup>. In this process, AlphaGo (the agent) observes the current board layout (environment state) and decides where to place the next stone (action). After a game ends, based on the win-loss result, it receives a clear signal: winning is a positive reward, losing is a negative reward. Through millions of such self-play sessions, AlphaGo continuously adjusts its internal strategy, gradually learning which actions to choose in which board situations are most likely to lead to final victory. This process is completely autonomous, not relying on direct guidance from human game records.
This learning mechanism of optimizing one's own behavior through interaction with the environment and based on feedback signals is the core framework of reinforcement learning. Below we will detail its basic constituent elements and working mode.
The reinforcement learning framework can be described by several core elements:
Based on the above core elements, reinforcement learning agents continuously iterate in a "perceive-act-learn" closed loop, with their working mode shown in Figure 2.8.
<div align="center"> <p>Figure 2.8 Core interaction loop of reinforcement learning</p> </div>

The specific steps of this loop are as follows:
The agent's learning goal is not to maximize the immediate reward at a certain time step but to maximize the Cumulative Reward from the current moment to the future, also called Return. This means the agent needs to have "foresight"; sometimes to obtain greater future rewards, it needs to sacrifice current immediate rewards (for example, the "sacrifice" strategy in Go). Through continuous exploration, feedback collection, and policy optimization in the above loop, the agent can ultimately learn to make autonomous decisions and long-term planning in complex dynamic environments.
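As a concrete, illustrative instance of this loop, the following tabular Q-learning sketch learns, purely from reward feedback, to walk toward the goal in a tiny four-state corridor. The environment, hyperparameters, and state/action encoding are all invented for this example:

```python
import random

# Toy "perceive-act-learn" loop: tabular Q-learning on a 4-state corridor.
# States 0..3; action 0 moves left, action 1 moves right; reaching state 3
# yields reward 1 and ends the episode.

random.seed(0)
n_states, actions = 4, [0, 1]
Q = [[0.0, 0.0] for _ in range(n_states)]      # value estimates per (state, action)
alpha, gamma, epsilon = 0.5, 0.9, 0.2          # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != 3:
        # Epsilon-greedy: mostly exploit current knowledge, sometimes explore
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        s_next = max(0, s - 1) if a == 0 else min(3, s + 1)
        r = 1.0 if s_next == 3 else 0.0
        # Update toward immediate reward plus DISCOUNTED best future value,
        # i.e., toward an estimate of the cumulative return, not just r
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

# The learned policy prefers "right" in every non-terminal state
print([0 if q[0] > q[1] else 1 for q in Q[:3]])  # [1, 1, 1]
```

The discount factor `gamma` is what gives the agent "foresight": states closer to the goal earn higher values (roughly 1, 0.9, 0.81 walking backward), so good behavior propagates from the reward outward.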
Reinforcement learning endowed agents with the ability to learn decision-making strategies from interaction, but it typically requires massive task-specific interaction data: an RL agent starts learning with no prior knowledge and must build its understanding of the task from scratch. Both the common sense that symbolicism attempted to encode by hand and the background knowledge humans rely on when making decisions are missing in RL agents. How can agents acquire a broad understanding of the world before they start learning a specific task? The answer ultimately emerged from the field of Natural Language Processing (NLP), and its core is Pre-training on large-scale data.
From Specific Tasks to General Models
Before the emergence of the pre-training paradigm, traditional natural language processing models were typically trained from scratch independently for single specific tasks (such as sentiment analysis, machine translation) on specially annotated small to medium-scale datasets. This mode led to several problems: models had narrow knowledge scope, difficulty generalizing knowledge learned in one task to another, and each new task required substantial human effort for data annotation. The proposal of the Pre-training and Fine-tuning paradigm completely changed this situation. Its core idea is divided into two steps:
Figure 2.9 illustrates the complete process of pre-training and fine-tuning: self-supervised learning on general text data produces a foundation model, which is then fine-tuned with task-specific data to adapt to various downstream tasks.
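The two-step idea can be illustrated, very loosely, with a toy bigram language model: "pre-training" counts word transitions on broad text, and "fine-tuning" continues counting on a small domain corpus, shifting the model's predictions. All corpora here are made up; real pre-training adjusts neural network weights, not counts:

```python
from collections import defaultdict

# Toy analogy for "pre-train then fine-tune" using a bigram language model.
# The "label" for each word is simply the next word: self-supervision.

def train_bigrams(corpus, counts=None):
    if counts is None:
        counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict(counts, word):
    follows = counts[word]
    return max(follows, key=follows.get) if follows else None

# Step 1: "pre-training" on broad, unlabeled text
general = ["the cat sat on the mat", "the dog sat on the rug"]
model = train_bigrams(general)

# Step 2: "fine-tuning" on a small domain corpus shifts the model's behavior
domain = ["the model sat in memory", "the model sat in cache", "the model sat in cache"]
model = train_bigrams(domain, model)
print(predict(model, "sat"))  # in
```

Before fine-tuning the model predicts "on" after "sat"; afterward the domain data dominates and it predicts "in": the same general model, cheaply adapted to a new task.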
<div align="center"> <p>Figure 2.9 Schematic diagram of the "pre-training-fine-tuning" paradigm</p> </div>

Birth of Large Language Models and Emergent Abilities
Through pre-training on text corpora containing trillions of tokens, the neural network weights of a large language model in effect constitute a highly compressed, implicit model of world knowledge. This solves, in a completely new way, the "knowledge acquisition bottleneck" that most troubled the symbolicism era. More surprisingly, once a model's scale (parameter count, data volume, computation) crosses a certain threshold, it begins to exhibit unexpected Emergent Abilities that were never directly trained, such as:
The emergence of these abilities marks that LLMs are no longer just language models; they have evolved into components playing dual roles as both massive knowledge bases and general reasoning engines.
At this point, in the long river of agent development history, several key technical puzzle pieces have all appeared: symbolicism provided the framework for logical reasoning, connectionism and reinforcement learning provided learning and decision-making capabilities, while large language models provided unprecedented world knowledge and general reasoning capabilities obtained through pre-training. In the next section, we will see how these technologies are integrated in the design of modern agents.
With the rapid development of large language model technology, LLM-centric agents have become a new paradigm in the field of artificial intelligence. They can not only understand and generate human language but, more importantly, can autonomously perceive, plan, decide, and execute tasks through interaction with the environment.
<div align="center"> <p>Figure 2.10 Core component architecture of LLM-driven agents</p> </div>

As described in Chapter 1, the interaction between an agent and its environment can be abstracted as a core loop. LLM-driven agents complete tasks through a continuously iterated closed-loop process in which multiple modules work together. This process follows the architecture shown in Figure 2.10, with specific steps as follows:
This modular collaborative mechanism and continuous iterative loop constitute the core workflow of LLM-driven agents solving complex problems.
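A heavily simplified sketch of this loop follows, with the language model replaced by a stub function and a single hypothetical calculator tool; every name here is invented for illustration, and a real system would call an actual LLM API in place of the stub:

```python
# Sketch of the LLM-agent loop: perceive an observation, "plan" with a
# language model (stubbed here), act by invoking a tool, repeat until done.

def fake_llm(observation, memory):
    """Stand-in planner: decides the next action from task state and memory."""
    if "task: add 2 and 3" in observation and "result" not in memory:
        return ("use_tool", "calculator", "2+3")
    return ("finish", memory.get("result", "unknown"), None)

def calculator(expr):
    a, b = expr.split("+")
    return str(int(a) + int(b))

def run_agent(task, max_steps=5):
    memory = {}
    observation = f"task: {task}"
    for _ in range(max_steps):                             # the core loop
        action, arg1, arg2 = fake_llm(observation, memory)  # plan / decide
        if action == "use_tool":                            # act via a tool
            memory["result"] = calculator(arg2)
            observation = f"tool returned {memory['result']}"  # perceive
        else:
            return arg1                                     # task finished
    return None

print(run_agent("add 2 and 3"))  # 5
```

Even in this stub form, the division of labor is visible: the planner only chooses actions, the tool only computes, memory carries state across iterations, and the loop binds them into task-completing behavior.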
The development history of artificial intelligence agents is not a straight single-lane road but a process of interweaving, competition, and fusion of several core ideological schools over more than half a century. Understanding this process helps us gain insight into the profound origins of current agent architecture paradigm formation.
Among these, three major trends dominated research paradigms in different periods:
Entering the 2020s, these ideological schools have deeply integrated in unprecedented ways. Large language models represented by the GPT series are themselves products of connectionism but have become the core "brain" for executing symbolic reasoning, tool invocation, and planning decisions, forming a modern agent architecture combining neural and symbolic approaches. To systematically review this development context, Figure 2.11 below organizes key theories, projects, and events in the development history of artificial intelligence agents from the 1950s to the present, providing readers with a clear global overview as a consolidation of this chapter's knowledge.
<div align="center"> <p>Figure 2.11 Timeline of agent development evolution (incomplete version)</p> </div>

Thanks to breakthroughs in large language models, the agent technology stack now presents unprecedented activity and diversity. Figure 2.12 shows a typical full view of the current AI agent technology stack, covering everything from underlying models to upper-layer applications.
<div align="center"> <p>Figure 2.12 Overview of AI Agent technology stack</p> </div>

This technology stack diagram was released by Letta in November 2024<sup>[10]</sup>. It layers and categorizes AI agent-related tools, platforms, and services, providing a valuable reference for understanding the current market landscape and technology selection.
This chapter reviewed the historical context of agent development, exploring the process from birth to evolution of its core ideas, covering several key paradigm revolutions in the field of artificial intelligence:
Through this chapter's learning, we not only understand where the modern agents introduced in Chapter 1 came from but also established a macro cognitive framework about agent technology evolution. We can discover that agent development is not simple technical iteration but a thought revolution about how to define "intelligence," acquire "knowledge," and make "decisions."
Since the core of modern agents is large language models, deeply understanding their underlying principles is crucial. The next chapter will focus on large language models themselves, exploring their basic concepts, laying a solid foundation for subsequent advanced applications in multi-agent systems.
Note: Some of the following exercises have no standard answers; they aim to help learners establish a systematic understanding of agent development history and cultivate the technical insight of "learning from history."
The Physical Symbol System Hypothesis<sup>[1]</sup> is the theoretical cornerstone of the symbolicism era. Please analyze:
The expert system MYCIN<sup>[2]</sup> achieved significant success in the medical diagnosis field but was ultimately not widely applied in clinical practice. Please think:
Hint: Can analyze from multiple perspectives including technology, ethics, law, user acceptance, etc.
In Section 2.2, we implemented a simplified version of the ELIZA chatbot. Please expand on this basis:
Hint: This is a hands-on practice question; actual code writing is recommended
Marvin Minsky proposed a revolutionary viewpoint in the "society of mind" theory<sup>[7]</sup>: intelligence stems from collaboration of numerous simple agents, not a single perfect system.
In the block-tower example, what would happen if the GRASP agent suddenly failed? What are the advantages and disadvantages of this decentralized architecture?

Reinforcement learning and supervised learning are two different learning paradigms. Please analyze:
The pre-training-fine-tuning paradigm is an important breakthrough in the modern artificial intelligence field. Please think deeply:
Suppose you want to design an "intelligent code review assistant" that can automatically review code submissions (Pull Requests), summarize code implementation logic, check code quality, discover potential bugs, and propose improvement suggestions.
[1] NEWELL A, SIMON H A. Computer science as empirical inquiry: symbols and search[J]. Communications of the ACM, 1976, 19(3): 113-126.
[2] BUCHANAN B G, SHORTLIFFE E H, ed. Rule-based expert systems: the MYCIN experiments of the Stanford Heuristic Programming Project[M]. Reading, Mass.: Addison-Wesley, 1984.
[3] WINOGRAD T. Understanding natural language[M]. New York: Academic Press, 1972.
[4] LENAT D B, GUHA R V. Cyc: a midterm report[J]. AI magazine, 1990, 11(3): 32.
[5] MCCARTHY J, HAYES P J. Some philosophical problems from the standpoint of artificial intelligence[C]//MELTZER B, MICHIE D, ed. Machine intelligence 4. Edinburgh: Edinburgh University Press, 1969: 463-502.
[6] WEIZENBAUM J. ELIZA: a computer program for the study of natural language communication between man and machine[J]. Communications of the ACM, 1966, 9(1): 36-45.
[7] MINSKY M. The society of mind[M]. New York: Simon & Schuster, 1986.
[8] RUMELHART D E, MCCLELLAND J L, PDP RESEARCH GROUP. Parallel distributed processing: explorations in the microstructure of cognition[M]. Cambridge, MA: MIT Press, 1986.
[9] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[10] LETTA. The AI agents stack[EB/OL]. (2024-11) [2025-09-07]. https://www.letta.com/blog/ai-agents-stack.