The Rise and Potential of Large Language Model Based Agents: A Survey

Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Qin Liu, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wensen Cheng, Qi Zhang, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang and Tao Gui

Fudan NLP Group, miHoYo Inc

Abstract

For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent AI agents since the mid-20th century. However, these efforts have mainly focused on advancement in algorithms or training strategies to enhance specific capabilities or performance on particular tasks. Actually, what the community lacks is a sufficiently general and powerful model to serve as a starting point for designing AI agents that can adapt to diverse scenarios. Due to the versatile and remarkable capabilities they demonstrate, large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI), offering hope for building general AI agents. Many research efforts have leveraged LLMs as the foundation to build AI agents and have achieved significant progress. We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for AI agents. Building upon this, we present a conceptual framework for LLM-based agents, comprising three main components: brain, perception, and action, and the framework can be tailored to suit different applications. Subsequently, we explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation. Following this, we delve into
agent societies, exploring the behavior and personality of LLM-based agents, the social phenomena that emerge when they form societies, and the insights they offer for human society. Finally, we discuss a range of key topics and open problems within the field.¹

Correspondence to: ,qz, Equal Contribution.
¹ A repository for the related papers at https:/
cs.AI 14 Sep 2023

Contents

1 Introduction
2 Background
  2.1 Origin of AI Agent
  2.2 Technological Trends in Agent Research
  2.3 Why is LLM suitable as the primary component of an Agent's brain?
3 The Birth of An Agent: Construction of LLM-based Agents
  3.1 Brain
    3.1.1 Natural Language Interaction
    3.1.2 Knowledge
    3.1.3 Memory
    3.1.4 Reasoning and Planning
    3.1.5 Transferability and Generalization
  3.2 Perception
    3.2.1 Textual Input
    3.2.2 Visual Input
    3.2.3 Auditory Input
    3.2.4 Other Input
  3.3 Action
    3.3.1 Textual Output
    3.3.2 Tool Using
    3.3.3 Embodied Action
4 Agents in Practice: Harnessing AI for Good
  4.1 General Ability of Single Agent
    4.1.1 Task-oriented Deployment
    4.1.2 Innovation-oriented Deployment
    4.1.3 Lifecycle-oriented Deployment
  4.2 Coordinating Potential of Multiple Agents
    4.2.1 Cooperative Interaction for Complementarity
    4.2.2 Adversarial Interaction for Advancement
  4.3 Interactive Engagement between Human and Agent
    4.3.1 Instructor-Executor Paradigm
    4.3.2 Equal Partnership Paradigm
5 Agent Society: From Individuality to Sociality
  5.1 Behavior and Personality of LLM-based Agents
    5.1.1 Social Behavior
    5.1.2 Personality
  5.2 Environment for Agent Society
    5.2.1 Text-based Environment
    5.2.2 Virtual Sandbox Environment
    5.2.3 Physical Environment
  5.3 Society Simulation with LLM-based Agents
    5.3.1 Key Properties and Mechanism of Agent Society
    5.3.2 Insights from Agent Society
    5.3.3 Ethical and Social Risks in Agent Society
6 Discussion
  6.1 Mutual Benefits between LLM Research and Agent Research
  6.2 Evaluation for LLM-based Agents
  6.3 Security, Trustworthiness and Other Potential Risks of LLM-based Agents
    6.3.1 Adversarial Robustness
    6.3.2 Trustworthiness
    6.3.3 Other Potential Risks
  6.4 Scaling Up the Number of Agents
  6.5 Open Problems
7 Conclusion

1 Introduction

“If they find a parrot who could answer to everything, I would claim it to be an intelligent being without hesitation.”
Denis Diderot, 1875

Artificial Intelligence (AI) is a field dedicated to designing and developing systems that can replicate human-like intelligence and abilities [1]. As early as the 18th century, philosopher Denis Diderot introduced the idea that if a parrot could respond to every question, it could be considered intelligent [2]. While Diderot was referring to living beings, like the parrot, his notion highlights the profound concept that a highly intelligent organism could resemble human intelligence. In the 1950s, Alan Turing expanded this notion to artificial entities and proposed the renowned Turing Test [3]. This test is a cornerstone in AI and aims to explore whether machines can display intelligent behavior comparable to humans. These AI entities are often termed “agents”, forming the essential building blocks of AI systems. Typically in AI, an agent refers to an artificial entity capable of perceiving its surroundings using sensors, making decisions, and then taking actions in response using actuators [1; 4].

The concept of agents originated in philosophy, with roots tracing back to thinkers like Aristotle and Hume [5]. It describes entities possessing desires, beliefs, intentions, and the ability to take actions [5]. This idea transitioned into computer science, intending to enable computers to understand users' interests and autonomously perform
actions on their behalf [6; 7; 8]. As AI advanced, the term “agent” found its place in AI research to depict entities showcasing intelligent behavior and possessing qualities like autonomy, reactivity, pro-activeness, and social ability [4; 9]. Since then, the exploration and technical advancement of agents have become focal points within the AI community [1; 10]. AI agents are now acknowledged as a pivotal stride towards achieving Artificial General Intelligence (AGI)², as they encompass the potential for a wide range of intelligent activities [4; 11; 12].

From the mid-20th century, significant strides were made in developing smart AI agents, as research delved deep into their design and advancement [13; 14; 15; 16; 17; 18]. However, these efforts have predominantly focused on enhancing specific capabilities, such as symbolic reasoning, or mastering particular tasks like Go or Chess [19; 20; 21]. Achieving a broad adaptability across varied scenarios remained elusive. Moreover, previous studies have placed more emphasis on the design of algorithms and training strategies, overlooking the development of the model's inherent general abilities like knowledge memorization, long-term planning, effective generalization, and efficient interaction [22; 23]. Actually, enhancing the inherent capabilities of the model is the pivotal factor for advancing the agent further, and the domain is in need of a powerful foundational model endowed with a variety of key attributes mentioned above to serve as a starting point for agent systems.

The development of large language models (LLMs) has brought a glimmer of hope for the further development of agents [24; 25; 26], and significant progress has been made by the community [22; 27; 28; 29]. According to the notion of World Scope (WS) [30], which encompasses five levels that depict the research progress from NLP to general AI (i.e., Corpus, Internet, Perception, Embodiment, and Social), the pure LLMs are built on the second level with internet-scale textual inputs and outputs. Despite this, LLMs have demonstrated powerful
capabilities in knowledge acquisition, instruction comprehension, generalization, planning, and reasoning, while displaying effective natural language interactions with humans. These advantages have earned LLMs the designation of sparks for AGI [31], making them highly desirable for building intelligent agents to foster a world where humans and agents coexist harmoniously [22]. Starting from this, if we elevate LLMs to the status of agents and equip them with an expanded perception space and action space, they have the potential to reach the third and fourth levels of WS. Furthermore, these LLM-based agents can tackle more complex tasks through cooperation or competition, and emergent social phenomena can be observed when placing them together, potentially achieving the fifth WS level. As shown in Figure 1, we envision a harmonious society composed of AI agents in which humans can also participate.

In this paper, we present a comprehensive and systematic survey focusing on LLM-based agents, attempting to investigate the existing studies and prospective avenues in this burgeoning field. To this end, we begin by delving into crucial background information (§2). In particular, we commence by tracing the origin of AI agents from philosophy to the AI domain, along with a brief overview of the debate surrounding the existence of artificial agents (§2.1). Next, we take the lens of technological trends to provide a concise historical review of the development of AI agents (§2.2). Finally, we delve into an in-depth introduction of the essential characteristics of agents and elucidate why large language models are well-suited to serve as the main component of brains or controllers for AI agents (§2.3).

² Also known as Strong AI.

[Figure 1: An Envisioned Agent Society. Labels in the image: User (“Let me experience the festival in this world.”), Multi-Agent, Cooperation, Kitchen, Concert, Outdoors, Ordering dishes and cooking, Task planning and solving, Band performing, Discussing decoration, Acting with tools.]

Figure 1: Scenario of an envisioned society composed of AI agents, in which humans can also participate. The above image depicts some specific scenes within society. In the kitchen, one agent is ordering dishes, while another agent is responsible for planning and solving the cooking task. At the concert, three agents are collaborating to perform in a band. Outdoors, two agents are discussing lantern-making, planning the required materials and finances by selecting and using tools. Users can participate in any of these stages of this social activity.

Inspired by the definition of the agent, we present a general conceptual framework for LLM-based agents with three key parts: brain, perception, and action (§3); the framework can be tailored to suit different applications. We first introduce the brain, which is primarily composed of a large language model (§3.1). Similar to humans, the brain is the core of an AI agent because it not only stores crucial memories, information, and knowledge but also undertakes essential tasks of information processing, decision-making, reasoning, and planning. It is the key determinant of whether the agent can exhibit intelligent behaviors. Next, we introduce the perception module (§3.2). For an agent, this module serves a role similar to that of sensory organs for humans. Its primary function is to expand the agent's perceptual space from text-only to a multimodal space that includes diverse sensory modalities like text, sound, visuals, touch, smell, and more. This expansion enables the agent to better perceive information from the external environment. Finally, we present the action module for expanding the action space of an agent (§3.3). Specifically, we expect the agent to be able to produce textual output, take embodied actions, and use tools, so that it can better respond to environmental changes and provide feedback, and even alter and shape the environment.

After that, we provide a detailed and thorough introduction to the practical applications of LLM-based agents and elucidate the foundational design pursuit “Harnessing AI for good” (§4). To start, we delve into the
current applications of a single agent and discuss its performance in text-based tasks and simulated exploration environments, highlighting its capabilities in handling specific tasks, driving innovation, and exhibiting human-like survival skills and adaptability (§4.1). Following that, we take a retrospective look at the development history of multi-agents. We introduce the interactions between agents in LLM-based multi-agent system applications, where they engage in collaboration, negotiation, or competition. Regardless of the mode of interaction, agents collectively strive toward a shared objective (§4.2). Lastly, considering the potential limitations of LLM-based agents in aspects such as privacy security, ethical constraints, and data deficiencies, we discuss human-agent collaboration. We summarize the paradigms of collaboration between agents and humans: the instructor-executor paradigm and the equal partnership paradigm, along with specific applications in practice (§4.3).

Building upon the exploration of practical applications of LLM-based agents, we now shift our focus to the concept of the “Agent Society”, examining the intricate interactions between agents and their surrounding environments (§5). This section begins with an investigation into whether these agents exhibit human-like behavior and possess corresponding personality (§5.1). Furthermore, we introduce the social environments within which the agents operate, including text-based environments, virtual sandboxes, and the physical world (§5.2). Unlike the previous section (§3.2), here we focus on the diverse types of environment rather than how the agents perceive them. Having established the foundation of agents and their environments, we proceed to unveil the simulated societies that they form (§5.3). We discuss the construction of a simulated society, and go on to examine the social phenomena that emerge from it. Specifically, we emphasize the lessons and potential risks inherent in simulated societies.

Finally, we discuss a range of key
topics (§6) and open problems within the field of LLM-based agents: (1) the mutual benefits and inspiration between LLM research and agent research, where we demonstrate that the development of LLM-based agents has provided many opportunities for both the agent and LLM communities (§6.1); (2) existing evaluation efforts and some prospects for LLM-based agents along four dimensions: utility, sociability, values, and the ability to continually evolve (§6.2); (3) potential risks of LLM-based agents, where we discuss their adversarial robustness and trustworthiness, together with other risks such as misuse, unemployment, and threats to the well-being of the human race (§6.3); (4) scaling up the number of agents, where we discuss the potential advantages and challenges of scaling up agent counts, along with the approaches of static and dynamic scaling (§6.4); (5) several open problems, such as the debate over whether LLM-based agents represent a potential path to AGI, challenges from virtual simulated environme