At QCon San Francisco 2024, Victor Dibia of Microsoft Research discussed the challenges of building multi-agent systems powered by generative AI models. Dibia highlighted the immense potential of these systems but noted that their complexity often leads to failure in real-world applications.
Drawing on insights from AutoGen, an open-source framework for multi-agent workflows, he detailed the common reasons these systems falter and strategies to improve their reliability. Dibia outlined ten major reasons multi-agent workflows fail. His key recommendations included:

- Use detailed instructions for agents
- Avoid small models
- Ensure instructions align with the capabilities of the large language model (LLM)
- Equip the LLM with effective tools
- Define clear stopping criteria for agents
- Use multi-agent patterns
- Integrate memory into agent workflows
- Incorporate metacognition
- Employ task-specific evaluations and metrics
- Establish a mechanism for agents to delegate tasks to humans when necessary
He explained how agents, often driven by LLMs, are highly dependent on detailed prompts to function effectively. Without comprehensive and precise guidance, agents can misinterpret tasks or generate incorrect outputs. Another frequent issue is the use of less capable models, which lack the sophistication to handle intricate tasks or understand nuanced prompts.
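The gap between vague and detailed guidance can be made concrete. The following framework-agnostic sketch is an invented illustration (the prompts, task, and helper name are not from the talk) of how a detailed instruction set constrains an agent's behavior where a vague one does not:

```python
# Hypothetical illustration: a vague vs. a detailed agent instruction set.
# All prompt text and the task below are invented examples.

VAGUE_INSTRUCTIONS = "You are a helpful assistant. Answer questions."

DETAILED_INSTRUCTIONS = """You are a data-analysis agent.
- Only answer questions about the provided CSV file.
- Show your reasoning step by step before giving a final answer.
- If a question cannot be answered from the data, say so explicitly.
- End every completed answer with the token TERMINATE."""

def build_system_message(instructions: str, task: str) -> str:
    """Combine agent instructions with the current task into one prompt."""
    return f"{instructions}\n\nTask: {task}"

prompt = build_system_message(DETAILED_INSTRUCTIONS, "Summarize column 'sales'.")
```

The detailed variant pins down scope, output format, failure behavior, and a completion signal, which are exactly the details a less capable model is most likely to get wrong without them.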
“Autonomous multi-agent systems are like self-driving cars: proof of concepts are simple, but the last 5% of reliability is as hard as the first 95%.” – Dibia
One of the more technical challenges is orchestration—how agents coordinate and delegate tasks. Dibia emphasized that poorly defined workflows can lead to inefficiencies or outright failures. Additionally, agents often lack proper memory mechanisms, causing them to forget past interactions and repeat mistakes. “The complexity of multi-agent systems grows exponentially as you add more agents. Success requires careful design and constant iteration,” Dibia remarked during his presentation.
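One simple way to address the memory problem Dibia described is a sliding-window conversation buffer. This is a minimal, framework-agnostic sketch (the class and method names are invented, not AutoGen APIs) of an agent memory that retains recent exchanges so the agent can condition on history instead of repeating itself:

```python
from collections import deque

class AgentMemory:
    """Minimal sliding-window memory: keeps the last `capacity` turns
    so an agent can see recent history instead of repeating mistakes.
    Hypothetical sketch; not an AutoGen class."""

    def __init__(self, capacity: int = 5):
        self.turns = deque(maxlen=capacity)  # oldest turns evicted first

    def remember(self, role: str, content: str) -> None:
        self.turns.append((role, content))

    def as_context(self) -> str:
        """Render remembered turns as text to prepend to the next prompt."""
        return "\n".join(f"{role}: {content}" for role, content in self.turns)

memory = AgentMemory(capacity=3)
memory.remember("user", "Plot sales by region.")
memory.remember("agent", "Generated plot; saved to sales.png.")
memory.remember("user", "Now break it down by month.")
memory.remember("agent", "Monthly breakdown saved to monthly.png.")
# The first turn has been evicted; only the last three remain.
```

Real systems often add summarization or retrieval on top of such a buffer, but even this bounded window prevents the context from growing without limit as agents are added.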
Another critical failure point is improper termination conditions. Without clear parameters for when a task is complete, agents can continue indefinitely, wasting computational resources and time. Dibia also addressed the risks associated with giving agents excessive autonomy, such as performing high-stakes actions without human oversight. He recommended implementing safeguards to assess the cost and risk of decisions, delegating to humans when necessary.
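Both safeguards can be sketched in a few lines. The functions, thresholds, and completion token below are hypothetical illustrations of the two ideas, not code from AutoGen or the talk: an explicit stopping check (completion signal or turn budget) and a cost/risk gate that escalates to a human:

```python
# Hypothetical sketch of two safeguards: explicit stopping criteria and
# human delegation for high-stakes actions. All names/limits are invented.

MAX_TURNS = 10  # hard budget so a stuck agent cannot loop forever

def should_stop(turn: int, last_message: str) -> bool:
    """Stop when the agent signals completion or the turn budget runs out."""
    return "TERMINATE" in last_message or turn >= MAX_TURNS

def execute_action(action: str, estimated_cost: float, risk: str,
                   cost_limit: float = 100.0) -> str:
    """Run cheap, low-risk actions; escalate everything else to a human."""
    if risk == "high" or estimated_cost > cost_limit:
        return f"ESCALATED to human: {action}"
    return f"EXECUTED: {action}"
```

The key design point is that both checks run outside the LLM: the model can propose actions and claim completion, but deterministic code decides when the loop ends and which actions require human sign-off.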
Scalability was another topic of focus. He emphasized the importance of robust infrastructure and observability tools for debugging and monitoring, noting that these are critical for managing multi-agent systems as they grow in size and complexity.
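A lightweight starting point for the kind of observability Dibia called for is instrumenting each agent step with structured logs. This is a minimal, hypothetical sketch (the decorator and step function are invented examples) using only the Python standard library:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

def traced(step_name: str):
    """Decorator that logs each agent step's duration, a minimal
    observability hook for debugging multi-agent runs. Hypothetical sketch."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            log.info("%s finished in %.3fs", step_name,
                     time.perf_counter() - start)
            return result
        return inner
    return wrap

@traced("plan")
def plan(task: str) -> list:
    """Invented example step: produce a trivial two-step plan."""
    return [f"step 1 for {task}", f"step 2 for {task}"]
```

Production systems would route such events to a tracing backend rather than a logger, but the principle is the same: every agent step should emit enough telemetry to reconstruct what happened when a workflow fails.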
Developers and engineers interested in Victor Dibia’s work can find more resources on the AutoGen GitHub repository, and a video of his QCon SF presentation is expected to be available on the conference website in the coming weeks.