Researchers from Google and MIT published a paper describing a predictive framework for scaling multi-agent systems. The framework identifies a tool-coordination trade-off and can be used to select an optimal agentic architecture for a given task.
The scaling model relies on several predictive factors of the system, including the underlying LLM’s intelligence index, the baseline performance of a single agent, the number of agents, the number of tools, and coordination metrics. The researchers found three dominant effects in the model: a tool-coordination trade-off, where tasks requiring many tools perform worse with multi-agent overhead; capability saturation, where adding agents yields diminishing returns once single-agent baseline performance exceeds a certain threshold; and topology-dependent error amplification, where centralized orchestration reduces error amplification. They also found that the best coordination strategy is task dependent: financial reasoning benefits from centralized orchestration, while web navigation does better with a decentralized strategy. When evaluated on held-out test data, the scaling framework predicted the optimal coordination strategy with 87% accuracy. According to Google:
As foundational models like Gemini continue to advance, our research suggests that smarter models don’t replace the need for multi-agent systems, they accelerate it, but only when the architecture is right. By moving from heuristics to quantitative principles, we can build the next generation of AI agents that are not just more numerous, but smarter, safer, and more efficient.
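The three dominant effects can be illustrated with a toy scoring function. This is a minimal sketch, not the paper's actual model: the feature names and all coefficients below are made up for illustration, chosen only so that the qualitative behavior (tool-heavy tasks penalized by coordination overhead, strong single-agent baselines leaving little headroom) matches the effects described above.

```python
from dataclasses import dataclass

@dataclass
class TaskProfile:
    """Illustrative predictor variables (names are assumptions,
    not the paper's exact feature set)."""
    single_agent_baseline: float  # accuracy of one agent, 0..1
    num_tools: int                # tools the task requires
    num_agents: int               # agents in the candidate system

def predicted_gain(p: TaskProfile) -> float:
    """Toy score reflecting the three reported effects; all
    coefficients are invented for illustration."""
    gain = 0.10 * (p.num_agents - 1)                  # extra agents can help...
    gain -= 0.02 * p.num_tools * (p.num_agents - 1)   # ...but tool-heavy tasks pay coordination overhead
    if p.single_agent_baseline > 0.8:                 # capability saturation: strong baselines leave little headroom
        gain *= 0.25
    return gain

# Tool-heavy task: multi-agent overhead outweighs the benefit (gain < 0).
heavy = TaskProfile(single_agent_baseline=0.5, num_tools=12, num_agents=4)
# Tool-light task: adding agents is still predicted to help (gain > 0).
light = TaskProfile(single_agent_baseline=0.5, num_tools=2, num_agents=4)
```

Under these invented coefficients, `predicted_gain(heavy)` is negative while `predicted_gain(light)` is positive, mirroring the tool-coordination trade-off.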
The Google team classified multi-agent architectures into four categories based on how the agents in the system coordinate: independent, where there is no inter-agent coordination; centralized, where agents communicate only with a central orchestrator; decentralized, with peer-to-peer coordination; and hybrid, with a balance between centralized and decentralized. Each of these has several configuration parameters, such as the number of agents and the number of iterations per agent, as well as different computational and memory complexities and numbers of LLM calls.
Multi-agent Architectures. Image Source: Google Research
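The taxonomy above can be sketched as a small configuration type. The four topology names come from the article; the field names and the LLM-call formulas are hypothetical, included only to show how call counts might diverge between topologies.

```python
from dataclasses import dataclass
from enum import Enum

class Topology(Enum):
    """The four coordination categories described by the Google team."""
    INDEPENDENT = "independent"      # no inter-agent coordination
    CENTRALIZED = "centralized"      # agents talk only to an orchestrator
    DECENTRALIZED = "decentralized"  # peer-to-peer coordination
    HYBRID = "hybrid"                # mix of centralized and decentralized

@dataclass
class ArchitectureConfig:
    """Configuration parameters; field names are illustrative."""
    topology: Topology
    num_agents: int
    iterations_per_agent: int

    def llm_calls(self) -> int:
        """Rough LLM-call count (hypothetical formulas): centralized adds
        one orchestrator call per round, decentralized adds pairwise
        peer-message calls, independent adds nothing."""
        base = self.num_agents * self.iterations_per_agent
        if self.topology is Topology.CENTRALIZED:
            return base + self.iterations_per_agent           # orchestrator call each round
        if self.topology is Topology.DECENTRALIZED:
            return base + self.num_agents * (self.num_agents - 1)  # peer messages
        return base

cfg = ArchitectureConfig(Topology.CENTRALIZED, num_agents=4, iterations_per_agent=3)
```

Even with these made-up formulas, the quadratic peer-message term shows why decentralized coordination grows more expensive with agent count than centralized orchestration.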
The scaling model the researchers developed is a regression model with 20 terms, based on nine predictor variables plus interaction terms between them. They excluded “interactions without clear mechanistic justification…to avoid overfitting.” Google acknowledges that the model has several limitations. In particular, they note that “tool-heavy” tasks cause inefficiencies in multi-agent coordination, and point toward the need for “specialized coordination protocols for tool-intensive tasks.”
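The general shape of such a model, a linear regression over main effects plus a hand-picked subset of interaction terms, can be sketched with NumPy. The data here is synthetic and the chosen interaction indices are arbitrary stand-ins; the paper's actual predictors, term selection, and fitting procedure are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Nine synthetic predictor variables for 200 hypothetical benchmark runs
# (the paper's predictors include the model intelligence index,
# single-agent baseline, agent count, tool count, and coordination metrics).
n, p = 200, 9
X = rng.normal(size=(n, p))

# Keep only interactions with a plausible mechanistic story,
# e.g. (tool count x agent count); these index pairs are illustrative.
interactions = [(2, 3), (0, 1)]

def design_matrix(X: np.ndarray) -> np.ndarray:
    """Intercept + 9 main effects + selected interactions (12 columns here;
    the paper's full model has 20 terms)."""
    cols = [np.ones(len(X)), *X.T]
    for i, j in interactions:
        cols.append(X[:, i] * X[:, j])
    return np.column_stack(cols)

D = design_matrix(X)
y = D @ rng.normal(size=D.shape[1]) + 0.1 * rng.normal(size=n)  # synthetic target
coef, *_ = np.linalg.lstsq(D, y, rcond=None)  # ordinary least squares fit
```

Restricting the design matrix to mechanistically justified interactions, rather than all 36 pairwise products of nine predictors, is the overfitting guard the researchers describe.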
In a Hacker News discussion about the paper, several users shared their own experiences with multi-agent workflows. One wrote:
I’ve been building a lot of agent workflows at my day job. Something that I’ve found a lot of success with when deciding on an orchestration strategy is to ask the agent what they recommend as part of the planning for phase. This technique of using the agent to help you improve its performance has been a game changer for me in leveraging this tech effectively. YMMV of course. I mostly use Claude code so who knows with the others.
Collaboration and orchestration strategies for multi-agent systems are an active research topic. In 2025, InfoQ covered Amazon’s multi-agent collaboration framework for Amazon Bedrock, which enables specialized agents to work together under a supervisor agent’s coordination. Earlier this year, InfoQ covered Google’s guide outlining eight essential design patterns for multi-agent systems, which provides concrete explanations of each pattern along with sample code for Google’s Agent Development Kit.
