How to Design Smarter AI Teams

A single specialist agent, working alone with no communication, outperforms a fully connected network of generalists by 6.7%, but only when computational limits are loose. Tighten those limits, and the generalists win by 7.3%. This reversal isn’t a bug. It’s a systematic pattern in collective intelligence, one that could reshape how we design AI teams tackling everything from drug discovery to climate policy.

The future of artificial intelligence may not lie in bigger models, but in smarter collectives. As AI systems grow more capable, researchers are increasingly stitching them together into multi-agent teams—swarms of digital specialists and generalists that collaborate, argue, and negotiate toward solutions. These collectives already power grandmaster-level StarCraft II bots, accelerate scientific discovery, and assist in high-stakes medical decisions. But despite their promise, we’ve been designing them largely by instinct, not science.

Now, a new study by John Meluso, Laurent Hébert-Dufresne, Christoph Riedl, and H. Oliver Gao (2026) reveals that the performance of artificial collectives depends on a precise alignment between three factors: the nature of the task, the interpretive abilities of the agents, and their computational limits—what the researchers call rationality bounds. Get the alignment right, and performance soars. Get it wrong, and even the most sophisticated agents flounder.

The stakes go beyond accuracy. Data centers running AI now consume more electricity than some countries, and their carbon footprint is growing fast. Designing multi-agent systems that are principled, not just powerful, could cut energy use while improving results. The key insight? There is no one-size-fits-all architecture. Sometimes siloed specialists beat collaborative generalists; other times the reverse is true. The difference hinges on how much each agent can compute and what kind of problem it is solving.

The Science

The researchers didn’t test specific AI models like GPT or Gemini. Instead, they built an abstract simulation of multi-agent problem-solving, stripping away the noise of particular implementations to isolate the core dynamics of collective intelligence.

They modeled agents as optimizers—entities that iteratively adjust their actions to maximize performance on a task, much like how real AI systems search for optimal strategies in games or supply chains. Each agent operates within a high-dimensional state space of 32 decision variables, representing the complexity of real-world problems. But crucially, agents are bounded: at each step, they can only adjust their variables by a limited amount—±0.1%, ±1%, ±10%, or ±100% of the total range. This constraint simulates the computational limits of real systems, echoing Herbert Simon’s classic concept of bounded rationality, where decision-makers lack infinite time or processing power.
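
Here is a minimal sketch of one bounded agent that makes the constraint concrete. It assumes a simple greedy hill-climbing rule; the study itself used optimizers such as L-BFGS-B and Dual Annealing. The names `bounded_step` and `bounded_hill_climb`, the [0, 1] variable range, and the toy objective are all hypothetical. Only the 32 variables and the ±0.1% to ±100% bound settings come from the description above.

```python
import random


def bounded_step(x, bound, lo=0.0, hi=1.0, rng=random):
    """Propose a new point where each variable moves by at most
    bound * (hi - lo), then clip the result back into [lo, hi].

    `bound` is a fraction of the total range: 0.001, 0.01, 0.1 or 1.0
    for the +/-0.1%, +/-1%, +/-10% and +/-100% settings.
    """
    max_delta = bound * (hi - lo)
    return [min(hi, max(lo, xi + rng.uniform(-max_delta, max_delta))) for xi in x]


def bounded_hill_climb(f, x0, bound, steps=1000, seed=0):
    """Greedy local search that maximizes f, keeping a proposal only if it improves."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    for _ in range(steps):
        candidate = bounded_step(x, bound, rng=rng)
        f_candidate = f(candidate)
        if f_candidate > fx:
            x, fx = candidate, f_candidate
    return x, fx


def toy_objective(x):
    """Hypothetical smooth objective, best when every variable equals 0.3."""
    return -sum((xi - 0.3) ** 2 for xi in x)


if __name__ == "__main__":
    start = [0.5] * 32  # 32 decision variables, as in the study
    for bound in (0.001, 0.01, 0.1, 1.0):
        _, best = bounded_hill_climb(toy_objective, start, bound)
        print(f"bound=+/-{bound:.1%}: best score {best:.4f}")
```

The bound is the only knob that changes between runs. Equal step budgets with different bounds give a rough feel for how much ground each kind of agent can cover.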

The team ran over 250 trials across 30 distinct mathematical tasks, each designed to capture a different flavor of collective difficulty. These tasks fall into four categories derived from organizational psychology:

Generate: Finding novel, high-performing solutions in complex landscapes.
Choose: Selecting the best option among many alternatives.
Coordinate: Aligning interdependent actions across agents.
Negotiate: Balancing conflicting objectives, where agents have competing assessments of what counts as success.

These aren’t just academic distinctions. Drug discovery is a generate task. Portfolio optimization is a choose task. Power grid management is a coordinate task. International climate agreements? That’s negotiate.

Agents were connected in one of 18 network topologies, from fully connected (everyone can interpret everyone) to sparse trees and isolated nodes. The ability to “interpret” another agent’s actions defines a network tie. Specialists have few ties; generalists have many. Dense, decentralized networks emerge from generalists; sparse, centralized ones from specialists.

Figure 1: Interpretive abilities correspond to network ties. Agents can have many different interpretive abilities. Varying how many interpretive abilities an agent has implies a conceptual spectrum, from a narrow set of abilities (can interpret few actions, like specialists) to a broad set (can interpret varied actions, like generalists). With respect to a group, having greater interpretive abilities corresponds to more network ties while having fewer corresponds to fewer network ties. Source: John Meluso, Laurent Hébert-Dufresne, et al.
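
Because interpretive abilities map onto ties, the specialist-to-generalist spectrum can be read directly off a graph. The sketch below is illustrative only. It assumes undirected ties (mutual interpretation) and uses three hand-picked topologies rather than the study's 18. The helper names are hypothetical.

```python
from itertools import combinations


def isolated(n):
    """No interpretive ties: every agent is a pure specialist."""
    return set()


def star(n):
    """A minimal tree: agent 0 is a broker who interprets everyone else."""
    return {(0, j) for j in range(1, n)}


def fully_connected(n):
    """Every agent interprets every other agent: all generalists."""
    return set(combinations(range(n), 2))


def density(n, edges):
    """Share of the n * (n - 1) / 2 possible undirected ties that exist."""
    possible = n * (n - 1) / 2
    return len(edges) / possible if possible else 0.0


def degrees(n, edges):
    """Ties per agent, a proxy for how broad its interpretive abilities are."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


if __name__ == "__main__":
    n = 8
    for name, build in (("isolated", isolated), ("star", star), ("fully connected", fully_connected)):
        edges = build(n)
        print(f"{name:>15}: density={density(n, edges):.2f}, degrees={degrees(n, edges)}")
```

For eight agents this gives densities of 0.00, 0.25, and 1.00. The star's hub has degree 7 while each leaf has degree 1, a specialist team with one generalist broker.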

Critically, all agents had equal total computational resources. The trade-off wasn’t more compute for some—it was how that compute was allocated: depth versus breadth, independence versus integration.

What They Found

The headline result? Network structure alone has surprisingly little effect on average—just 0.07 standard deviations in performance. But that average hides massive variation. When you condition on task type, the effects jump to 0.33 standard deviations—4.5 times larger. For some tasks, they reach a staggering 1.84 standard deviations.

Network Density Benefits Vary by Task Type
Effect of network density on performance (standard deviations from the mean)

Task type	Density effect (SD)
Generate	+0.87
Choose	+0.97
Coordinate	+1.60
Negotiate	-0.55

Chart 1: Performance effects of network properties by task type. Source: Meluso et al. (2026), Fig. 3A

This means the right network can be the difference between mediocrity and excellence—but only if it matches the task.

For generate, choose, and coordinate tasks, dense, decentralized networks dominate. On coordinate tasks—where agents must synchronize interdependent decisions—high network density boosts performance by +1.60 standard deviations, and decentralization by +1.84. Shorter path lengths help too, cutting time to convergence. These tasks benefit from rapid information flow and collective refinement. When everyone can interpret everyone, ideas spread fast, and the group climbs the performance landscape together.
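
Average path length is simple to compute. Here is a minimal sketch using breadth-first search. The eight-agent topologies and the function name are illustrative assumptions, and the study's exact path-length measure may differ.

```python
from collections import deque
from itertools import combinations


def avg_path_length(n, edges):
    """Mean shortest-path length (in hops) over all reachable ordered pairs,
    computed with one breadth-first search per agent."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    total, pairs = 0, 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for dst, d in dist.items():
            if dst != src:
                total += d
                pairs += 1
    return total / pairs if pairs else float("inf")


if __name__ == "__main__":
    n = 8
    topologies = {
        "fully connected": set(combinations(range(n), 2)),
        "star": {(0, j) for j in range(1, n)},
        "chain": {(i, i + 1) for i in range(n - 1)},  # the longest possible tree
    }
    for name, edges in topologies.items():
        print(f"{name}: {avg_path_length(n, edges):.2f} hops")
```

With eight agents, information crosses 1.00 hop on average in the fully connected network, 1.75 in the star, and 3.00 in the chain. Fewer hops means an improvement found by one agent reaches the others sooner.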

But negotiate tasks tell the opposite story. Here, dense networks hurt: density lowers performance by 0.55 standard deviations and decentralization by 0.68. Instead, sparse, centralized topologies, like tree structures with a few well-connected mediators, perform best. Why? Too much connectivity leads to premature convergence: agents rush to compromise before fully exploring the space of possible solutions. A few generalist “brokers” who can interpret multiple specialists preserve diversity of exploration while still enabling eventual agreement.
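
One standard way to quantify how centralized a topology is uses Freeman degree centralization. The article does not say which centralization measure the study used, so treat this as an assumed stand-in. The two-broker tree is a hypothetical example of “a few well-connected mediators.”

```python
from itertools import combinations


def degree_centralization(n, edges):
    """Freeman degree centralization for an undirected graph:
    1.0 for a star (one hub), 0.0 when every agent has the same degree."""
    if n < 3:
        return 0.0
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    top = max(deg)
    # Normalize by the maximum possible sum, which a star attains.
    return sum(top - d for d in deg) / ((n - 1) * (n - 2))


if __name__ == "__main__":
    n = 8
    topologies = {
        "fully connected": set(combinations(range(n), 2)),
        # Brokers 0 and 1 each mediate three specialists and talk to each other.
        "two-broker tree": {(0, 1), (0, 2), (0, 3), (0, 4), (1, 5), (1, 6), (1, 7)},
        "star": {(0, j) for j in range(1, n)},
    }
    for name, edges in topologies.items():
        print(f"{name}: {degree_centralization(n, edges):.2f}")
```

For eight agents the scores are 0.00 for fully connected, about 0.43 for the two-broker tree, and 1.00 for the star. The mediator designs that suit negotiate tasks sit at the centralized end of this scale.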

Then comes the twist: rationality bounds flip the script.

When agents have loose computational limits (±10%), specialists win. The completely isolated agent—no interpretive ties at all—outperforms the fully connected generalist network by 6.7% and converges 23% faster. Why? Because in high-dimensional spaces, each additional interpretive tie adds complexity. Agents must optimize over more variables, creating an exponentially larger search space. With loose bounds, the landscape is rugged, full of local optima. Specialists, operating in lower-dimensional subspaces, sample more effectively and avoid getting trapped.

But when bounds are tight (±0.1%), the world smooths out. Agents can only make tiny adjustments, so the local landscape looks convex and gentle. Now, each interpretive tie provides valuable gradient information. Generalists, with access to more neighbors, estimate the direction of improvement more accurately. The fully connected network now wins by 7.3% and converges 6% faster.
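
The smoothing effect of tight bounds can be seen in one dimension. The sketch below uses a hypothetical rugged landscape (a broad hill plus fine ripples). It counts how many local optima lie within an agent's reach under a loose (±10%) versus a tight (±0.1%) bound. This is a toy illustration of the intuition, not the study's 32-dimensional tasks, and it does not model agents sharing gradient information.

```python
import math


def rugged(x):
    """Hypothetical 1-D landscape on [0, 1]: a smooth hill plus ripples with
    period 0.04 that create many local optima."""
    return -(x - 0.5) ** 2 + 0.05 * math.cos(2 * math.pi * 25 * x)


def local_maxima_in_reach(f, x0, bound, samples=2001):
    """Count strict interior local maxima of f on a fine grid over
    [x0 - bound, x0 + bound], i.e. the region one bounded move can reach."""
    lo, hi = x0 - bound, x0 + bound
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    ys = [f(x) for x in xs]
    return sum(1 for i in range(1, samples - 1) if ys[i - 1] < ys[i] > ys[i + 1])


if __name__ == "__main__":
    for label, bound in (("loose (+/-10%)", 0.1), ("tight (+/-0.1%)", 0.001)):
        print(label, local_maxima_in_reach(rugged, 0.51, bound))
```

Starting from x = 0.51, the loose window spans five ripple peaks while the tight window contains none. Within ±0.1% the landscape is effectively a single upward slope. According to the article, this is the regime where information from more neighbors pays off.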

Specialists vs. Generalists Across Rationality Bounds
Performance difference between specialists and generalists, in percent (positive values favor specialists)

Rationality bound	Specialist advantage (%)
Loose (±10%)	+6.7
Moderate (±1%)	+7.0
Tight (±0.1%)	-7.3

Chart 2: Performance vs. convergence time across rationality bounds. Source: Meluso et al. (2026), Fig. 4

At moderate bounds (±1%), a fundamental trade-off emerges: specialists achieve 7% better performance but take 27% longer to converge. The Pareto frontier is discontinuous—there’s no smooth middle ground. You choose: high-quality results at high time cost, or fast, adequate answers.
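
The dominance logic behind this trade-off can be checked mechanically. The sketch normalizes generalists to 100 on both performance and convergence time and places specialists using the percentages quoted above. The tight-bound figures are rough conversions, and only the direction of each difference matters for the check. The function names are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on both objectives (higher
    performance, lower convergence time) and strictly better on one."""
    perf_a, time_a = a
    perf_b, time_b = b
    no_worse = perf_a >= perf_b and time_a <= time_b
    strictly_better = perf_a > perf_b or time_a < time_b
    return no_worse and strictly_better


def pareto_front(designs):
    """Names of the designs that no other design dominates."""
    return sorted(
        name
        for name, point in designs.items()
        if not any(dominates(other, point) for rival, other in designs.items() if rival != name)
    )


# (performance, convergence time), generalists normalized to (100, 100).
SCENARIOS = {
    "loose (+/-10%)": {"generalists": (100.0, 100.0), "specialists": (106.7, 77.0)},
    "moderate (+/-1%)": {"generalists": (100.0, 100.0), "specialists": (107.0, 127.0)},
    "tight (+/-0.1%)": {"generalists": (100.0, 100.0), "specialists": (92.7, 106.0)},
}

if __name__ == "__main__":
    for bound, designs in SCENARIOS.items():
        print(bound, pareto_front(designs))
```

Both designs survive only at moderate bounds. This matches the trade-off described above: neither option is better on both axes, so the choice depends on whether quality or speed matters more.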

This isn’t just theoretical. Consider a pharmaceutical company using AI to design new molecules. If each agent has strong computational resources (loose bounds), a team of isolated specialists—each exploring a different chemical subspace—will likely find better candidates than a chatty, collaborative group. But if compute is constrained (tight bounds), a tightly coupled team that shares updates constantly will outperform.

Why This Changes Things

For years, the trend in AI has been toward more connectivity, more collaboration, more generalism. Agent frameworks like AutoGPT or MetaGPT largely assume that more communication is better. This study suggests that assumption often fails. Sometimes, the best way to solve a problem is to let agents work in isolation.

The implications span industries. In climate modeling, where teams of AI simulate complex Earth systems, coordinate tasks dominate. Dense, decentralized networks would allow faster synchronization of atmospheric, oceanic, and biospheric models. But in international climate negotiations, where nations have conflicting interests, a sparse network with a few mediators might avoid premature consensus and surface better long-term agreements.

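In healthcare, AI collectives are being tested to diagnose rare diseases. This is a generate task: finding novel patterns in medical data. Here, the task-level results suggest that a diverse team of generalists, freely sharing insights, will outperform siloed specialists, at least when each agent's compute is tightly bounded. As the drug discovery example shows, loose bounds can tip the balance back toward specialists.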

But in autonomous vehicle fleets, where cars must negotiate right-of-way at intersections, a centralized mediator—or a few well-connected vehicles acting as coordinators—might prevent gridlock better than a fully decentralized swarm.

The study also reframes the energy crisis in AI. Multi-agent systems add to data center power use. If we can match network design to task and compute limits, we could achieve the same or better performance with fewer agents, less communication, and lower energy use. A specialist-heavy system for a generate task under loose bounds might use 30% less energy than a generalist swarm without sacrificing results.
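
Part of the energy argument comes down to counting. Assume a hypothetical cost model in which every agent reads each neighbor it can interpret once per round. Under that model, communication grows quadratically in a fully connected team but only linearly in a tree. The sketch counts messages, not joules, and the function name and cost model are assumptions.

```python
def messages_per_round(n, topology):
    """Directed reads per synchronous round when each agent reads every
    neighbor it can interpret once (a hypothetical cost model)."""
    if topology == "isolated":
        ties = 0
    elif topology == "tree":
        ties = n - 1  # every spanning tree on n agents has n - 1 ties
    elif topology == "full":
        ties = n * (n - 1) // 2
    else:
        raise ValueError(f"unknown topology: {topology!r}")
    return 2 * ties  # each undirected tie is read in both directions


if __name__ == "__main__":
    for n in (8, 32, 128):
        counts = {t: messages_per_round(n, t) for t in ("isolated", "tree", "full")}
        print(n, counts)
```

At 32 agents, a tree exchanges 62 messages per round against 992 for a fully connected team. That gap widens as teams grow.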

This aligns with a broader shift in AI: from brute force to principled efficiency. Just as the human brain evolved to balance exploration and exploitation, so too must artificial collectives. The study extends Simon’s bounded rationality into the multi-agent era, showing that network topology can compensate for limited compute. A well-designed sparse network can outperform a poorly designed dense one, even with the same agents.

It also challenges the assumption that “more intelligence” means more generalism. Nature doesn’t work that way. Ecosystems thrive on a mix of specialists and generalists. Ant colonies solve complex foraging problems not because every ant is smart, but because the system is well-structured. This study suggests artificial collectives should follow the same principle.

What's Next

The study opens at least three major questions.

First, how do these dynamics play out in real-world AI systems, not abstract optimizers? The researchers tested their framework with L-BFGS-B and Dual Annealing algorithms, and the patterns held. But large language models (LLMs) behave differently. They don’t just optimize—they reason, plan, and simulate. Future work must test whether the specialist-generalist trade-off holds when agents use chain-of-thought or reflection.

Second, what about dynamic networks? In this study, ties were fixed. But in reality, agents could adapt their interpretive connections over time—forming and breaking ties based on task progress. Could a hybrid system start as specialists, then switch to generalists when nearing convergence? Early evidence from human teams suggests yes (Centola 2022), but AI systems could do it faster.
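
The study did not test this, but a hybrid schedule is easy to sketch. The rule below is hypothetical. Agents work as isolated specialists while the team's best score is still climbing, then switch to a fully connected topology once gains over a recent window stall. The window and tolerance values are arbitrary assumptions.

```python
def choose_topology(best_scores, window=5, tol=1e-3):
    """Hypothetical switching rule for a dynamic network.

    best_scores: best team score after each step (non-decreasing).
    Returns "isolated" while the gain over the last `window` steps exceeds
    `tol`, and "fully_connected" once progress stalls near convergence.
    """
    if len(best_scores) <= window:
        return "isolated"  # too early to judge progress
    recent_gain = best_scores[-1] - best_scores[-1 - window]
    return "isolated" if recent_gain > tol else "fully_connected"


if __name__ == "__main__":
    climbing = [0.1 * k for k in range(10)]
    plateau = climbing + [0.9] * 6
    print(choose_topology(climbing), choose_topology(plateau))
```

As written, the rule can flip back to isolation if progress resumes. A practical version would probably add hysteresis or make the switch one-way.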

Third, how do we measure interpretive ability in real agents? In the study, it’s defined by network ties. But for LLMs, interpretive ability might depend on prompt design, fine-tuning, or embedding space alignment. We need metrics—like a “semantic compatibility score”—to predict which agents can understand each other.
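
One hypothetical way to prototype such a score uses cosine similarity between agents' embedding vectors, drawing a tie whenever similarity clears a threshold. Everything here is an assumption for illustration, not a validated metric: the score itself, the 0.8 threshold, and the toy three-dimensional vectors standing in for real embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def interpretive_ties(embeddings, threshold=0.8):
    """Hypothetical rule: agents a and b share a tie when the cosine
    similarity of their embeddings meets the threshold."""
    names = sorted(embeddings)
    return {
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if cosine(embeddings[a], embeddings[b]) >= threshold
    }


if __name__ == "__main__":
    toy = {"chem": [1.0, 0.0, 0.0], "bio": [0.9, 0.1, 0.0], "law": [0.0, 0.0, 1.0]}
    print(interpretive_ties(toy))
```

In the toy example, only the chemistry and biology agents are similar enough to interpret each other. The resulting tie set could feed directly into the density and centralization measures used earlier in this piece.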

There are also limits to the current work. The tasks are mathematical functions, not real-world scenarios. The agents are homogeneous in capability, unlike real teams where some are stronger than others. And the study assumes perfect information—no deception, no miscommunication.

Yet even with these caveats, the core message is robust: collective intelligence is not just about the agents—it’s about the fit between agents, task, and constraints.

For engineers, this means abandoning one-size-fits-all architectures. For policymakers, it suggests that AI governance should encourage diversity of design, not just scale. And for all of us, it offers a humbling reminder: even in artificial systems, wisdom emerges not from individual brilliance, but from the right kind of connection.

The most powerful AI may not be the one that thinks the most—but the one that knows when to listen, and when to work alone.
