Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets
Source: https://arxiv.org/abs/2604.02460
Full text: arXiv preprint
The paper that forced the multi-agent debate to control for what it should have controlled from the start: computational budget.
Tran and Kiela gave single-agent and multi-agent LLM systems identical reasoning token budgets and measured performance on multi-hop reasoning tasks across three model families.
The single agent matched or outperformed multi-agent variants in nearly every condition.
The theoretical backbone is the Data Processing Inequality: every inter-agent handoff is a processing step on the original input, so it can only lose information, never create it.
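The inequality itself is compact. A minimal statement, with the agent mapping as my own illustrative reading rather than the paper's notation:

```latex
% Data Processing Inequality: if X -> Y -> Z form a Markov chain
% (Z depends on X only through Y), then
I(X; Z) \le I(X; Y)
% Illustrative mapping (an assumption, not the paper's formalism):
% X = task input, Y = agent A's handoff message,
% Z = agent B's state after reading only Y.
% Each handoff can preserve at most the mutual information
% the previous message already carried about X.
```

Chaining the inequality across k handoffs gives a monotonically non-increasing bound on what the final agent can know about the input, which is the intuition behind the single agent's advantage at fixed budget.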
The study also identifies the regime where multi-agent becomes competitive: degraded contexts in which no single agent can maintain coherence. That reframing turns a binary debate into a boundary question.
For product leaders deploying agentic systems, the core lesson is Brooks's law restated for the AI era: coordination has costs, and those costs are invisible until you measure them.