Collaborating with AI Agents: Field Experiments on Teamwork, Productivity, and Performance

Harang Ju & Sinan Aral
2025

Source: https://www.semanticscholar.org/paper/a6592233f89114439705adf7244e0fb80bd96e72

Ju and Aral run an ambitious experiment (2,234 participants, 11,024 real advertising outputs, and a live field test on X generating roughly 5 million impressions) and use it to identify three distinct mechanisms by which AI agents reshape teamwork: more task-oriented communication, increased delegation, and AI-recognition effects that moderate both.

The 'jagged frontier' finding (AI improves text quality but not image quality) connects to the Brynjolfsson task-based economics tradition while adding behavioural texture about what actually changes inside teams.

The 'diversity collapse' result, in which average quality rises but outputs become more homogeneous, is the most consequential finding for product direction: organisations adopting AI agents may be trading creative variance for efficiency, a structural shift in organisational capability that compounds over time.

For product leaders, this paper provides empirical scaffolding for thinking about what human-AI collaboration actually does to teams, not as a philosophical proposition but as a measured causal claim about communication patterns, delegation, and output character.

The experimental apparatus itself — a controlled platform producing real-world evaluated outputs — models what rigorous AI organisational research looks like and sets a methodological bar that most adjacent work fails to clear.

Central argument

Ju and Aral run a large-scale experiment in which 2,234 participants produce 11,024 real advertising outputs, followed by a live field test on X that generates roughly 5 million impressions. They use it to identify three causal mechanisms through which AI agents reshape team dynamics: a shift toward more task-oriented communication, increased delegation to AI, and AI-recognition effects that moderate both. Their 'jagged frontier' finding is that AI reliably improves text quality but not image quality, consistent with task-based economics frameworks. The most consequential result is 'diversity collapse': AI collaboration raises average output quality while compressing variance, meaning organisations gain efficiency but lose the creative heterogeneity that drives differentiation over time.

Critique

The experiment is conducted entirely within a single platform context — advertising content on X — which raises questions about how far the three mechanisms generalise to product work involving longer time horizons, higher cognitive interdependence, or outputs that are harder to evaluate than ad performance metrics. The AI-recognition effect, which moderates both communication patterns and delegation, is measured at a single point in time; it likely attenuates as AI collaboration becomes ambient and unremarkable, meaning the behavioural findings may describe a transitional period rather than a stable equilibrium. The paper also treats diversity collapse as an organisational-level outcome without fully unpacking whether it is driven by convergence in individual outputs, homogenisation of team processes, or the AI system's own distributional biases — a distinction that matters enormously for any intervention designed to preserve creative variance.
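
The last distinction can be made concrete with a standard variance decomposition. The sketch below is an illustrative assumption, not an analysis from the paper: it takes a hypothetical numeric feature of each output (a quality score, or one coordinate of an embedding), groups the values by author, and splits total variance into a between-author part and a within-author part. The function name, author labels, and numbers are made up.

```python
from statistics import mean


def decompose_variance(values_by_author):
    """Split total population variance into between- and within-author parts.

    values_by_author maps a (hypothetical) author id to that author's
    numeric output feature values. Uses the law of total variance.
    """
    groups = list(values_by_author.values())
    all_values = [v for group in groups for v in group]
    n = len(all_values)
    grand_mean = mean(all_values)

    # Spread of author means around the grand mean, weighted by group size.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / n
    # Spread of each author's outputs around that author's own mean.
    within = sum((v - mean(g)) ** 2 for g in groups for v in g) / n
    total = sum((v - grand_mean) ** 2 for v in all_values) / n
    return {"between": between, "within": within, "total": total}


# Made-up feature values for two hypothetical authors.
toy = {"author_a": [2.0, 4.0], "author_b": [6.0, 8.0]}
parts = decompose_variance(toy)
print(parts)
```

If AI assistance mainly shrinks the between-author term, people are converging on each other; if it mainly shrinks the within-author term, each person's own range is narrowing. Isolating the AI system's own distributional bias would need additional data, such as outputs sampled from the model alone.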

Why it matters for product

The diversity collapse finding is directly actionable for chief product officers (CPOs) making platform and tooling decisions: adopting AI agents across discovery, copywriting, or design workflows may improve average output quality on measurable dimensions while quietly eroding the portfolio variance that produces breakthrough ideas, a compound capability loss that standard productivity metrics will not surface. The shift toward task-oriented communication documented in the experiment suggests that human-AI teams restructure their collaboration around execution rather than exploration, which has implications for how product leaders design rituals, team composition, and the moments in the process where AI assistance is deliberately withheld. The experimental apparatus itself, with real outputs, real-world evaluation, and causal identification, also sets a methodological template for how product organisations could instrument their own AI adoption to measure structural changes in team behaviour rather than relying on self-reported productivity gains.
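
One way to instrument this is to track dispersion alongside average quality. The following is a minimal Python sketch, not the paper's method: it assumes each output has a text and a numeric quality score, uses Jaccard distance over word sets as a crude stand-in for output diversity, and runs on made-up ad copy. The function names, data, and scores are all hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance


def jaccard_distance(a: str, b: str) -> float:
    """Return 1 - |A & B| / |A | B| over lowercase word sets."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def summarize(outputs):
    """Summarise one condition given a list of (text, quality_score) pairs."""
    texts = [text for text, _ in outputs]
    scores = [score for _, score in outputs]
    pairs = list(combinations(texts, 2))
    return {
        "mean_quality": mean(scores),
        "quality_variance": pvariance(scores),
        # Average distance between every pair of outputs: lower means more alike.
        "mean_pairwise_distance": (
            mean(jaccard_distance(a, b) for a, b in pairs) if pairs else 0.0
        ),
    }


# Made-up examples; real data would come from a team's own output logs.
human_only = [
    ("fresh coffee delivered daily", 5.0),
    ("wake up to bold flavor", 7.0),
    ("your morning ritual reinvented", 3.0),
]
with_ai = [
    ("fresh coffee delivered to your door", 6.0),
    ("fresh coffee delivered every morning", 6.5),
    ("fresh coffee delivered fast", 6.0),
]

human_stats = summarize(human_only)
ai_stats = summarize(with_ai)
print("human only:", human_stats)
print("with AI:   ", ai_stats)
```

On this toy data the AI-assisted set scores higher on average but its outputs sit much closer together, which is the pattern the paper labels diversity collapse. A real pipeline would likely replace the word-set distance with an embedding-based one.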