
The Augmentation Trap: AI Productivity and the Cost of Cognitive Offloading

Michael Caosun & Sinan Aral
2026

Source: https://www.semanticscholar.org/paper/0c1b37c92bdc1481bb293835d2b1afe14979b9f1

Full text: open-access via OpenAlex

Caosun and Aral (the latter at MIT and one of the most cited scholars in digital economics) construct a formal dynamic model that makes precise what practitioners sense but cannot yet argue: AI adoption can be individually rational at every step while still producing an outcome worse than non-adoption. The result is a structural trap rather than a mistake.

The decomposition into five adoption regimes is a portable analytical framework: product directors can use it to classify their own deployments rather than simply assert that 'it depends.' The skill-divergence result, that small differences in initial experience, compounded by managerial incentive horizons, can send workers to permanently opposite equilibria, connects directly to the library's themes on bounded rationality, principal-agent problems, and how institutional incentives shape organizational capability.

Read alongside Acemoglu and Restrepo on automation and task displacement, and against Brynjolfsson's more optimistic productivity framing, this paper provides the missing dynamic: the cost that arrives after the measurement window closes.

Central argument

Caosun and Aral argue that AI tools create a structural trap: even a fully informed decision-maker rationally adopts AI when front-loaded productivity gains outweigh long-run skill costs, yet the worker can end up permanently less capable than before adoption. The core mechanism is a productivity decomposition into a skill-neutral component (α) and a knowledge-complementary component (β): high-α, low-β deployments extract output independently of worker judgment, which simultaneously removes the practice through which judgment is maintained. When managers are short-termist or worker skill carries external value the manager doesn't internalize, this dynamic worsens into the augmentation trap — the worker would have been better off never using AI at all. A further finding is that skill divergence is possible: experienced workers can develop toward their full potential while less experienced workers deskill to zero, with small differences in managerial incentives determining which path a worker takes.
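The mechanism can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' model: the linear output function, the skill growth and decay rates, the all-or-nothing usage choice, and the names `Deployment`, `output`, `next_skill`, `myopic_usage`, and `never_adopt` are all hypothetical and chosen only to make the argument concrete. A short-horizon manager adopts AI whenever it raises this period's output, and skill is maintained only through unassisted practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    alpha: float  # skill-neutral gain from AI use (assumed linear form)
    beta: float   # knowledge-complementary gain, scaled by skill (assumed)


def output(skill: float, usage: float, d: Deployment) -> float:
    # Assumption: unassisted work yields output equal to skill;
    # AI-assisted work yields alpha + beta * skill.
    return (1 - usage) * skill + usage * (d.alpha + d.beta * skill)


def next_skill(skill: float, usage: float,
               growth: float = 0.2, decay: float = 0.1) -> float:
    # Assumption: the unassisted share of work grows skill toward 1,
    # while the offloaded share lets skill decay toward 0.
    practised = (1 - usage) * growth * (1 - skill)
    lost = usage * decay * skill
    return min(1.0, max(0.0, skill + practised - lost))


def myopic_usage(skill: float, d: Deployment) -> float:
    # A short-horizon manager compares this period's output only.
    return 1.0 if d.alpha + d.beta * skill > skill else 0.0


def never_adopt(skill: float, d: Deployment) -> float:
    # Counterfactual policy: the worker never uses AI.
    return 0.0


def simulate(skill0: float, d: Deployment, periods: int, policy):
    # Returns (final skill, cumulative output) under the given usage policy.
    skill, total = skill0, 0.0
    for _ in range(periods):
        usage = policy(skill, d)
        total += output(skill, usage, d)
        skill = next_skill(skill, usage)
    return skill, total


if __name__ == "__main__":
    high_alpha_low_beta = Deployment(alpha=0.6, beta=0.1)
    for label, skill0 in [("less experienced", 0.6), ("more experienced", 0.7)]:
        final, total = simulate(skill0, high_alpha_low_beta, 50, myopic_usage)
        _, baseline = simulate(skill0, high_alpha_low_beta, 50, never_adopt)
        print(f"{label}: final skill {final:.3f}, "
              f"output {total:.1f} vs {baseline:.1f} without AI")
```

In this toy setup, adoption beats non-adoption in the first period for the worker starting at 0.6, yet that worker's skill decays toward zero and cumulative output over 50 periods falls below the never-adopt baseline. The worker starting at 0.7 sits just above the adoption threshold, is never switched to AI, and develops toward full skill. The divergence comes entirely from the assumed threshold rule, so it should be read as an illustration of the argument's shape rather than evidence for it.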

Critique

The model holds (α, β) fixed over time, which the authors themselves acknowledge likely understates the problem: empirical evidence from the automation bias literature suggests that effective β drifts downward as users grow familiar with the tool and stop engaging critically with its outputs. More substantively, the model treats usage intensity as a managerial choice variable applied uniformly to a worker, abstracting away from the task-level heterogeneity that would be most actionable. In practice, a single worker uses AI in high-α mode for some tasks and high-β mode for others within the same workday, and the welfare implications depend on that mix in ways the aggregate intensity parameter cannot capture. This limits the model's prescriptive reach: it classifies deployment regimes but cannot yet tell practitioners how to compose workflows across task types to avoid the trap.
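The task-mix point can be made concrete with a small sketch. Everything here is hypothetical and not taken from the paper: the `TaskUse` record, the per-task β values, and the hour-weighted averaging are assumptions used only to show how two workers with identical aggregate usage intensity can differ sharply in how knowledge-complementary their AI use is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskUse:
    name: str
    hours: float     # hours spent on the task per day
    ai_share: float  # fraction of those hours done with AI
    beta: float      # hypothetical per-task knowledge complementarity


def aggregate_intensity(tasks: list[TaskUse]) -> float:
    # Share of all working hours that involve AI (the single aggregate number).
    total_hours = sum(t.hours for t in tasks)
    return sum(t.hours * t.ai_share for t in tasks) / total_hours


def usage_weighted_beta(tasks: list[TaskUse]) -> float:
    # Average beta across AI-assisted hours only (assumed weighting).
    ai_hours = sum(t.hours * t.ai_share for t in tasks)
    return sum(t.hours * t.ai_share * t.beta for t in tasks) / ai_hours


# Worker A offloads drafting entirely; worker B uses AI only to support review.
worker_a = [TaskUse("drafting", 4.0, 1.0, 0.1), TaskUse("review", 4.0, 0.0, 0.8)]
worker_b = [TaskUse("drafting", 4.0, 0.0, 0.1), TaskUse("review", 4.0, 1.0, 0.8)]

if __name__ == "__main__":
    for label, tasks in [("A", worker_a), ("B", worker_b)]:
        print(label, aggregate_intensity(tasks), usage_weighted_beta(tasks))
```

Both workers use AI for half of their hours, so a single intensity parameter treats them as identical, while the usage-weighted β differs by a wide margin under these made-up numbers.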

Why it matters for product

For a CPO, the β parameter is a direct design variable. How AI is embedded in product workflows (whether engineers review and reason about AI-generated code or simply accept it, whether PMs use AI to stress-test their own hypotheses or to generate them) determines whether the team's judgment compounds or erodes over time. The finding that novices are at particular risk of deskilling to zero while experts develop toward their potential has direct implications for team composition strategy: onboarding and early-career roles that rely heavily on AI-assisted output may be producing headcount without building the organizational capability that headcount is supposed to represent. The five-regime classification gives product leaders a concrete diagnostic lens for auditing their current AI tooling. The useful question is not whether a tool raises velocity, which short-run metrics will confirm, but what the β value of each deployment is and whether the workflow design keeps human judgment load-bearing.