Human Trust in AI Search: A Large-Scale Experiment
Source: https://www.semanticscholar.org/paper/1c62d4f8f028874e8c8e21e3e7340f3ad3a7f927
Li and Aral treat the generative search interface not as a neutral conduit but as an architecture that actively shapes the epistemic dispositions of its users, a framing that connects squarely to the library's concern with how technology redesigns attention and judgment.
The causal finding that hallucinated citations increase trust while explicit uncertainty signals suppress it inverts naive assumptions about transparency: making AI limitations visible does not straightforwardly improve decisions.
This is a preregistered randomized experiment at real scale — 12,000 queries, seven countries, a U.S.-representative sample — which gives the mechanism claims unusual empirical standing.
For product directors, the deeper implication is that interface choices (reference links, confidence markers, social feedback) are policy decisions about which populations bear the epistemic risk of AI error, and the differential vulnerability by education and demographics makes this a governance question as much as a design one.
Central argument
Li and Aral demonstrate experimentally that generative AI search interfaces distort users' epistemic calibration in counterintuitive ways: hallucinated citations increase trust in AI responses, while explicit uncertainty signals, the transparency mechanisms designers typically reach for, suppress trust without improving decision accuracy. In a preregistered randomized experiment spanning 12,000 queries and seven countries, they show that interface elements such as reference links and confidence markers function not as neutral information but as trust levers whose effects vary significantly by user education and demographics. The architecture of AI search therefore redistributes epistemic risk unevenly across populations.
Critique
The experimental setting, in which users query a purpose-built generative search interface under study conditions, may not capture the habituated, low-attention behavior that characterizes routine search use, where trust heuristics are likely more automatic and less amenable to design nudges. More fundamentally, the suppression effect of uncertainty signals is measured as trust in the AI response itself, so the study may not fully disentangle whether suppressed trust leads users toward better external verification or simply toward inaction and disengagement. That distinction matters enormously for evaluating whether transparency interventions are net harmful or merely incomplete.
Why it matters for product
For a CPO, the core implication is that standard design moves — adding citations, surfacing confidence scores, showing sources — cannot be treated as responsible-AI defaults without empirical validation, because they may actively degrade decision quality for the users least equipped to compensate. The demographic heterogeneity in vulnerability also reframes what a product metrics dashboard should track: if aggregate trust or satisfaction scores mask that lower-education user segments bear disproportionate exposure to hallucination harm, then those metrics are concealing a governance liability, not measuring product health.
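The dashboard point can be made concrete with a small sketch. The Python below is illustrative only: the segment labels, the record layout, and every number are hypothetical and are not results from the study. It shows how an aggregate rate of trusting a response that carried a hallucinated citation can look moderate while one segment carries most of the exposure.

```python
from collections import defaultdict

# Hypothetical records, NOT data from the study. Each tuple is:
# (education_segment, saw_hallucinated_citation, trusted_response)
records = (
    [("college", True, True)] * 1
    + [("college", True, False)] * 5
    + [("no_college", True, True)] * 3
    + [("no_college", True, False)] * 1
    + [("college", False, True)] * 2  # no hallucination shown; excluded below
)


def trusted_hallucination_rate(rows):
    """Share of hallucination exposures in which the user trusted the response."""
    exposed = [r for r in rows if r[1]]
    if not exposed:
        return None  # no exposures in this group, so the rate is undefined
    return sum(1 for r in exposed if r[2]) / len(exposed)


def rates_by_segment(rows):
    """Compute the same rate separately for each education segment."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[0]].append(r)
    return {segment: trusted_hallucination_rate(g) for segment, g in groups.items()}


aggregate = trusted_hallucination_rate(records)
by_segment = rates_by_segment(records)

print(f"aggregate: {aggregate:.2f}")
for segment, rate in sorted(by_segment.items()):
    print(f"{segment}: {rate:.2f}")
```

With these invented numbers the aggregate rate sits well below the rate for the "no_college" segment, which is the masking effect described above. A real dashboard would need its own definitions of exposure and trust; the ones used here are assumptions made for the sketch.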