The Problem of Irreproducible Bioscience Research
Flier, a former dean of Harvard Medical School, writes about the reproducibility crisis in biomedical research from inside the discipline.
The piece is clear-eyed about the structural causes — career incentives, publication pressures, statistical malpractice — and honest about how hard they are to fix.
For product direction, it is a cautionary companion for anyone building a culture of experimentation: bioscience discovered, at scale, that most of its published findings do not replicate, and that the reasons are not primarily fraud but ordinary human incentives operating on large numbers of people.
A useful humility check before announcing that your team has "proven" something with data.
Read alongside Merton to see what the norms look like when they hold and what happens when they erode.
Central argument
Flier argues that the biomedical research enterprise is experiencing a systemic reproducibility crisis, driven not primarily by fraud but by structural incentives that reward novelty, positive results, and publication volume over rigor and replication. Career advancement, grant competition, and journal prestige collectively push researchers toward practices like p-hacking, selective reporting, and underpowered studies, making the published literature an unreliable guide to what is actually true. The crisis reveals that institutional norms alone cannot sustain scientific integrity when individual incentive structures run in the opposite direction.
Critique
Flier diagnoses the incentive structures clearly but offers no serious theory of reform. The prescriptive dimension of the paper is thin relative to the analytical one, leaving the reader with a thorough account of why the problem persists and little guidance on which interventions might actually shift equilibria at scale. A sharper critic might also note that Flier is a former dean of one of the institutions most implicated in prestige-driven research culture, and that this position allows him to critique the system without fully reckoning with the degree to which elite institutions like Harvard have actively perpetuated the conditions he laments.
Why it matters for product
Product teams routinely treat A/B test results, user research findings, and analytics-derived insights as settled facts rather than as provisional signals subject to the same replication failures Flier documents in bioscience. The underlying incentive dynamics are nearly identical: pressure to ship, to validate roadmap bets, and to show data-backed wins discourages the kind of adversarial re-testing that would reveal how many "proven" product decisions are noise. For a CPO, this is a concrete argument for building replication checkpoints into the experimentation culture: not just asking whether a test reached significance, but whether the effect held in a follow-on test, a different segment, or a different quarter before committing resources to scaling it.
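One way to make the replication checkpoint concrete is sketched below. This is an illustration under stated assumptions, not a method taken from Flier's piece: the `ExperimentResult` type, the `replicated` helper, the pooled two-proportion z-test, and the 0.05 threshold are all hypothetical choices made for the example, and the conversion counts are invented. The rule it encodes is deliberately crude: treat an effect as scalable only if the original test and an independent follow-on test are both significant and point in the same direction.

```python
import math
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    """Hypothetical container for one A/B test's raw counts."""
    control_conversions: int
    control_users: int
    variant_conversions: int
    variant_users: int


def two_proportion_z(r: ExperimentResult) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) for a pooled two-proportion z-test."""
    p_control = r.control_conversions / r.control_users
    p_variant = r.variant_conversions / r.variant_users
    pooled = (r.control_conversions + r.variant_conversions) / (
        r.control_users + r.variant_users
    )
    se = math.sqrt(pooled * (1 - pooled) * (1 / r.control_users + 1 / r.variant_users))
    if se == 0:
        # No variation at all (every user or no user converted): no evidence of an effect.
        return 0.0, 1.0
    z = (p_variant - p_control) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def replicated(original: ExperimentResult, follow_up: ExperimentResult,
               alpha: float = 0.05) -> bool:
    """Assumed checkpoint rule: both tests significant, effects in the same direction."""
    z1, p1 = two_proportion_z(original)
    z2, p2 = two_proportion_z(follow_up)
    same_direction = (z1 > 0) == (z2 > 0)
    return p1 < alpha and p2 < alpha and same_direction


if __name__ == "__main__":
    # Invented counts for illustration only.
    original = ExperimentResult(1000, 10_000, 1100, 10_000)
    follow_up = ExperimentResult(1000, 10_000, 1010, 10_000)
    print("original (z, p):", two_proportion_z(original))
    print("follow-up (z, p):", two_proportion_z(follow_up))
    print("passes replication checkpoint:", replicated(original, follow_up))
```

Requiring significance twice is conservative and will reject some real effects. A team might instead pool the two tests or compare effect-size intervals. The point of the sketch is only that the checkpoint can be an explicit, automated gate rather than a judgment made under pressure to ship.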