
Teaching AI to Handle Exceptions: Supervised Fine-Tuning with Human-Aligned Judgment

M. DiSorbo, Harang Ju & Sinan Aral
2025

Source: https://www.semanticscholar.org/paper/ab4717b417b3e48abb7ecaaa4caab19a2d5cab01

The paper's most productive contribution is not technical but conceptual: it demonstrates that LLMs default to rigid policy adherence even when context demands discretionary judgment, which maps directly onto the classic organizational problem of rules versus judgment in incomplete-contract environments.

The finding that fine-tuning requires human explanations, not just labels, parallels tacit-knowledge arguments in organization theory: judgment cannot be transferred by transmitting outcomes alone; the reasoning behind those outcomes must be transmitted as well.
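
To make the labels-versus-explanations distinction concrete, here is a minimal, hypothetical sketch of two supervised fine-tuning records for the same exception scenario. The scenario, field names, and wording are illustrative assumptions, not the paper's actual data format or prompts.

```python
# A hypothetical illustration (not the paper's actual data format) of the
# difference between label-only and explanation-augmented fine-tuning records
# for an exception-handling scenario. The scenario and field names are invented.

import json

# Label-only record: transmits the outcome, not the reasoning behind it.
label_only_example = {
    "prompt": (
        "Policy: refunds are allowed only within 30 days of purchase.\n"
        "Situation: a customer requests a refund on day 35; the delay was "
        "caused by a shipping error on our side.\n"
        "Should the agent grant the refund?"
    ),
    "completion": "Yes, grant the refund.",
}

# Explanation-augmented record: transmits the reasoning behind the exception,
# which is what the paper argues is needed to induce human-aligned judgment.
explanation_example = {
    "prompt": label_only_example["prompt"],
    "completion": (
        "Yes, grant the refund. The 30-day rule exists to limit abuse, not to "
        "penalize customers for delays we caused. Because the lateness stems "
        "from our own shipping error, enforcing the deadline would violate the "
        "policy's intent, so a discretionary exception is appropriate."
    ),
}

if __name__ == "__main__":
    # Either record could be written to a JSONL file for supervised fine-tuning.
    print(json.dumps(label_only_example, indent=2))
    print(json.dumps(explanation_example, indent=2))
```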

This connects the alignment problem to older debates in management theory about how organizations socialize discretionary decision-making across agents who face novel situations.

The transfer-learning result, that human-aligned exception handling generalizes to novel scenarios, is the paper's most significant finding: it suggests that something structurally analogous to institutional judgment can be induced in a model rather than merely encoded as a set of rules.

For product directors deploying agentic AI in operational contexts, this reframes alignment from a safety-engineering problem into an organizational design problem: what gets trained, how it gets trained, and who provides the explanations are all institutional choices with downstream consequences.