
Teaching AI to Handle Exceptions: Supervised Fine-Tuning with Human-Aligned Judgment

M. DiSorbo, Harang Ju & Sinan Aral
2025

Source: https://www.semanticscholar.org/paper/ab4717b417b3e48abb7ecaaa4caab19a2d5cab01

The paper's most productive contribution is not technical but conceptual: it demonstrates that LLMs default to rigid policy adherence even when context demands discretionary judgment, which maps directly onto the classic organizational problem of rules versus judgment in incomplete-contract environments.

The finding that fine-tuning requires human explanations, not just labels, parallels tacit-knowledge arguments in organization theory: judgment cannot be transferred by transmitting outcomes alone; the reasoning behind those outcomes must be transmitted as well.
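
To make the labels-versus-explanations distinction concrete, here is a minimal, hypothetical sketch of two supervised fine-tuning records for the same exception scenario. The scenario, field names, and wording are illustrative assumptions, not the paper's actual data format or prompts.

```python
# A hypothetical illustration (not the paper's actual data format) of the
# difference between label-only and explanation-augmented fine-tuning records
# for an exception-handling scenario. The scenario and field names are invented.

import json

# Label-only record: transmits the outcome, not the reasoning behind it.
label_only_example = {
    "prompt": (
        "Policy: refunds are allowed only within 30 days of purchase.\n"
        "Situation: a customer requests a refund on day 35; the delay was "
        "caused by a shipping error on our side.\n"
        "Should the agent grant the refund?"
    ),
    "completion": "Yes, grant the refund.",
}

# Explanation-augmented record: transmits the reasoning behind the exception,
# which is what the paper argues is needed to induce human-aligned judgment.
explanation_example = {
    "prompt": label_only_example["prompt"],
    "completion": (
        "Yes, grant the refund. The 30-day rule exists to limit abuse, not to "
        "penalize customers for delays we caused. Because the lateness stems "
        "from our own shipping error, enforcing the deadline would violate the "
        "policy's intent, so a discretionary exception is appropriate."
    ),
}

if __name__ == "__main__":
    # Either record could be written to a JSONL file for supervised fine-tuning.
    print(json.dumps(label_only_example, indent=2))
    print(json.dumps(explanation_example, indent=2))
```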

This connects the alignment problem to older debates in management theory about how organizations socialize discretionary decision-making across agents who face novel situations.

The transfer-learning result, that human-aligned exception handling generalizes to novel scenarios, is the paper's most significant finding: it suggests that something structurally analogous to institutional judgment can be induced in a model rather than merely encoded as a set of rules.

For product directors deploying agentic AI in operational contexts, this reframes alignment from a safety-engineering problem into an organizational design problem: what gets trained, how it gets trained, and who provides the explanations are all institutional choices with downstream consequences.