Beyond Gold Standards: Sins of Omission and Sins of Commission
The Perils of Precedent in Scientific Inquiry
I often say that there are no gold standards in biomedicine—or science for that matter. What we call "gold standards" are methodologies that have achieved institutional consensus through various pathways: sometimes through rigorous empirical validation (randomized controlled trials), occasionally through theoretical breakthroughs (Koch's postulates for infectious disease), but often through historical happy accidents that became entrenched practice. Fleming's serendipitous discovery of penicillin became the template for antibiotic screening; Röntgen's accidental observation of X-rays established radiography as diagnostic orthodoxy. These represent not immutable scientific truths, but contingent scientific convention.
Yet somehow we pay little attention to this fundamental insight. The term "gold standard" itself reveals our cognitive bias toward permanence in a field defined by evolution.
True progress in science has historically occurred precisely when established orthodoxies were challenged and replaced.
Semmelweis overturned the medical establishment's rejection of hand hygiene, despite facing professional ostracism. Wegener's continental drift theory was ridiculed for decades before plate tectonics vindicated his insights. McClintock's genetic transposition discoveries were dismissed as impossible until molecular biology confirmed her prescience. Marshall and Warren faced fierce resistance when they challenged the peptic ulcer orthodoxy with their bacterial hypothesis—until they won the Nobel Prize.
The pattern is consistent: scientific revolutions emerge not from incremental improvements to accepted methods, but from fundamental challenges to accepted wisdom. Gold doesn't tarnish, but many established gold standards should. When methodologies become untouchable—when questioning them is too demanding or feels like heresy—we may end up mistaking institutional inertia for scientific rigor.
Somewhere in the past few decades, we appear to have misplaced the institutional compass that once pointed us toward fundamental questioning. Perhaps it's the scale of modern research enterprises, the complexity of regulatory frameworks, or the risk-averse nature of contemporary funding mechanisms. Whatever the cause, we seem to have developed "orthodoxy paralysis"—an inability to challenge foundational assumptions even when evidence suggests they may be flawed.
A case study in scientific inertia
For years, oncology drug development has operated under a deceptively simple premise: animal models can reliably predict human toxicity. This assumption has become more than methodology—it's become dogma, underlying regulatory frameworks, guiding billions in research expenditure, and determining which therapeutic candidates advance to human trials.
In 2020, we conducted an analysis of 108 investigational oncology drugs, revealing the uncomfortable mathematics of this supposed gold standard. Animal models demonstrated positive predictive values of approximately 65% and negative predictive values approaching 50%—performance that would be considered mediocre in weather forecasting, yet somehow acceptable for first-in-human clinical trials. These findings aren't outliers. Others have reported similar limitations in toxicity prediction, especially with modern therapies such as antibody–drug conjugates and tyrosine kinase inhibitors, suggesting systematic predictive inadequacies in preclinical models. We have, one may conclude, built an entire regulatory edifice on a foundation that is almost as reliable as coin-flipping.
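To make the arithmetic concrete, here is a minimal sketch of how predictive values fall out of a 2×2 confusion matrix. The counts below are purely hypothetical, chosen only so that they sum to 108 and roughly reproduce the reported figures; they are not the study's actual data.

```python
def predictive_values(tp, fp, fn, tn):
    """Compute positive and negative predictive value from a 2x2 confusion matrix."""
    ppv = tp / (tp + fp)  # of drugs flagged toxic in animals, fraction truly toxic in humans
    npv = tn / (tn + fn)  # of drugs cleared by animals, fraction truly safe in humans
    return ppv, npv

# Hypothetical counts (illustrative only): 40 true positives, 22 false positives,
# 23 false negatives, 23 true negatives -- 108 drugs in total.
ppv, npv = predictive_values(tp=40, fp=22, fn=23, tn=23)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")  # PPV = 65%, NPV = 50%
```

An NPV of 50% means that when the animal model says "safe," a coin toss would have been equally informative—which is the crux of the argument above.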
This case study reveals a deeper pathology in the conduct, regulation, and funding of new ideas: our persistent reliance on demonstrably inadequate methods simply because they’ve acquired institutional legitimacy. At times, slow progress in basic science and technology is both natural and acceptable—biology is intricate, and caution has its place. But today, with the advent of powerful computational tools, multi-omic profiling, and high-resolution human data streams, such inertia may no longer be benign. This is not merely a matter of moving slowly (we don’t want to move fast and break things); it is a failure to create the conditions for transformation. The tools are here but the resistance seems to be systemic.
Worryingly, this attachment to orthodoxy is now seeping into the development of AI models, where we are training algorithms on “gold standards” that reflect the very limitations we should be trying to overcome. By anchoring labeling and validation to outdated benchmarks, we risk transferring entrenched human biases and methodological fragilities directly into our machines—automating yesterday’s blind spots at scale.
Instead of liberating us from legacy thinking, AI may become its most efficient enforcer.
The architecture of scientific orthodoxy
Why does the biomedical enterprise maintain devotion to demonstrably inadequate models? The answer reveals the complex ecosystem that sustains scientific orthodoxy even in the face of contradictory evidence—an ecosystem that extends far beyond any single field or methodology.
For our case study, regulatory frameworks, established decades ago when animal models represented the best available technology, create what economists recognize as path dependency—a system where historically rational choices constrain future options even when superior alternatives emerge. To be fair, these frameworks initially reflected genuine scientific progress; animal models did represent advances over purely empirical human experimentation. But what begins as rational adaptation can calcify into irrational persistence.
Pharmaceutical companies, operating under intense financial pressure and regulatory uncertainty, rationally choose familiar pathways over potentially superior but unvalidated alternatives—a decision that makes economic sense even when it may not represent optimal science. Academic institutions perpetuate these patterns through training programs that treat current practice as timeless truth rather than historical contingency.
This creates what we might call the "orthodoxy trap"—a state where stakeholders acknowledge limitations privately while maintaining public adherence to established practice. It's worth noting that this isn't necessarily malicious or even irrational; institutions often develop conservative practices precisely because they've learned from past failures. The thalidomide tragedy of the 1960s, for instance, reinforced regulatory emphasis on extensive animal testing as a safeguard against human harm. However, when safeguards become obstacles to better safeguards, conservatism may paradoxically increase rather than decrease risk.
Of course this institutional dynamic transcends preclinical toxicology. We see similar patterns in diagnostic imaging (where "clinical correlation" often masks inadequate predictive value), in psychiatric classification systems (where statistical validity substitutes for biological understanding), and in countless other domains where institutional momentum has replaced scientific scrutiny.
Transformation in plain sight
Beneath this surface of methodological stagnation, superior alternatives are emerging across multiple domains—including our toxicology case study. Advanced technologies are not merely improving upon traditional approaches; they're redefining what predictive science can achieve.
Organ-on-Chip (OoC) platforms represent a shift from cross-species extrapolation to direct simulation of human physiology. Using microfluidics to mimic organ-level functions, OoC systems have revealed toxicities—like sorafenib-induced cardiotoxicity—that animal models failed to detect. Their real-time analytics and high-throughput capabilities can make them a more accurate and cost-effective tool for drug development. Three-dimensional organoid systems also preserve the architectural complexity and genetic diversity that animal models cannot replicate, demonstrating qualitative leaps in predictive accuracy.
These advances in toxicology mirror broader patterns across science where human-relevant models, computational approaches, and systems-level thinking are displacing reductionist methodologies that once seemed indispensable. The tools for scientific revolution exist; what's missing is the institutional will to use them.
The immune response to innovation and Dante’s Inferno
The transition from traditional to advanced methodologies encounters formidable institutional resistance that operates much like an immune system—reflexively rejecting foreign approaches regardless of their potential benefits.
Regulatory agencies, bound by decades of precedent and legal liability, require extensive validation before accepting novel methodologies. The pharmaceutical industry gravitates toward established pathways despite their demonstrated limitations. Academic culture reinforces these tendencies through publication practices, funding mechanisms, and professional advancement criteria that tend to reward conformity.
This institutional immune response creates a profound irony: the systems designed to ensure scientific rigor may be systematically preventing the adoption of more rigorous science. Some argue that we have created a culture so focused on avoiding Type I errors (false positives) that we've risked becoming incapable of recognizing Type II errors (false negatives—rejecting true advances). The very mechanisms meant to protect scientific integrity may be undermining it.
As my colleague Dr. Richard Pazdur once remarked during my time at the FDA, sins of omission (Type II errors) belong to a higher circle of Inferno than sins of commission (Type I errors). I found the metaphor so apt that I began using it regularly in presentations while at the Agency.
Attachment to orthodoxy is a fundamental departure from science's revolutionary tradition. Where previous generations of researchers aspired to overturn established paradigms—from miasma theory to phlogiston, from geocentric astronomy to Newtonian absolutism overturned by Einstein's theory of relativity—contemporary science, particularly biomedicine, seems increasingly unable (or unwilling) to challenge its own foundational assumptions.
The human cost of methodological conservatism is not theoretical—it is measured in delayed therapies and preventable adverse events—but of course so too would be the cost of reckless methodological innovation. This tension between caution and progress defines the central challenge of contemporary science.
Toward scientific courage
Moving beyond current limitations begins with a necessary reckoning: many of our so-called gold standards are, in truth, fool’s gold—gleaming with institutional legitimacy yet poorly suited to their intended purpose. This is particularly evident in the language that scaffolds modern science and medicine. Terms like “non-small cell lung cancer” define disease by negation, offering no insight into biology. “Triple-negative breast cancer” tells us what’s missing, not what’s present. In psychiatry, diagnoses like “borderline personality disorder” and “schizophrenia” persist despite decades of mechanistic ambiguity and overlapping spectra. Even in cardiology, phrases like “heart failure with preserved ejection fraction” reflect conceptual compromise rather than physiological clarity.
Let’s not even get started on “idiopathic” conditions—the medical equivalent of shrugging. “Idiopathic pulmonary fibrosis,” “idiopathic intracranial hypertension,” “idiopathic hypersomnia”—these labels don’t advance understanding; they enshrine uncertainty. They mark the boundary not of disease, but of imagination.
Linguistic relics and semantic oddities encode outdated frameworks into the very tools we use to understand the world. When these terms become the labels for training AI models, structuring clinical trials, or setting regulatory standards, their limitations become systemic. We often talk about bias in the data but not so much about bias in the dictionary, often ignoring the fact that it is our taxonomies that can fundamentally shape the boundaries of scientific inquiry.
Precision begins not at the endpoint of discovery, but in the very language we choose to describe it.
Science progresses not through comfortable consensus, but through uncomfortable questioning—a tradition behind nearly all breakthroughs. The transition requires what we can call "scientific courage"—the willingness to abandon familiar approaches in favor of evidence-based alternatives, even when those alternatives challenge established institutional interests. This is not recklessness; it's the essence of the scientific method applied to the scientific method itself.
Historical precedent suggests that such paradigm shifts, while initially disruptive, ultimately advance both scientific understanding and human welfare. The shift from bloodletting to evidence-based medicine, from astronomical epicycles to heliocentric models, from fixed genetic inheritance to dynamic genomic regulation—each transition faced fierce institutional resistance before becoming the new orthodoxy.
The test of our time
We stand at a critical juncture where the accumulation of evidence against traditional approaches has reached a tipping point—not just in toxicology, but across multiple scientific domains. The question is no longer whether alternative methodologies are superior in many cases; emerging data, such as our toxicology case study, which the FDA has recently begun to address, increasingly demonstrate they are. The question is whether the scientific enterprise possesses the institutional flexibility to embrace necessary change.
Perhaps it's time to abandon the concept of gold standards entirely. Instead of seeking permanence in fields defined by discovery, we can consider methodologies valued not for their historical or institutional pedigree, but for their demonstrated performance and adaptive potential.
In this redefinition lies not merely the future of any single scientific domain, but a broader lesson about scientific progress itself. True scientific rigor requires not just methodological precision, but methodological humility—the recognition that today's gold standard may be tomorrow's fool's gold.
The evidence is clear. New tools exist. OoC systems, humanized models, and AI-driven platforms can offer mechanistic fidelity and predictive power far beyond what many legacy methods can deliver. We are no longer obliged to divine the future of human biology through the blurred silhouette of outdated heuristics. It is, in effect, like peering at a scan with the unaided eye and declaring, “clinical correlation advised”—a phrase that has become less a call for nuance than a polite way to concede uncertainty. But today, we are equipped to do better. We stand at a moment where we can replace educated guesswork with engineered insight and tradition with optimism. The patients, the public, and the future of science itself deserve nothing less than our full commitment to the most effective tools available—regardless of their institutional pedigree. This is not the abandonment of rigor, but a long-overdue refinement. After all, progress rarely announces itself in the familiar and comfortable language of precedent.