The Misnomer of Surrogate Endpoints in Oncology
An Examination of a Fundamental Flaw in Regulatory Language and Scientific Precision
I vividly recall my first presentation to FDA leadership during my tenure at the agency. The case was an accelerated approval application for a targeted lung cancer therapy, its primary efficacy endpoint the Objective Response Rate (ORR). Standing before senior leadership, I confidently described ORR as “a surrogate endpoint,” with the qualifier that the Code of Federal Regulations (21 CFR 314.510) requires: “reasonably likely to predict clinical benefit.”
The room went quiet until Dr. Bob Temple, then Deputy Center Director for Clinical Science, cut through with a question that would fundamentally reshape my understanding of regulatory precision:
“Who has validated ORR as a surrogate? The sponsor? The FDA? Your team at OHOP?”
The question was simple, yet unanswerable. It exposed how linguistic shortcuts had crept into regulatory practice, creating the illusion of certainty where only provisional evidence existed. Temple’s intervention revealed a larger truth: regulators, sponsors, and the research community had all become complicit in conflating convenience with scientific rigor. What began as a moment of personal reflection later crystallized into recognition of a systemic flaw: ORR, like most other endpoints used in accelerated approval, has never been formally validated as a true surrogate.
The conceptual foundation of surrogacy
The scientific definition of surrogate endpoints comes from Ross Prentice’s 1989 framework, which articulated rigorous statistical criteria. A true surrogate must not only correlate with the clinical outcome but fully capture the treatment effect such that, once the surrogate is accounted for, no residual treatment effect remains. Formally, the hazard ratio for treatment effect on overall survival, conditional on the surrogate, must equal unity.
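In a proportional hazards setting, the full-capture condition described above is often written along the following lines. This is a sketch under assumed notation: the symbols Z, S, T, beta, and gamma are introduced here for illustration and do not come from the original text. It treats the surrogate as a fixed covariate for simplicity. Prentice's framework additionally requires that treatment affect both the surrogate and the clinical outcome.

```latex
% Assumed notation (illustrative only):
%   Z = treatment indicator, S = surrogate endpoint,
%   T = time to the clinical event (e.g., death), \lambda_0 = baseline hazard.
\begin{align*}
\lambda(t \mid Z, S) &= \lambda_0(t)\,\exp(\beta Z + \gamma S)
  && \text{(hazard of the clinical event)} \\
\beta = 0 \;&\Longleftrightarrow\; \mathrm{HR}_{Z \mid S} = e^{\beta} = 1
  && \text{(no residual treatment effect given } S\text{)} \\
\gamma &\neq 0
  && \text{(the surrogate is prognostic for the clinical outcome)}
\end{align*}
```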
This is a stringent requirement, one that few oncology endpoints have ever met. Validation demands large-scale, patient-level meta-analyses across independent trials, demonstrating robust trial-level correlations between treatment effects on the surrogate and treatment effects on overall survival. Commonly cited thresholds call for a trial-level coefficient of determination (R²) above roughly 0.7 to 0.8. In other words, validation requires both statistical power and biological plausibility at a level that oncology has rarely delivered.
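The trial-level correlation described above can be illustrated with a small weighted regression of per-trial treatment effects on overall survival against per-trial treatment effects on the candidate surrogate. This is a minimal sketch, not a validated method: the function name `trial_level_r2`, the choice of sample-size weights, and all numbers in the demo are hypothetical and invented for illustration. A real analysis would also account for estimation error in each trial's effect estimates, which this simple version ignores.

```python
import numpy as np


def trial_level_r2(effect_surrogate, effect_os, weights=None):
    """Weighted least-squares fit of OS effects on surrogate effects.

    Each input holds one value per trial, e.g. a log hazard ratio.
    Returns (slope, intercept, r2), where r2 is the weighted
    coefficient of determination across trials.
    """
    x = np.asarray(effect_surrogate, dtype=float)
    y = np.asarray(effect_os, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size < 3:
        raise ValueError("need two equal-length 1-D inputs with >= 3 trials")
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    if w.shape != x.shape or np.any(w <= 0):
        raise ValueError("weights must be positive, one per trial")

    # Weighted means and weighted sums of squares / cross-products.
    x_mean = np.average(x, weights=w)
    y_mean = np.average(y, weights=w)
    sxx = np.sum(w * (x - x_mean) ** 2)
    syy = np.sum(w * (y - y_mean) ** 2)
    sxy = np.sum(w * (x - x_mean) * (y - y_mean))
    if sxx == 0 or syy == 0:
        raise ValueError("effects must vary across trials")

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    r2 = sxy ** 2 / (sxx * syy)  # squared weighted correlation
    return slope, intercept, r2


if __name__ == "__main__":
    # Hypothetical data: log hazard ratios from six imaginary trials.
    log_hr_pfs = [-0.69, -0.51, -0.36, -0.22, -0.11, 0.05]
    log_hr_os = [-0.30, -0.29, -0.10, -0.16, 0.02, -0.03]
    n_patients = [420, 610, 350, 800, 270, 500]  # used as weights

    slope, intercept, r2 = trial_level_r2(log_hr_pfs, log_hr_os, n_patients)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.2f}")
    print("meets the 0.7 threshold" if r2 >= 0.7 else "below the 0.7 threshold")
```

Weighting by trial size is one simple design choice among several. Inverse-variance weights or a joint model of both endpoints are alternatives that a formal validation exercise would likely prefer.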
The regulatory framework's fundamental flaw
The regulatory framework for surrogate endpoints changed significantly with the FDA's accelerated approval regulations, codified in 1992 under 21 CFR Part 314, Subpart H for drugs, with parallel provisions for biologics under 21 CFR Part 601, Subpart E. These regulations introduced the phrase "surrogate endpoint reasonably likely to predict clinical benefit." That language contains a fundamental scientific contradiction, one that has contributed to decades of confusion throughout the oncology ecosystem.
The phrase "surrogate endpoint reasonably likely to predict clinical benefit" is a scientific oxymoron. By definition, a true surrogate endpoint has been rigorously validated to predict clinical benefit with high statistical confidence—it doesn't require the hedge of "reasonably likely." Conversely, if an endpoint is only "reasonably likely" to predict benefit, it hasn't achieved the validation threshold necessary to be called a surrogate. The CFR attempts to have it both ways: using the authoritative scientific term "surrogate" while simultaneously hedging with qualifiers that undermine that very designation.
This regulatory language reflects convenience rather than scientific precision. A more honest approach would have established terminology such as "intermediate endpoint reasonably likely to predict clinical benefit" or created an entirely new regulatory category, rather than co-opting and diluting the established scientific definition of surrogacy. The CFR's flawed foundation has enabled decades of imprecise usage throughout the ecosystem, as regulatory reviewers, sponsors, and researchers have all struggled to reconcile scientifically contradictory regulatory language with clinical reality.
Moreover, the linguistic drift within the regulatory community has compounded this fundamental problem. The CFR's complete phrase "surrogate endpoint reasonably likely to predict clinical benefit" has been colloquially shortened to simply "surrogate endpoint," dropping the crucial qualifier that acknowledges uncertainty. This semantic evolution has created false impressions of validation where even the flawed regulatory language itself acknowledges uncertainty through careful qualification.
Examining current endpoints: progression-free survival
Progression-Free Survival (PFS) exemplifies the challenges and opportunities of validation. PFS correlates variably with overall survival depending on disease setting, mechanism of action, and treatment sequence, with correlation coefficients ranging from below 0.3 to above 0.8. Rather than being treated as a flaw, this variability can be leveraged to develop context-specific validation frameworks.
The biological rationale is clear: delaying progression may control symptoms, maintain performance status, and preserve options. But immunotherapy revealed complexities (pseudoprogression, delayed effects, and long post-progression survival) that complicate the relationship between PFS and survival. These phenomena call for more nuanced models, not abandonment of the endpoint. With enough pooled data, PFS could be validated as a surrogate, with adequate certainty, in specific contexts.
Objective response rate (ORR): an opportunity for definitive validation
ORR, measuring tumor shrinkage, has intuitive appeal and strong mechanistic rationale. Yet rigorous validation has never been completed. This is not a failure of science, but a failure of collective action. Large-scale, pooled analyses could explore not only response presence but depth, duration, and timing, each potentially critical to long-term outcomes.
Modern modeling approaches—accounting for temporal dynamics and heterogeneity of response—could transform ORR into a validated surrogate in varying contexts. Moreover, advances in imaging and response categorization could refine its utility, particularly in an era of targeted, combination, and immune-based therapies where mechanisms of tumor reduction can be better understood.
Pathological complete response: a validation success story
One of the strongest precedents for success is Pathological Complete Response (pCR) in neoadjuvant breast cancer. Here, a small group of motivated FDA reviewers and external researchers collaborated to compile meta-analytic evidence across trials, showing robust correlation with event-free and overall survival, particularly in triple-negative and HER2-positive disease.
This was not universal. In hormone receptor–positive cancers, correlations were weaker. But the subtype-specific validation was itself a scientific advance: a demonstration that surrogate validation can be precise, context-dependent, and evidence-based. It showed that with biological rationale and sufficient pooled data, definitive surrogacy is achievable.
Statistical opportunities in validation
Today’s methodological toolkit for validation is rich: mixed-effects meta-analyses, causal inference frameworks, copula models, and time-varying analyses. The bottleneck is not technique but data access. Initiatives like Project Data Sphere and the CEO Roundtable on Cancer demonstrate that patient-level pooling across sponsors is feasible.
Temporal dynamics add further complexity: treatment effects may manifest differently across time horizons. Adaptive models that capture these dynamics can offer more accurate surrogacy assessments than traditional landmark analyses. The challenge is to move from isolated studies to harmonized frameworks with sufficient scale to generate conclusive evidence.
Regulatory implications
The changing vocabulary of oncology endpoints offers us a chance to establish greater scientific precision and regulatory predictability. When regulators, sponsors, advisory committees, and investigators use language that reflects the true strength of evidence, they reinforce a shared foundation for informed decision-making. Attention to linguistic precision is not cosmetic; it catalyzes the deeper task of achieving rigorous validation.
Current patterns in the literature show how inconsistent terminology can blur distinctions between exploratory signals, intermediate measures, and validated surrogates. Greater clarity in how endpoints are described can directly improve study design, endpoint selection, and the interpretation of trial outcomes. The move toward more accurate terminology is not merely academic—it underpins more efficient drug development and ultimately more reliable patient care.
This effort must extend beyond individual manuscripts or regulatory filings. It is an ecosystem-level task, one that positions the research community to lead in defining standards with benefits for all stakeholders. Clear and consistent communication about the validation status of endpoints lays the groundwork for collaborative initiatives that can deliver the rigorous evidentiary base needed for regulatory confidence and timely patient access to innovation.
The path forward
What lies ahead is not merely the technical task of validating a few endpoints but the larger project of rebuilding coherence in how oncology evaluates evidence. For decades, we have operated in a state of epistemic ambiguity, where regulatory language blurs the line between exploratory measures and validated surrogates. The result has been a drug development ecosystem that tolerates uncertainty not because it is unavoidable, but because it has been routinized.
Breaking out of this pattern requires treating surrogate validation as an enterprise in its own right—a discipline at the intersection of statistics, biology, and regulatory science. That discipline must have its own infrastructure: shared data environments capable of integrating patient-level information across sponsors and disease areas; methodological standards that define how correlations are tested, replicated, and reported; and governance structures that ensure transparency while protecting legitimate commercial interests. The tools already exist in fragments, but without an organizing framework they cannot achieve the critical mass required to generate decisive evidence.
Just as important as the mechanics is the culture of precision that underpins them. The misuse of terms like “surrogate” is not a semantic quibble but a reflection of deeper habits of convenience that have distorted how progress is measured. A more disciplined approach would insist that every designation of an endpoint carries a clear statement of evidentiary strength, with intermediate measures described as such until validation is earned. This would make uncertainty explicit rather than hidden, shifting the regulatory conversation from one of hedged terminology to one of transparent thresholds.
The path forward, then, is not a single validation exercise but the establishment of a durable architecture for how endpoints are defined, tested, and communicated. Such an architecture would allow oncology to evolve beyond case-by-case improvisation toward a system where new endpoints—whether radiographic, molecular, or composite—can be evaluated within a coherent and predictable framework. In that sense, surrogate validation is less the end of a process than the beginning of a more stable scientific order.
From question to leadership
Dr. Temple’s challenge—“Who has validated ORR as a surrogate?”—was never only about ORR. It was about responsibility: who in the ecosystem owns the task of turning provisional measures into validated surrogates? For too long, the answer has been no one. Sponsors pursue approvals, regulators evaluate applications, academics publish correlation studies, but the connective tissue between these efforts remains weak.
Leadership in this space will not come from waiting for consensus to emerge organically. It requires institutions willing to claim the problem as their mandate. That could mean regulatory bodies convening structured programs for endpoint validation, with dedicated funding and methodological oversight. It could mean cross-sponsor consortia agreeing to precompetitive data pooling as a condition of accelerated approval, thereby creating the evidence base for later validation. It could also mean academic centers reframing their role: not only critiquing endpoints retrospectively, but proactively designing validation studies with trialists and regulators.
The opportunity is not just to answer the narrow question of whether PFS or ORR can serve as surrogates in given settings. It is to establish oncology as a model for how a therapeutic field can discipline itself around evidentiary standards without stifling innovation. If achieved, such leadership would not only stabilize regulatory decision-making but also demonstrate that precompetitive scientific collaboration can solve structural problems that no individual sponsor or agency can resolve alone.
Seen this way, surrogate validation is a test of institutional maturity. It asks whether the oncology ecosystem can elevate its practices from convenient, yet uncertain, use of intermediate measures to the rigorous construction of validated surrogates that command universal confidence. Temple’s question endures because it was never fully answered. But it can be transformed into a generative challenge: not who has validated these endpoints, but who will build the structures that make such validation possible. The answer, if we choose to supply it, will mark the difference between an era defined by linguistic compromise and one defined by scientific clarity.