The Three Waves of AI in Biomedicine: From Promise to Proof
How the next inflection point will make adoption irreversible
I firmly believe that artificial intelligence (AI) will fundamentally reshape biomedicine. However, it was clear to many of us that such transformation would not occur on the compressed timetable initially promised by what I call the first wave of AI innovation. During this first wave (approximately 2015–2022), companies capitalized on a surge of enthusiasm, presenting a future dominated by software-driven platforms while the operating reality was deeply rooted in services-dependent models. The bulk of revenue came from capital-intensive efforts, data extraction disguised as diagnostics, manual data curation, and human-driven operations. Investor presentations nevertheless depicted an imminent, high-margin “AI platform” revolution. In truth, prospective clinical validation of these tools was limited, randomized trials were essentially absent, and integrating algorithms into real-world clinical workflows proved far more challenging than optimistic projections suggested.
Since 2023, a second, more pragmatic wave has emerged. Confronted by market realities, founders and investors increasingly acknowledge that evidence generation is essential—though many still stop short of committing to true RCT-like validation and struggle to shed the mindset that they are building software companies rather than regulated medical tools. That identity problem slows the hard work of embedding AI into existing (or clinically meaningful new) workflows.
I anticipate lessons from this second wave will set the stage for a third wave in roughly five years—a true inflection point. In that era, AI systems will be evaluated with the same rigor as any drug, device, or diagnostic assay; integrated into care as a matter of course; and assessed on patient outcomes and cost-effectiveness rather than narrative multiples. It will also be a period of consolidation, absorbing the remnants of failed first- and second-wave firms. Leadership will shift to “bilingual” founders and investors—fluent both in AI’s technical realities and in clinical evidence generation and R&D—who accept that success often entails several years of pre-revenue development followed by prospective, and in most cases randomized, trials.
Wave one (2015–2022): how services masqueraded as platforms
The first wave of AI-in-biomedicine companies was born into one of the most favorable capital markets in recent memory. Venture capital was abundant, crossover investors were active earlier in the funding cycle, and the IPO window was wide open. Ultra-low interest rates and a glut of global liquidity drove investors toward long-duration growth stories, rewarding the promise of future scale over proof of present-day economics. In this environment, anything that could plausibly be framed as a scalable “platform”—especially one powered by AI—commanded attention and premium valuations.
The narrative arc was consistent: assemble proprietary datasets; train models that match or outperform human specialists on carefully curated benchmark tasks; package these models into products spanning decision support, drug discovery tools, real-world evidence generation, and “pharma analytics”; and reinvest the revenue into the data “flywheel” to deepen the moat. Early funding rounds embraced this vision, and for the lucky few that approached the public markets, the story became even more polished.
Underneath, however, the economics told a different story. The vast majority of revenue came not from scalable software, but from labor- and capital-intensive services: wet-lab sequencing and extraction, whole-slide imaging and storage, data science talent-as-service, and manual clinical abstraction, harmonization, and curation across fragmented, idiosyncratic EHR systems. The AI “platform” sat atop this operational scaffolding, producing a smaller, aspirational revenue stream that was easy to narrate but difficult to scale at the speed the market had already priced in.
The macro climate made this misalignment easy to overlook. Between 2015 and 2021, crossover funds and later-stage investors underwrote “optionality”—the belief that today’s services would inevitably evolve into tomorrow’s high-margin software engine—without demanding rigorous prospective evidence. Valuation step-ups between private rounds were treated as validation in their own right. For companies with annual revenue growth and blended gross margins inching up as utilization increased, it was tempting to believe the mix would soon invert: services would feed data into a dominant, high-margin software layer. But medicine does not scale like consumer technology. It demands payment in the form of clinical evidence, regulatory approval (not just 510(k) clearance), and workflow integration—a toll the first wave had not yet paid.
Evidence that wasn’t there: benchmarks, drift, and failure to land in practice
Beneath the momentum of wave one was a conspicuous absence of prospective, real-world proof. The literature from the period reads like a catalogue of technical promise followed by translational disappointment. Models routinely posted high AUCs on internal test sets, only to see performance degrade when deployed elsewhere—a direct consequence of dataset shift, inadequate training practices, and the reality that scanners, staining protocols, patient populations, and care patterns differ across institutions. Large models—often misbranded as “foundation models”—broadened priors through large-scale self-supervised pretraining followed by task-specific fine-tuning, which improved robustness on paper. Yet robustness is not clinical utility, and in medicine, utility must be demonstrated prospectively, in patient outcomes, across sites, and over time.
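To make that failure mode concrete, here is a deliberately toy sketch in Python (synthetic data, scikit-learn). The cohort generator and its “scanner artifact” shortcut are invented for illustration; real dataset shift is messier, but the mechanism is the same: a model leans on a site-specific signal that does not travel, so the internal AUC looks publishable while external performance sags.

```python
# Toy illustration of dataset shift: a model learns a site-specific shortcut
# ("scanner artifact") that inflates internal AUC but does not travel.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_cohort(n, artifact_leak):
    """Four 'biological' features plus one site artifact; `artifact_leak`
    controls how strongly the artifact tracks the label at that site."""
    X_bio = rng.normal(size=(n, 4))
    logits = X_bio @ np.array([1.0, -0.8, 0.5, 0.3]) + rng.normal(0, 2.0, size=n)
    y = (logits > 0).astype(int)
    artifact = artifact_leak * y + rng.normal(0, 0.5, size=n)
    return np.column_stack([X_bio, artifact]), y

# Development site: the artifact correlates with the label (e.g., sicker
# patients routed to one scanner). External site: it does not.
X_train, y_train = make_cohort(5000, artifact_leak=0.9)
X_internal, y_internal = make_cohort(2000, artifact_leak=0.9)
X_external, y_external = make_cohort(2000, artifact_leak=0.0)

model = LogisticRegression().fit(X_train, y_train)
auc = lambda X, y: roc_auc_score(y, model.predict_proba(X)[:, 1])
print(f"internal AUC: {auc(X_internal, y_internal):.2f}")  # looks publishable
print(f"external AUC: {auc(X_external, y_external):.2f}")  # sags on deployment
```

Multisite external validation catches exactly this pattern; single-site benchmarks, by construction, cannot.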
Even when retrospective validation held, moving into practice revealed new fragilities. Early experiments with large language models showed promise in extracting key facts from free-text notes and aligning recommendations with guidelines, but they also produced confidently wrong answers often enough to be unsafe without guardrails. Retrieval-augmented designs that grounded outputs in curated, authoritative sources offered partial mitigation, yet regulatory-grade, prospective evidence in real patients remained rare.
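The grounding pattern itself is simple to sketch. Below is a minimal, illustrative version in Python: retrieve the most relevant passages from a curated corpus, then constrain the model to answer only from them. I use TF-IDF retrieval to keep the example self-contained (production systems typically use dense embeddings), the guideline snippets are generic placeholders, and `call_llm` stands in for whichever model endpoint a product actually uses.

```python
# Minimal retrieval-augmented sketch: constrain an LLM to answer only from
# curated sources. `call_llm` is a placeholder for any model endpoint.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

GUIDELINES = [  # stand-ins for a vetted, versioned knowledge base
    "Guideline A: dose adjustment is required below a stated renal threshold.",
    "Guideline B: screening for average-risk adults begins at a stated age.",
    "Guideline C: anticoagulation is recommended above a stated risk score.",
]

vectorizer = TfidfVectorizer().fit(GUIDELINES)
corpus_vectors = vectorizer.transform(GUIDELINES)

def retrieve(question, k=2):
    """Return the k passages most similar to the question (TF-IDF cosine)."""
    scores = cosine_similarity(vectorizer.transform([question]), corpus_vectors)[0]
    return [GUIDELINES[i] for i in scores.argsort()[::-1][:k]]

def grounded_prompt(question):
    """The guardrail: the model may use ONLY the retrieved text, and must
    say so when the sources do not contain the answer."""
    sources = "\n".join(f"- {p}" for p in retrieve(question))
    return ("Answer using ONLY the sources below; if they do not contain the "
            f"answer, say so.\n\nSources:\n{sources}\n\nQuestion: {question}")

# answer = call_llm(grounded_prompt("When is dose adjustment required?"))
print(grounded_prompt("When is dose adjustment required?"))
```

Grounding narrows the space of confidently wrong answers; it does not, on its own, constitute prospective clinical evidence.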
Workflow integration was the second missing pillar. Clinical environments are finely tuned attention economies; anything that adds clicks, screens, or alerts functions as a tax. Without direct embedding into the systems where work already happens—EHRs, order sets, tumor board platforms, radiotherapy planners, documentation tools—even accurate models struggled to gain traction. Data pipelines were equally fragile. Multicenter, longitudinal datasets with linked outcomes were scarce, siloed, or encumbered by protracted access processes.
Biases persisted and, in some cases, deepened. Models trained on skewed datasets, without deliberate countermeasures, risked hardwiring disparities in access, diagnosis, or treatment.
These were the structural reasons why so many promising systems stalled after their first papers. Wave one too often equated “works in the lab” with “works in the world,” and the market allowed that misconception to persist far longer than medicine could tolerate.
Financial physics of the narrative multiple
Wave one’s capital structure was more fragile than its surface narrative implied. Services-heavy businesses can post impressive top-line growth when demand—often fueled by FOMO—is high and capacity is expanding, but they scale linearly at best: every new dollar of revenue requires more people, instruments, and lab space. Software economics are different—fixed costs dominate, and marginal cost per additional customer approaches zero. Wave one sold the latter while operating the former.
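The arithmetic behind that contrast is unforgiving. Here is a back-of-the-envelope sketch, with every figure invented for illustration: a services business whose costs scale with each revenue dollar keeps a flat gross margin, while a software business with mostly fixed costs sees margins expand as it grows.

```python
# Back-of-the-envelope contrast (all figures invented for illustration):
# services margins stay flat as revenue grows; software margins expand.

def services_margin(revenue_m, variable_cost_ratio=0.70):
    """Each revenue dollar carries ~70 cents of labor, lab, and instrument
    cost, so gross margin is flat regardless of scale."""
    return 1 - variable_cost_ratio

def software_margin(revenue_m, fixed_cost_m=30.0, marginal_cost_ratio=0.05):
    """A fixed R&D/infrastructure base plus near-zero marginal cost per
    customer, so margin expands once revenue clears the fixed base."""
    return 1 - (fixed_cost_m + marginal_cost_ratio * revenue_m) / revenue_m

for revenue_m in (50, 100, 200, 400):
    print(f"${revenue_m}M revenue | services: {services_margin(revenue_m):.0%}"
          f" | software: {software_margin(revenue_m):.0%}")
```

On these toy numbers, the services margin sits at 30% at every scale while the software margin climbs from 35% toward 88%. Wave one was priced on the second curve while operating on the first.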
The mismatch persisted because the macro environment rewarded it. Valuation step-ups between rounds became self-reinforcing signals—a $500 million Series D valuation suggested a $1 billion IPO was not only plausible but imminent.
This created a form of deferred gravity. With liquidity abundant, markets could overlook the drag of labor intensity, regulatory friction, and long evidence-generation cycles. When macro conditions turned—rates up, risk capital scarce—the gap between narrative and operating truth could no longer be bridged by fresh capital. Multiples compressed, and in some cases, entire business models collapsed.
Winners and losers: who captured value in wave one
The outcome distribution from wave one followed a familiar innovation-market arc: those closest to the liquidity events and least exposed to operational drag captured the most value; those holding risk after the narrative broke bore the losses.
The winners were, above all, the earliest capital providers and strategic counterparties. Venture and growth funds that marked up valuations in the private markets and exited into IPOs or secondary sales realized returns before the mismatch between narrative and economics became apparent. Founders and early employees who sold equity during these liquidity windows crystallized gains, even if the company’s long-term trajectory later faltered. M&A counterparties who sold service-heavy businesses into wave-one platforms at premium multiples—then hedged or rapidly sold their stock—locked in the benefit of narrative-driven pricing. Underwriters and advisory firms earned transaction fees indexed to deal size, not future performance. Convertible lenders and structured credit providers extracted interest, fees, and optionality without tying returns to long-term market adoption.
The losers were those who entered late or held through the turn. Retail investors and generalist mutual funds bought technology multiples on service-economics businesses, discovering only after guidance resets that the AI “platform” was still a small, aspirational line item. Later-stage employees saw equity value evaporate, often followed by layoffs as companies retrenched. Clinical customers and partners—particularly those in long-integration pilots—watched deployments stall or dissolve when cash preservation took priority over expansion.
This bifurcation reflected the underlying duration risk of betting on rapid software leverage in a sector where evidence and regulatory clearance operate on multi-year timelines. Those who understood that mismatch and positioned themselves to exit before it was resolved did well; those who held on in expectation of an imminent inflection faced the other side of the trade.
Wave two (2023–present): maturity, demystification, and the pull of the practical
The market correction after wave one did more than tighten capital—it recalibrated expectations. The mystique of AI in healthcare, once amplified by opaque model architectures and aspirational platform decks, began to fade as large language models (LLMs) matured and entered everyday use. Their broad availability demystified the technology: what once required specialized teams and years of development could now be prototyped with off-the-shelf components. This transparency shifted the core question from “Can AI do it?” to “Is this the best use of AI, and can it survive in the wild?”
From this maturity came a pivot toward the practical. AI in drug discovery, trial optimization, and precision diagnostics retained long-term potential, but these domains remained bound by slow regulatory cycles and the need for high-quality longitudinal data. In contrast, efforts such as ambient clinical documentation—the “scribe in the room”—proved both technically feasible and immediately valuable. In this setting, LLMs could operate within bounded contexts, reduce clinician administrative burden, and generate measurable ROI without the years of evidence-building required for high-stakes diagnostic tools.
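What a “bounded context” means in practice is easy to show. Here is a hedged, minimal sketch: instead of free-form generation, the model is asked to fill a fixed, machine-checkable note schema, and anything malformed is rejected before it reaches the chart. The schema fields are illustrative, and `call_llm` is a stand-in for whatever transcription-plus-LLM backend a given product uses.

```python
# Sketch of a bounded documentation task: the model must fill a fixed schema,
# and its output is validated before it can touch the record.
import json
from dataclasses import dataclass, fields

@dataclass
class VisitNote:  # illustrative fields; real products use richer schemas
    chief_complaint: str
    history_of_present_illness: str
    assessment: str
    plan: str

PROMPT = ("From the visit transcript below, return JSON with exactly these keys: "
          "chief_complaint, history_of_present_illness, assessment, plan. "
          "Use only information stated in the transcript.\n\nTranscript:\n{transcript}")

def parse_note(raw_json):
    """Validate model output against the schema; reject anything malformed
    rather than letting it flow silently into the chart."""
    data = json.loads(raw_json)
    expected = {f.name for f in fields(VisitNote)}
    if set(data) != expected:
        raise ValueError(f"schema mismatch: got {sorted(data)}")
    return VisitNote(**data)

# raw = call_llm(PROMPT.format(transcript=transcript))  # model call goes here
raw = json.dumps({"chief_complaint": "cough",
                  "history_of_present_illness": "three days of dry cough",
                  "assessment": "likely viral URI",
                  "plan": "supportive care; return if symptoms worsen"})
print(parse_note(raw))
```

The guardrail is crude, but it illustrates why documentation cleared the bar first: the output is visible to the clinician, reviewable, and correctable before it carries clinical consequence.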
This pragmatic turn is defining the second wave. Many companies are targeting problems where the evidence bar is attainable within venture timelines and where integration paths already exist. AI-driven coding assistants, prior-authorization automation, patient-triage chatbots, and speech-to-structured-note pipelines have become the core product set for many companies today. Service-heavy scaffolding persists—data annotation teams, domain-specific prompt engineering, workflow integration consultants—but it is increasingly aligned to use cases that can scale without waiting for randomized-trial-grade validation.
Capital strategy has evolved in parallel. Investors are funding smaller, milestone-tied rounds, with milestones anchored not in speculative TAM projections but in more concrete deployment metrics: active user counts, clinician time saved per encounter, reductions in administrative backlog. The companies that have gained the most traction are those blending off-the-shelf LLM capabilities with proprietary data and deep domain tuning—creating defensibility without overpromising on wholesale AI transformation of biomedicine.
For all its commercial logic, the second wave remains a transitional era. The maturity of LLMs has stripped away much of the mystery, but it has not produced the kind of regulatory-grade, prospective clinical evidence needed for AI that aspires to directly influence patient outcomes. High-stakes diagnostic or treatment-support algorithms are still largely untested in randomized or pragmatic trials. The gap between “AI for the back office” and “AI as medicine” remains wide—and crossing it will require the multi-year R&D programs, prospective studies, and validated integration pathways that—if all goes well—will define the third wave.
Wave three (late 2020s–early 2030s): consolidation and technoclinical bilingualism
The third wave will mark the point where AI in biomedicine is developed, validated, and adopted with the same rigor as drugs, devices, and diagnostics. It will build directly on the trust, technical literacy, and operational discipline established during the second wave—when LLM-powered tools became as routine as EHRs and clinicians stopped asking whether AI worked and started asking how it should be regulated, reimbursed, and integrated into care.
Several forces will converge. Consolidation will transfer data assets, model architectures, and integration tooling from failed or stagnant first- and second-wave companies into the hands of teams with capital and clinical expertise. The talent pool will expand to include “bilingual” leaders fluent in AI systems engineering and the evidentiary demands of biomedicine. Regulatory precedent will be well established, with De Novo authorizations and PMA approvals providing templates for study design, labeling, and post-market surveillance. Lower compute and storage costs will free budgets for prospective trials and longitudinal follow-up.
Third-wave companies will be built from inception to sustain multi-year, pre-revenue development. Randomized controlled trials will be part of the product roadmap from day one. Endpoints will extend beyond accuracy to include clinical outcomes and cost-effectiveness. Integration will be deep and invisible: AI embedded natively in tumor board platforms, ICU monitoring suites, PACS viewers, and surgical navigation tools—not as a parallel app, but as part of the clinical fabric.
When this wave crests, “AI in healthcare” will no longer be marketed as AI at all. It will be sold, reimbursed, and regulated as medicine—judged by its ability to improve lives, reduce costs, and expand access. The hype cycles of the past will give way to durable adoption curves, anchored in evidence that meets the same standard as any other intervention touching a patient.
The work to reach that point is not in some distant future. It begins now, in the discipline and focus of the second wave, with the explicit goal of making the third wave inevitable.
Why the third wave is inevitable
The constraints on AI in biomedicine have never been about the math. They have been about will, sequence, and structure. The first wave mistook momentum for validation and branding for integration. The second wave is rebuilding on the slower clock medicine respects: workflow fit and regulatory clarity. The third wave will be the one history remembers, because the question will have shifted from “Does it work?” to “How much, for whom, and at what cost?” At that point, valuations will no longer depend on narrative premiums. Durable value will rest where it always should: in demonstrable, reproducible benefit to patients.
For founders, the playbook begins with structuring the business so it can sustain the multi-year path to clinical validation. This means being prepared for years of pre-revenue R&D followed by prospective evidence generation, often randomized, in the same way we develop drugs and high-risk medical devices. Evidence generation must be treated as a core operational function, not a marketing event, and solutions must fit into clinicians’ existing workflows rather than ask for more of their time.
Backed by the right prospective evidence, a properly validated AI system that can replace an existing gold standard will sell itself.
For investors, it means underwriting that same long-arc development—accepting the timelines and capital intensity that come with transformative medical innovation. That includes funding years of pre-revenue R&D, interrogating segment-level unit economics, modeling reimbursement and policy downside, and valuing what exists today before paying for what might exist tomorrow. It is worth remembering that, to date, no software alone has ever fundamentally transformed the practice of medicine—true impact has always required rigorous evidence, integration into clinical workflows, and alignment with regulatory and reimbursement realities.
For systems and regulators, it means smoothing the path to pragmatic and adaptive trials; funding multicenter studies and datasets with governance and representativeness built in; and treating monitoring and bias mitigation as continuous processes rather than static claims.
I think the third wave’s arrival is not a question of if but when. The underlying technologies are already capable; the capital, regulatory precedent, and clinical expertise are converging. Once proof of patient benefit is established at scale, adoption will be irreversible—as inevitable as the shift from film to digital imaging or from open to minimally invasive surgery.
Our opportunity today is to lead with clarity and vision as AI in biomedicine evolves from promise to permanent infrastructure, ensuring that transformation delivers better, more personalized outcomes for every patient it touches.