AI in biomedicine, 2026 to 2036

Five and ten years inside the proxy machine

May 18, 2026

0. Preconditions: three things that must hold

This essay is about projections, and every projection of this kind smuggles in assumptions. The assumptions, more than the author, do the actual work. In writing this essay, I have smuggled in three of them, preconditions that need to be met. The timelines I outline hold only if the preconditions do. If they fail, what follows may not be merely slower or harder. It may be a different essay entirely.

My first assumption is that the instruments we use to measure clinical benefit continue to be repaired in plain sight. Tumor response by RECIST in oncology. Depression scales in psychiatry. Six-minute walk distance in cardiology. These are not facts about the world. They are conventions we trained ourselves to agree on. In the case of RECIST, independent expert readers disagreed on whether a patient had progressed in roughly 30% of more than 13,000 paired assessments in an FDA-led analysis. An AI trained on those endpoints does not transmute their noise; it inherits it and reproduces it at scale, with more confidence than is warranted. The precondition is that novel endpoint development becomes regulatory-grade work, with appropriate clinical validation, publishable noise floors, and audit trails, and that it is not run inside firms competing on the same flawed reference frame.
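One rough way to see what a 30% reader disagreement rate implies for a model trained and scored on such labels is the arithmetic below. It is a sketch under a deliberately simple assumption that does not come from the FDA analysis: each reader independently mislabels a binary progression call with the same error rate e, so two readers disagree with probability 2e(1 - e). The function names are illustrative only.

```python
import math


def implied_reader_error(disagreement: float) -> float:
    """Per-reader error rate e solving 2*e*(1-e) = disagreement (smaller root).

    Assumes two readers with equal, independent, symmetric error on a
    binary call. This is a simplification, not a property of RECIST.
    """
    if not 0.0 <= disagreement <= 0.5:
        raise ValueError("disagreement must be in [0, 0.5] under this model")
    return (1.0 - math.sqrt(1.0 - 2.0 * disagreement)) / 2.0


def apparent_accuracy_ceiling(disagreement: float) -> float:
    """Agreement a model that always matched the underlying state
    would show when scored against a single noisy reader."""
    return 1.0 - implied_reader_error(disagreement)


if __name__ == "__main__":
    d = 0.30  # the roughly 30% disagreement figure quoted in the text
    print(round(implied_reader_error(d), 3))       # 0.184
    print(round(apparent_accuracy_ceiling(d), 3))  # 0.816
```

Under these assumptions, even a model that tracked the underlying disease state perfectly would appear to be wrong about 18% of the time when scored against one reader. A model reporting much better agreement than that with a single reader's labels may be learning the reader's habits, not the disease.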

The second assumption is that human data accumulates the way infrastructure accumulates, not the way leftovers accumulate. Longitudinal, multimodal, reuse-consented data on actual patients across actual disease courses is the binding input. It is the thing models still cannot generate on their own. If the field treats it as the residue of clinical trials, never to be shared for secondary-use research, the field will keep training on poor proxies and calling them “ground truth.” There are no ground truths and no gold standards, only better approximations. If the field treats data liquidity as deliberate infrastructure, with consent frameworks that hold across decades and storage that survives platform turnover, the next ten years are productive. The fork is happening now, and it is not technical. It is contractual and architectural.

My third assumption is that the public trust, which permits clinical research at all, does not collapse during the transition. Trust is older than every tool stacked on top of it, and it is replaceable in the way load-bearing walls are, which is to say, slowly and with care. A trust failure in this decade, whether through a high-profile AI-generated mistake affecting a patient, or through opacity in how synthetic controls and digital endpoints are incorporated into evidence packages, can cost the field a decade of optionality. The precondition is that the people running the most ambitious programs are also the people most attentive to the slow social work of explaining, disclosing, and refusing to oversell.

AI capex is not AI capability. Saying so out loud, repeatedly, while building, is part of the build.

If my three assumptions hold, the timeline that follows is achievable. If they do not, the failure mode is the one we recognize: another decade in which capability claims outrun the scaffolding built to validate them. The argument that follows is conditional on the scaffolding being repaired in time.

I. Where AI stalls in biomedicine

AI is decisively powerful wherever a problem is bounded and the space of possible answers is already densely mapped. This is why it has functionally solved protein structure prediction, why it is compressing molecular design, and why retrieval and summarization in regulatory and pharmacovigilance work are no longer hypothetical. The problem space is finite. The data is dense. The verification loop is short.

A blunter way to put this: most of what we currently call AI in drug discovery is engineering, not biology. Protein structure prediction, molecular design, pattern recognition in pathology and radiology, and retrieval and summarization across regulatory documents are engineering tasks performed on dense priors with short verification loops. They are real work, and they are real progress, and the speed they introduce is genuine. They are not, however, the same activity as understanding why a disease behaves the way it does in a body. Conflating the two is the move that produces the cycle in which capability claims outrun the scaffolding.

AI is strikingly less useful wherever the question is open and the human data is thin, which is most of what determines whether a drug actually works inside a body. The biology of disease, the heterogeneity of response, the interaction between a therapy and a person in motion across years: these are not bounded problems with dense priors. They are sparse, partly observed, and partly unknowable. Models trained on what we have so far hit a ceiling that comes from us, not from them. The emergent properties of AI can stretch only so far beyond the training data into new territories of scientific discovery, so they are not a reliable hedge.

The strategic consequence of this asymmetry is that value shifts toward the inputs the model cannot supply itself. The argument that follows is about location, not pace: AI is fast, but the constraints move faster, and they can be fatal.

II. Discovery in 2031: abundance without advantage

By 2031, candidate molecules, targets, and mechanistic hypotheses will be generated faster than we can plausibly test them. Discovery platforms will be cheaper, broadly available, and increasingly indistinguishable across organizations. The position that depends on owning a discovery engine ages poorly, because abundance usually confers no advantage. You cannot win at a thing everyone has.

This is the first reordering, and it is worth naming clearly, because it inverts a decade of strategic instinct. The companies that bet their next decade on proprietary discovery platforms are betting on a scarcity they themselves will help dissolve. The molecules that emerge will be high-volume, sometimes elegant, occasionally truly novel, and often undifferentiated in origin. The question that will matter in 2031, more than which model produced a candidate, is which organization can test that candidate most informatively in actual patients, on a timeline a competitor cannot match.

III. Clinical development in 2031: where the bottleneck moves

Once hypotheses are cheaper and more ubiquitous, the binding constraint becomes validation. This is the move worth dwelling on, because it is not how the press currently frames the field. AI-discovered molecules in the clinic so far have cleared early-phase safety testing at unremarkable rates and have not been more successful in efficacy trials than conventionally discovered ones. Many still fail in humans because the underlying biology of the disease remains poorly defined. The model can write a faster answer to a question we are still asking poorly.

The trial of 2031 may start to look different from the trial of 2026 in ways that compound. AI-enabled patient selection. Biomarker-rich adaptive designs. Real-world data integrated rather than appended. And, where regulators permit, external and synthetic control arms and digital endpoints. The organizations that run those trials best, making them smaller, faster, and more informative, will define the next decade in the way the organizations that ran the first kinase and checkpoint inhibitor trials defined a previous one.

The point worth naming, because it runs counter to the prevailing framing, is that clinical development becomes more central, not less, as AI matures upstream. The bottleneck does not disappear when discovery becomes abundant. It relocates. Value follows it. And the bottleneck does not arrive in clinical development alone. It arrives, more deeply, in the biology the clinical trials are testing.

IV. Disease biology: the constraint that cannot be owned

Adjacent to the clinical bottleneck is a problem that it makes visible. When AI-discovered molecules fail in humans, the failure is usually not in the molecule, and usually not in the trial. It is that the biology of the disease was incompletely understood when the molecule was designed. The engineering layer of AI in drug discovery, the protein structures and molecular designs that the field has begun to produce at scale, sits above this absent layer. It cannot replace it. The model could not have known what was not in its training data, and the field had not yet captured the biology in a form the model could read.

This is the upstream constraint and it is the one most easily mistaken for a problem AI will solve on its own. By 2031, the binding question across many therapeutic areas will not be “can we generate a candidate” but “do we understand this disease well enough for any candidate to work.” Multimodal data on actual patients, genomics linked to transcriptomics linked to proteomics linked to imaging linked to longitudinal clinical experience linked to the patient’s own account of what changed, is the substrate from which understanding accumulates. It is also, again, the input AI cannot manufacture on its own.

I must point out that a caveat lives inside my argument: even when disease biology is finally understood, the insight may not confer a durable competitive advantage. Fundamental biological breakthroughs are hard to own. The PD-1/PD-L1 checkpoint axis emerged from shared academic work in Kyoto, Houston, and Boston, with the canonical checkpoint inhibitors distributed across multiple sponsors in close sequence (albeit initially concentrated in a single company that happened to sit on the right therapy at the right time following a multi-asset acquisition). The CRISPR-Cas9 mechanism diffused into thousands of laboratories within a year of the foundational paper, with downstream therapies developed by several entrants in parallel and the underlying biology eventually litigated rather than monopolized. GLP-1 receptor biology was public for decades before the molecules that crystallized it into a commercial category were developed. The foundational breakthrough in protein structure prediction, the most consequential computational result of the last decade, was open-sourced into something close to public infrastructure within a year of its release.

The pattern is consistent. Breakthroughs in biology move the field forward, not single firms within it. The organizations that capture advantage are typically the first to translate the insight into a specific molecule, a specific trial design, or a specific commercial channel, and the advantage they capture is often in the implementation, not in the underlying biology. Speed and luck matter more than understanding, once the understanding exists. This is a bit uncomfortable to write, because it implies that the most ambitious scientific work, the work whose value to patients is largest in absolute terms, is also the work whose value to a particular company is the most contingent on timing, and sometimes luck.

The strategic reading here is the same as in the rest of this essay, with one addition:

Invest in the biology, because the biology is the binding constraint on whether anyone’s molecule works.

I think it is also critical not to assume that investment alone will yield a proprietary position. The position, when it comes, may be at the next layer down: which organization can assemble the multimodal data to see the insight first, which organization can quickly design the study and conduct the trial that converts the insight into proof, and which organization already holds the channel into the patient population the insight reorders. By 2036, the insight itself may belong to everyone, spreading at a pace that outruns today’s publication cycles in peer-reviewed journals and abstract embargoes at scientific meetings. The implementation, if you are fast or lucky, will be yours for a window.

V. Human data in 2036: what cannot be cloned

If the five-year story is about relocation, the ten-year story is about fidelity.

By 2036, the models will have converged. The performance gap between the best and the second-best discovery model, the best and the second-best regulatory drafter, and the best and the second-best biomarker pattern recognizer will narrow toward a band in which the model itself is no longer the differentiator. What will differ is what each was trained on and what each can continue to be trained on.

Here, I think, is the scarce input of the next decade: high-fidelity human biology. Longitudinal, multimodal, reuse-consented data on actual patients across actual disease courses. Imaging linked to molecular data linked to outcomes linked to patient experience, persisted long enough to support questions that cannot be answered inside a single traditional clinical trial. An organization that has assembled this kind of position cannot be cloned by a competitor with the same access to off-the-shelf models, because the inputs cannot be re-derived from public data or purchased after the fact.

Treating data as residue, rather than as infrastructure, compounds across a decade the same way poor measurement compounds across a single trial. The errors propagate. Models trained on the result magnify them. The magnification reads, to reviewers and to regulators, as confidence. Models trained on proxies for biology learn the proxies. Models trained on real biology, captured deliberately, learn the biology. The choice between residue and infrastructure has always been available. The window in which choosing correctly still confers a durable competitive position closes inside this ten-year frame.

VI. Commercial interaction in 2036: the agent at the door

The commercial and medical interaction layer of the industry, the way a therapy reaches a prescriber and a patient, is being rebuilt around AI agents on a horizon shorter than ten years. Field force economics, medical information delivery, payer engagement, patient support, formulary access: each of these can move toward AI-mediated interaction at scale. A trusted channel, once established, is itself scarce.

The risk inside this transition is the same as the risk in every previous transition. Trust mediated through a new layer is easier to lose than to build. Patients and clinicians who feel routed rather than supported, who feel informed by something that is selling rather than serving, withdraw. The first generation of mediated channels will be remembered by the people who lived through them in the same terms as the first generation of patient portals: as either the thing that worked, or the thing that proved the field had not yet earned the right to mediate.

VII. Coda: capex is not capability

The capability that defines biopharma in 2036 will not be ownership of the best model, because the models will converge, and because the model itself is an abundant complement to a scarce input. The capability will be ownership of the highest-fidelity view of human biology that any organization has assembled, and the proven ability to turn a prediction made by any reasonable model into robust, regulatory-grade proof, into a patient outcome, and into a trust relationship, faster than competitors with the same model access can do the same work.

The tactical move, available to every competitor and conferring no advantage, is to buy AI in order to do today’s work faster. The strategic move, available now and not later, is to take a position in what AI will still depend on in 2036 and cannot supply for itself: deliberately captured human data, evidence generation as a core competence rather than a vendor relationship, and trust as infrastructure rather than as a campaign.

The reason to write this down in 2026, and not at the end of the decade, is that the most consequential decisions on each of these are decisions about contracts, about consent frameworks, about institutional posture, and about which functions are held close and which are released. These decisions have long lead times. The window in which they are still cheap is narrow.

My closing observation belongs to a longer argument: AI is not what finally lets biomedicine outrun its measurement infrastructure. It is the thing that finally makes the measurement infrastructure load-bearing in a way it was never designed to be. A poor instrument used by hand produces a poor reading once. A poorly instrumented model produces a poor reading at industrial volume, every day, with rising confidence. My preconditions in the preamble are either what the next decade is built on or what many of our hopes and dreams crack against.

If we play our cards right, AI does not save the field. The field saves itself, using AI as one of several instruments, and emerges in 2036 with measurement scaffolding worth the name and a generation of treatments developed on it. If we do not, AI is the loudest decade in a long sequence of decades in which the field promised more than its instruments could verify, and this time the consequence reaches further than it has before.

The shape of 2036 is being written in 2026, in the contracts being signed, the consents being framed, the endpoints being defended in regulatory meetings, and the willingness or unwillingness inside organizations to keep saying out loud that capex is not capability.

The writing is a slow choice and a quiet one. And I believe it is also the only one.