
The development of modern artificial intelligence, particularly large neural networks, is accelerating the pace of scientific analysis across domains. But despite the remarkable achievements in pattern recognition, prediction, and simulation, a more foundational opportunity remains underleveraged: the use of AI to generate new scientific abstractions.
This opportunity is epistemic, not operational.
Historically, many of the most consequential scientific breakthroughs have stemmed not from experimental advances alone, but from conceptual reformulations that redefined the objects of inquiry. Claude Shannon’s reframing of communication as a problem of entropy, rather than semantic content, and Warren McCulloch and Walter Pitts’ abstraction of neuronal activity into formal logic circuits were not products of increased data availability or computational power. They were paradigm-setting theoretical shifts—compact abstractions that introduced entirely new ways of reasoning about complex systems. Crucially, these ideas were articulated decades before the technological means existed to operationalize them. It was only when computational capacity eventually caught up to the level of abstraction proposed by McCulloch and Pitts that deep neural networks became a practical reality. Their legacy illustrates a key principle: the right abstraction, even in the absence of immediate application, can anticipate and enable future epochs of discovery.
The transformative potential of AI in science lies not in the refinement of existing procedures, but in its capacity to provoke new modes of abstraction. Yet its current application remains predominantly instrumental—focused on accelerating entrenched workflows, reifying legacy ontologies, and operating within inherited epistemic frameworks. These uses, while operationally valuable, are unlikely to yield fundamental breakthroughs. Scientific revolutions do not arise from faster iterations on known ideas; they emerge when the underlying terms of inquiry are reformulated: when cancer is redefined not as a disease of tissue origin, but as a molecularly stratified set of dysregulated programs across histologies, enabling pan-cancer frameworks; when cell types are reconceived not as fixed anatomical categories but as dynamic transcriptional states across continuous trajectories, as revealed by single-cell atlases; when immune response is modeled not as a binary switch, but as a distributed dynamical system subject to ecological constraints; or when therapeutic response is reimagined not as a fixed RECIST-based classification, but as a probabilistic and temporally resolved continuum embedded in each voxel of DICOM images. In each case, progress results not from scale alone, but from a redefinition of what is being measured, modeled, or explained.
The distinctive promise of AI is its ability to expose latent structure, to generate alternative representations that defy conventional classification, and to challenge the implicit assumptions embedded in our models, labels, and causal narratives. Absent this capacity for abstraction, ambitions to reshape drug discovery or decode biological complexity will remain epistemologically bounded—advancing only within the limits of the very frameworks that require rethinking. Realizing this deeper potential demands a cultural realignment, particularly within academia, where the pursuit of theoretical innovation should be a primary function but is increasingly marginalized by the imperatives of translational proximity and short-horizon utility.
I. From Partial Models to Emergent Systems
Artificial neural networks originated as highly simplified abstractions of biological neurons. McCulloch and Pitts proposed in 1943 that a network of binary threshold units could emulate logical inference. This early formulation ignored the biochemical complexity of actual neurons, focusing instead on a minimal computational principle. Over time, successive layers of approximation—sigmoid activations, backpropagation, distributed representations—accumulated into today’s deep learning architectures. These systems, while still biologically naive, now exhibit sophisticated behavior across tasks involving language, vision, protein structure, and more.
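To make that founding abstraction tangible, here is a minimal sketch of a McCulloch–Pitts-style binary threshold unit; the weights and thresholds shown are illustrative choices for two logic gates, not parameters taken from the 1943 paper.

```python
import numpy as np

def threshold_unit(inputs, weights, threshold):
    """A McCulloch-Pitts-style neuron: fire (1) if the weighted sum
    of binary inputs reaches the threshold, otherwise stay silent (0)."""
    return int(np.dot(inputs, weights) >= threshold)

# Illustrative parameters: the same unit realizes AND or OR depending on its threshold.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    and_out = threshold_unit(np.array(x), weights=np.array([1, 1]), threshold=2)
    or_out = threshold_unit(np.array(x), weights=np.array([1, 1]), threshold=1)
    print(f"x={x}  AND={and_out}  OR={or_out}")
```

Everything the modern stack added (continuous activations, learned weights, depth) sits on top of this single abstraction.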
This progression reveals a fundamental asymmetry: AI systems can exhibit intelligent-seeming behavior without being grounded in detailed mechanistic understanding of the phenomena they emulate. We do not understand precisely how large language models represent concepts, or how transformer architectures encode latent structure. Yet their behavior suggests emergent internal logic, developed through exposure to massive datasets and shaped by architectural constraints.
This situation mirrors our limited understanding of the brain. We observe human cognition without having a unified theory of how meaning, reasoning, and generalization are implemented biologically. In both cases—the brain and the AI model—we are confronted with systems whose internal dynamics elude simple interpretation, despite high-quality outputs.
This epistemic parallel between brain and machine should be seen not as a limitation, but as an opening: AI may offer ways to reason about cognition, biology, and complex systems not by imitating them directly, but by exploring alternative abstractions that yield insight.
II. The Abstraction Gap in Biology
Biology presents a unique challenge to AI systems. Unlike language or vision, biological data are sparse, noisy, context-dependent, and fundamentally multi-scale. Processes span from molecular to organismal levels, and are governed not only by deterministic rules but by stochasticity, redundancy, and evolutionary constraint. Yet the conceptual scaffolding we impose—gene, pathway, disease, biomarker—is often static and reductionist.
Many of these constructs were defined prior to the availability of large-scale molecular data and continue to structure the biomedical literature, clinical trials, and regulatory frameworks. When AI is applied to biological questions—e.g., to predict drug response, classify cancer types, or identify novel targets—it is frequently trained on these legacy abstractions, which may not reflect underlying biological reality.
Consider supervised learning tasks based on the concept of a “responder.” Such models are only as valid as the label itself, which is typically derived from arbitrary thresholds (e.g., RECIST-based size reduction) and applied uniformly across heterogeneous patient populations. Similarly, “biomarkers” are often defined through statistical associations without mechanistic clarity. In both cases, the abstraction being modeled may itself be inadequate.
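To make the fragility of that label concrete, the toy sketch below (synthetic numbers, not trial data) shows how a continuous distribution of tumor-burden changes collapses into a binary "responder" label once a RECIST-style cutoff of roughly a 30% reduction is applied.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical percent change in tumor burden for 10 patients (negative = shrinkage).
pct_change = rng.normal(loc=-25, scale=15, size=10).round(1)

# A RECIST-style partial-response cutoff (~30% reduction); the real criteria are more
# nuanced, so treat this as a caricature of the labeling step, not the guideline itself.
RESPONSE_CUTOFF = -30.0
labels = np.where(pct_change <= RESPONSE_CUTOFF, "responder", "non-responder")

for change, label in zip(pct_change, labels):
    print(f"{change:+6.1f}%  ->  {label}")
# A patient at -29% and one at -31% differ by little more than measurement noise,
# yet receive opposite labels that supervised models are then trained to reproduce.
```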
AI systems trained to replicate these abstractions risk encoding their limitations. But in principle, these same systems can also help identify where such abstractions break down.
III. AI as an Instrument of Conceptual Reframing
The capacity of AI systems to generate high-dimensional representations opens the possibility of using them not to validate existing categories, but to propose new ones. Several distinct mechanisms support this possibility:
Latent representations: Embeddings produced by deep models often reveal structure not encoded explicitly. For example, unsupervised clustering of transcriptomic data can yield groupings that correlate more strongly with clinical outcomes than conventional subtypes. This suggests that models are detecting patterns misaligned with prevailing biological taxonomies—an invitation to reframe those taxonomies.
Generative exploration: Generative models trained on molecular structures or protein sequences can produce novel variants that satisfy multiple constraints (e.g., activity, stability, synthesizability). More importantly, they can also generate candidates that defy human heuristics—revealing design spaces that were previously inaccessible or conceptually incoherent within current frameworks.
Model discrepancies: When models fail in systematic ways—e.g., predicting toxicity where none was expected, or misclassifying inputs that are biologically similar—they may be surfacing mismatches between data structure and label ontology. These failures should not always be treated as engineering errors; they may be epistemic signals.
Counterfactual simulation: Causal inference frameworks and simulation-based models can be used to test alternate hypotheses at scale—e.g., what if a gene is perturbed earlier in development? What if an immune pathway is selectively activated? These computational interventions allow exploration of theoretical landscapes, not just empirical ones.
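As a deliberately minimal sketch of this last mechanism, the code below defines a toy structural causal model (the variable names and effect sizes are invented for illustration) and contrasts the observational distribution of a phenotype with its distribution under a simulated do()-style intervention on an upstream gene.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

def simulate(do_gene=None):
    """Toy structural causal model: gene -> pathway activity -> phenotype.
    Passing do_gene forces the gene's value, mimicking a do() intervention."""
    gene = rng.normal(size=n) if do_gene is None else np.full(n, do_gene)
    pathway = 0.8 * gene + rng.normal(scale=0.5, size=n)
    phenotype = 1.5 * pathway + rng.normal(scale=0.5, size=n)
    return phenotype

observed = simulate()               # natural variation
intervened = simulate(do_gene=2.0)  # counterfactual: the gene forced high everywhere

print(f"mean phenotype, observational: {observed.mean():.2f}")
print(f"mean phenotype, do(gene=2.0):  {intervened.mean():.2f}")
```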
In all cases, AI is not a replacement for biological reasoning, but a scaffold for abstraction. The value lies not in producing better predictions within current frames, but in suggesting alternate frames that deserve empirical investigation.
IV. Abstraction as a Cultural and Institutional Function
The capacity to generate abstractions is not a technical function—it is a cultural one. And the scientific ecosystem today is poorly optimized for it.
In academia, the pressures of publication, funding, and professional advancement have aligned around predictability and output. Data production and incremental model improvement are favored over speculative, theory-forming work. Papers must be legible within existing paradigms. Interdisciplinary synthesis is often undervalued. Abstraction—when it cannot be reduced to metrics—becomes academically risky.
In parallel, the incentive structures of industry—especially early-stage AI/biotech startups—favor narrow framing and product-oriented goals. Startups must produce data packages that align with regulatory pathways, investor expectations, and pharma-compatible endpoints. While some companies gesture at platform science, most operate within established conceptual bounds, optimizing for tractability rather than epistemic novelty.
This convergence between academia and industry—where both pursue translational proximity, assetization, and immediate applicability—has eroded the distinction between exploratory and exploitative science. As a result, no domain is structurally positioned to generate new abstractions.
This is especially problematic because, unlike startups, academic research has no structural excuse. The academy should be the site of conceptual experimentation, long-horizon theory development, and interdisciplinary synthesis. Yet it often mimics the operational mindset of biotech: milestone-driven, product-like research packaged for grant agencies or institutional investors.
Reclaiming abstraction as a central academic function is not a nostalgic aspiration—it is essential if the scientific system is to remain generative.
V. Toward a Framework for Epistemic AI in Science
To realize the abstraction-forming potential of AI, several shifts are needed:
Epistemic framing
AI projects should begin with clear statements of their epistemic assumptions. What conceptual framework is being inherited? What theoretical alternatives are being considered? What abstractions are being reproduced? For example:
In genomics, when AI is used to predict gene function from sequence homology or expression correlation, it often inherits the assumption that function varies linearly with sequence similarity or expression co-variation. This treats genes as independent units with discrete functions—ignoring epistasis, multi-functionality, and context-dependence. Alternative framings might involve network-based representations or causal graphical models that incorporate structural and environmental dependencies.
In drug discovery, models trained to optimize lead compounds often assume additivity of physicochemical properties—e.g., that potency, selectivity, and solubility can be optimized independently. This reflects a reductionist approach, when in reality these properties are often entangled. An epistemically explicit project might question whether this framing holds in relevant chemical space, and explore generative models that treat molecular design as a constrained optimization across non-orthogonal objectives.
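A hedged illustration of that alternative framing: rather than optimizing each property independently, score a set of hypothetical candidates on entangled objectives and keep only the Pareto-optimal set. All numbers below are synthetic, and the three objectives are stand-ins for potency, selectivity, and solubility.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50

# Synthetic candidates with entangled objectives: improving potency
# tends to hurt solubility, mimicking non-orthogonal design goals.
potency = rng.normal(size=n)
selectivity = 0.5 * potency + rng.normal(scale=0.8, size=n)
solubility = -0.6 * potency + rng.normal(scale=0.8, size=n)
scores = np.stack([potency, selectivity, solubility], axis=1)

def pareto_front(points):
    """Indices of candidates that no other candidate dominates on every objective."""
    keep = []
    for i, p in enumerate(points):
        dominated = np.any(np.all(points >= p, axis=1) & np.any(points > p, axis=1))
        if not dominated:
            keep.append(i)
    return keep

front = pareto_front(scores)
print(f"{len(front)} of {n} candidates are Pareto-optimal across the three objectives")
```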
In real-world disease classification, many clinical NLP or EHR-based models inherit ICD coding hierarchies as ground truth. These codes are billing and administrative constructs, not biological entities. Without acknowledging this, models encode the bureaucratic logic of healthcare systems rather than biological patterns of disease. This is a central reason real-world evidence has so far fallen short of its promise. An alternative approach might explore symptom-based embeddings or unsupervised phenotypic clusters derived from patient trajectories, offering a reframing of disease as a dynamical system rather than a fixed label.
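For that clinical case, a minimal sketch of label-free phenotyping, assuming only a per-patient matrix of symptom or lab-event counts aggregated over time (synthetic here): embed the trajectories and let clusters, rather than ICD codes, propose the disease groupings. A real system would use temporal models over actual EHR trajectories; PCA plus k-means is the simplest possible stand-in.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

# Synthetic stand-in for patient trajectories: counts of 20 symptom/lab events
# per patient, aggregated over time windows. Real inputs would come from EHRs.
patients = rng.poisson(lam=rng.uniform(0.5, 3.0, size=(200, 20)))

# Learn a low-dimensional phenotype embedding without reference to any ICD label...
embedding = PCA(n_components=5, random_state=0).fit_transform(patients)

# ...and let the clusters, rather than billing codes, propose candidate phenotypes.
phenotypes = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embedding)

for k in range(4):
    print(f"phenotype cluster {k}: {np.sum(phenotypes == k)} patients")
```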
Model interpretation as theory development
Instead of focusing solely on saliency or attribution, model interpretation should be used to explore what internal representations imply about structure in the domain—e.g., what latent axes correspond to clinically meaningful distinctions?
Examples in this domain include:
In cancer subtyping, unsupervised or semi-supervised deep learning models trained on transcriptomic or multi-omic profiles frequently organize samples along latent dimensions that do not align with histopathological tumor types. For instance, latent embeddings may separate tumors based on immune infiltration or metabolic reprogramming—dimensions that are orthogonal to histology but directly relevant to prognosis and therapy selection. Interpreting these axes reveals a biologically grounded alternative to classical subtyping.
In single-cell analysis, variational autoencoders or manifold learning techniques often produce lower-dimensional embeddings that capture developmental trajectories—e.g., from progenitor to differentiated cell types. Here, latent dimensions may reflect gradients in cell cycle state, lineage commitment, or signaling exposure. Rather than merely attributing variance to known markers, interpreting these axes can reveal regulatory structure and lineage dynamics that redefine what constitutes a “cell type.”
In pharmacogenomics, models predicting drug sensitivity across cell lines or patient-derived models may learn latent features that correspond to mechanisms of resistance—such as upregulation of efflux transporters or epigenetic silencing of apoptotic pathways. If these features emerge without being explicitly annotated, they can suggest novel, interpretable mechanisms that would otherwise remain hidden in traditional feature importance maps.
In clinical risk prediction, transformer models trained on longitudinal EHR data may organize patients along latent dimensions that correlate not with a single disease, but with patterns of multimorbidity or frailty. These embeddings can expose nonlinear, temporally informed patient states, prompting reconsideration of static risk scores in favor of trajectory-based phenotyping.
In each of these cases, internal representations are not merely statistical artifacts—they are structural hypotheses about the domain. Model interpretation, when epistemically motivated, becomes a tool for theory generation, not just for explaining outputs.
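A minimal sketch of this kind of epistemically motivated interpretation, using the cancer-subtyping example above: fit a low-dimensional latent space to expression profiles, then ask which latent axes track a program of interest rather than the histology label. The data and the 50-gene "immune" signature below are synthetic placeholders; in practice the latent space would come from a VAE or similar model and the signature from curated gene sets.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
n_samples, n_genes = 300, 1000

# Synthetic expression matrix with a hidden "immune infiltration" program
# spread across a hypothetical 50-gene signature.
immune_score = rng.normal(size=n_samples)
expr = rng.normal(size=(n_samples, n_genes))
signature_genes = np.arange(50)                    # hypothetical signature
expr[:, signature_genes] += immune_score[:, None]  # the program drives these genes

# Stand-in for learned latent axes.
latents = PCA(n_components=10, random_state=0).fit_transform(expr)

# Which latent axis tracks the immune program rather than the histology label?
correlations = [np.corrcoef(latents[:, k], immune_score)[0, 1] for k in range(10)]
best = int(np.argmax(np.abs(correlations)))
print(f"latent axis {best} tracks the immune score: r = {correlations[best]:.2f}")
```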
Model failure as signal
Systematic errors or inconsistencies should be examined for what they reveal about the limitations of training labels, ontologies, and frameworks. These breakdowns are sites of possible abstraction failure.
For example:
In polygenic risk scoring, predictive performance for complex traits such as schizophrenia or type 2 diabetes often varies dramatically across ancestry groups. While commonly attributed to linkage disequilibrium decay or population structure, this disparity also reveals the fragility of trait definitions and label construction across heterogeneous populations. The model’s failure exposes how GWAS-defined classifications may lack cross-population validity, suggesting the need for ancestry-aware or environment-integrated formulations of genetic risk.
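As a toy illustration of this failure mode (synthetic genotypes and effect sizes, not real GWAS output), the sketch below applies one fixed set of polygenic-score weights to two simulated populations whose allele frequencies and effect structure differ, and compares discrimination across the two groups.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
n, n_snps = 5000, 200

def simulate_population(freqs, true_effects):
    """Synthetic genotypes (0/1/2) and a binary trait driven by true_effects."""
    geno = rng.binomial(2, freqs, size=(n, n_snps))
    liability = geno @ true_effects + rng.normal(scale=2.0, size=n)
    return geno, (liability > np.quantile(liability, 0.9)).astype(int)

effects = rng.normal(scale=0.1, size=n_snps)

# Population A: the population in which the (hypothetical) GWAS was run.
geno_a, trait_a = simulate_population(rng.uniform(0.1, 0.5, n_snps), effects)

# Population B: different allele frequencies and a partly different effect structure.
shifted_effects = effects * rng.choice([1.0, 0.3], size=n_snps)
geno_b, trait_b = simulate_population(rng.uniform(0.05, 0.6, n_snps), shifted_effects)

# The same score weights, estimated in effect for population A, applied to both.
prs_a, prs_b = geno_a @ effects, geno_b @ effects
print(f"PRS discrimination (AUC) in population A: {roc_auc_score(trait_a, prs_a):.2f}")
print(f"PRS discrimination (AUC) in population B: {roc_auc_score(trait_b, prs_b):.2f}")
```

The point is not the numbers themselves but where the degradation comes from: the trait definition and the weights were constructed under one population's structure and silently exported to another.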
Theorist–engineer collaboration
Teams that pair deep modelers with domain theorists can better recognize when learned structure challenges existing concepts—and how to formalize emerging representations into new testable abstractions.
Conceptual portfolios
Just as companies maintain IP portfolios, scientific institutions should maintain and support conceptual portfolios—diverse, speculative representations that organize thinking beyond current categories and allow experimentalists to test multiple frames in parallel.
VI. Beyond Computation
The promise of AI in science lies not in scale alone, but in reframing the structure of inquiry. It can act as a mirror to expose the assumptions embedded in our labels. It can act as a lens to detect structure we have not yet named. And it can act as a simulator to explore spaces of possibility that challenge our current maps.
But these capacities will remain underutilized if abstraction continues to be viewed as an indulgence rather than a primary scientific function. Academic institutions, in particular, must reassert their role in epistemic innovation—moving beyond the pursuit of immediate relevance, and returning to the construction of ideas.
AI will not solve biology by mimicking it. Nor will it reduce science to prediction. Its real value lies in helping us develop better ways to think—more generative abstractions, more robust theories, and ultimately, a more reflective and boundary-shifting science.
This imperative extends beyond academia. Startups and investors in biomedical AI must also recalibrate their relationship to abstraction. The prevailing emphasis on near-term milestones and familiar ontologies limits the transformative potential of the field. But abstraction need not come at the expense of commercial viability. In fact, it may be the strongest long-term differentiator.
Startups can participate in abstraction by structuring their platforms not only to optimize known tasks, but to identify and formalize novel representations—whether in how disease is defined, how response is measured, or how biological function is encoded. This means building models that not only perform but challenge, and dedicating resources to interpreting unexpected outputs as potential epistemic signals, not noise.
Investors, in turn, must be willing to deploy capital more discerningly—not just into the most visible use cases of the day, but into teams that demonstrate conceptual ambition and interdisciplinary fluency. Supporting abstraction requires greater patience, but the upside is considerable: companies that reshape the way biology is represented—not just processed—will hold structural intellectual property that defines new categories of therapeutics, diagnostics, and disease understanding.
The return on investment in such abstraction-first science may not follow conventional timelines, but it can be market-making, not just margin-improving.
In short: to unlock the full value of AI in biomedicine, we must think differently—not only about biology, but about what it means to do science, to build platforms, and to invest in knowledge itself.