
Somewhere inside the billions of training steps that taught a large language model to write poetry or a neural network to detect cancer, there is — quietly, faithfully — Darwinian evolution running. Not as a metaphor. Not as loose inspiration. As the actual mathematics.
That is the central claim of a remarkable paper by Daniel Grimmer, a philosopher and mathematician at Yale, published on arXiv in May 2026. Grimmer proves that several of the most important optimization algorithms in modern machine learning are already — without any modification — scientifically valid simulations of Darwinian evolutionary dynamics. And for the one major algorithm that isn't quite there, he performs what he calls "minor but principled mathematical surgery" to fix it.
The implications run in two directions simultaneously. Machine learning researchers gain a deep biological grounding for tools they already use. Evolutionary biologists gain something they have long lacked: a rigorous, in silico laboratory for running controlled experiments on the fundamental principles of Darwinian adaptation.
The Science
To understand what Grimmer (2026) has done, it helps to understand what he's working against. The field of evolutionary computation has always had a dual mandate: to build good optimization algorithms and to faithfully simulate Darwinian evolution. These two goals have rarely been pursued together. For decades, the engineering side dominated. Researchers borrowed biological vocabulary (populations, fitness, selection) but replaced the actual mathematics of evolution with heuristics taken from physics, or simply dressed up existing algorithms in biological clothing. Critics have called this the "metaphor crisis," a proliferation of supposedly novel algorithms that are actually recycled methods under zoological names (Sörensen, 2015; Campelo and Aranha, 2023).
Grimmer's approach, called Darwinian Lineage Simulations (DLS), does the opposite: it starts with evolutionary biology's foundational equations and derives optimization algorithms from them, rather than retrofitting biology onto existing tools.
The starting point is a nearly century-old argument. Ronald Fisher and Sewall Wright, two founding giants of population genetics, held deeply opposing views about how evolution actually works. Fisher (1930) saw evolution as a deterministic process: a large, well-mixed population climbing a fitness landscape, driven by selection acting on genetic variance. Wright (1931, 1932) disagreed. He argued that small, isolated sub-populations undergoing random genetic drift (the statistical noise that comes from sampling a finite population) were essential for evolution to escape local fitness peaks and explore new solutions. The two men argued bitterly for decades.
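Wright's drift can be made concrete with the textbook Wright-Fisher model, used here as an illustrative stand-in (an assumption, not necessarily the drift model in the paper). Each generation, n gene copies are resampled binomially from the parent allele frequency p, so the new frequency has variance p(1 - p)/n. The helper `drift_variance` below is a hypothetical name; the sketch checks that formula by simulation and shows why small sub-populations drift far more than large ones.

```python
import numpy as np


def drift_variance(p: float, n: int, reps: int = 20_000, seed: int = 0) -> float:
    """Empirical variance of the allele frequency after one Wright-Fisher generation.

    Each replicate draws the next generation's allele count from Binomial(n, p),
    i.e. n gene copies sampled with replacement from a parent pool at frequency p.
    """
    rng = np.random.default_rng(seed)
    next_freq = rng.binomial(n, p, size=reps) / n
    return float(next_freq.var())


for n in (50, 500, 5000):
    # Simulated drift variance next to the textbook value p(1 - p) / n.
    print(n, drift_variance(0.3, n), 0.3 * 0.7 / n)
```

The variance falls in proportion to 1/n, which is why Wright emphasized small, isolated sub-populations.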
Grimmer proves they were both right, and that their theories are formally equivalent, at least for asexual reproduction. The key insight is that Fisher's deterministically evolving total population can always be decomposed into Wright's randomly drifting sub-populations, and those sub-populations can always be reassembled to recover Fisher's dynamics exactly. The magic that makes this work is what Grimmer calls the DLS noise relation: a precise mathematical constraint on what genetic drift must look like for the decomposition to be evolutionarily faithful.
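The decompose-and-reassemble step rests on the law of total variance: a population's total variance equals the average variance within sub-populations plus the variance of the sub-population means. The sketch below illustrates that bookkeeping under simplifying assumptions of its own (a Gaussian genotype distribution, and every lineage sharing a single within-lineage covariance). The helpers `split_into_lineages` and `reassemble` are hypothetical names, not Grimmer's code.

```python
import numpy as np


def split_into_lineages(mean, total_cov, lineage_cov, k, rng):
    """Draw k lineage means so that the pooled mixture matches N(mean, total_cov).

    Law of total variance: total = within-lineage + between-lineage, so the
    lineage means must scatter with covariance total_cov - lineage_cov.
    """
    between = total_cov - lineage_cov
    if np.linalg.eigvalsh(between).min() < -1e-12:
        raise ValueError("lineage_cov must not exceed total_cov")
    return rng.multivariate_normal(mean, between, size=k)


def reassemble(lineage_means, lineage_cov, per_lineage, rng):
    """Sample individuals from every lineage and pool them into one population."""
    k, d = lineage_means.shape
    within = rng.multivariate_normal(np.zeros(d), lineage_cov, size=(k, per_lineage))
    return (lineage_means[:, None, :] + within).reshape(-1, d)


rng = np.random.default_rng(1)
mean = np.array([0.0, 1.0])
total_cov = np.array([[2.0, 0.5], [0.5, 1.0]])
lineage_cov = 0.25 * np.eye(2)  # one of many admissible bookkeeping choices

means = split_into_lineages(mean, total_cov, lineage_cov, k=2000, rng=rng)
pooled = reassemble(means, lineage_cov, per_lineage=50, rng=rng)
print(pooled.mean(axis=0))
print(np.cov(pooled, rowvar=False))  # close to total_cov
```

Any `lineage_cov` that does not exceed `total_cov` works equally well. That is the bookkeeping freedom the text goes on to describe.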
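The DLS noise relation links three quantities. Let $\mu$ be the mutation rate, $\Sigma_t$ be the population's genotype variance at generation $t$, and $\Sigma^{\mathrm{drift}}_t$ be the covariance of genetic drift. Schematically, the relation states at leading order:
$$\Sigma^{\mathrm{drift}}_t \;\approx\; \Delta\Sigma^{\mathrm{mut}}_t(\mu) \;-\; \Delta\Sigma^{\mathrm{sel}}_t(\Sigma_t)$$
where $\Delta\Sigma^{\mathrm{mut}}_t$ is the variance that mutation adds in generation $t$ and $\Delta\Sigma^{\mathrm{sel}}_t$ is the variance that selection removes from $\Sigma_t$.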
This is not a modeling choice — it is a constraint that must be satisfied if the simulation is to faithfully represent evolutionary dynamics. The genetic drift in any valid simulation must absorb exactly the difference between what mutations add to the variance and what selection removes. Crucially, as long as this relation is satisfied, the researcher is free to choose any bookkeeping arrangement they like for dividing the total population into sub-populations. That freedom, it turns out, is enormous — and it is precisely what allows DLS to encompass such a wide range of optimization algorithms.
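One generation of this bookkeeping can be worked through in closed form under assumptions chosen purely for illustration, not taken from the paper:

- Gaussian genotypes with covariance `sigma`.
- Gaussian-shaped fitness $w(x) = \exp(-\tfrac12 (x-\theta)^\top A (x-\theta))$, which makes the post-selection covariance exactly $(\Sigma^{-1} + A)^{-1}$.
- Mutation that adds $\mu I$.
- A tracked lineage whose variance is held fixed.

With that particular bookkeeping choice, the drift covariance is precisely the variance added by mutation minus the variance removed by selection. The function names are hypothetical.

```python
import numpy as np


def post_selection_cov(sigma, a):
    """Covariance after Gaussian selection with curvature a (exact for Gaussians)."""
    return np.linalg.inv(np.linalg.inv(sigma) + a)


def drift_cov_fixed_lineage(sigma, a, mu):
    """Drift covariance needed to keep the tracked lineage's variance at sigma.

    The deterministic (Fisher) total after one generation is
    post_selection_cov(sigma, a) + mu * I. The lineage keeps variance sigma,
    so drift must supply the rest: mutation's addition minus selection's removal.
    """
    d = sigma.shape[0]
    added_by_mutation = mu * np.eye(d)
    removed_by_selection = sigma - post_selection_cov(sigma, a)
    drift = added_by_mutation - removed_by_selection
    if np.linalg.eigvalsh(drift).min() < -1e-12:
        raise ValueError("selection removes more variance than mutation adds; "
                         "this bookkeeping needs a shrinking lineage variance")
    return drift


sigma = 0.1 * np.eye(2)
a = np.diag([1.0, 0.1])   # strong selection on the first axis, weak on the second
drift_cov = drift_cov_fixed_lineage(sigma, a, mu=0.02)
print(drift_cov)
```

If mutation adds less variance than selection removes (for example `mu=0.001` here), the function raises. That signals that this particular bookkeeping choice would need a shrinking lineage variance instead.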
What They Found
The paper's most striking result is what falls out when you examine well-known optimization algorithms through the DLS lens.
Stochastic Gradient Descent (SGD), the workhorse of machine learning, updates model parameters by following the gradient of a noisy estimate of the loss function. It already has the structure of a DLS simulation. Add evolutionarily faithful genetic drift (i.e., noise that satisfies the DLS noise relation) and SGD becomes a scientifically valid in silico experiment on Darwinian evolution.
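Read as one generation of evolution, an SGD step with drift might be sketched as follows. The mapping rests on two illustrative assumptions. First, the loss is taken to be $-\ln \bar w$ (minus log mean fitness). Second, the genotype variance is isotropic, $\sigma^2 I$. Under those assumptions the selection response $\sigma^2 \nabla \ln \bar w$ is an ordinary gradient step with learning rate $\sigma^2$. The drift covariance is left as an input, because choosing it is exactly what the DLS noise relation constrains. `sgd_with_drift_step` is a hypothetical name.

```python
import numpy as np


def sgd_with_drift_step(x, grad_loss, sigma2, drift_cov, rng):
    """One SGD step read as one generation of evolution (illustrative mapping).

    Selection response: -sigma2 * grad_loss(x), i.e. learning rate = sigma2.
    Drift: a Gaussian kick with covariance drift_cov.
    """
    drift = rng.multivariate_normal(np.zeros_like(x), drift_cov)
    return x - sigma2 * grad_loss(x) + drift


def grad_quadratic(x):
    # Gradient of the toy loss 0.5 * ||x||^2.
    return x


rng = np.random.default_rng(0)
x = np.array([1.0, -2.0])
for _ in range(100):
    x = sgd_with_drift_step(x, grad_quadratic, sigma2=0.1,
                            drift_cov=1e-4 * np.eye(2), rng=rng)
print(x)  # near the optimum at 0, jittered by drift
```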
Natural Gradient Descent is also already DLS-compliant. It is a more sophisticated algorithm that accounts for the geometry of the parameter space by pre-conditioning the gradient with the inverse Fisher information matrix. This is not a coincidence. As Grimmer notes, the pre-conditioner in Lande's (1976) equation of quantitative genetics, the genetic variance-covariance matrix $G$, has a natural information-geometric interpretation as exactly this kind of natural gradient (Otwinowski et al., 2020). Evolution was doing natural gradient ascent all along.
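The Lande/natural-gradient correspondence can be checked numerically in a setting where everything is closed-form. Two standard facts do the work. Lande's equation predicts the mean response $\Delta \bar z = G\, \nabla \ln \bar w$. For a Gaussian family $N(m, \Sigma)$, the Fisher information with respect to the mean is $\Sigma^{-1}$, so the natural-gradient step is $\Sigma \nabla$. The sketch assumes Gaussian genotypes and Gaussian-shaped fitness. That is an illustrative choice, made because the exact post-selection mean is then available for comparison. The function names are hypothetical.

```python
import numpy as np


def grad_log_mean_fitness(m, sigma, a, theta):
    """Gradient of ln(mean fitness) for N(m, sigma) under w(x) = exp(-0.5 (x-theta)^T a (x-theta)).

    Mean fitness is proportional to exp(-0.5 (m-theta)^T (sigma + a^{-1})^{-1} (m-theta)).
    """
    return np.linalg.solve(sigma + np.linalg.inv(a), theta - m)


def lande_response(g_matrix, grad):
    """Lande's equation: mean response = G @ grad ln(mean fitness)."""
    return g_matrix @ grad


def natural_gradient_step(sigma, grad):
    """Natural gradient for the mean of N(m, sigma): F^{-1} grad, with F = sigma^{-1}."""
    fisher = np.linalg.inv(sigma)
    return np.linalg.solve(fisher, grad)


def exact_selection_shift(m, sigma, a, theta):
    """Exact change in the mean after Gaussian selection (product of two Gaussians)."""
    post_cov = np.linalg.inv(np.linalg.inv(sigma) + a)
    new_mean = post_cov @ (np.linalg.solve(sigma, m) + a @ theta)
    return new_mean - m


m = np.array([0.0, 0.0])
theta = np.array([1.0, -1.0])
sigma = np.array([[1.0, 0.3], [0.3, 0.5]])  # genotype covariance, playing the role of G
a = np.diag([2.0, 0.5])                      # curvature of -ln w

grad = grad_log_mean_fitness(m, sigma, a, theta)
print(lande_response(sigma, grad))
print(natural_gradient_step(sigma, grad))
print(exact_selection_shift(m, sigma, a, theta))  # all three agree
```

For Gaussian fitness the agreement is exact rather than approximate. The identity $(\Sigma^{-1}+A)^{-1}A = \Sigma(\Sigma + A^{-1})^{-1}$ makes the selection shift and Lande's prediction coincide.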
The Damped Newton's Method — which uses second-order curvature information to take more efficient steps — also fits the DLS framework. In biological terms, the population's variance naturally acts as an anisotropic pre-conditioner: it is the spread of genotypes in the population that determines how far and in which directions selection can move the mean genotype in one generation. Fisher's "key insight," as Grimmer frames it, is that the current genetic variance is the learning rate.
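A small calculation suggests why the population's variance can end up behaving like a curvature-aware pre-conditioner. The setting is assumed for illustration: Gaussian genotypes, fitness $\exp(-\tfrac12 (x-\theta)^\top A (x-\theta))$, and mutation switched off. Each round of selection then adds the curvature $A$ to the inverse variance. After $k$ rounds starting from $sI$, the variance is $(I/s + kA)^{-1}$. The selection step $\Sigma_k g$ is therefore a damped Newton step with damping $1/(ks)$, scaled by $1/k$. This is a sketch under those assumptions, not the paper's derivation, and the function names are hypothetical.

```python
import numpy as np


def select_repeatedly(sigma0, a, generations):
    """Apply Gaussian selection with curvature a, no mutation: Sigma^{-1} <- Sigma^{-1} + a."""
    sigma = sigma0.copy()
    for _ in range(generations):
        sigma = np.linalg.inv(np.linalg.inv(sigma) + a)
    return sigma


def damped_newton_direction(hessian, grad, damping):
    """Solve (H + damping * I) d = grad."""
    return np.linalg.solve(hessian + damping * np.eye(len(grad)), grad)


s, k = 0.5, 10
a = np.array([[4.0, 1.0], [1.0, 0.5]])  # curvature of -ln w (positive definite)
g = np.array([1.0, 1.0])                 # gradient of ln w at the current mean

sigma_k = select_repeatedly(s * np.eye(2), a, k)
selection_step = sigma_k @ g
newton_step = damped_newton_direction(a, g, damping=1.0 / (k * s)) / k
print(selection_step, newton_step)  # equal up to floating-point error
```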
The notable outlier is Adam, currently the most widely used optimizer in deep learning. Adam combines gradient descent with two adaptive mechanisms: a momentum term that accumulates a running average of past gradients, and an RMSProp term that rescales steps based on the magnitude of recent gradients. Together, these make Adam extraordinarily effective in practice. But Adam's momentum term, as Grimmer shows, violates the DLS noise relation. It introduces correlations across generations that have no counterpart in evolutionary dynamics.
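For reference, the standard Adam update described above fits in a few lines. This is the usual form with bias correction and the common default hyperparameters. The function name `adam_step` is just a label for this sketch.

```python
import numpy as np


def adam_step(x, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of standard Adam; t counts steps from 1."""
    m = beta1 * m + (1 - beta1) * g        # momentum: running average of gradients
    v = beta2 * v + (1 - beta2) * g * g    # RMSProp: running average of squared gradients
    m_hat = m / (1 - beta1 ** t)           # bias corrections for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v


# First step: bias correction makes the update roughly lr * sign(g) per coordinate.
x, m, v = adam_step(np.zeros(2), np.array([2.0, -0.5]), np.zeros(2), np.zeros(2), t=1)
print(x)
```

The `m` carried from one call to the next is the cross-generation memory that the DLS analysis objects to.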
The fix — "Adam-DLS" — involves replacing Adam's additive momentum with a rank-1 extension of the population's variance in the direction of accumulated historical gradient. Geometrically, this elongates the tracked lineage's genotype distribution along the direction it has been moving, much as a population spreading through a fitness valley will stretch in the direction of travel. The update still achieves an Adam-like momentum effect, but now via a biologically meaningful mechanism: the shape of the population's variance, rather than an explicit memory term tacked on from outside.
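The geometry of a rank-1 variance extension can be sketched independently of the paper's exact update, which is not reproduced here. In this hypothetical version, the lineage variance is a diagonal part `diag_var` plus `c * u u^T`. Here `u` stands in for the accumulated historical gradient and `c` is an assumed scale; both names are illustrative. Two convenient properties follow. The descent step $-\Sigma g$ never needs a full $d \times d$ matrix, and sampling from $N(0, \Sigma)$ needs only $d + 1$ standard normals per draw.

```python
import numpy as np


def rank1_variance(diag_var, u, c):
    """Hypothetical lineage variance: diag(diag_var) + c * u u^T (elongated along u)."""
    return np.diag(diag_var) + c * np.outer(u, u)


def rank1_step(diag_var, u, c, grad_loss):
    """Descent step -Sigma @ grad_loss computed without forming Sigma explicitly."""
    return -(diag_var * grad_loss + c * u * (u @ grad_loss))


def sample_rank1(diag_var, u, c, rng, size):
    """Draw `size` samples from N(0, diag(diag_var) + c u u^T)."""
    z = rng.standard_normal((size, diag_var.size))  # independent diagonal part
    z1 = rng.standard_normal((size, 1))             # one shared coordinate along u
    return z * np.sqrt(diag_var) + np.sqrt(c) * z1 * u


diag_var = np.array([0.01, 0.01])
u = np.array([0.6, 0.8])   # stand-in for the accumulated gradient direction
c = 0.05
g = np.array([3.0, 4.0])   # current gradient, aligned with u

step = rank1_step(diag_var, u, c, g)
plain = -diag_var * g      # same diagonal variance without the rank-1 term
print(step, plain)         # the rank-1 term lengthens the step along u

draws = sample_rank1(diag_var, u, c, np.random.default_rng(0), size=100_000)
print(np.cov(draws, rowvar=False))  # close to rank1_variance(diag_var, u, c)
```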
Grimmer tests Adam-DLS on the Rosenbrock benchmark, a famously difficult optimization problem whose minimum lies at the bottom of a long, narrow, curved (banana-shaped) valley, and it passes. Plain noisy gradient descent (which is evolutionarily compliant) cannot solve this benchmark because evolutionary fidelity forces its variance to remain isotropic. The step size must then stay small enough for the steep, high-curvature direction, which makes progress along the flat valley floor extremely slow. Adam-DLS, with its anisotropic variance, navigates the valley successfully. It is, the paper notes, "the first strictly evolutionarily faithful gradient-based model to pass the Rosenbrock benchmark."
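The step-size problem can be quantified with the standard Rosenbrock function $f(x, y) = (1 - x)^2 + 100\,(y - x^2)^2$. At the minimum $(1, 1)$, its Hessian has eigenvalues of roughly 1001.6 and 0.4. Plain gradient descent with an isotropic step must therefore keep the learning rate below about $2/1001.6$ to stay stable. At that rate, each step removes only about 0.08% of the remaining error along the flat valley floor. The sketch below computes those numbers. It illustrates the conditioning issue and is not the paper's benchmark code.

```python
import numpy as np


def rosenbrock(p, a=1.0, b=100.0):
    x, y = p
    return (a - x) ** 2 + b * (y - x ** 2) ** 2


def rosenbrock_grad(p, a=1.0, b=100.0):
    x, y = p
    return np.array([-2 * (a - x) - 4 * b * x * (y - x ** 2), 2 * b * (y - x ** 2)])


def rosenbrock_hessian(p, a=1.0, b=100.0):
    x, y = p
    return np.array([[2 - 4 * b * (y - x ** 2) + 8 * b * x ** 2, -4 * b * x],
                     [-4 * b * x, 2 * b]])


lam_min, lam_max = np.linalg.eigvalsh(rosenbrock_hessian((1.0, 1.0)))  # ascending order
max_stable_lr = 2.0 / lam_max              # isotropic gradient descent diverges above this
valley_progress = max_stable_lr * lam_min  # fraction of error removed per step along the valley
print(lam_min, lam_max, max_stable_lr, valley_progress)
```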
Algorithms and Their Evolutionary Compliance
Whether major optimization algorithms satisfy the DLS (Darwinian Lineage Simulation) evolutionary fidelity constraints, as determined by Grimmer (2026). Compliance score: 1 = fully compliant, 0 = non-compliant without modification.
| Algorithm | Compliance score |
|---|---|
| SGD | 1 |
| Natural Gradient Descent | 1 |
| Damped Newton's Method | 1 |
| Adam (standard) | 0 |
| Adam-DLS (surgically repaired) | 1 |
Adam on the Rosenbrock Benchmark: Key Properties
Properties of standard Adam in the Rosenbrock benchmark comparison between Adam and Adam-DLS (Grimmer, 2026, Fig. 5); 1 = yes, 0 = no. Both algorithms use identical hyperparameter settings; Adam-DLS additionally satisfies the evolutionary compliance constraints that standard Adam violates.
| Property (standard Adam) | Value |
|---|---|
| Solves Rosenbrock | 1 |
| Evolutionary Compliance | 0 |
| Anisotropic Variance | 1 |
| Momentum Term | 1 |
| RMSProp Scaling | 1 |
| DLS Noise Relation | 0 |
Why This Changes Things
The significance here operates at several levels, and it's worth separating them.
For machine learning researchers, the DLS framework provides something that has been lacking: a principled, biologically-grounded account of why certain optimizers work. The fact that natural gradient descent corresponds to Lande's equation from quantitative genetics is not decorative. It means that the geometrical intuitions behind information-geometric optimization have a direct evolutionary interpretation. The population's variance as a learning rate is not an analogy — it is the same mathematical object, playing the same mathematical role, in both systems.
For evolutionary biologists, the implications may be even more significant. One persistent criticism of computational evolution has been that the best-performing algorithms, the ones you'd actually want to use to study evolution, have abandoned biological fidelity for engineering convenience. Grimmer's work dissolves that trade-off. SGD, Natural Gradient Descent, and the Damped Newton's Method already satisfy the DLS constraints (once given evolutionarily faithful drift). Researchers can therefore use these high-performance algorithms as scientifically valid experimental platforms, knowing that what they observe corresponds to real evolutionary dynamics.
This matters for questions that are genuinely hard to study in living organisms. How does the rate of genetic drift interact with the shape of a fitness landscape to determine the rate of adaptation? What happens when a population encounters a completely flat region — a fitness plateau — with no gradient to follow? Grimmer shows (with a maze-like fitness function in Figure 1) that Fisher's deterministic mass selection can actually solve such mazes without genetic drift, through mutation pressure acting as a deterministic diffusion process. This is a concrete, testable prediction that the DLS framework makes possible.
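The idea of mutation acting as a deterministic diffusion can be illustrated on a much simpler landscape than the paper's maze. The example is a one-dimensional corridor of 50 cells with flat fitness everywhere except a single fitter cell at the far end. Everything below is an illustrative assumption (grid, mutation rule, fitness values), and `mutate` and `select` are hypothetical helpers. The population is tracked as a frequency vector, so there is no sampling and no drift. Even so, mutation spreads mass across the plateau, and selection then concentrates it on the peak.

```python
import numpy as np


def mutate(p, mu):
    """Deterministic mutation: a fraction mu of each cell moves to its neighbours (mu/2 each).

    Reflecting ends keep the total mass at exactly 1.
    """
    out = (1 - mu) * p
    out[1:] += 0.5 * mu * p[:-1]
    out[:-1] += 0.5 * mu * p[1:]
    out[0] += 0.5 * mu * p[0]
    out[-1] += 0.5 * mu * p[-1]
    return out


def select(p, fitness):
    """Fisher-style mass selection: reweight by fitness and renormalise (no sampling)."""
    q = p * fitness
    return q / q.sum()


n_cells = 50
fitness = np.ones(n_cells)
fitness[-1] = 2.0            # a single peak at the far end of a flat plateau
p = np.zeros(n_cells)
p[0] = 1.0                   # the whole population starts at the opposite end

for _ in range(2000):
    p = select(mutate(p, mu=0.2), fitness)
print(p[-1])                 # most of the population now sits on the peak
```

No random numbers are drawn anywhere. The plateau is crossed purely by the deterministic spreading that mutation causes.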
There is also a deeper conceptual realignment underway here. The persistent association of evolutionary computation with gradient-free optimization — the assumption that you only reach for evolutionary methods when you can't compute gradients — turns out to be a historical accident, not a mathematical truth. Grimmer is blunt about this: "The strict association of evolution with gradient-free methods is therefore an artifact of a particular computational tradition, not a reflection of evolution's own mathematical structure." Fitness gradients are what selection automatically computes. The population's variance is the learning rate. Evolution has always been doing gradient ascent; we just weren't looking at it that way.
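The claim that selection computes the fitness gradient can be checked by Monte Carlo without differentiating anything. The procedure is to sample a Gaussian population, weight each individual by its fitness, and compare the shift in the weighted mean with $\Sigma \nabla \ln \bar w$. The sketch assumes a Gaussian population and Gaussian-shaped fitness, so that the gradient has a closed form to compare against. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
m = np.zeros(2)                  # population mean genotype
sigma = np.eye(2)                # population variance (the "learning rate")
a = np.diag([1.0, 0.5])          # curvature of -ln w
theta = np.array([1.0, -1.0])    # fitness optimum

# Selection only: weight individuals by fitness; no derivatives are taken.
x = rng.multivariate_normal(m, sigma, size=200_000)
d = x - theta
w = np.exp(-0.5 * np.einsum("ni,ij,nj->n", d, a, d))
selection_shift = (w[:, None] * x).sum(axis=0) / w.sum() - m

# Closed-form Lande prediction: sigma @ grad ln(mean fitness).
grad_log_wbar = np.linalg.solve(sigma + np.linalg.inv(a), theta - m)
lande_shift = sigma @ grad_log_wbar

print(selection_shift, lande_shift)  # lande_shift is [0.5, -1/3]
```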
This also resolves a long-standing tension in the field between what Grimmer calls the "metaphor crisis" and the genuine scientific mandate of evolutionary computation. Previous bridges between gradient descent and evolutionary dynamics — notably the work of Kucharavy et al. (2023) on Gillespie-Orr Evolutionary Algorithms and Frank's (2025) Force-Metric-Bias framework — either remained limited to basic SGD or provided a taxonomic classification of algorithms without being able to say whether any given algorithm was truly evolutionarily faithful. The DLS framework is both generative and diagnostic: it tells you not just that an algorithm resembles evolution, but whether it is evolution, and if not, exactly what to change.
What's Next
The paper is explicit about its primary contribution being interpretive and foundational rather than algorithmic. Grimmer is not claiming to invent new optimizers. What is new is the revelation — and the proof — that the mathematical skeleton of Darwinian evolution has been quietly present inside tools that the machine learning community built for entirely different reasons.
Several open directions follow naturally. The DLS framework is currently derived for asexual reproduction; extending it to sexual reproduction would require handling genetic recombination, which breaks the clean decomposition into independent sub-populations. That is a substantially harder problem, but it would open up the framework to modeling a far wider range of biological systems.
There is also the question of what the DLS noise relation implies for practical optimizer design. The constraint links the amount of noise in each step to the change in the population's variance. This is not just a biological nicety. It may have practical consequences for optimizer stability and generalization, since it couples exploration (genetic drift) to the optimizer's current state in a principled way. Whether Adam-DLS outperforms vanilla Adam on real deep learning benchmarks remains to be tested, and Grimmer has made an interactive implementation available on GitHub for exactly this purpose.
Perhaps most intriguingly, the DLS framework raises the question of what other algorithms, not yet examined, might already be evolutionarily faithful — and what undiscovered optimizers might fall out of the evolutionary equations if you look in the right places. The paper demonstrates that the fitness landscape of mathematical optimization and the fitness landscape of biological evolution are not just analogous. In a precise, provable sense, they are the same landscape. We have been climbing it all along — we just didn't know whose footsteps we were following.