The Hidden Math of Breakthroughs: Why Progress Happens in Bursts — and Why That's Universal

Progress doesn't flow. It lurches.
That intuition, that science and technology advance in sudden leaps separated by long, frustrating plateaus, has been with us since Thomas Kuhn described paradigm shifts in 1962 and since evolutionary biologists named the phenomenon "punctuated equilibrium" in the 1970s. But explanations of why this happens, and of whether common mathematical rules lie beneath it, have remained elusive. The history of physics looks nothing like the history of protein biology. Formula 1 racing seems to have nothing to say to data science competitions. And yet, according to a sweeping new study by Yian Yin and Dashun Wang of Northwestern University and Cornell, all of these fields obey the same three laws of breakthrough dynamics, laws so robust that a single minimal model reproduces them all (Yin & Wang, 2026).
The scale of the evidence is striking. The researchers assembled 6.8 million solutions to 6,700 distinct tasks, spanning nine domains: superconducting materials discovery, protein structure determination in structural biology, AI benchmark performance, biomedical research competitions, Kaggle data science contests, TopCoder algorithmic challenges, NP-hard computer science programming competitions, Formula 1 lap records from 41 circuits between 1996 and 2024, and — as a remarkable control — a laboratory experiment where 280 participants built physical wheels in chains of cultural transmission. The diversity was deliberate. If you find the same pattern in superconductors and wheel-building, you've found something real.
The Science
The core question is deceptively simple: when you track a stream of competitive attempts at a task — thousands of AI algorithms attacking a benchmark, thousands of chemists discovering new superconductors — what statistical patterns govern when a new record gets set?
Yin and Wang formalized this precisely. Let $x_n$ be the performance score of the $n$-th attempt. The frontier after $n$ attempts is the best score so far, $\max(x_1, \dots, x_n)$. An attempt is "record-breaking" if it beats all previous attempts, and the study tracks three things: the waiting time $W_n$ between records (how many tries until the next breakthrough), the cumulative count $S_n$ of records set in the first $n$ attempts, and the temporal correlation between past and future record-setting.
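These definitions can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name is hypothetical, and it assumes that higher scores are better and that ties do not count as records.

```python
def record_statistics(scores):
    """Return (record_indices, waiting_times, cumulative_records) for a score sequence.

    Assumptions for illustration: higher is better, ties are not records,
    and attempts are numbered starting from 1.
    """
    record_indices = []
    cumulative_records = []
    best = float("-inf")
    for i, score in enumerate(scores, start=1):
        if score > best:  # the attempt beats every previous attempt
            best = score
            record_indices.append(i)
        cumulative_records.append(len(record_indices))
    # Waiting time = number of attempts between consecutive records.
    waiting_times = [b - a for a, b in zip(record_indices, record_indices[1:])]
    return record_indices, waiting_times, cumulative_records


scores = [0.3, 0.1, 0.5, 0.4, 0.2, 0.9, 0.6]
records, waits, counts = record_statistics(scores)
print(records)  # [1, 3, 6]
print(waits)    # [2, 3]
print(counts)   # [1, 1, 2, 2, 2, 3, 3]
```

For a metric where lower is better, such as lap times, the same sketch applies after negating the scores.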
Each dataset came with a full chronological sequence of all attempts, a well-defined performance metric, and enough entries to measure statistical distributions meaningfully. The researchers were careful to normalize across domains — comparing not raw scores but the shapes of distributions, rescaled by their own means, to ask whether patterns converge even when raw magnitudes differ wildly.
What They Found
Three regularities emerged with striking consistency. Together, they paint a picture of progress that is neither the smoothly compounding growth of popular imagination nor the completely chaotic churn of pure random chance.
Regularity 1: Waiting times are heavy-tailed.
The time between breakthroughs doesn't follow a bell curve. It follows a power law, a distribution with a "fat tail," meaning extreme waits are far more common than a normal distribution would predict. Specifically, the probability of waiting $W_n$ attempts for the next record decays roughly as $P(W_n) \sim W_n^{-\gamma}$, with a tail exponent $\gamma$ between about 2 and 3 (Yin & Wang, 2026). When the researchers rescaled waiting times by their domain-specific averages, distributions from all nine datasets collapsed onto essentially the same curve.
This matters because it means most of the total effort across a field's history is consumed in a few very long dry spells. Think of a Lévy flight — the physics concept describing motion where the overall distance traveled is dominated by occasional giant jumps, not the accumulation of small steps. Scientific progress, the data suggest, works the same way.
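To make the idea of a tail exponent concrete, here is a hedged sketch of how one might estimate it from a list of waiting times. It uses the standard continuous maximum-likelihood estimator for a power-law tail, which is only an approximation for integer waiting times. The function names and the cutoff `x_min` are assumptions for illustration, not the fitting procedure used in the study.

```python
import math
import random


def tail_exponent_mle(samples, x_min):
    """Continuous maximum-likelihood estimate of gamma for P(x) ~ x^(-gamma), x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    if not tail:
        raise ValueError("no samples at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail samples equal x_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum


def sample_power_law(gamma, x_min, size, rng):
    """Synthetic power-law samples via inverse-transform sampling (used to check the estimator)."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(size)]


rng = random.Random(0)
synthetic = sample_power_law(gamma=2.5, x_min=1.0, size=50_000, rng=rng)
estimate = tail_exponent_mle(synthetic, x_min=1.0)
print(round(estimate, 2))  # should land close to the true value of 2.5
```

On real data the choice of `x_min` matters a great deal, which is one reason fitting power laws rigorously is harder than this sketch suggests.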
Power-Law Exponent of Waiting Times (γ) Across Domains
The tail exponent $\gamma$ of the waiting-time distribution $P(W_n) \sim W_n^{-\gamma}$ for eight of the nine domains studied (D1 to D8; the wheel-building experiment is not listed). The model predicts $\gamma \in [2, 3]$, reflecting a transition from incremental- to radical-dominated waiting regimes.
| Domain | Tail exponent γ |
|---|---|
| Materials (D1) | 2.1 |
| Structural Biology (D2) | 2.3 |
| AI Benchmarks (D3) | 2.2 |
| Comp. Biomedicine (D4) | 2.4 |
| Kaggle (D5) | 2.5 |
| TopCoder (D6) | 2.3 |
| CS Competitions (D7) | 2.6 |
| Formula 1 (D8) | 2.2 |
Regularity 2: Records accumulate sublinearly — but faster than expected.
Classic record statistics (the math of, say, tracking the tallest person ever measured as you sample a population) predicts that each successive record becomes exponentially harder to beat. The cumulative count of records should grow only as $\langle S_n \rangle \sim \ln n$: logarithmically slow, meaning that 10,000 attempts yield barely more records than 1,000. At the other extreme, models from cultural evolution predict linear accumulation, $\langle S_n \rangle \sim n$, where each new attempt has a constant probability of being a record.
The data fall squarely between these poles, following $\langle S_n \rangle \sim n/\ln n$: faster than logarithmic, slower than linear (Yin & Wang, 2026). The finding suggests that frontiers are neither "fixed targets getting harder to hit" nor "moving targets refreshed with every attempt." Something in between is happening: new records create fresh opportunities that renew progress, but those opportunities eventually exhaust.
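A quick numerical comparison shows how far apart these three scalings are. The sketch below ignores constant prefactors, and its function names are illustrative. The logarithmic benchmark rests on a standard fact: for independent, identically distributed continuous scores, the $k$-th attempt is a record with probability $1/k$, so the expected record count is the harmonic number $H_n \approx \ln n + 0.577$.

```python
import math


def harmonic_number(n):
    """Expected record count among n i.i.d. continuous draws (attempt k is a record w.p. 1/k)."""
    return sum(1.0 / k for k in range(1, n + 1))


def growth_benchmarks(n):
    """The three scalings discussed in the text, without constant prefactors."""
    return {"log": math.log(n), "intermediate": n / math.log(n), "linear": float(n)}


for n in (10, 100, 1_000, 10_000, 100_000):
    b = growth_benchmarks(n)
    print(
        f"n={n:>6}  H_n={harmonic_number(n):6.2f}  ln n={b['log']:5.1f}  "
        f"n/ln n={b['intermediate']:9.1f}  n={b['linear']:9.0f}"
    )
```

At $n = 100{,}000$ the logarithmic benchmark gives about a dozen records, while $n/\ln n$ gives several thousand, which is why the intermediate regime is easy to tell apart from both extremes in large datasets.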
Frontier Record Accumulation: Observed vs. Theoretical Benchmarks
Schematic of how cumulative frontier records $\langle S_n \rangle$ grow with attempts $n$. Real data fall between logarithmic (classical record statistics) and linear (cultural evolution) benchmarks, following the intermediate $n/\ln n$ regime predicted by the $p$ model. The values tabulated here correspond to the logarithmic benchmark, $\ln n$.
| Attempts $n$ | $\ln n$ (logarithmic benchmark) |
|---|---|
| n=10 | 2.3 |
| n=100 | 4.6 |
| n=1000 | 6.9 |
| n=10000 | 9.2 |
| n=100000 | 11.5 |
Regularity 3: Breakthroughs cluster — and that makes the future unpredictable.
The most counterintuitive finding concerns time-dependence. Standard models assume record-breaking events are memoryless: whether a breakthrough arrives today should tell you nothing about whether one arrives tomorrow. Empirically, the opposite holds across all nine domains. There is a systematic positive correlation between the number of attempts since the last record and the number of attempts until the next record. If a field has been stuck for a long time, it's more likely to stay stuck. If it's been breaking records rapidly, more records tend to follow.
This clustering has a consequential mathematical implication. If record events were independent, the variance in how many records accumulate (the spread of outcomes across different instances of the same type of competition or domain) should stay bounded by the mean. Instead, the variance grows as $\mathrm{Var}(S_n) \sim \langle S_n \rangle^2$, meaning the standard deviation is comparable to the mean itself (Yin & Wang, 2026). Two AI benchmark races that look identical on day one can end up wildly different by day 1,000, not because of any identifiable difference in their setups, but because of how early fluctuations compound.
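The "bounded by the mean" benchmark follows from a short calculation. As a sketch, write the record count as a sum of indicators $I_k$ (1 if attempt $k$ sets a record, 0 otherwise) with record probabilities $q_k$. These symbols are notation introduced here for illustration, not taken from the paper. If the indicators are independent:

$$
\mathrm{Var}(S_n) = \sum_{k=1}^{n} \mathrm{Var}(I_k) = \sum_{k=1}^{n} q_k (1 - q_k) \le \sum_{k=1}^{n} q_k = \langle S_n \rangle
$$

Under independence the standard deviation is therefore at most $\sqrt{\langle S_n \rangle}$, so the relative spread shrinks as records accumulate. A standard deviation that keeps pace with the mean is a signature of correlated record events.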
Dataset Scale: Solutions Analyzed Per Domain
Number of individual solutions analyzed per domain, totaling over 6.8 million. Kaggle dominates by volume; the wheel-building experiment, which is not listed in the table, provides the smallest but most controlled experimental baseline.
| Domain | Solutions analyzed |
|---|---|
| Materials (D1) | 38,576 |
| Structural Biology (D2) | 143,343 |
| AI Benchmarks (D3) | 10,439 |
| Comp. Biomedicine (D4) | 15,529 |
| Kaggle (D5) | 5,792,702 |
| TopCoder (D6) | 188,807 |
| CS Competitions (D7) | 45,914 |
| Formula 1 (D8) | 589,081 |
Why This Changes Things
The empirical patterns are striking enough on their own. But what makes the paper genuinely significant is a model that explains why they arise — and does so with almost eerie parsimony.
Yin and Wang's insight is that existing models all share a hidden assumption: every new attempt is generated by the same mechanism. But in practice, two qualitatively different things happen in any innovative search. Sometimes an innovator makes a radical reset — draws a wholly new solution, essentially sampling a fresh point in the space of possibilities, untethered to whatever has come before. A chemist tries an entirely different family of compounds. An AI team proposes a fundamentally new architecture. Other times, an innovator makes an incremental refinement — takes the current best solution, tweaks one component, and checks whether performance improves.
This radical/incremental distinction appears constantly in the innovation literature under many names: exploration versus exploitation, disruptive versus sustaining technology, paradigm shift versus normal science. But it had never been embedded in a mathematical model of frontier dynamics.
The researchers' "$p$ model" formalizes exactly this. Each new attempt is a radical reset with probability $p$ or an incremental refinement with probability $1 - p$. In the incremental mode, the model improves one component at a time, sequencing through components of the solution, each refinement resetting the search to a new component. The model has exactly one free parameter, the balance between radical and incremental attempts, and yet all three empirical regularities fall out analytically.
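The key predictions are a heavy-tailed waiting-time distribution, $P(W_n) \sim W_n^{-\gamma}$ with $\gamma \in [2, 3]$; sublinear record growth, $\langle S_n \rangle \sim n/\ln n$; and a variance that grows as the square of the mean, $\mathrm{Var}(S_n) \sim \langle S_n \rangle^2$.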
The heavy tail arises because incremental refinement creates "cascades": a breakthrough on one component shifts the search to the next component, making further breakthroughs likely in the near term — which is why records cluster. But once the current frontier is high enough, neither a local refinement nor a radical leap is likely to beat it, producing the long stasis. The intermediate growth rate emerges because each new frontier creates a runway of incremental improvement opportunities that compound before saturating.
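The two search modes can be illustrated with a toy simulation. This is a loose sketch, not the authors' model. Several choices are assumptions made here for illustration: a solution is a list of components drawn uniformly from [0, 1), its score is the sum of its components, a successful refinement moves the search to the next component, and a successful radical reset restarts at the first component.

```python
import random


def simulate_toy_search(n_attempts, p, n_components=5, seed=0):
    """Toy search mixing radical resets (prob. p) and incremental refinements (prob. 1 - p).

    ASSUMPTIONS (not from the paper): uniform component values, score = sum of
    components, and the rule for cycling through components described above.
    Returns the 1-based attempt numbers at which a new record was set.
    """
    rng = random.Random(seed)
    best = [rng.random() for _ in range(n_components)]
    best_score = sum(best)
    record_attempts = [1]  # the first attempt is a record by definition
    component = 0  # component currently targeted by incremental refinements
    for attempt in range(2, n_attempts + 1):
        radical = rng.random() < p
        if radical:
            # Radical reset: a wholly new solution, untethered to the frontier.
            candidate = [rng.random() for _ in range(n_components)]
        else:
            # Incremental refinement: copy the frontier and redraw one component.
            candidate = list(best)
            candidate[component] = rng.random()
        score = sum(candidate)
        if score > best_score:
            best, best_score = candidate, score
            record_attempts.append(attempt)
            component = 0 if radical else (component + 1) % n_components
    return record_attempts


records = simulate_toy_search(10_000, p=0.1)
print(len(records), records[:10])
```

Because the score here is bounded, this toy saturates and should not be expected to reproduce the paper's scaling laws. It only shows how a single parameter `p` can switch between the two kinds of attempt.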
What makes this a genuine mathematical result, and not merely a numerical simulation, is that the key predictions are parameter-independent. The parameter $p$ cancels out in the leading-order terms. As long as a system has any nonzero fraction of both radical and incremental attempts, it will exhibit these three regularities. This defines what physicists call a universality class: a set of systems that share the same large-scale behavior despite differing in microscopic details. The universality class here is "any search process mixing radical resets with incremental refinements."
That is why superconductor discovery and wheel-building look the same. They are both, at some level of abstraction, competitive searches with two innovation modes.
What's Next
The model also generates testable predictions about how to change the pace of progress — and the researchers checked two of them against data.
The first concerns knowledge openness. In the model, incremental innovations are only possible when participants can see the current frontier solution — you can't refine what you can't see. This means open environments, where the current best solution is publicly shared, should produce faster record accumulation. Yin and Wang tested this in theoretical computer science competitions, comparing periods when the leading solution was disclosed versus not. Open settings showed measurably faster record accumulation. In hybrid competitions that switched from closed to open at a specific date, the growth rate of records visibly accelerated at the transition point (Yin & Wang, 2026).
The second prediction concerns asymmetric advantage for record holders. In the model, whoever currently holds the frontier record has a unique ability to perform incremental refinements, because they know exactly what the best solution looks like. The model predicts that their record-setting rate should scale faster than the rate of non-leaders. Checking against Kaggle competition data, the researchers confirmed this asymmetry empirically. Record holders set records at disproportionately higher rates than all other participants combined.
Both findings have direct implications for how science is organized and funded. The consistent finding that openness accelerates frontier growth aligns with decades of advocacy for open science, open-source AI benchmarks, and open data — but now with a mechanistic explanation for why. The asymmetric advantage of record-holders has a less comfortable implication: the current state-of-the-art lab or group is structurally positioned to compound its own advantage, not merely because of resources, but because of fundamental mathematical dynamics.
There are important caveats. The model is deliberately minimal — it captures the skeleton of frontier dynamics, not domain-specific details like funding cycles, team coordination, or the sociology of scientific credit. The nine domains, while diverse, all share the feature of having well-defined performance metrics and sequential attempts. Many important forms of scientific progress — theoretical unification, qualitative conceptual shifts — resist this kind of measurement. And the paper itself notes ongoing challenges in rigorously fitting power-law distributions to empirical data (Clauset et al., 2009 is cited as a methodological reference throughout).
Still, the finding that three quantitative patterns hold across all nine domains, from laboratory wheel-building experiments to Formula 1 circuits to nearly 5.8 million Kaggle submissions, is hard to dismiss as coincidence. It suggests that beneath the domain-specific details of any competitive search for better solutions, there is a deep common structure: long plateaus, clustered bursts, and an unpredictable long run.
The most important implication may be the last one. Yin and Wang's variance result — that fluctuations grow as fast as the square of the mean number of records — means that fields which look identical in their early trajectories can end up vastly different. Some AI benchmark tasks might receive 10 times more breakthroughs than others over a decade, not because one was inherently easier, but because of how early accidents compounded. If that's right, then a great deal of what we interpret as differential talent, investment, or opportunity in the history of science may in fact be amplified luck — a provocative idea that the universality of these dynamics makes harder to dismiss.
Understanding progress means understanding its mathematics. Yin and Wang have given us the first equations that seem to govern it universally — and with them, the possibility of designing research environments that produce more bursts, and shorter plateaus, across every frontier we care about advancing.