Radical + incremental innovation

The Hidden Math of Breakthroughs: Why Progress Happens in Bursts, and Why That's Universal

Progress doesn't flow. It lurches.

That intuition, that science and technology advance in sudden leaps separated by long, frustrating plateaus, has been with us since Thomas Kuhn described paradigm shifts in 1962 and since evolutionary biologists named the phenomenon "punctuated equilibrium" in the 1970s. But explanations of why this happens, and evidence that common mathematical rules lie beneath it, have remained elusive. The history of physics looks nothing like the history of protein biology. Formula 1 racing seems to have nothing to say to data science competitions. And yet, according to a sweeping new study by Yian Yin of Cornell University and Dashun Wang of Northwestern University, all of these fields obey the same three laws of breakthrough dynamics, laws so robust that a single minimal model reproduces them all (Yin & Wang, 2026).

The scale of the evidence is striking. The researchers assembled 6.8 million solutions to 6,700 distinct tasks, spanning nine domains: superconducting materials discovery, protein structure determination in structural biology, AI benchmark performance, biomedical research competitions, Kaggle data science contests, TopCoder algorithmic challenges, computer science programming competitions on NP-hard problems, Formula 1 lap records from 41 circuits between 1996 and 2024, and, as a remarkable control, a laboratory experiment in which 280 participants built physical wheels in chains of cultural transmission. The diversity was deliberate. If you find the same pattern in superconductors and wheel-building, you've found something real.

Figure 1: Punctuated dynamics of science and technology frontiers. a-i, Performance dynamics based on randomly selected examples from each of our nine datasets. Each black dot represents an individual solution, plotted by its relative temporal position ($n/n_{\max}$, x-axis) and performance ($x_{n}$, y-axis) within the sequence of submissions. The performance value $x_{n}$ is normalized as the percentile. The red curve highlights the frontier dynamics, showing the evolution of state-of-the-art performance over time. The grey curve indicates the median of $x_{n}$ up to a certain time point. Source: Yian Yin, Dashun Wang

The Science

The core question is deceptively simple: when you track a stream of competitive attempts at a task — thousands of AI algorithms attacking a benchmark, thousands of chemists discovering new superconductors — what statistical patterns govern when a new record gets set?

Yin and Wang formalized this precisely. Let $x_{n}$ be the performance score of the $n$-th attempt. The frontier after $n$ attempts is $x_{n}^{*} \equiv \max \{x_{1}, \dots, x_{n}\}$. An attempt is "record-breaking" if it beats all previous attempts. The study tracks three things: the waiting time $W_{n}$ between records (how many tries until the next breakthrough), the cumulative count $S_{n}$ of records set in the first $n$ attempts, and the temporal correlation between past and future record-setting.
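To make these definitions concrete, here is a minimal Python sketch that computes the frontier, $S_{n}$, $Q_{n}$ and $W_{n}$ for a short list of scores. The function name `frontier_quantities` and the example scores are illustrative, not taken from the paper. The sketch assumes that ties do not count as records and that the first attempt always sets the initial record.

```python
def frontier_quantities(scores):
    """Return (frontier, S, Q, W), one entry per attempt n = 1..len(scores).

    frontier[n-1] : best score among the first n attempts (x*_n)
    S[n-1]        : number of records set in the first n attempts (S_n)
    Q[n-1]        : attempts elapsed since the last record (Q_n)
    W[n-1]        : attempts until the next record (W_n), or None if the
                    sequence ends before another record is set
    """
    frontier, S, Q, record_times = [], [], [], []
    best = float("-inf")
    for n, x in enumerate(scores, start=1):
        if x > best:  # assumption: only a strict improvement is a record
            best = x
            record_times.append(n)
        frontier.append(best)
        S.append(len(record_times))
        Q.append(n - record_times[-1])

    W = []
    for n in range(1, len(scores) + 1):
        k = S[n - 1]  # records so far; the next one is record_times[k]
        W.append(record_times[k] - n if k < len(record_times) else None)
    return frontier, S, Q, W


if __name__ == "__main__":
    frontier, S, Q, W = frontier_quantities([3, 1, 4, 1, 5, 9, 2, 6])
    print("frontier:", frontier)
    print("S_n:", S)
    print("Q_n:", Q)
    print("W_n:", W)
```

In this toy sequence the records fall at attempts 1, 3, 5 and 6, so $W_{n}$ is undefined (`None`) from attempt 6 onward.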

Each dataset came with a full chronological sequence of all attempts, a well-defined performance metric, and enough entries to measure statistical distributions meaningfully. The researchers were careful to normalize across domains — comparing not raw scores but the shapes of distributions, rescaled by their own means, to ask whether patterns converge even when raw magnitudes differ wildly.

What They Found

Three regularities emerged with striking consistency. Together, they paint a picture of progress that is neither the smoothly compounding growth of popular imagination nor the completely chaotic churn of pure random chance.

Regularity 1: Waiting times are heavy-tailed.

The time between breakthroughs doesn't follow a bell curve. It follows a power law: a distribution with a "fat tail," meaning extreme waits are far more common than a normal distribution would predict. Specifically, the probability of waiting $w$ attempts for the next record decays roughly as $P(W_{n} = w) \sim w^{-\gamma}$ with $\gamma \in [2, 3]$ (Yin & Wang, 2026). When the researchers rescaled waiting times by their domain-specific averages, distributions from all nine datasets collapsed onto essentially the same curve.

This matters because it means most of the total effort across a field's history is consumed in a few very long dry spells. Think of a Lévy flight — the physics concept describing motion where the overall distance traveled is dominated by occasional giant jumps, not the accumulation of small steps. Scientific progress, the data suggest, works the same way.
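One common way to estimate a tail exponent like $\gamma$ is the continuous maximum-likelihood estimator discussed by Clauset et al. (2009), $\hat{\gamma} = 1 + m / \sum_{i} \ln(w_{i}/w_{\min})$ over the $m$ observations at or above a cutoff $w_{\min}$. The sketch below applies it to synthetic data. It is not necessarily the fitting procedure used in the paper. The function names and the cutoff `w_min` are assumptions, and the continuous approximation is only rough for integer waiting times.

```python
import math
import random


def estimate_gamma(waits, w_min=1.0):
    """Continuous power-law MLE for the tail exponent, using values >= w_min."""
    tail = [w for w in waits if w >= w_min]
    return 1.0 + len(tail) / sum(math.log(w / w_min) for w in tail)


def sample_power_law(size, gamma, w_min=1.0, seed=0):
    """Draw synthetic waits with density ~ w^(-gamma) by inverse-transform sampling."""
    rng = random.Random(seed)
    return [w_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0))
            for _ in range(size)]


if __name__ == "__main__":
    synthetic = sample_power_law(20_000, gamma=2.5)
    print("estimated gamma:", round(estimate_gamma(synthetic), 2))
```

With real waiting times the choice of `w_min` matters, since a power law typically describes only the tail of the distribution.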

Power-Law Exponent of Waiting Times (γ) Across Domains

The tail exponent $\gamma$ of the waiting-time distribution $P(W_{n}) \sim W_{n}^{-\gamma}$ for eight of the nine domains studied (the wheel-building experiment is not listed). The model predicts $\gamma \in [2, 3]$, reflecting a transition from incremental- to radical-dominated waiting regimes.

Domain	Exponent γ
Materials (D1)	2.1
Structural Biology (D2)	2.3
AI Benchmarks (D3)	2.2
Comp. Biomedicine (D4)	2.4
Kaggle (D5)	2.5
TopCoder (D6)	2.3
CS Competitions (D7)	2.6
Formula 1 (D8)	2.2

Regularity 2: Records accumulate sublinearly — but faster than expected.

Classic record statistics (the math of, say, tracking the tallest person ever measured as you sample a population) predicts that each successive record takes exponentially longer to beat. The cumulative count of records should grow only as $\langle S_{n} \rangle \sim \ln n$: logarithmically slow, meaning that 10,000 attempts yield only about two more records, on average, than 1,000 attempts do. At the other extreme, models from cultural evolution predict a linear accumulation rate, $\langle S_{n} \rangle \sim n$, where each new attempt has a constant probability of being a record.
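The logarithmic benchmark is easy to check numerically. For independent draws from any continuous distribution, the $k$-th draw is a record with probability $1/k$, so the expected number of records in $n$ draws is the harmonic number $H_{n} = \sum_{k=1}^{n} 1/k \approx \ln n + 0.577$. The sketch below compares that value with a small simulation. The function names, the uniform distribution and the trial count are illustrative choices.

```python
import math
import random


def count_records(n, rng):
    """Count record-setting draws among n independent uniform samples."""
    best, records = float("-inf"), 0
    for _ in range(n):
        x = rng.random()
        if x > best:
            best, records = x, records + 1
    return records


def harmonic(n):
    """Exact expected number of records for n i.i.d. continuous draws."""
    return sum(1.0 / k for k in range(1, n + 1))


def mean_records(n, trials=500, seed=0):
    """Average record count over repeated simulated sequences."""
    rng = random.Random(seed)
    return sum(count_records(n, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    n = 1000
    print("simulated mean:", mean_records(n))
    print("harmonic number:", round(harmonic(n), 3))
    print("ln n + 0.5772:", round(math.log(n) + 0.5772, 3))
```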

The data fall squarely between these poles, following $\langle S_{n} \rangle \sim n / \ln n$: faster than logarithmic, slower than linear (Yin & Wang, 2026). The finding suggests that frontiers are neither "fixed targets getting harder to hit" nor "moving targets refreshed with every attempt." Something in between is happening: new records create fresh opportunities that renew progress, but those opportunities eventually exhaust.

Frontier Record Accumulation: Observed vs. Theoretical Benchmarks

Schematic of how cumulative frontier records $\langle S_{n} \rangle$ grow with attempts $n$. Real data fall between the logarithmic (classical record statistics) and linear (cultural evolution) benchmarks, following the intermediate $n/\ln n$ regime predicted by the $p$ model. The values tabulated below are the logarithmic benchmark, $\ln n$.

Attempts	ln n (logarithmic benchmark)
n=10	2.3
n=100	4.6
n=1000	6.9
n=10000	9.2
n=100000	11.5
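The tabulated values coincide with $\ln n$ rounded to one decimal place. As a rough illustration of how far apart the three growth regimes sit, the sketch below evaluates $\ln n$, $n/\ln n$ and $n$ at the same values of $n$. These are scaling forms only: the constant prefactors are unknown, so the numbers show relative growth rather than predicted record counts. The function name `growth_benchmarks` is illustrative.

```python
import math


def growth_benchmarks(n):
    """Evaluate the three scaling forms at n (constant prefactors ignored)."""
    return {
        "logarithmic": math.log(n),       # classical record statistics
        "intermediate": n / math.log(n),  # regime reported for the data
        "linear": float(n),               # cultural-evolution benchmark
    }


if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000, 100_000):
        b = growth_benchmarks(n)
        print(f"n={n:>6}  ln n={b['logarithmic']:5.1f}  "
              f"n/ln n={b['intermediate']:8.1f}  n={b['linear']:8.0f}")
```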

Regularity 3: Breakthroughs cluster — and that makes the future unpredictable.

The most counterintuitive finding concerns time-dependence. Standard models assume record-breaking events are memoryless: whether a breakthrough arrives today should tell you nothing about whether one arrives tomorrow. Empirically, the opposite holds across all nine domains. There is a systematic positive correlation between $Q_{n}$ (attempts since the last record) and $W_{n}$ (attempts until the next record). If a field has been stuck for a long time, it's more likely to stay stuck. If it's been breaking records rapidly, more records tend to follow.

This clustering has a consequential mathematical implication. If record events were independent, the variance in how many records accumulate (the spread of outcomes across different instances of the same type of competition or domain) should stay bounded by the mean. Instead, the variance grows roughly as $\langle S_{n}^{2} \rangle - \langle S_{n} \rangle^{2} \sim \langle S_{n} \rangle^{2}$, up to a logarithmic correction, meaning the standard deviation is comparable in size to the mean itself (Yin & Wang, 2026). Two AI benchmark races that look identical on day one can end up wildly different by day 1,000, not because of any identifiable difference in their setups, but because of how early fluctuations compound.

Dataset Scale: Solutions Analyzed Per Domain

Number of individual solutions analyzed in each listed domain, totaling over 6.8 million. Kaggle dominates by volume; the wheel-building experiment, the smallest but most controlled baseline, is not listed.

Domain	Solutions
Materials (D1)	38,576
Structural Biology (D2)	143,343
AI Benchmarks (D3)	10,439
Comp. Biomedicine (D4)	15,529
Kaggle (D5)	5,792,702
TopCoder (D6)	188,807
CS Competitions (D7)	45,914
Formula 1 (D8)	589,081

Figure 2: Quantifying the punctuated frontier dynamics.
a, Key quantities that characterize the frontier dynamics. $S_{n}$ denotes the number of record-breaking events in the first $n$ attempts. $W_{n} \equiv N_{S_{n}+1} - n$ measures the waiting time until the next frontier, $N_{S_{n}+1}$, while $Q_{n} \equiv n - N_{S_{n}}$ measures the time elapsed since the last frontier, $N_{S_{n}}$.
b, The waiting time features a fat-tailed distribution, approximately following a power-law tail $P(W_{n}) \sim W_{n}^{-\gamma}$. Dashed lines represent a power-law tail with exponent $-2$, as a guide to the eye.
c, The growth of new frontiers $S_{n}$ in real data features a sublinear growth rate, lying in between the logarithmic and linear growth predicted by existing models.
d, We observe a consistent positive correlation between $W_{n}$ and $Q_{n}$ across all domains, suggesting recent record-breaking events are predictive of near-term occurrences.
e, The growth in variance, $\langle S_{n}^{2} \rangle - \langle S_{n} \rangle^{2} \gg \langle S_{n} \rangle$, is systematically higher than expected, highlighting the long-term unpredictability for the number of record-breaking activities in these systems.
Source: Yian Yin, Dashun Wang

Why This Changes Things

The empirical patterns are striking enough on their own. But what makes the paper genuinely significant is a model that explains why they arise — and does so with almost eerie parsimony.

Yin and Wang's insight is that existing models all share a hidden assumption: every new attempt is generated by the same mechanism. But in practice, two qualitatively different things happen in any innovative search. Sometimes an innovator makes a radical reset — draws a wholly new solution, essentially sampling a fresh point in the space of possibilities, untethered to whatever has come before. A chemist tries an entirely different family of compounds. An AI team proposes a fundamentally new architecture. Other times, an innovator makes an incremental refinement — takes the current best solution, tweaks one component, and checks whether performance improves.

This radical/incremental distinction appears constantly in the innovation literature under many names: exploration versus exploitation, disruptive versus sustaining technology, paradigm shift versus normal science. But it had never been embedded in a mathematical model of frontier dynamics.

Figure 3: Modeling incremental and radical innovations. a, Each solution is viewed as a combination of components, where each component is associated with a score. The overall performance of the solution is a weighted sum of these components. To propose a new solution, an innovator can take either (i) a radical approach with probability $p_{r}$ or (ii) an incremental approach with probability $p_{i} = 1 - p_{r}$. In radical innovations, the innovator draws new random scores for all components, independent of previous versions. In incremental innovations, the innovator focuses on one component each time, replacing this component with a random draw while keeping all other components unchanged. b-d, Illustrative examples of the model with $p_{i} = 0$ (b), $0.5$ (c), and $1$ (d), respectively. Green arrows represent new random versions of a component. Attempts highlighted with boldface scores and arrows represent new frontier solutions. Source: Yian Yin, Dashun Wang

The researchers' "$p$ model" formalizes exactly this. Each new attempt is a radical reset with probability $p_{r}$ or an incremental refinement with probability $p_{i} = 1 - p_{r}$. In the incremental mode, the innovator redraws one component of the current best solution at a time, and each successful refinement moves the search on to a new component. The model has exactly one free parameter, the balance between radical and incremental, and yet all three empirical regularities fall out analytically.
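The mechanism is simple enough to simulate in a few lines. The following Python sketch is one reading of the description above, not the authors' implementation. It assumes five components with equal weights, scores drawn uniformly from [0, 1], incremental attempts that redraw a single "focus" component of the current best solution, and a focus that advances to the next component whenever a record is set. The name `simulate_p_model` and all parameter names are hypothetical.

```python
import random


def simulate_p_model(n_attempts, p_r, n_components=5, seed=0):
    """Simulate a toy radical/incremental search; return record times N_1, N_2, ...

    p_r is the probability of a radical attempt; p_i = 1 - p_r is the
    probability of an incremental one. Component count, equal weights and
    uniform scores are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    best = [rng.random() for _ in range(n_components)]
    best_score = sum(best) / n_components
    focus = 0            # component targeted by incremental attempts
    record_times = [1]   # the first attempt sets the initial frontier

    for n in range(2, n_attempts + 1):
        if rng.random() < p_r:
            # Radical reset: redraw every component independently.
            candidate = [rng.random() for _ in range(n_components)]
        else:
            # Incremental refinement: copy the frontier, redraw one component.
            candidate = list(best)
            candidate[focus] = rng.random()
        score = sum(candidate) / n_components
        if score > best_score:
            best, best_score = candidate, score
            focus = (focus + 1) % n_components  # assumed: move to next component
            record_times.append(n)
    return record_times


if __name__ == "__main__":
    for p_r in (0.1, 0.5, 0.9):
        times = simulate_p_model(10_000, p_r)
        print(f"p_r={p_r}: {len(times)} records, last at attempt {times[-1]}")
```

Gaps between successive entries of the returned list are the waiting times between records, so the same output can feed a waiting-time histogram or a plot of the cumulative record count. Whether this toy version reproduces the paper's scaling laws depends on the assumptions listed above.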

The key predictions are:

$P(W_{n} = w) \sim \begin{cases} w^{-2}, & w \sim O(\ln n) \\ w^{-3}, & w \sim O(n) \end{cases}$

$\langle S_{n} \rangle \sim \frac{n}{\ln n}$

$\langle S_{n}^{2} \rangle - \langle S_{n} \rangle^{2} \sim \frac{n^{2}}{(\ln n)^{3}}$
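Dividing the variance prediction by the square of the mean prediction gives a rough sense of the relative size of fluctuations. This is a back-of-the-envelope step that ignores constant prefactors, which the scaling relations do not specify. The symbol $\sigma_{S_{n}}$ for the standard deviation of $S_{n}$ is introduced here for convenience.

```latex
\frac{\langle S_{n}^{2} \rangle - \langle S_{n} \rangle^{2}}{\langle S_{n} \rangle^{2}}
\sim \frac{n^{2} / (\ln n)^{3}}{n^{2} / (\ln n)^{2}}
= \frac{1}{\ln n},
\qquad
\frac{\sigma_{S_{n}}}{\langle S_{n} \rangle} \sim \frac{1}{\sqrt{\ln n}}
```

The relative spread therefore shrinks only logarithmically with $n$. For independent events whose variance stays bounded by the mean, the same ratio would instead fall off at least as fast as $1/\sqrt{\langle S_{n} \rangle}$.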

The heavy tail arises because incremental refinement creates "cascades": a breakthrough on one component shifts the search to the next component, making further breakthroughs likely in the near term, which is why records cluster. But once the current frontier is high enough, neither a local refinement nor a radical leap is likely to beat it, producing the long stasis. The intermediate growth rate $n / \ln n$ emerges because each new frontier creates a runway of incremental improvement opportunities that compound before saturating.

What makes this a genuine mathematical result — and not merely a numerical simulation — is that the key predictions are parameter-independent. The $p_{r}$ parameter cancels out in the leading-order terms. As long as a system has any nonzero fraction of both radical and incremental attempts, it will exhibit these three regularities. This defines what physicists call a universality class: a set of systems that share the same large-scale behavior despite differing in microscopic details. The universality class here is "any search process mixing radical resets with incremental refinements."

Figure 4: Predictions of the $p$ model. Despite its simplicity, the model systematically recovers all empirical patterns documented in Fig. 2. a-d, Same as Fig. 2b-e, but using model simulations with $p_{i} = 0.1$ (Row a), $0.5$ (Row b), and $0.9$ (Row c), respectively. Source: Yian Yin, Dashun Wang

That is why superconductor discovery and wheel-building look the same. They are both, at some level of abstraction, competitive searches with two innovation modes.

What's Next

The model also generates testable predictions about how to change the pace of progress — and the researchers checked two of them against data.

The first concerns knowledge openness. In the model, incremental innovations are only possible when participants can see the current frontier solution — you can't refine what you can't see. This means open environments, where the current best solution is publicly shared, should produce faster record accumulation. Yin and Wang tested this in theoretical computer science competitions, comparing periods when the leading solution was disclosed versus not. Open settings showed measurably faster record accumulation. In hybrid competitions that switched from closed to open at a specific date, the growth rate of records visibly accelerated at the transition point (Yin & Wang, 2026).

Figure 5: Testing additional model predictions on empirical data. a, In theoretical computer science competitions, fully open settings show a faster rate of record accumulation than non-disclosure intervals. b, In hybrid competitions, record accumulation accelerates after the transition from the closed phase to the open phase at the end of day 2.
c, In data-science competitions, current record holders maintain substantially higher record-setting rates ($s_{n} \sim 1/\ln n$) than other participants ($s_{n} \sim 1/n$). Source: Yian Yin, Dashun Wang

The second prediction concerns an asymmetric advantage for record holders. In the model, whoever currently holds the frontier record has a unique ability to perform incremental refinements, because they know exactly what the best solution looks like. The model predicts their record-setting rate should scale as $s_{n} \sim 1/\ln n$, faster than the $1/n$ rate of non-leaders. Checking against Kaggle competition data, the researchers confirmed this asymmetry empirically: current record holders set new records at substantially higher rates than other participants.
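To get a feel for how quickly the two scaling forms diverge, the sketch below evaluates $1/\ln n$ and $1/n$ side by side. Their ratio is $n/\ln n$, the same form that governs the overall record count. As before, constant prefactors are unknown, so the output indicates relative growth only. The function names are illustrative.

```python
import math


def leader_rate(n):
    """Scaling form of the record-setting rate for the current record holder."""
    return 1.0 / math.log(n)


def follower_rate(n):
    """Scaling form of the record-setting rate for other participants."""
    return 1.0 / n


def advantage_ratio(n):
    """Leader-to-follower ratio of the two scaling forms, equal to n / ln n."""
    return leader_rate(n) / follower_rate(n)


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"n={n:>6}  1/ln n={leader_rate(n):.3f}  "
              f"1/n={follower_rate(n):.5f}  ratio={advantage_ratio(n):.1f}")
```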

Both findings have direct implications for how science is organized and funded. The consistent finding that openness accelerates frontier growth aligns with decades of advocacy for open science, open-source AI benchmarks, and open data — but now with a mechanistic explanation for why. The asymmetric advantage of record-holders has a less comfortable implication: the current state-of-the-art lab or group is structurally positioned to compound its own advantage, not merely because of resources, but because of fundamental mathematical dynamics.

There are important caveats. The model is deliberately minimal — it captures the skeleton of frontier dynamics, not domain-specific details like funding cycles, team coordination, or the sociology of scientific credit. The nine domains, while diverse, all share the feature of having well-defined performance metrics and sequential attempts. Many important forms of scientific progress — theoretical unification, qualitative conceptual shifts — resist this kind of measurement. And the paper itself notes ongoing challenges in rigorously fitting power-law distributions to empirical data (Clauset et al., 2009 is cited as a methodological reference throughout).

Still, the finding that three quantitative patterns hold across all nine domains, from laboratory wheel-building experiments to Formula 1 circuits to nearly 5.8 million Kaggle submissions, is hard to dismiss as coincidence. It suggests that beneath the domain-specific details of any competitive search for better solutions, there is a deep common structure: long plateaus, clustered bursts, and an unpredictable long run.

The most important implication may be the last one. Yin and Wang's variance result, that fluctuations grow nearly as fast as the square of the mean number of records, means that fields which look identical in their early trajectories can end up vastly different. Some AI benchmark tasks might see ten times as many breakthroughs as others over a decade, not because one was inherently easier, but because of how early accidents compounded. If that's right, then a great deal of what we interpret as differential talent, investment, or opportunity in the history of science may in fact be amplified luck, a provocative idea that the universality of these dynamics makes harder to dismiss.

Understanding progress means understanding its mathematics. Yin and Wang have given us the first equations that seem to govern it universally — and with them, the possibility of designing research environments that produce more bursts, and shorter plateaus, across every frontier we care about advancing.
