Foundation Twins: AI Blueprint for Power Grid

Q: Timescale span

Microseconds to decades — The full range of physical and decision-making processes a power grid digital twin must cover simultaneously

Q: Architecture components

5 specialized models — Foundation Twins requires a Time-Series model, Power System simulator, State Estimator, Optimizer (Actor-Critic RL), and Memory module

Q: Key failure mode identified

Model collapse in VAEs & GANs — Vergara's group found that popular generative AI models fail to reproduce demand peaks and rare events critical for grid safety (Xia et al., 2024)

Q: Research status

Position paper — This is a roadmap, not a proof-of-concept — no trained Foundation Twin model exists yet

Q: Open research problems

5 core unsolved challenges — Including multi-timescale physics encoding, configuration-invariant state representation, and interpretable hierarchical RL

Every time the lights stay on during a summer heat wave, with air conditioners running flat out, rooftop solar flooding the midday grid, and a passing cloud bank or a lull in the wind cutting renewable output within minutes, it is because thousands of engineers and automated systems are making decisions across wildly different timescales simultaneously. Electromagnetic transients unfold in microseconds. A protection relay trips within milliseconds to isolate a fault. A frequency controller adjusts generation in seconds. An energy market clears in fifteen-minute blocks. Long-term grid expansion is planned over decades. These processes don't just coexist; they interact, collide, and sometimes conflict.

For at least a decade, the power industry has dreamed of a "digital twin" — a live, virtual replica of the entire grid that mirrors physical reality in real time and can be used to test decisions before they are made. The vision is compelling: instead of a blackout teaching you what you did wrong, a digital twin shows you in advance. But that vision has remained stubbornly theoretical. The core problem is not computational power or data availability. It is the multi-timescale problem: no single model can faithfully represent phenomena spanning microseconds and decades at the same time without becoming computationally intractable.

Pedro Vergara, a researcher at Delft University of Technology's Intelligent Electrical Power Grids group, argues in a new position paper (Vergara, 2026) that recent advances in AI finally make this problem tractable — if you approach it correctly. His proposal is called Foundation Twins.

Figure 2: Foundation Twins architecture composed of several foundation AI models to enhance simulation capabilities, complemented by an RL architecture for decision-making. The Manager is responsible for coordinating all other models and modules, as well as for interfacing measurements and control actions with the power systems infrastructure. Measurements from the physical power system are fed into the Time-Series Foundation Model at different time resolutions. The Power Systems Foundation Model serves as a simulator, while the State Estimator Foundation Model uses time-series data to estimate the current system state. Based on the current state, the Optimizer Module proposes future actions using an Actor, while expected (accumulated) rewards are estimated via the Critic. Future system states are provided via the Power System Foundation Model. Finally, the Memory Model stores state-action transitions (and their rewards) for future reference by the Optimizer Module. Source: Pedro P. Vergara

The Science

This is, by the author's own admission, not a conventional research paper. It is a position paper: a carefully argued blueprint for a research program, not a report of experimental results. Vergara synthesizes work from several fields — foundation AI models, physics-informed machine learning, reinforcement learning (RL), and power systems engineering — and assembles them into a single architectural vision.

The conceptual starting point is the well-known modeling trick that power systems engineers use to handle the timescale problem. The full state of a power system (every voltage, current, rotor angle, and tap position) can be split into three groups (Milano, 2010): slow variables $s_{I}$ (things like transformer tap positions or long-term load drift), variables of interest $s_{II}$ (the dynamics you actually care about right now), and fast variables $s_{III}$ (sub-cycle electromagnetic transients). From the perspective of any one process, the other two can often be treated as frozen or averaged away. This is how engineers build separate models for separate phenomena, and why those models have never been properly stitched together.
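The three-way split can be sketched in a few lines of Python. This is only an illustration, not the formulation in Milano (2010): the `Process` type, the `partition_by_timescale` helper, and the 0.1 s to 100 s study window are hypothetical, and the characteristic timescales are the approximate values from this article's own timescale chart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    name: str
    timescale_s: float  # approximate characteristic timescale in seconds


def partition_by_timescale(processes, window_lo_s, window_hi_s):
    """Split processes into slow, of-interest, and fast groups for one study window."""
    slow, interest, fast = [], [], []
    for p in processes:
        if p.timescale_s > window_hi_s:
            slow.append(p.name)       # s_I: treated as frozen during the study
        elif p.timescale_s < window_lo_s:
            fast.append(p.name)       # s_III: treated as averaged away
        else:
            interest.append(p.name)   # s_II: modeled explicitly
    return slow, interest, fast


# Approximate values taken from the article's timescale chart.
PROCESSES = [
    Process("Electromagnetic transients", 1e-6),
    Process("Protection systems", 1e-3),
    Process("Frequency control", 1.0),
    Process("Voltage control", 10.0),
    Process("AGC / Load following", 60.0),
    Process("Economic dispatch", 900.0),
    Process("Day-ahead market", 86_400.0),
    Process("Long-term planning", 315_000_000.0),
]

# Hypothetical study window: a control study covering 0.1 s to 100 s.
slow, interest, fast = partition_by_timescale(PROCESSES, 0.1, 100.0)
print("slow (frozen):      ", slow)
print("of interest:        ", interest)
print("fast (averaged out):", fast)
```

Moving the window changes which processes land in which group, which is why each conventional study ends up with its own separate model.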

Figure 1: Approximate power system processes and their different timescales. Taking all power system processes into account in a single model would result in an intractable model. Source: Pedro P. Vergara

Vergara's question is sharper than "can we do better?" It is: how do we build a single model that transfers information, uncertainty, and decisions across all these timescales, without relying on separate physics-based models that exploit dynamics separation? His answer is an architecture of multiple specialized foundation models — each with a clearly bounded task — coordinated by a central Manager and paired with a reinforcement learning system for decision-making.

Foundation models, for readers unfamiliar with the term, are large AI models trained on enormous datasets through self-supervision, meaning the model learns by predicting masked or missing parts of its own training data, without human-labeled examples. The transformer-based GPT models behind ChatGPT are the canonical example. What makes foundation models powerful is generalization: a single trained model can handle tasks it was never explicitly trained on. Vergara argues that this generalization property is exactly what power systems digital twins need, but that a single large model is not enough. The grid's physics are too diverse. Instead, he proposes an ensemble of specialized foundation models, each expert in a different aspect of the problem, orchestrated to work together.

What They Found

Because this is a position paper, "findings" here means something different: the paper's contribution is identifying precisely which problems need to be solved, and why current approaches fall short. That diagnostic precision is the intellectual substance.

The proposed Foundation Twins architecture has five main components (Vergara, 2026), illustrated in Figure 2:

1. The Time-Series Foundation Model ingests measurements from across the physical grid — SCADA systems sampling every minute, Phasor Measurement Units (PMUs) sampling 30–60 times per second — and generates coherent time-series data at whatever resolution the rest of the system needs. It also forecasts exogenous variables like solar generation and electricity demand.

2. The Power System Foundation Model is the architectural heart: a high-fidelity AI simulator that predicts future grid states across multiple timescales from a single model. This is where the physics-informed ML challenge is most acute. Current approaches, such as encoding physics through penalty terms in the training loss function, work reasonably well for single-timescale problems but break down, through either non-convergence or over-penalization, when asked to span the full range from sub-cycle transients to multi-hour dispatch.

3. The State Estimator Foundation Model takes the time-series data and infers the current state of the grid at whatever timescale the task demands. State estimation — figuring out what the grid is actually doing right now from noisy, incomplete sensor data — is a classical power systems problem. Vergara proposes doing it with a graph neural network (GNN) architecture, since power grids are naturally graph-like (nodes are buses, edges are transmission lines). His group's own PowerFlowNet work (Lin et al., 2024) shows the promise of this approach, while also revealing a stubborn limitation: GNNs still struggle to generalize when the grid's topology changes — when lines are switched in or out, or new generation is added.

4. The Optimizer Module is the decision-making engine. It follows the actor-critic design from reinforcement learning: an Actor proposes actions (dispatch a battery, reconfigure a feeder, shed load), and a Critic evaluates how good those actions are by estimating future accumulated rewards. Crucially, the Critic's estimates are grounded in simulated futures provided by the Power System Foundation Model — this is model-based RL, where the agent learns partly by imagining outcomes before acting.

5. The Memory Module stores state-action-reward transitions for the Actor and Critic to draw on, functioning like experience replay in classical deep RL.
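How the five components and the Manager might hand data to one another can be sketched as a toy control loop. Every class below is a hypothetical stub with made-up one-line logic (a scalar "deviation from nominal" state, a proportional actor, an absolute-value critic). None of it comes from the paper, which specifies no implementation; it only mirrors the data flow described above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    state: float
    action: float
    reward: float
    next_state: float


class TimeSeriesModel:
    """Stub: condenses raw measurements into one value at the needed resolution."""
    def resample(self, measurements):
        return sum(measurements) / len(measurements)


class StateEstimator:
    """Stub: maps the time-series value to a state (a deviation from nominal)."""
    def estimate(self, series_value):
        return series_value


class PowerSystemModel:
    """Stub simulator with toy dynamics: next_state = state + action."""
    def simulate(self, state, action):
        return state + action


class Optimizer:
    """Stub actor-critic pair."""
    def act(self, state):
        return -0.5 * state      # Actor: push the deviation toward zero

    def value(self, state):
        return -abs(state)       # Critic: toy estimate of accumulated reward


class MemoryModule:
    """Bounded store of state-action-reward transitions."""
    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        self.buffer.append(transition)


class Manager:
    """Coordinates the stubs in the order described in the text."""
    def __init__(self):
        self.time_series = TimeSeriesModel()
        self.estimator = StateEstimator()
        self.simulator = PowerSystemModel()
        self.optimizer = Optimizer()
        self.memory = MemoryModule()

    def step(self, measurements):
        state = self.estimator.estimate(self.time_series.resample(measurements))
        action = self.optimizer.act(state)
        next_state = self.simulator.simulate(state, action)  # imagined future
        reward = -abs(next_state)                            # toy reward signal
        self.memory.store(Transition(state, action, reward, next_state))
        return action, self.optimizer.value(next_state)


manager = Manager()
action, critic_value = manager.step([0.5, 1.5])
print(action, critic_value, len(manager.memory.buffer))
```

The point of the sketch is the ordering: measurements become a state estimate, the Actor proposes an action, the simulator imagines the consequence, the Critic scores it, and the transition is stored for later learning.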

Figure 3: Representation of a hierarchical learning problem with two-level learning tasks: a low-level and a high-level. Here, actions must be taken at different time scales, with low-level actions also impacting the goal of the high-level task. An approach to coordinate both level tasks is to distribute the high-level goal task ($\tilde{r}_{\Delta t+1}$) with the low-level sub-goals. For instance, enforcing $\tilde{r}_{\Delta t+1}=\sum_{i=1}^{3} r_{\Delta t+i/3}$. Note also the issue of state synchronization for both level tasks: At time $\Delta t$, state $s_{\Delta t}\approx\tilde{s}_{\Delta t}$. Similarly, at time $\Delta t+1$, state $s_{\Delta t+1}\approx\tilde{s}_{\Delta t+1}$. In power system applications, an example could be DERs dispatch in a market (high-level task) while accounting for voltage and frequency stability issues (low-level task). Source: Pedro P. Vergara

The paper's most detailed technical section concerns the multi-timescale decision problem in the Optimizer. This falls under hierarchical reinforcement learning, a research area where a long-horizon task is broken into a hierarchy of sub-goals assigned to controllers operating at different speeds. An intuitive power systems example: a day-ahead energy market (high-level task, 15-minute to 1-hour decisions) issues targets that a real-time voltage controller (low-level task, sub-second decisions) must satisfy while preventing local grid instability. In formal terms, the high-level reward $\tilde{r}_{\Delta t+1}$ must be distributed across the low-level sub-goals, for example by enforcing $\tilde{r}_{\Delta t+1} = \sum_{i=1}^{3} r_{\Delta t+i/3}$. Making this work without the sub-goals becoming incoherent or the hierarchy collapsing is, Vergara states plainly, "a largely unresolved problem in RL."
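The reward constraint is simple arithmetic, and a minimal sketch makes the bookkeeping concrete. The function name, the flat list layout (three low-level rewards per high-level interval), and the reward values are illustrative assumptions, not taken from the paper.

```python
def high_level_reward(low_level_rewards, interval, steps_per_interval=3):
    """Sum the low-level rewards that fall inside one high-level interval.

    Mirrors r~_{dt+1} = sum_{i=1..3} r_{dt + i/3}: each high-level step
    spans `steps_per_interval` low-level steps.
    """
    start = interval * steps_per_interval
    window = low_level_rewards[start:start + steps_per_interval]
    if len(window) != steps_per_interval:
        raise ValueError("incomplete high-level interval")
    return sum(window)


# Hypothetical low-level rewards for two high-level intervals.
low = [-1.0, -0.5, -0.25, 0.0, -2.0, -1.0]

first = high_level_reward(low, 0)
second = high_level_reward(low, 1)
print(first, second)  # -1.75 -3.0
```

The hard part the paper points to is not this sum but choosing low-level sub-goals so that maximizing each of them still maximizes the total.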

One finding that stands out as practically important, and somewhat alarming, is the collapse problem in generative AI for energy data. Vergara's group tested widely used generative models, including Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), for producing synthetic time-series energy data (Xia et al., 2024). Model collapse, a failure mode in which the generator converges to producing only a narrow slice of the true data distribution (usually called mode collapse in the GAN literature), caused these models to miss demand peaks and other rare but high-consequence events. For a digital twin, missing a demand peak is not a minor accuracy issue; it is a safety failure. The group's EnergyDiff (Lin et al., 2025) and FCPFlow (Xia et al., 2025) models are its own attempts to address this, and a recent unified generative framework (Lin et al., 2026) supports the concept of a single model architecture for multiple generation tasks. But the author is candid that "a model that unifies all these features within a single ML architecture remains missing."
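One simple way to see whether a generator is missing peaks is to compare how often its samples exceed a high quantile of the real data. The sketch below is an illustrative diagnostic, not the evaluation used in Xia et al. (2024): the `peak_coverage` helper is hypothetical, the "demand" data are synthetic lognormal draws, and the "collapsed" generator is imitated by clipping real data at its 90th percentile.

```python
import numpy as np


def peak_coverage(real, synthetic, q=0.99):
    """Ratio of observed to expected tail mass above the real q-quantile.

    Values near 1.0 mean the synthetic data reproduce peaks about as often
    as the real data; values near 0.0 mean peaks are missing.
    """
    threshold = np.quantile(real, q)
    observed_tail = np.mean(synthetic >= threshold)
    return observed_tail / (1.0 - q)


rng = np.random.default_rng(0)
n = 100_000

# Synthetic stand-in for real demand: a right-skewed distribution with rare peaks.
real = rng.lognormal(mean=0.0, sigma=0.5, size=n)

# A healthy generator: fresh draws from the same distribution.
healthy = rng.lognormal(mean=0.0, sigma=0.5, size=n)

# A collapsed generator: never produces anything above the real 90th percentile.
collapsed = np.minimum(real, np.quantile(real, 0.90))

healthy_score = peak_coverage(real, healthy)
collapsed_score = peak_coverage(real, collapsed)
print(f"healthy:   {healthy_score:.2f}")
print(f"collapsed: {collapsed_score:.2f}")
```

A mean-squared-error style metric would rate the clipped data as close to the real data, which is why tail-specific checks matter for grid applications.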

Power System Processes by Timescale

Approximate timescales of key power system physical phenomena and decision-making processes, spanning microseconds to decades. These processes must all be accounted for in a true digital twin.

Process	Approximate timescale (seconds)
Electromagnetic transients	0.000001
Protection systems	0.001
Frequency control	1
Voltage control	10
AGC / Load following	60
Economic dispatch	900
Day-ahead market	86,400
Long-term planning	315,000,000
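A quick calculation shows how wide this span is. The snippet below uses only the two extreme values listed above; the variable names are illustrative.

```python
import math

FASTEST_S = 0.000001        # electromagnetic transients
SLOWEST_S = 315_000_000     # long-term planning

# Width of the span in orders of magnitude (powers of ten).
span_orders = math.log10(SLOWEST_S / FASTEST_S)

# The slowest timescale expressed in years (365.25 days per year).
slowest_years = SLOWEST_S / (365.25 * 86_400)

# Number of steps a naive simulator would need to cover the slowest
# timescale at the resolution of the fastest one.
naive_steps = SLOWEST_S / FASTEST_S

print(f"span: {span_orders:.1f} orders of magnitude")
print(f"slowest timescale: {slowest_years:.1f} years")
print(f"naive step count: {naive_steps:.2e}")
```

The ratio is roughly 3 x 10^14, which gives a sense of why a single simulation at microsecond resolution over a planning horizon is not a realistic option.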

Required Capabilities of the Time-Series Foundation Model

Six key capabilities the Time-Series Foundation Model must have, rated by the degree to which current AI research has addressed them (1 = largely unsolved, 3 = partial progress, 5 = well addressed).

Capability	Rating (1 = largely unsolved, 5 = well addressed)
Physics-following	2
Multiple energy patterns	3
Extreme value modeling	2
Conditioning features	3
Multi-scale resampling	2
Data imputation	3
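Of these capabilities, multi-scale resampling is the easiest to illustrate. The sketch below shows the conventional, non-learned version: block-averaging a PMU-rate signal down to one-minute values. The 30 samples per second rate and the one-minute target echo the PMU and SCADA rates mentioned earlier in the article; the `downsample_mean` helper and the synthetic 50 Hz frequency trace are assumptions made for illustration. A foundation model would also have to go the other way, generating plausible high-resolution detail from coarse data, which simple averaging cannot do.

```python
import numpy as np


def downsample_mean(signal, factor):
    """Average consecutive blocks of `factor` samples, dropping any remainder."""
    usable = (len(signal) // factor) * factor
    return signal[:usable].reshape(-1, factor).mean(axis=1)


PMU_RATE_HZ = 30          # samples per second
DURATION_S = 120          # two minutes of data

t = np.arange(PMU_RATE_HZ * DURATION_S) / PMU_RATE_HZ

# Synthetic grid frequency: 50 Hz nominal with a slow 60-second oscillation.
frequency_hz = 50.0 + 0.01 * np.sin(2 * np.pi * t / 60.0)

# One value per minute, as a slower monitoring system might report.
per_minute = downsample_mean(frequency_hz, PMU_RATE_HZ * 60)

print(frequency_hz.shape, per_minute.shape)
print(per_minute)
```

The one-minute averages sit at the nominal 50 Hz because the oscillation averages out over each block, a small example of information that is lost when moving to a coarser timescale.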

Why This Changes Things

The gap that Foundation Twins is designed to close is not merely academic. The energy transition is putting unprecedented stress on the coordination problem. In 2010, most grids were dominated by large, dispatchable generators whose slow dynamics were well-understood. Today, the same grids are absorbing millions of distributed energy resources (DERs) — rooftop solar panels, home batteries, electric vehicle chargers — each small but collectively capable of destabilizing the system in ways that millisecond-scale physics governs but day-ahead planning never anticipated.

Current digital twin implementations address this problem by building separate models for separate timescales and trying to interface them manually. This is both fragile and limited: the interfaces introduce latency and approximations, and decisions made in one model are invisible to another until an explicit handoff occurs. A single operator sitting in a control room cannot see the full picture because the full picture does not exist in any one model.

The Foundation Twins vision collapses this separation. A single orchestrated system would, in principle, allow a market dispatch decision to be automatically stress-tested against sub-second voltage stability simulations before being committed — something that is computationally impossible today. It would allow uncertainty from a weather forecast to propagate coherently from a 72-hour load forecast all the way down to a 100-millisecond frequency response decision.

The physics-informed ML challenge is worth dwelling on because it is where many AI-for-power-systems projects hit a wall. Simply training a neural network on historical grid data produces a model that interpolates well but extrapolates dangerously: it may look accurate for everyday conditions and silently fail for the edge cases (storms, equipment failures, unusual demand patterns) that matter most. Vergara discusses several promising directions: continuum representations of grid physics (Henkes, 2022) that avoid the need for separate ODEs and DAEs; differentiable power flow formulations (Muhammed & Debus, 2026) that allow physics to be embedded directly into the learning objective; and joint-embedding predictive architectures (JEPA, as discussed in Dawid, 2024) for coordinating multi-timescale decisions. None of these is a solved problem for large-scale power systems, but the paper makes a case that they are the right problems to be working on.
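The penalty-term idea can be shown with a deliberately small example. The sketch below assumes the linear DC power flow approximation, where nodal injections satisfy P = B θ, as a stand-in for the grid physics. That simplification, the three-bus network, the susceptance values, and the weight `lam` are all illustrative assumptions and not the formulation used in any of the cited works.

```python
import numpy as np


def dc_susceptance_matrix(n_bus, lines):
    """Build the DC power flow matrix B from (from_bus, to_bus, susceptance) tuples."""
    B = np.zeros((n_bus, n_bus))
    for i, j, b in lines:
        B[i, i] += b
        B[j, j] += b
        B[i, j] -= b
        B[j, i] -= b
    return B


def physics_informed_loss(theta_pred, theta_meas, p_inj, B, lam):
    """Data-fit term plus a weighted penalty on the power-balance residual."""
    data_term = np.mean((theta_pred - theta_meas) ** 2)
    residual = B @ theta_pred - p_inj          # zero when physics is satisfied
    physics_term = np.mean(residual ** 2)
    return data_term + lam * physics_term, data_term, physics_term


# Hypothetical 3-bus triangle network, all lines with susceptance 10 p.u.
B = dc_susceptance_matrix(3, [(0, 1, 10.0), (1, 2, 10.0), (0, 2, 10.0)])

theta_true = np.array([0.0, -0.05, -0.10])     # voltage angles in radians
p_inj = B @ theta_true                          # consistent injections

# A prediction that fits the measurements loosely and violates power balance.
theta_bad = theta_true + np.array([0.0, 0.02, -0.01])

_, _, physics_true = physics_informed_loss(theta_true, theta_true, p_inj, B, lam=1.0)
loss_small, data_bad, physics_bad = physics_informed_loss(theta_bad, theta_true, p_inj, B, lam=1.0)
loss_large, _, _ = physics_informed_loss(theta_bad, theta_true, p_inj, B, lam=100.0)

print(f"physics residual (true angles): {physics_true:.2e}")
print(f"data term: {data_bad:.2e}, physics term: {physics_bad:.2e}")
print(f"loss with lam=1: {loss_small:.4f}, with lam=100: {loss_large:.4f}")
```

The weight `lam` is where the trade-off lives: set it too high and the penalty dominates the data term, set it too low and the model is free to violate the physics. With several timescales there would be several such terms to balance at once.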

The graph neural network limitation is similarly important. Power grids change topology constantly — protection systems operate, maintenance takes lines out of service, new solar farms connect. A state estimator that only works on the specific network configuration it was trained on is not a digital twin; it is a lookup table. Vergara is explicit that GNN-based models, including his own group's PowerFlowNet (Lin et al., 2024), still lack the configuration-invariance needed for real deployment. Solving this is one of the central open research problems identified by the paper.
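To see how topology enters a GNN, consider a single message-passing step in which each bus averages its own features with those of its neighbors. The sketch below is a generic mean-aggregation layer written with NumPy, not PowerFlowNet's architecture. The four-bus ring, the feature values, and the weight matrix are made up for illustration.

```python
import numpy as np


def adjacency(n_bus, lines):
    """Symmetric adjacency matrix from a list of (from_bus, to_bus) lines."""
    A = np.zeros((n_bus, n_bus))
    for i, j in lines:
        A[i, j] = 1.0
        A[j, i] = 1.0
    return A


def message_pass(A, H, W):
    """One mean-aggregation layer: average self and neighbor features, then transform."""
    A_hat = A + np.eye(len(A))                     # add self-loops
    degree = A_hat.sum(axis=1, keepdims=True)
    return np.tanh((A_hat / degree) @ H @ W)


# Hypothetical bus features (for example, scaled injections) and layer weights.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 0.0]])
W = np.array([[1.0, 0.5],
              [-0.5, 1.0]])

ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
ring_line_open = [(0, 1), (1, 2), (2, 3)]          # line between bus 3 and bus 0 switched out

out_closed = message_pass(adjacency(4, ring), H, W)
out_open = message_pass(adjacency(4, ring_line_open), H, W)

changed = ~np.isclose(out_closed, out_open).all(axis=1)
print("buses whose embedding changed:", np.flatnonzero(changed))
```

The same weights `W` run on either topology, but the embeddings at the switched line's end buses shift, and with more layers the change spreads further. A model trained only on the closed ring has never seen those inputs, which is one way to picture the generalization gap described above.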

What's Next

Vergara identifies five concrete open research questions that the Foundation Twins agenda requires answers to:

1. How do we encode multi-timescale physics in a single ML model without making it untrainable? Choosing which physics to encode at which timescale, and how, is not a solved problem. With too many physics constraints, the model fails to converge. With too few, it extrapolates dangerously in edge cases.

2. How do we represent power system state in a way that is invariant to network configuration? Long-vector representations are too high-dimensional; GNNs generalize poorly to unseen topologies. A hybrid approach combining sequence and graph representations may be promising, but remains untested at scale.

3. How do we build a Time-Series Foundation Model that handles geographic as well as temporal correlations? Most current work handles time but ignores space, for instance the correlation between solar generation in one region and another. This omission will degrade forecast quality for geographically distributed decisions.

4. How do we design a latent representation space for hierarchical RL that remains interpretable? Power system operators need to understand why the system recommended an action. A decision made in an opaque latent space is difficult to audit, certify, or override, which is a serious problem for safety-critical infrastructure. Vergara's own SAVGO work (Orfanoudakis & Vergara, 2026) provides a starting point, but interpretability remains unresolved.

5. How do we decompose long-term goals into coherent short-term sub-goals across many timescales? The hierarchical RL problem is open in general, and especially so for power systems where the number of interacting timescales can exceed four or five distinct bands.

This is a paper that matters precisely because it refuses to oversell. Vergara does not claim Foundation Twins exist or are imminent. He claims they are conceivable — and that the path to building them runs through a set of specific, named, tractable research problems. In an era when AI hype frequently outpaces engineering reality, that kind of honest roadmapping is its own contribution.

The stakes are high enough to justify the ambition. Grids that can make multi-timescale decisions coherently are grids that can absorb more renewable energy, prevent more blackouts, and respond more intelligently to the chaotic weather that climate change is already delivering. A digital twin that actually works — that sees the grid whole, from microseconds to decades — would be one of the most consequential pieces of software the energy transition could produce. Foundation Twins is the clearest map yet of what it would take to build it.
