Works under errors-in-variables with heteroskedastic noise

Finds number of differential & algebraic equations automatically

Each equation corresponds to a physical law

Requires no input on model order or delays

How AI Can Discover Physical Laws from Noisy Data

When engineers model the physical world—from power grids to chemical reactors—they rely on mathematical equations that capture both how systems evolve over time and the constraints that bind them. For decades, those models have been split into two categories: dynamic equations, which describe change, and static equations, which describe balance. But real systems don’t work that way. A distillation column doesn’t just obey differential equations for temperature and flow; it also obeys algebraic equations for mass balance—right now, at every moment. The full picture requires differential-algebraic equations (DAEs), a hybrid class of models that mix both types.

Yet until now, identifying these models from real-world data—especially noisy data—has been a blind spot. Most system identification methods assume either perfect inputs or only dynamic evolution. When both inputs and outputs are measured with error (as they always are), and when the system includes both dynamics and instantaneous constraints, existing tools fail.

Not anymore. In a new paper, Deepanjhan Das, Vishwesh Ramanathan, and Shankar Narasimhan introduce DISPCA, an algorithm that extracts the full structure of linear DAE systems directly from noisy measurements, without prior knowledge of model order, delays, or noise levels. It identifies not just that a system has dynamics, but how many differential equations there are, what order each is, how many algebraic constraints exist, and all the coefficients, all while correcting for measurement noise that varies across sensors.

The result? A model that’s not only accurate but physically interpretable. Each identified equation corresponds to a real physical law—conservation of mass, energy balance, circuit laws—rather than a statistical abstraction. This is the difference between a black-box predictor and a scientific explanation.

And it works: in simulations of an RC circuit and a three-tank fluid system, DISPCA recovers the true model structure and estimates the coefficients to within a few percent of their true values, even when measurements are corrupted by heteroskedastic noise, meaning different sensors have different noise levels, a common industrial reality.

This isn’t just a technical advance. It’s a step toward autonomous scientific discovery in engineering—where machines don’t just predict, but understand.

The Science

The challenge begins with what we measure. In real-world systems—chemical plants, power networks, biological circuits—sensors are imperfect. Inputs (like flow rates or voltages) and outputs (like temperatures or pressures) are all recorded with error. This is the errors-in-variables (EIV) setting, and it breaks most classical identification methods. Ordinary least squares, for instance, assumes only outputs are noisy; when inputs are also corrupted, it produces biased estimates.
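The bias is easy to reproduce on a toy problem. The sketch below is not from the paper: it fits a straight line through the origin with a true slope of 2, adds noise of equal (assumed) standard deviation to both variables, and compares ordinary least squares with total least squares, a classical EIV estimator used here only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
true_slope = 2.0

# Noise-free relation y = 2 x.
x_true = rng.normal(0.0, 1.0, n)
y_true = true_slope * x_true

# Both sensors are noisy; equal noise levels are an assumption of this toy.
noise_std = 0.5
x_meas = x_true + rng.normal(0.0, noise_std, n)
y_meas = y_true + rng.normal(0.0, noise_std, n)

# Ordinary least squares treats x_meas as exact, so the slope is pulled toward zero.
ols_slope = np.sum(x_meas * y_meas) / np.sum(x_meas ** 2)

# Total least squares: the right singular vector with the smallest singular
# value is normal to the best-fit line through the origin.
_, _, vt = np.linalg.svd(np.column_stack([x_meas, y_meas]), full_matrices=False)
normal = vt[-1]
tls_slope = -normal[0] / normal[1]

print(f"OLS slope: {ols_slope:.3f}")
print(f"TLS slope: {tls_slope:.3f}")
```

With these settings the OLS slope should settle near 2 × 1 / (1 + 0.25) = 1.6, the textbook attenuation factor, while the TLS slope stays close to 2.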

Worse, most EIV methods assume the system is purely dynamic—governed by ordinary differential equations (ODEs). But many systems are descriptor systems, governed by DAEs. These combine:

Differential equations, which describe how variables evolve over time (e.g., how a tank’s fluid level changes with inflow and outflow), and
Algebraic equations, which describe instantaneous constraints (e.g., that the sum of flows into a junction must equal the sum out, at every moment).
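For reference, a linear DAE is often written in the standard descriptor form below. The symbols $E$, $A$, $B$, $x$, and $u$ follow common textbook notation and are not taken from the paper.

$$E\,\dot{x}(t) = A\,x(t) + B\,u(t)$$

When $E$ is invertible, this is an ordinary state-space ODE. When $E$ is singular, any vector $w$ with $w^{\top} E = 0$ gives $0 = w^{\top} A\,x(t) + w^{\top} B\,u(t)$, a relation with no derivatives in it, which is exactly an algebraic constraint.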

The authors’ goal was to identify the full DAE model from noisy time-series data—without knowing in advance how many differential equations there are, what their orders are, or how many algebraic constraints exist.

Their solution, DISPCA (Dynamic Iterative-Sequential Principal Component Analysis), is a three-stage hybrid method:

Iterative error estimation: First, it estimates the measurement error variances for each sensor, assuming the noise is independent and heteroskedastic (i.e., each sensor has its own noise level). This is done iteratively, refining the estimate as the model improves.
Segregation of constraints: Using a scaled principal component analysis (PCA) on lagged data, it separates the total number of linear constraints into differential and algebraic components.
Sequential equation identification: It identifies each differential equation one at a time, using a novel partial stacking of lagged variables, ensuring that each equation is recovered in its minimal, physically meaningful form.
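The "lagged data" used in the second and third stages can be pictured as a matrix in which each row holds the current sample followed by earlier ones. The helper below is a minimal sketch of that stacking; the name `stack_lags` and the ordering of the columns are assumptions made for illustration, not the paper's notation.

```python
import numpy as np

def stack_lags(Z, lag):
    """Return rows [z(k), z(k-1), ..., z(k-lag)] for every k with a full history."""
    n_samples, _ = Z.shape
    if lag < 0 or lag >= n_samples:
        raise ValueError("lag must satisfy 0 <= lag < number of samples")
    # Block j holds the data delayed by j samples, aligned on the same rows.
    blocks = [Z[lag - j : n_samples - j] for j in range(lag + 1)]
    return np.hstack(blocks)

# Six samples of two variables, numbered so the stacking is easy to read.
Z = np.arange(12, dtype=float).reshape(6, 2)
Z_lagged = stack_lags(Z, lag=2)

print(Z_lagged.shape)  # (4, 6)
print(Z_lagged[0])     # [4. 5. 2. 3. 0. 1.]
```

Stacking only some of the lagged columns, rather than all of them, is the idea behind the partial stacking mentioned in the third stage.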

The key innovation is this sequential approach. Instead of trying to identify all equations at once—a problem that suffers from rotation ambiguity (where different linear combinations of equations fit the data equally well)—DISPCA isolates one output variable at a time, identifies its governing difference equation, and moves on. This avoids the need for symbolic manipulation or index reduction, which are computationally expensive and numerically unstable for complex systems.

The method assumes the system is linear and time-invariant, and that inputs are random binary sequences (a common excitation signal in system identification). But critically, it requires no prior specification of model order, delay, or noise levels.

What They Found

The authors tested DISPCA on two simulated systems: a simple RC electrical circuit and a three-tank liquid-level system. In both cases, the algorithm recovered the true model structure with high accuracy.

Fig. 3: Schematic of the simple RC circuit, highlighting the manipulated input voltage $U$ and the measured variables: voltage $V$ and differential output $X$. Source: Deepanjhan Das, Vishwesh Ramanathan

In the RC circuit (Fig. 3), the system has one differential variable (the capacitor voltage) and one algebraic variable (the resistor voltage). The true model consists of:

One first-order differential equation: $V (k) + a_{1} V (k - 1) = b_{0} U (k)$
One algebraic equation: $X (k) = U (k) - V (k)$

where $U$ is the input voltage, $V$ is the capacitor voltage, and $X$ is the resistor voltage.

From noisy measurements of $U$, $V$, and $X$, DISPCA correctly identified:

$\hat{n}_{d} = 1$ differential equation
$\hat{n}_{a} = 1$ algebraic equation
The differential equation order $\hat{\eta}_{1} = 1$
The delay $\hat{D} = 0$

It also estimated the coefficients closely. For the differential equation, the true parameters were $a_{1} = -0.8$, $b_{0} = 0.2$. DISPCA estimated $\hat{a}_{1} = -0.799$, $\hat{b}_{0} = 0.201$, relative errors of roughly 0.1% and 0.5%, respectively.
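To give a feel for how coefficients like these can be pulled out of noisy data, the sketch below simulates the RC difference equation above and estimates $a_1$ and $b_0$ with a single scaled PCA step on the columns $[V(k), V(k-1), U(k)]$. It is a simplified illustration, not the DISPCA algorithm: the noise standard deviations are hypothetical and assumed known, whereas DISPCA estimates them iteratively.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
a1, b0 = -0.8, 0.2  # true values quoted in the article

# Random binary input and noise-free capacitor voltage:
# V(k) + a1 V(k-1) = b0 U(k)
u = rng.choice([-1.0, 1.0], size=n)
v = np.zeros(n)
for k in range(1, n):
    v[k] = -a1 * v[k - 1] + b0 * u[k]

# Hypothetical, sensor-specific noise levels (assumed known in this sketch).
sigma_u, sigma_v = 0.10, 0.05
u_meas = u + rng.normal(0.0, sigma_u, n)
v_meas = v + rng.normal(0.0, sigma_v, n)

# Stack only the variables that appear in this one equation.
Z = np.column_stack([v_meas[1:], v_meas[:-1], u_meas[1:]])
sigmas = np.array([sigma_v, sigma_v, sigma_u])
Zs = Z / sigmas  # every column now has unit noise variance

# The eigenvector of the smallest eigenvalue spans the constraint direction.
eigvals, eigvecs = np.linalg.eigh(np.cov(Zs, rowvar=False))
c = eigvecs[:, 0] / sigmas  # undo the scaling
c = c / c[0]                # normalise the coefficient of V(k) to 1

a1_hat, b0_hat = c[1], -c[2]
print(f"a1_hat = {a1_hat:.3f}, b0_hat = {b0_hat:.3f}")
```

Scaling each column by its noise standard deviation is what makes the smallest-eigenvalue direction a sensible estimate when sensors differ in accuracy; without it, the noisier sensor would distort the fit.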

Fig. 5: Scree plot associated with the scaled and lagged data matrix $\mathbf{Z_{S}}_{4,5}$ of the RC circuit. The red dashed line ($y = \log(1) = 0$) indicates the threshold for unity eigenvalues used to determine the total linear constraints $\hat{d}$. Source: Deepanjhan Das, Vishwesh Ramanathan

Fig. 6: Scree plot associated with the scaled, unlagged data matrix $\mathbf{Z_{S}}_{4}$ of the RC circuit, used to identify the number of algebraic relations $\hat{n}_{a}$. Source: Deepanjhan Das, Vishwesh Ramanathan

The scree plot from the lagged data matrix (Fig. 5) shows a clear drop in eigenvalues after the first two, indicating two total constraints. A second scree plot on unlagged data (Fig. 6) shows one eigenvalue below the threshold, confirming one algebraic constraint. The difference gives the number of differential constraints.
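The unity-eigenvalue idea can be tried on a simulated RC circuit. The sketch below assumes the noise standard deviations are known (hypothetical values), scales the data by them, and counts eigenvalues of the covariance matrix that sit near 1. The function name `count_unity_eigenvalues`, the tolerance, and the choice of a one-sample lag are illustrative assumptions; how the paper chooses its lag and converts such counts into $\hat{n}_{d}$ and $\hat{n}_{a}$ is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
a1, b0 = -0.8, 0.2

# Noise-free RC circuit: one difference equation and one algebraic relation.
u = rng.choice([-1.0, 1.0], size=n)
v = np.zeros(n)
for k in range(1, n):
    v[k] = -a1 * v[k - 1] + b0 * u[k]
x = u - v  # X(k) = U(k) - V(k)

# Hypothetical noise standard deviations for U, V, X (assumed known here).
sigmas = np.array([0.10, 0.05, 0.08])
Z = np.column_stack([u, v, x]) + rng.normal(0.0, sigmas, size=(n, 3))

def count_unity_eigenvalues(Z, sigmas, lag, tol=0.5):
    """Count covariance eigenvalues below 1 + tol after noise scaling and lagging."""
    Zs = Z / sigmas
    m = Zs.shape[0]
    stacked = np.hstack([Zs[lag - j : m - j] for j in range(lag + 1)])
    eigvals = np.linalg.eigvalsh(np.cov(stacked, rowvar=False))
    return int(np.sum(eigvals < 1.0 + tol)), eigvals

n_unlagged, ev0 = count_unity_eigenvalues(Z, sigmas, lag=0)
n_lag1, ev1 = count_unity_eigenvalues(Z, sigmas, lag=1)

print("unlagged:", n_unlagged, np.round(ev0, 2))
print("lag 1:   ", n_lag1, np.round(ev1, 2))
```

In this toy, the unlagged data show one near-unity eigenvalue (the algebraic relation). With a lag of one sample, the algebraic relation holds at both $k$ and $k-1$ and the difference equation adds one more constraint, so three near-unity eigenvalues appear. That bookkeeping is specific to this sketch.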

RC Circuit: True vs Estimated Differential Equation Coefficients
Parameter	True	Estimated
a₁	-0.8	-0.799
b₀	0.2	0.201

Fig. 7: Schematic of the non-interacting three-tank liquid-level system. The inlet flow rate $q(t)$ is the manipulated input, while the tank level $h_{3}(t)$ and outlet flows ($q_{1}(t)$, $q_{3}(t)$) represent the output variables which are measured around the nominal operating point for the current experiment. Source: Deepanjhan Das, Vishwesh Ramanathan

In the three-tank system (Fig. 7), the complexity increases: three tanks connected in series, with one input (inlet flow $q$) and three outputs (tank level $h_3$ and outlet flows $q_1, q_3$). The system has:

Two differential variables (tank levels $h_1, h_3$)
Two algebraic variables (flows $q_1, q_3$)
Two differential equations of different orders

DISPCA again correctly identified:

$\hat{n}_{d} = 2$, $\hat{n}_{a} = 2$
Observability indices $\hat{\eta}_{1} = 1$, $\hat{\eta}_{2} = 2$ (i.e., one first-order, one second-order differential equation)
All model coefficients within 1–3% of true values

Three-Tank System: True vs Estimated Observability Indices
Index	True	Estimated
η₁	1	1
η₂	2	2

The algorithm also estimated the measurement error variances iteratively. Starting from an initial guess, it converged to values within 5% of the true noise levels—even though the noise was heteroskedastic (different for each sensor). This is critical: without accurate noise estimates, EIV methods fail.

Why This Changes Things

The implications of DISPCA extend far beyond circuit diagrams and tank levels. It represents a shift from modeling as calibration to modeling as discovery.

Today, engineers build models by writing down physics-based equations and then fitting parameters to data. That works when the structure is known. But in complex, poorly understood systems—biological networks, climate subsystems, industrial plants with undocumented modifications—the structure itself is unknown.

DISPCA offers a way to reverse-engineer the laws from data. Not just approximate them, but recover them in a form that mirrors physical reality. Each identified equation can be mapped to a conservation law, a constitutive relation, or a kinematic constraint. This is not just predictive power—it’s explanatory power.

Consider chemical process monitoring. A reactor might have hundreds of sensors, but its internal dynamics are governed by a much smaller number of mass and energy balances. DISPCA could sift through noisy sensor data and extract the true differential and algebraic structure—revealing whether a constraint is being violated, or whether a new dynamic mode has emerged due to fouling or catalyst decay.

Or consider power grids. Voltage and current measurements are always noisy. Grid stability depends on both dynamic responses (inertia, damping) and algebraic constraints (Kirchhoff’s laws). DISPCA could help identify reduced-order models that preserve both, enabling better control and fault detection.

Even in machine learning, where black-box models dominate, DISPCA offers a path to hybrid AI—where neural networks handle uncertainty and complexity, but symbolic models provide interpretability and safety. Imagine a deep learning system that uses DISPCA to extract symbolic equations from its own hidden representations, then checks them against physical laws.

The method also addresses a long-standing statistical problem: consistent estimation under EIV with heteroskedastic noise. Many EIV methods assume homoskedastic noise (same variance across sensors), but real sensors have different accuracies. DISPCA’s iterative estimation of the per-sensor error variances handles this directly, making it better suited to real-world conditions.

And by identifying equations individually, it enables hypothesis testing—you can test whether a coefficient is zero, whether a delay exists, or whether a constraint is satisfied. This is impossible with global models that mix all equations together.

What’s Next

DISPCA is a major step forward, but it’s not the final word. The current method is limited to linear systems. Real-world systems are often nonlinear—think of fluid dynamics, enzyme kinetics, or transistor behavior. Extending this approach to nonlinear DAEs is the next frontier.

The authors note that their method assumes the inputs are random binary sequences. While this is common in system identification experiments, many real systems operate under closed-loop control or natural variability. Adapting DISPCA to handle correlated or feedback-driven inputs would broaden its applicability.

Another open question is scalability. The partial stacking procedure works well for small to medium systems, but for systems with dozens or hundreds of variables, the computational cost could grow quickly. Techniques like sparsity enforcement or distributed computation may be needed.

Finally, there’s the question of causality. DISPCA identifies statistical constraints, but physical laws are causal. Future work could integrate causal discovery methods—like conditional independence tests or intervention analysis—to ensure that the identified equations reflect true cause-effect relationships, not just correlations.

Still, the core idea—sequential, individual equation identification with iterative noise correction—could inspire new approaches across fields. In climate science, could we identify the minimal set of differential and algebraic equations governing ocean-atmosphere coupling? In neuroscience, could we extract the dynamic and static constraints of neural circuits from EEG data?

The dream of automated scientific discovery—machines that don’t just learn patterns but uncover laws—has always been tantalizing. DISPCA doesn’t get us all the way there, but it takes a concrete step: from data to equations, from noise to understanding.

As the authors write, “The identification of individual equations in their respective minimal forms directly recovers [structural invariants], providing a complete and non-redundant characterization of both the differential and algebraic structure.” That’s not just a technical achievement. It’s a vision of machines that don’t just predict the world—but comprehend it.
