Steps to derive the Kalman filter from LQR: 2 (one homogeneous-coordinates embedding, one matrix partition)

Extra state dimensions added by the trick: 1 (a single scalar α(t) = 1 is appended to the state vector)

DRE sub-equations after partitioning: 3 (one for P, the Riccati equation; one for p₁, the observer gain; one for p₀, the cost scalar)

Years since Kalman's original paper: 65+ (original stochastic formulation published 1960–1961)

Uncertainty parameters minimized in the deterministic filter: 3 (process noise w, measurement noise v, and initial state xᵢ)

Kalman Filter Derived from LQR in Two Steps

The Kalman filter is one of the most successful algorithms in the history of engineering. It guided Apollo astronauts to the Moon, it stabilizes the camera in your phone, it reconstructs positions from GPS signals, and it sits inside nearly every modern autopilot and self-driving car. It was introduced in 1960 by Rudolf Kálmán in a paper so compact and assured that it reads almost like a revelation. Yet for sixty-five years, students encountering it for the first time have faced the same stumbling block: the Kalman filter is traditionally derived from probability theory, while the closely related Linear Quadratic Regulator (LQR), a method for designing optimal controllers, is derived from optimal control. The two share equations that look almost identical, a symmetry so conspicuous it has its own name: the "duality" between estimation and control. But the textbook proof that makes this duality rigorous has always felt like a detour through stochastic calculus, covariance matrices, and Gaussian assumptions.

Bassam Bamieh, a control theorist at UC Santa Barbara, argues in a 2026 tutorial that the detour is unnecessary (Bamieh, 2026). The Kalman filter is not merely analogous to the LQR. It is an LQR problem in disguise. Reaching it requires exactly two steps — one trick borrowed from computer graphics, and one act of careful matrix bookkeeping — and neither step requires a single random variable.

The Science

The paper is a tutorial, which means its contribution is not a new theorem but a new route through well-worn terrain. The value lies in clarity of argument and in what the argument illuminates along the way.

The starting point is the standard LQR problem. In plain terms: you have a system whose state evolves over time according to linear differential equations, and you want to choose an input signal $u(t)$ that minimizes a cost $J$, a weighted sum of how large the state gets and how much input energy you spend. Mathematically, this is

$J(u, x_{i}) = \int_{0}^{T} \left( x^{*} Q x + u^{*} R u \right) dt + \| x(T) \|_{X}^{2}$

where $Q$ and $R$ are weight matrices chosen by the designer, and the final term penalizes the state at the end of the horizon. The solution comes from the Differential Riccati Equation (DRE), a matrix-valued ordinary differential equation that propagates backward in time from $T$ to $0$, producing a feedback gain that tells you the optimal input at every moment. This is $\mathrm{LQR}_{i}$, the LQR with initial conditions specified.
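To make the backward-running DRE concrete, here is a minimal numerical sketch. It assumes controlled dynamics of the form $\dot{x} = A x + B u$ (the input matrix $B$ is not written out in this article) and uses the textbook form of the DRE. The double-integrator system, the weights, and the horizon are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import solve_continuous_are

# Illustrative system (double integrator); all numbers are assumptions.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])      # assumed input matrix, dx/dt = Ax + Bu
Q = np.eye(2)                     # state weight
R = np.array([[1.0]])             # input weight
X_T = np.zeros((2, 2))            # terminal weight on x(T)
T = 20.0
n = A.shape[0]

def dre_rhs(t, p_flat):
    """dP/dt from the DRE  -dP/dt = A'P + PA - P B R^{-1} B' P + Q."""
    P = p_flat.reshape(n, n)
    dP = -(A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T) @ P + Q)
    return dP.ravel()

# Integrate backward in time: start at t = T with P(T) = X_T, end at t = 0.
sol = solve_ivp(dre_rhs, (T, 0.0), X_T.ravel(), rtol=1e-9, atol=1e-11)
P0 = sol.y[:, -1].reshape(n, n)

# Feedback gain at t = 0: the optimal input is u = -K0 x.
K0 = np.linalg.solve(R, B.T @ P0)

# On a long horizon P(0) should approach the algebraic Riccati solution.
P_inf = solve_continuous_are(A, B, Q, R)
print("max |P(0) - P_inf| =", np.max(np.abs(P0 - P_inf)))
```

Passing a decreasing time span `(T, 0.0)` to `solve_ivp` is what makes the integration run backward; the comparison with `solve_continuous_are` is only a sanity check that the long-horizon solution settles to the steady-state one.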

Bamieh also carefully treats a less-common twin: $\mathrm{LQR}_{f}$, the LQR with final conditions specified. Here the target is where the system ends up at time $T$, not where it starts, and the DRE runs forward in time instead of backward. Its optimal value is called the cost-to-arrive function, a concept familiar in the Moving Horizon Estimation literature but not always connected cleanly to the filtering problem. This forward-running DRE is the natural home of the Kalman filter, and making that connection explicit is the paper's central achievement.

The estimation problem itself is posed in a deliberately non-probabilistic way, following arguments made by the late Jan Willems, one of the most philosophically careful figures in control theory (Willems, 2002; Willems, 2004). The system model is:

$\dot{x}(t) = A x(t) + w(t), \qquad y(t) = C x(t) + v(t)$

where $y(t)$ is the only signal you can actually measure. Everything else (the process disturbance $w(t)$, the measurement noise $v(t)$, and the initial state $x_{i}$) is unknown. Given one observed signal $y_{[0, T]}$, there are infinitely many triples $(w_{[0, T]}, v_{[0, T]}, x_{i})$ that could have produced it. Which one should you believe?

The answer Bamieh adopts is an engineering version of Occam's razor: choose the triple that is smallest. Formally, minimize the weighted $L^{2}$ cost

$J(v, w, x_{i}) := \| v_{[0, T]} \|_{V}^{2} + \| w_{[0, T]} \|_{W}^{2} + \| x_{i} \|_{X}^{2}$

subject to the system equations. The weight matrices $V$, $W$, $X$ encode how much you trust your sensors versus your model versus your prior on the initial state. This is not a probabilistic statement. It is a least-squares statement, in the tradition of the method that Legendre and Gauss applied to computing orbits.
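Written out, and assuming the usual convention that the weighted norm of a signal is the time integral of its weighted quadratic form (the article does not spell this out), the three terms are

$\| v_{[0, T]} \|_{V}^{2} = \int_{0}^{T} v^{*}(t) V v(t) \, dt, \qquad \| w_{[0, T]} \|_{W}^{2} = \int_{0}^{T} w^{*}(t) W w(t) \, dt, \qquad \| x_{i} \|_{X}^{2} = x_{i}^{*} X x_{i}$

Substituting $w = \dot{x} - A x$ and $v = y - C x$ from the system equations then leaves a cost in the unknown trajectory $x$ alone, with $x(0) = x_{i}$:

$J = \int_{0}^{T} \left[ (y - C x)^{*} V (y - C x) + (\dot{x} - A x)^{*} W (\dot{x} - A x) \right] dt + x_{i}^{*} X x_{i}$

In this form the measurement $y$ is known data and $x$ is the only thing left to choose.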

What They Found

The problem as stated is not a standard LQR problem. When the measurement residual $(y - C \hat{x})$ is expanded, it produces terms that are quadratic, linear, and constant in the state $\hat{x}$:

$(y - C \hat{x})^{*} V (y - C \hat{x}) = \underbrace{\hat{x}^{*} C^{*} V C \hat{x}}_{\text{quadratic}} \; \underbrace{- \, y^{*} V C \hat{x} - \hat{x}^{*} C^{*} V y}_{\text{linear}} \; + \underbrace{y^{*} V y}_{\text{constant}}$

Bamieh calls such a cost affine-quadratic. The LQR machinery, as standardly stated, only handles purely quadratic costs. The linear and constant terms — which carry all the information from the actual measurement $y$ — seem to break the framework.

Step one is the fix. The paper introduces homogeneous coordinates, a technique anyone who has taken a computer graphics course will recognize. The idea is disarmingly simple: append a new scalar state variable $\alpha(t)$ to the system, enforce that $\alpha(t) = 1$ for all $t$, and use this constant "1" to absorb the linear and constant terms. The enlarged state vector, dynamics matrix, and cost matrix are

$\xi := \begin{bmatrix} x \\ \alpha \end{bmatrix}, \qquad \hat{A} := \begin{bmatrix} A & 0 \\ 0 & 0 \end{bmatrix}, \qquad \hat{H} := \begin{bmatrix} H & h_{1}^{*} \\ h_{1} & h_{0} \end{bmatrix}$

With this embedding, the affine-quadratic cost becomes $\int_{0}^{T} \left( \xi^{*} \hat{H} \xi + u^{*} R u \right) dt$, which is purely quadratic in the enlarged state $\xi$. The problem is now a standard LQR, just one dimension bigger. Crucially, the homogenized problem has a non-standard constraint: either the initial or final value of $\alpha$ must equal $1$, which selects whether to use $\mathrm{LQR}_{i}$ or $\mathrm{LQR}_{f}$.
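The embedding is easy to check numerically. The sketch below takes the measurement-residual cost $(y - Cx)^{*} V (y - Cx)$ for a single sample, reads off its quadratic, linear, and constant parts, stacks them into an enlarged cost matrix, and confirms that the purely quadratic form in $\xi = (x, 1)$ reproduces the affine-quadratic value. The dimensions and random data are illustrative assumptions; real-valued matrices are used, so the conjugate transpose reduces to a plain transpose.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2                          # state and measurement sizes (illustrative)
C = rng.standard_normal((m, n))
M = rng.standard_normal((m, m))
V = M @ M.T + np.eye(m)              # symmetric positive-definite weight
y = rng.standard_normal(m)           # one measurement sample
x = rng.standard_normal(n)           # an arbitrary candidate state

# Parts of (y - Cx)' V (y - Cx) = x'Hx + 2 h1.x + h0
H = C.T @ V @ C                      # quadratic part
h1 = -(y @ V @ C)                    # linear part (length-n array)
h0 = y @ V @ y                       # constant part

# Enlarged cost matrix laid out as [[H, h1'], [h1, h0]]
H_hat = np.block([[H, h1[:, None]],
                  [h1[None, :], np.array([[h0]])]])

xi = np.append(x, 1.0)               # homogeneous coordinates: alpha = 1

affine_cost = (y - C @ x) @ V @ (y - C @ x)
homogeneous_cost = xi @ H_hat @ xi   # purely quadratic in xi
print(affine_cost, homogeneous_cost)
```

The two printed numbers agree for any choice of `x`, which is the whole point of the trick: the measurement data moves out of the cost's linear term and into the entries of one slightly larger matrix.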

A consequence of this step deserves to be highlighted. The paper shows that for purely quadratic LQR problems, the optimal controller is memoryless — just a matrix gain applied to the current state. But for affine-quadratic problems, the optimal solution includes a dynamical system. The reason now becomes transparent: the homogeneous coordinate $α = 1$ must be tracked through time by an internal model, and that internal model is the observer dynamics. In the LQ-tracking (servomechanism) problem, this manifests as an anti-causal feedforward term. In the estimation problem, the same mathematics produces a causal, forward-running observer. Same algebra. Different direction of time.

Step two is the payoff. Having solved the enlarged LQR problem with the DRE for $\hat{P}$, partition that solution conformably with the block structure of $\hat{A}$:

$\hat{P} = \begin{bmatrix} P & p_{1}^{*} \\ p_{1} & p_{0} \end{bmatrix}$

Substituting this partition into the DRE for the enlarged system and expanding block by block yields three coupled equations: one for the $n \times n$ block $P$, one for the $n$-vector $p_{1}$, and one for the scalar $p_{0}$. The equation for $P$ is precisely the Kalman filter's Riccati equation. The equation for $p_{1}$ is the differential equation governing the observer's correction term (the role played by the Kalman gain). The equation for $p_{0}$ evolves the minimum achievable cost and is typically discarded in practice. The optimal state estimate $\hat{x}$ emerges as the $x$-component of the optimal $\xi$ trajectory, and the estimate's dynamics are exactly the familiar Kalman observer:

$\dot{\hat{x}} = A \hat{x} + L (y - C \hat{x})$

where $L$ is determined by the Riccati solution. The Kalman gain is not postulated here; it is derived as the minimizer of a deterministic least-squares problem.
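The block-by-block bookkeeping can be sanity-checked numerically. The sketch below assumes the generic controlled form $\dot{x} = A x + B u$ with the textbook Riccati right-hand side $A^{*}P + PA - P B R^{-1} B^{*} P + H$; the input matrix $B$, the sign convention, and the random test matrices are assumptions for illustration, and the paper's estimation setting may use a different time direction. It shows only that the enlarged right-hand side splits into one matrix block, one vector block, and one scalar block.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 3, 2                              # state and input sizes (illustrative)

A = rng.standard_normal((n, n))
B = rng.standard_normal((n, k))          # assumed input matrix
R = np.eye(k)
H = np.eye(n)
h1 = rng.standard_normal((1, n))         # linear cost term, stored as a row
h0 = np.array([[0.7]])

# A symmetric trial value for the enlarged Riccati variable
G = rng.standard_normal((n, n))
P = G @ G.T
p1 = rng.standard_normal((1, n))
p0 = np.array([[1.3]])

zc, zr = np.zeros((n, 1)), np.zeros((1, n))
A_hat = np.block([[A, zc], [zr, np.zeros((1, 1))]])
B_hat = np.vstack([B, np.zeros((1, k))])
H_hat = np.block([[H, h1.T], [h1, h0]])
P_hat = np.block([[P, p1.T], [p1, p0]])

def riccati_rhs(A, B, R, H, P):
    """Right-hand side A'P + PA - P B R^{-1} B' P + H of the DRE."""
    return A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T) @ P + H

big = riccati_rhs(A_hat, B_hat, R, H_hat, P_hat)

# The three sub-equations obtained by expanding the blocks by hand
S = B @ np.linalg.solve(R, B.T)
rhs_P = A.T @ P + P @ A - P @ S @ P + H   # n x n block: a Riccati equation
rhs_p1 = p1 @ (A - S @ P) + h1            # 1 x n block: linear in p1
rhs_p0 = h0 - p1 @ S @ p1.T               # scalar block: driven by p1

print(np.allclose(big[:n, :n], rhs_P),
      np.allclose(big[n:, :n], rhs_p1),
      np.allclose(big[n:, n:], rhs_p0))
```

Note the cascade structure: the $P$ block does not depend on $p_{1}$ or $p_{0}$, the $p_{1}$ block is linear once $P$ is known, and the scalar block is driven by $p_{1}$ alone.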

Two LQR Problem Variants and Their DRE Boundary Conditions

Structural comparison of the two LQR formulations introduced in the paper: LQR with initial conditions (LQRᵢ) runs its Riccati equation backward from a final-state penalty, while LQR with final conditions (LQRf) runs forward from an initial-state penalty. The Kalman filter emerges from LQRf.

Variant	DRE direction	Boundary condition	Value function
LQRᵢ	backward	at t = T	cost-to-go
LQRf	forward	at t = 0	cost-to-arrive

The paper also recovers the connection between deterministic and stochastic formulations (Bamieh, 2026, Sec. 4.1). The weight matrices $V$, $W$, $X$ in the deterministic problem correspond to the inverses of the noise covariance matrices in the stochastic Kalman filter: $V = R_{v}^{-1}$, $W = R_{w}^{-1}$, $X = P_{0}^{-1}$. The two formulations are mathematically equivalent; choosing between them is, as Willems put it, "a matter of taste rather than logical necessity."
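With that correspondence in hand, the classical continuous-time (Kalman–Bucy) filter can be run directly from the deterministic weights. The sketch below is written in the familiar covariance form, with $\Sigma$ denoting the covariance-form Riccati variable; how $\Sigma$ maps onto the paper's block $P$ is not spelled out in this article, so treat the notation as an assumption. The system, the weights, and the noise-free test signal are illustrative, and the estimate starts at zero because the cost penalizes $\| x_{i} \|_{X}^{2}$ around the origin.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import solve_continuous_are

# Illustrative system and weights (assumptions, not from the paper)
A = np.array([[0.0, 1.0], [-1.0, -0.2]])
C = np.array([[1.0, 0.0]])
V = np.array([[4.0]])      # weight on measurement residual v
W = np.eye(2)              # weight on disturbance w
X = np.eye(2)              # weight on initial state
R_v, R_w, P_0 = np.linalg.inv(V), np.linalg.inv(W), np.linalg.inv(X)
n, T = 2, 30.0

def rhs(t, z):
    """Joint dynamics of true state, estimate, and Riccati variable."""
    x, x_hat = z[:n], z[n:2 * n]
    Sigma = z[2 * n:].reshape(n, n)
    y = C @ x                                  # noise-free measurement
    L = Sigma @ C.T @ V                        # gain, Sigma C' R_v^{-1}
    dx = A @ x
    dx_hat = A @ x_hat + L @ (y - C @ x_hat)   # observer equation
    dSigma = A @ Sigma + Sigma @ A.T + R_w - Sigma @ C.T @ V @ C @ Sigma
    return np.concatenate([dx, dx_hat, dSigma.ravel()])

# True state starts at (1, 0); the estimate starts at 0; Sigma(0) = X^{-1}.
z0 = np.concatenate([[1.0, 0.0], np.zeros(n), P_0.ravel()])
sol = solve_ivp(rhs, (0.0, T), z0, rtol=1e-9, atol=1e-11)

x_T, x_hat_T = sol.y[:n, -1], sol.y[n:2 * n, -1]
Sigma_T = sol.y[2 * n:, -1].reshape(n, n)

# Steady-state check against the algebraic filter Riccati equation
Sigma_inf = solve_continuous_are(A.T, C.T, R_w, R_v)
print("estimation error at T:", np.linalg.norm(x_hat_T - x_T))
print("max |Sigma(T) - Sigma_inf|:", np.max(np.abs(Sigma_T - Sigma_inf)))
```

Unlike the control DRE, this Riccati equation is integrated forward from $t = 0$, which is why the filter can run in real time as measurements arrive.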

Homogeneous-Coordinate Embedding: State Space Growth

The homogeneous-coordinates trick adds exactly one dimension to the state vector and one extra row and column to each system matrix, converting an affine-quadratic problem to a purely quadratic LQR. The sizes before and after embedding for an n-dimensional system:

Quantity	Before embedding	After embedding
State vector dimension	n	n + 1
A-matrix size	n×n	(n+1)×(n+1)
Cost matrix H size	n×n	(n+1)×(n+1)
Number of DRE sub-equations	1	3
Optimal solution structure	static gain	static gain plus a dynamical term

Why This Changes Things

The significance here is partly pedagogical and partly philosophical, but pedagogy and philosophy have a way of shaping what engineers actually build.

Textbooks on control systems typically present the Kalman filter and the LQR in adjacent chapters, note the "remarkable duality" between them, and then derive each separately using its own set of assumptions. Students emerge understanding that the two are related but not why they must be. The standard stochastic derivation requires a machinery of conditional expectations and Gaussian distributions that, while beautiful, occludes the structural reason for the similarity: both problems minimize a quadratic cost over the trajectories of a linear system. One looks forward in time; the other looks backward. That is almost the entire difference.

By showing the two-step path, Bamieh makes several things visible that were previously hidden. First, the derivation clarifies why estimation requires a dynamical observer while control with a purely quadratic cost uses only a static gain. The answer is homogeneous coordinates: when your cost function has a linear term (as it does in estimation, because the measurement $y$ enters linearly), the optimal solution must carry an internal memory of that term. Second, it reveals that the $\mathrm{LQR}_{f}$ problem (LQR with final conditions, which most textbooks treat as a footnote) is the natural home of the filtering problem. The cost-to-arrive function, running forward from time $0$ to time $T$, is propagated by the filtering Riccati equation. Third, the construction generalizes cleanly: the LQ-tracking (servomechanism) problem, where a controller must track a reference signal, emerges from exactly the same homogeneous-coordinate embedding applied to $\mathrm{LQR}_{i}$ rather than $\mathrm{LQR}_{f}$.

The deterministic framing also has practical implications for engineers who work in settings where "noise is Gaussian" is a hard assumption to defend — process control, structural monitoring, biological systems. The least-squares interpretation of the Kalman filter says: this is the state trajectory that requires the least amount of disturbance to explain your data. That statement needs no probability distribution to be meaningful or useful.

What's Next

The tutorial as published is explicitly introductory. Bamieh assumes the reader is familiar with the standard LQR and its DRE solution, and the derivation is carried through for finite-horizon, possibly time-varying systems. Several natural extensions are left to the wider literature rather than developed in the paper itself.

The infinite-horizon (steady-state) version of the Kalman filter follows from taking $T \to \infty$. This is the form used in most real implementations, where the Riccati equation has been solved offline and the gain is constant. The paper does not work through this limit explicitly, though the machinery for it is present. Similarly, the discrete-time Kalman filter, which is what actually runs on digital hardware, requires a parallel argument using difference equations rather than differential ones; the homogeneous-coordinates trick applies equally well there.
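The least-squares reading of the filter is easy to test in discrete time. The sketch below is not taken from the paper: it uses the standard discrete-time Kalman recursion and an illustrative system, and assumes the same weight-to-covariance correspondence (inverse weights play the role of covariances, prior estimate zero). It solves one batch least-squares problem over the whole trajectory and compares the final state with the recursive filter's final estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, N = 2, 1, 25                       # sizes and horizon (illustrative)
A = np.array([[1.0, 0.1], [0.0, 0.95]])
C = np.array([[1.0, 0.0]])
X = np.eye(n)                            # weight on x_0
W = 10.0 * np.eye(n)                     # weight on w_k = x_{k+1} - A x_k
V = 4.0 * np.eye(m)                      # weight on v_k = y_k - C x_k
ys = rng.standard_normal((N, m))         # arbitrary data; the identity holds for any y

def sqrt_factor(M):
    """Return F with F.T @ F = M, so that ||F r||^2 = r' M r."""
    return np.linalg.cholesky(M).T

# Batch least squares over the stacked trajectory (x_0, ..., x_{N-1})
rows, rhs = [], []

def add_block(coeffs, target, F):
    """Append weighted residual rows F (sum_k M_k x_k - target)."""
    row = np.zeros((F.shape[0], n * N))
    for k, Mk in coeffs:
        row[:, k * n:(k + 1) * n] = F @ Mk
    rows.append(row)
    rhs.append(F @ target)

add_block([(0, np.eye(n))], np.zeros(n), sqrt_factor(X))
for k in range(N - 1):
    add_block([(k + 1, np.eye(n)), (k, -A)], np.zeros(n), sqrt_factor(W))
for k in range(N):
    add_block([(k, C)], ys[k], sqrt_factor(V))

traj, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs), rcond=None)
x_batch_last = traj[-n:]                 # least-squares estimate of x_{N-1}

# Recursive Kalman filter with covariances equal to the inverse weights
x_hat, P = np.zeros(n), np.linalg.inv(X)
Qn, Rn = np.linalg.inv(W), np.linalg.inv(V)
for k in range(N):
    K = P @ C.T @ np.linalg.inv(C @ P @ C.T + Rn)   # measurement update
    x_hat = x_hat + K @ (ys[k] - C @ x_hat)
    P = (np.eye(n) - K @ C) @ P
    if k < N - 1:                                   # time update
        x_hat = A @ x_hat
        P = A @ P @ A.T + Qn

print(x_batch_last, x_hat)
```

The batch problem grows with the horizon, while the recursion keeps only the current estimate and one matrix; both arrive at the same final estimate, and no random variable appears anywhere in the batch formulation.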

More consequentially, the framework opens a natural path toward Moving Horizon Estimation (MHE), a modern technique that solves a rolling-window optimization problem over recent measurements rather than running a fixed recursive filter. MHE is increasingly used in nonlinear systems precisely because it allows inequality constraints (physical bounds on states, actuator limits) to be incorporated directly. The cost-to-arrive function that appears in $\mathrm{LQR}_{f}$ is the same object that MHE uses as an arrival cost to summarize information from before the current window. Understanding it as an LQR solution, rather than as an opaque penalty term, may simplify the design and analysis of MHE schemes.

There is also a deeper mathematical resonance here. The homogeneous-coordinates embedding Bamieh describes is an instance of a broader principle in geometric control theory: many apparently distinct control problems become structurally equivalent when viewed in an appropriately enlarged space. The same idea underlies the connection between port-Hamiltonian systems and variational mechanics, and between $H_{\infty}$ robust control and a certain deterministic minimax game. Each of these unifications, when worked out carefully, tends to produce not just a cleaner proof but a cleaner intuition — the kind of intuition that eventually produces better algorithms.

For now, the immediate value of this paper is for anyone who teaches or learns optimal estimation. The Kalman filter has sometimes been described as so powerful it seems almost magical — a recursive formula that assembles optimal state estimates from noisy data in real time, with no direct access to the true state. The two-step derivation strips away the magic and replaces it with something arguably more satisfying: a demonstration that the filter is not magic at all, but an ordinary least-squares problem, rendered tractable by one dimension-expanding trick and solved by reading off the blocks of a matrix equation you already knew how to write.

The universe, it turns out, did not need Gaussian noise to give us the Kalman filter. It only needed a quadratic cost and one carefully chosen constant equal to one.
