Inverse Game Theory for Constrained Multi-Agent

Q: State dimension reduction

r < n — Only r of n state dimensions are truly dynamic; the rest are algebraic constraints enforced instantaneously

Q: Solution set structure

Convex & rectangular — Each player's cost parameters can be identified independently — the sets decouple across players

Q: Equilibrium concept

Feedback Nash Equilibrium — No agent can unilaterally reduce their cost by deviating, given all others' strategies

Q: Key mathematical tool

Kronecker Canonical Form — Decomposes the descriptor system into dynamic and algebraic parts, enabling the reduction to a standard LQDG

Q: Identification method

Convex optimization — An admissible cost realization can be computed via semidefinite programming whenever the solution set is nonempty

Q: Setting

Infinite-horizon, model-based — System matrices E, A, Bᵢ assumed known; extension to model-free setting is future work

Imagine watching two self-driving cars navigate an intersection. Each one makes hundreds of micro-decisions per second — when to accelerate, when to yield, how much space to leave. Now imagine you're an engineer trying to figure out why each car behaves the way it does. Not in a physical sense, but in a strategic one: what objectives, weights, and preferences encoded in the car's software produce exactly the behavior you observed?

This is the inverse dynamic game problem, and it sits at the heart of some of the most consequential questions in modern AI and engineering. Get it right, and you can verify that autonomous systems pursue goals aligned with human values. Get it wrong, and you may mistake a car's physical inability to swerve — a constraint of road geometry — for a deliberate strategic choice. The misattribution compounds: every inferred objective downstream becomes corrupted.

A new paper from the Indian Institute of Technology Madras addresses precisely this failure mode. Kumar and Reddy (2026) develop a rigorous mathematical framework for identifying the full set of objectives consistent with observed multi-agent behavior, specifically for systems where physical constraints are not just background context but core ingredients of the dynamics. Their key result: that set of consistent objectives is always convex and has a particularly clean structure — a finding that makes the reconstruction problem tractable in a way it wasn't before.

The Science

The setting is what mathematicians call a linear-quadratic descriptor differential game (LQDDG) — a mouthful that breaks down into three conceptual pieces. "Linear-quadratic" means the dynamics are linear (the future state is a linear function of the current state and inputs) and the cost functions each agent minimizes are quadratic (penalizing squared deviations from desired states and control effort). "Differential game" means multiple self-interested agents interact over time, each trying to minimize their own cost while anticipating others' strategies. "Descriptor" is the crucial innovation: the dynamics are governed by differential-algebraic equations (DAEs) rather than ordinary differential equations (ODEs).

The difference between DAEs and ODEs matters enormously here. In an ODE model of a multi-agent system, every component of the state evolves freely according to differential equations. But real physical systems often enforce instantaneous constraints — conservation of energy in a power network, rigid-body contact forces in robotics, market-clearing conditions in an electricity exchange. These constraints are algebraic: they don't involve time derivatives, yet they link state variables and control inputs in ways that restrict what's possible. A DAE, written as $E \overset{x}{˙} (t) = A x (t) + \sum_{i} B_{i} u_{i} (t)$ , captures this by allowing the matrix $E$ to be singular (specifically, $\text{rank}(E) = r < n$). When $E$ is not the identity matrix, the system has both differential and purely algebraic components — the latter enforcing the constraints.

The inverse problem the paper tackles is this: given observed feedback strategies $F^{⋆}$ (each agent's control as a linear function of the current state), and given knowledge of the system matrices $E$ , $A$ , and $B_{i}$ , identify all cost function parameters ${Q_{i}, R_{ij}}$ — the state-penalty and control-penalty matrices — for which $F^{⋆}$ is a feedback Nash equilibrium (FBNE). A FBNE is the multi-player generalization of optimal control: no agent can unilaterally switch strategies and reduce their own cost, given what everyone else is doing (Kumar & Reddy, 2026).

The paper builds on prior work in inverse linear-quadratic differential games (IDGs) for unconstrained ODE systems, particularly the solution-set characterization of Inga et al. (2021), extending it to the richer and more realistic DAE setting.

What They Found

The central analytical challenge is that descriptor systems have a mixed character: $r$ "dynamic" state components evolve via differential equations, while $n - r$ "algebraic" components are instantaneously determined by the controls through the constraint relation

$x_{2} (t) = - i \sum \overset{ˉ}{B}_{2 i} u_{i} (t) .$

This means the constraint manifold $C$ — the set of admissible state-input pairs — is baked into the problem structure. Any inverse method that ignores this will conflate constraint-driven behavior with preference-driven behavior.

The authors' key move is a coordinate transformation using the Kronecker canonical form (KCF), a classical tool from descriptor systems theory. By finding nonsingular matrices $X$ and $Y$ such that the system decomposes into its dynamic and algebraic parts, they reduce the $n$-dimensional descriptor game to a standard $r$-dimensional linear-quadratic game on the constraint manifold. The map $\Omega(F) := F(I_n + X_2\bar{B}_2 F)^{-1} X_1$ translates any full-state feedback law for the original system into an equivalent reduced feedback law for this $r$-dimensional game (Kumar & Reddy, 2026). Crucially, the closed-loop eigenvalues — the poles determining system stability — are preserved under this map.

With the system reduced, the necessary and sufficient conditions for an FBNE reduce to two requirements. First, a set of coupled algebraic Riccati equations (CARE) must be satisfied — one for each player $i$ :

$\overset{ˉ}{A}_{cl}^{⊤} \overset{ˉ}{P}_{i} + \overset{ˉ}{P}_{i} \overset{ˉ}{A}_{cl} + [I_{r} \overset{ˉ}{F}^{⋆}]^{⊤} \overset{ˉ}{M}_{i} [I_{r} \overset{ˉ}{F}^{⋆}] = 0$

where $\overset{ˉ}{A}_{cl} = J + \overset{ˉ}{B}_{1} \overset{ˉ}{F}^{⋆}$ is the stable closed-loop dynamics matrix and $\overset{ˉ}{M}_{i}$ encodes player $i$'s cost. Second, the reduced feedback gains must satisfy a system of linear equations linking them to the cost parameters.

The paper's main theorem (Theorem 3.4) characterizes the solution set by reformulating these conditions. For each player $i$ , two sets are defined: $Γ_{i}^{1}$ , the set of cost parameters satisfying a linear matrix equation derived from the CARE and the optimality conditions, and $Γ_{i}^{2}$ , the set of parameters satisfying the positive-definiteness requirement $R_{ii} + \overset{ˉ}{B}_{2 i}^{⊤} X_{2}^{⊤} Q_{i} X_{2} \overset{ˉ}{B}_{2 i} ≻ 0$ (which ensures the optimization problem is well-posed). The full solution set is then:

$Θ (F) = i \in N \prod (Γ_{i}^{1} \cap Γ_{i}^{2})$

This Cartesian product structure — denoted "rectangular" in the paper — means each player's cost parameters can be identified independently of the other players'. This is a remarkably clean result. In principle, the coupling between players through Nash equilibrium conditions could create entangled constraints across all players' parameters. Instead, the constraints decouple: player $i$'s cost parameters are constrained only by player $i$'s own optimality conditions. The set $\Gamma_i^1$ is a linear affine subspace (hence convex), $Γ_{i}^{2}$ is a convex set defined by a positive-definiteness constraint, and their intersection is therefore convex. The product of convex sets is convex. The solution set is convex (Kumar & Reddy, 2026).

(a) Source: Aaditya Kumar, Puduru Viswanadha Reddy

The paper further provides a constructive algorithm — a convex optimization problem — to compute an admissible element of the solution set whenever it is nonempty. This is practically essential: knowing the solution set exists is theoretically satisfying; being able to compute a specific element of it is what engineers actually need.

(b) Source: Aaditya Kumar, Puduru Viswanadha Reddy

Key Facts

r < n State dimension reduction Only r of n state dimensions are truly dynamic; the rest are algebraic constraints enforced instantaneously

Convex & rectangular Solution set structure Each player's cost parameters can be identified independently — the sets decouple across players

Feedback Nash Equilibrium Equilibrium concept No agent can unilaterally reduce their cost by deviating, given all others' strategies

Kronecker Canonical Form Key mathematical tool Decomposes the descriptor system into dynamic and algebraic parts, enabling the reduction to a standard LQDG

Convex optimization Identification method An admissible cost realization can be computed via semidefinite programming whenever the solution set is nonempty

Infinite-horizon, model-based Setting System matrices E, A, Bᵢ assumed known; extension to model-free setting is future work

Numerical examples with a two-player system confirm the theory: the algorithm converges to cost parameters that, when plugged into the forward game, reproduce the originally observed feedback strategies as a Nash equilibrium (Kumar & Reddy, 2026).

Why This Changes Things

The significance of this work lies in what it prevents as much as what it enables. The inverse dynamic game problem appears in nearly every domain where we need to understand or verify the objectives of interacting agents: shared human-machine control systems (where we want to infer the human driver's intent), autonomous driving (where vehicles need models of other drivers' preferences), biological motion analysis (where researchers infer animal objectives from trajectories), multi-robot coordination, and electricity markets (where market designers want to check that participants' behavior is consistent with competitive equilibrium).

In all these domains, the systems are almost certainly constrained. Power networks obey Kirchhoff's laws. Robotic arms in contact with surfaces obey rigid-body constraints. Road vehicles obey kinematic constraints at low speeds. If you model these systems with ordinary differential equations — pretending the constraints aren't there — you will systematically misidentify the agents' objectives. Behavior that is forced by the constraints will look, to your model, like it was chosen strategically. The inferred cost functions will absorb this spurious attribution, producing parameters that are hard to interpret and may be actively misleading.

State Dimensions vs. Constraint Reduction in Descriptor Systems

In a descriptor system of dimension n with rank(E) = r, only r dimensions are truly dynamic (governed by differential equations), while n - r dimensions are algebraic (instantaneously determined by constraints). This chart illustrates the conceptual split for a representative example with n = 5 and r = 3.

State Dimensions vs. Constraint Reduction in Descriptor Systems
Label	Value
Total state dimension (n)	5
Dynamic (differential) dims (r)	3
Algebraic (constraint) dims (n−r)	2

The descriptor framework cuts this error off at the root. By explicitly representing the algebraic constraints in the dynamics, the model separates the two sources of behavior: what agents must do (constraints) and what they choose to do (strategy). The inverse problem then correctly identifies only the strategic objectives, leaving the constraint-enforced behavior where it belongs — in the structure of $E$ .

Key Properties of the Inverse LQDDG Solution Set

Qualitative comparison of key structural properties of the solution set Θ(F) established in Theorem 3.4 of Kumar & Reddy (2026). Each property is assessed on a 0–5 scale reflecting how strongly it is established or useful in the paper.

Key Properties of the Inverse LQDDG Solution Set
Label	Value
Convexity of Θ(F)	5
Rectangular (per-player) structure	5
Algorithmic computability	4
Handles DAE constraints	5
Scales to N players	4
Model-free applicability	1

The rectangular, convex structure of the solution set has additional practical implications. Rectangularity means a distributed identification algorithm is possible: each agent's cost can be inferred independently, potentially in parallel, without requiring global coordination. Convexity means the solution set is geometrically nice — no local optima to trap numerical solvers, no disconnected components to explore. Standard convex programming tools (semidefinite programming, in particular) can reliably find a valid solution or certify that none exists.

This connects to a broader agenda in the field. Single-agent inverse optimal control (IOC) — inferring an individual's objectives from observed behavior — has a long history in robotics and economics, and efficient algorithms exist. Extending IOC to multi-agent settings (inverse dynamic games, or IDGs) is harder because Nash equilibrium conditions couple all players' optimality requirements. The work of Inga et al. (2021) showed that for unconstrained linear-quadratic games, this coupling doesn't destroy the convex structure. Kumar and Reddy (2026) show the same is true when you add algebraic constraints — a nontrivial extension that required developing new analytical machinery around the Kronecker canonical form and the $Ω$ map.

It is also worth appreciating what the paper does not claim. The solution set can be very large — there will generally be many cost functions consistent with any observed behavior. Identifying the set is not the same as uniquely pinpointing the "true" objectives of the agents. For unique identification, additional assumptions or additional data would be needed. This is an inherent feature of inverse problems, not a deficiency of this particular approach. The paper is explicit about this: characterizing the full solution set is itself the goal, because it tells engineers exactly how much freedom remains in the cost reconstruction and what additional measurements or assumptions would be needed to narrow it down.

What's Next

The most immediate limitation the paper acknowledges is the assumption of a model-based setting: system matrices $E$ , $A$ , and $B_{i}$ are assumed known. In many real applications — particularly in human-machine interaction and autonomous driving — the dynamics must themselves be estimated from data. Extending this framework to the model-free (or model-estimated) case, where both the dynamics and the objectives must be inferred simultaneously, is the natural next step. Martirosyan et al. (2023) have developed such approaches for unconstrained LQDGs; adapting them to descriptor systems would require handling the additional uncertainty about the constraint structure.

A second open direction is the finite-horizon case. This paper focuses on the infinite-horizon setting, where the equilibrium conditions reduce to algebraic Riccati equations. Finite-horizon games — where the interaction has a definite endpoint — require solving differential Riccati equations, which are harder to analyze but more relevant for tasks with clear temporal boundaries, like merging onto a highway or executing a specific robotic manipulation.

Third, the paper assumes the observed strategy profile is already a Nash equilibrium of some cost function. In practice, observed human behavior may be approximately rational, or rational with respect to objectives that don't fit the quadratic structure assumed here. Robustness of the inverse approach to model misspecification — what happens when no cost function exactly rationalizes the data — remains to be studied.

None of these limitations diminish the paper's contribution, which is to lay rigorous theoretical foundations for a problem that has been systematically avoided in the IDG literature: the inverse game with constrained, descriptor dynamics. As autonomous systems become more prevalent and more physically integrated into networks with hard constraints — power grids, robotic manufacturing floors, connected vehicle platoons — the ability to correctly infer the objectives driving multi-agent behavior will become not just academically interesting, but practically essential. Knowing the solution set is rectangular and convex is exactly the kind of structural fact that makes the difference between an intractable problem and a solvable one.

The Reverse-Engineering Problem: When AI Needs to Infer Why, Not Just What

The Science

What They Found

Why This Changes Things

What's Next

Source articles

Comments (0)