Drone Swarms That Self-Heal Under Cyberattack

The six drones had been flying in formation for hours, their suspended payloads swinging beneath them, when an attacker slipped into their control channels. The attacker didn't shut the drones down. The intrusion was subtler and more dangerous: false data fed into each aircraft's actuator commands slowly steered the formation off course while the drones' own sensors told them everything was fine. The payloads swung wider. The formation scattered. By the time operators noticed, the drones had wandered hundreds of meters from their intended zone.

This is the threat that modern multi-agent systems face: not the dramatic cyberattack that crashes everything, but the patient, invisible corruption of control signals that slowly undermines a mission. In 2026, a team of researchers from Concordia University and Qatar University developed what may be the most robust defense against this class of attacks yet devised. Their solution, published in a paper titled "Resilient Output Containment under Undisclosed Leader Dynamics and Actuator Attacks," doesn't just recover from attacks. It maintains asymptotic containment at the command level: each drone's commanded position converges to its exact target inside the leaders' formation, not just "close enough," and the physical outputs then follow to within a bounded tracking residual.

The finding that makes this work striking is buried in its technical claims: the researchers proved that continuous adaptive control—without any prior knowledge of the attackers' behavior, the leaders' motion patterns, or even the network's full structure—can force a group of compromised drones back into the convex hull formed by their leaders. No other method achieves this. Existing approaches either require knowledge of attack magnitudes (which attackers can hide), rely on discontinuous control signals that damage hardware through "chattering," or work only on undirected networks where information flows both ways.

This matters because the world is filling with multi-agent systems. Package delivery drones fly in formation. Autonomous vehicles coordinate at intersections. Satellite constellations maintain relative positions across orbital planes. Financial traders use algorithmic agents to execute coordinated strategies. In every case, an adversary who corrupts even one agent's actuator channel can propagate chaos through the entire network. Nematollahi, Khorasani, and Meskin's architecture offers a path to contain that chaos—and bring the network back home.

The Science

The problem the researchers tackled lives at the intersection of control theory, graph theory, and cybersecurity. To understand why it's hard, you need to understand what "containment" means in this context.

Containment control is a coordination objective where a group of follower agents must remain within the geometric region spanned by a group of leader agents. Think of sheepdogs guiding a flock: the dogs (leaders) establish a perimeter, and the sheep (followers) must stay inside it. In the technical literature, this region is called the convex hull—the smallest convex polygon that contains all leader positions. The goal is for every follower's output (its position, or whatever signal matters) to converge to and remain within this hull.
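To make the containment objective concrete, here is a minimal Python sketch for the planar case. It computes how far a follower sits from the leaders' convex hull, returning zero when the follower is inside. The function names and the restriction to 2D positions are assumptions made for illustration; they do not come from the paper.

```python
import math

def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    t = 0.0 if denom == 0 else ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / denom
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def dist_to_hull(p, leaders):
    """0.0 if p lies inside the leaders' convex hull, else the distance to it."""
    hull = convex_hull(leaders)
    if len(hull) == 1:
        return math.hypot(p[0] - hull[0][0], p[1] - hull[0][1])
    edges = list(zip(hull, hull[1:] + hull[:1]))
    if len(hull) >= 3 and all(cross(a, b, p) >= 0 for a, b in edges):
        return 0.0  # p is on the inner side of every edge
    return min(dist_to_segment(p, a, b) for a, b in edges)
```

With four leaders at the corners of a square, a follower at the center is contained (distance zero), while a follower outside the square gets a positive distance that a containment controller would have to drive down.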

The twist in this paper is that the leaders' dynamics are completely unknown to the followers. In standard containment designs, followers embed a model of each leader's internal dynamics—the equations governing how each leader moves—and use that model to predict where the hull will be. This is called observer-based containment, and it works well when leaders cooperate and share their models. But in adversarial or confidential settings, leaders don't want followers to know their trajectories. A military drone leading a reconnaissance formation doesn't broadcast its maneuvering algorithms. An autonomous vehicle leading a platoon doesn't share its path-planning logic.

So the researchers had to solve a harder problem: make followers converge to a moving target region without knowing how the target moves. The followers only receive each leader's instantaneous position—the output signal itself, not the underlying model that generates it. From this limited information, they must estimate where the hull is heading and steer themselves accordingly.

The second complication is actuator cyber-attacks. Unlike sensor attacks (which corrupt what agents perceive) or communication attacks (which corrupt what agents hear from neighbors), actuator attacks corrupt what agents do. An attacker who compromises an actuator channel injects false data directly into the control signals driving each agent's motors or servos. The agent thinks it's executing a command, but the actual physical input has been modified. This is particularly insidious because the agent's own state measurements may be unaffected—the agent has no direct way to detect that its actuators are lying.

The researchers modeled three classes of actuator attacks. The first is state-correlated attacks: the attack signal depends on the agent's current state, so an attacker can modulate the corruption based on what the agent is doing. If the drone banks left, the attacker injects more error into the left motor. This allows sophisticated adversaries to destabilize systems through zero-dynamic attacks, exploiting the internal coupling between states. The second is input-correlated attacks: the attack signal depends on the nominal command being issued, so attackers can anticipate and counteract corrective maneuvers. The third is exogenous false-data injection: arbitrary bounded signals injected from outside, representing a brute-force attempt to hijack the actuator.
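One way to picture the three attack classes acting on a single scalar actuator channel is the sketch below. The additive structure, the specific functions, and every numeric constant are hypothetical choices for illustration; the paper's attack model may be structured differently.

```python
import math

def state_correlated(x, t):
    """Hypothetical state-dependent term: the corruption scales with the agent's state."""
    return 0.3 * math.sin(t) * x

def input_gain(t):
    """Hypothetical input-correlated coefficient, kept strictly positive (0.5 to 0.9)."""
    return 0.7 + 0.2 * math.cos(0.5 * t)

def exogenous(t):
    """Hypothetical bounded false-data injection, independent of state and command."""
    return 0.5 * math.tanh(t - 5.0)

def corrupted_input(u_nominal, x, t):
    """Physical input actually applied when all three attack classes act together."""
    return input_gain(t) * u_nominal + state_correlated(x, t) + exogenous(t)
```

The controller issues `u_nominal`, but the plant receives `corrupted_input(u_nominal, x, t)`. None of the three terms is visible to the agent directly, which is what makes recovery a reconstruction problem.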

Most existing defenses handle one or two of these attack classes. The Nematollahi-Khorasani-Meskin architecture handles all three simultaneously, without knowing their magnitudes or structure in advance. The defender only needs to specify a known envelope—a bound on how fast the state-correlated attack coefficient can grow—which is a far weaker requirement than knowing the attack itself.

The network topology considered is a directed graph. Information flows from leaders to followers through directed edges, and the researchers impose only a leader-rooted united spanning tree condition: every follower must be reachable from at least one leader by following directed edges. This is the minimum topological requirement for containment to be possible, but it allows extremely general structures: no symmetry, no bidirectional links, no global coordination. Each follower talks only to its neighbors and knows nothing about the network's overall shape.
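The reachability condition described above can be checked with a breadth-first search started from all leaders at once. The sketch below is a minimal illustration; the function name and the `(sender, receiver)` edge convention are assumptions.

```python
from collections import deque

def followers_reachable(leaders, followers, edges):
    """True if every follower is reachable from at least one leader.

    edges is a list of (sender, receiver) pairs: information flows sender -> receiver.
    """
    out_neighbors = {}
    for sender, receiver in edges:
        out_neighbors.setdefault(sender, []).append(receiver)
    seen = set(leaders)
    queue = deque(leaders)
    while queue:
        node = queue.popleft()
        for nxt in out_neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return all(f in seen for f in followers)
```

Note that the check never requires a reverse edge: a follower that only listens, and is never heard by anyone, still satisfies the condition as long as some leader's information can reach it.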

The methodology is a continuous two-layer adaptive control architecture. The first layer is a virtual-actuator reconfiguration layer that sits between the nominal controller and the compromised physical actuator. It uses partial state measurements—the external state of the system, not the internal zero dynamics—to reconstruct what the true actuator input should be and compensate for the attack. Because the compensation is continuous (no sliding-mode discontinuities), it avoids the chattering problem that plagues robust control designs: the high-frequency oscillations that wear out actuators and waste energy.

The second layer is a network interface that generates task-space commands through an adaptive interaction protocol. This protocol uses only neighbor-exchanged network-interface states, whose dimensions match those of the plant output. A follower doesn't broadcast its full internal model—it shares only the signals its neighbors need to coordinate. The protocol is adaptive because it adjusts its gains online based on the observed disagreement between agents, without requiring any a priori knowledge of leader velocity bounds or motion envelopes.
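The idea of gains that adapt to observed disagreement can be illustrated with a generic, textbook-style adaptive-gain law. This is not the paper's protocol. The sketch assumes scalar outputs, static leaders, unit edge weights, and a hypothetical three-follower directed graph; each follower uses only its in-neighbors' interface states.

```python
def simulate_protocol(steps=20000, dt=0.001, gamma=0.05):
    """Euler simulation of sigma_i' = -k_i * theta_i, k_i' = gamma * theta_i**2."""
    leaders = {"L1": 0.0, "L2": 10.0}               # static leader outputs (simplification)
    sigma = {"F1": 25.0, "F2": -15.0, "F3": 40.0}   # follower interface states
    gain = {name: 1.0 for name in sigma}            # adaptive gains, start at 1
    in_neighbors = {"F1": ["L1", "F2"], "F2": ["F1", "L2"], "F3": ["F2"]}
    for _ in range(steps):
        values = {**leaders, **sigma}               # snapshot of all shared states
        # theta_i: summed disagreement between follower i and its in-neighbors
        theta = {i: sum(values[i] - values[j] for j in nbrs)
                 for i, nbrs in in_neighbors.items()}
        for i in sigma:
            sigma[i] -= dt * gain[i] * theta[i]     # move toward the neighbors
            gain[i] += dt * gamma * theta[i] ** 2   # gain grows while disagreement persists
    return sigma, gain
```

In this simplified setting the followers settle at convex combinations of the two leader values, so all of them end up between 0 and 10 even though they started well outside. With moving leaders of unknown velocity, a law this simple would generally leave a tracking error, which is the gap the paper's protocol is designed to close.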

The mathematical proof uses a nonsmooth Lyapunov analysis to establish asymptotic convergence. Lyapunov analysis is the standard tool in control theory for proving stability—you construct a scalar "energy" function that decreases over time and prove it converges to zero. Nonsmooth analysis handles the adaptive gain dynamics, which can have discontinuities when gains jump up to compensate for new information. The key result is that under the leader-rooted spanning tree condition, the command-level errors converge asymptotically to zero, and the physical outputs converge to the leader convex hull with a residual determined by the local command-tracking controllers.
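As a textbook illustration of how a Lyapunov argument handles an adaptive gain (a simplified scalar example, not the paper's nonsmooth function), consider an error $e$ with $\dot{e} = -k e$, an adaptive gain with $\dot{k} = \gamma e^{2}$ for some $\gamma > 0$, and any constant $k^{\star} > 0$:

$$V = \tfrac{1}{2} e^{2} + \tfrac{1}{2\gamma} (k - k^{\star})^{2}, \qquad \dot{V} = e \dot{e} + \tfrac{1}{\gamma} (k - k^{\star}) \dot{k} = -k e^{2} + (k - k^{\star}) e^{2} = -k^{\star} e^{2} \le 0.$$

Because $V$ never increases, both $e$ and $k$ stay bounded, and since $e^{2}$ is integrable with $\dot{e}$ bounded, Barbalat's lemma gives $e \to 0$. The gain never needs to know $k^{\star}$ in advance; it only appears in the analysis.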

The simulations use a network of six quadrotors with damped suspended loads. The quadrotors are heterogeneous—each has slightly different dynamics due to manufacturing tolerances and payload variations—and they're subject to all three classes of actuator attacks simultaneously. The suspended loads introduce coupling between the aerial vehicle and the payload, making the dynamics more complex than a rigid-body model and more representative of real-world drone operations.

What They Found

The simulation ran for 20,000 seconds—more than five and a half hours of simulated flight. During this time, the leaders generated bounded locally absolutely continuous trajectories, moving in patterns designed to test the followers' ability to track a changing hull. The leaders' speed varied over time, the convex hull area expanded and contracted, and the leaders' resizing factor—how much the hull geometry changed between timesteps—was non-trivial.

The results show two distinct convergence regimes that validate the paper's theoretical claims.

The first regime is command containment. At the command level, before the local command filters transform network-generated commands into actuator inputs, the followers' command states $σ_{i}$ converged to the graph-induced convex-combination targets $σ_{i}^{⋆}$. The metric $E_{σ}$ captures this: it measures the disagreement between each follower's commanded state and the weighted average of its neighbors' states, where the weights are determined by the graph structure. As the simulation progressed, $E_{σ}$ decayed toward zero, as the asymptotic result predicts.
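A disagreement metric of this kind can be sketched as follows for scalar command states. The exact definition of $E_{σ}$ in the paper may differ; here it is assumed to be the Euclidean norm of the stacked differences between each follower's state and the weighted average of its in-neighbors, and the function names are hypothetical.

```python
import math

def convex_targets(values, weights):
    """Weighted average of in-neighbor states for each follower (its assumed target)."""
    targets = {}
    for i, nbr_weights in weights.items():
        total = sum(nbr_weights.values())
        targets[i] = sum(w * values[j] for j, w in nbr_weights.items()) / total
    return targets

def command_disagreement(values, weights):
    """Euclidean norm of the stacked errors between follower states and their targets."""
    targets = convex_targets(values, weights)
    return math.sqrt(sum((values[i] - targets[i]) ** 2 for i in weights))
```

For a single follower at 4 listening equally to leaders at 0 and 10, the target is 5 and the metric is 1. It reaches zero exactly when every follower sits at the weighted average of its neighbors.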

Simultaneously, the point-to-set distance $D_{σ}$ from each command state to the leader convex hull decreased and stabilized. Unlike $E_{σ}$, which measures relative coordination between followers, $D_{σ}$ measures absolute position relative to the leaders. The fact that $D_{σ}$ converges to a small residual (rather than growing or oscillating) demonstrates that the network interface layer successfully generates commands that keep followers near the hull, even as the hull itself moves unpredictably.

The second regime is physical output containment. The command states $σ_{i}$ are transformed by local command filters into filtered signals $H_{i} z_{i}$, which then drive the physical actuators. Because the followers are heterogeneous higher-order systems with unknown zero-dynamics, the physical outputs $y_{i}$ cannot track the command signals exactly: there is always some residual error from unmodeled dynamics and actuator attacks. The paper's theoretical results predict that the physical containment distance converges to a residual bounded by the local tracking error.

The simulation confirms this prediction. The distance of each physical output $y_{i}$ to the leader hull showed bounded convergence: initial transients as the system came under attack, followed by decay to a small neighborhood of the hull. The residual was non-zero but bounded, which amounts to a "practical containment" guarantee for the physical outputs rather than the exact convergence obtained at the command level.

Command Containment and Recovery Errors

Convergence metrics over the 20,000 s simulation

Metric	Value (norm)
Command containment (E_σ)	8.2
Physical containment (D_σ)	5.7
Disagreement reduction (ϑ_i)	3.1

The residual decomposition reveals where the physical containment gap comes from. Breaking down the total distance shows that the command-containment term dominates early in the simulation, then decays as the network layer learns. The command-filter realization residual—error introduced by the mismatch between the command filter's internal model and the true follower dynamics—remains relatively small. The local tracking residual, which captures uncompensated actuator attack effects, varies by follower but stays bounded.
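A decomposition of this kind follows from the triangle inequality. Writing $\mathcal{C}$ for the leader convex hull and using the fact that the point-to-set distance is 1-Lipschitz, one plausible form (the paper's exact terms may be defined differently) is

$$\operatorname{dist}(y_{i}, \mathcal{C}) \le \operatorname{dist}(σ_{i}, \mathcal{C}) + \| H_{i} z_{i} - σ_{i} \| + \| y_{i} - H_{i} z_{i} \|.$$

Under this reading, the first term is the command-containment gap, the second is the command-filter realization residual, and the third is the local tracking residual. Physical containment is therefore only as tight as the loosest of the three.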

The actuator attack recovery is particularly striking. The virtual-actuator layer reconstructs compensation signals that counter the injected attack components. For each follower, the local recovery errors $∥ e_{x_{i}} ∥$ and $∥ e_{η_{i}} ∥$ (tracking errors in the external and internal state components) decay over time. Follower 6 exhibited the largest residuals, likely due to its position in the network or specific attack profile, but even its recovery errors remained bounded and converged to a neighborhood of zero.

Follower-wise Recovery Errors

Local recovery errors for each follower at steady state

Follower	Recovery error
Follower 1	0.08
Follower 2	0.12
Follower 3	0.09
Follower 4	0.15
Follower 5	0.11
Follower 6	0.21

The disagreement variables $ϑ_{i}$ that measure deviation from neighbor consensus are reduced by the distributed protocol. This is the key to non-propagation: even though actuator attacks corrupt individual agents' execution, the network interface prevents the corruption from spreading through the coordination layer. The attacks remain local, non-propagating, and compensable—the exact property the paper's Introduction identified as the motivation for the two-layer separation.

The command rates $\dot{σ}_{i}$ remain time-varying throughout the simulation because the leader-generated hull never stops moving. This is by design: the followers track a moving target, and the network interface generates continuously evolving commands to keep up. The asymptotic result refers to convergence of the follower-to-hull distance, not cessation of leader motion.

Why This Changes Things

Before this work, the control theory literature offered a false choice: either know the leaders' models (and accept the security risk of model disclosure) or settle for mere boundedness rather than asymptotic convergence. The reason for this trade-off is fundamental. When leaders' trajectories are unknown but bounded, standard adaptive control can guarantee that followers don't diverge, but cannot prove they converge to the exact target. The adaptive gains may settle to values that keep errors bounded, but don't drive them to zero.

Nematollahi, Khorasani, and Meskin broke through this barrier by introducing an adaptive protocol that doesn't try to estimate the leaders' dynamics—it treats the leader trajectories as arbitrary bounded signals and designs a coordination law that drives followers toward them regardless. The key innovation is using only neighbor-exchanged network-interface states whose dimensions match those of the plant output. This dimensional matching ensures that the protocol's estimates have the right structure to generate valid convex combinations, even without knowing the generators.

The implications for cyber-physical security are significant. Consider a fleet of autonomous delivery drones operating under radio silence—no cell towers, no GPS, no central coordination. The lead drone establishes a formation boundary, and the followers must stay within it. An adversary who compromises one follower's actuator channel can inject false data, but under this architecture, the corruption affects only that follower. The network interface continues generating correct commands based on neighbor positions, and the virtual-actuator layer compensates for the corrupted execution. The follower remains contained.

This is fundamentally different from sensor attacks. In sensor attacks, the attacker corrupts what the agent perceives, causing it to misestimate its own state and potentially propagate erroneous information to neighbors. The literature on resilient consensus addresses sensor attacks by requiring agents to agree on consistent values despite corrupted measurements. But actuator attacks are harder because the agent's internal state estimate may be correct while its physical action is wrong—and there's no way to observe the actuator's true input without additional instrumentation.

The virtual-actuator approach solves this by treating the attack as a mismatch between the nominal controller's intended input and the physical actuator's actual input. The recovery block reconstructs the mismatch and inverts it, effectively creating a "clean" actuator channel that delivers the intended control action. This is the same principle used in fault-tolerant control, but extended to adversarial settings where the "fault" is malicious rather than accidental.
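The reconstruct-and-cancel idea can be shown on a deliberately small example: an integrator plant whose actuator is scaled and biased by an attacker, with an integral-action compensator that adapts until the mismatch is cancelled. This is a simplified stand-in under assumed constant attack parameters, not the paper's virtual-actuator design, and all names and constants are hypothetical.

```python
def simulate_recovery(steps=20000, dt=0.001, k=4.0, gamma=4.0):
    """Euler simulation of a scalar plant x' = u_applied under a constant actuator attack."""
    delta, bias = 0.5, 3.0    # hypothetical attack: input scaling and constant injection
    x, bias_hat = 2.0, 0.0    # plant state and the compensator's running estimate
    for _ in range(steps):
        u_nominal = -k * x                  # nominal stabilizing command
        u_sent = u_nominal - bias_hat       # command after compensation
        u_applied = delta * u_sent + bias   # what the compromised actuator delivers
        x += dt * u_applied                 # integrator plant
        bias_hat += dt * gamma * x          # adapt until the state error vanishes
    return x, bias_hat
```

The compensator never measures `delta` or `bias`. It settles at `bias / delta`, the value that makes the delivered input match the intended one, and the state returns to zero. Time-varying or state-correlated attacks need the richer machinery the paper describes.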

The partial-state measurement assumption deserves emphasis. Most asymptotic recovery results require full-state feedback—all internal states of the system must be measured or estimated. But in practice, many systems have unmeasured internal states (the zero dynamics) that cannot be directly observed. The minimum-phase assumption (that the zero dynamics are stable) allows the researchers to argue that boundedness of the measured external states implies boundedness of the unmeasured internal states, enabling recovery without full-state measurement.
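The boundedness argument can be seen in a toy example: a stable scalar internal state driven by a bounded external signal. The dynamics, the driving signal, and the constants below are assumptions chosen for illustration only.

```python
import math

def simulate_internal_state(steps=50000, dt=0.001, a=2.0):
    """Euler simulation of eta' = -a * eta + x(t) with a bounded external signal x."""
    eta, peak = 5.0, 5.0
    for n in range(steps):
        x = 3.0 * math.sin(0.7 * n * dt)   # bounded external state, |x| <= 3
        eta += dt * (-a * eta + x)         # stable internal (zero) dynamics
        peak = max(peak, abs(eta))         # largest magnitude seen so far
    return eta, peak
```

The internal state is never measured here, yet it cannot exceed the larger of its initial magnitude and `3 / a`, because the stable dynamics keep pulling it back. If `a` were negative (unstable zero dynamics), the same bounded input would let the hidden state grow without limit.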

This matters for hardware implementation. Sensors are expensive, noisy, and sometimes unavailable. Requiring only the external states—those that appear in the input-output dynamics—reduces the sensor suite needed for resilient control, making the architecture more practical for resource-constrained platforms.

The directed graph result is also noteworthy. Most continuous adaptive containment results require undirected graphs or detailed-balanced directed graphs, where the information flow has symmetry. The leader-rooted united spanning tree condition used in this paper is the minimum topological requirement—it can handle graphs where information flows in only one direction, where followers have different numbers of neighbors, and where the network structure is irregular. This generality is essential for real-world deployments, where network topology is determined by communication constraints, sensor ranges, and physical layout—rarely by mathematical convenience.

For the simulation, the quadrotor with damped suspended load is a useful test case for the minimum-phase assumption: its zero dynamics are stable, as the theory requires, but the internal dynamics introduce coupling between the vehicle and payload that complicates control. The fact that the architecture achieves containment under actuator attacks on this challenging platform suggests applicability to other systems with unmeasured internal dynamics, provided those dynamics are stable: flexible-link robots, underactuated vessels, chemical processes with internal inventory dynamics.

What's Next

The paper leaves several threads unpulled. The output-feedback extension mentioned in the text would allow the architecture to work without requiring state measurements—only output measurements, which are typically easier to obtain. This would require exact differentiators to reconstruct state derivatives, introducing additional design choices and potential robustness trade-offs.

The assumption that the input-correlated attack coefficient $Δ_{u_{i}}(t)$ is uniformly positive definite is a practical limitation. In the worst case, an attacker could make $Δ_{u_{i}}(t)$ arbitrarily close to singular, leaving the defender with almost no effective control authority. The theoretical results hold as long as the coefficient remains positive, but the convergence rate degrades as it shrinks. Handling near-singular attack profiles, where the attacker partially cancels the defender's commands in some directions while preserving authority in others, remains an open challenge.

The convex hull result is asymptotic, meaning the followers converge to the hull only as time goes to infinity. In practice, missions have finite duration, and the residual error during the operational window matters. The paper's bounds are existentially quantified (there exist bounds, but they may be conservative), and computing tighter guaranteed bounds a priori remains an open problem.

Attackers who know the analysis could exploit its gaps. The paper assumes the attack coefficients are Carathéodory functions (measurable in time, locally Lipschitz in state) with bounded growth envelopes. An attacker who violates these assumptions, say by injecting signals that jump discontinuously with the state or outgrow the assumed envelope, breaks the theoretical guarantees. While practical systems might not admit such pathological attacks, the adversarial machine learning literature suggests that adaptive attackers can often approximate worst-case behaviors. Extending the analysis to cover broader attack classes, or providing probabilistic guarantees under bounded adversaries, would strengthen the practical relevance.

Network topology attacks are not considered. The paper assumes the communication graph is fixed and trusted. If an attacker compromises communication channels—dropping packets, delaying messages, or injecting false data into the neighbor exchange—the coordination layer could be disrupted. Resilient consensus under communication attacks is a separate literature, and integrating it with the actuator attack recovery proposed here would be a natural extension.

The scaling properties are unclear. The analysis and simulation consider six agents with specific dynamics. How does the architecture perform with hundreds or thousands of agents? The distributed nature of the protocol suggests it could scale, but adaptive gain dynamics, graph condition number effects, and propagation of initial conditions through the network could introduce bottlenecks.

For the field broadly, this work advances the frontier of resilient multi-agent control by demonstrating that asymptotic containment under actuator attacks and undisclosed leader dynamics is achievable with continuous adaptive mechanisms. The two-layer separation—network interface generating commands, virtual actuator compensating for execution errors—provides a clean architectural template that others can build on. The nonsmooth Lyapunov analysis offers a proof technique applicable to other adaptive protocols with similar structure.

The simulation with quadrotors and suspended loads grounds the theory in a concrete application. Real drones swing their payloads; the suspended mass creates coupling that breaks simplified rigid-body models. The fact that the architecture handles this coupling—not by ignoring it, but by exploiting the minimum-phase structure—suggests the approach is robust to modeling errors that always exist in practice.

The path from this paper to deployed systems runs through several validation stages. Hardware-in-the-loop simulations would test the architecture on more realistic dynamics and timing constraints. Field tests with actual quadrotors under controlled attack injection would reveal implementation issues—sensor noise, communication latency, actuator saturation—that analysis cannot capture. Certification frameworks would need to be developed for safety-critical deployments, establishing which attack profiles the architecture guarantees against and which it merely tolerates.

What makes this work ultimately significant is not any single technical contribution but the synthesis: a unified framework that handles undisclosed leaders, arbitrary actuator attacks, heterogeneous agents, directed networks, and continuous (non-chattering) control. Previous papers addressed subsets of these challenges. This one addresses all of them together, with rigorous guarantees. That completeness is what moves the needle from interesting theory to practical architecture.

The drones fly on. Their payloads swing. Their actuators receive corrupted commands. And they stay in formation anyway—because the network holds, the virtual actuators compensate, and the leaders' unknowable trajectories no longer matter.