Meridia Insight Tech for Good Frontiers

The Robot Ship That Practises Before It Sails: How Virtual Sea Trials Are Making Autonomous Vessels Safer

A new simulator framework lets autonomous boats run standardised ocean manoeuvres in a virtual world, with results accurate enough to trust for real-world navigation.

Virtual zigzag turns accurate to under 1° of overshoot — from a simulated boat that's never touched water.

Imagine trying to teach a self-driving car how to handle an icy corner — but you're only allowed to describe what the steering wheel was turned to, not what the wheels actually did. That gap, small as it sounds, is precisely the kind of error that has quietly undermined a whole class of autonomous ship simulators. And it's the problem that Paria Rezayan, a PhD researcher at Sheffield Hallam University, has spent her doctoral work trying to fix.

The result is a virtual sea-trial framework precise enough that a simulated unmanned boat, one that has never touched actual water, can complete internationally standardised manoeuvring tests with overshoot excess below one degree.

That number matters. The International Maritime Organization (IMO), the United Nations body that governs global shipping safety, defines criteria for how ships should respond to steering inputs. One of the most demanding of these is the zig-zag test: steer hard in one direction, wait for the hull to respond, then steer hard the other way and measure how far the vessel "overshoots" before it corrects. For a ship to pass IMO criteria, that overshoot must stay within defined limits. In Rezayan's virtual trials, the overshoot excess across all tested configurations remained below 1°, a result the paper describes as satisfying IMO criteria (Rezayan, 2026).

For an autonomous vessel designer, this is a significant threshold to clear, and clearing it in simulation — repeatably, cheaply, and without a sea state to contend with — is the whole point.

The Science

The domain is called system identification (SI) — the process of figuring out exactly how a vessel responds to forces and steering commands so that you can build an accurate mathematical model of its behaviour. Think of it as measuring a ship's personality: how quickly does it turn? How far does it drift sideways? How long does it take to stop? These responses are governed by hydrodynamic derivatives — essentially, the numerical coefficients in the equations of motion that describe how water pushes back against a hull as it moves.

Getting those numbers right is the foundation of every autonomous navigation system, every collision-avoidance algorithm, and every digital twin (a continuously updated virtual model of a real vessel). Get them wrong, and a self-navigating ship might misjudge a harbour turn or fail to avoid a fishing boat in fog.

Traditionally, hydrodynamic derivatives were measured the hard way: captive model tests in towing tanks, full-scale sea trials on real vessels, or computationally expensive fluid dynamics simulations. Each approach has serious limitations. Physical trials are costly, weather-dependent, and risky for small unmanned vehicles. High-fidelity computational fluid dynamics (CFD) tools, the kind used to resolve the flow of water around a hull in fine detail, can take days of compute time per manoeuvre, making them impractical for the rapid, iterative testing that autonomous systems require.

The answer, increasingly, is marine robotics simulators: software environments that combine realistic physics engines, sensor emulation, and real-time execution. Rezayan works with MARUS (Marine Robotics Unity Simulator), an open-source platform built on the Unity game engine — the same engine that powers many commercial video games — connected to ROS2 (Robot Operating System 2), the standard middleware for robotics research worldwide. MARUS offers photorealistic rendering, physics-based wave and buoyancy modelling, a full virtual sensor stack including GPS, IMU, LiDAR, and sonar, and real-time data exchange between the game engine and any external control software.

Figure 2: Example MARUS simulation scene showcasing the USV in the virtual environment. Source: Paria Rezayan

The vessel used in these trials is a small USV approximately 2.95 metres long, 1.47 metres wide, and weighing 300 kilograms — roughly the size of a large inflatable dinghy. Crucially, it steers not via a physical rudder but through differential thrust: two propellers, one on each side, whose speed difference generates a turning moment. This is common in small autonomous vessels, but it creates a conceptual problem for standard maritime safety procedures, which are all written in terms of rudder angle. Rezayan's framework resolves this by computing a rudder-equivalent proxy — a derived signal that translates the port-starboard thrust imbalance into a meaningful angle — and logging it alongside the raw thrust commands.
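One simple way to picture such a proxy is a linear mapping from the normalised thrust difference to an angle. The sketch below is illustrative only: the function name, the linear form, the sign convention, and the use of the ±35° full-steering value as the saturation limit are all assumptions, not the mapping used in the paper.

```python
def rudder_equivalent_deg(thrust_port, thrust_stbd, thrust_max, delta_max_deg=35.0):
    """Map a port/starboard thrust imbalance to a rudder-equivalent angle in degrees.

    Assumed mapping (hypothetical, not taken from the paper):
      - each thruster delivers a thrust in [-thrust_max, +thrust_max]
      - the angle is linear in the normalised thrust difference
      - more port thrust than starboard thrust gives a positive angle
      - the result saturates at +/- delta_max_deg
    """
    if thrust_max <= 0:
        raise ValueError("thrust_max must be positive")
    # The difference spans [-2*thrust_max, +2*thrust_max]; scale it to [-1, 1].
    imbalance = (thrust_port - thrust_stbd) / (2.0 * thrust_max)
    # Clip so out-of-range thrust values cannot exceed the assumed limit.
    imbalance = max(-1.0, min(1.0, imbalance))
    return delta_max_deg * imbalance
```

Under these assumptions, equal thrust on both sides gives a proxy of zero, and full opposite thrust saturates at the assumed limit.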

The simulated vessel is brought to a target approach speed corresponding to 90% of the steady speed at 85% Maximum Continuous Rating (MCR), the standard ITTC approach condition. Manoeuvres begin only once the vessel has stabilised at this speed with minimal yaw drift.

Figure 3: ROS–Unity architecture for actuation-traceable virtual sea trials. Red: ROS control layer; blue: Unity simulation and physics engine. Source: Paria Rezayan

What They Found

The framework runs two classes of standardised manoeuvres, both defined by the IMO and the International Towing Tank Conference (ITTC): the Turning Circle (TC) and the Zig-Zag (ZZ).

The Turning Circle is exactly what it sounds like: apply full rudder-equivalent steering ($\delta_{\text{cmd}} = \pm 35°$) and hold it until the vessel has completed one and a half full rotations ($540°$ of heading change). From the resulting trajectory, three distances are measured: advance (how far the vessel travels along its original course by the time its heading has changed by 90°), transfer (how far it has moved sideways at that same point), and tactical diameter (how far it has moved sideways by the time its heading has changed by 180°). All are normalised by the vessel's length between perpendiculars ($L_{PP}$) to allow meaningful comparison regardless of vessel size.

(a) Turning Circle manoeuvre. Source: Paria Rezayan

In Rezayan's virtual trials, the normalised advance differed by approximately 3.9% between turning to port (left) and turning to starboard (right). The tactical diameter differed by 4.6–4.7%. These asymmetries are small but not zero — they likely reflect minor imperfections in the differential-thrust model or the initial heading at manoeuvre onset. Importantly, the paper treats these differences as informative rather than inconvenient: they are traceable, reproducible, and will persist in future trials if the underlying model stays the same. That's exactly what you want from a virtual sea trial. Repeatability is the product.
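To make the turning-circle metrics concrete, here is a minimal sketch of how advance, transfer, and tactical diameter could be read off a logged trajectory and compared between port and starboard runs. The function names, the input layout (positions already expressed relative to the original course), and the asymmetry formula (difference relative to the mean) are assumptions for illustration; the paper's own post-processing may differ.

```python
import math


def turning_circle_metrics(x, y, heading_change_deg, l_pp):
    """Return advance, transfer and tactical diameter, normalised by l_pp.

    x: distance travelled along the original course since steering was applied (m)
    y: distance travelled sideways, positive towards the turn (m)
    heading_change_deg: cumulative, unwrapped heading change, increasing from 0
    """
    def value_at(target_deg, series):
        # Linear interpolation at the first sample pair that brackets the target.
        for i in range(1, len(heading_change_deg)):
            h0, h1 = heading_change_deg[i - 1], heading_change_deg[i]
            if h0 < target_deg <= h1:
                w = (target_deg - h0) / (h1 - h0)
                return series[i - 1] + w * (series[i] - series[i - 1])
        raise ValueError("heading change never reaches %.0f degrees" % target_deg)

    return {
        "advance": value_at(90.0, x) / l_pp,
        "transfer": value_at(90.0, y) / l_pp,
        "tactical_diameter": value_at(180.0, y) / l_pp,
    }


def asymmetry_pct(port_value, stbd_value):
    """Port/starboard difference as a percentage of their mean (assumed definition)."""
    mean = 0.5 * (port_value + stbd_value)
    return 100.0 * abs(port_value - stbd_value) / mean


# Synthetic check: a perfect circle of radius 5 m, sampled every degree up to 540.
radius, l_pp = 5.0, 2.95
angles = [float(a) for a in range(541)]
xs = [radius * math.sin(math.radians(a)) for a in angles]
ys = [radius * (1.0 - math.cos(math.radians(a))) for a in angles]
circle = turning_circle_metrics(xs, ys, angles, l_pp)
```

For an ideal circle the advance and transfer both equal the radius and the tactical diameter equals twice the radius, which gives a quick sanity check before the function is applied to real logs.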

Chart: Turning Circle Metrics: Port vs Starboard (Normalised by L_PP). Comparison of normalised advance and tactical diameter between port and starboard turning circle tests, showing ~3.9% and ~4.6–4.7% asymmetry respectively.
Advance: 3.9 × L_PP
Tactical Diameter: 5.1 × L_PP

The Zig-Zag test is more demanding. A rudder-equivalent angle ($\pm 10°$ or $\pm 20°$) is applied and held until the heading has deviated from the original course by the same amount; the steering is then reversed, and the controller waits for the heading to reach the same deviation on the opposite side before reversing again. The key metric is the overshoot angle: how far the vessel's heading continues to swing in the original direction after the steering has been reversed. It measures how quickly a vessel's yaw can be checked, or how "snappy" it is to steering changes.
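The overshoot measurement described above can be sketched in a few lines. This is a hedged illustration, not the paper's extraction code: the function name is hypothetical, overshoot is assumed to mean the peak heading deviation minus the check angle, and a steering reversal is detected as a sign change between consecutive samples of the rudder-equivalent signal (so samples that sit exactly at zero are not handled).

```python
def zigzag_overshoots(psi_deg, delta_deg, check_deg):
    """Return the overshoot angle (deg) that follows each steering reversal.

    psi_deg:   heading deviation from the original course, one value per sample
    delta_deg: rudder-equivalent angle at the same samples
    check_deg: the check angle of the test, for example 10 or 20
    """
    # A reversal is a sign change between consecutive rudder-equivalent samples.
    reversals = [i for i in range(1, len(delta_deg))
                 if delta_deg[i - 1] * delta_deg[i] < 0]
    overshoots = []
    for k, start in enumerate(reversals):
        end = reversals[k + 1] if k + 1 < len(reversals) else len(psi_deg)
        # Direction in which the heading was swinging before the reversal.
        side = 1.0 if delta_deg[start - 1] > 0 else -1.0
        peak = max(side * psi for psi in psi_deg[start:end])
        overshoots.append(peak - check_deg)
    return overshoots


# Hand-made 10/10 example with two reversals.
psi = [0, 5, 10, 10.6, 10.8, 10.5, 5, 0, -5, -10, -10.5, -10.7, -10.2, -5, 0]
delta = [10, 10, -10, -10, -10, -10, -10, -10, -10, 10, 10, 10, 10, 10, 10]
first, second = zigzag_overshoots(psi, delta, check_deg=10.0)
```

In this hand-made example the heading peaks at 10.8° after the first reversal and at 10.7° on the opposite side after the second, so the two overshoots are about 0.8° and 0.7°.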

(b) Zig–Zag 20/20 manoeuvre. Source: Paria Rezayan

For both the ±10° and ±20° manoeuvres, the first and second overshoot excesses remained below 1°. Peak yaw rates ranged from approximately 4.1 deg/s to 5.8 deg/s across the two test magnitudes. The IMO criteria for overshoot angles depend on vessel length and approach speed; the paper confirms that all results satisfy the applicable thresholds.

Chart: Zig-Zag Peak Yaw Rates by Manoeuvre Magnitude. Peak yaw rates recorded during ±10° and ±20° zig-zag manoeuvres. Higher rudder-equivalent inputs produce faster yaw responses.
ZZ ±10°: 4.1 deg/s
ZZ ±20°: 5.8 deg/s

The central technical contribution, though, is not the manoeuvre numbers themselves but what is recorded alongside them. The framework logs two distinct signals for every actuation event: $\delta_{\text{cmd}}$, the ordered rudder-equivalent command, and the executed rudder-equivalent proxy, computed directly from the applied thrust values.

This separation allows downstream analysis to use what the vessel actually did, not what it was told to do. Manoeuvre timing, overshoot detection, and metric extraction are all anchored to the moment realised actuation exceeds a small threshold, rather than the moment the command was issued. In a system with propeller spin-up delay, actuator saturation, or any gap between intent and execution, these two moments are not the same.
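The difference between the two anchoring moments can be illustrated with a short sketch. The function names and the 0.5° default threshold are hypothetical, since the threshold value is not given here; the point is only that the onset is taken from the executed signal rather than from the command.

```python
def first_crossing_time(t, signal_deg, threshold_deg):
    """Time at which |signal| first exceeds the threshold, or None if it never does."""
    for time, value in zip(t, signal_deg):
        if abs(value) > threshold_deg:
            return time
    return None


def actuation_lag(t, delta_cmd, delta_exec, threshold_deg=0.5):
    """Delay between the command onset and the executed-actuation onset (s).

    threshold_deg is an assumed value chosen for illustration only.
    """
    t_cmd = first_crossing_time(t, delta_cmd, threshold_deg)
    t_exec = first_crossing_time(t, delta_exec, threshold_deg)
    if t_cmd is None or t_exec is None:
        return None
    return t_exec - t_cmd


# Synthetic example: the command steps to 35 degrees at 0.2 s,
# while the executed proxy builds up more slowly.
t = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
cmd = [0.0, 0.0, 35.0, 35.0, 35.0, 35.0]
executed = [0.0, 0.0, 0.0, 0.3, 4.0, 12.0]
lag = actuation_lag(t, cmd, executed)
```

Here the command crosses the threshold at 0.2 s but the executed proxy only does so at 0.4 s, so any metric timed from the command would start its clock about 0.2 s early.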

A secondary consistency check compares the yaw rate logged directly from Unity's physics engine ($r_{\text{logged}} = \dot{\psi}_{\text{unity}}$) with the yaw rate computed by differentiating the post-processed nautical heading ($r_{\text{calc}} = d\psi_{\text{nautical}}/dt$). Agreement between these two independent signals validates the internal coherence of the dataset.
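A check of this kind might look like the sketch below, which differentiates a heading series with a simple finite difference and reports the root-mean-square mismatch against a logged yaw rate. The function names and the choice of RMS as the agreement measure are assumptions; the sketch also assumes both signals already share the same sign convention and units, which a real Unity-to-nautical conversion would need to guarantee.

```python
import math


def wrap_deg(angle):
    """Wrap an angle difference into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def yaw_rate_from_heading(t, psi_deg):
    """Finite-difference yaw rate (deg/s), one value per sample interval.

    Differences are wrapped so a heading passing through 0/360 does not
    produce a spurious spike.
    """
    return [wrap_deg(psi_deg[i] - psi_deg[i - 1]) / (t[i] - t[i - 1])
            for i in range(1, len(t))]


def rms_mismatch(r_logged, r_calc):
    """Root-mean-square difference between two equally long yaw-rate series."""
    if len(r_logged) != len(r_calc) or not r_calc:
        raise ValueError("series must be non-empty and of equal length")
    total = sum((a - b) ** 2 for a, b in zip(r_logged, r_calc))
    return math.sqrt(total / len(r_calc))


# Synthetic example: a steady 5 deg/s turn that passes through north.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
heading = [350.0, 355.0, 0.0, 5.0, 10.0]
r_calc = yaw_rate_from_heading(times, heading)
mismatch = rms_mismatch([5.0, 5.0, 5.0, 5.0], r_calc)
```

A mismatch near zero suggests the two signals agree; a large value would point to a units, sign, or timing inconsistency somewhere in the logging chain.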

Why This Changes Things

The problem Rezayan is solving is not exotic. It's a systematic error baked into how most simulation-based ship studies are done.

When researchers use a simulator to generate training data for a ship's navigation AI or to estimate its hydrodynamic derivatives, they typically record the command sent to the thrusters and treat it as the input signal to the physics model. But command and execution are rarely identical. A propeller doesn't spin up instantaneously. A differential-thrust system can saturate. There can be communication latency in the ROS pipeline. In physical sea trials, this gap is well understood and accounted for. In simulation, it's often simply ignored — producing datasets where the labelled input doesn't match the force that actually acted on the hull.

For system identification, this is poisonous. The standard methods for estimating hydrodynamic derivatives — whether classical regression, Kalman filtering, or modern physics-informed neural networks — are fitting a model to the relationship between input forces and output motions. Feed them a mislabelled input, and the coefficients they produce will be subtly wrong in ways that may not be obvious until the real vessel tries to execute a tight harbour approach and the predictions diverge from reality.
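A toy regression shows the effect. The sketch below assumes a deliberately simple first-order yaw model, r[k+1] = a·r[k] + b·δ[k], and a first-order actuator lag between command and execution; both models and all numerical values are hypothetical and are not taken from the paper. The yaw response is generated from the executed input, then the same least-squares fit is run twice: once with the executed input and once with the commanded input.

```python
def fit_first_order(r, u):
    """Least-squares fit of r[k+1] = a*r[k] + b*u[k]. Returns (a, b, rss)."""
    y = r[1:]
    x1 = r[:-1]
    x2 = u[:len(y)]
    # Normal equations for the two unknown coefficients.
    s11 = sum(p * p for p in x1)
    s12 = sum(p * q for p, q in zip(x1, x2))
    s22 = sum(q * q for q in x2)
    t1 = sum(p * v for p, v in zip(x1, y))
    t2 = sum(q * v for q, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    rss = sum((v - a * p - b * q) ** 2 for v, p, q in zip(y, x1, x2))
    return a, b, rss


A_TRUE, B_TRUE = 0.9, 0.05  # assumed "true" coefficients of the toy model

# Commanded input: a zig-zag-like square wave of +/-35 degrees.
delta_cmd = [35.0] * 40 + [-35.0] * 40 + [35.0] * 40

# Executed input: the command passed through an assumed first-order lag.
delta_exec = [0.0]
for command in delta_cmd[:-1]:
    delta_exec.append(delta_exec[-1] + 0.2 * (command - delta_exec[-1]))

# Yaw rate driven by what was actually executed.
r = [0.0]
for executed in delta_exec:
    r.append(A_TRUE * r[-1] + B_TRUE * executed)

a_exec, b_exec, rss_exec = fit_first_order(r, delta_exec)
a_cmd, b_cmd, rss_cmd = fit_first_order(r, delta_cmd)
```

With the executed input the fit recovers the assumed coefficients essentially exactly, while the commanded input leaves a residual that no choice of coefficients can remove. That is the mislabelled-input problem in miniature.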

Rezayan's framework addresses this not by making the simulator more physically perfect, but by being more honest about what the simulator is actually doing. The explicit command-execution separation is a form of epistemic hygiene — a principled refusal to conflate intent with outcome.

This matters especially for digital twins: continuously updated virtual models that track a real vessel's changing condition over time. A digital twin trained on biased SI data will drift from reality as the vessel ages, its hull fouls, its propellers wear. A twin trained on clean, traceable, execution-grounded data has a better chance of staying calibrated. As the paper puts it, the framework "addresses a known SI failure mode whereby commanded inputs are treated as achieved actuation — common in simulator-based SI datasets — despite actuator saturation and practical implementation constraints."

The broader context is the accelerating push toward fully autonomous commercial shipping. Several maritime nations — Norway, Japan, South Korea, the United Kingdom — have active programmes developing autonomous or remotely operated vessels for coastal logistics, offshore inspection, and ocean science. The regulatory frameworks governing these vessels, including the IMO's Maritime Autonomous Surface Ships (MASS) guidelines currently under development, will require demonstrable compliance with manoeuvring standards. Virtual sea trials that produce auditable, IMO-aligned datasets are not an academic curiosity; they are the evidentiary infrastructure that regulators will eventually demand.

Chart: Zig-Zag Overshoot Excess vs IMO Limit. First and second overshoot excesses for ±10° and ±20° zig-zag manoeuvres remain below the IMO threshold.
ZZ ±10°, 1st overshoot: 0.8°
ZZ ±10°, 2nd overshoot: 0.7°
ZZ ±20°, 1st overshoot: 0.9°
ZZ ±20°, 2nd overshoot: 0.8°

What's Next

The framework as presented has important caveats. It is validated on a single small USV — a 300-kilogram, 3-metre vessel — in calm water. The Unity physics engine, while realistic enough for many purposes, is not a certified hydrodynamic solver, and its fidelity for phenomena like wave-induced rolling or current-affected turning has not been systematically benchmarked against towing-tank data. The paper acknowledges that the port-starboard asymmetry in turning-circle results — small but present — warrants further investigation to separate genuine physics from numerical artefacts.

The framework is also limited to the two most fundamental manoeuvre types. Real-world USV assessment includes stopping tests, spiral manoeuvres, and combined course-keeping and disturbance-rejection scenarios. Extending the pipeline to these tests is a natural next step.

Perhaps most importantly, the paper describes a data-generation and conditioning infrastructure, not a complete system identification pipeline. The SI-ready datasets it produces still need to be fed into derivative-estimation methods — whether classical grey-box models, PINNs, or Gaussian process regression — and those methods carry their own assumptions and limitations. The quality of the downstream physics model depends on both the quality of the data and the appropriateness of the estimation technique.

What Rezayan has built is something more foundational: a trusted on-ramp. For autonomous vessel research to mature, the field needs common, auditable, repeatable test procedures that connect simulator behaviour to international standards. Without that, every research group is running its own virtual trials in its own way, producing results that can't be compared, combined, or used as evidence in a regulatory context.

A framework that logs not just what happened, but the difference between what was commanded and what was executed — and that does so within an open-source simulator aligned to IMO and ITTC procedures — is a building block the field has been missing. The boat hasn't sailed yet. But it knows exactly how it would turn.

If commanded inputs are treated as achieved actuation without accounting for delays, execution constraints, and actuator saturation, subsequent manoeuvring metrics and hydrodynamic-derivative estimates can become biased and unreliable.
