How a 2-Byte ID Cut Bandwidth by Up to 28%

StreamRTPS: A Smarter Way to Talk in Real-Time Systems

27.9%. That’s the largest bandwidth reduction measured for a new protocol tweak that trims the overhead of a foundational communication standard used in self-driving cars, robots, and industrial control systems, without slowing anything down.

In a world where milliseconds matter and every byte counts, the Real-Time Publish Subscribe (RTPS) protocol underpins how machines coordinate in high-stakes environments. But like many legacy protocols, it carries decades of accumulated baggage: a 44-byte header on every packet, most of which never changes during a communication session. For small, frequent messages—like sensor readings or control signals—this overhead can dominate the actual data being sent.

Now, researchers at RWTH Aachen University have introduced StreamRTPS, a backward-compatible upgrade that rethinks how RTPS handles data in motion. By replacing bulky headers with a compact 2-byte stream ID, aggregating messages to amortize transport costs, and intelligently suppressing redundant control traffic, they’ve cut bandwidth use by more than a quarter in the best case, while preserving the real-time guarantees that make these systems safe.

This isn’t just an academic exercise. It’s a quiet revolution for the invisible plumbing of cyber-physical systems, where efficiency can translate into lower latency, reduced power consumption, and more room for innovation in bandwidth-constrained environments like in-vehicle networks or drone swarms.

The Science

The study, led by David Philipp Klüner, Stefan Kowalewski, and Alexandru Kampmann, targets the Data Distribution Service (DDS), a middleware standard developed by the Object Management Group (OMG) for distributed real-time systems. DDS is the default communication layer in Robot Operating System 2 (ROS 2), widely used in robotics, autonomous vehicles, and industrial automation.

At its core, DDS relies on the Real-Time Publish Subscribe (RTPS) wire protocol to exchange data between publishers and subscribers. Each RTPS packet carries a fixed 44-byte header containing metadata like protocol version, participant IDs, and sequence numbers. For small payloads, such as a 16-byte sensor reading, that means 44 of every 60 transmitted bytes, or over 70%, are overhead.
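The overhead share is simple arithmetic. The short Python check below reproduces it. It assumes the 44-byte RTPS header described above and ignores Ethernet framing. The optional 28-byte IPv4+UDP header is included only when requested, so it shows how much worse the picture gets once transport headers are counted.

```python
IP_UDP_BYTES = 28       # IPv4 (20 bytes) + UDP (8 bytes) header
RTPS_HEADER_BYTES = 44  # per-packet RTPS header size as described in the article


def rtps_overhead_share(payload_bytes: int, include_ip_udp: bool = False) -> float:
    """Fraction of transmitted bytes that are headers rather than payload."""
    header = RTPS_HEADER_BYTES + (IP_UDP_BYTES if include_ip_udp else 0)
    return header / (header + payload_bytes)


if __name__ == "__main__":
    print(f"{rtps_overhead_share(16):.1%}")                       # 73.3% (RTPS header only)
    print(f"{rtps_overhead_share(16, include_ip_udp=True):.1%}")  # 81.8% (plus IPv4/UDP)
```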

StreamRTPS introduces three interlocking optimizations, all designed to reduce this inefficiency while maintaining full compatibility with existing RTPS implementations:

Stream Negotiation: During the discovery phase, static header fields are exchanged once and replaced at runtime with a 2-byte StreamID. This cuts per-packet header overhead from 44 bytes to as little as 2.
Payload Aggregation: Multiple small messages destined for the same receiver are bundled into a single UDP packet, reducing the per-message cost of IP and UDP headers.
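Predictive Heartbeat Suppression: In reliable transport, a writer periodically sends heartbeats announcing which samples it has available, prompting readers to acknowledge them or request retransmissions. For periodic data streams whose timing is predictable, these heartbeats are suppressed, and the system falls back to regular heartbeats only when packet loss is detected.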
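Stream negotiation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation. The article does not give StreamRTPS's wire layout, so several details here are hypothetical: the field set in `StaticHeader`, the `StreamTable` class, and the choice to send nothing but a 2-byte big-endian StreamID in front of the payload. A real deployment would keep matching tables on both peers, and per-sample fields such as sequence numbers may still need to travel with each packet.

```python
import struct
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class StaticHeader:
    """Hypothetical set of header fields that stay constant for one writer-reader pair."""
    version: Tuple[int, int]
    vendor_id: int
    guid_prefix: bytes  # 12-byte participant prefix
    writer_id: int
    reader_id: int


class StreamTable:
    """Maps static headers to 2-byte StreamIDs agreed once during discovery (sketch)."""

    def __init__(self) -> None:
        self._ids: Dict[StaticHeader, int] = {}
        self._headers: Dict[int, StaticHeader] = {}

    def negotiate(self, header: StaticHeader) -> int:
        # Discovery phase: assign (or reuse) a StreamID for this static header.
        if header not in self._ids:
            stream_id = len(self._headers)
            if stream_id > 0xFFFF:
                raise RuntimeError("all 65,536 StreamIDs are in use")
            self._ids[header] = stream_id
            self._headers[stream_id] = header
        return self._ids[header]

    def encode(self, header: StaticHeader, payload: bytes) -> bytes:
        # Runtime: prepend only the 2-byte big-endian StreamID.
        return struct.pack("!H", self._ids[header]) + payload

    def decode(self, packet: bytes) -> Tuple[StaticHeader, bytes]:
        # Receiver restores the full static header from the negotiated table.
        (stream_id,) = struct.unpack_from("!H", packet)
        return self._headers[stream_id], packet[2:]


if __name__ == "__main__":
    table = StreamTable()
    hdr = StaticHeader((2, 4), 0, bytes(12), writer_id=0x102, reader_id=0x107)
    table.negotiate(hdr)
    packet = table.encode(hdr, bytes(16))  # a 16-byte sensor reading
    print(len(packet))                     # 18: 2-byte StreamID + 16-byte payload
```

The dictionary lookup on both sides is the whole trick: the static fields move from every packet into state that is set up once.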

The authors implemented StreamRTPS as an extension to EmbeddedRTPS, an open-source RTPS library, and evaluated it against FastDDS (a widely used, full-featured DDS implementation) and Zenoh (a modern, compact alternative). Experiments were conducted on a testbed of four Intel NUC-like machines connected via 1 Gbps Ethernet, with traffic measured using TShark and timing synchronized via PTP.

Crucially, StreamRTPS is backward-compatible: systems that don’t support the new features fall back to standard RTPS, ensuring seamless integration in mixed environments.

What They Found

The results are striking—especially for small, high-frequency messages, which are common in real-time control systems.

Stream headers alone reduce bandwidth by up to 27.9% in best-effort transport for 16-byte payloads. As payload size increases, the relative savings shrink, down to 4.4% for 1024-byte messages, because the fixed per-packet header saving is spread over more payload bytes. Even that smaller margin adds up in data-intensive applications.

Stream Header Efficiency by Payload Size

Bandwidth reduction from stream headers across payload sizes under best-effort transport.

Payload size	Bandwidth reduction (%)
16 B	27.9
32 B	22.1
64 B	15.8
128 B	10.3
256 B	6.7
512 B	5.1
1024 B	4.4

This improvement also grows with message frequency. For 128-byte messages, the bandwidth reduction rises from 11.4% at 10 Hz to 19.5% at 100 Hz. The relationship is not linear: the gain flattens at higher rates (only 0.7 percentage points separate 75 Hz and 100 Hz). It is, however, consistent with savings accumulating with each transmitted packet.

Bandwidth Savings Scale with Frequency

Bandwidth reduction for 128-byte payloads at varying send frequencies.

Send frequency	Bandwidth reduction (%)
10 Hz	11.4
25 Hz	14.5
50 Hz	17.3
75 Hz	18.8
100 Hz	19.5
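One possible reading of why the curve flattens, offered as an assumption rather than the authors' explanation, goes like this. The bytes saved per second grow linearly with frequency, but so does total traffic. Meanwhile, some traffic (for example, periodic discovery announcements) does not depend on the send rate at all. The toy model below uses made-up, unfitted parameters: `BACKGROUND_BPS` is hypothetical, and the 42-byte saving assumes the 44-byte header shrinks to 2 bytes. It rises quickly and then approaches a ceiling of 42/200 = 21%.

```python
SAVED_PER_PACKET = 44 - 2          # header bytes removed per packet (44 -> 2)
BYTES_PER_PACKET = 28 + 44 + 128   # IPv4/UDP + RTPS header + 128-byte payload
BACKGROUND_BPS = 5000              # hypothetical rate-independent traffic, bytes/s


def relative_saving(freq_hz: float) -> float:
    """Share of total traffic removed by stream headers at a given send rate."""
    saved = freq_hz * SAVED_PER_PACKET
    total = BACKGROUND_BPS + freq_hz * BYTES_PER_PACKET
    return saved / total


if __name__ == "__main__":
    for f in (10, 25, 50, 75, 100):
        print(f"{f:>3} Hz: {relative_saving(f):.1%}")
```

Under these assumptions, each extra hertz buys less than the one before, because the fixed background becomes a vanishing share of the total.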

When combined with heartbeat suppression under reliable transport, the gains compound. Here, the system reduces control traffic by omitting heartbeats when data arrives predictably, falling back only when loss is detected. This yields an additional 22.7% reduction on top of stream headers, slashing control overhead without compromising reliability.

Control Traffic Savings from Heartbeat Suppression

Additional bandwidth reduction from heartbeat suppression under reliable transport, relative to stream headers alone.

Configuration	Additional bandwidth reduction (%)
Stream headers only (baseline)	0
Stream headers + heartbeat suppression	22.7
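The article does not describe how a receiver decides that the periodic pattern has broken. The sketch below assumes two triggers: a gap in sequence numbers, and a sample that arrives later than its period plus a tolerance. The `PredictiveReader` class and its parameters are hypothetical. The writer side, which would resume heartbeats only after the reader signals a problem, is described only in a comment.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PredictiveReader:
    """Reader-side loss detection for a hypothetical heartbeat-suppression scheme."""
    period: float                        # expected interval between samples, seconds
    tolerance: float                     # jitter allowed before a sample counts as late
    next_seq: int = 1                    # next sequence number the reader expects
    last_arrival: Optional[float] = None
    missing: List[int] = field(default_factory=list)

    def on_sample(self, seq: int, now: float) -> None:
        if seq in self.missing:          # a retransmission repairs an earlier gap
            self.missing.remove(seq)
        elif seq > self.next_seq:        # a jump means samples were lost in between
            self.missing.extend(range(self.next_seq, seq))
        self.next_seq = max(self.next_seq, seq + 1)
        self.last_arrival = now

    def overdue(self, now: float) -> bool:
        # The periodic pattern is broken if nothing arrived within period + tolerance.
        return (self.last_arrival is not None
                and now - self.last_arrival > self.period + self.tolerance)

    def needs_fallback(self, now: float) -> bool:
        # While False, the writer can skip heartbeats; once True, the reader would
        # ask for the regular heartbeat/acknowledgment exchange to resume.
        return bool(self.missing) or self.overdue(now)


if __name__ == "__main__":
    r = PredictiveReader(period=0.040, tolerance=0.005)  # 25 Hz stream
    r.on_sample(1, 0.000)
    r.on_sample(2, 0.040)
    print(r.needs_fallback(0.070))   # False: sample 3 is not due yet
    r.on_sample(4, 0.120)            # sample 3 was lost
    print(r.needs_fallback(0.120))   # True: gap detected, fall back
    r.on_sample(3, 0.125)            # retransmitted sample 3 arrives
    print(r.needs_fallback(0.130))   # False: stream is predictable again
```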

Critically, these savings come with no measurable increase in transmission latency. The authors report that latency characteristics are nearly identical to the baseline EmbeddedRTPS implementation, indicating that the optimizations do not introduce noticeable processing delays.

The payload aggregation mechanism also performs as designed. By tying aggregation windows to Quality of Service (QoS) deadlines, the system maximizes bundling while ensuring timing constraints are never violated. For example, with a 10 ms deadline, multiple 128-byte messages from different publishers can be packed into a single UDP datagram, spreading the 28-byte IP+UDP header cost across several samples.

Figure 4: Illustration of Deadline-based Payload Aggregation. Samples are aggregated until the earliest deadline among the buffered samples; the bundle is then sent and the buffer emptied. Source: David Philipp Klüner, Stefan Kowalewski

Figure 4 illustrates how samples are batched up to the earliest deadline and then flushed, preserving real-time guarantees while improving efficiency.
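The deadline-driven flush can be sketched as follows. This is an illustration, not the authors' code. `DeadlineAggregator`, its optional `margin`, and the 1472-byte budget (a 1500-byte Ethernet MTU minus 28 bytes of IPv4 and UDP headers) are assumptions. The RTPS framing that would separate samples inside a datagram is also omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional

IP_UDP_BYTES = 28     # IPv4 (20) + UDP (8) header per datagram
MAX_PAYLOAD = 1472    # hypothetical budget: 1500-byte MTU minus IPv4/UDP headers


@dataclass
class DeadlineAggregator:
    """Bundles samples for one receiver and flushes at the earliest deadline (sketch)."""
    margin: float = 0.0                                    # hypothetical slack before a deadline
    buffer: List[bytes] = field(default_factory=list)
    flush_at: Optional[float] = None                       # earliest deadline in the buffer
    sent: List[List[bytes]] = field(default_factory=list)  # one entry per datagram

    def add(self, sample: bytes, now: float, deadline: float) -> None:
        # If the sample would overflow the datagram, send what is buffered first.
        if sum(len(s) for s in self.buffer) + len(sample) > MAX_PAYLOAD:
            self.flush()
        self.buffer.append(sample)
        due = now + deadline - self.margin
        self.flush_at = due if self.flush_at is None else min(self.flush_at, due)

    def tick(self, now: float) -> None:
        # Called by a timer: flush once the earliest buffered deadline is reached.
        if self.flush_at is not None and now >= self.flush_at:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.sent.append(list(self.buffer))  # framing inside the datagram omitted
        self.buffer.clear()
        self.flush_at = None


if __name__ == "__main__":
    agg = DeadlineAggregator()
    for i in range(5):                            # five 128-byte samples, 1 ms apart
        agg.add(bytes(128), now=i * 0.001, deadline=0.010)
    agg.tick(0.009)                               # earliest deadline (10 ms) not reached
    agg.tick(0.010)                               # deadline reached: one datagram goes out
    print(len(agg.sent), len(agg.sent[0]))        # 1 5
    print((len(agg.sent[0]) - 1) * IP_UDP_BYTES)  # 112 bytes of IPv4/UDP headers avoided
```

Bundling five samples into one datagram avoids four 28-byte IPv4/UDP headers. Polling with `tick()` keeps the sketch simple; a real stack would more likely arm a timer for `flush_at` directly.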

One limitation: multicast traffic still uses standard RTPS headers, as a single packet cannot carry distinct StreamIDs for multiple receivers. The authors note this as a target for future work.

Why This Changes Things

At first glance, saving a few kilobits per second might seem trivial in an era of gigabit networks. But in the domains where DDS operates—automotive, robotics, industrial IoT—bandwidth is often constrained, shared, and mission-critical.

Consider a modern software-defined vehicle (SDV). Inside, dozens of electronic control units (ECUs) communicate over in-vehicle networks (IVNs), many of which still rely on CAN buses with bandwidths measured in kilobits per second. Even in high-end vehicles using Ethernet backbones, the sheer volume of sensor data (from cameras, lidar, radar, and IMUs) can saturate links, especially as autonomy levels increase.

Every byte saved by StreamRTPS is a byte that can carry more sensor data, enable higher update rates, or reduce the need for costly hardware upgrades. In electric vehicles, reduced network activity could even translate to slightly lower power consumption.

The same logic applies to robotics. Swarms of drones or warehouse robots must coordinate in real time, often over lossy wireless links. Reducing protocol overhead means more room for perception, planning, and control data—without increasing latency or jitter.

But perhaps the most profound implication is cultural. For years, the embedded systems community has treated protocol overhead as a fixed cost—a tax paid for interoperability. StreamRTPS challenges that assumption, showing that even mature, standardized protocols can be optimized without breaking compatibility.

It also highlights a shift in how we think about efficiency. Traditional approaches focus on faster processors or higher-bandwidth links. StreamRTPS takes the opposite path: do less, smarter. By moving static data to setup time, batching small messages, and eliminating redundant control traffic, it achieves its gains without any hardware upgrade at all.

Compare this to Zenoh, a newer protocol designed for compactness.

Figure 1: Results comparing FastDDS, StreamRTPS, EmbeddedRTPS and Zenoh in a 4:2-4:2 topology with two senders and two receivers, two topics per sender at 25 Hz. Error bars denote one standard deviation. Source: David Philipp Klüner, Stefan Kowalewski

Figure 1 shows that even optimized implementations of standard RTPS lag behind Zenoh in bandwidth efficiency. StreamRTPS closes much of that gap without requiring a wholesale migration to a new middleware.

That backward compatibility is key. In safety-critical systems, replacing core communication stacks is risky and expensive. StreamRTPS offers a path to modernization without disruption: nodes can be upgraded one at a time, while peers that have not been upgraded keep communicating over standard RTPS.

What’s Next

StreamRTPS is not a panacea. It works best for unicast, periodic, small-to-medium payloads, the bread and butter of real-time control. Its returns diminish for large or infrequent messages, and multicast traffic still uses standard RTPS headers.

The authors identify several open questions. Can multicast be supported efficiently? Could StreamIDs be shared across multiple readers in a group? What happens under high packet loss or variable network conditions? And how do these gains translate to real-world deployments, where CPU, memory, and thermal constraints also play a role?

The stream headers currently work only for unicast, but the authors suggest extending them with shared StreamIDs for multicast groups, a change that could unlock even greater savings in multicast-heavy topologies.

Another frontier is integration with emerging technologies like Time-Sensitive Networking (TSN) and 5G V2X. In these environments, predictable timing and low jitter are paramount. StreamRTPS’s deadline-aware aggregation and suppressed heartbeats could complement TSN’s traffic shaping, creating a more efficient end-to-end pipeline.

From a standardization perspective, a natural next step would be to propose StreamRTPS as an official extension to the RTPS specification. If adopted by the OMG, it could become available across DDS implementations, from FastDDS to RTI Connext to Eclipse Cyclone DDS.

For developers, the open-source implementation on GitHub offers a ready path to experimentation. And for system architects, the message is clear: protocol efficiency is no longer a trade-off between performance and compatibility. With StreamRTPS, you can have both.

In a world increasingly run by machines talking to machines, the way they talk matters. StreamRTPS doesn’t just save bandwidth; it makes real-time systems leaner and more scalable. And sometimes, the most powerful innovations are the ones you don’t see.