ST 2110 in the WAN – Part 2

SMPTE 2110 is far from our industry’s first attempt at moving production to IP. A bunch of proprietary solutions have been available for years, and on top of this we’ve seen standardized approaches such as SMPTE ST 2022-6, which maps SDI onto IP. But with SMPTE ST 2110 we finally get a standard designed from the ground up to break apart video, audio and ancillary data, enabling truly flexible workflows.

At Net Insight we get tons of questions about NMOS and SMPTE 2110 and how to handle the shift from SDI to IP and ST 2110, some of which we’re trying to answer in this series of articles. In the first article we concluded that the shift to IP, and more specifically to SMPTE 2110, is about more than just new technology. But that doesn’t mean we can ignore the technical aspects: even if we stick to today’s workflows and just replace SDI with SMPTE 2110, there will be substantial technical changes and challenges. That’s why this article covers the basics of SMPTE 2110 and the NMOS specifications, providing a foundation for figuring out what new challenges arise in the WAN.

The big picture – elementary streams and asynchronous behaviour

First of all, SMPTE 2110 is designed to be video format agnostic, handling 720, 1080, 4K, progressive, interlaced, HDR, HFR and more. There are standards for both compressed and uncompressed audio and video workflows, even though the first round of work has focused on uncompressed workflows, which is why the industry discussion so far has been very much oriented around studios and production facilities. We are now determined to fill the gap around SMPTE 2110 in the WAN.

Comparison between SMPTE 2110, SDI, and SMPTE 2022-6

Compared to SDI and SMPTE 2022-6, which simply maps SDI onto IP, the big news is that SMPTE 2110 breaks apart audio, video and ancillary data into separate elementary streams. This is done to provide flexibility, allowing you to route and process the different streams independently. Having said that, ST 2110 also describes how to carry SDI (i.e. SMPTE 2022-6) as is when that makes more sense.

SMPTE 2110 Elementary Streams

And like any IP based production format, ST 2110 takes into consideration that the underlying infrastructure is no longer synchronous. The enabler for separating audio, video and data streams on an asynchronous infrastructure is timing, making sure that each elementary stream is time stamped and that timing information is carried over the top. In the case of ST 2110 this is done using PTP (IEEE 1588).
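To make the timing model concrete, here is a minimal sketch (ours, not from the standard) of how an RTP timestamp relates to PTP time. ST 2110-10 locks media clocks to the PTP epoch, so two essence streams stamped at the same instant can be re-aligned at the receiver even though they travelled separately. The 90 kHz and 48 kHz rates below are the media clock rates commonly used for ST 2110 video and audio.

```python
# Sketch: deriving a 32-bit RTP timestamp from an absolute PTP time,
# following the ST 2110-10 model where media clocks are locked to the
# PTP epoch. Clock rates: 90 kHz (video), 48 kHz (audio).

VIDEO_CLOCK_HZ = 90_000
AUDIO_CLOCK_HZ = 48_000

def rtp_timestamp(ptp_seconds: float, clock_hz: int) -> int:
    """Map an absolute PTP time to a 32-bit (wrapping) RTP timestamp."""
    return int(ptp_seconds * clock_hz) % 2**32

# Video and audio packets stamped from the same PTP instant can later
# be aligned at the edge, which is what enables clean live switching.
t = 1_700_000_000.5  # an arbitrary example PTP time, in seconds
video_ts = rtp_timestamp(t, VIDEO_CLOCK_HZ)
audio_ts = rtp_timestamp(t, AUDIO_CLOCK_HZ)
```

The 32-bit wrap is why receivers work with timestamp differences rather than absolute values.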

In addition to timing, another challenge of moving to asynchronous infrastructure is burstiness. With a synchronous infrastructure the concept of burstiness does not exist, as traffic is delivered in one continuous flow. With IP, that’s no longer the case. Being packet based, each device along the traffic path contains buffers that are not synchronized. That means each device and buffer acts independently, resulting in traffic being delivered in bursts rather than as a continuous flow. For this reason, ST 2110 defines several sender and receiver profiles describing how large a burst is acceptable in different environments.

Finally, SMPTE 2110 really only describes an IP based media data plane. The control plane is left to its sibling, the set of NMOS specifications. They describe how devices on a network can detect each other, understand what streams are available, and how to connect two devices. The NMOS specifications are really the glue that makes an ST 2110 based infrastructure manageable and are in many cases even more interesting than ST 2110 in itself.

ST 2110 for audio (ST 2110-30 and ST 2110-31)

In SMPTE 2110, audio transport is built on AES67, which specifies how to carry uncompressed 48 kHz PCM audio. Up to 8 channels can be bundled in one stream, and both 16- and 24-bit depths are supported. In addition, the ST 2110-31 standard specifies how to transport AES3 (AES/EBU) audio transparently over IP, including non-PCM payloads such as compressed audio.
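A quick back-of-envelope calculation shows what these numbers mean on the wire, assuming the 1 ms packet time that AES67 uses by default (the header overhead figures are typical values, not mandated by the standard):

```python
# Back-of-envelope size of an ST 2110-30 style audio stream:
# 48 kHz PCM, 8 channels, 24-bit samples, 1 ms packet time.
# RTP/UDP/IP overhead figures are typical values, not from the standard.

SAMPLE_RATE_HZ = 48_000
CHANNELS = 8
BYTES_PER_SAMPLE = 3            # 24-bit PCM
PACKET_TIME_MS = 1              # AES67 default packet time

samples_per_packet = SAMPLE_RATE_HZ * PACKET_TIME_MS // 1000      # 48
payload_bytes = samples_per_packet * CHANNELS * BYTES_PER_SAMPLE  # 1152

HEADER_BYTES = 20 + 8 + 12      # IPv4 + UDP + RTP, ignoring L2 framing
packets_per_second = 1000 // PACKET_TIME_MS
bitrate_mbps = (payload_bytes + HEADER_BYTES) * 8 * packets_per_second / 1e6

print(samples_per_packet, payload_bytes, round(bitrate_mbps, 2))
```

So a full 8-channel stream lands around 10 Mbit/s, orders of magnitude below uncompressed video, which is exactly why loss protection for audio needs different treatment.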

With elementary streams, a key challenge for audio transport over the WAN is how to protect against loss. This is typically done using Forward Error Correction (FEC) and/or 1+1 protection, but FEC on low-bandwidth services such as audio introduces too much delay. The solution is a WAN architecture that can group multiple streams into a high-bandwidth bundle, on which FEC can be applied.
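To see why FEC delay scales so badly on low-bitrate streams, consider a 2D FEC scheme (the 10x10 matrix size below is an illustrative choice, in the style of SMPTE 2022-1 style row/column FEC): the receiver can only recover losses once the whole matrix has arrived, so the added delay is at least the matrix size times the packet interval.

```python
# Rough model of FEC delay: a rows x cols FEC matrix can only recover
# losses once fully received, so the added delay is roughly
# rows * cols * packet_interval. Matrix size and packet size are
# illustrative assumptions, not values from any standard.

def fec_delay_ms(stream_mbps: float, packet_payload_bytes: int,
                 rows: int = 10, cols: int = 10) -> float:
    packets_per_second = stream_mbps * 1e6 / (packet_payload_bytes * 8)
    packet_interval_ms = 1000 / packets_per_second
    return rows * cols * packet_interval_ms

# A ~2 Mbit/s audio stream vs. a 1 Gbit/s bundle of many streams:
audio_delay = fec_delay_ms(stream_mbps=2, packet_payload_bytes=1152)
bundle_delay = fec_delay_ms(stream_mbps=1000, packet_payload_bytes=1152)
print(round(audio_delay, 1), round(bundle_delay, 3))
```

Packets on a thin audio stream are hundreds of milliseconds apart in aggregate, while on a fat bundle the same matrix fills in under a millisecond. That is the whole argument for bundling before applying FEC.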

ST 2110 for video (ST 2110-20 and ST 2110-22)

Besides the RTP wrapper, another change in how uncompressed video is carried is that only the active part of the image, i.e. the pixels actually used, is sent. Compared to SDI and SMPTE 2022-6 this leads to bandwidth savings in the range of 15-30%.
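The savings are easy to verify with arithmetic. For 1080p50 YCbCr 4:2:2 at 10 bits, sending only the active picture (payload only, ignoring RTP/IP overhead) compares to the nominal 3G-SDI line rate like this:

```python
# Active-picture-only payload (ST 2110-20 style) vs. 3G-SDI line rate
# for 1080p50 YCbCr 4:2:2 10-bit. Payload only; RTP/IP overhead ignored.

WIDTH, HEIGHT, FPS = 1920, 1080, 50
BITS_PER_PIXEL = 20            # 4:2:2 at 10 bits: 10 (Y) + 10 (Cb/Cr avg)

active_gbps = WIDTH * HEIGHT * FPS * BITS_PER_PIXEL / 1e9   # ~2.07 Gbit/s
sdi_gbps = 2.970               # nominal 3G-SDI line rate

saving = 1 - active_gbps / sdi_gbps
print(round(active_gbps, 4), f"{saving:.0%}")
```

That lands at roughly 30% for this format; formats with less blanking relative to the active picture sit at the lower end of the 15-30% range.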

Defined to support resolutions up to 32k x 32k pixels, ST 2110 is future proof with regard to coming high-resolution formats and specifications. Support for color modes and color depths is flexible and includes HDR.

ST 2110 for ancillary data (ST 2110-40)

Ancillary data has been used for various reasons over the years, some tightly coupled with the video stream and some not. ST 2110-40 describes a generic way to encapsulate ancillary data in IP so that it can be transported independently of audio and video.

ST 2110 for timing, metadata and controlling burstiness (ST 2110-10 and ST 2110-21)

All ST 2110 essence streams are based on RTP, a proven technology for transporting time-critical data over IP using UDP packets. Each packet carries a time stamp, used to align multiple essence streams at the edges so that live switching can be done.

To synchronize devices in frequency and time, PTP (ST 2059 / IEEE 1588) is used. What’s challenging about PTP is that it demands very low jitter to reach the accuracy needed in broadcast environments. In a studio where PTP goes across one or a few switches dedicated to live media traffic this is less of an issue, especially as switches in studio environments tend to provide PTP support to improve accuracy. But over the WAN, where distances are longer and the number of hops larger, PTP accuracy becomes a challenge. Our experience shows that you need a PTP-aware WAN, which means that you either need infrastructure with PTP support end to end, ruling out most leased infrastructure, or an overlay solution providing PTP support on top of non-PTP-aware infrastructure.

Besides the use of RTP and PTP for timing, ST 2110-10 also describes how each stream has a set of metadata that tells the receiver how to interpret it. Metadata is described using the Session Description Protocol (SDP). But the metadata information is actually delivered by a separate control system, described by the NMOS specifications.
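To give a feel for what this metadata looks like, here is an illustrative SDP for an uncompressed 1080p50 video stream. All addresses, ports and identifiers are made up; the fmtp attributes follow the conventions used for ST 2110-20 streams, but this fragment is a sketch rather than a validated description:

```
v=0
o=- 123456 123458 IN IP4 192.0.2.10
s=Example ST 2110-20 video stream
t=0 0
m=video 5004 RTP/AVP 96
c=IN IP4 239.0.0.1/64
a=rtpmap:96 raw/90000
a=fmtp:96 sampling=YCbCr-4:2:2; width=1920; height=1080; depth=10; exactframerate=50; colorimetry=BT709; PM=2110GPM; SSN=ST2110-20:2017; TP=2110TPN
a=mediaclk:direct=0
a=ts-refclk:ptp=IEEE1588-2008:00-11-22-33-44-55-66-77:0
```

Note how the last two lines tie the stream back to the PTP reference clock, and how the fmtp line carries everything a receiver needs to interpret the pixel data.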

Finally, ST 2110-21 specifies how to deal with the fact that IP is bursty by nature, and that software-based solutions will become more and more common now that we use standard transport infrastructure. It describes a number of timing profiles, specifying how large packet bursts a receiver must be able to deal with. Note that with multiple timing profiles defined by the standard, you could end up in situations where your receiver only accepts the “narrow” profile (4 packets in a burst), while your sender complies with the “wide” profile (20 packets in a burst), meaning your receiver will drop packets when the bursts sent are too big.
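A toy buffer model illustrates the mismatch. This is a simplification of the ST 2110-21 receiver buffer idea, using the 4- and 20-packet limits from the example above as the buffer sizes: packets arrive in bursts, drain at the ideal rate, and anything above the limit is dropped.

```python
# Toy model of a receiver buffer: bursts arrive, the buffer drains one
# packet per tick (the ideal rate), and overflow beyond the limit is
# dropped. The 4/20 packet limits are the narrow/wide example numbers
# from the text; this is a simplification of the ST 2110-21 model.

def packets_dropped(burst_sizes, buffer_limit):
    occupancy, dropped = 0, 0
    for burst in burst_sizes:              # one entry per tick
        occupancy += burst
        overflow = max(0, occupancy - buffer_limit)
        dropped += overflow
        occupancy -= overflow
        occupancy = max(0, occupancy - 1)  # drain one packet per tick
    return dropped

# A "wide" sender: correct average rate (1 packet/tick) but delivered
# as 20-packet bursts.
wide_sender = ([20] + [0] * 19) * 3

print(packets_dropped(wide_sender, buffer_limit=20))  # wide receiver copes
print(packets_dropped(wide_sender, buffer_limit=4))   # narrow receiver drops
```

The average rate is identical in both cases; only the burst tolerance differs, and that alone decides whether packets are lost.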

But more importantly, in WAN environments, how your sending device behaves is less relevant than how much jitter your network introduces. Bursts accumulate, and with multiple hops and often leased infrastructure the WAN tends to make flows a lot more bursty. This means you want WAN technology that smooths out burstiness and reduces it to the levels defined by ST 2110.

NMOS for discovering, registering and managing media flows (IS-04 and IS-05)

As mentioned before, the NMOS specifications describe a control plane that makes ST 2110 based infrastructure manageable and simpler to operate. With the IS-04 specifications, NMOS describes how devices can register with a shared registry and how they can query the registry for information about other devices. It supports both central registries and peer-to-peer discovery to allow for smaller setups. Sending and receiving devices register their capabilities, and sending devices such as cameras or encoders register their available flows so that receivers can pick them up.
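In practice the IS-04 Query API is plain HTTP with JSON resources. The sketch below shows the shape of a senders query; the registry hostname is made up, the `/x-nmos/query/...` path follows IS-04, and we parse a canned response instead of doing a live HTTP GET so the example is self-contained:

```python
# Sketch of an IS-04 Query API lookup. The registry URL is a made-up
# example; the /x-nmos/query/v1.3/senders path follows the IS-04 Query
# API. A canned response stands in for the live HTTP call.

QUERY_API = "http://registry.example.com/x-nmos/query/v1.3"

def senders_url(api_base: str) -> str:
    return f"{api_base}/senders"

# Minimal stand-in for a Query API response; real sender resources
# carry many more fields (device_id, manifest_href, transport, ...).
canned_response = [
    {"id": "sender-uuid-1", "label": "Camera 1 video", "flow_id": "flow-uuid-1"},
    {"id": "sender-uuid-2", "label": "Camera 1 audio", "flow_id": "flow-uuid-2"},
]

labels = [s["label"] for s in canned_response]
print(senders_url(QUERY_API))
print(labels)
```

A broadcast controller would iterate over such a list and match flows against receiver capabilities before connecting anything.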

And picking up flows is what IS-05 describes. It specifies how media flows can be set up or removed between a sending and receiving device, independently of the actual protocol used to carry the flow. It supports both unicast and multicast flows, and connections can be set up immediately or scheduled for the future.
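Concretely, connecting a receiver to a sender means PATCHing a small JSON document to the receiver's staged endpoint. The sketch below builds such a body; the UUIDs and transport parameters are made-up examples, while the field names follow the IS-05 connection management model:

```python
# Sketch of the JSON body a broadcast controller PATCHes to an IS-05
# receiver's /staged endpoint. UUIDs and transport parameters are
# made-up examples; field names follow the IS-05 model.

import json

def staged_patch_body(sender_id: str, multicast_ip: str, port: int) -> dict:
    return {
        "sender_id": sender_id,
        "master_enable": True,
        # "activate_immediate" connects now; scheduled modes also exist.
        "activation": {"mode": "activate_immediate"},
        "transport_params": [
            {"multicast_ip": multicast_ip, "destination_port": port}
        ],
    }

body = staged_patch_body(
    "6cbd0441-7f0c-4c36-9a9e-000000000001",  # example sender UUID
    "239.0.0.1", 5004,
)
# e.g. PATCH {connection_api}/single/receivers/{receiver_id}/staged
print(json.dumps(body, indent=2))
```

Because the body only names the sender and transport parameters, the same mechanism works regardless of whether the flow is ST 2110-20 video or ST 2110-30 audio.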

When distributing production across a WAN, it means that your NMOS registry needs to know about devices and flows available at multiple locations. The solution tends to be a multi-domain setup, where a registry is available at each location, and where an overarching broadcast controller is connected to each of those registries to control each individual device and flow.

NMOS for controlling network connections (IS-06)

To make sure that broadcast controllers can talk to network controllers and request connectivity in a standardized way, NMOS has created the IS-06 specification. It covers how to discover network resources, how the use of network resources can be authorized and secured, and how to monitor usage.

The broadcast controller gets to know about the network topology from the network controller, and it can create, modify and delete flows as well as unicast and multicast receivers. Setting up flows also includes reserving network resources needed, such as bandwidth, and setting flow priorities.

Multiple broadcast controllers can connect to one single network controller and network fabric, allowing for multi-tenancy setups where a broadcast controller only has access to certain resources.

In a distributed environment you typically have one network controller per location, managing the local switch fabric. On top of this, for the WAN you will need a dedicated WAN controller, managing resources and bandwidth. To be able to use leased or shared infrastructure you need a WAN solution that can reserve bandwidth and isolate services end-to-end.


SMPTE 2110 is designed to support more flexible workflows than in the past. It does so by breaking apart audio, video and data into separate elementary streams, timed and synchronized using the PTP protocol. It also manages challenges inherent in the move to IP, such as burstiness.

Its sibling, the NMOS specifications, describes how devices can detect each other and available streams, how flows are set up from sender to receiver, and how a broadcast controller can manage network resources when needed.

And as outlined throughout this article, the shift from SDI to SMPTE 2110 does come with a set of challenges, especially in wide area networks. We will look closer at those challenges in the next post in this series on SMPTE 2110.





A suite of standards for handling digital media in an IP network. ST 2110 defines the separation of audio, video and ancillary data into elementary streams and is intended for use within broadcast production and distribution facilities.

The Society of Motion Picture and Television Engineers is a global professional association of engineers, technologists, and executives working in the media and entertainment industry.


SMPTE 2110 is a standard designed for flexible workflows by breaking apart video, audio, and ancillary data for independent routing and processing.

SMPTE 2110 is video format agnostic, handling various formats like 720, 1080, 4K, and HDR in both compressed and uncompressed workflows.

NMOS specifications provide the control plane, managing device detection, stream availability, and connecting devices in ST 2110 infrastructures.

SMPTE 2110 uses PTP (IEEE 1588) for timing and synchronization, with metadata described using SDP and managed by NMOS specifications.

Challenges include ensuring PTP accuracy over longer distances and managing burstiness inherent in IP-based media data planes.