SMPTE 2022-1 2D Forward Error Correction in GStreamer
Various mechanisms have been devised over the years for recovering from packet loss when transporting data with RTP over UDP. One such mechanism was standardized in SMPTE 2022-1, and I recently implemented support for it in GStreamer.
TL;DR:
gst-launch-1.0 \
rtpbin name=rtp fec-encoders='fec,0="rtpst2022-1-fecenc\ rows\=5\ columns\=5";' \
uridecodebin uri=file:///path/to/video/file ! x264enc key-int-max=60 tune=zerolatency ! \
queue ! mpegtsmux ! rtpmp2tpay ssrc=0 ! rtp.send_rtp_sink_0 \
rtp.send_rtp_src_0 ! udpsink host=127.0.0.1 port=5000 \
rtp.send_fec_src_0_0 ! udpsink host=127.0.0.1 port=5002 async=false \
rtp.send_fec_src_0_1 ! udpsink host=127.0.0.1 port=5004 async=false
gst-launch-1.0 \
rtpbin latency=500 fec-decoders='fec,0="rtpst2022-1-fecdec\ size-time\=1000000000";' name=rtp \
udpsrc address=127.0.0.1 port=5002 caps="application/x-rtp, payload=96" ! queue ! rtp.recv_fec_sink_0_0 \
udpsrc address=127.0.0.1 port=5004 caps="application/x-rtp, payload=96" ! queue ! rtp.recv_fec_sink_0_1 \
udpsrc address=127.0.0.1 port=5000 caps="application/x-rtp, media=video, clock-rate=90000, encoding-name=mp2t, payload=33" ! \
queue ! netsim drop-probability=0.05 ! rtp.recv_rtp_sink_0 \
rtp. ! decodebin ! videoconvert ! queue ! autovideosink
Specification
SMPTE 2022
From Wikipedia:
SMPTE 2022 is a standard from the Society of Motion Picture and Television Engineers (SMPTE) that describes how to send digital video over an IP network. Video formats supported include MPEG-2 and serial digital interface. The standard was introduced in 2007 and has been expanded in the years since.
The work presented in this post is the implementation of the first part of that standard, 2022-1. 2022-5 is another notable part dealing with Forward Error Correction for very high bitrate RTP streams.
XOR
The core mechanism at the heart of SMPTE 2022-1 and other FEC mechanisms is
usage of XOR (^
). Given a set of N values, it is possible to recover any
of the values provided all the other values and the result of their xoring
together have been received.
It is logically equivalent and probably easier to think of to retrieving the missing value when the sum of all the values has been received, for example given 3 values 1, 2 and X, and their sum 6, we can see that X must be:
X = 6 - 2 - 1
X = 3
Usage of XOR is a neat trick that makes for a more computer-friendly mechanism: while an addition-based mechanism would require 9-bit to protect two 8-bit values, 10-bit to protect 4, etc., the required size with XOR remains a constant 8-bit.
An RTP payload is just a collection of 8-bit values, so it follows that the payload of FEC packets protecting N RTP packets consists of an equivalent amount of 8-bit values.
Other fields of the standard RTP header are protected similarly, such as the payload type or the timestamp, and the payload length of the media packets as well, allowing the mechanism to be applied to media packets of varying lengths.
Enter the (2D) matrix
A straightforward application of the mechanism presented above is to simply construct and transmit a FEC packet for each set of N consecutive media packets.
This works well enough when packet loss is truly random, but a common pattern of packet loss over UDP is burstiness, where packets may be transmitted without loss for some time, then suddenly a few consecutive packets go missing. It means that our mechanism will often fall short in such cases, as it relies on having at most one packet missing from a sequence of values.
A neat sophistication introduced in this standard and adopted in 2022-5 and flexfec is to think of packet sequences with an extra dimension, going from a linear approach:
M1 M2 M3 RF1 M4 M5 M6 RF2 ...
to a two-dimensional approach:
+--------------+
| M1 | M2 | M3 | RF1
| M4 | M5 | M6 | RF2
| M7 | M8 | M9 | RF3
+--------------+
CF1 CF2 CF3
Where M are the protected media packets, RF are the "row" FEC packets, applied to consecutive packets, and CF are the "column" FEC packets, applied to sets of packets separated by a fixed interval, in the example above 3.
Let's imagine some scenarios to see how this approach addresses bursty loss patterns:
If M2 and M9 are lost:
+--------------+
| M1 | X | M3 | RF1
| M4 | M5 | M6 | RF2
| M7 | M8 | X | RF3
+--------------+
CF1 CF2 CF3
They can both be recovered thanks to row FEC (RF1, RF3), but if M2 and M3 are lost in a burst, row FEC is now useless:
+--------------+
| M1 | X | X | RF1
| M4 | M5 | M6 | RF2
| M7 | M8 | M9 | RF3
+--------------+
CF1 CF2 CF3
That is where column FEC comes in handy, as M2 and M3 can still be recovered thanks to CF2 and CF3.
An interesting property of this scheme is that each dimension can complete the other:
+--------------+
| M1 | M2 | X | X
| M4 | M5 | X | RF2
| M7 | X | X | RF3
+--------------+
CF1 CF2 CF3
It appears that we have some heavy packet loss, and that some packets may simply not be recovered, for example M3 has its row FEC packet missing, and none of the media packets in its column have made it.
However all hope is not lost:
We first recover M8 thanks to column FEC, which means we can now recover M9 with row FEC. M6 is also recoverable with row FEC: M3 can now be recovered through column FEC! That's pretty neat.
As with many other "vague" problems, there isn't necessarily a perfect dimension for the matrix, it has to be determined empirically through trial and error, and potentially adapted depending on the particular network that data will be transported across.
For reference, AWS MediaConnect uses a 10 by 10 matrix, and in my testing with
the netsim element, a 5 by 5 matrix worked well to address a 5 percent packet
loss. netsim
isn't however a faithful representation of a typical unreliable
network, as when using its drop-probability
property packets will be randomly
dropped.
Repair window
As the intention behind column FEC is to recover from loss bursts, it would be counter-productive to send those FEC packets at the same time as the media packets they protect. SMPTE 2022-1 addresses this by specifying how to delay these packets, this is known in latter specs as the "Repair window".
Limitations
SMPTE 2022-1 requires FEC packets to have their SSRC field to zero, this makes multiplexing of multiple FEC streams impossible. As a consequence, it is often used with an MPEG-TS container, but nothing prevents from using it with other types of payload. SMPTE 2022-1 also prohibits usage of CSRC entries.
The maximum size of the 2D FEC matrix is limited to 255 by 255. This is of course more than sufficient for compressed formats, but too limiting for raw formats. SMPTE 2022-5 addresses this by turning the row and column fields into 10-bit values, making it suitable for usage with very high bandwidth formats (> 3 Gbps).
Implementation
Positioning in rtpbin
The decoder element is positioned upstream of rtpjitterbuffer in GStreamer's rtpbin. It exposes one always sinkpad for receiving media packets, and up to two request sink pads for receiving FEC packets.
All incoming packets are stored for the duration of a configurable repair
window (size-time
property).
My initial approach was to perform recovery upon retransmission requests
emitted by rtpjitterbuffer
, but this approach had multiple drawbacks:
-
do-retransmission
had to be set on the jitterbuffer, which would have been confusing when retransmission was not actually required. -
rtpjitterbuffer
will emit retransmission requests pretty agressively, and potentially multiple times for the same packet. This would have caused unnecessary processing in the decoder.
Instead, the approach I picked was to proactively reconstruct missing packets as soon as possible. When a FEC packet arrives, we immediately check whether a media packet in the row / column it protects can be reconstructed.
Similarly, when a media packet comes in, we check whether we've already received a corresponding packet in both the column and row it belongs to, and if so go through the first step listed above.
This process is repeated recursively, allowing for recoveries over one dimension to unblock recoveries over the other.
The encoder exposes one sink pad, one always source pad, and two sometimes source pads for pushing FEC packets. It is placed near the tail of rtpbin.
Configuration options
The only property exposed by the decoder is, as mentioned above, the duration for which to store packets, which should be at least as long as the repair window.
The encoder on the other hand is a bit more configurable, with properties
to set the size of the repair matrix that cannot be changed while PLAYING
,
and properties to selectively disable row or column FEC while PLAYING
,
allowing applications to adapt their packet loss / bandwidth usage strategy
dynamically, based on evolving network conditions.
Finally, properties have been added in rtpbin to allow specifying a per-session element factory for sending and receiving FEC from the command line. These come as a complement to the already existing signals, which are still used as a fallback.
Usage
The following pipelines put all this work together, with a sender side that can be started with:
gst-launch-1.0 \
rtpbin name=rtp fec-encoders='fec,0="rtpst2022-1-fecenc\ rows\=5\ columns\=5";' \
uridecodebin uri=file:///path/to/video/file ! x264enc key-int-max=60 tune=zerolatency ! \
queue ! mpegtsmux ! rtpmp2tpay ssrc=0 ! rtp.send_rtp_sink_0 \
rtp.send_rtp_src_0 ! udpsink host=127.0.0.1 port=5000 \
rtp.send_fec_src_0_0 ! udpsink host=127.0.0.1 port=5002 async=false \
rtp.send_fec_src_0_1 ! udpsink host=127.0.0.1 port=5004 async=false
and a receiver side with:
gst-launch-1.0 \
rtpbin latency=500 fec-decoders='fec,0="rtpst2022-1-fecdec\ size-time\=1000000000";' name=rtp \
udpsrc address=127.0.0.1 port=5002 caps="application/x-rtp, payload=96" ! queue ! rtp.recv_fec_sink_0_0 \
udpsrc address=127.0.0.1 port=5004 caps="application/x-rtp, payload=96" ! queue ! rtp.recv_fec_sink_0_1 \
udpsrc address=127.0.0.1 port=5000 caps="application/x-rtp, media=video, clock-rate=90000, encoding-name=mp2t, payload=33" ! \
queue ! netsim drop-probability=0.05 ! rtp.recv_rtp_sink_0 \
rtp. ! decodebin ! videoconvert ! queue ! autovideosink
Future prospects
More FEC!
Algorithmically-speaking, SMPTE 2022-1 is similar to flexfec. While it is based on RFC 2733, flexfec is based on RFC 5109 and lifts some of the constraints I listed earlier. Flexfec is not yet a final RFC, but it can already be used as a webRTC protection mechanism with Google Chrome, and should eventually obsolete ulpfec.
If you are interested in building upon my work to implement flexfec or SMPTE 2022-5 support in GStreamer, or are willing to sponsor me for doing so, don't hesitate to shoot me a mail at mathieu@centricular.com!
Network-aware heuristics
Adapting configuration and usage of the various packet loss recovery / mitigation mechanisms is a hard problem in and of itself, and GStreamer currently leaves this as an exercise to the reader. We are gathering all the pieces of the puzzle however:
-
Retransmission has been supported for quite some time already (courtesy of Julien Isorce, then working at Collabora)
-
Support for Transport Wide Congestion Control has been merged recently (courtesy of Havard Graff at Pexip)
-
Various mechanisms are available for Forward Error Correction
-
rtpbin
collects all sorts of statistics giving us a clear picture of current network conditions -
Many of our encoders support dynamically changing their bitrate
Designing and implementing a solution for tying all these features together would be a very interesting undertaking, and make for a more enjoyable out-of-the-box RTP experience.
I hope this was instructive, curious about comments / corrections (I don't give hexadecimal dollars, I'd get bankrupt real quick).
The results of the search are