Forward Error Correction in Real-time Video Streaming Applications


Johan Westerlund

January 26, 2015

Master’s Thesis in Computing Science, 30 credits

Supervisors at CS-UmU: Thomas Nilsson and Jerry Eriksson

Examiner: Fredrik Georgsson

Umeå University

Department of Computing Science

SE-901 87 UMEÅ


Abstract

The Internet is rapidly becoming the dominant platform for video contribution and distribution, but today's Internet provides a best-effort service without any guarantee of quality. Internet video streaming has to cope with this lack of guarantees. Due to congestion and the heterogeneous infrastructure of the Internet, a video transmission will be plagued by packet loss, variability in throughput, and latency. To mitigate some of these problems, forward error correction (FEC) can be employed.

The aim of this thesis is to present a theoretical introduction to error correcting codes, implement and integrate forward error correction into an existing video streaming application, and test the implementation in both simulated and real-world conditions. Parameters of the FEC algorithm will be tweaked and guidelines for parameter selection will be developed.

Felkorrigering i realtidsströmmande videotillämpningar (Error Correction in Real-time Streaming Video Applications)

Sammanfattning (Summary)

The Internet is on its way to becoming the most important channel for contribution and distribution of video signals, but today's Internet gives no guarantees regarding quality and instead delivers its services on a best-effort basis. This must be taken into account when the Internet is used as a communication channel for real-time streaming video. Congestion, interrupted connections or equipment faults cause problems in the form of packet loss and variations in throughput and delay. Forward error correction (FEC) is one way to minimize the effects of these problems.


List of Figures

2.1 The IPS model
2.2 IEEE 802.3 Ethernet frame
2.3 Details of the IEEE 802.3 Ethernet frame header and footer
2.4 RTP header structure
3.1 Venn diagram of the (7,4) Hamming code
3.2 Rate 1/2 8-state convolutional encoder, from the UMTS (3G) specification [11]
3.3 Rate 1/3 8-state convolutional encoder
3.4 8-state trellis diagram
3.5 State transitions when encoding the message (10110)
3.6 Path through the Trellis for the decoding of symbols (101)(111)(011)(010)(001)
3.7 First branch
3.8 Lower path chosen at first branch. Now at second branch
3.9 Lower path chosen at first branch, upper path chosen at second branch. Now at third branch
3.10 Lower path chosen at first branch, upper path chosen at second branch, upper path chosen at third branch. Now at fourth branch
3.11 Upper path chosen at first branch. Now at second branch
3.12 Upper path chosen at first branch, lower path chosen at second branch. Now at third branch
3.13 Upper path chosen at first branch, lower path chosen at second branch, lower path chosen at third branch. Now at fourth branch
3.14 The Turbo coding/decoding principle
3.15 Reed Solomon codes as a Fourier series
3.16 Reed Solomon codes as a Fourier series - one error
3.17 RTP packet arrangement
3.18 Row FEC packet arrangement
3.19 Column FEC packet arrangement
3.20 Matrix FEC packet arrangement
3.21 Irrecoverable packet loss
3.23 Irrecoverable error when using naive packet ordering
3.24 Burst loss resilient packet order
3.25 RTP FEC header structure
4.1 Video transmission and reception
4.2 How FEC was integrated with existing components
4.3 Simulating packet loss using the Netdestroyer
4.4 Netdestroyer test: L = 5, D = 10, overhead = 30%
4.5 3G network setup
4.6 3G packet loss
4.7 3G burst loss
4.8 Ice.net network setup
4.9 Ice.net packet loss
4.10 Ice.net burst loss
4.11 KA-SAT network setup
4.12 KA-SAT packet loss
4.13 KA-SAT burst loss
4.14 Experimental column/diagonal FEC mode
A.1 Netdestroyer test: L = 5, D = 10, overhead = 30%
A.6 Netdestroyer test: L = 8, D = 5, overhead = 32.5%


List of Tables

3.1 Hamming(7, 4) lookup table
3.2 Reed-Solomon encoding of message (16, 13, 7, 1) with code rate 4/7
4.1 3G parameter selection tests
4.2 Ice.net parameter selection tests
4.3 3G experimental FEC test


Chapter 1

Introduction

All communication channels contain some degree of noise. The noise could be interference from various sources such as crosstalk from neighbouring communication channels, deterioration of the equipment, etc. This noise can interfere with the data being transmitted, in the same way as a verbal conversation in a noisy room becomes more difficult the more noise is present in the room. In order to hold a conversation in a noisy room you either raise your voice or repeat yourself a lot. The second method is what this thesis will focus on, namely, adding some redundancy when transmitting a message, which makes it easier for the receiver to correctly receive and interpret the message in the presence of noise. This method is called forward error correction (FEC).

1.1 Problem statement

The goal of this thesis is to implement a scheme for forward error correction (FEC) according to the Pro-MPEG Code of Practice #3 developed by the Pro-MPEG Forum and to test the implementation under various conditions, both simulated and real-world, with the final goal of coming up with recommendations for suitable configuration choices under various conditions. Packet loss recovery rate, latency and bandwidth overhead will be analyzed for every set of parameters evaluated, and a general recommendation for how to select these parameters is to be developed.

1.2 Overview

This thesis consists of four main parts organized into the following chapters:

– Chapter 2 gives the basic background information required in order to understand the problems one will encounter when trying to transmit video in real-time through the Internet.

– Chapter 3 will explain the underlying theory behind forward error correction and provide descriptions of different methods that can be used to achieve the goals established by the problem statement.

– Chapter 4 describes how the problems were solved, and how the effectiveness of the solution was evaluated.


Chapter 2

Background

2.1 Real-time video streaming

Real-time video streaming has been around since the dawn of television. Video signals have traditionally been transmitted over expensive and specialized communications channels such as dedicated circuit switched networks and satellite contribution systems. In recent years there has been an immense increase in the use of real-time video streaming applications utilizing the Internet. Thanks to its widespread deployment, scalability and price performance, Internet video distribution is becoming the preferred solution for video distributors and broadcasting companies. At a fraction of the cost associated with the traditional contribution and distribution methods, the Internet is lowering the barriers to entry for live video streaming. But the Internet is a packet switched network using a best-effort delivery approach: the network does not provide any guarantees that data is delivered or that a user is given a guaranteed quality of service or a certain priority. In a best-effort network all users obtain best-effort service, meaning that they obtain an unspecified variable bit rate and delivery time, depending on the current traffic load. Data is transmitted in discrete chunks called packets as opposed to being transmitted as a continuous signal.

Utilizing packet switched networks for video distribution in real-time, or streaming, is a challenge. Packet switched networks are commonly plagued by variable latency (jitter) and occasional packet loss, which pose problems for real-time video transmission. As more and more video is being distributed through the Internet, and also in higher resolutions than before, the technical challenges associated with packet switched video transmission are becoming a major concern.

The drawbacks of video transmission using the Internet:

– Network latency - One of the advantages of the Internet is at the same time one of the disadvantages of real-time video transmission. The network routes are not static. If a route becomes unavailable for some reason, the packets might be redirected to a longer route, or a route with higher latency. The packets are then received after they were expected. Constant latency is crucial in real-time video transmission.

– Packet order - In the same way that packets might suddenly take a longer route to reach their destination, the opposite can occur. A shorter route or a route with lower latency might become available. This can lead to the unfortunate scenario where packets are received prior to their expected arrival time. A packet might even arrive before a packet which was transmitted earlier, that is, out of order. This is usually unacceptable in a video transmission stream.

– Packet loss - Faulty equipment, congested routers or other problems can lead to complete loss of a packet. In the widely used Ethernet protocol, the entire packet is dropped if a single bit of the packet is corrupted. Received data is guaranteed to be correct, but a lost packet will degrade video quality significantly.

Since low and predictable latency is considered more important than video quality in most real-time applications, retransmission of lost packets at the request of the receiver is usually not an option: latency must be kept under control, and retransmission also requires a two-way communication channel, which might not always be available.

2.2 IP networks

Communication in IP networks uses the Internet Protocol Suite (IPS), a set of core protocols used on the Internet. This set of protocols is commonly just called TCP/IP, named after its two most important protocols, the Transmission Control Protocol (TCP) and the Internet Protocol (IP). The Internet Protocol Suite is typically visualized in the form of four abstraction layers as shown in figure 2.1, or in seven abstraction layers called the Open Systems Interconnection model, identified as ISO/IEC 7498-1.

[Figure: hosts A and B, each with application, transport, Internet and link layers, connected through two routers that implement only the Internet and link layers; an application on host A transmits to an application on host B.]

Figure 2.1: The IPS model

Applications normally do not need to bother about anything other than the application layer. The layers below are managed by the operating system and the network interface card.

2.2.1 Link layer


(LAN). A commonly deployed type of link layer is Ethernet, developed at Xerox PARC between 1973 and 1974.

Ethernet - IEEE 802.3

Wired Ethernet, or IEEE 802.3, is a popular networking technology for local area networks. Ethernet encapsulates the data to be transmitted in discrete link layer packets, called frames. [12] An abstraction of an Ethernet frame is shown in figure 2.2 and a more detailed description of the link layer fields of the frame is shown in figure 2.3. Two important things to observe are the maximum length of payload data that can fit within a frame, called the maximum transmission unit (MTU), and the last field of the frame, the frame check sequence. The frame check sequence is a Cyclic Redundancy Check (CRC) code, which is an error-detecting code commonly used to detect data corruption in digital networks, compression and archiving file formats, and storage devices. If the receiving host detects a corrupted frame, the entire frame is discarded without any notification to applications waiting for data, and no notification is sent back to the transmitting host. This must be kept in mind when a transport protocol such as the User Datagram Protocol (UDP) is used: the transmitting application must keep the packets small enough to fit within the MTU, and the receiving application must be able to cope with packet loss.

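[Figure: encapsulation of application layer data. Each layer adds its own header: the transport layer header, then the Internet layer header, then the link layer header and footer, which together with the link layer data form the frame.]

Figure 2.2: IEEE 802.3 Ethernet frame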

Field                          Size
Preamble                       7 bytes
Start Frame Delimiter (0xAB)   1 byte
Destination address            6 bytes
Source address                 6 bytes
Length/type                    2 bytes
Data and pad                   46 - 1500 bytes
Frame Check Sequence           4 bytes

Bytes within the frame are transmitted top to bottom. Bits within the frame are transmitted LSB to MSB.

Figure 2.3: Details of the IEEE 802.3 Ethernet frame header and footer

The MTU of an Ethernet frame is typically 1500 bytes, and that includes the lengths of the headers of the protocols at higher levels, but it does not include the Ethernet link layer header and footer. When designing a real-time video transmission application, care must be taken that the video data is split in appropriate places so that as much data as possible can be transmitted in one Ethernet frame, while at the same time causing as few problems as possible for the receiver if a packet gets lost.

2.2.2 Internet layer

The Internet Protocol (IP) is the main communications protocol used on IP networks such as the Internet. It is the primary protocol used in the Internet layer and has the task of delivering packets from source hosts to destination hosts based on their IP addresses. The structure and contents of the IP header are not really important when designing real-time video transmission applications; the header contains source and destination addresses and various other flags and parameters typically maintained by the operating system. The only important thing to remember is that the overhead added by the IP header is 20 bytes. [3] This is important when calculating the packet size so that the MTU is not exceeded.

2.2.3 Transport layer

UDP


can send messages, in this case referred to as datagrams, to other hosts on an Internet Protocol (IP) network without requiring prior communications to set up special transmission channels or data paths. [6]

UDP uses a simple transmission model without implicit handshaking dialogues for providing reliability, ordering, or data integrity. Thus, UDP provides an unreliable service and datagrams may arrive out of order, appear duplicated, or go missing without notice. UDP assumes that error checking and correction is either not necessary or performed by the application, avoiding the overhead of such processing at the network interface level, although UDP does use a simple checksum to verify data integrity. Time-sensitive applications such as video conferencing and games often use UDP because dropping packets is preferable to waiting for packets to be re-transmitted. Re-transmission is almost never an option in real-time communication.

The overhead added by the UDP header is 8 bytes.

UDP Lite

UDP Lite is a variation of UDP that allows a partially damaged packet to be delivered to an application rather than being discarded in its entirety. [9] The amount of data that should be protected by the UDP checksum field can be specified by the transmitting application. With regular UDP, a single bit error in a packet results in the entire packet being dropped. The justification for UDP Lite is to let the receiving application correct or conceal errors rather than discarding an entire packet of data, since the application has a better understanding of the significance of the data.

Most modern link layers have a strong CRC protection mechanism and will discard damaged frames, making the use of UDP Lite difficult. The link layer must be aware of the type of data carried by the transport layer in order to make effective use of UDP Lite, requiring specially modified device drivers. Support for UDP Lite was added to the Linux kernel in February 2007, but the overall level of UDP Lite support in equipment and other operating systems is currently very poor.

TCP

The Transmission Control Protocol (TCP) is also one of the core protocols of the Internet Protocol Suite. TCP provides reliable, ordered delivery of a stream of bytes from a program on one computer to another program on another computer. [4] TCP is used by almost all applications where simplicity and reliability are more important than low latency, such as HTTP/WWW, email, remote administration and file transfers. Unlike UDP, TCP guarantees that data is received in the correct order, or not at all.

No real-time video application known to the author at this time uses the TCP protocol. Many near-realtime applications do, however, e.g. live TV streaming to consumers where a 10 second delay might be acceptable.

2.2.4 Application layer

RTP


and RTP also includes a 32-bit timestamp which the receiver can use to process the incoming packets at the correct pace. The use of RTP solves all of the challenges associated with UDP mentioned previously, except packet loss, which is what forward error correction is meant to solve.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 2.4: RTP header structure

The fields of the RTP header shown in figure 2.4:

– Version - This field identifies the version of RTP. The version defined by RFC 3550 is two (2).

– Padding - If the padding bit is set, the packet contains one or more additional padding bytes at the end which are not part of the payload.

– Extension - If the extension bit is set, the fixed header must be followed by exactly one header extension, with a format defined in Section 5.3.1 of RFC 3550.

– CSRC count - The CSRC count contains the number of CSRC identifiers that follow the fixed header.

– Marker - The marker bit is intended to allow significant events such as video frame boundaries to be marked in the packet stream.

– Payload Type - This field identifies the format of the RTP payload and determines its interpretation by the application.

– Sequence number - The sequence number increments by one for each RTP data packet sent, and may be used by the receiver to detect packet loss and to restore packet ordering.

– Timestamp - The timestamp reflects the sampling instant of the first byte in the payload. The sampling instant is derived from a clock which increments monotonically and linearly in time to allow synchronization and jitter calculations by the receiver.

– SSRC - The SSRC field identifies the synchronization source. This identifier should be chosen randomly, with the intent that no two synchronization sources within the same RTP session will have the same SSRC identifier.
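The field list above can be illustrated with a small parser. The following is only a sketch in Python, not the code used in this thesis: the names `RtpHeader` and `parse_rtp_header` are hypothetical, and header extensions and padding are detected but not interpreted.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    """Fields of the RTP header (hypothetical helper type for illustration)."""
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: tuple


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed header and any CSRC identifiers (network byte order)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least a 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F  # CSRC count: number of 32-bit identifiers after the fixed header
    if len(packet) < 12 + 4 * cc:
        raise ValueError("packet too short for the announced CSRC count")
    csrcs = struct.unpack(f"!{cc}I", packet[12:12 + 4 * cc])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )
```

A receiver could compare `sequence_number` of consecutive packets (modulo 2^16) to detect gaps, which is the information a packet-level FEC decoder needs in order to know which packets to recover.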


MPEG transport stream

MPEG Transport Stream (MPEG-TS) is a standard format for transmission of interleaved video and audio, and is used by many commercial and widely deployed broadcast systems, such as DVB and ATSC. MPEG-TS consists of a stream of packets where each packet is 188 bytes long including the header and payload. [5] The video transmission equipment that will be used to test the forward error correction in a later chapter produces an MPEG transport stream.
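The header sizes discussed in this chapter determine how many transport stream packets fit in one Ethernet frame. The sketch below works through that arithmetic under stated assumptions: an IPv4 header without options (20 bytes), the 8-byte UDP header defined in RFC 768, and a 12-byte fixed RTP header without CSRC identifiers (the three 32-bit words in figure 2.4). The function name is hypothetical.

```python
MTU = 1500        # typical Ethernet MTU in bytes
IP_HEADER = 20    # IPv4 header without options (assumption)
UDP_HEADER = 8    # UDP header size per RFC 768
RTP_HEADER = 12   # fixed RTP header, no CSRC identifiers (assumption)
TS_PACKET = 188   # size of one MPEG-TS packet


def ts_packets_per_datagram(mtu: int = MTU) -> int:
    """Number of whole MPEG-TS packets that fit in one RTP/UDP/IP datagram."""
    room = mtu - IP_HEADER - UDP_HEADER - RTP_HEADER
    return room // TS_PACKET


n = ts_packets_per_datagram()
print(n, n * TS_PACKET)  # prints "7 1316"
```

With these assumptions, seven transport stream packets (1316 bytes of payload) fit in one frame, and an eighth would exceed the MTU.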


Chapter 3

Forward Error Correction

Forward error correction (FEC) is the process of systematically generating and adding redundant information to a message. If designed correctly, the redundancy will allow the receiver to detect and correct a limited number of errors occurring anywhere in the message without the need to ask the sender for additional data; hence the communications channel can be one-way.

3.1 History

The Bell Model V computer was an electromechanical relay-based computer with cycle times in seconds. Punched cards provided the input, which would invariably have read errors. The machine itself was capable of detecting errors, but not correcting them. When an error was detected, a light would flash so that the operators could correct the error. During after-hours periods and on weekends when there were no operators, the machine would simply stop executing the job if errors were found, and move on to the next job.

Richard Hamming was an American mathematician working at Bell Labs at the time, programming the Model V computer. He often worked on weekends and grew frustrated with having to restart his programs from scratch every time an error was found due to the unreliability of the card reader. He developed a vast array of powerful error-correcting algorithms over the next few years, and in 1950 he published what is now known as the Hamming Code, which is still widely used today in applications such as Error Checking and Correcting (ECC) computer memory (RAM) modules, because of its simplicity.

3.2 Error correcting codes

Forward error correction is accomplished by adding redundancy to a message before transmitting the message over a noisy communications channel. The message itself may appear unmodified in the data being transmitted, in which case the error correcting code is said to be systematic, while error correcting codes that do not include the original message in their output are non-systematic. The proportion of the output data stream that is considered useful is called the code rate. That is, if the code rate is $\frac{k}{n}$, then for every $k$ symbols (e.g. bits, bytes, words or packets) of data that the error correcting code takes as input, it will generate $n$ symbols of output, of which $n - k$ symbols are redundant. All error correcting codes add some overhead, and the overhead can be computed as $\frac{n-k}{k}$. An overhead of 0.25 means that 25% more bandwidth is required to transmit the original symbols together with the redundant symbols. The code rate is always less than or equal to 1, and the overhead is always greater than or equal to 0.
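As a small illustration of the two definitions, the following sketch computes code rate and overhead as exact fractions. The function names are hypothetical and chosen only for this example.

```python
from fractions import Fraction


def code_rate(k: int, n: int) -> Fraction:
    """Useful share of the output: k input symbols per n output symbols."""
    return Fraction(k, n)


def overhead(k: int, n: int) -> Fraction:
    """Extra bandwidth relative to the original data: (n - k) / k."""
    return Fraction(n - k, k)


# Four data symbols protected by one redundant symbol.
print(code_rate(4, 5), overhead(4, 5))  # prints "4/5 1/4"
```

Here the overhead of 1/4 corresponds to the 25% extra bandwidth mentioned above.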

3.3 Repetition codes

Consider an alphabet {A, B, C, D}. We want to send a letter across a noisy channel that has a 10% probability of error. If we want to send C, for example, then there will be a 90% chance that the symbol received at the other end of the channel is C. This leaves too large a chance of error. To shift the odds to our advantage, we repeat the symbol three times, thus sending CCC. Suppose one error occurs and the received word is CBC; we can then assume that the symbol that occurs most frequently is the actual symbol, namely C. The probability of the correct message being found is the probability that all three letters are correct plus the probability that exactly one of the three letters is wrong:

$$(0.9)^3 + 3(0.1)(0.9)^2 = 0.972$$

which leaves a significantly smaller chance of error: only 2.8% instead of 10%. The code rate is $\frac{1}{3}$, which means that there is an overhead of 200%, which is unacceptable in most applications but might be acceptable in some.

The two most important concepts when designing a code are error detection and error correction. If there are at most two errors, the repetition code will allow us to detect that errors have occurred. If the received message is CBC, then there could be either one error from CCC or two errors from BBB; we cannot tell which. If at most one error has occurred, then we can correct the error and deduce that the message was CCC. Note that if we used only two repetitions instead of three, we could detect the existence of one error, but we could not correct it; did CB come from BB or CC?
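The repetition code and its majority-vote decoding can be sketched in a few lines of Python. This is an illustrative sketch only; the function names are hypothetical, and `p_correct` evaluates the same bound as the formula above (the probability that fewer than half of the repeated symbols are corrupted).

```python
from collections import Counter
from math import comb


def encode(symbol: str, repeats: int = 3) -> str:
    """Repeat the symbol, e.g. 'C' -> 'CCC'."""
    return symbol * repeats


def decode(received: str) -> str:
    """Majority vote; on a tie the symbol seen first wins."""
    return Counter(received).most_common(1)[0][0]


def p_correct(p_error: float, repeats: int = 3) -> float:
    """Probability that at most (repeats - 1) // 2 symbols are corrupted."""
    max_errors = (repeats - 1) // 2
    return sum(
        comb(repeats, e) * p_error**e * (1 - p_error) ** (repeats - e)
        for e in range(max_errors + 1)
    )


print(decode("CBC"))  # prints "C"
```

For a channel with a 10% symbol error probability, `p_correct(0.1)` evaluates to approximately 0.972, matching the calculation in the text.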

3.4 Hamming codes

The Hamming codes are an important class of error detecting and correcting codes. They can detect the presence of up to two bit errors, and correct one single bit error. The original Hamming code takes a message of 4 binary bits $(s_1, s_2, s_3, s_4)$ and encodes them as a codeword of 7 binary bits $(p_1, p_2, s_1, p_3, s_2, s_3, s_4)$ by arranging the message bits in a vector and multiplying the vector by the generator matrix (mod 2). For example, the message 1100 becomes

$$
\begin{pmatrix}
1 & 1 & 0 & 1 \\
1 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}
\equiv
\begin{pmatrix} 0 \\ 1 \\ 1 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}
\pmod 2
$$

To speed up computations for short codes, a simple lookup table can be used to avoid matrix-vector multiplication. See table 3.1.

Table 3.1: Hamming(7, 4) lookup table

Message  Codeword
0000     0000000
0001     1101001
0010     0101010
0011     1000011
0100     1001100
0101     0100101
0110     1100110
0111     0001111
1000     1110000
1001     0011001
1010     1011010
1011     0110011
1100     0111100
1101     1010101
1110     0010110
1111     1111111
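The lookup table can be generated directly from the generator matrix used in the worked example. The sketch below is an illustration in Python rather than the thesis implementation; the names `G`, `hamming_encode` and `LOOKUP` are chosen for this example.

```python
# Generator matrix of the (7,4) Hamming code; codeword order is
# (p1, p2, s1, p3, s2, s3, s4), message order is (s1, s2, s3, s4).
G = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]


def hamming_encode(message: str) -> str:
    """Encode a 4-bit string such as '1100' into a 7-bit codeword string."""
    bits = [int(c) for c in message]
    if len(bits) != 4 or any(b not in (0, 1) for b in bits):
        raise ValueError("message must be four binary digits")
    # Matrix-vector product modulo 2, one codeword bit per row of G.
    return "".join(str(sum(g * b for g, b in zip(row, bits)) % 2) for row in G)


# Precompute all 16 codewords so that encoding becomes a dictionary lookup.
LOOKUP = {format(m, "04b"): hamming_encode(format(m, "04b")) for m in range(16)}

print(LOOKUP["1100"])  # prints "0111100"
```

Precomputing the table trades a small amount of memory for avoiding the multiplication on every message, which is the motivation given in the text.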

Decoding is performed using the Venn diagram shown in figure 3.1 by placing the codeword bits at the corresponding positions in the diagram and then, in every circle, computing the parity of all four bits in that circle. Any circle with an even parity is an indication that the bits in that circle are assumed to be correct. [2]

As an example, assume that codeword 0010110 was transmitted, but due to an unknown one-bit error the codeword received at the other end of the transmission reads 0000110, which means that

$s_1 = 0,\ s_2 = 1,\ s_3 = 1,\ s_4 = 0,\ p_1 = 0,\ p_2 = 0,\ p_3 = 0$

We put the read-out into the Venn diagram and see that:

– the number of 1s in the bottom left circle is 1, odd parity.

– the number of 1s in the top circle is 1, odd parity.

– the number of 1s in the bottom right circle is 2, even parity.

From these three circles we can conclude that the error does not lie in the bottom right circle, but lies in both the bottom left and the top circle. This leaves $s_1$ as the only possibility, since $s_1$ is the only point lying in the bottom left and top circles, but not in the bottom right circle. This means that $s_1$ (the third bit in the received codeword) is wrong and should be corrected to a 1. For short Hamming codes one can also use a lookup table and simply search for the codeword in the table with the shortest Hamming distance from the received codeword. The binary Hamming distance between a and b is defined as the number of ones (population count) in a XOR b. If an entry is found in the table where the Hamming distance to the received codeword is 0, we know that there were no errors. A Hamming distance of 1 means that there is a one-bit error, and a minimum Hamming distance of 2 or more to every entry in the table means that there were multiple errors and that they cannot be corrected. The received codeword in the example has a Hamming distance of 1 from the entry on the 15th row of table 3.1.
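The table search described above can be sketched as follows. This is a minimal illustration in Python; the dictionary reproduces table 3.1, and the function names are hypothetical.

```python
# Table 3.1: message -> codeword for the (7,4) Hamming code.
CODEWORDS = {
    "0000": "0000000", "0001": "1101001", "0010": "0101010", "0011": "1000011",
    "0100": "1001100", "0101": "0100101", "0110": "1100110", "0111": "0001111",
    "1000": "1110000", "1001": "0011001", "1010": "1011010", "1011": "0110011",
    "1100": "0111100", "1101": "1010101", "1110": "0010110", "1111": "1111111",
}


def hamming_distance(a: str, b: str) -> int:
    """Population count of a XOR b for two equal-length bit strings."""
    return bin(int(a, 2) ^ int(b, 2)).count("1")


def nearest_codeword(received: str) -> tuple:
    """Return (message, distance) for the table entry closest to the received word."""
    message, codeword = min(
        CODEWORDS.items(), key=lambda item: hamming_distance(item[1], received)
    )
    return message, hamming_distance(codeword, received)


print(nearest_codeword("0000110"))  # prints "('1110', 1)"
```

For the received word 0000110 the search returns the message 1110 at distance 1, which is the 15th row of the table.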

Another example: the same codeword was transmitted again, but this time the read-out at the receiving end was 0000111, which gives us

$s_1 = 0,\ s_2 = 1,\ s_3 = 1,\ s_4 = 1,\ p_1 = 0,\ p_2 = 0,\ p_3 = 0$

Using the diagram we can see that the top circle has an even parity, the bottom left circle has an even parity and the bottom right circle has an odd parity. This pattern points at $p_3$ alone, although two bits are actually wrong, so the parity checks cannot tell us where the two errors are located. Due to the limited redundancy that Hamming codes add to the data, they can only detect and correct errors when the error rate is low. This is the case in computer memory (ECC memory), where bit errors are extremely rare and Hamming codes are widely used.

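The Venn diagram can also be represented as a matrix

$$
H = \begin{pmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1
\end{pmatrix}
$$

and by multiplying H with the transmitted codeword 0010110 we get

$$
H \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 1 \\ 1 \\ 0 \end{pmatrix}
\equiv
\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
\pmod 2
$$

which is the null vector. This means that no error was detected.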

If we multiply H with the corrupted vector from the first example we get

$$
\begin{pmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1
\end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{pmatrix}
\equiv
\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}
\pmod 2
$$

which corresponds to the third column of H and consequently tells us that the error is located in the third bit of the received codeword. When using the corrupted vector from the second example we get

$$
\begin{pmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1
\end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 1 \end{pmatrix}
\equiv
\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
\pmod 2
$$

which corresponds to the fourth column of H and therefore points at the fourth bit of the codeword, the parity bit $p_3$. Since two bits were actually corrupted, flipping that bit would not restore the transmitted codeword, and hence we cannot correct the error.
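The matrix form of the decoder lends itself to a compact implementation. The following Python sketch assumes that at most one bit is in error; the function names are hypothetical. It relies on the fact that column i of H, read with the first row as the least significant bit, is the binary representation of i.

```python
# Parity check matrix of the (7,4) Hamming code.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def syndrome(word: str) -> tuple:
    """Multiply H with the received 7-bit word, modulo 2."""
    bits = [int(c) for c in word]
    return tuple(sum(h * b for h, b in zip(row, bits)) % 2 for row in H)


def correct_single_error(word: str) -> str:
    """Flip the bit indicated by the syndrome, assuming at most one bit error."""
    s = syndrome(word)
    if s == (0, 0, 0):
        return word  # null vector: no error detected
    position = s[0] + 2 * s[1] + 4 * s[2]  # 1-based index of the matching column of H
    bits = list(word)
    bits[position - 1] = "1" if bits[position - 1] == "0" else "0"
    return "".join(bits)


print(syndrome("0000110"), correct_single_error("0000110"))
# prints "(1, 1, 0) 0010110"
```

The first example from the text is recovered correctly: the syndrome (1, 1, 0) selects the third bit, and flipping it restores the transmitted codeword 0010110.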

The code rate of the original Hamming code is $\frac{4}{7}$, which gives an overhead of 75%.

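[Figure: three overlapping circles containing the bits p1, p2, p3, s1, s2, s3 and s4.]

Figure 3.1: Venn diagram of the (7,4) Hamming code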

3.5 Parity check

A popular method of error detection, used by for example RS-232 serial communication, is the parity check code. Suppose we want to send a message of 7 bits. An eighth bit is then appended so that the number of nonzero bits is even. For example, the message 0110010 becomes 01100101, and the message 1100110 becomes 11001100. An odd number of corrupted bits during transmission is immediately discovered since the message received will have an odd number of nonzero bits. However, it is impossible to tell which bit is incorrect since an error in any bit could have yielded the odd number of nonzero bits, and two or any even number of corrupted bits won't be detected at all at the receiving end.
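A minimal Python sketch of the even parity check follows. The function names are hypothetical and bit strings are used only for readability.

```python
def add_even_parity(bits: str) -> str:
    """Append one bit so that the total number of 1s becomes even."""
    return bits + str(bits.count("1") % 2)


def parity_ok(word: str) -> bool:
    """True if the word contains an even number of 1s."""
    return word.count("1") % 2 == 0


print(add_even_parity("0110010"))  # prints "01100101"
print(parity_ok("01100001"))       # one flipped bit: prints "False"
print(parity_ok("01100110"))       # two flipped bits go unnoticed: prints "True"
```

The last line shows the limitation described in the text: an even number of corrupted bits leaves the parity unchanged.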

Simple parity check codes can also be used to correct errors under certain circumstances. The RAID-4 and RAID-5 storage technologies are based on a parity check code. Since the hard drives used in a RAID system have error detection built in, the location of faulty bits is known, and the extra parity bits can actually be used to re-create the corrupted data.

3.5.1 Two-dimensional parity code

The parity check code can be used to design a better code that can detect more than one corrupt bit, and also correct an error of one bit. The two-dimensional parity code arranges the data into a two-dimensional matrix, and then parity bits are computed along each row and column. [15]

To demonstrate the code, suppose we want to encode the 20 data bits 10011011001100101011. We arrange the bits into a 4 × 5 matrix

1 0 0 1 1

0 1 1 0 0

1 1 0 0 1

0 1 0 1 1

and then calculate the parity bits along the rows and columns. This results in the 5 × 6 matrix

1 0 0 1 1 1
0 1 1 0 0 0
1 1 0 0 1 1
0 1 0 1 1 1
0 1 1 0 1

which is transmitted as a stream of 29 bits, 10011101100011001101011101101.

Suppose that a single bit error occurs at the bit in the third row and fourth column of the matrix, or the 16th bit in the message. The receiver arranges the received bits into a 5 × 6 matrix and obtains:

1 0 0 1 1 1

0 1 1 0 0 0

1 1 0 1 1 1

0 1 0 1 1 1

0 1 1 0 1

The parity of the third row and fourth column is incorrect, so this locates the error as occurring at the third row and fourth column.
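The encoding and the error location step can be sketched as below. This Python sketch follows the layout of the example (row parity appended to each row, column parities as a final row without a corner bit); the function names are hypothetical, and the locator handles only a single flipped data bit.

```python
def encode_2d(bits: str, rows: int, cols: int) -> str:
    """Append a parity bit to each row, then a final row of column parities."""
    if len(bits) != rows * cols:
        raise ValueError("bit string does not fill the matrix")
    matrix = [bits[r * cols:(r + 1) * cols] for r in range(rows)]
    with_row_parity = [row + str(row.count("1") % 2) for row in matrix]
    column_parity = "".join(
        str(sum(int(row[c]) for row in matrix) % 2) for c in range(cols)
    )
    return "".join(with_row_parity) + column_parity


def locate_single_error(stream: str, rows: int, cols: int):
    """Return the 1-based (row, column) of a single flipped data bit, else None."""
    width = cols + 1
    data_rows = [stream[r * width:(r + 1) * width] for r in range(rows)]
    column_parity = stream[rows * width:]
    # A row is bad if it, including its parity bit, holds an odd number of 1s.
    bad_rows = [r for r, row in enumerate(data_rows) if row.count("1") % 2]
    bad_cols = [
        c for c in range(cols)
        if (sum(int(row[c]) for row in data_rows) + int(column_parity[c])) % 2
    ]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0] + 1, bad_cols[0] + 1
    return None


sent = encode_2d("10011011001100101011", 4, 5)
received = sent[:15] + "1" + sent[16:]  # flip the 16th bit
print(sent)                              # prints "10011101100011001101011101101"
print(locate_single_error(received, 4, 5))  # prints "(3, 4)"
```

Once the position is known, the receiver corrects the error by flipping that bit.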


columns will indicate the existence of two bit errors. However, in this case it is not possible to correct the errors, since there are several possible locations for them.

For example

1 0 0 1 1 1
0 1 1 0 0 0
1 1 0 1 0 1
0 1 0 1 1 1
0 1 1 0 1

and

0 0 0 1 1 1
0 1 1 0 0 0
1 1 0 1 1 1
0 1 0 1 1 1
0 1 1 0 1

3.6 Convolutional codes

A convolutional code is a type of binary error-correcting code that generates parity symbols by feeding each input bit to a serial-in, parallel-out (SIPO) shift register where the outputs are added (modulo 2) to produce the output symbol. Convolutional codes can be recursive or non-recursive, and the recursive codes are typically systematic while non-recursive codes are usually non-systematic, although this is not a requirement. [14] Figure 3.2 shows a recursive systematic convolutional encoder with a code rate of $\frac{1}{2}$. The boxes labelled D are the internal delay lines of the shift register. Once a new input bit arrives the current bit in each delay line is shifted one step to the right.

Figure 3.2: Rate 1/2 8-state convolutional encoder, from the UMTS (3G) specification [11]

Figure 3.3 shows a non-recursive, non-systematic convolutional encoder with a code rate of 1/3.

Figure 3.3: Rate 1/3 8-state convolutional encoder

3.6.1 Decoding

A convolutional encoder is a finite state machine with $2^n$ states where n is the number of delay lines in the shift register. Decoding a convolutional code is equivalent to finding unknown parameters of a Hidden Markov Model, and to visualize how to decode the code produced by the encoder in figure 3.3 a Trellis diagram is used. See figure 3.4.

Figure 3.4: 8-state trellis diagram

Solid lines represent a state transition from state s to state s + 1 when a 0 was the input value in state s, while a dashed line represents a transition to state s + 1 when 1 was the input value in state s. One can see that not all transitions are possible, e.g. it is not possible to transition from a state of "10" to a state of "00" or even stay in state "10" since the bits are shifted one step to the right at each state transition. If a received sequence of symbols does not fit the Trellis diagram, then there were transmission errors and the decoder must find the nearest sequence that fits the graph.

Example: the binary message (10110) is encoded using the encoder shown in figure 3.3 to produce output symbols (101)(111)(011)(010)(001). The state transitions of the encoder are shown in figure 3.5 and the path through the Trellis is shown in figure 3.6.
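The encoding step can be sketched in a few lines. The tap connections used here are an assumption: they are not given in the text but were inferred so that the message (10110) produces exactly the five symbols quoted above. The state is modelled as the two previous input bits, matching the two-bit state labels of the trellis, and `conv_encode` is a hypothetical name.

```python
def conv_encode(message):
    """Rate 1/3 non-recursive encoder; taps inferred from the worked example."""
    r1 = r2 = 0                          # the two previous input bits (state)
    symbols = []
    for ch in message:
        u = int(ch)
        # Assumed taps: out1 = u+r1+r2, out2 = r1+r2, out3 = u+r1 (modulo 2)
        symbols.append(f"{u ^ r1 ^ r2}{r1 ^ r2}{u ^ r1}")
        r1, r2 = u, r1                   # shift one step to the right
    return symbols
```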

Figure 3.5: State transitions when encoding the message (10110)

Figure 3.6: Path through the Trellis for the decoding of symbols (101)(111)(011)(010)(001)

Figure 3.7: First branch (received symbols 101 000 011 010 001; candidate symbols 111, d=3 and 010, d=1)

The blue values are the received symbols, and the red values are the possible symbols to choose from in order to fit the Trellis graph. The Hamming distance d is also shown together with each choice. In figure 3.7 the bottom path has a shorter Hamming distance, so let’s try that one first. Figure 3.8 illustrates the possibilities at the next step.

Figure 3.8: Lower path chosen at first branch. Now at second branch

The upper path has a Hamming distance of 2 while the lower path has a distance of 4. Let’s continue on the upper path as shown in figure 3.9.

Figure 3.9: Lower path chosen at first branch, upper path chosen at second branch. Now at third branch

Figure 3.10: Lower path chosen at first branch, upper path chosen at second branch, upper path chosen at third branch. Now at fourth branch

In figure 3.10 we have two possible choices, each leading to a path with a Hamming distance of 4. Now it is time to back-track to the first branch from figure 3.7 since that path has a Hamming distance of 3 which is shorter than 4. By selecting the upper path instead we arrive at figure 3.11 and from there the upper path has a Hamming distance of 5 while the lower path has a distance of 3. Let’s go with the lower path to figure 3.12.

Figure 3.11: Upper path chosen at first branch. Now at second branch

At figure 3.12 the lower path is the best option with a distance of 3. The upper path has a distance of 5.

Figure 3.12: Upper path chosen at first branch, lower path chosen at second branch. Now at third branch

graph with a total Hamming distance of 3, which is the best option so far and it is also the correct path.

Figure 3.13: Upper path chosen at first branch, lower path chosen at second branch, lower path chosen at third branch. Now at fourth branch
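The walk-through above explores one branch at a time and back-tracks when a path becomes too expensive. A minimal sketch of the same decoding is given here using the Viterbi algorithm instead, which keeps the cheapest path into every state at each step; this is a swapped-in technique, not the procedure shown in the figures. The branch outputs are the same assumed taps inferred from the encoding example, and the function names are hypothetical.

```python
def branch_output(u, state):
    """Output symbol for input bit u in state (r1, r2); taps are assumed."""
    r1, r2 = state
    return (u ^ r1 ^ r2, r1 ^ r2, u ^ r1)


def viterbi_decode(symbols):
    """Hard-decision Viterbi: return (total Hamming distance, decoded bits)."""
    survivors = {(0, 0): (0, "")}        # state -> (distance so far, bits so far)
    for sym in symbols:
        received = tuple(int(b) for b in sym)
        nxt = {}
        for state, (dist, bits) in survivors.items():
            for u in (0, 1):
                expected = branch_output(u, state)
                d = dist + sum(a != b for a, b in zip(expected, received))
                new_state = (u, state[0])
                # Keep only the cheapest path into each state
                if new_state not in nxt or d < nxt[new_state][0]:
                    nxt[new_state] = (d, bits + str(u))
        survivors = nxt
    return min(survivors.values())
```

For the received symbols (101)(000)(011)(010)(001) the cheapest surviving path has a total Hamming distance of 3 and decodes to the original message, in agreement with the path found in figure 3.13.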

3.6.2 Soft decision decoding

The encoder produces a sequence of binary symbols. When receiving a digital message that has been transmitted across an analog medium (e.g. radio, telephone line, etc.) the received signal usually ends up as an analog voltage level, and if the level is higher than some threshold value it is interpreted as a digital 1, otherwise it is interpreted as a 0. If the receiver instead converts the analog voltage to a range of values the efficiency of the convolutional decoder can be improved. If an 8-bit analog-to-digital converter is used the receiver could produce a range of values from -128 to 127, where for example:

– -128 means "certainly 0"

– -100 means "very likely 0"

– 0 means "could be either 0 or 1"

– 100 means "very likely 1"

– 127 means "certainly 1"

This is called soft decision decoding.

3.7 Turbo codes

Figure 3.14: The Turbo coding/decoding principle

Turbo codes are not really error correcting codes, but rather a method of constructing a larger error correcting code from building blocks of smaller error correcting codes.

3.7.1 Encoder structure

The encoder consists of two basic building blocks: an interleaver, and the constituent encoders. The interleaver is a device that changes and/or re-orders the input symbols in some predictable way for the purpose of passing a different set of data to the second constituent encoder so that it will produce a different codeword compared with the output of the first constituent encoder. A pseudo-random permutation is often used in the interleaver.

The constituent encoders are two independent recursive systematic convolutional encoders. Output from the constituent encoders is multiplexed and transmitted together on the same communications channel. Usually, puncturing is performed before transmitting. Puncturing is the process of removing some of the parity bits to reduce redundancy. Both constituent encoders might also output the same data bits depending on what type of code they’re using. Redundant data bits can also be discarded.

3.7.2 Iterative decoding

At the receiver the coded messages are passed to their corresponding decoders. The first decoder will try to correct as many errors as possible and then pass the decoded data to the second decoder together with all available information about the possible locations of errors. The second decoder will also try to correct as many errors as possible, and then feed the decoded data back to the first decoder, along with any available information about the possible locations of errors. This is iterated until no more errors can be corrected.

3.8 Reed-Solomon codes

As mentioned earlier, transmission errors in an Ethernet link manifest themselves as packet loss, since the Ethernet hardware has error detection built in and simply discards any packet containing errors, unless e.g. UDP Lite is being used. The error correcting codes investigated so far try to fix incoming bit errors, but since packets either arrive error-free or not at all, we need to look at error correcting codes more suitable for dealing with missing data (erasures) instead of corrupt data.

The Reed-Solomon codes, constructed in 1960 by Irving S. Reed and Gustave Solomon, are the error correcting codes used in a wide variety of commercial applications, most prominently in CDs, DVDs and Blu-ray Discs. [13] Reed-Solomon codes are non-binary codes; that is, a codeword is defined in terms of multi-bit symbols rather than single-bit symbols. The original idea was to create a systematic error correcting code by treating the data to be transmitted as function values of a polynomial sampled at a regular interval, and then create redundant symbols by oversampling the polynomial. Another way to construct RS codes is to instead treat the input data symbols as the low-order coefficients of a polynomial, and then evaluate the polynomial at n different points to create a non-systematic code.

3.8.1 The message as a sequence of values

If a message to be transmitted consists of k data symbols, the symbols are converted into numbers, and the numbers are treated as function values of a polynomial of degree k − 1 sampled at x = 0, x = 1, x = 2, ..., x = k − 1. This allows the transmitter to find the polynomial coefficients, synthesize redundant sample points of the polynomial and append those points to the message. By appending t extra symbols sampled from the polynomial, the receiver can re-create up to t symbols lost during transmission. It does not matter which symbols are lost since a polynomial of degree k − 1 can always be determined from k function values of that polynomial.

Suppose we want to transmit a message consisting of the four values (16, 13, 7, 1). First, we treat the values as sample values of a polynomial of degree 3, $y(x) = Ax^3 + Bx^2 + Cx + D$.

The polynomial coefficients can be found through simple algebra:

$$
\begin{aligned}
x_0 = 0,\quad y(x_0) &= Ax_0^3 + Bx_0^2 + Cx_0 + D = 16 \\
x_1 = 1,\quad y(x_1) &= Ax_1^3 + Bx_1^2 + Cx_1 + D = 13 \\
x_2 = 2,\quad y(x_2) &= Ax_2^3 + Bx_2^2 + Cx_2 + D = 7 \\
x_3 = 3,\quad y(x_3) &= Ax_3^3 + Bx_3^2 + Cx_3 + D = 1
\end{aligned}
$$

$$
\begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 \\ 8 & 4 & 2 & 1 \\ 27 & 9 & 3 & 1 \end{pmatrix}
\begin{pmatrix} A \\ B \\ C \\ D \end{pmatrix}
=
\begin{pmatrix} 16 \\ 13 \\ 7 \\ 1 \end{pmatrix}
$$

This gives us the polynomial $y(x) = \frac{1}{2}x^3 - 3x^2 - \frac{1}{2}x + 16$.

The polynomial is sampled at three more points:

$$
\begin{aligned}
x_4 = 4,\quad y(x_4) &= -2 \\
x_5 = 5,\quad y(x_5) &= 1 \\
x_6 = 6,\quad y(x_6) &= 13
\end{aligned}
$$

If an error occurs and the first, second and fourth values are lost, the receiver can recover them by first finding the polynomial coefficients using the following equations

$$
\begin{aligned}
x_2 = 2,\quad y(x_2) &= Ax_2^3 + Bx_2^2 + Cx_2 + D = 7 \\
x_4 = 4,\quad y(x_4) &= Ax_4^3 + Bx_4^2 + Cx_4 + D = -2 \\
x_5 = 5,\quad y(x_5) &= Ax_5^3 + Bx_5^2 + Cx_5 + D = 1 \\
x_6 = 6,\quad y(x_6) &= Ax_6^3 + Bx_6^2 + Cx_6 + D = 13
\end{aligned}
$$

and then sampling the resulting polynomial at $x_0$, $x_1$ and $x_3$ to recover the original data values.

It does not matter which three values are lost, since we can always determine the polynomial coefficients of a polynomial of degree k − 1 from k sample points. As we can see, if we add three redundant values, then three lost values can be recovered. If more than three values are lost the linear system will be underdetermined and cannot be solved. This is not an efficient way to decode Reed-Solomon codes, but rather an example to show that it is indeed possible. A more efficient way for the transmitter to obtain the extra values is through Newton’s forward difference formula. (ref mathworld). To extend (16, 13, 7, 1) we simply calculate the difference between each sample value
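The encode and recover steps of this example can be checked with a small sketch. Instead of solving the linear system for A, B, C and D it uses Lagrange interpolation, which evaluates the same unique polynomial directly; that substitution, the exact `Fraction` arithmetic and the name `lagrange_eval` are choices made for this illustration only.

```python
from fractions import Fraction


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


message = [16, 13, 7, 1]
data_points = list(enumerate(message))                    # samples at x = 0..3
# Append three redundant samples taken at x = 4, 5, 6
codeword = message + [int(lagrange_eval(data_points, x)) for x in (4, 5, 6)]

# The first, second and fourth values are lost; four survivors remain
survivors = [(x, codeword[x]) for x in (2, 4, 5, 6)]
recovered = [int(lagrange_eval(survivors, x)) for x in range(4)]
```

Any other choice of four surviving positions would work equally well, since four points determine a polynomial of degree 3.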

16 13 7 1

13 − 16 = −3 7 − 13 = −6 1 − 7 = −6

and write the differences on a new row below the row of samples, and then repeat the process on that row and continue until we reach the bottom

16 13 7 1

−3 −6 −6

−3 0

3

The next sample value can then be constructed by computing the sum of the last value from each row,

16 13 7 1

−3 −6 −6

−3 0

3

3 + 0 + (−6) + 1 = −2, which matches the value obtained in the first example where a linear system was solved in order to calculate the polynomial coefficients. More sample values can be computed by repeating the process for as many times as needed:

13 7 1 −2

−6 −6 −3

0 3

3

3 + 3 + (−3) + (−2) = 1
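The difference table lends itself to a very short implementation. The following sketch (with hypothetical function names `next_sample` and `extend`) builds the rows of differences for the last k samples and sums the last value of each row, exactly as done by hand above.

```python
def next_sample(samples):
    """Extrapolate one value from k samples via the forward difference table."""
    row, total = list(samples), 0
    while row:
        total += row[-1]                                  # last value of this row
        row = [b - a for a, b in zip(row, row[1:])]       # next row of differences
    return total


def extend(samples, count):
    """Return `count` redundant samples following the k given samples."""
    values = list(samples)
    k = len(samples)
    for _ in range(count):
        values.append(next_sample(values[-k:]))           # slide the window by one
    return values[k:]
```

Calling `extend([16, 13, 7, 1], 3)` reproduces the three redundant values of the running example.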

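The forward difference method can also be expressed as a linear equation. By converting k input symbols to a column vector and multiplying that vector with the k × k matrix

$$
\begin{pmatrix}
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 \\
(-1)^{k-1} & \binom{k}{1}(-1)^{k-2} & \binom{k}{2}(-1)^{k-3} & \binom{k}{3}(-1)^{k-4} & \cdots & \binom{k}{k-1}
\end{pmatrix}^{k}
$$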

we will obtain the next k sample values of the polynomial. Note that the determinant of the matrix is always 1 regardless of the value of k which means that it will always be invertible. Example using the same input values as in the earlier examples:

    0 1 0 0 0 0 1 0 0 0 0 1 −1 4 −6 4     4    16 13 7 1     =     −1 4 −6 4 −4 15 −20 10 −10 36 −45 20 −20 70 −84 35         16 13 7 1     =     −2 1 13 37    

This principle can also be used to construct a more efficient decoder. Again using the values from previous examples, where the original message $(s_1, s_2, s_3, s_4) = (16, 13, 7, 1)$ is extended with three redundant symbols to become (16, 13, 7, 1, −2, 1, 13) before being transmitted, suppose that the first, second and fourth values were lost during transmission. The receiver has the following equations:

$$
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} s_1 \\ s_2 \\ s_3 \\ s_4 \end{pmatrix}
=
\begin{pmatrix} s_1 \\ s_2 \\ 7 \\ s_4 \end{pmatrix}
\qquad
\begin{pmatrix} -1 & 4 & -6 & 4 \\ -4 & 15 & -20 & 10 \\ -10 & 36 & -45 & 20 \\ -20 & 70 & -84 & 35 \end{pmatrix}
\begin{pmatrix} s_1 \\ s_2 \\ s_3 \\ s_4 \end{pmatrix}
=
\begin{pmatrix} -2 \\ 1 \\ 13 \\ x_8 \end{pmatrix}
$$

where $s_1$, $s_2$ and $s_4$ are the lost symbols, and $x_8$ is the eighth symbol generated by the encoder (the fourth redundant symbol, 37), which was never transmitted. Both equations are underdetermined, but we can borrow the values from the third row of the first equation and use them on the fourth row of the second equation to obtain a fully determined system, since we’re not at all interested in $x_8$.

$$
\begin{pmatrix} -1 & 4 & -6 & 4 \\ -4 & 15 & -20 & 10 \\ -10 & 36 & -45 & 20 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} s_1 \\ s_2 \\ s_3 \\ s_4 \end{pmatrix}
=
\begin{pmatrix} -2 \\ 1 \\ 13 \\ 7 \end{pmatrix}
$$

Now we have an equation that can be easily solved. By borrowing values from one equation to the other, a fully determined system can always be obtained as long as the number of symbols lost is less than or equal to the number of redundant symbols generated and transmitted by the encoder.
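To illustrate that the combined system really yields the lost symbols, it can be solved with a generic Gauss-Jordan elimination over exact fractions. This is only a sketch; the function name `solve` is hypothetical and the matrix is assumed to be invertible.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a * s = b by Gauss-Jordan elimination; assumes a is invertible."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# Three rows from the encoder matrix plus the borrowed identity row for s3
system = [[-1, 4, -6, 4], [-4, 15, -20, 10], [-10, 36, -45, 20], [0, 0, 1, 0]]
symbols = solve(system, [-2, 1, 13, 7])
```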

The code rate of a Reed-Solomon code with t redundant symbols for every k input symbols is $\frac{k}{k+t}$ and the code can recover t lost packets, which means that a packet loss rate of up to $\frac{t}{k+t}$ can be tolerated.

3.8.2 The message as a sequence of coefficients

To send the message (16, 13, 7, 1) using this construction we treat the input symbols as the coefficients of a polynomial, and then evaluate the polynomial as a function $f(x) = x^3 + 7x^2 + 13x + 16$ for as many different values of x as needed to achieve the desired code rate.

Table 3.2: Reed-Solomon encoding of message (16, 13, 7, 1) with code rate 4/7

x    f(x)
0    16
1    37
2    78
3    145
4    244
5    381
6    562

Table 3.2 shows the message (16, 13, 7, 1) encoded at a code rate of 4/7. To decode the message in the presence (or absence) of errors, the receiver needs to find the polynomial of degree 3 that evaluates to the most correct values. Reed and Solomon described a decoder that would try all possible subsets of the received values to find matching polynomials, a process which requires finding the coefficients of up to $\binom{n}{k}$ polynomials. This is impractical in most cases. To decode the message in the simple example above the decoder would need to evaluate $\binom{7}{4} = 35$ different polynomials which is feasible, but to decode for example the RS(208,192) code used in DVDs the decoder would need to evaluate $\binom{208}{192} \approx 10^{23}$ polynomials, which is of course impossible.
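For the small example the subset search is easy to write down. The sketch below interpolates a polynomial through every subset of k received values and keeps the one that agrees with the most received values. It is meant only as an illustration of the brute-force idea under the assumptions of this example (integer arithmetic, evaluation points x = 0..n − 1); the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def interpolate(points, x):
    """Lagrange evaluation of the polynomial through `points` at x."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


def brute_force_decode(received, k):
    """Return the codeword of the degree < k polynomial matching most values."""
    n = len(received)
    best_score, best_word = -1, None
    for subset in combinations(range(n), k):              # comb(n, k) candidates
        points = [(x, received[x]) for x in subset]
        candidate = [interpolate(points, x) for x in range(n)]
        score = sum(c == r for c, r in zip(candidate, received))
        if score > best_score:
            best_score, best_word = score, candidate
    return best_word
```

With one corrupted value out of seven, the true polynomial agrees with six received values while any other candidate agrees with at most four, so the search picks the right codeword. The number of candidates, `comb(7, 4)`, is 35 as stated above.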

Various tricks and optimizations have been developed over the years in order to create more efficient decoders.

One such trick is to evaluate the polynomial at $x = (1, e^{2i\pi/n}, e^{4i\pi/n}, e^{6i\pi/n}, \ldots, e^{2ki\pi/n})$ thereby

Figure 3.15: Reed Solomon codes as a Fourier series (two panels: frequency domain and time domain, amplitude on the vertical axis)

Figure 3.16: Reed Solomon codes as a Fourier series - one error (two panels: time domain and frequency domain, amplitude on the vertical axis)

A single error in the time domain will produce a single sinusoid in the high-frequency part of the frequency domain, as shown in red in figure 3.16. The n − k elements in the high-frequency part of the frequency spectrum are called the error syndrome. The green bars are the original input data shown for illustrative purposes, but the blue bars are the actual results of the DFT in the presence of an error, i.e. the blue bars are the green bars plus the red sinusoid. In order to recover the original data the frequency, phase and amplitude of the error syndrome must be found so that the sinusoidal signal can be subtracted from the k low frequency coefficients. By finding only the frequency of the syndrome the error location in the time domain can be determined, and by also finding the phase and magnitude of the error signal the error value in the time domain can be determined. If two errors occur the error signal will be composed of two sinusoids.

To avoid having to deal with complex numbers and floating point values, $e^{2i\pi/n}$ can be replaced by α where α is a primitive nth root of unity in some suitable finite field. This is called the Number Theoretic Transform (NTT), a transform which shares many of the desirable properties of the Fourier transform but only uses positive integers.

3.9 The Pro-MPEG FEC standard

The Pro-MPEG Forum has published a set of interoperability standards for FEC in video distribution applications. These standards can be found in the document called Pro-MPEG Code of Practice (CoP) #3 release 2. [10]

be used to correct a single transmission error, and the two-dimensional variant can correct several errors. Suppose a 5 × 6 matrix is received where bits are missing at the second and third position of the second row and at the fourth position of the third row:

1 0 0 1 1 1

0 − − 0 0 0

1 1 0 − 1 1

0 1 0 1 1 1

0 1 1 0 1

These three missing bits can be easily recovered since we know that the parity of each row and column must be even. The missing bits at the second row can be recovered from the column parity bits, and the missing bit at the third row can be recovered either by the row parity bit, or the column parity bit.

It is not convenient to work with bits at the application level, so instead of computing the parity for individual bits, the parity is computed on entire RTP packets. Figure 3.17 illustrates how data is arranged according to the Pro-MPEG specification. RTP sequence numbers are used to order the packets at the correct position in the matrix. The number of columns is called the L-parameter, and the number of rows is called the D-parameter.

Figure 3.17: RTP packet arrangement

3.9.1 Row FEC

Figure 3.18: Row FEC packet arrangement

The stream of packets is transmitted sequentially from the first to the last packet, followed by the FEC packet. The receiver must wait until the FEC packet has been received before it can begin processing the data.

With row FEC the code rate is determined by the row length L, since there is one FEC packet per data row. The code rate of row FEC is $\frac{L}{L+1}$, which means that if L = 5, the code rate is 5/6 ≈ 0.83, which gives a bandwidth overhead of 20%, and an average packet loss of up to 1/6 ≈ 16.7% can be tolerated.

The latency introduced (in terms of packets) when using row FEC is the size of the row plus the FEC packet, L + 1, since the receiver must wait until the entire row and the FEC packet have arrived before it can start to process the data. The latency is only present at the receiver, and if the receiver does not support FEC, or does not want to use it, the FEC packet can simply be discarded and the data packets can be processed immediately upon arrival.
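The parity principle behind row FEC can be sketched as a byte-wise XOR over the packets of a row. The sketch assumes that all payloads have the same length and ignores the RTP header fields that the real FEC header protects separately; the function names are hypothetical.

```python
def xor_packets(packets):
    """XOR equal-length packets byte by byte; the result is the parity packet."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_row(row, fec):
    """Fill in at most one missing packet (None) using the row's FEC packet."""
    missing = [i for i, p in enumerate(row) if p is None]
    if len(missing) != 1:
        return list(row)                 # nothing lost, or too many lost to repair
    present = [p for p in row if p is not None]
    repaired = list(row)
    # XOR of the parity with all surviving packets yields the lost packet
    repaired[missing[0]] = xor_packets(present + [fec])
    return repaired
```

A row with two missing packets is returned unchanged, which reflects the limit of one recoverable packet per row.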

3.9.2 Column FEC

The column FEC policy maps packets into columns rather than rows. In the same way, each column has a FEC parity packet added at the bottom. The advantage of using column FEC instead of row FEC is that multiple lost packets in a row can be recovered, but only a single lost packet in a column. This provides better protection against burst errors. The disadvantage is that it is a little more complicated to implement than row FEC, and a buffer of L FEC packets must be maintained, unlike row FEC which never needs to deal with more than one FEC packet at any given time.

Figure 3.19: Column FEC packet arrangement

When using column FEC, the code rate is determined by the column size D only:

$$\frac{DL}{(D+1)L} = \frac{D}{D+1}$$

If L = 5 and D = 5 the bandwidth overhead is 20% and an average packet loss of up to 1/6 ≈ 16.7% can be corrected.

The latency is increased by L(D + 1) packets, and a packet loss of up to $\frac{1}{D+1}$ can be tolerated, which is the same amount of protection as given by row FEC percentage-wise. However, column FEC can correct up to L consecutive packets being lost (burst loss), while row FEC fails whenever two or more packets from the same row are lost, as typically happens in a burst.

3.9.3 2D Matrix FEC

Figure 3.20: Matrix FEC packet arrangement

Once again, the stream of packets is transmitted row by row, left to right. The maximum number of lost packets that can be recovered using 2D FEC is L + D packets for every (L + 1)(D + 1) − 1 packets transmitted, or $\frac{L+D}{LD+L+D}$ expressed as a ratio, but the minimum amount of packet loss that can be handled by XOR(L, D) is less than $\frac{3}{LD+L+D}$ since only three lost packets can sometimes lead to irrecoverable errors.

Figure 3.21: Irrecoverable packet loss

In figure 3.21 we can see a case where only three missing packets out of 29 (10.3% packet loss) have created a situation where the second data packet can’t be recovered. Even if the size of the matrix had been 100 × 100 (0.03% packet loss) only three missing packets can be enough to create irrecoverable errors. There are many more combinations where only three or four missing packets can lead to situations where packets can’t be recovered, and practical testing will determine just how likely such a situation is. This is a limitation with the 2D parity matrix code, a limitation which Reed-Solomon, for example, does not suffer from.
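Whether a given loss pattern is recoverable can be checked with a small simulation of the iterative row and column repair. The loss pattern assumed here for figure 3.21 is the second data packet together with its own row FEC and column FEC packets, which is the only three-packet pattern that leaves that data packet stranded; this reading of the figure is an assumption. The function name and the (row, column) indexing from zero are hypothetical.

```python
def unrecoverable(lost_data, lost_row_fec, lost_col_fec, L, D):
    """Return the data packets that iterative 2D parity repair cannot restore.

    lost_data is a set of (row, col) positions, lost_row_fec a set of row
    indices and lost_col_fec a set of column indices whose FEC packet is lost.
    """
    missing = set(lost_data)
    progress = True
    while missing and progress:
        progress = False
        for r in range(D):
            in_row = [p for p in missing if p[0] == r]
            # A row repairs exactly one missing packet if its FEC packet arrived
            if len(in_row) == 1 and r not in lost_row_fec:
                missing.discard(in_row[0])
                progress = True
        for c in range(L):
            in_col = [p for p in missing if p[1] == c]
            if len(in_col) == 1 and c not in lost_col_fec:
                missing.discard(in_col[0])
                progress = True
    return missing
```

For L = 5 and D = 4, `unrecoverable({(0, 1)}, {0}, {1}, 5, 4)` leaves the second data packet unrecovered, whereas three lost data packets with all FEC packets intact are fully repaired.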

The code rate is $\frac{LD}{LD+L+D}$ since we’re adding L + D redundant packets for every LD data packets processed, and the latency is increased by LD + L + D packets. The overhead is $\frac{L+D}{LD}$. Pro-MPEG CoP places the following restrictions on the selection of L and D:

1 ≤ L ≤ 20

4 ≤ D ≤ 20

LD ≤ 100
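The formulas and restrictions above can be collected in a small helper for exploring parameter choices. The function name `matrix_fec_parameters` and the dictionary keys are hypothetical; the values follow directly from the expressions in the text.

```python
from fractions import Fraction


def matrix_fec_parameters(L, D):
    """Return code rate, overhead, latency and loss bounds for 2D matrix FEC."""
    if not (1 <= L <= 20 and 4 <= D <= 20 and L * D <= 100):
        raise ValueError("L and D violate the CoP restrictions")
    total = L * D + L + D                     # data packets plus FEC packets
    return {
        "code_rate": Fraction(L * D, total),
        "overhead": Fraction(L + D, L * D),
        "latency_packets": total,
        "best_case_loss": Fraction(L + D, total),
        "worst_case_loss": Fraction(3, total),
    }
```

For L = 5 and D = 4 this gives a code rate of 20/29 and an overhead of 9/20, i.e. 45%.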

Let’s look at a Reed-Solomon code with a code rate of $\frac{k}{k+t}$ (a code that generates t redundant symbols for every k input symbols processed) that can tolerate a packet loss rate of $1 - \frac{k}{k+t}$, and compare that with the 2D parity matrix code with code rate $\frac{LD}{LD+L+D}$ that can tolerate packet loss rates between $\frac{3}{LD+L+D}$ and $\frac{L+D}{LD+L+D}$. By substituting k with LD and substituting t with L + D we obtain the same definition of code rate for both the Reed-Solomon code and the 2D parity matrix code, since t is the number of redundant symbols added for every k symbols by the RS code, and L + D is the number of redundant symbols added by the 2D parity matrix for every LD input symbols. It then becomes clear that Reed-Solomon codes can achieve better performance for the same code rate and overhead, since RS can always recover from a packet loss rate up to $\frac{t}{k+t} = \frac{L+D}{LD+L+D}$ while the 2D parity matrix code only reaches $\frac{L+D}{LD+L+D}$ under the most favourable circumstances, and under less favourable circumstances it can only recover from loss rates of up to $\frac{3}{LD+L+D}$.

Reed-Solomon certainly performs better in theory since the best-case performance of the 2D parity matrix is only as good as the worst-case performance of Reed-Solomon, but RS is also more complicated to implement, especially the decoder. Testing will determine whether the 2D parity matrix code encounters worst-case scenarios often enough to warrant the conclusion that Reed-Solomon is worth the extra time and complexity.

Transmission

The RTP packets will usually be transmitted as soon as they have been used to update the FEC packet at the end of the row and/or at the end of the column, but when should the FEC packets be sent? The naive answer is to send each FEC packet as soon as it has been completely updated with data from all of the RTP packets in its corresponding row or column, as shown in figure 3.22. This is obviously the simplest order to implement, but it has one major drawback: if four or more consecutive packets are lost during the transmission of the final row of the matrix the packet loss will be irrecoverable, as illustrated in figure 3.23. A better option is to delay the transmission of the column FEC packets by up to L packets as shown in figure 3.24, which tolerates bursts of lost packets much better, but it also increases the latency slightly and is a little more complicated to implement.


Figure 3.23: Irrecoverable error when using naive packet ordering

Figure 3.24: Burst loss resilient packet order

different UDP port number which is the second consecutive port number after the port used by the data packets. The row FEC packets are sent using the same concept, on a UDP port that is the fourth consecutive port number after the port used by the data packets. If the receiver is unaware of FEC, it will only receive the data packets. If the receiver only supports column FEC, it will listen on two ports and join the data stream and the column FEC stream. If the receiver supports matrix FEC it will join all three streams.

Example: The MPEG-TS/RTP stream will be sent to address 234.123.5.6 on UDP port 5004, the column FEC packets will be sent to address 234.123.5.6 on UDP port 5006, and the row FEC packets will be sent on UDP port 5008. The receiver only listens to port 5004 if it is unaware of FEC, hence old receivers can still function when used with modern FEC-enabled transmitters.

3.9.4 RTP FEC extension

RFC 2733 defines a payload format for generic forward error correction for media data encapsulated in RTP, and the Pro-MPEG CoP extends this with seven additional fields. [7]

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        SNBase low bits        |        Length Recovery        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|E| PT recovery |                     Mask                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          TS recovery                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|X|D|type |index|    Offset     |      NA       |SNBase ext bits|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 3.25: RTP FEC header structure

The fields of the extended RTP FEC header are shown in figure 3.25:

– SNBase low bits - Minimum sequence number of the packets associated with the FEC packet.

– Length Recovery - This field is used to determine the length of any media packets associated with the FEC packet.

– PT recovery - This field is used to determine the Payload Type of any media packets associated with the FEC packet.

– TS Recovery - This field is used to recover the timestamp of any media packets associated with the FEC packet.

– E - In RFC2733 this shall be set to 0, but the Pro-MPEG CoP mandates that this shall be set to 1 to indicate that the header is extended.

– Mask - In RFC2733 this is used to select which packets the FEC packet is applied to, but the CoP mandates that this is set to zero.

– D - This bit is provided as an additional means of determining to which FEC stream the packets belong. It must be set to 0 for FEC packets computed on columns and set to 1 for FEC packets computed on rows.

– Type - This field indicates which error-correcting code is chosen. It can be XOR (type=0), Hamming (type=1), Reed-Solomon (type=2). Release 2 of the CoP mandates that only the XOR type is used.

– Index - This field is used for more complex error protection codes. For the XOR method, the index field will always contain 0.

– Offset - This 1-byte field is the period chosen to select the media packets associated with this FEC packet, and corresponds exactly to the L parameter for packets computed over columns (the first FEC stream). For packets computed over rows (the second FEC stream) this parameter shall always be one. This field should be kept constant during a session for each FEC stream.

– NA - This 1-byte field indicates the number of media packets associated with this FEC packet, and corresponds exactly to the D parameter for packets belonging to the first FEC stream, and should correspond to the L parameter for packets belonging to the second FEC stream. This field should be kept constant during a session for each FEC stream.

– SNBase ext bits - This field is reserved for use with protocols which require extended sequence numbers longer than 16 bits.
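The layout in figure 3.25 maps naturally onto four 32-bit words in network byte order. The following sketch packs and unpacks such a header with Python's `struct` module. The function and key names are hypothetical, E is fixed to 1 and Mask to 0 as the CoP mandates, and the X bit, which is not described in the field list, is simply assumed to be 0.

```python
import struct

HEADER_FORMAT = "!HHIIBBBB"              # 2 + 2 + 4 + 4 + 1 + 1 + 1 + 1 = 16 bytes


def pack_fec_header(sn_base, length_rec, pt_rec, ts_rec, d, offset, na,
                    fec_type=0, index=0, sn_ext=0):
    """Pack the 16-byte FEC header (E=1, Mask=0; X=0 is an assumption)."""
    word2 = (1 << 31) | ((pt_rec & 0x7F) << 24)                   # E, PT recovery, Mask=0
    flags = ((d & 1) << 6) | ((fec_type & 7) << 3) | (index & 7)  # X stays 0 in bit 7
    return struct.pack(HEADER_FORMAT, sn_base & 0xFFFF, length_rec & 0xFFFF,
                       word2, ts_rec & 0xFFFFFFFF, flags, offset, na, sn_ext)


def unpack_fec_header(header):
    """Split a 16-byte FEC header into its named fields."""
    sn, length, word2, ts, flags, offset, na, ext = struct.unpack(HEADER_FORMAT, header)
    return {
        "sn_base": sn, "length_rec": length,
        "e": word2 >> 31, "pt_rec": (word2 >> 24) & 0x7F, "mask": word2 & 0xFFFFFF,
        "ts_rec": ts,
        "d": (flags >> 6) & 1, "type": (flags >> 3) & 7, "index": flags & 7,
        "offset": offset, "na": na, "sn_ext": ext,
    }
```

For a column FEC packet in a matrix with L = 5 and D = 4, the header would be packed with `d=0`, `offset=5` and `na=4`.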

In the original RFC document the mask field is a 24 bit field. If bit i in the mask is set to 1, then the media packet with sequence number N + i is associated with this FEC packet, where N is the SNBase field in the FEC packet header. The least significant bit corresponds to i = 0, and the most significant to i = 23. This puts restrictions on the size of the matrix used for the column and matrix mode FEC, with 4 × 6 being the largest matrix possible, which means 17% overhead if using column FEC, 42% overhead if using matrix FEC, and 4.2% for row FEC. The parity check code (XOR) is the only type of error correcting code allowed by the RFC.

Chapter 4

Testing

One goal of this thesis is to investigate the performance of the chosen FEC scheme in different real-world scenarios, and during the implementation it became clear that some sort of testing infrastructure was required where packet loss, latency, bandwidth, and other parameters could be simulated.

4.1 Implementation

Since the goal of this thesis is to test and evaluate the Pro-MPEG Code of Practice (CoP) specification, the 2D matrix FEC algorithm was implemented, as it is the only one permitted by the current release of the CoP. The use case is real-time video transmission, so to test that use case all FEC functionality was implemented as separate external programs that can be attached to existing video transmission applications and can also be unit-tested separately. For the sake of comparison, Reed-Solomon was also implemented. The FEC algorithm will be integrated into existing commercial video transmission software from the Swedish company Intinor AB. Figure 4.1 shows how their video transmitter (Direkt-länk) and receiver (Direkt-mottagare) units work. Video enters the sender as a raw video stream on a Serial Digital Interface (SDI). A video encoder encodes the video (and audio) as either an MPEG-2 or an H.264 elementary stream, which is then packetized into an MPEG Transport Stream (MPEG-TS). Finally, the MPEG-TS is transmitted as a stream of UDP packets.

At the receiver the process is performed in reverse; UDP packets containing MPEG-TS are received, converted to an elementary stream, decoded, and output as raw video on SDI.


[Figure: Transmitter (Direkt-länk): raw video -> H.264 encoder -> elementary stream -> MPEG-2 TS encapsulation -> MPEG-2 TS -> UDP transmitter -> UDP packets. Receiver (Direkt-mottagare): UDP packets -> UDP receiver -> MPEG-2 TS -> MPEG-2 TS expansion -> elementary stream -> H.264 decoder -> raw video.]

Figure 4.1: Video transmission and reception

Figure 4.2 explains how the FEC implementation was performed. The UDP packets are captured before being sent over the wire, and the contents of each packet are placed in an RTP packet instead, which is then transmitted as a UDP packet. The Reed-Solomon implementation uses only one FEC stream instead of two.
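The encapsulation step can be sketched as follows. This is a minimal illustration and not the thesis implementation: it assumes a plain 12-byte RTP header without CSRC entries or extensions, and the static payload type 33 (MPEG-2 TS). The function names, timestamp and SSRC values are hypothetical.

```python
import struct

RTP_VERSION = 2
PAYLOAD_TYPE_MP2T = 33  # static RTP payload type for MPEG-2 TS


def rtp_encapsulate(payload: bytes, seq: int, timestamp: int, ssrc: int) -> bytes:
    """Prefix a captured UDP payload with a minimal 12-byte RTP header."""
    byte0 = RTP_VERSION << 6   # no padding, no extension, zero CSRC entries
    byte1 = PAYLOAD_TYPE_MP2T  # marker bit cleared
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def rtp_expand(packet: bytes) -> tuple[int, bytes]:
    """Return (sequence number, payload); assumes the 12-byte header above."""
    _, _, seq, _, _ = struct.unpack("!BBHII", packet[:12])
    return seq, packet[12:]


# One 188-byte transport stream packet wrapped and unwrapped again.
ts_payload = b"\x47" + bytes(187)
packet = rtp_encapsulate(ts_payload, seq=7, timestamp=0, ssrc=1)
seq, payload = rtp_expand(packet)
```

The sequence number written here is what the FEC header fields (SNBase, Offset, NA) refer to when identifying the protected media packets.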

Each RTP packet is also passed to the Row and Column FEC modules, where the parity packets are constructed and sent on two separate channels. At this point, any RTP-capable receiver should be able to receive and decode the RTP stream, and any RTP/FEC-capable receiver that follows the Pro-MPEG specification should be able to use the additional FEC streams to recover missing packets. Row FEC and Column FEC packets can be constructed independently.
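A simplified sketch of the parity construction is shown below. It XORs payloads only and ignores the FEC header fields that a real implementation must also protect. The function names and the small L = 3, D = 2 block are assumptions made for this example.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two payloads, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a, b = a.ljust(n, b"\x00"), b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def column_parity(block: list[bytes], L: int, D: int) -> list[bytes]:
    """One parity payload per column: packets c, c + L, ..., c + (D - 1) * L."""
    return [reduce(xor_bytes, (block[c + r * L] for r in range(D)))
            for c in range(L)]


def row_parity(block: list[bytes], L: int, D: int) -> list[bytes]:
    """One parity payload per row: packets r * L, ..., r * L + L - 1."""
    return [reduce(xor_bytes, block[r * L:(r + 1) * L]) for r in range(D)]


# Six media payloads arranged as L = 3 columns by D = 2 rows.
L, D = 3, 2
block = [bytes([i]) * 4 for i in range(L * D)]
col_fec = column_parity(block, L, D)
row_fec = row_parity(block, L, D)

# A single lost packet is the XOR of the parity and the surviving packets.
rebuilt_packet_4 = xor_bytes(col_fec[1], block[1])
```

Because the two functions share no state, the row and column parity can be computed in either order or in parallel.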

At the receiver the incoming FEC packets are used to recover any missing RTP packets (when possible) and the reconstructed packets are merged with the RTP stream. The RTP stream is then expanded into plain MPEG-2 TS and passed along to Intinor's receiver. Column FEC and Row FEC need to work together at the receiver, since the Column FEC decoder might not be able to recover packets until the Row FEC decoder has recovered a packet, and the other way around.
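The interplay between the two decoders can be illustrated with a minimal iterative sketch. It assumes equal-length payloads, a block stored in row-major order, and lost packets represented as None. The function names and loss pattern are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def recover(block, row_fec, col_fec, L, D):
    """Iteratively fill in missing packets (None) in an L x D block.

    block   -- L * D payloads in row-major order, None where a packet was lost
    row_fec -- D row parity payloads (None if that FEC packet was lost)
    col_fec -- L column parity payloads (None if that FEC packet was lost)
    """
    block = list(block)
    rows = [([r * L + c for c in range(L)], row_fec[r]) for r in range(D)]
    cols = [([c + r * L for r in range(D)], col_fec[c]) for c in range(L)]
    progress = True
    while progress:  # repeat until a full pass recovers nothing new
        progress = False
        for indices, parity in rows + cols:
            missing = [i for i in indices if block[i] is None]
            if parity is None or len(missing) != 1:
                continue  # XOR parity can repair exactly one loss per group
            present = [block[i] for i in indices if block[i] is not None]
            block[missing[0]] = reduce(xor_bytes, present, parity)
            progress = True
    return block


L, D = 3, 2
sent = [bytes([i]) * 4 for i in range(L * D)]
row_fec = [reduce(xor_bytes, sent[r * L:(r + 1) * L]) for r in range(D)]
col_fec = [reduce(xor_bytes, sent[c::L]) for c in range(L)]

# Packets 0, 1 and 3 are lost. Column 0 (packets 0 and 3) has two losses and
# only becomes recoverable after row 1 has restored packet 3.
received = [None, None, sent[2], None, sent[4], sent[5]]
recovered = recover(received, row_fec, col_fec, L, D)

# Four losses forming a 2 x 2 square cannot be repaired by either decoder.
stuck = recover([None, None, sent[2], None, None, sent[5]], row_fec, col_fec, L, D)
```

The loop stops as soon as a full pass over all rows and columns makes no progress, which covers both complete recovery and unrecoverable loss patterns.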

[Figure 4.2: FEC-enabled transmitter (Direkt-länk, RTP encapsulation, Column FEC encoder, Row FEC encoder) and FEC-enabled receiver (FEC decoder, RTP expander, Direkt-mottagare).]

