Router-based Congestion Control through Control Theoretic Active Queue Management


Alberto Giglio

Master's Degree Project Stockholm, Sweden 2004

IR-RT-EX-0421


Introduction

The Transmission Control Protocol (TCP) is the most widely used transport-level protocol in the Internet. It provides reliable end-to-end transmission of packets and runs on top of the IP protocol, which is mainly responsible for the addressing and fragmentation of the data. Its main feature is error-free transmission of data, which is fundamental for most data exchanges; on the other hand, there is also a fast-growing demand for best-effort transmission, caused by the coexistence of data and multimedia applications on the Internet. The transport-level protocol that provides this best-effort service is the User Datagram Protocol (UDP). In Figure 1.1 the transport protocols are placed into the TCP/IP stack and compared to the standard layers defined by the Open Systems Interconnection (OSI) reference model. The OSI stack is an architectural model that describes the functions of data communication protocols: each layer is self-contained and provides services to the layer above using the services of the layers below.

1.1 TCP basics

TCP ensures error-free data transmission through a feedback system based on acknowledgments of received packets. Each packet sent by the end-point A has a sequence number and is routed through the different possible paths to reach its destination end-point B; the sequence number is needed because packets can arrive out of order, as a consequence of the routing through the network. When a packet is received, B sends back an acknowledgment (ACK) to A, confirming the packet arrival. A timeout threshold is set when the packet is sent from A: if the ACK is not received before the time has elapsed, the packet is sent again.

The fundamental dynamics of TCP are window based. The congestion window is the number of packets the transmitting source can send without waiting for an ACK. This mechanism provides high utilization of the available bandwidth and is the basis of "congestion control": when a bottleneck link discards packets because of an overflow in its router buffer, or a packet is lost or arrives corrupted at the end-point, A (the sender) reduces its congestion window and hence decreases the load it offers to the network. This dynamic is called "additive increase - multiplicative decrease", and it satisfied the Internet requirements until the beginning of the '90s.

[Figure 1.1: OSI and TCP/IP layer structure with relative protocols. OSI layers: Application, Presentation, Session, Transport, Network, Data Link, Physical. TCP/IP layers: Application (FTP, HTTP, TELNET), Transport (TCP, UDP), Network (IP), Data Link, Physical.]

However, the exponential growth of Internet usage, together with the larger size of transferred files and the spread of multimedia applications, has drastically increased the load in the network. This leads to a higher probability of congestion (creating longer queues at the router buffers), increases Round Trip Times (RTT) and requires several retransmissions of packets when queues saturate.

1.2 Active Queue Management - AQM

To counteract the network performance deterioration caused by the increased load, new TCP specifications have been proposed (e.g. Tahoe, Reno, Vegas), but they still only provide end-to-end congestion control functionality, keeping a drop-tail policy at the routers.

The new challenge has moved to queue management: instead of waiting until congestion is present, i.e. for timeouts or triple duplicate ACKs, to decrease the transmission window of A, the aim of Active Queue Management (AQM) is to prevent full queues by signalling congestion to the transmitter before it happens.

In [6] a short analysis of some proposals for AQM is presented: from the simple drop-tail modifications to the control-theoretic designs, through Random Early Detection (RED) proposals. Different points of view can be found in the scientific community, but it seems clear that some more complex implementations of control theoretic designs will lead to better results, also taking into consideration the increased computational power of routers and lower costs for memory.

1.3 Thesis objectives

The aim of this thesis is to investigate whether the control of the existing models developed in [2] can be improved and to test alternative solutions, evaluated through simulation of the network behavior.


The discrete-event simulator NS-2 [23] is used to test the implementations of the models studied in theory: the challenge is to obtain an algorithm that ensures stable queue behavior and a fast response to perturbations in the network.

1.4 Thesis overview

This thesis is organized as follows: in Chapter 2 the end-to-end congestion control performed by the TCP protocol and the improvements of the router-based schemes are described. The model of the TCP flows competing at a router is analyzed in Chapter 3, and the limits to possible improvements are set. Chapter 4 describes the network simulator NS-2 and the experiment settings for testing the algorithms.

A Proportional-Integral-Derivative controller, an improvement to the PI controller described in [2], is analyzed and tested in Chapter 5. Chapter 6 describes a static and an adaptive algorithm based on a common Internal Model Control design.

Internal disturbances are taken into consideration with a Linear Quadratic controller in Chapter 7, in order to balance the effects of the state feedback.

For each algorithm a theoretical description is given, then the simulation results together with the chosen benchmark (PI controller) are presented.

Chapter 8 summarizes the results, presents the final conclusions of the thesis work and suggests future possibilities of improvement.

For a more complete description, a short guide to using the network simulator NS-2 is given in Appendix A; the OTcl code for the simulations is added in Appendix B.


Chapter 2

State of the art

The wide use of the TCP transport protocol in the Internet and the growing volume of data sent through the Web have in recent years stimulated the interest of researchers in the new problems of network congestion and in improving the performance of the standard protocols. Backward compatibility is a great challenge in this field.

2.1 TCP Congestion Control

In order to understand the dynamics of router-based congestion control, the end-to-end TCP congestion control must first be introduced, as it represents the basic layer of control in every TCP data exchange. As written in the introductory chapter, TCP is a Transport layer protocol, and it is the first in the stack working "end-to-end" (see Figure 2.1).

[Figure 2.1: End-to-end communication for the transport protocol TCP. The end-points A and B implement the Application, Transport, Network, Data Link and Physical layers; Router 1 and Router 2 implement only the Network, Data Link and Physical layers.]

The main modes in one of the standard TCP versions (TCP New Reno) are:

• Slow Start


• Congestion Avoidance

• Fast Retransmit

• Fast Recovery

The first two are used by the TCP sender to control the amount of data being injected into the network. The congestion window (cwnd) is a sender-side limit on the amount of data the sender can transmit into the network before receiving an ACK, while the receiver's advertised window (rwnd) is a receiver-side limit on the amount of outstanding data. The minimum of cwnd and rwnd governs data transmission.

The slow-start algorithm is used to probe the available capacity of an unknown link at the beginning of a transfer, or after a retransmission timeout, in order not to congest the network.

In case of a timeout event or a triple duplicate ACK, the congestion window (cwnd) is reduced: it is halved after a triple duplicate ACK and reset to one packet after a timeout. A variable, the slow-start threshold (ssthresh), is added to the TCP per-connection state to determine whether the slow-start or the congestion avoidance algorithm is used, in a fully cooperative environment.

[Figure 2.2: Time-flow events for retransmission causes: time-out and triple-duplicate ACK. Left panel (time-out), between A and B: packet 1, ACK 1, time-out threshold, retransmission of packet 1. Right panel (triple-duplicate ACK), between A and B: packets 1 to 6, ACK 1, repeated ACK 2, retransmission of packet 3, ACK 6.]

The fast retransmit algorithm is used by the TCP sender to react to losses on the channel, and it is triggered by the arrival of 3 duplicate ACKs. It forces the retransmission of the presumably missing packet before the time-out occurs.

After the single retransmission, the fast recovery algorithm governs the sending of the new data, in order to increase the utilization of the available bandwidth: in case of a triple duplicate ACK, in fact, the use of the slow start algorithm would strongly decrease the sending rate, while the congestion on the link has already been solved. For a detailed description of the algorithms and the protocol, information can be found in [8].
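The interplay of the four modes can be illustrated with a per-round-trip update of cwnd. The following is a minimal sketch, not the New Reno specification: the names `next_cwnd` and `trace`, the event labels, the initial threshold of 16 packets and the floor of 2 packets are assumptions made for illustration, the window is counted in packets, and fast recovery is reduced to "resume congestion avoidance from half the window".

```python
def next_cwnd(cwnd, ssthresh, event):
    """Return (cwnd, ssthresh) after one round trip, in packets.

    event is one of "ack", "triple_dup_ack", "timeout" (hypothetical labels).
    """
    if event == "timeout":
        # Timeout: remember half the window, restart from one packet (slow start).
        return 1.0, max(cwnd / 2.0, 2.0)
    if event == "triple_dup_ack":
        # Fast retransmit + simplified fast recovery: continue from half the window.
        half = max(cwnd / 2.0, 2.0)
        return half, half
    if cwnd < ssthresh:
        # Slow start: the window doubles every round trip, up to ssthresh.
        return min(cwnd * 2.0, ssthresh), ssthresh
    # Congestion avoidance: additive increase of one packet per round trip.
    return cwnd + 1.0, ssthresh


def trace(events, cwnd=1.0, ssthresh=16.0):
    """Return the list of cwnd values visited for a sequence of events."""
    values = [cwnd]
    for event in events:
        cwnd, ssthresh = next_cwnd(cwnd, ssthresh, event)
        values.append(cwnd)
    return values


events = ["ack"] * 5 + ["triple_dup_ack"] + ["ack"] * 2 + ["timeout"] + ["ack"] * 2
print(trace(events))
```

The printed sequence has the shape of the cwnd plot for New Reno: exponential growth, linear growth, a halving after the triple duplicate ACK and a restart from one packet after the timeout.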

[Figure 2.3: Cwnd evolution for TCP New Reno. Axes: time (horizontal), cwnd in bytes (vertical). Labelled phases: slow start, congestion avoidance, single retransmission (fast retransmit), fast recovery.]

2.1.1 Limits of the end-to-end congestion control

An analysis of the behavior of TCP aggregate flows is presented in [9]: this research shows how the TCP behavior depends on the considered system, leading to a variable level of performance. In an environment that is constantly changing and growing, such as the modern Internet, there is a need for consistent behavior under different network conditions.

Moreover, TCP congestion control acts merely as an end-to-end system, not taking into consideration the network that carries the packets. Its only aim is to perform a reliable, error-free data exchange, and the algorithms that work to get this result are obliged to avoid link congestion. This policy performs "late" corrections and cannot ensure a stable steady-state load in the network.

AQM was developed from the network point of view, in order to work in cooperation with the TCP protocol, but also as a distributed mediator for the throughput maximization performed by the TCP protocol. In the following sections some of the most interesting proposals for AQM are presented, in order to give a broader view of the ongoing research and to open the way for the work of this thesis.

2.2 AQM algorithms

In the following sections some of the main steps in the AQM research are described, in order to give a general idea of the problem.

• Explicit Congestion Notification - ECN

• Random Early Detection - RED

• Control Theoretic Algorithms

2.3 Explicit Congestion Notification - ECN

ECN is a functionality that has been studied to work together with AQM algorithms in order to improve their performance. It requires modifications in both IP and TCP packet headers.

Bits 0-5: DSCP (Differentiated Services)    Bits 6-7: ECN (ECT, CE)

ECT  CE  Meaning
0    0   Non ECT-capable
0    1   Non ECT-capable and packet dropped
1    0   ECT-capable and no congestion
1    1   Congestion Experienced

Figure 2.4: The ToS/TC octet

The AQM algorithm implemented in a router, instead of dropping incoming packets to prevent congestion, can set the CE (Congestion Experienced) codepoint in the IP header of the packet and route it to the next hop. Routers that receive a packet with the CE bit set do not modify the field and treat the packet as a normal one. In order to avoid the loss of information, ECN-capable packets should have the DF (Don't Fragment) bit set. The destination end-point acknowledges the received packet but sets the congestion bit in the TCP ACK, so that the sender can react in the same way as for a packet loss (halving the window size), with the advantage of not having lost the packet.

At the IP level, two bits are added in the packet header: the ECN-Capable Transport (ECT) bit and the CE bit. The first indicates whether the end-points of the transport protocol are ECN-capable, while the second signals a congestion problem in the network.

The two bits are placed inside the ToS (Type of Service) octet of IPv4 or inside the TC (Traffic Class) octet of IPv6 and occupy bits 6 and 7, as shown in Figure 2.4. In the standardization of the IP header, bits 6 and 7 were defined as "Currently Unused", and their experimental use as ECN bits has been allowed (RFC 2780).

The position of the field inside the IP header is shown in Figure 2.5. As shown in Figure 2.1, the IP protocol manages the routing operations at each hop, while TCP works on end-to-end communication. Following the classical TCP negotiation scheme (handshake), the end-points must first declare whether they are ECN-capable. The chain of events that follows a congestion notification is the following:

• If the sender is ECN-capable, the ECT codepoint is set in the IP packet headers

• An ECN-capable router, which decides to drop one incoming packet, looks at the ECT flag and sets the CE, forwarding the packet

• The receiver checks the CE codepoint and sets the ECN-Echo - ECE (Figure 2.6) in the next TCP ACK sent to the sender

• The sender receives the ACK and reacts as if a packet had been dropped

• The TCP header of the next packet has the CWR bit set in order to acknowledge the reception of the ECN notification
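The chain of events above can be sketched as three small functions, one per actor. This is an illustrative model only: the `Packet` class and the names `router_forward`, `receiver_ack` and `sender_react` are assumptions, the two ECN bits follow the ECT/CE layout of Figure 2.4, and the sender reaction is reduced to halving the window.

```python
from dataclasses import dataclass

ECT = 0b10  # ECN-Capable Transport bit (bit 6 of the ToS/TC octet)
CE = 0b01   # Congestion Experienced bit (bit 7 of the ToS/TC octet)


@dataclass
class Packet:
    ecn: int = 0        # the two ECN bits carried in the IP header
    ece: bool = False   # ECN-Echo flag in the TCP header
    cwr: bool = False   # Congestion Window Reduced flag in the TCP header


def router_forward(pkt, wants_to_drop):
    """Return the packet to forward, or None if the router drops it."""
    if not wants_to_drop:
        return pkt
    if pkt.ecn & ECT:
        pkt.ecn |= CE   # mark instead of dropping
        return pkt
    return None         # non ECN-capable traffic is dropped as usual


def receiver_ack(pkt):
    """Build the ACK, echoing congestion when both ECT and CE are set."""
    return Packet(ece=(pkt.ecn == (ECT | CE)))


def sender_react(cwnd, ack):
    """Return (new cwnd, next data packet) after receiving an ACK."""
    if ack.ece:
        # React as for a loss and confirm the notification with CWR.
        return max(cwnd / 2.0, 1.0), Packet(ecn=ECT, cwr=True)
    return cwnd, Packet(ecn=ECT)


marked = router_forward(Packet(ecn=ECT), wants_to_drop=True)
ack = receiver_ack(marked)
cwnd, next_pkt = sender_react(10.0, ack)
print(marked, ack, cwnd, next_pkt)
```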

[Figure 2.5: The IP packet header. Fields shown, with bit positions 0, 4, 8, 16, 18, 32: IHL, ToS, total length, identification, fragment offset, TTL, protocol, header checksum, source address, destination address, options, data.]

[Figure 2.6: The TCP packet header. Bits 0 to 15: Header length, Reserved, and the flags CWR, ECE, URG, ACK, PSH, RST, SYN, FIN.]

ECN has problems when there is a tunnel connection: in IP security protocols, such as IPsec, the IP header is encapsulated in an outer packet, and the inner fields cannot be changed by the hops in the tunnel (in order to prevent malicious modification). If a router inside the tunnel wants to signal incoming congestion, the only way to do it is to drop packets.

One solution is to negotiate the use of ECN during the IPsec handshakes at the tunnel endpoints, so that its use can be disabled by a security administrator in case the risks outweigh the benefits.

2.4 Random Early Detection - RED

The aim of the RED design, presented in 1993 [12], is congestion avoidance with low delay and high throughput in the network. The average queue size is kept low in order to minimize the delay and to let bursty traffic enter the queue without overloading it. Another goal is to break the synchronization of flows (which is what happens in drop-tail systems) and to avoid periodic fluctuations. In order to avoid a bias against bursty traffic, a randomized marking function is used, so that the dropping probability of a flow is proportional to its bandwidth usage. RED is usually described in combination with a normal TCP implementation, but it can also be applied along with ECN. The pseudo-code of the RED algorithm is the following:

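for each packet arrival
    calculate the average queue size avg
    if min_th ≤ avg < max_th
        calculate probability p_a
        with probability p_a:
            mark the arriving packet
    else if max_th ≤ avg
        mark the arriving packet
    end if
end for

[Figure 2.7: RED dynamics. Block diagram: the queue length q passes through the low-pass filter K/(s + K) and then through the non-linear drop profile p(q), defined by the thresholds min_th and max_th and by the maximum probability p_max.]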

The first action RED performs is to calculate the average queue size, measured either in bytes or in number of packets. Measuring in bytes accounts for the different types of TCP connections (packet sizes differ considerably between HTTP and FTP connections) and reflects the queueing delay more closely. The average is computed with an Exponentially Weighted Moving Average (EWMA) filter: the parameter w_q (0 ≤ w_q ≤ 1) weights the current queue measurement against the previous average value and sets the time constant of the low-pass filter that performs the averaging. This parameter is difficult to tune: a high value makes the system more responsive, but lets transient fluctuations and bursty traffic affect the congestion estimate.

The other two main parameters in RED are the thresholds min_th and max_th, which delimit the regions of random dropping and full dropping of the incoming packets. In Figure 2.7 the RED structure is represented as a block diagram: a cascade of a low-pass filter and a non-linear gain element. Another important parameter is p_max, the dropping probability at the maximum threshold. For the general RED the curve between the thresholds is linear, and its gradient L_RED = p_max/(max_th − min_th) follows directly from the other parameters. All these parameters must be tuned in order to satisfy the performance requirements in different working conditions.
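The averaging filter and the linear drop profile described above can be sketched in a few lines. This is a simplified illustration, not the full algorithm of [12]: the class name `Red`, its method names and the parameter values in the usage lines are assumptions, and the original RED additionally spaces the marks using a packet counter, which is omitted here.

```python
import random


class Red:
    """Simplified RED: EWMA of the queue length plus a linear drop profile."""

    def __init__(self, min_th, max_th, p_max, w_q, rng=None):
        self.min_th = min_th
        self.max_th = max_th
        self.p_max = p_max
        self.w_q = w_q
        self.avg = 0.0
        self.rng = rng or random.Random(0)

    def drop_probability(self):
        """Map the average queue size to a marking probability."""
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        slope = self.p_max / (self.max_th - self.min_th)  # gradient L_RED
        return slope * (self.avg - self.min_th)

    def on_arrival(self, queue_len):
        """Update the average and decide whether to mark the arriving packet."""
        self.avg = (1.0 - self.w_q) * self.avg + self.w_q * queue_len
        return self.rng.random() < self.drop_probability()


# Hypothetical parameter values, chosen only to exercise the code.
red = Red(min_th=5, max_th=15, p_max=0.1, w_q=0.5)
for q in (10, 15, 100):
    marked = red.on_arrival(q)
    print(q, red.avg, red.drop_probability(), marked)
```

A large w_q, as in this toy run, makes the average follow the instantaneous queue closely; smaller values filter out short bursts at the price of a slower reaction.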

The difficulty of reaching common criteria for RED tuning has led researchers either to look for modifications of the main system or to avoid its use. Many proposals of modified RED have been studied and tested in simulations, with various results: some of them are Adaptive RED [13], [14], Gentle RED [15] and Blue [16].

RED can reach good performance compared to traditional end-to-end congestion control, but its response is quite slow and its steady-state behavior is oscillatory because of the two thresholds. Recently, other solutions that use control theory to model and control the system have been proposed. Previous research in this field is examined in the following section and used as a starting point for further analysis and experimentation.

2.5 Control Theoretic Designs

In recent years, thanks to the development of good mathematical models of TCP dynamics, a control theoretical approach to congestion avoidance has been proposed, and the debate on this subject is increasingly lively in the scientific community. An introduction to the subject is presented here and then further developed in Chapter 3.

2.5.1 The model of TCP dynamics

A mathematical model for the TCP behavior is proposed in [3], and simulations performed according to it showed an accurate match with TCP dynamics. The model is introduced here and a complete analysis can be found in Section 3.1. It takes into consideration the two possible retransmission causes, timeouts and triple duplicate ACKs:

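$$
\begin{aligned}
\dot{W}(t) &\approx \frac{1}{R(t)} - \bigl(1 - Q(W)\bigr)\,\frac{W(t)\,W(t-R(t))}{2R(t-R(t))}\,p(t-R(t)) + \bigl(1 - W(t)\bigr)\,Q(W)\,\frac{W(t-R(t))}{R(t-R(t))}\,p(t-R(t)) \\
\dot{q}(t) &= \sum_{i=1}^{N} \frac{W(t)}{R(t)} - C,
\end{aligned}
\tag{2.1}
$$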

where

W = expected TCP window size
q = expected queue length
R = round trip time
C = link capacity
N = number of TCP sessions
p = probability of packet dropping
t = time

The function Q(W) determines the probability that a loss is caused by a timeout (rather than by a triple duplicate ACK), given that the window size is W at the time of the loss. A simplified expression is Q(W) = min(1, 3/W).

The model is simplified in [1]: it ignores the timeout mechanism, but it is well suited to a small-signal linearization and prepares the system for a control theoretical analysis. The resulting model is:

$$
\dot{W}(t) = \frac{1}{R(t)} - \frac{W(t)\,W(t-R(t))}{2R(t-R(t))}\,p(t-R(t)) \tag{2.2}
$$

$$
\dot{q}(t) = \frac{W(t)}{R(t)}\,N(t) - C \tag{2.3}
$$
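To get a feeling for the window equation (2.2), it can be integrated numerically. The sketch below makes strong simplifying assumptions that are not part of the model in [1]: the marking probability p and the round-trip time are held constant, so the queue equation (2.3) is not simulated, and a plain forward-Euler scheme with a fixed step is used. The function name `simulate_window` and all numerical values are illustrative.

```python
from collections import deque
import math


def simulate_window(p, rtt, t_end=60.0, dt=0.001, w_init=1.0):
    """Forward-Euler integration of (2.2) with constant p and constant RTT."""
    delay_steps = max(1, round(rtt / dt))
    # history[0] always holds the window one round-trip time in the past.
    history = deque([w_init] * delay_steps, maxlen=delay_steps)
    w = w_init
    for _ in range(int(t_end / dt)):
        w_delayed = history[0]
        dw = 1.0 / rtt - w * w_delayed * p / (2.0 * rtt)
        history.append(w)   # the oldest sample falls out of the deque
        w += dw * dt
    return w


rtt = 0.246              # seconds (assumed constant)
p = 2.0 / 15.375 ** 2    # constant marking probability (assumed)
w_final = simulate_window(p, rtt)
print(w_final, math.sqrt(2.0 / p))
```

With p constant, setting the derivative in (2.2) to zero gives the equilibrium W₀ = sqrt(2/p), which is the value the simulated window settles to.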

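[Figure 2.8: Linearized system. Block diagram from δp to δq: the delay e^{-sR₀}, the TCP window dynamics (R₀C²/(2N²)) / (s + 2N/(R₀²C)) with negative sign, producing δW, and the queue dynamics (N/R₀) / (s + 1/R₀), producing δq.]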

In [1] equations (2.2) and (2.3) are linearized around the operating point (W₀, q₀, p₀), which leads to the following:

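$$
\delta\dot{W}(t) = -\frac{N}{R_0^2 C}\bigl(\delta W(t) + \delta W(t-R_0)\bigr) - \frac{R_0 C^2}{2N^2}\,\delta p(t-R_0) \tag{2.4}
$$

$$
\delta\dot{q}(t) = \frac{N}{R_0}\,\delta W(t) - \frac{1}{R_0}\,\delta q(t) \tag{2.5}
$$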

A further simplification is possible when the TCP transmission window is much larger than 1. This assumption is reasonable for typical network conditions and the time-delay that affects the window dynamic can then be ignored. The final system is represented as a block-diagram in Figure 2.8, in order to be able to show the plant of the feedback structure.

2.5.2 Feedback structure description

In a feedback control system the output is fed back to the input in order to compensate for errors. The feedback system is roughly made of two parts: the plant, which describes the system dynamics, and a compensator, which must ensure that the system is robustly stable and has a good time response to the inputs.

The AQM action can be seen as a compensator that works together with the TCP dynamic in order to stabilize it. The block diagram with the feedback network is shown in Figure 2.9.

The function P (s) is the plant transfer function and takes into consideration the queue dynamics (P_queue(s)) and the TCP behavior (P_tcp(s)). The plant P (s) is derived by taking the Laplace transform of equations (2.4) and (2.5) after the time-delay for the window size is cancelled:

$$
P(s) = \frac{\dfrac{C^2}{2N}\,e^{-sR_0}}{\left(s + \dfrac{2N}{R_0^2 C}\right)\left(s + \dfrac{1}{R_0}\right)} \tag{2.6}
$$

From the control theoretical point of view, every AQM implementation can be tested on this structure: in [1] a thorough analysis of RED as feedback block is proposed, while in [2] the performance of a Proportional controller and of a Proportional-Integral controller is tested against RED.

[Figure 2.9: Feedback block diagram. The AQM controller maps δq to δp; the plant, made of the delay e^{-sR₀} and the delay-free dynamics P'(s), maps δp to δq.]

The experiments and implementations proposed in the two above mentioned articles are recreated and the results in frequency domain as well as in time domain are reported here to create a benchmark for the next improvements. The common test settings are the following:

• C = 15 Mbps = 3750 packets/sec (packets of fixed length 500 Bytes)

• N⁻ = 60

• R⁺ = 0.246 sec,

where N⁻ is a lower bound on the number of flows and R⁺ is an upper limit of the round-trip time.
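As a worked example, the constants of the plant (2.6) can be evaluated for these settings. The sketch below assumes that the bounds N⁻ and R⁺ are used directly as the nominal values N and R₀; the operating point is obtained by setting the derivatives in (2.2) and (2.3) to zero, which gives W₀ = R₀C/N and p₀ = 2/W₀². The function name `plant_parameters` and the dictionary keys are illustrative.

```python
def plant_parameters(capacity, n_flows, rtt):
    """Operating point and plant constants of equations (2.2)-(2.6)."""
    w0 = rtt * capacity / n_flows          # from q' = 0 in (2.3)
    p0 = 2.0 / w0 ** 2                     # from W' = 0 in (2.2)
    return {
        "w0": w0,
        "p0": p0,
        "tcp_pole": 2.0 * n_flows / (rtt ** 2 * capacity),  # rad/sec
        "queue_pole": 1.0 / rtt,                            # rad/sec
        "gain": capacity ** 2 / (2.0 * n_flows),
    }


# C in packets/sec, N as number of flows, R0 in seconds.
params = plant_parameters(capacity=3750.0, n_flows=60, rtt=0.246)
for name, value in params.items():
    print(f"{name}: {value:.6g}")
```

For these values the window at the operating point is about 15.4 packets, and the two plant poles lie at roughly 0.53 rad/sec (TCP dynamics) and 4.07 rad/sec (queue dynamics).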

2.5.3 Stability analysis

The two main parameters to be checked in the frequency domain are the Phase Margin and the Gain Margin. The phase margin (PM) is defined as 180° + φ_pm, where φ_pm is the phase response at the frequency where the magnitude response is 0 dB: it can be viewed here as a measure of the amount of time-delay uncertainty the system can tolerate. The gain margin (GM) is the reciprocal of the magnitude response at the frequency where the phase response is −180°, and can be thought of as the factor by which the open-loop gain of a stable system can be multiplied before it becomes unstable. In the time domain the main parameters to be checked are the overshoot, the settling time and the rise time. No information is lost in the conversion between the frequency and time representations, so the time-domain results are usually considered a confirmation of those obtained in the frequency domain.
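The phase margin can be computed numerically from the open-loop frequency response. The following sketch uses only the Python standard library; it assumes that the open-loop magnitude decreases monotonically through 0 dB inside the search interval, and the proportional gain `K = 1e-4` in the usage lines is a hypothetical value, not one taken from [1] or [2]. The function names are illustrative.

```python
import cmath
import math


def plant(s, capacity=3750.0, n_flows=60.0, rtt=0.246):
    """Plant P(s) of equation (2.6), evaluated at the complex frequency s."""
    gain = capacity ** 2 / (2.0 * n_flows)
    tcp_pole = 2.0 * n_flows / (rtt ** 2 * capacity)
    return gain * cmath.exp(-s * rtt) / ((s + tcp_pole) * (s + 1.0 / rtt))


def gain_crossover(loop, lo=1e-4, hi=1e3, iters=100):
    """Bisection (in log frequency) for |loop(jw)| = 1, i.e. 0 dB."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if abs(loop(1j * mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def unwrapped_phase_deg(loop, w, w_start=1e-6, steps=4000):
    """Phase of loop(jw) in degrees, accumulated from w_start to avoid wrapping."""
    ratio = (w / w_start) ** (1.0 / steps)
    prev = loop(1j * w_start)
    phase = cmath.phase(prev)
    freq = w_start
    for _ in range(steps):
        freq *= ratio
        cur = loop(1j * freq)
        phase += cmath.phase(cur / prev)   # small increment, never wraps
        prev = cur
    return math.degrees(phase)


def phase_margin_deg(loop):
    """Return (crossover frequency in rad/sec, phase margin in degrees)."""
    w_c = gain_crossover(loop)
    return w_c, 180.0 + unwrapped_phase_deg(loop, w_c)


K = 1e-4  # hypothetical proportional gain
w_c, pm = phase_margin_deg(lambda s: K * plant(s))
print(f"crossover = {w_c:.3f} rad/sec, phase margin = {pm:.1f} deg")
```

Because of the delay term e^{-sR₀}, the phase keeps decreasing with frequency, so raising the loop gain (and with it the crossover frequency) always costs phase margin.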

2.5.4 Performance comparison

From the Bode diagrams in Figures 2.10, 2.11 and 2.12, the main difference among the three systems is the crossover frequency (ω_c): it is 0.05 rad/sec for the RED implementation, 1.5 rad/sec for the Proportional compensator, and back down to 0.5 rad/sec for the Proportional-Integral compensator. The crossover frequency is inversely proportional to the response time of the system, so a faster system requires ω_c to be as high as possible.

[Figure 2.10: RED: frequency and time response. Bode diagram (magnitude in dB and phase in deg versus frequency in rad/sec) with Gm = 32.835 dB (at 1.0047 rad/sec) and Pm = 88.844 deg (at 0.050222 rad/sec); step response (amplitude versus time in sec, from 0 to 100 sec).]

[Figure 2.11: Proportional compensator: frequency and time response. Bode diagram (magnitude in dB and phase in deg versus frequency in rad/sec); step response (amplitude versus time in sec, from 0 to 20 sec).]

[Figure 2.12: Proportional-Integral: frequency and time response. Bode diagram (magnitude in dB and phase in deg versus frequency in rad/sec); step response (amplitude versus time in sec, from 0 to 20 sec).]

The fastest system (the one with the lowest settling time) is the Proportional one, but since the plant has no pole at the origin, the closed loop has a steady-state error for a constant reference signal: the difference between the desired output and the closed-loop output does not converge to zero. At first sight this does not seem to be a problem, because the fast variations in the network conditions call for a fast system that responds quickly to load changes; however, in [2] the results of a worst-case simulation in NS-2 are reported, and the Proportional controller causes overload of the queue. This simulation does not seem to be conclusive for the analysis of the problem, as the Proportional controller is implemented together with the RED non-linear gain function, which, as reported above, is not easy to tune for the general case.

In order to avoid this kind of problem, a Proportional-Integral (PI) compensator is proposed in [2]: since it has a pole at the origin, it can (when used with the considered plant) drive the steady-state error to zero. The diagram of the PI feedback system is shown in Figure 2.13, emphasizing the role of the desired operating point, which is the reference value that the output should reach.
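The steady-state argument can be made explicit with the final value theorem. The following is a sketch under stated assumptions: a stable unit negative-feedback loop around the plant P(s) of equation (2.6), a unit step in the reference, a proportional controller with gain K and a PI controller of the form K(1 + 1/(T_i s)). The symbols K, T_i, G_c and L are introduced here for illustration and are not values or notation taken from [2].

```latex
% Steady-state error for a unit step, open loop L(s) = G_c(s) P(s)
e_{ss} = \lim_{s \to 0} \, s \cdot \frac{1}{1 + L(s)} \cdot \frac{1}{s}
       = \lim_{s \to 0} \frac{1}{1 + L(s)}

% DC gain of the plant (2.6): finite, because P(s) has no pole at the origin
P(0) = \frac{C^2}{2N} \cdot \frac{R_0^2 C}{2N} \cdot R_0 = \frac{(R_0 C)^3}{4 N^2}

% Proportional controller, G_c(s) = K: the error does not vanish
e_{ss}^{P} = \frac{1}{1 + K \, P(0)} \neq 0

% PI controller, G_c(s) = K (1 + 1/(T_i s)): L(s) grows without bound as s -> 0
e_{ss}^{PI} = \lim_{s \to 0} \frac{1}{1 + K \left( 1 + \frac{1}{T_i s} \right) P(s)} = 0
```

The proportional error shrinks as K grows, but K is limited by the stability margins, whereas the integrator removes the error for any stabilizing choice of K and T_i.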

The resulting system performs well, but its step response is slow. In Chapter 5 a PID (Proportional-Integral-Derivative) controller is tuned and analyzed, in order to verify the improved performance of a more complex system applied to the TCP dynamics.

[Figure 2.13: Block diagram of PI controller, with reference value q₀. The error δq = q − q₀ feeds the PI block, whose output δp is added to p₀ to give the marking probability p applied to the plant dynamics.]

Chapter 3

Model analysis

After the brief introduction, in Section 2.5, of the model used in this master's thesis, a more detailed description is given here.

The work of V. Misra, W. B. Gong, D. Towsley and C. V. Hollot from the University of Massachusetts is explained, to create the link between queueing theory and control theory.

3.1 Stochastic modelling

The additive increase - multiplicative decrease scheme of TCP is driven by losses in the network, and it is through losses that TCP tries to detect congestion.

From a network-centric point of view, indications of losses arrive at the source from the network at a certain rate (in different forms, e.g. duplicate ACKs or gaps in sequence numbers showing timeout losses). The arrival process of losses can be modelled as a Poisson process: the reasons behind this assumption are the high number of hops that a packet passes through before reaching its destination and the high number of flows at each router.

• high number of hops - at each router the packet is enqueued or dropped (marked if ECN): if the overflow process at each router is stationary and orderly, the overall congestion process is the sum of the individual processes. As the number of hops increases, Khinchine's theorem [20] states that the process approaches a Poisson behavior.

• high number of flows at router - the losses seen by each flow can be considered as a sampling of the buffer overflow process at each router, as a probabilistic thinning. As the number of flows at each router increases, the thinned process starts approaching a Poisson process according to Kallenberg's theorem [21].

With the previous statements, the loss process seen by a single flow at a router is Poisson-like, and the multiple hops that define the path make the aggregate loss process closer to a Poisson process.

The traffic is then modelled as a fluid: the increases in the window size are considered continuous and represented by dt/RTT, as they take place every round-trip time (in the absence of losses). The losses, each modelled as a Poisson stream, are of two kinds (shown in Figure 2.2): triple duplicate ACKs (TD) and timeouts (TO). A TD loss causes the window size to be halved, while after a TO the transmission window is reduced to 1 and the source remains silent for an exponentially increasing period of T₀, 2T₀, ..., 64T₀, depending on the number of successive TO losses detected.

[Figure 3.1: Evolution of window size with TD and TO losses. Window size W versus time t over one period S_i, made of a TD interval Z_i^TD (TD periods A_i1, A_i2, A_i3 ending with window sizes W_i1, W_i2, W_i3) followed by a timeout sequence Z_i^TO (silences T₀, 2T₀, 4T₀).]

The round-trip time is specific to each flow and takes the form

$$
R_i(t) = a_i + \frac{q(t)}{C}, \tag{3.1}
$$

where i identifies the flow, a_i denotes the propagation delay and q(t)/C is the queuing delay.

The packet losses of flow i are described as a Poisson process {N_i(t)} with time-varying rate λ_i(t) (the time-varying rate can model the different independent marking schemes of AQM).

Identifying the window size as W_i(t), the following equation describes its behavior:

$$
dW_i(t) = \frac{dt}{R_i(q(t))} - \frac{W_i(t)}{2}\,dN_i(t) + \bigl(1 - W_i(t)\bigr)\,dN_i(t) \tag{3.2}
$$

The first term accounts for the additive increase part of TCP, the second reflects the multiplicative decrease and the third the timeout behavior.

An example of the time evolution of a TCP congestion window is shown in Figure 3.1: Z_i^{TO} is the duration of a timeout sequence, Z_i^{TD} is the time interval between two consecutive timeout sequences, and S_i = Z_i^{TD} + Z_i^{TO}.

Let M_i be defined as the number of packets sent during period S_i. The throughput is then defined as

$$
B = \frac{E[M]}{E[S]} \tag{3.3}
$$

and it accounts for the number of sent packets. In Figure 3.1 the following variables can be defined:

• n_i: number of TD periods in interval Z_i^{TD}

• Y_ij: number of packets sent in the j-th TD period

• A_ij: duration of the j-th TD period

• X_ij: number of rounds in the j-th TD period

• W_ij: window size at the end of the j-th TD period

In order to derive E[n] it is useful to observe that during Z_i^{TD} there are n_i TD periods, where each of the first n_i − 1 ends in a TD, while the last one ends with a TO. This means that there is one TO out of n_i loss indications.

Calling Q the probability that a loss indication ending a TD period is a TO, it follows that Q = 1/E[n]. Considering the last two rounds in a TD period and knowing the packet exchange for the triple duplicate ACK, it is possible to derive an expression for Q.

Given that there is a sequence of losses in one round, the probability that the ﬁrst k packets are ACKed in a round of w packets is

$$
A(w, k) = \frac{(1-p)^k\,p}{1 - (1-p)^w} \tag{3.4}
$$

The probability, in the last round before a TO (when n packets are sent), that m packets are ACKed in sequence and the others are lost is:

$$
C(n, m) =
\begin{cases}
(1-p)^m\,p & m \le n - 1 \\
(1-p)^n & m = n
\end{cases}
\tag{3.5}
$$

Then Q(w), the probability that a loss in a window of size w is a TO, is

$$
Q(w) =
\begin{cases}
1 & \text{if } w \le 3 \\
\displaystyle\sum_{k=0}^{2} A(w, k) + \sum_{k=3}^{w} A(w, k) \sum_{m=0}^{2} C(k, m) & \text{otherwise}
\end{cases}
\tag{3.6}
$$

which shows that a TO occurs if the number of packets successfully transmitted in the penultimate round is less than 3, or if the number of packets successfully transmitted in the last round is less than 3. Assuming that the instants at which losses begin in each round are independent, and performing some algebraic manipulations, Q(w) becomes

$$
Q(w) = \min\left(1,\ \frac{\bigl(1 - (1-p)^3\bigr)\bigl(1 + (1-p)^3\,(1 - (1-p)^{w-3})\bigr)}{1 - (1-p)^w}\right) \tag{3.7}
$$

Noting that

$$
\lim_{p \to 0} Q(w) = \frac{3}{w}, \tag{3.8}
$$

the probability of a timeout loss is given by

$$
Q(w) \approx \min\left(1, \frac{3}{w}\right) \tag{3.9}
$$
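The quality of the approximation (3.9) can be checked numerically against the closed form (3.7). The sketch below is only an illustration: the function names `q_closed_form` and `q_approx` and the sample values of w and p are assumptions.

```python
def q_closed_form(w, p):
    """Timeout probability from equation (3.7)."""
    a = 1.0 - p
    numerator = (1.0 - a ** 3) * (1.0 + a ** 3 * (1.0 - a ** (w - 3)))
    denominator = 1.0 - a ** w
    return min(1.0, numerator / denominator)


def q_approx(w):
    """Small-p approximation from equation (3.9)."""
    return min(1.0, 3.0 / w)


for w in (4, 8, 16, 32):
    for p in (0.001, 0.01, 0.1):
        print(w, p, round(q_closed_form(w, p), 4), round(q_approx(w), 4))
```

For small loss probabilities the two expressions are close, while for larger p the closed form gives a higher timeout probability than 3/w.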

Formula (3.2) then becomes

$$
dW_i(t) = \frac{dt}{R_i(q(t))} - \bigl(1 - Q(W_i)\bigr)\,\frac{W_i(t)}{2}\,dN_i(t) + Q(W_i)\,\bigl(1 - W_i(t)\bigr)\,dN_i(t) \tag{3.10}
$$

The approach followed to get the probability of timeout losses led to a formulation of the throughput in (3.3), which considers a complete period between timeout losses.

As the model is used to study congestion control algorithms, it is a reasonable hypothesis that no timeout losses are experienced, especially when using the ECN mechanism (even if ECN acknowledgment packets can be lost in the network, a congestion algorithm implemented in the whole network should prevent packets from being dropped). The model used in [2] does not take timeout losses into account, in order to allow a linearization of the model, and the throughput is expressed in a single-round view, taking the form W_i(t)/R_i(t).

Ignoring the timeout term in (3.2) the TCP window size behavior becomes

$$dW_i(t) = \frac{dt}{R_i(q(t))} - \frac{W_i(t)}{2}\,dN_i(t) \qquad (3.11)$$

Taking the expected values of each side, eq. (3.11) becomes

$$E[dW_i(t)] = E\left[\frac{dt}{R_i(q(t))}\right] - \frac{E[W_i(t)\,dN_i(t)]}{2} \qquad (3.12)$$

Even if the window size and the arrivals of losses at the source are not independent, (3.12) can be approximated as follows, without losing the matching behavior of the model:

$$dE[W_i] = E\left[\frac{dt}{R_i(q)}\right] - \frac{E[W_i]\,\lambda_i(t)}{2}\,dt, \qquad (3.13)$$

where λ_i(t) represents the arrival rate in the Poisson process of the loss indications to the source.

Considering one single congested router, a loss indication arrives at the source approximately one round trip time (R_i) after a packet has been dropped (or marked) at the queue.

This approximation is necessary because the number of hops and the position of the congested router in the network are variables (the delay between the marking of a packet and the loss indication to the source can take values in the range [R_i/2, R_i] according to the position where congestion is experienced).

In AQM schemes the marking probability is proportional to the load of the queue: knowing the throughput $B(t - R_i)$ at the time $t - R_i$, the rate of loss indications that the source gets at time t is given by $p(\bar{x}(t - R_i)) \cdot B(t - R_i)$, whose expected value can be expressed as

$$\lambda_i(t) = p(\bar{x}(t - R_i))\,\frac{\bar{W}_i(t - R_i)}{R_i(\bar{q}(t - R_i))} \qquad (3.14)$$

Substituting (3.14) into (3.13) and approximating the expected value of the RTT with the RTT corresponding to the expected queue length, it becomes

$$dE[W_i] \approx \frac{dt}{R_i(\bar{q})} - \frac{\bar{W}_i\,\bar{W}_i(t - R_i)}{2R_i(\bar{q}(t - R_i))}\,p(\bar{x}(t - R_i))\,dt \qquad (3.15)$$


The approximation done on the RTT expected value is possible considering the high deterministic contribution to the RTT formulation (propagation delay) and the limited random part (queuing delay), which is more and more stabilized when the number of ﬂows increases. The diﬀerential form of (3.15) is then

$$\frac{d\bar{W}_i}{dt} = \frac{1}{R_i(\bar{q})} - \frac{\bar{W}_i\,\bar{W}_i(t - R_i)}{2R_i(\bar{q}(t - R_i))}\,p(\bar{x}(t - R_i)) \qquad (3.16)$$

and must be coupled with a differential equation for the queue length.

Lindley's equation [22] for discrete systems defines the queue behavior at time t as

$$q(t) = q(t - R_i) - 1_q(t)\,C + \sum_{i=1}^{N} W_i(t - R_i), \qquad (3.17)$$

where the indicator $1_q(t)$ takes value 1 if q(t) > 0 and 0 otherwise.

Equation (3.17) can be transformed into a differential equation

$$\frac{dq(t)}{dt} = -1_q(t)\,C + \sum_{i=1}^{N} \frac{W_i}{R_i(q)} \qquad (3.18)$$

Taking the expectation of both sides, it becomes

$$\frac{d\bar{q}(t)}{dt} = E[-1_q(t)]\,C + \sum_{i=1}^{N} \frac{\bar{W}_i}{R_i(\bar{q})}, \qquad (3.19)$$

where the same approximation on the RTT performed in (3.16) is used.

For a bottleneck router, the size of the queue is positive with probability close to 1: considering the traffic as a fluid, at each instant a packet is routed according to the service capacity of the queue. Thus, the differential behavior of the queue length becomes

$$\frac{d\bar{q}(t)}{dt} \approx -C + \sum_{i=1}^{N} \frac{\bar{W}_i}{R_i(\bar{q})} \qquad (3.20)$$

Equations (3.16) and (3.20) describe an accurate model of the dynamics of TCP.

3.2 Control theoretical analysis

Considering ﬂows with identical throughput competing for the service, the model is then described by the following coupled nonlinear diﬀerential equations:

$$\dot{W}(t) = \frac{1}{R(t)} - \frac{W(t)\,W(t - R(t))}{2R(t - R(t))}\,p(t - R(t))$$

$$\dot{q}(t) = \frac{W(t)}{R(t)}\,N(t) - C, \qquad (3.21)$$

where

W = expected TCP window size (packets)
q = expected queue length (packets)
R = round trip time (seconds)
C = link capacity (packets/second)
N = number of TCP sessions
p = probability of packet dropping/marking
t = time
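The behavior of (3.21) can be explored numerically. The sketch below integrates the two equations with a forward-Euler scheme. It rests on assumptions that are not part of the model above: the marking probability p and the number of flows N are held constant, the round trip time is written as R = q/C + T_p with a hypothetical propagation delay T_p = 0.15 s, and the queue is clamped at zero. The values N = 60 and C = 3750 pkt/s are borrowed from Experiment 1 (Section 4.2.1); all names are illustrative.

```python
def simulate_fluid_model(p, n_flows, capacity, t_prop,
                         w_init, q_init, t_end, dt=1e-3):
    """Forward-Euler integration of (3.21) with constant p and R = q/C + T_p.

    Returns the window size W and the queue length q at time t_end.
    """
    w, q = w_init, q_init
    w_hist = [w]                        # W at every past step
    r_hist = [q / capacity + t_prop]    # R at every past step
    for k in range(int(round(t_end / dt))):
        r = r_hist[k]
        # Index of the sample taken one round trip time ago; before enough
        # history exists, the initial condition is used instead.
        j = max(0, k - int(round(r / dt)))
        w_delayed, r_delayed = w_hist[j], r_hist[j]
        w_dot = 1.0 / r - w * w_delayed / (2.0 * r_delayed) * p
        q_dot = w / r * n_flows - capacity
        w = max(w + dt * w_dot, 0.0)
        q = max(q + dt * q_dot, 0.0)    # assumption: the queue cannot go negative
        w_hist.append(w)
        r_hist.append(q / capacity + t_prop)
    return w, q


w_end, q_end = simulate_fluid_model(p=0.0128, n_flows=60, capacity=3750.0,
                                    t_prop=0.15, w_init=10.0, q_init=100.0,
                                    t_end=60.0)
print(f"W = {w_end:.2f} packets, q = {q_end:.1f} packets")
```

With p held constant, setting both derivatives in (3.21) to zero gives W²p = 2 and WN/R = C, so under these assumptions the run is expected to settle near W = 12.5 packets and q = 187.5 packets.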

The dropping/marking probability p can take values in [0,1], while W and q are upper bounded by the network configuration. In order to treat the model with common control theoretical methods, a linearization of the system is necessary. The small signal linearization is performed around the operating point (W₀, q₀, p₀), with W and q as states and p as input of the system. The operating point is calculated in the steady-state condition, where a stabilized behavior ($\dot{W} = 0$ and $\dot{q} = 0$) is expected: it is defined by

$$\dot{W} = 0 \;\Rightarrow\; W_0^2\, p_0 = 2 \qquad (3.22)$$

$$\dot{q} = 0 \;\Rightarrow\; W_0 = \frac{R_0 C}{N} \qquad (3.23)$$

In order to perform the linearization, the number of ﬂows N and the round trip time R are approximated as constants along the process.
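The operating point defined by (3.22) and (3.23) is simple to compute. The following sketch does so for the settings of Experiment 1 (Section 4.2.1: N = 60, C = 3750 pkt/s, R₀ = 200 ms); the function name is illustrative.

```python
def operating_point(n_flows, capacity, rtt):
    """Return (W0, p0) from (3.23) and (3.22)."""
    w0 = rtt * capacity / n_flows   # (3.23): W0 = R0 * C / N
    p0 = 2.0 / w0**2                # (3.22): W0^2 * p0 = 2
    return w0, p0


w0, p0 = operating_point(n_flows=60, capacity=3750.0, rtt=0.2)
print(f"W0 = {w0:.2f} packets, p0 = {p0:.4f}")
```

For these settings the result is W₀ = 12.5 packets and p₀ = 0.0128.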

The temporary functions f, g are used to perform partial derivatives on ˙W and ˙q:

$$f(W, W_R, q, p) = \frac{1}{R} - \frac{W(t)\,W_R(t)}{2R}\,p(t - R)$$

$$g(W, q) = \frac{W(t)}{R}\,N - C, \qquad (3.24)$$

where $W_R(t) = W(t - R)$. Evaluating partial derivatives at the operating point defined in (3.22) and (3.23) and substituting them in

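$$\delta\dot{W}(t) = \frac{\partial f}{\partial W}\,\delta W(t) + \frac{\partial f}{\partial W_R}\,\delta W_R(t) + \frac{\partial f}{\partial q}\,\delta q(t) + \frac{\partial f}{\partial p}\,\delta p(t - R)$$

$$\delta\dot{q}(t) = \frac{\partial g}{\partial W}\,\delta W(t) + \frac{\partial g}{\partial q}\,\delta q(t), \qquad (3.25)$$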

the linearized model becomes (knowing that R is expressed as q/C + T_p):

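$$\delta\dot{W}(t) = -\frac{N}{R_0^2 C}\,\delta W(t) - \frac{N}{R_0^2 C}\,\delta W_R(t) + 0 \cdot \delta q - \frac{R_0 C^2}{2N^2}\,\delta p(t - R_0)$$

$$\delta\dot{q}(t) = \frac{N}{R_0}\,\delta W(t) - \frac{1}{R_0}\,\delta q(t), \qquad (3.26)$$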

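whose short formulation is

$$\delta\dot{W}(t) = -\frac{N}{R_0^2 C}\left(\delta W(t) + \delta W(t - R_0)\right) - \frac{R_0 C^2}{2N^2}\,\delta p(t - R_0)$$

$$\delta\dot{q}(t) = \frac{N}{R_0}\,\delta W(t) - \frac{1}{R_0}\,\delta q(t) \qquad (3.27)$$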


After the small signal linearization process the states of the model are $\delta W \doteq W - W_0$ and $\delta q \doteq q - q_0$, while the input is represented by $\delta p \doteq p - p_0$. The delayed contribution to the window size in the first equation in (3.27) can be evaluated at time t, because its value changes little over such a short interval if the window is much larger than one. This assumption is reasonable in the considered networks, and the simplified model becomes

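$$\delta\dot{W}(t) = -\frac{2N}{R_0^2 C}\,\delta W(t) - \frac{R_0 C^2}{2N^2}\,\delta p(t - R_0)$$

$$\delta\dot{q}(t) = \frac{N}{R_0}\,\delta W(t) - \frac{1}{R_0}\,\delta q(t) \qquad (3.28)$$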

In Figure 2.8 a representation of the system with blocks was presented, while the transfer function representation is given in (2.6).
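To get a feeling for the simplified model (3.28), the sketch below computes its response to a small step in the marking probability with a forward-Euler scheme. The step size δp = 0.001, the integration step and the function name are illustrative assumptions; the network parameters are those of Experiment 1 (Section 4.2.1).

```python
def linearized_step_response(n_flows, capacity, rtt, delta_p, t_end, dt=1e-3):
    """Integrate (3.28) for a step delta_p applied to the input at t = 0.

    Returns the deviations (delta_W, delta_q) at time t_end.
    """
    a = 2.0 * n_flows / (rtt**2 * capacity)      # coefficient of delta_W
    b = rtt * capacity**2 / (2.0 * n_flows**2)   # coefficient of delta_p
    delay_steps = int(round(rtt / dt))
    d_w, d_q = 0.0, 0.0
    for k in range(int(round(t_end / dt))):
        # The plant sees delta_p(t - R0): the step becomes visible after one RTT.
        u = delta_p if k >= delay_steps else 0.0
        d_w_dot = -a * d_w - b * u
        d_q_dot = (n_flows / rtt) * d_w - d_q / rtt
        d_w += dt * d_w_dot
        d_q += dt * d_q_dot
    return d_w, d_q


d_w, d_q = linearized_step_response(n_flows=60, capacity=3750.0, rtt=0.2,
                                    delta_p=0.001, t_end=30.0)
print(f"delta_W = {d_w:.3f} packets, delta_q = {d_q:.2f} packets")
```

Setting the derivatives in (3.28) to zero gives the steady-state deviations δW = −R₀³C³/(4N³)·δp and δq = N·δW, which the simulated values should approach.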

3.2.1 Stability range

The differential equations (3.28) can also be represented with a state-space formulation: this method is widely used in control theory because inputs, states and outputs are simply related by matrices. The common structure is given by

$$\dot{x}(t) = A(t)\,x(t) + B(t)\,u(t)$$

$$y(t) = C(t)\,x(t) + D(t)\,u(t), \qquad (3.29)$$

where x are the states, u the inputs and y the outputs of the system.

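The stability of a system is ensured when the eigenvalues of the matrix A have negative real parts: from (3.28) the state matrix is given by

$$A = \begin{pmatrix} -\dfrac{2N}{R_0^2 C} & 0 \\[2mm] \dfrac{N}{R_0} & -\dfrac{1}{R_0} \end{pmatrix}$$

whose eigenvalues are

$$-\frac{2N}{R_0^2 C} \quad \text{and} \quad -\frac{1}{R_0} \qquad (3.30)$$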

As all the network parameters (N , R₀, C) are positive quantities, the eigenvalues are always negative and the TCP dynamics together with the queue behavior is open-loop stable at the equilibrium around the operating points.
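The eigenvalues in (3.30) can be checked numerically. The sketch below builds the state matrix of (3.28) and solves its characteristic polynomial; the helper names are illustrative, and the parameter values are those of Experiment 1 (Section 4.2.1).

```python
import math


def state_matrix(n_flows, capacity, rtt):
    """State matrix A of the simplified model (3.28), states (delta_W, delta_q)."""
    return [[-2.0 * n_flows / (rtt**2 * capacity), 0.0],
            [n_flows / rtt, -1.0 / rtt]]


def eigenvalues_2x2(a):
    """Real eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = trace**2 - 4.0 * det
    if disc < 0.0:
        raise ValueError("complex eigenvalues (not expected for a triangular A)")
    root = math.sqrt(disc)
    return (trace + root) / 2.0, (trace - root) / 2.0


slow, fast = eigenvalues_2x2(state_matrix(n_flows=60, capacity=3750.0, rtt=0.2))
print(f"eigenvalues: {slow:.3f} 1/s and {fast:.3f} 1/s")
```

For these values the eigenvalues are −0.8 and −5, that is −2N/(R₀²C) and −1/R₀, both negative as stated above.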

The use of controllers to make the system faster and to reduce the deviation of the output can lead to instability. The analysis of the algorithms studied in this master thesis considers stability as an important issue, and theoretical results as well as limitations in the controller's use will be presented for each solution.

The analysis of stability for the controllers in this master thesis is performed in the frequency domain with phase and gain margins. Stability is guaranteed only if both margins are positive: the general idea in control theory is to have margins as large as possible, in order to prevent disturbances or changes in the system from leading to instability. As the realistic range of the network parameters is limited, the strategy behind the stability check is to create a number of worst cases against which the stability of the system is tested. The magnitude of the margins then becomes a minor issue, the positive sign of both margins being the main proof considered for a stable system.

Even if a system with low margins can easily be led into instability by small changes in the network, those changes are accounted for by the worst-case tests.

3.2.2 Time delay

The input to the plant (probability of packet dropping/marking) is delayed by one RTT, which means that a change in the value of the input at time t can influence the states and the output only at t + R₀. Disturbances or changes in the input that last less than T cannot be properly counteracted by the controller.

As the input itself is influenced by the network parameters through feedback, the fast-changing dynamics of the flows entering the router are difficult to capture.

In the frequency domain, the delay present in the system limits the achievement of cut-off frequencies ω_c higher than 1/R₀.
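The limit on the cut-off frequency can be made concrete by looking at the pure delay alone: a delay of R₀ seconds has unit gain and adds a phase lag of ωR₀ radians at the angular frequency ω. The short sketch below evaluates this lag; the function name and the value R₀ = 0.2 s (taken from Experiment 1) are illustrative assumptions.

```python
import math


def delay_phase_lag_deg(omega, rtt):
    """Phase lag in degrees of a pure delay exp(-s * rtt) at frequency omega (rad/s)."""
    return math.degrees(omega * rtt)


rtt = 0.2
for factor in (0.5, 1.0, 2.0):
    omega = factor / rtt
    lag = delay_phase_lag_deg(omega, rtt)
    print(f"omega = {omega:5.1f} rad/s -> delay phase lag = {lag:6.1f} deg")
```

At ω = 1/R₀ the delay alone removes one radian (about 57 degrees) of phase, and the loss grows linearly with frequency, so little phase margin is left for higher cut-off frequencies.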

The controllers developed in this master thesis try to reach the best possible results, but the problems described above create strong limitations to performance improvements.

3.3 Algorithms overview

The algorithms studied in this master thesis’ work use a common queueing and service structure, but they diﬀerentiate themselves in the way they calculate the probability of dropping/marking packets to prevent congestion and stabilize the queue size.

Contrary to what RED does, the control theoretical implementations use the instantaneous queue size to calculate the dropping probability: the time constant of the low pass filter (representing the Exponential Weighted Moving Average system) is a source of tuning problems in RED and it can be avoided in faster systems.

In all the presented implementations the sampling frequency is quite low (between 50 and 160 Hz), and this makes the computational effort decrease by several orders of magnitude compared to RED.

The router queue follows a FIFO (First-In First-Out) policy, with a preliminary choice of incoming packets according to the queue length (and network conditions, for adaptive algorithms). The implementations can be represented with the ﬂow chart in Figure 3.2 and the OTcl code can be found in Appendix B.


[Flow chart: a packet arrives at the router and the queue length is checked. If the queue is full, the packet is dropped. Otherwise p (calculated at every sampling instant) is compared with a random u picked from a uniform distribution: if u > p the packet is enqueued; if not, the ECT bit in the IP header is checked, and the packet is enqueued with the CE flag set if it is ECN capable, or dropped if it is not.]

Figure 3.2: Queuing scheme for incoming packets
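The decision logic of Figure 3.2 can be sketched in a few lines of Python. This is only an illustration of the scheme and not the NS-2 implementation: the function name, its arguments and the returned strings are hypothetical, and p is assumed to have been computed at the last sampling instant.

```python
import random


def admit_packet(queue_len, queue_limit, p, ecn_capable, rng=random.random):
    """Decide the fate of an arriving packet following Figure 3.2.

    Returns "enqueue", "mark" (CE flag set, packet enqueued) or "drop".
    """
    if queue_len >= queue_limit:
        return "drop"       # queue full: the packet cannot be stored
    u = rng()               # uniform random number in [0, 1)
    if u > p:
        return "enqueue"    # packet not selected by the AQM algorithm
    if ecn_capable:
        return "mark"       # ECT set in the IP header: set CE instead of dropping
    return "drop"


# Example: with p = 0.2, roughly 20 % of ECN-capable packets get marked.
random.seed(1)
outcomes = [admit_packet(10, 100, 0.2, True) for _ in range(10_000)]
print(outcomes.count("mark") / len(outcomes))
```

Passing the random source as an argument keeps the decision testable with a fixed value of u.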


Chapter 4

NS-2 and simulation settings

The mathematical model of the TCP dynamics is an approximation of the real behavior, and the control theoretical analysis of the AQM algorithms is a proper first step: a simulation program is needed to add a random contribution to the model and fully capture the network behavior. Only after the analytical results have been validated with a complex simulator can an algorithm be judged as performing well or not.

4.1 NS-2

In order to test all the designs that are studied during this master thesis work, they are implemented into the discrete event simulator NS-2.

NS-2 is an open-source simulator developed and constantly updated through contributions of researchers and students. Version 2.27, which is used for this project, contains various AQM implementations, including the Proportional-Integral controller. Thanks to the common structure and other similar points, the PI implementation will be used as the starting point to write the code for PID and the other controllers. Once the simulator is installed on a Linux emulator (Cygwin) and the verification test has been run, each new queue implementation must be recognized by the program: this means that first the implementation in C++ is added to the Queue directory along with the working variables, and then the simulator is taught where to look for the new implementation. All the steps needed to perform these operations will be described in Appendix A.

4.1.1 Which language?

NS-2 is written in C++, and the objects described in C++ are called from OTCL (Object-oriented TCL). This is the language used to describe the topology of the network, the type of transmission, routing options, and everything regarding the simulation (timers, protocols). In order to perform the tests on the implementations, both languages are used: with OTCL the basic topology (nodes and links) is described, the flows with their relative delay times are set and the queue policy is decided. C++ is used to implement the AQM algorithms, using the signals which come from the network as inputs (in this case the queue length at sampling times).


[Diagram: sources 1 to n are connected through a single router and its buffer to destinations 1 to n.]

Figure 4.1: The general topology of the simulations

4.2 Description of the experiments

Analyzing the simulations used in [1] and [2], a group of 7 experiments has been designed.

A broad range of network conditions is represented in order to perform a good comparison among the existing algorithms and the ones developed during the master thesis work. In the following a description of the topologies and systems is presented, while the results of the experiments for each design can be found in Sections 5.4, 6.6, 6.9, 7.6 and ??.

In all 7 experiments, the TCP version used is "New Reno" and the ECN marking protocol is used. In the following sections a description of the settings for each experiment is given, while the code used in NS-2 can be found in Appendix B.

4.2.1 Experiment 1

The ﬁrst experiment is performed around the working conditions used to linearize the system in [1]. Except for some simpliﬁcations made during the linearization process, this simulated environment should correspond to the average one used in Matlab to test the theoretical designs, which the time and frequency-domain graphs come from.

The settings are:

• Number of flows (N): 60 FTP flows (the average size of the FTP packets can be omitted, as the PID controller does not follow a packet-size policy)

• Fixed Round-Trip Time: 200ms (with queue length at the reference point)

• Capacity of the router: 3750 pkt/sec (corresponding to 15Mb/s for an average packet size of 500 Bytes)
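The capacity figure in the list above follows directly from the link rate and the average packet size, as the following small sketch shows (the function name is illustrative):

```python
def capacity_in_packets(rate_bits_per_s, packet_size_bytes):
    """Convert a link rate in bit/s into packets per second."""
    return rate_bits_per_s / (8 * packet_size_bytes)


# 15 Mb/s with an average packet size of 500 bytes
print(capacity_in_packets(15e6, 500), "pkt/s")
```

For 15 Mb/s and 500-byte packets this gives the 3750 pkt/s used in Experiment 1.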

4.2.2 Experiment 2

Using the same topology as Experiment 1, this experiment differs in the variable RTT of the packets and in the blocking of some flows between 100 and 120 seconds of the simulation: this is done in order to test the response of the system to sudden changes in the network. One of the aims of AQM algorithms is to keep the routing queue stable, whatever happens in the network: a block of flows is a good "shock" for the router. Here is the list of features of the experiment:

• N: 60 FTP ﬂows

• RTT uniformly varying between 130 ms and 220 ms

• Block of 20 ﬂows (1/3 of the total number) between 100 and 120s.

4.2.3 Experiment 3

This experiment uses the same topology as Experiment 1, with a reduced number of flows: it was observed in the theoretical analysis that a small number of flows (as well as a high round-trip time) leads towards instability. The behavior of the system under these conditions must be taken into consideration, in order to avoid an unresponsive queue length.

• number of ﬂows N: 30 FTP ﬂows

• RTT: 200 ms

4.2.4 Experiment 4

It is also important to check the behavior of the controller when the number of flows grows considerably, i.e. in the situation where congestion control should be more effective, managing many connections at the same time. All the parameters are kept fixed as in Experiments 1 and 3, but

• N: 400 FTP ﬂows

Periodical behavior and number of ﬂows

After having introduced the first experiments, which focus on the number of flows as the main parameter, some basic considerations can be added: the router gets packets from different sources which try to transmit at the same time, and enqueues the packets in its buffer.

In the case of standard TCP, the buffer is filled until it is full and then packets are discarded, causing the TCP protocol to wait for the timeout (or a triple duplicate ACK) and for a new transmission of that packet. As the buffer is full for most of the packets which arrive in the time range when congestion happens, many of the sources will behave in the same way, waiting for a signal from the other end and then reducing the transmission window. This creates a periodical behavior of the sources and of the queue length, with very bad performance.

Even ECN cannot solve this problem, as the transmitters are warned about possible congestion all together and react in the same way: reducing the transmission window.

With the use of an AQM algorithm the problem of synchronization of ﬂows is solved thanks to the probability of dropping packets (or marking, in case of ECN) being distributed
