Ethernet flow control

Mar. 07, 2024

Technique to suspend transmission to avoid congestion

Wireshark screenshot of an Ethernet pause frame

Ethernet flow control is a mechanism for temporarily stopping the transmission of data on Ethernet family computer networks. The goal of this mechanism is to avoid packet loss in the presence of network congestion.

The first flow control mechanism, the pause frame, was defined by the IEEE 802.3x standard. The follow-on priority-based flow control, defined in the IEEE 802.1Qbb standard, provides a link-level flow control mechanism that can be controlled independently for each class of service (CoS), as defined by IEEE P802.1p. It is applicable to data center bridging (DCB) networks and allows voice over IP (VoIP), video over IP, and database synchronization traffic to be prioritized over default data traffic and bulk file transfers.

Description

A sending station (computer or network switch) may be transmitting data faster than the other end of the link can accept it. Using flow control, the receiving station can signal the sender requesting suspension of transmissions until the receiver catches up. Flow control on Ethernet can be implemented at the data link layer.

The first flow control mechanism, the pause frame, was defined by the Institute of Electrical and Electronics Engineers (IEEE) task force that defined full duplex Ethernet link segments. The IEEE standard 802.3x was issued in 1997.[1]

Pause frame

An overwhelmed network node can send a pause frame, which halts the transmission of the sender for a specified period of time. A media access control (MAC) frame (EtherType 0x8808) is used to carry the pause command, with the Control opcode set to 0x0001 (hexadecimal).[1] Only stations configured for full-duplex operation may send pause frames. When a station wishes to pause the other end of a link, it sends a pause frame to either the unique 48-bit destination address of this link or to the 48-bit reserved multicast address of 01-80-C2-00-00-01.[2]: Annex 31B.3.3 The use of a well-known address makes it unnecessary for a station to discover and store the address of the station at the other end of the link.

Another advantage of using this multicast address arises from the use of flow control between network switches. The particular multicast address used is selected from a range of addresses reserved by the IEEE 802.1D standard, which specifies the operation of switches used for bridging. Normally, a frame with a multicast destination sent to a switch is forwarded out to all other ports of the switch. However, this range of multicast addresses is special and is not forwarded by an 802.1D-compliant switch. Instead, frames sent to this range are understood to be frames meant to be acted upon only within the switch.

A pause frame carries the requested pause time as a two-byte (16-bit) unsigned integer (0 through 65535). The pause time is measured in units of pause quanta, where each quantum is equal to 512 bit times.
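The frame layout and the quanta arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `build_pause_frame` and `pause_seconds` are hypothetical, the frame check sequence is assumed to be appended by the hardware, and the 42 bytes of zero padding are assumed to bring the frame up to the 60-byte minimum (excluding the FCS).

```python
import struct

PAUSE_DST = bytes.fromhex("0180c2000001")  # reserved multicast address 01-80-C2-00-00-01
MAC_CONTROL_ETHERTYPE = 0x8808             # EtherType for MAC control frames
PAUSE_OPCODE = 0x0001                      # MAC control opcode for pause
QUANTUM_BITS = 512                         # one pause quantum = 512 bit times


def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Build a pause frame without the trailing FCS (hypothetical helper)."""
    if len(src_mac) != 6:
        raise ValueError("source MAC address must be 6 bytes")
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("pause time must fit in 16 bits")
    header = PAUSE_DST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, quanta)
    padding = bytes(42)  # assumption: zero padding up to the 60-byte minimum
    return header + payload + padding


def pause_seconds(quanta: int, link_bps: float) -> float:
    """Convert a pause time in quanta to seconds for a given link speed."""
    return quanta * QUANTUM_BITS / link_bps


if __name__ == "__main__":
    frame = build_pause_frame(bytes.fromhex("020000000001"), 0xFFFF)
    print(len(frame), frame[12:18].hex())
    print(pause_seconds(0xFFFF, 1e9))  # longest pause on a 1 Gbit/s link
```

Because a quantum is defined in bit times, the same 16-bit value pauses a faster link for a proportionally shorter wall-clock interval.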

By 1999, several vendors supported receiving pause frames, but fewer implemented sending them.[3][4]

Issues

One original motivation for the pause frame was to handle network interface controllers (NICs) that did not have enough buffering to handle full-speed reception. This problem has become less common with advances in bus speeds and memory sizes. A more likely scenario is network congestion within a switch. For example, a flow can enter a switch on a higher-speed link than the one it leaves on, or several flows can arrive over two or more links whose combined rate exceeds an output link's bandwidth. Either case will eventually exhaust any amount of buffering in the switch. However, blocking the sending link delays all flows over that link, even those that are not causing any congestion. This situation is a case of head-of-line (HOL) blocking, and can happen more often in core network switches due to the large numbers of flows generally being aggregated. Many switches use a technique called virtual output queues to eliminate HOL blocking internally, and so never send pause frames.[4]

Subsequent efforts

Congestion management

Another effort began in March 2004, and in May 2004 it became the IEEE P802.3ar Congestion Management Task Force. In May 2006 the objectives of the task force were revised to specify a mechanism to limit the transmitted data rate at about 1% granularity. The request was withdrawn and the task force was disbanded in 2008.[5]

Priority flow control

Ethernet flow control disturbs the Ethernet class of service (defined in IEEE 802.1p), as the data of all priorities are stopped to clear the existing buffers which might also consist of low priority data. As a remedy to this problem, Cisco Systems defined their own priority flow control extension to the standard protocol. This mechanism uses 14 bytes of the 42-byte padding in a regular pause frame. The MAC control opcode for a Priority pause frame is 0x0101. Unlike the original pause, Priority pause indicates the pause time in quanta for each of eight priority classes separately.[6] The extension was subsequently standardized by the Priority-based Flow Control (PFC) project authorized on March 27, 2008, as IEEE 802.1Qbb.[7] Draft 2.3 was proposed on June 7, 2010. Claudio DeSanti of Cisco was editor.[8] The effort was part of the data center bridging task group, which developed Fibre Channel over Ethernet.[9]

Data transmission rate management

In data communications, flow control is the process of managing the rate of data transmission between two nodes to prevent a fast sender from overwhelming a slow receiver. Flow control should be distinguished from congestion control, which is used for controlling the flow of data when congestion has actually occurred.[1] Flow control mechanisms can be classified by whether or not the receiving node sends feedback to the sending node.

Flow control is important because a sending computer can transmit information faster than the destination computer can receive and process it. This can happen if the receiving computer has a heavy traffic load in comparison to the sending computer, or if the receiving computer has less processing power than the sending computer.

Stop-and-wait flow control is the simplest form of flow control. In this method the message is broken into multiple frames, and the receiver indicates its readiness to receive a frame of data. The sender waits for a receipt acknowledgement (ACK) after every frame for a specified time (called a time out). The receiver sends the ACK to let the sender know that the frame of data was received correctly. The sender will then send the next frame only after the ACK.

Operations

1. Sender: transmits a single frame at a time.
2. Sender: waits to receive an ACK within the time-out.
3. Receiver: transmits an acknowledgement (ACK) as it receives a frame.
4. Go to step 1 when the ACK is received or the time-out is hit.

If a frame or ACK is lost during transmission then the frame is re-transmitted. This re-transmission process is known as ARQ (automatic repeat request).
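The send, wait, and retransmit cycle can be sketched as a small simulation. This is an illustrative sketch under stated assumptions: the channel loses frames and ACKs independently with probability `loss_prob`, a lost frame or ACK is modeled as a time-out, and a one-bit alternating sequence number (not mentioned in the description above) is assumed so that the receiver can discard a duplicate after a lost ACK. The function name `stop_and_wait` is hypothetical.

```python
import random


def stop_and_wait(frames, loss_prob=0.2, max_tries=50, seed=0):
    """Simulate stop-and-wait ARQ; return (delivered frames, transmissions)."""
    rng = random.Random(seed)
    delivered = []
    transmissions = 0
    expected = 0  # receiver side: sequence bit of the next new frame

    for index, payload in enumerate(frames):
        seq = index % 2  # assumption: alternating one-bit sequence number
        for _ in range(max_tries):
            transmissions += 1
            if rng.random() < loss_prob:
                continue  # frame lost: the sender times out and resends
            if seq == expected:
                delivered.append(payload)  # new frame: accept it
                expected ^= 1
            # The receiver ACKs every frame that arrives, including duplicates.
            if rng.random() < loss_prob:
                continue  # ACK lost: the sender times out and resends
            break  # ACK received: move on to the next frame
        else:
            raise RuntimeError("gave up after too many retransmissions")
    return delivered, transmissions


if __name__ == "__main__":
    data, sent = stop_and_wait(["a", "b", "c", "d"], loss_prob=0.3)
    print(data, sent)
```

Every frame is delivered exactly once and in order, at the cost of one full round trip (or one time-out) per transmission.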

The problem with Stop-and-wait is that only one frame can be transmitted at a time, and that often leads to inefficient transmission, because until the sender receives the ACK it cannot transmit any new packet. During this time both the sender and the channel are unutilised.

Pros and cons of stop and wait

Pros

The only advantage of this method of flow control is its simplicity.

Cons

The sender needs to wait for the ACK after every frame it transmits. This is a source of inefficiency, and is particularly bad when the propagation delay is much longer than the transmission delay.[2]

Stop-and-wait can also create inefficiencies when sending longer transmissions.[3] Longer transmissions are more likely to contain an error under this protocol, whereas errors in short messages are more likely to be detected early. Breaking a single message into separate frames adds further inefficiency because it makes the transmission longer.[4]

Sliding window

A method of flow control in which a receiver gives a transmitter permission to transmit data until a window is full. When the window is full, the transmitter must stop transmitting until the receiver advertises a larger window.[5]

Sliding-window flow control is best utilized when the buffer size is limited and pre-established. During a typical communication between a sender and a receiver the receiver allocates buffer space for n frames (n is the buffer size in frames). The sender can send and the receiver can accept n frames without having to wait for an acknowledgement. A sequence number is assigned to frames in order to help keep track of those frames which did receive an acknowledgement. The receiver acknowledges a frame by sending an acknowledgement that includes the sequence number of the next frame expected. This acknowledgement announces that the receiver is ready to receive n frames, beginning with the number specified. Both the sender and receiver maintain what is called a window. The size of the window is less than or equal to the buffer size.

Sliding window flow control has far better performance than stop-and-wait flow control. For example, in a wireless environment where data rates are low and the noise level is very high, waiting for an acknowledgement for every packet transferred is not practical. Transferring data in bulk therefore yields higher throughput.

Sliding window flow control is a point to point protocol assuming that no other entity tries to communicate until the current data transfer is complete. The window maintained by the sender indicates which frames it can send. The sender sends all the frames in the window and waits for an acknowledgement (as opposed to acknowledging after every frame). The sender then shifts the window to the corresponding sequence number, thus indicating that frames within the window starting from the current sequence number can be sent.
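The sender-side bookkeeping described above can be sketched as follows. This is a minimal illustration with assumptions labeled: the class name `SlidingWindowSender` and its methods are hypothetical, sequence numbers are plain list indices that never wrap around, and loss and retransmission are left out so that only the window movement is shown.

```python
class SlidingWindowSender:
    """Track which frames a sender may transmit under a fixed-size window."""

    def __init__(self, frames, window_size):
        if window_size < 1:
            raise ValueError("window size must be at least 1")
        self.frames = list(frames)
        self.window_size = window_size
        self.base = 0      # oldest frame not yet acknowledged
        self.next_seq = 0  # next frame that has not been sent yet

    def sendable(self):
        """Send everything the window allows; return the sequence numbers sent."""
        limit = min(self.base + self.window_size, len(self.frames))
        sent = list(range(self.next_seq, limit))
        self.next_seq = max(self.next_seq, limit)
        return sent

    def on_ack(self, next_expected):
        """Handle an ACK carrying the sequence number of the next frame expected."""
        if self.base < next_expected <= self.next_seq:
            self.base = next_expected  # slide the window forward

    def done(self):
        return self.base == len(self.frames)


if __name__ == "__main__":
    sender = SlidingWindowSender("abcdefg", window_size=3)
    print(sender.sendable())  # first window
    sender.on_ack(2)          # receiver expects frame 2 next
    print(sender.sendable())  # window slid forward by two frames
```

A real protocol would use a bounded sequence-number space and retransmit on time-out; both are omitted here for brevity.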

Go back N

An automatic repeat request (ARQ) algorithm, used for error correction, in which a negative acknowledgement (NACK) causes retransmission of the word in error as well as the next N–1 words. The value of N is usually chosen such that the time taken to transmit the N words is less than the round trip delay from transmitter to receiver and back again. Therefore, a buffer is not needed at the receiver.

The normalized propagation delay is $a = T_p / T_t$, where the propagation time is $T_p = L / V$ (length $L$ over propagation velocity $V$) and the transmission time is $T_t = r / F$ (bitrate $r$ over framerate $F$), so that $a = \frac{LF}{Vr}$.

To get the utilization you must define a window size ($N$). If $N \geq 2a + 1$, the utilization of the transmission channel is 1 (full utilization). If $N < 2a + 1$, the utilization is $\frac{N}{1 + 2a}$.[6]
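The two cases of the utilization rule can be captured in a single function. This is a small sketch; the name `window_utilization` is hypothetical, and `a` is taken as an already computed normalized propagation delay.

```python
def window_utilization(n: int, a: float) -> float:
    """Channel utilization for window size n and normalized propagation delay a."""
    if n < 1 or a < 0:
        raise ValueError("need n >= 1 and a >= 0")
    if n >= 2 * a + 1:
        return 1.0  # the window covers a full round trip: the channel stays busy
    return n / (1 + 2 * a)


if __name__ == "__main__":
    for n in (1, 4, 7):
        print(n, window_utilization(n, a=3.0))
```

With $N = 1$ the expression reduces to $\frac{1}{1 + 2a}$, the stop-and-wait case.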

Selective repeat

Selective repeat is a connection oriented protocol in which both transmitter and receiver have a window of sequence numbers. The protocol has a maximum number of messages that can be sent without acknowledgement. If this window becomes full, the protocol is blocked until an acknowledgement is received for the earliest outstanding message. At this point the transmitter is clear to send more messages.[7]

Comparison

This section compares stop-and-wait with sliding window and its variants, go back N and selective repeat.

Stop-and-wait

Error free: $\frac{1}{2a+1}$.[citation needed]

With errors: $\frac{1-P}{2a+1}$.[citation needed]

Selective repeat

We define throughput $T$ as the average number of blocks communicated per transmitted block. It is more convenient to calculate the average number of transmissions necessary to communicate a block, a quantity we denote by $b$, and then to determine $T$ from the equation $T = \frac{1}{b}$.[citation needed]

Transmit flow control

Transmit flow control may occur:

between data terminal equipment (DTE) and a switching center, via data circuit-terminating equipment (DCE), the opposite types interconnected straightforwardly,
or between two devices of the same type (two DTEs, or two DCEs), interconnected by a crossover cable.

The transmission rate may be controlled because of network or DTE requirements. Transmit flow control can occur independently in the two directions of data transfer, thus permitting the transfer rates in one direction to be different from the transfer rates in the other direction. Transmit flow control can be

either stop-and-wait,
or use a sliding window.

Flow control can be performed

either by control signal lines in a data communication interface (see serial port and RS-232),
or by reserving in-band control characters to signal flow start and stop (such as the ASCII codes for XON/XOFF).

Hardware flow control

In common RS-232 there are pairs of control lines which are usually referred to as hardware flow control:

RTS (request to send) and CTS (clear to send), used in RTS flow control
DTR (data terminal ready) and DSR (data set ready), used in DTR flow control

Hardware flow control is typically handled by the DTE or "master end", as it is first raising or asserting its line to command the other side:

In the case of RTS flow control, the DTE sets its RTS, which signals the opposite end (the slave end, such as a DCE) to begin monitoring its data input line. When ready for data, the slave end raises its complementary line, CTS in this example, which signals the master to start sending data and to begin monitoring the slave's data output line. If either end needs to stop the data, it lowers its respective "data readiness" line.
For PC-to-modem and similar links, in the case of DTR flow control, DTR/DSR are raised for the entire modem session (say a dialup internet call where DTR is raised to signal the modem to dial, and DSR is raised by the modem when the connection is complete), and RTS/CTS are raised for each block of data.

An example of hardware flow control is a half-duplex radio modem to computer interface. In this case, the controlling software in the modem and computer may be written to give priority to incoming radio signals such that outgoing data from the computer is paused by lowering CTS if the modem detects a reception.

Polarity:
- RS-232 level signals are inverted by the driver ICs, so line polarity is TxD-, RxD-, CTS+, RTS+ (clear to send when HI, data 1 is a LO)
- for microprocessor pins the signals are TxD+, RxD+, CTS-, RTS- (clear to send when LO, data 1 is a HI)

Software flow control

Conversely, XON/XOFF is usually referred to as software flow control.
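The in-band mechanism can be sketched from the sender's point of view. This is an illustrative sketch only: the class `XonXoffSender` is hypothetical and no real serial port is involved. XON and XOFF are the ASCII control characters DC1 (0x11) and DC3 (0x13).

```python
from collections import deque

XON = 0x11   # ASCII DC1: resume transmission
XOFF = 0x13  # ASCII DC3: suspend transmission


class XonXoffSender:
    """Sender that pauses and resumes on in-band control characters."""

    def __init__(self, data: bytes):
        self.pending = deque(data)
        self.paused = False

    def on_receive(self, byte: int) -> None:
        """Process one byte arriving from the peer."""
        if byte == XOFF:
            self.paused = True
        elif byte == XON:
            self.paused = False

    def next_byte(self):
        """Return the next byte to transmit, or None if paused or finished."""
        if self.paused or not self.pending:
            return None
        return self.pending.popleft()


if __name__ == "__main__":
    sender = XonXoffSender(b"hi")
    print(sender.next_byte())  # first byte goes out
    sender.on_receive(XOFF)
    print(sender.next_byte())  # None while paused
    sender.on_receive(XON)
    print(sender.next_byte())  # transmission resumes
```

Because the control characters share the data channel, binary payloads that contain 0x11 or 0x13 need escaping or a different flow control method.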

Open-loop flow control

The open-loop flow control mechanism is characterized by having no feedback between the receiver and the transmitter. This simple means of control is widely used. The allocation of resources must be a "prior reservation" or "hop-to-hop" type.

Open-loop flow control has inherent problems with maximizing the utilization of network resources. Resource allocation is made at connection setup using a CAC (connection admission control) and this allocation is made using information that is already "old news" during the lifetime of the connection. Often there is an over-allocation of resources and reserved but unused capacities are wasted. Open-loop flow control is used by ATM in its CBR, VBR and UBR services (see traffic contract and congestion control).[1]

Open-loop flow control incorporates two control elements: a controller and a regulator. The regulator is able to alter the input variable in response to the signal from the controller. An open-loop system has no feedback or feed-forward mechanism, so the input and output signals are not directly related and there is increased traffic variability. Such a system also has a lower arrival rate and a higher loss rate. In an open-loop control system, the controllers can operate the regulators at regular intervals, but there is no assurance that the output variable can be maintained at the desired level. While this model may be cheaper to use, it can be unstable.

Closed-loop flow control

The closed-loop flow control mechanism is characterized by the ability of the network to report pending network congestion back to the transmitter. This information is then used by the transmitter in various ways to adapt its activity to existing network conditions. Closed-loop flow control is used by ABR (see traffic contract and congestion control).[1] Transmit flow control described above is a form of closed-loop flow control.

This system incorporates all the basic control elements, such as, the sensor, transmitter, controller and the regulator. The sensor is used to capture a process variable. The process variable is sent to a transmitter which translates the variable to the controller. The controller examines the information with respect to a desired value and initiates a correction action if required. The controller then communicates to the regulator what action is needed to ensure that the output variable value is matching the desired value. Therefore, there is a high degree of assurance that the output variable can be maintained at the desired level. The closed-loop control system can be a feedback or a feed forward system:

A feedback closed-loop system has a feed-back mechanism that directly relates the input and output signals. The feed-back mechanism monitors the output variable and determines if additional correction is required. The output variable value that is fed backward is used to initiate that corrective action on a regulator. Most control loops in the industry are of the feedback type.

In a feed-forward closed loop system, the measured process variable is an input variable. The measured signal is then used in the same fashion as in a feedback system.

The closed-loop model produces a lower loss rate and shorter queuing delays, and it results in congestion-responsive traffic. The closed-loop model is always stable, as the number of active flows is bounded.
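A feedback loop of this kind can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not a model of any particular protocol: the receiver's queue length plays the role of the measured output variable, the sender applies a simple proportional correction, and the function name `simulate_feedback` and the `gain` parameter are hypothetical.

```python
def simulate_feedback(steps, service_rate, target_queue, gain=0.5):
    """Return the queue length after each step of a proportional feedback loop."""
    queue = 0.0
    rate = 0.0  # the sender starts silent
    history = []
    for _ in range(steps):
        # Sensor: the queue grows by what arrives minus what is served.
        queue = max(0.0, queue + rate - service_rate)
        # Controller: compare the measured queue with the desired value.
        error = target_queue - queue
        # Regulator: adjust the sending rate for the next step.
        rate = max(0.0, service_rate + gain * error)
        history.append(queue)
    return history


if __name__ == "__main__":
    trace = simulate_feedback(steps=10, service_rate=5.0, target_queue=10.0)
    print([round(q, 2) for q in trace])
```

In this toy model each step removes a fixed fraction of the remaining error, so the queue settles at the target. An open-loop sender using a fixed rate would have no such correction.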
