
Optimizing Intel Data Direct I/O Technology for Multi-hundred-gigabit Networks

Alireza Farshin+, Amir Roozbeh+*, Gerald Q. Maguire Jr.+, Dejan Kostic+

+ KTH Royal Institute of Technology (EECS)    * Ericsson Research

farshin@kth.se, amirrsk@kth.se, maguire@kth.se, dmk@kth.se

Work supported by SSF, WASP (Wallenberg AI, Autonomous Systems and Software Program), and ERC.

2 DDIO Can Become a Bottleneck

Faster link speeds cause DDIO to fail to provide the expected benefits, as new incoming packets can repeatedly evict previously received packets (i.e., both not-yet-processed and already-processed packets) from the LLC. The probability of eviction is high with:

• A high number of receive (RX) descriptors
• A high load-imbalance factor
• A receiving rate of 100 Gbps or more
• An I/O-intensive application
• Packet sizes of 512 B or more
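As a back-of-the-envelope illustration of why eviction becomes likely, the following sketch estimates how quickly line-rate traffic overwrites DDIO's share of the LLC. All cache sizes are assumptions for a hypothetical 11-way, 24.75 MiB LLC, not values from this poster:

```python
# Rough sketch: how long until line-rate traffic overwrites DDIO's share
# of the LLC? Sizes below are assumptions for a hypothetical Skylake-like
# server, not measurements from the poster.

LLC_BYTES = 24.75 * 2**20   # total LLC capacity (assumed)
LLC_WAYS = 11               # set-associativity (assumed)
DDIO_WAYS = 2               # default DDIO allocation ways

def overwrite_time_us(rate_gbps: float) -> float:
    """Microseconds until DDIO's LLC share is fully overwritten at the
    given receive rate, assuming no packet is consumed in the meantime."""
    ddio_bytes = LLC_BYTES * DDIO_WAYS / LLC_WAYS
    bytes_per_us = rate_gbps * 1e9 / 8 / 1e6
    return ddio_bytes / bytes_per_us

print(f"{overwrite_time_us(100):.0f} us at 100 Gbps")
print(f"{overwrite_time_us(200):.0f} us at 200 Gbps")
```

Under these assumptions the 2-way DDIO region is recycled in a few hundred microseconds, so any packet not processed within that window has likely been evicted to DRAM.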

4 How to Fine-tune DDIO

A little-discussed register called "IIO LLC WAYS" can be used to tune the capacity of DDIO. Fine-tuning DDIO enables us to process packets with a larger number of RX descriptors while providing the same or better performance. More RX descriptors are needed for 100-Gbps networks, as the additional descriptors reduce the latency incurred by packet loss and PAUSE frames.
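To see why larger rings stress DDIO, one can compare the worst-case in-flight buffer footprint of an RX ring against DDIO's default LLC share. The 2 KiB buffer size and 4.5 MiB share below are illustrative assumptions, not values from this poster:

```python
# Sketch (assumed sizes): memory an RX descriptor ring can keep in
# flight, compared with DDIO's default LLC share.

MBUF_BYTES = 2048               # typical DPDK-style packet buffer (assumed)
DDIO_SHARE_BYTES = 4.5 * 2**20  # e.g., 2 of 11 ways of a 24.75 MiB LLC

def ring_footprint_bytes(n_descriptors: int) -> int:
    """Worst-case bytes the NIC may DMA before software frees buffers."""
    return n_descriptors * MBUF_BYTES

for n in (512, 1024, 2048, 4096):
    ratio = ring_footprint_bytes(n) / DDIO_SHARE_BYTES
    print(f"{n} descriptors -> {ratio:.2f}x of DDIO's LLC share")
```

With these numbers, a 4096-entry ring can keep roughly 8 MiB in flight, well beyond the default 2-way DDIO region, which is why more RX descriptors call for more DDIO ways.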

1 What is DDIO?

Data Direct I/O Technology (DDIO) transfers packets directly to the Last Level Cache (LLC) rather than to main memory. DDIO updates a cache line if it is already present in the LLC; otherwise, it allocates the cache line in a limited portion of the LLC (i.e., 2 ways of an n-way set-associative cache). DDIO was introduced to improve the performance of I/O applications by mitigating expensive DRAM accesses.

[Figure: Sending/receiving packets via DDIO: the I/O device DMAs packets into the logical LLC of the CPU socket, whereas traditional DMA loads the packets from main memory.]

[Figure: The IIO LLC WAYS register is a bitmask over the LLC ways; the default value has only 2 set bits (11000000000).]
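Since the register is a per-way bitmask, a small helper can interpret or sanity-check a candidate mask before it is written with privileged MSR tools. The 11-way width is an assumption for a Skylake-like LLC, and the MSR address is model-specific, so it is omitted here:

```python
def ddio_ways(bitmask: int, total_ways: int = 11) -> int:
    """Count how many LLC ways a given IIO LLC WAYS bitmask grants to
    DDIO (one bit per way; the width is assumed to match the LLC)."""
    if bitmask <= 0 or bitmask >> total_ways:
        raise ValueError("bitmask must set between 1 and total_ways bits")
    return bin(bitmask).count("1")

print(ddio_ways(0b11000000000))   # default mask: 2 ways
print(ddio_ways(0b11111111000))   # a fine-tuned 8-way mask
```

On real hardware the mask would be read and written with privileged MSR access (e.g., msr-tools); growing it from 2 toward 8 ways is what the latency experiment below varies.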

6 Conclusion

There is no one-size-fits-all approach to utilizing DDIO. Therefore, it is important to optimize DDIO based on the characteristics of applications and their workloads, especially for multi-hundred-gigabit networks.

[Figure: 99th-percentile latency (µs) vs. number of RX descriptors (512, 1024, 2048, 4096) with 2, 4, 6, and 8 DDIO ways; fine-tuning yields lower tail latency with a larger number of RX descriptors.]

5 Toward 200 Gbps

Problem: DDIO can degrade performance at faster link speeds due to the higher cache-injection rate.

Approach: the LLC could be bypassed for low-priority or DDIO-insensitive applications, thus making room for the high-priority or highly DDIO-sensitive applications. Bypassing could be done via:

• Disabling DDIO for a specific I/O device, or
• Exploiting a remote processor's socket to DMA data
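The bypassing approach can be sketched as a simple placement policy. The Flow type and the returned placement strings are hypothetical illustrations, not an interface from this work:

```python
# Hypothetical placement policy: DDIO-sensitive, high-priority flows keep
# the local LLC; low-priority or DDIO-insensitive flows bypass it, either
# by disabling DDIO on their NIC or by DMAing to the remote socket.

from dataclasses import dataclass

@dataclass
class Flow:
    name: str
    high_priority: bool
    ddio_sensitive: bool

def placement(flow: Flow) -> str:
    """Return where this flow's packets should be DMAed (sketch)."""
    if flow.high_priority and flow.ddio_sensitive:
        return "local LLC via DDIO"
    # Bypassing keeps LLC room for the sensitive flows.
    return "bypass LLC (disable DDIO on device or DMA to remote socket)"

lb = Flow("load balancer", high_priority=True, ddio_sensitive=True)
log = Flow("bulk logging", high_priority=False, ddio_sensitive=False)
print(placement(lb))
print(placement(log))
```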

[Figure: 99th-percentile latency (µs) vs. forwarding rate at 100 Gbps and 200 Gbps, with a 30% difference annotated.]

Moreover, the performance of DDIO only matters when an application is I/O-bound rather than CPU- or memory-bound.

3 Sensitivity to DDIO

Different applications have different levels of sensitivity to DDIO.

[Figure: DDIO performance (%) (DDIO read and DDIO write) and throughput (Gbps) versus relative processing time for Memcached (TCP and UDP), NVMe (full write, random write, random read), an NFV chain (L2 FW), and a stateful NFV service chain (router, NAPT, load balancer). Applications shown in gray see only a 5% improvement when DDIO is enabled. Increasing processing time improves DDIO performance but reduces throughput.]
