Applying accelerator traffic shaping findings to PCIe congestion in multi-GPU servers

Investigate how to carry the accelerator traffic-shaping observations (message-size-aware scheduling, queue-pair allocation, and management of ingress/egress bandwidth asymmetry) over to PCIe congestion management in multi-GPU servers.

Background

Earlier sections observe that performance isolation on PCIe breaks down under mixtures of message sizes, varying queue-pair counts, and the read/write directionality of DMA traffic. Extending these insights to multi-GPU topologies is nontrivial: traffic along the PCIe fabric is both higher in volume and more complex in structure, since multiple GPUs contend for shared switch uplinks and root ports.
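As a concrete illustration of the single-accelerator mechanisms that would need to be extended, the sketch below combines direction-aware token buckets (one budget per DMA direction, reflecting the read/write asymmetry) with chunking of large messages so that small-message tenants are not starved behind large transfers. This is a minimal sketch, not the paper's implementation; the class names, rates, and the 256 KiB chunk size are illustrative assumptions.

```python
import time
from collections import deque

class TokenBucket:
    """Simple token bucket; rate in bytes/s, burst in bytes."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = time.monotonic()

    def try_consume(self, nbytes):
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False

class DirectionAwareShaper:
    """Shapes DMA traffic with a separate budget per direction, since PCIe
    read and write bandwidth behave asymmetrically under contention
    (an assumption drawn from the single-accelerator findings)."""
    CHUNK = 256 * 1024  # illustrative chunk size: split large DMAs

    def __init__(self, read_bps, write_bps):
        self.buckets = {"read": TokenBucket(read_bps, 2 * self.CHUNK),
                        "write": TokenBucket(write_bps, 2 * self.CHUNK)}
        self.queues = {"read": deque(), "write": deque()}

    def submit(self, direction, nbytes):
        # Chunk large messages so small ones can interleave behind them.
        while nbytes > 0:
            chunk = min(nbytes, self.CHUNK)
            self.queues[direction].append(chunk)
            nbytes -= chunk

    def poll(self):
        """Issue whatever the per-direction budgets allow; returns issued bytes."""
        issued = {"read": 0, "write": 0}
        for d, q in self.queues.items():
            while q and self.buckets[d].try_consume(q[0]):
                issued[d] += q.popleft()
        return issued
```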

The authors explicitly flag incorporating these findings into PCIe congestion management for multi-GPU servers as an open problem, which calls for translating accelerator-side shaping mechanisms to settings where several GPUs share PCIe resources.
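One way to frame the multi-GPU extension is as a hierarchical budgeting problem: a per-GPU shaper must respect not only that GPU's own link budget but also the shared uplinks of the PCIe switches and root port between it and host memory. The sketch below is a hypothetical admission check over such a topology; the 4-GPU layout, link capacities, and epoch-based budgets are assumptions for illustration, not anything specified in the paper.

```python
class PCIeLink:
    """A link with a finite bandwidth budget (bytes per scheduling epoch)."""
    def __init__(self, name, capacity):
        self.name, self.capacity, self.used = name, capacity, 0

    def admit(self, nbytes):
        if self.used + nbytes <= self.capacity:
            self.used += nbytes
            return True
        return False

    def release(self, nbytes):
        self.used -= nbytes

def admit_dma(path, nbytes):
    """Admit a DMA only if every link on its root-complex-to-GPU path has
    headroom; roll back partial reservations otherwise."""
    reserved = []
    for link in path:
        if link.admit(nbytes):
            reserved.append(link)
        else:
            for l in reserved:
                l.release(nbytes)
            return False
    return True

# Hypothetical 4-GPU topology: two GPUs per switch, switches share a root port.
root = PCIeLink("root_port", capacity=32 * 2**30)
sw0 = PCIeLink("switch0_uplink", 16 * 2**30)
sw1 = PCIeLink("switch1_uplink", 16 * 2**30)
gpus = [PCIeLink(f"gpu{i}_link", 16 * 2**30) for i in range(4)]
paths = {0: [root, sw0, gpus[0]], 1: [root, sw0, gpus[1]],
         2: [root, sw1, gpus[2]], 3: [root, sw1, gpus[3]]}

# GPU0 and GPU1 contend for switch0's uplink even when their own links have headroom.
print(admit_dma(paths[0], 12 * 2**30))  # True
print(admit_dma(paths[1], 12 * 2**30))  # False: switch0 uplink exhausted
```

The point of the rollback in admit_dma is that a shaping decision which looks safe at the GPU's own link can still oversubscribe an upstream switch, which is exactly the cross-GPU coupling that single-accelerator shaping does not capture.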

References

Some of the big open problems when applying our design to this setting will be (1) how to perform traffic shaping when GPUs are used under spatial multiplexing, (2) how to incorporate the understanding of GPU internal contention into the traffic patterns to re-shape, and (3) how to incorporate our findings when managing PCIe congestion for multi-GPU servers.

Zhao et al., 2024. Accelerator-as-a-Service in Public Clouds: An Intra-Host Traffic Management View for Performance Isolation in the Wild (arXiv:2407.10098), Section 6, "Managing I/O contention for GPUs".