Taurus: Towards A High-Performance and Generic Congestion Control Framework for Datacenter Networks (CoNEXT'25)


Abstract

Congestion control (CC) is crucial for datacenter networks (DCNs), and CC frameworks have been proposed to let users easily deploy new algorithms tailored to diverse scenarios. Such a framework should be both high-performance and generic: (i) it should allow CC to achieve high throughput and low latency, and (ii) it should support various algorithms and congestion scenarios. However, prior works either suffer from performance limitations or lack sufficient generality. CCP experiences throughput degradation under heavy traffic, while DOCA-PCC improves performance using hardware but lacks support for detecting and mitigating host congestion. In this paper, we present Taurus, a high-performance and generic CC framework built through hardware-software co-design. To this end, Taurus partitions CC functions into distinct tasks and maps them onto suitable hardware/software components while mitigating excessive interaction overhead. Specifically, Taurus introduces a collaborative signal collection mechanism to support diverse congestion feedback, a type-aware message report engine to reduce communication overhead, and built-in software handlers to facilitate deployment. We have implemented a fully functional Taurus on commodity servers with FPGA-based NICs. Experimental results show that Taurus enables various CC algorithms to achieve near-native performance. Compared to CCP, Taurus improves throughput by 32.3%, reduces latency by 96.4%, and lowers CPU overhead by 158.7%. Compared to DOCA-PCC, Taurus improves throughput by 9.3% and reduces latency by 28.8%.
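To make the idea of a type-aware report engine that cuts NIC-to-software communication overhead more concrete, here is a minimal Python sketch. It is hypothetical: the two message types, the `ReportEngine` class, the per-flow batching rule, and the averaging of samples are illustrative assumptions, not Taurus's actual design, which the abstract does not describe in detail. The sketch forwards urgent events immediately and folds routine per-flow samples into one summary message per batch.

```python
from dataclasses import dataclass, field
from enum import Enum


class MsgType(Enum):
    # Hypothetical message types; not taken from the Taurus paper.
    URGENT = "urgent"      # e.g., a loss/timeout signal software should see at once
    PERIODIC = "periodic"  # e.g., per-ACK RTT or ECN samples that can be summarized


@dataclass
class Report:
    flow_id: int
    kind: MsgType
    value: float


@dataclass
class ReportEngine:
    """Hypothetical engine: forwards URGENT reports immediately and sends one
    averaged PERIODIC summary per flow every `batch_size` samples."""
    batch_size: int = 8
    sent: list = field(default_factory=list)      # messages crossing to software
    _pending: dict = field(default_factory=dict)  # flow_id -> buffered sample values

    def on_event(self, report: Report) -> None:
        if report.kind is MsgType.URGENT:
            self.sent.append(report)  # no batching for urgent signals
            return
        samples = self._pending.setdefault(report.flow_id, [])
        samples.append(report.value)
        if len(samples) >= self.batch_size:
            self._flush(report.flow_id)

    def _flush(self, flow_id: int) -> None:
        samples = self._pending.pop(flow_id)
        avg = sum(samples) / len(samples)
        self.sent.append(Report(flow_id, MsgType.PERIODIC, avg))


# Usage: 16 routine samples plus 1 urgent event on the same flow.
engine = ReportEngine(batch_size=8)
for i in range(16):
    engine.on_event(Report(1, MsgType.PERIODIC, 10.0 + i))
engine.on_event(Report(1, MsgType.URGENT, 1.0))
print(len(engine.sent))  # 3 messages instead of 17
```

The design choice illustrated is the trade-off between feedback timeliness and interaction cost: batching routine samples reduces how often software is interrupted, while leaving urgent signals unbatched keeps reaction time short.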

Publication
In Proc. CoNEXT'25
Luyang Li
Network Architect and Researcher

My research interests focus on high-performance AI and data center networking,