Alibaba’s ZooRoute system is designed to keep cloud networks running when failures occur. By quickly redirecting traffic to alternative paths, ZooRoute minimizes downtime and keeps service uninterrupted for end users. Over the past 18 months, ZooRoute has reduced Alibaba Cloud’s overall outage time by 92%.
Another key development is Hermes, a system that improves layer 7 load balancers by using eBPF-based scheduling to distribute traffic more evenly across workers. This has sharply reduced CPU imbalance and uneven connection counts, nearly eliminating worker “hangs” and cutting the operating costs of the layer 7 load-balancing infrastructure by 19%.
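To make “distributing traffic more evenly” concrete, here is a minimal sketch of a scheduling policy that sends each new connection to the least-loaded worker while skipping workers reported as hung. It is written in plain Python and is not Hermes’s algorithm; in Hermes the scheduling is eBPF-based, which this sketch does not model. The `Worker` fields and the least-connections rule are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical per-worker state a scheduler could consult."""
    name: str
    active_connections: int
    hung: bool = False  # assumed signal, e.g. the worker stopped reporting progress


def pick_worker(workers: list[Worker]) -> Worker:
    """Least-connections selection that ignores workers flagged as hung."""
    candidates = [w for w in workers if not w.hung]
    if not candidates:
        raise RuntimeError("no available worker")
    return min(candidates, key=lambda w: w.active_connections)


def dispatch(workers: list[Worker], n_new: int) -> None:
    """Assign n_new incoming connections one at a time, updating counts after each."""
    for _ in range(n_new):
        pick_worker(workers).active_connections += 1


workers = [Worker("w0", 10), Worker("w1", 2, hung=True), Worker("w2", 4), Worker("w3", 5)]
dispatch(workers, 6)
print({w.name: w.active_connections for w in workers})
# {'w0': 10, 'w1': 2, 'w2': 8, 'w3': 7}
```

The hung worker receives nothing even though it has the fewest connections, and the new connections alternate between the two lightest healthy workers instead of piling onto one.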
Alibaba’s Nezha system, meanwhile, balances workloads across SmartNICs (network cards with their own processors), addressing performance disparities among them. By monitoring each card’s load and redistributing tasks accordingly, Nezha improves SmartNIC performance, eliminates bottlenecks, and ultimately raises overall network efficiency.
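The “monitor and redistribute” loop described for Nezha can be sketched as a simple greedy rebalancer. This is a generic illustration under stated assumptions, not Nezha’s actual mechanism: per-task loads are assumed to be known numbers, tasks are assumed to be freely movable between cards, and the `threshold` parameter and data layout are hypothetical.

```python
def rebalance(loads: dict[str, dict[str, float]], threshold: float) -> list[tuple[str, str, str]]:
    """Greedily move tasks from the busiest NIC to the least busy one.

    loads maps NIC name -> {task name: load} and is modified in place.
    Stops when the busiest/idlest gap is within threshold or no single move
    would shrink it. Returns the moves as (task, from_nic, to_nic).
    """
    moves = []
    while True:
        totals = {nic: sum(tasks.values()) for nic, tasks in loads.items()}
        busiest = max(totals, key=totals.get)
        idlest = min(totals, key=totals.get)
        gap = totals[busiest] - totals[idlest]
        if gap <= threshold:
            break
        # Moving a task of size t changes this pair's gap to |gap - 2t|, so only 0 < t < gap helps.
        movable = [(t, load) for t, load in loads[busiest].items() if 0 < load < gap]
        if not movable:
            break
        # Pick the task that brings the two NICs closest to equal.
        task, load = min(movable, key=lambda item: abs(gap - 2 * item[1]))
        del loads[busiest][task]
        loads[idlest][task] = load
        moves.append((task, busiest, idlest))
    return moves


loads = {"nic0": {"a": 4.0, "b": 3.0, "c": 2.0}, "nic1": {"d": 1.0}, "nic2": {}}
moves = rebalance(loads, threshold=1.0)
print(moves)  # [('a', 'nic0', 'nic2'), ('c', 'nic0', 'nic1')]
```

Each move shifts load from a heavier card to a lighter one without overshooting, so the sum of squared loads strictly decreases and the loop always terminates.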
Alibaba’s research shows its commitment to improving the efficiency and reliability of cloud infrastructure and highlights the importance of software-based solutions for managing complex networks. These advances benefit providers by reducing outages and unnecessary hardware spending, and they also strengthen customer confidence in cloud services.