Invited Speaker: Zuqing Zhu

Accelerating DML Training in OCS-Based DCNs

Zuqing Zhu
Professor, University of Science and Technology of China

Abstract

This talk explores how all-optical interconnects (AOI) empowered by in-network computing (INC) can accelerate distributed machine learning (DML) jobs.

We describe the network architecture and service model, and present two scheduling policies for minimizing job completion time: a large-job-first policy and a grouping-based interleaved policy.

The results verify the superiority of our scheduling policies and show that INC and AOI effectively mitigate bandwidth bottlenecks during DML training.

About the Speaker