Invited Speaker: Zuqing Zhu
Accelerating DML Training in OCS-Based DCNs

Professor, University of Science and Technology of China
Abstract
This talk explores how all-optical interconnects (AOI) empowered by in-network computing (INC) can accelerate distributed machine learning (DML) jobs.
We describe the network architecture and service model, and present two scheduling policies for minimizing job completion time: a large-job-first policy and a grouping-based interleaved policy.
The results confirm the advantages of our scheduling policies and show that INC and AOI effectively mitigate bandwidth bottlenecks during DML training.
About the Speaker