Decentralised Training of Foundation Models in Heterogeneous Environments

ETH Zurich
B. Yuan, Y. He, J. Davis, et al.

(2022)

11th September 2024 (Last Updated 11th September 2024)
  • Introduction

    Introduces a novel scheduling algorithm for decentralised, parallelised training of foundation models, achieving up to 4.8x speedup over state-of-the-art approaches by optimising communication between heterogeneous compute resources

  • Motivation

    Unlock untapped decentralised, heterogeneous compute resources for training large-scale foundation models

  • Problem Statement

    Formulates device allocation as an optimisation problem over a communication graph, minimising the communication cost incurred by data and pipeline parallelism

  • Methodology

    Proposes a hybrid genetic algorithm to solve the allocation problem
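
    The core idea can be sketched as follows. This is a minimal illustration, not the paper's actual hybrid algorithm: the worker count, the random pairwise cost matrix, and the equal-sized pipeline stages are all assumptions made for the example. It shows how a genetic search over worker-to-stage assignments can minimise the communication cost between consecutive pipeline stages.

    ```python
    import random

    # Illustrative setup (assumed, not from the paper): 8 heterogeneous
    # workers split evenly across 4 pipeline stages.
    N_WORKERS = 8
    N_STAGES = 4
    PER_STAGE = N_WORKERS // N_STAGES

    # Symmetric pairwise communication cost between workers (e.g. inverse
    # bandwidth); random values here, seeded for reproducibility.
    random.seed(0)
    COST = [[0] * N_WORKERS for _ in range(N_WORKERS)]
    for i in range(N_WORKERS):
        for j in range(i + 1, N_WORKERS):
            COST[i][j] = COST[j][i] = random.randint(1, 100)

    def comm_cost(perm):
        """Cost of a worker permutation: consecutive PER_STAGE-sized slices
        form pipeline stages; adjacent stages exchange activations."""
        total = 0
        for s in range(N_STAGES - 1):
            cur = perm[s * PER_STAGE:(s + 1) * PER_STAGE]
            nxt = perm[(s + 1) * PER_STAGE:(s + 2) * PER_STAGE]
            total += sum(COST[a][b] for a in cur for b in nxt)
        return total

    def crossover(p1, p2):
        """Order crossover: keep a slice of p1, fill the rest in p2's order."""
        a, b = sorted(random.sample(range(N_WORKERS), 2))
        middle = p1[a:b]
        rest = [w for w in p2 if w not in middle]
        return rest[:a] + middle + rest[a:]

    def mutate(perm):
        """Swap two workers in place."""
        i, j = random.sample(range(N_WORKERS), 2)
        perm[i], perm[j] = perm[j], perm[i]

    def genetic_search(generations=200, pop_size=20):
        """Evolve a population of permutations towards low communication cost."""
        pop = [random.sample(range(N_WORKERS), N_WORKERS) for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=comm_cost)
            survivors = pop[:pop_size // 2]          # keep the fittest half
            children = []
            while len(survivors) + len(children) < pop_size:
                p1, p2 = random.sample(survivors, 2)
                child = crossover(p1, p2)
                if random.random() < 0.2:            # occasional mutation
                    mutate(child)
                children.append(child)
            pop = survivors + children
        return min(pop, key=comm_cost)

    best = genetic_search()
    print(best, comm_cost(best))
    ```

    The paper's actual method is more involved (its hybrid algorithm also accounts for data-parallel traffic and uses local search), but the same structure applies: a cost model over a communication graph plus an evolutionary search over assignments.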

  • Results

    Experimental results demonstrate up to 4.8x throughput gain in a worldwide geo-distributed worker setup compared to state-of-the-art systems such as Megatron and DeepSpeed


© Mika Senghaas 2024