SWARM Parallelism - Training Large Models Can Be Surprisingly Communication-Efficient

HSE
M. Ryabinin, T. Dettmers, M. Diskin, A. Borzunov

(2023)

13th September 2024 (last updated 13th September 2024)
  • Introduction

    HSE paper devising an adaptive load-balancing strategy to enable pipeline parallelism for unreliable, heterogeneous workers connected over the Internet

  • Motivation

    Utilise ubiquitous heterogeneous computing resources for training large-scale foundation models, overcoming limitations of traditional data centre approaches

  • Problem

    Traditional pipeline parallelism is bottlenecked by the slowest worker and is not robust to worker failures, leading to inefficient resource utilisation

  • Methodology

    The SWARM framework partitions a set of workers into swarms, where each peer in a swarm handles the same subset of layers. There are two main components enabling robustness and efficiency:

    1. Stochastic wiring: Dynamically routes requests between pipeline stages based on worker performance (e.g. higher throughput workers are assigned more requests)
    2. Adaptive swarm balancing: Reallocates workers between stages to balance workload and handle failures/additions (e.g. move workers from underutilised to overutilised stages)
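    The stochastic wiring idea can be illustrated with a minimal sketch. The class below is a hypothetical illustration, not the paper's implementation: each stage keeps a smoothed throughput estimate per downstream peer (the `StochasticRouter` name, the EMA update, and the `alpha` parameter are assumptions for this sketch) and samples the next peer with probability proportional to that estimate, so faster workers receive more requests and failed workers simply stop being sampled:

    ```python
    import random

    class StochasticRouter:
        """Hypothetical sketch of SWARM-style stochastic wiring: sample the
        next-stage peer with probability proportional to its measured
        throughput (illustrative, not the paper's actual implementation)."""

        def __init__(self, peers, alpha=0.5):
            self.alpha = alpha  # EMA smoothing factor (assumed value)
            # Start with an optimistic uniform estimate for every peer.
            self.throughput = {p: 1.0 for p in peers}

        def choose_peer(self):
            # Weighted sampling: higher-throughput peers get more requests.
            peers = list(self.throughput)
            weights = [self.throughput[p] for p in peers]
            return random.choices(peers, weights=weights, k=1)[0]

        def report(self, peer, requests_per_s):
            # Update the estimate after observing a completed batch.
            old = self.throughput[peer]
            self.throughput[peer] = (1 - self.alpha) * old + self.alpha * requests_per_s

        def drop(self, peer):
            # A failed or departed peer stops receiving requests.
            self.throughput.pop(peer, None)

    # Usage: a peer reporting 8 req/s ends up sampled far more often
    # than one reporting 2 req/s.
    router = StochasticRouter(["fast", "slow"])
    router.report("fast", 8.0)
    router.report("slow", 2.0)
    counts = {"fast": 0, "slow": 0}
    for _ in range(1000):
        counts[router.choose_peer()] += 1
    ```

    Adaptive swarm balancing then acts one level up: when one stage's aggregate throughput falls well below the others', a peer from an underutilised swarm would be reassigned to the bottleneck stage.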
  • Results

    Experimental results show higher GPU utilisation and good convergence in a heterogeneous, unreliable distributed setup


© Mika Senghaas 2024