I was invited to give a talk on the strategies and concepts behind training large-scale models. I covered distributed training strategies such as fully-sharded data parallelism, pipeline parallelism, and tensor parallelism, alongside single-GPU optimizations such as mixed-precision training and gradient checkpointing. The tutorial is framework-agnostic, so no prior experience with JAX or PyTorch is required. By the end, you should understand what each technique trades off and when to reach for it in large-scale training.
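To give a flavor of one of these single-GPU techniques, here is a minimal pure-Python sketch of gradient checkpointing on a chain of identical toy layers: the forward pass stores activations only at periodic checkpoints, and the backward pass recomputes each segment from its checkpoint before applying the chain rule. All function names here (`layer`, `forward_with_checkpoints`, `backward`) are illustrative, not from any framework.

```python
def layer(x):
    # Toy layer: doubles its input. Its derivative is the constant 2.
    return 2.0 * x

def layer_grad(x):
    return 2.0

def forward_with_checkpoints(x, n_layers, every):
    """Run the chain, storing activations only every `every` layers."""
    ckpts = {0: x}
    for i in range(n_layers):
        x = layer(x)
        if (i + 1) % every == 0:
            ckpts[i + 1] = x
    return x, ckpts

def backward(ckpts, n_layers, every):
    """Backprop through the chain, recomputing each segment's
    activations from its checkpoint instead of having stored them."""
    grad = 1.0  # d out / d out
    for seg_end in range(n_layers, 0, -every):
        seg_start = seg_end - every
        # Recompute this segment's activations from its checkpoint.
        acts = [ckpts[seg_start]]
        for _ in range(seg_start, seg_end):
            acts.append(layer(acts[-1]))
        # Chain rule, walking backwards through the segment.
        for x_in in reversed(acts[:-1]):
            grad *= layer_grad(x_in)
    return grad

out, ckpts = forward_with_checkpoints(3.0, 8, every=4)
# out = 3 * 2^8 = 768.0; d out / d x = 2^8 = 256.0
grad = backward(ckpts, 8, every=4)
```

Storing a checkpoint every `every` layers cuts peak activation memory roughly by that factor, at the cost of one extra forward pass per segment during backward, which is the same compute-for-memory trade-off made in real training stacks.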