Practical sessions:
Part 1: Foundations of Parallelism & Deep Learning (Weeks 1–4)
- Practical 1: Implement a toy neural network in PyTorch and visualize forward/backward passes.
- Practical 2: Compare matrix multiplication speeds using NumPy on the CPU versus PyTorch on the GPU.
- Practical 3: Profile a CNN training script and identify the main bottlenecks.
- Practical 4: No hands-on exercise this week; the lecture material is theoretical, and its implementation is deferred to later sessions.
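A minimal sketch of what Practical 1 might start from (assuming PyTorch is installed; the layer sizes and toy data are illustrative, not prescribed by the course):

```python
import torch
import torch.nn as nn

# A tiny two-layer network; sizes are arbitrary choices for illustration.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
)

x = torch.randn(16, 4)        # a toy batch of 16 samples
target = torch.randn(16, 1)

# Forward pass: compute predictions and a scalar loss.
pred = model(x)
loss = nn.functional.mse_loss(pred, target)

# Backward pass: autograd populates .grad on every parameter,
# which students can then inspect or visualize.
loss.backward()

for name, p in model.named_parameters():
    print(name, tuple(p.grad.shape))
```

From here, visualizing the passes can be as simple as printing intermediate activations with forward hooks or plotting the gradient norms per layer.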
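Practical 2's comparison can be sketched as below (assuming NumPy and PyTorch; the matrix size is an arbitrary choice, and the script falls back to CPU so it runs even without a GPU):

```python
import time
import numpy as np
import torch

N = 512  # modest size so the comparison runs anywhere

a_np = np.random.rand(N, N).astype(np.float32)
b_np = np.random.rand(N, N).astype(np.float32)

t0 = time.perf_counter()
c_np = a_np @ b_np
numpy_time = time.perf_counter() - t0

# Use the GPU when available; otherwise time torch on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
a_t = torch.from_numpy(a_np).to(device)
b_t = torch.from_numpy(b_np).to(device)

t0 = time.perf_counter()
c_t = a_t @ b_t
if device == "cuda":
    torch.cuda.synchronize()  # GPU kernels are async; wait before stopping the clock
torch_time = time.perf_counter() - t0

print(f"NumPy ({N}x{N}): {numpy_time:.4f}s | torch on {device}: {torch_time:.4f}s")
```

The `synchronize()` call matters: without it, GPU timings measure only kernel launch, not execution.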
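For Practical 3, one possible starting point uses `torch.profiler` on a small CNN step (the model and batch are stand-ins; real sessions would profile the actual training script):

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# A stand-in CNN; the practical would profile the course's real training script.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(16 * 32 * 32, 10),
)
x = torch.randn(8, 3, 32, 32)
target = torch.randint(0, 10, (8,))

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    loss = nn.functional.cross_entropy(model(x), target)
    loss.backward()

# The table ranks ops by time; the heaviest rows are the bottleneck candidates.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

On a GPU machine, adding `ProfilerActivity.CUDA` to `activities` exposes kernel-level timings as well.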
Part 2: Core Parallel Strategies in Practice (Weeks 5–11)
- Practical 5: Convert a single-GPU training script to use torch.nn.DataParallel and observe its single-process bottleneck (per-step replicate/scatter/gather overhead and Python GIL contention).
- Practical 6: Convert the single-GPU script to use DDP (with torchrun) and compare its performance against the DP implementation.
- Practical 7: Train a model with AMP and gradient accumulation to observe the benefits and practice DDP launch configurations.
- Practical 8: Apply basic model parallelism by distributing layers and tensors of a feedforward network across multiple devices.
- Practical 9: Work through pipeline parallelism with toy examples and outline the implementation steps for a mock (paper-only) pipeline module.
- Practical 10: Design a hybrid DDP+PP strategy for a toy transformer in PyTorch, analyzing pros, cons, and communication costs.
- Practical 11: Recap and Project Q&A.
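The DataParallel conversion in Practical 5 amounts to one wrapper call; a minimal sketch (with no or one GPU, the wrapper degenerates to a plain single-device call, which is exactly the per-step replicate/scatter/gather machinery the practical examines):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)

# DataParallel re-replicates the module across visible GPUs on every forward
# pass, scatters the batch, and gathers the outputs on one device -- all from
# a single Python process.
dp_model = nn.DataParallel(model)

x = torch.randn(32, 10)
out = dp_model(x)
print(out.shape)
```

Because all of this happens in one process, DataParallel is bottlenecked by the GIL and the gather device, which is what motivates the move to DDP in the next session.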
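Practical 6's DDP conversion can be previewed in a single process by faking the environment variables that torchrun would normally set (gloo backend, CPU only, world size 1 -- a sketch of the API, not a real multi-GPU run; the port number is an arbitrary choice):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# torchrun normally sets these; we fake a one-process "cluster" so the
# script runs anywhere.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(nn.Linear(10, 2))   # each rank holds a full model replica
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(32, 10), torch.randn(32, 2)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()                 # gradients are all-reduced across ranks here
opt.step()

dist.destroy_process_group()
print("one DDP step done, loss:", loss.item())
```

Launched for real, the same script would run as `torchrun --nproc_per_node=N script.py`, with one process (and one model replica) per GPU, which is the comparison point against DataParallel.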
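A possible skeleton for Practical 7's AMP-plus-accumulation loop (the accumulation factor and toy data are illustrative; bfloat16 autocast is used on CPU so the sketch runs without a GPU):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
# float16 autocast needs CUDA; bfloat16 autocast also works on CPU.
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

model = nn.Linear(10, 2).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
# Loss scaling is only needed for float16 on GPU; disabled => passthrough.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

accum_steps = 4  # simulate a 4x larger batch by accumulating gradients
opt.zero_grad()
for step in range(8):
    x = torch.randn(8, 10, device=device)
    y = torch.randn(8, 2, device=device)
    with torch.autocast(device_type=device, dtype=amp_dtype):
        # Divide so the accumulated gradient matches one big-batch step.
        loss = nn.functional.mse_loss(model(x), y) / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(opt)     # unscales grads; skips the step on inf/nan
        scaler.update()
        opt.zero_grad()
print("final micro-batch loss:", loss.item())
```

Under DDP, the same loop would wrap the non-final micro-batches in `model.no_sync()` to avoid paying an all-reduce per accumulation step.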
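Practical 8's layer-wise model parallelism reduces to placing sub-modules on different devices and moving activations between them; a sketch that falls back to a single CPU "device" when only one device exists, so the placement logic is still exercised:

```python
import torch
import torch.nn as nn

# Two target devices; with fewer than two GPUs both halves land on the
# same device, but the inter-device activation hop is still written out.
dev0 = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
dev1 = torch.device("cuda:1") if torch.cuda.device_count() > 1 else dev0

class SplitNet(nn.Module):
    """Feedforward net with its two halves on different devices."""
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(10, 32), nn.ReLU()).to(dev0)
        self.part2 = nn.Linear(32, 2).to(dev1)

    def forward(self, x):
        h = self.part1(x.to(dev0))
        return self.part2(h.to(dev1))   # activation crosses the device boundary

model = SplitNet()
out = model(torch.randn(16, 10))
print(out.shape, out.device)
```

The key observation for the session: with this naive split, one device is always idle while the other computes -- the problem pipeline parallelism addresses next.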
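Since Practical 9 stays theoretical, the toy example can be a pure-Python simulation of a GPipe-style schedule, enough to derive the pipeline "bubble" on paper (stage and micro-batch counts are arbitrary):

```python
# Toy GPipe-style forward schedule: with S stages and M micro-batches, the
# pipeline takes S + M - 1 ticks instead of S * M, and the idle "bubble"
# fraction is (S - 1) / (S + M - 1).
def pipeline_ticks(stages: int, micro_batches: int) -> int:
    ticks = stages + micro_batches - 1
    busy = [[False] * ticks for _ in range(stages)]
    for m in range(micro_batches):
        for s in range(stages):
            busy[s][m + s] = True   # micro-batch m reaches stage s at tick m+s
    return ticks

def bubble_fraction(stages: int, micro_batches: int) -> float:
    total_slots = stages * pipeline_ticks(stages, micro_batches)
    useful = stages * micro_batches
    return 1 - useful / total_slots

print(pipeline_ticks(4, 8))                  # 11 ticks
print(round(bubble_fraction(4, 8), 3))       # 0.273
```

The takeaway matches the lecture: increasing the number of micro-batches shrinks the bubble, at the cost of smaller per-stage batches.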
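For Practical 10's communication-cost analysis, a back-of-envelope model in plain Python is a reasonable starting point. All numbers below (parameter and activation sizes, degrees) are made-up assumptions for illustration:

```python
# Rough per-step communication volume for a hybrid DDP + pipeline setup:
# each pipeline boundary moves activations forward and activation-gradients
# back once per micro-batch, while DDP all-reduces each stage's gradients
# once per optimizer step.
def hybrid_comm_bytes(param_bytes, activation_bytes, stages, micro_batches, dp_degree):
    # Pipeline traffic: 2x (activations + their grads) per boundary per micro-batch.
    pp = 2 * activation_bytes * (stages - 1) * micro_batches
    # DDP traffic per replica: ring all-reduce moves ~2*(d-1)/d of the
    # stage's gradient bytes; each stage holds param_bytes / stages.
    dp = 2 * (dp_degree - 1) / dp_degree * (param_bytes / stages)
    return pp, dp

pp, dp = hybrid_comm_bytes(
    param_bytes=4e9,        # 1B fp32 parameters (assumed)
    activation_bytes=8e6,   # 8 MB of boundary activations per micro-batch (assumed)
    stages=4, micro_batches=8, dp_degree=2,
)
print(f"pipeline traffic/step: {pp/1e6:.0f} MB; DDP all-reduce/replica: {dp/1e6:.0f} MB")
```

Plugging in a toy transformer's real sizes turns this into the pros/cons table the practical asks for: pipeline traffic scales with micro-batch count, DDP traffic with per-stage parameter count.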
Part 3: Project Work and Assessment (Weeks 12–16)
- Practical 12: Project Work Session 1
- Practical 13: Project Work Session 2
- Practical 14: Project Work Session 3
- Practical 15: Final Project Presentations
- Practical 16: Final Exam / Assessment