Institute of Computer Science, University of Tartu
Parallelism in Deep Learning (LTAT.06.030)

Parallelism in Deep Learning 2025/26 spring


Practical 9

Pipeline Parallelism in PyTorch


Objective

By the end of this practical, students will:

  • Understand pipeline execution
  • Implement model partitioning across GPUs
  • Apply micro-batching
  • Observe pipeline efficiency (bubble problem)
  • Understand differences between:
    • Naive Pipeline (Option A)
    • GPipe (Option B)
    • 1F1B (Option C)
  • Analyze how scheduling affects performance
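The second objective, partitioning a model across GPUs, amounts to placing different layers on different devices and moving activations between them. A minimal sketch (layer sizes and device handling are illustrative, not taken from the provided files; it falls back to CPU when two GPUs are unavailable):

```python
import torch
import torch.nn as nn

# Use two GPUs when present; otherwise run both stages on CPU so the
# sketch still executes on a laptop.
dev0 = torch.device("cuda:0" if torch.cuda.device_count() >= 2 else "cpu")
dev1 = torch.device("cuda:1" if torch.cuda.device_count() >= 2 else "cpu")

stage0 = nn.Sequential(nn.Linear(32, 64), nn.ReLU()).to(dev0)  # first half
stage1 = nn.Linear(64, 10).to(dev1)                            # second half

x = torch.randn(8, 32, device=dev0)
h = stage0(x)             # computed on dev0
y = stage1(h.to(dev1))    # activations cross the stage boundary
print(y.shape)            # torch.Size([8, 10])
```

The `h.to(dev1)` call is the inter-stage transfer; the schedules compared below differ in how much useful work they overlap with these handoffs.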

Provided Code Files

Students will work with three versions:

1) Naive Pipeline (Option A):

  • practical9_OptionA.py

2) GPipe (Option B):

  • practical9_OptionB.py

3) 1F1B (Option C):

  • practical9_OptionC.py

Pipeline Scheduling Comparison

Method | Behavior                      | Efficiency
Naive  | Sequential forward + backward | ❌ Very Low
GPipe  | Forward all → backward all    | ⚠️ Medium
1F1B   | Interleaved forward/backward  | ✅ High
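The efficiency column can be made quantitative. For a GPipe-style schedule with p stages and m micro-batches, the idle ("bubble") fraction is (p - 1) / (m + p - 1). A small helper (a back-of-the-envelope model, not measured data) shows that more micro-batches shrink the bubble but never remove it:

```python
def bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Idle fraction of a GPipe-style schedule: (p - 1) / (m + p - 1)."""
    p, m = num_stages, num_microbatches
    return (p - 1) / (m + p - 1)

# With 4 stages, doubling the micro-batch count keeps reducing the bubble.
for m in (2, 4, 8):
    print(f"p=4, m={m}: bubble = {bubble_fraction(4, m):.2f}")
```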

Part 1: Run and Observe (Option A)

Understand naive pipeline behavior

1) Instructions

  • Run Option A code
  • Observe:
    • Step time
    • GPU utilization (qualitatively)

2) Questions

  • Q1) Are GPUs working in parallel?
  • Q2) Where do you expect idle time?
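As a hint for Q1 and Q2: in the naive scheme the whole batch moves through the stages one at a time, so at any step exactly one stage is busy. A toy schedule listing (pure Python, independent of the provided code) makes the idle pattern explicit:

```python
def naive_schedule(num_stages: int) -> list[str]:
    """List who works at each time step of a naive (un-pipelined) step."""
    steps = []
    for s in range(num_stages):            # forward: stage by stage
        steps.append(f"F on stage {s}")
    for s in reversed(range(num_stages)):  # backward: reverse order
        steps.append(f"B on stage {s}")
    return steps

print(naive_schedule(2))
# At every step exactly one stage works; the other one sits idle.
```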

Part 2: Analyze GPipe (Option B)

Understand pipeline fill and drain

1) Instructions

  • Run Option B code
  • Compare with Option A:
    • Execution time
    • Loss behavior

2) Questions

  • Q1) What changed compared to Option A?
  • Q2) Why do we separate forward and backward?
  • Q3) Does this remove pipeline bubbles?
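If Option B follows the usual GPipe recipe, a training step looks roughly like the sketch below: split the batch into micro-batches, run all forwards first, then all backwards, with gradients accumulating in `.grad`. This is an assumed structure for illustration; the provided practical9_OptionB.py may differ in detail.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 1)                  # stand-in for one pipeline stage
loss_fn = nn.MSELoss()
batch_x, batch_y = torch.randn(8, 16), torch.randn(8, 1)

micro_x = batch_x.chunk(4)                # 4 micro-batches of 2 samples each
micro_y = batch_y.chunk(4)

outputs = [model(mx) for mx in micro_x]   # phase 1: forward ALL micro-batches
for out, my in zip(outputs, micro_y):     # phase 2: backward ALL micro-batches
    # Dividing by the micro-batch count keeps the accumulated gradient
    # equal to the average over the full batch.
    (loss_fn(out, my) / len(outputs)).backward()
```

Note that every forward output must be kept alive until its backward runs, which is exactly why GPipe's activation memory grows with the number of in-flight micro-batches.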

Part 3: Analyze 1F1B (Option C)

Understand overlapping execution

1) Instructions

  • Run Option C code
  • Compare with:
    • Option A
    • Option B

2) Questions

  • Q1) When does backward start?
  • Q2) What is different from GPipe?
  • Q3) Why is this more efficient?
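A compact way to see the answers to Q1 and Q2 is to list the per-step operations of each schedule from a single stage's point of view (a toy listing, not the provided code): GPipe defers every backward until all forwards finish, while 1F1B runs the first backward as soon as the second forward is issued.

```python
def gpipe_steps(m: int) -> list[str]:
    """All forwards, then all backwards."""
    return [f"F{i}" for i in range(m)] + [f"B{i}" for i in range(m)]

def one_f_one_b_steps(m: int) -> list[str]:
    """After a short warmup, alternate one forward with one backward."""
    steps = ["F0"]
    for i in range(1, m):
        steps += [f"F{i}", f"B{i - 1}"]   # backward of i-1 overlaps forward of i
    steps.append(f"B{m - 1}")             # drain the last micro-batch
    return steps

print(gpipe_steps(4))        # backward only after ALL forwards
print(one_f_one_b_steps(4))  # first backward right after the second forward
```

Because at most one extra forward output is in flight at a time, 1F1B also caps activation memory at roughly one micro-batch per stage instead of all of them.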

Part 4: Modify and Experiment

Understand the impact of micro-batching

1) Change

  • NUM_MICROBATCHES = 2, 4, 8

2) Run for each option:

  • A
  • B
  • C

3) Record Results

Method | µB=2 | µB=4 | µB=8
A      |      |      |
B      |      |      |
C      |      |      |

4) Questions

  • Q1) Does increasing micro-batches always help?
  • Q2) Which method benefits most?

Part 5: Code Understanding

1) Analyze Option C (1F1B)
Focus on this part:

if i > 0:
    # Run backward for the PREVIOUS micro-batch: its forward is already
    # done, so its backward can overlap with the current micro-batch's
    # forward on the other stages.
    prev_out = forward_outputs[i - 1]
    prev_target = targets[i - 1]

    loss = loss_fn(prev_out, prev_target)
    loss.backward()

2) Questions

  • Q1) Why do we delay backward by one step?
  • Q2) What would happen if we remove i > 0?
  • Q3) What happens to pipeline overlap?
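The excerpt above can be completed into a tiny self-contained loop (one stage on CPU, with hypothetical sizes; the real Option C spreads stages across GPUs). Note the drain step: after the loop, the last micro-batch still owes a backward pass.

```python
import torch
import torch.nn as nn

model, loss_fn = nn.Linear(8, 1), nn.MSELoss()
inputs  = [torch.randn(6, 8) for _ in range(4)]   # 4 micro-batches
targets = [torch.randn(6, 1) for _ in range(4)]
forward_outputs = []

for i, x in enumerate(inputs):
    forward_outputs.append(model(x))   # forward for micro-batch i
    if i > 0:                          # backward lags one micro-batch behind
        loss = loss_fn(forward_outputs[i - 1], targets[i - 1])
        loss.backward()

# Drain: the final micro-batch's backward has not run yet.
loss_fn(forward_outputs[-1], targets[-1]).backward()
```

Removing the `i > 0` guard (Q2) would index `forward_outputs[-1]` on the first iteration, i.e. try to run a backward for a micro-batch whose forward has not happened, and the one-step lag that creates the overlap (Q3) would be gone.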

Final Discussion

Discuss the following:

  • Why is Option A not true pipeline parallelism?
  • What is the main limitation of GPipe?
  • How does 1F1B reduce pipeline bubbles?
  • Which approach would you use in large-scale systems?
