Institute of Computer Science, University of Tartu
Parallelism in Deep Learning 2025/26 spring

Homework 4

General Instructions & Submission


Release & Deadline
  • Release date: 23 April 2026
  • Deadline: 07 May 2026 (23:59)
Submission Requirements

Each student must submit:

1) Code files

  • All modified scripts used in the homework
  • Must be runnable

2) Report (single PDF file), including:

  • Answers to all questions
  • Tables of results
  • Short explanations

Task 1: Conceptual Questions (3 points)

Q1) Explain the Difference
Explain in your own words:

  • Naive Pipeline (Option A)
  • GPipe (Option B)
  • 1F1B (Option C)

👉 Focus on:

  • Execution order
  • GPU utilization
  • Efficiency
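Not required for the submission, but as a purely illustrative aid for reasoning about GPU utilization: under the idealized assumption that every stage takes exactly one time step and communication is free, you can count how long the forward passes of m micro-batches take through p stages with and without pipelining. All names below are ours, not part of the assignment.

```python
# Idealized step counts for forward passes only (assumes equal-cost stages,
# free communication). Illustrative sketch, not a homework solution.

def sequential_steps(num_stages: int, num_microbatches: int) -> int:
    # One micro-batch at a time: each occupies all stages before the next starts.
    return num_stages * num_microbatches

def pipelined_steps(num_stages: int, num_microbatches: int) -> int:
    # Micro-batches enter back to back; the last finishes (p - 1) steps
    # after the first micro-batch leaves the pipeline.
    return num_microbatches + num_stages - 1

if __name__ == "__main__":
    p, m = 4, 4
    print(sequential_steps(p, m))  # 16
    print(pipelined_steps(p, m))   # 7
```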

Q2) Pipeline Bubble

  • 1) What is a pipeline bubble?
  • 2) When does it occur?
  • 3) Why does it reduce efficiency?
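A standard idealized estimate may help you quantify your answer: with p pipeline stages and m micro-batches, the fraction of time lost to the bubble in a GPipe-style schedule is often approximated as (p - 1) / (m + p - 1), assuming equal-cost stages. The sketch below just evaluates that formula; the function name is ours.

```python
def bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Idle-time fraction of an idealized GPipe-style schedule:
    (p - 1) / (m + p - 1). Assumes all stages cost the same."""
    p, m = num_stages, num_microbatches
    return (p - 1) / (m + p - 1)

if __name__ == "__main__":
    # More micro-batches shrink the bubble for a fixed number of stages.
    for m in (1, 4, 16, 64):
        print(m, round(bubble_fraction(4, m), 3))
```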

Q3) Scheduling Insight
Why is 1F1B more efficient than GPipe?
Your answer must refer to:

  • Forward/backward overlap
  • Idle GPU time

Task 2: Code Completion (6 points)

1) Create a new file

  • pipeline_homework.py

2) The skeleton below is based directly on your Option C, but simplified.

# ----------------------------
# Homework: Complete 1F1B Logic
# ----------------------------

def run_1F1B_homework(model, micro_batches, loss_fn):
    forward_outputs = []
    targets = []

    # ----------------------------
    # TODO 1: Forward Pass
    # ----------------------------
    for i in range(len(micro_batches)):
        micro_x, micro_y = micro_batches[i]

        # TODO:
        # 1. Move target to correct device
        # 2. Run forward pass
        # 3. Store outputs and targets

        # ----------------------------
        # TODO 2: Early Backward
        # ----------------------------
        if i > 0:
            # TODO:
            # Perform backward on previous micro-batch
            pass

    # ----------------------------
    # TODO 3: Final Backward
    # ----------------------------
    # Compute backward for last micro-batch

    pass

3) Requirements

  • Follow the logic of Option C (1F1B)
  • Do NOT rewrite the model
  • Only complete missing parts
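To exercise your completed function you will need some micro-batches. One hypothetical way to split a batch is sketched below with plain lists; in the homework you would slice tensors the same way. The helper name and variables are illustrative, not part of the assignment.

```python
def split_into_microbatches(batch_x, batch_y, num_microbatches):
    """Split a batch into `num_microbatches` (x, y) chunks of equal size.
    Works on anything sliceable (lists here; torch tensors in the homework)."""
    assert len(batch_x) % num_microbatches == 0, "batch must divide evenly"
    size = len(batch_x) // num_microbatches
    return [
        (batch_x[i * size:(i + 1) * size], batch_y[i * size:(i + 1) * size])
        for i in range(num_microbatches)
    ]

if __name__ == "__main__":
    xs = list(range(8))
    ys = [x * 2 for x in xs]
    mbs = split_into_microbatches(xs, ys, 4)
    print(len(mbs))   # 4
    print(mbs[0])     # ([0, 1], [0, 2])
```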

Task 3: Execution Understanding (3 points)

1) Analyze This Code (Option C)

if i > 0:
    prev_out = forward_outputs[i - 1]
    prev_target = targets[i - 1]

    loss = loss_fn(prev_out, prev_target)
    loss.backward()

2) Questions

  • Q1) Why do we use i - 1 instead of i?
  • Q2) What happens if we remove this condition?
  • Q3) What happens to pipeline efficiency?

Task 4: 1F1B Execution Insight (3 points)

1) In Option C (1F1B), add:

print(f"F{i}")    # after the forward pass
print(f"B{i-1}")  # inside the `if i > 0:` branch

2) Run with:

NUM_MICROBATCHES = 4
STEPS = 1
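If you prefer to collect the sequence programmatically for the report instead of copying console output, one hypothetical option is to append each event to a list in addition to printing it. The sketch below only demonstrates the recorder; the dummy tags fed to it are fabricated for the demo, not the answer to Q1.

```python
# Hypothetical trace recorder: collect "F{i}" / "B{i}" tags in execution
# order, then join them for the report. In your loop you would call
# record(f"F{i}") after the forward pass and record(f"B{i-1}") inside
# the `if i > 0:` branch.
events = []

def record(tag: str) -> None:
    print(tag)          # keep the console output
    events.append(tag)  # and remember it for the report

# Demo with dummy tags only (not the schedule you are asked to derive):
for tag in ["F0", "F1", "B0"]:
    record(tag)

print(", ".join(events))  # F0, F1, B0
```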

3) Answer

  • Q1) Write the execution sequence (e.g., F0, F1, B0, …).
  • Q2) At which point does backward start overlapping with forward?
  • Q3) Why does this make 1F1B more efficient than GPipe?
