Parallelism in Deep Learning (LTAT.06.030), 2025/26 spring
Institute of Computer Science, University of Tartu

Homework 1

General Instructions & Submission


Release & Deadline
  • Release date: 2 April 2026
  • Deadline: 16 April 2026 (23:59)
Submission Requirements

Each student must submit:

1) Code files

  • All modified scripts used in the homework
  • Must be runnable

2) Report (PDF, single file), including:

  • Answers to all questions
  • Tables of results
  • Short explanations

Task 1: Reproducible Benchmark Setup (3 points)

Modify the DDP script to:

1) Fix randomness:
Add at the beginning of your script:

import torch

torch.manual_seed(0)
torch.cuda.manual_seed_all(0)

2) Log per-step time (rank 0 only)
Measure the time for each training step using time.time():

  • Start the timer before the forward pass
  • Stop the timer after the optimizer step
  • Print the time only when rank == 0

3) Run at least 30 steps, ignoring the first 5 steps (warmup)
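Steps 2 and 3 can be sketched as a plain timing loop. This is only a sketch: `train_step()` is a hypothetical stub standing in for your forward/backward/optimizer code, and `rank` would come from `dist.get_rank()` in the real DDP script. It uses `time.perf_counter()`, which has better resolution than `time.time()` (either satisfies the assignment).

```python
import time

WARMUP_STEPS = 5      # discarded from the statistics
TOTAL_STEPS = 30

def train_step():
    """Hypothetical stub for forward + backward + optimizer step."""
    time.sleep(0.001)

rank = 0              # in the real script: rank = dist.get_rank()
step_times = []

for step in range(TOTAL_STEPS):
    start = time.perf_counter()   # start before the forward pass
    train_step()
    # on GPU, call torch.cuda.synchronize() here so the step has really finished
    elapsed = time.perf_counter() - start
    if rank == 0 and step >= WARMUP_STEPS:
        step_times.append(elapsed)
        print(f"step {step}: {elapsed * 1000:.2f} ms")

mean_time = sum(step_times) / len(step_times)
print(f"mean over {len(step_times)} measured steps: {mean_time * 1000:.2f} ms")
```

The `torch.cuda.synchronize()` comment matters on real hardware: CUDA kernels launch asynchronously, so without it the timer may stop before the GPU work is done.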

Submission:

1) Your modified DDP script
2) A short report (max 1 page) answering:

  • Q1) Why is warmup needed in GPU benchmarking?
  • Q2) Why must we control randomness?
Task 2: Strong Scaling Analysis (3 points)

1) Keep global batch size fixed (64), and run:

torchrun --nproc_per_node=1 ...
torchrun --nproc_per_node=2 ...
torchrun --nproc_per_node=4 ...

2) Compute speedup and efficiency:

# GPUs   Time/step   Speedup   Efficiency
1                    1.0       1.0
2
4
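The two derived columns follow directly from the measured times; a minimal helper (the times below are hypothetical placeholders for your own measurements):

```python
def speedup_and_efficiency(t1, tn, n):
    """Speedup S = T(1)/T(n); efficiency E = S/n, the fraction of ideal linear scaling."""
    s = t1 / tn
    return s, s / n

# hypothetical per-step times in seconds; replace with your measurements
t1 = 0.100
for n, tn in [(1, 0.100), (2, 0.060), (4, 0.040)]:
    s, e = speedup_and_efficiency(t1, tn, n)
    print(f"{n} GPU(s): speedup {s:.2f}x, efficiency {e:.0%}")
```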

Submission:

1) Your modified DDP script
2) Completed table
3) Short explanation:

  • Q1) Is scaling linear?
  • Q2) Where does efficiency drop?
  • Q3) Give a quantitative explanation (not just words)
Task 3: Communication vs. Computation (3 points)

Create two scenarios:

  • Case A — Small model (communication dominates)
DIM = 1024
DEPTH = 2
  • Case B — Large model (computation dominates)
DIM = 8192
DEPTH = 8
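A rough parameter count shows how far apart the two cases are. This sketch assumes the benchmark model is a plain stack of square Linear layers (DIM x DIM weights plus biases); adjust the formula to your actual script:

```python
def mlp_params(dim, depth):
    """Parameters of `depth` stacked square Linear layers: depth * (dim^2 weights + dim biases).
    Assumes a plain MLP; adapt to your actual model."""
    return depth * (dim * dim + dim)

small = mlp_params(1024, 2)   # Case A
large = mlp_params(8192, 8)   # Case B
print(f"Case A: {small:,} params, Case B: {large:,} params "
      f"({large / small:.0f}x more)")
```

Per-step compute grows with the parameter count, while the all-reduce volume grows with it too but overlaps with backward computation, which is why the larger model tends to hide communication better.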

Submission:

1) Completed table

Case    # GPUs   Time/step   Speedup
Small   1                    1.0
Small   2
Small   4
Large   1                    1.0
Large   2
Large   4

2) Short explanation:

  • Q1) In which case is DDP more efficient?
  • Q2) Why does performance differ between small and large models?
  • Q3) When does communication become the bottleneck?
Task 4: DataParallel vs. DDP (Deep Comparison) (3 points)

1) Run both:

  • DP (Practical 5)
  • DDP (this week)

2) Analyze:

  • GPU utilization (via nvidia-smi)
  • Step time variance
  • Memory usage
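Step-time variance can be compared with a few lines of pure Python once you have the per-step logs from Task 1. The timing values below are hypothetical placeholders for your own DP and DDP measurements:

```python
import statistics

def step_time_stats(times):
    """Mean and population standard deviation of per-step times (seconds)."""
    return statistics.mean(times), statistics.pstdev(times)

# hypothetical logs; replace with your measured DP and DDP step times
dp_times  = [0.120, 0.150, 0.110, 0.160, 0.130]
ddp_times = [0.070, 0.072, 0.069, 0.071, 0.070]

for name, ts in [("DP", dp_times), ("DDP", ddp_times)]:
    m, s = step_time_stats(ts)
    print(f"{name}: mean {m * 1000:.1f} ms, std {s * 1000:.1f} ms")
```

Higher variance under DP often reflects the scatter/gather traffic funnelled through GPU 0, which is exactly what Q1 asks you to explain.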

Submission:

  • Q1) Why does DP suffer from a bottleneck on GPU 0?
  • Q2) Why does DDP scale better architecturally?
  • Q3) In what scenario could DP still be acceptable?
Task 5: Research Challenge (3 points)

1) Choose one:

Option A — Artificial Communication Delay

  • Add a delay before the backward pass:
import time
time.sleep(0.01)  # simulate 10 ms of extra communication latency per step
  • Analyze:
    • Q1) How does this affect scaling?
    • Q2) Does speedup degrade linearly or non-linearly?
    • Q3) Relate your observation to Amdahl’s Law
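For Q3, Amdahl's Law gives a quantitative prediction to compare against. A fixed delay that every rank waits on behaves like serial work; the 0.09 serial fraction below is a hypothetical value (a 10 ms delay on top of a 100 ms perfectly parallel step), not a measurement:

```python
def amdahl_speedup(f, n):
    """Amdahl's Law: S(n) = 1 / (f + (1 - f) / n), where f is the serial
    (non-parallelizable) fraction of the work per step."""
    return 1.0 / (f + (1.0 - f) / n)

# hypothetical: 10 ms delay on a 100 ms parallel step -> f ~ 0.09
for n in (1, 2, 4):
    print(f"{n} GPU(s): predicted speedup {amdahl_speedup(0.09, n):.2f}x")
```

Comparing these predictions with your measured speedups tells you whether the delay really acts as a serial fraction or interacts with communication in a more complicated way.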

Option B — Batch Size Scaling Law

  • Keep GPUs fixed (e.g., 4), vary batch:

32, 64, 128, 256

  • Analyze:
    • Q1) Does larger batch improve scaling?
    • Q2) When does performance saturate?
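A natural metric for Q1 and Q2 is throughput rather than raw step time; saturation shows up when doubling the batch stops increasing samples per second. The step times below are hypothetical placeholders for your own 4-GPU measurements:

```python
def samples_per_second(global_batch, step_time):
    """Throughput in samples/s: global batch size divided by per-step time."""
    return global_batch / step_time

# hypothetical per-step times with GPUs fixed at 4; replace with measurements
for batch, t in [(32, 0.020), (64, 0.024), (128, 0.040), (256, 0.078)]:
    print(f"batch {batch}: {samples_per_second(batch, t):,.0f} samples/s")
```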

Option C — Imbalance Experiment

  • Modify the workload so one rank is slower:
if dist.get_rank() == 0:
    time.sleep(0.02)  # rank 0 does 20 ms of extra "work" every step
  • Analyze:
    • Q1) What happens to overall training time?
    • Q2) What does this reveal about synchronization?
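The expected effect for Q1 follows from how DDP synchronizes: gradients are all-reduced every step, so every rank waits for the slowest one. A one-line model of this (the per-rank times are hypothetical; your measured effective step time is what the report should show):

```python
def synchronized_step_time(per_rank_times):
    """With a per-step all-reduce, the effective step time is the
    maximum per-rank time, not the average."""
    return max(per_rank_times)

# hypothetical: rank 0 sleeps an extra 20 ms on top of a 70 ms step
times = [0.070 + 0.020, 0.070, 0.070, 0.070]
print(f"effective step time: {synchronized_step_time(times) * 1000:.0f} ms")
```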

2) Students must:

  • Use numbers (not opinions)
  • Show tables + short reasoning
  • Explain why, not just what

Submission:

1) Code for your experiment
2) Table of results
3) Short explanation (max 12-15 lines)

