Institute of Computer Science, University of Tartu
Parallelism in Deep Learning (LTAT.06.030)
Parallelism in Deep Learning 2025/26 spring


Homework 2

General Instructions & Submission


Release & Deadline
  • Release date: 9 April 2026
  • Deadline: 23 April 2026 (23:59)
Submission Requirements

Each student must submit:

1) Code files

  • All modified scripts used in the homework
  • Must be runnable

2) Report (PDF — single file) Include:

  • Answers to all questions
  • Tables of Results
  • Short explanations

Task 1—Combine the Codes (6 Points)

Combine Code 1 and Code 2 to create an optimized training script.

1) Requirements
Students must:

  • Start from Code 1
  • Integrate AMP (Automatic Mixed Precision) from Code 2
  • Produce a new file named: practical5_single.py

2) Implementation Instructions
Your implementation must include:

  • autocast()
  • GradScaler()
  • scaler.scale(loss).backward()
  • scaler.step(optimizer)
  • scaler.update()

Hints:

  • Keep the gradient accumulation logic from Code 1 unchanged, and integrate AMP inside it.
  • Do not modify the training logic structure — only enhance it with AMP.

Task 2—Run and Compare All Versions (6 Points)

1) Requirements
Students must run the following three versions:

  • Code 1: DDP + Gradient Accumulation
  • Code 2: DDP + AMP
  • Code 3: Combined (Accumulation + AMP)

2) Comparison Table
Fill in the table based on your observations:

Code   | Time (s) | Memory Usage (GB) | Stability
-------|----------|-------------------|----------
Code 1 |          |                   |
Code 2 |          |                   |
Code 3 |          |                   |

3) Comparison Table: Vary ACCUM_STEPS in Code 1 and Code 3
Run both codes with each of the values ACCUM_STEPS = 4, 8, 16, 32:

Code   | ACCUM_STEPS | Time (s) | Memory Usage (GB) | Observations
-------|-------------|----------|-------------------|-------------
Code 1 | 4           |          |                   |
Code 1 | 8           |          |                   |
Code 1 | 16          |          |                   |
Code 1 | 32          |          |                   |
Code 3 | 4           |          |                   |
Code 3 | 8           |          |                   |
Code 3 | 16          |          |                   |
Code 3 | 32          |          |                   |
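One possible way to fill the Time (s) and Memory Usage (GB) columns is sketched below. The helper name `measure` is an assumption, not part of the provided codes; on a CUDA machine, `torch.cuda.max_memory_allocated()` reports the peak tensor allocation since the last reset.

```python
# Sketch: time one training run and record peak GPU memory in GB.
import time
import torch

def measure(train_fn):
    """Run train_fn once; return (elapsed_seconds, peak_gpu_gb)."""
    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats()     # clear the peak counter
        torch.cuda.synchronize()                 # flush pending kernels before timing
    start = time.perf_counter()
    train_fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()                 # wait for all GPU work to finish
    elapsed = time.perf_counter() - start
    peak_gb = (torch.cuda.max_memory_allocated() / 1e9
               if torch.cuda.is_available() else 0.0)
    return elapsed, peak_gb
```

Under DDP, call this inside each process and report the rank-0 values, since every rank runs the same workload.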

Task 3—Analysis Report (3 Points)

Write a short report (1-2 pages) answering:

  • Q1) Compare the execution times of the three versions and explain the differences in terms of gradient synchronization and numerical precision.
  • Q2) Compare GPU memory usage and explain how gradient accumulation and mixed precision affect memory differently.
  • Q3) Explain how gradient accumulation changes the effective batch size. Why must the loss be divided by ACCUM_STEPS?
