Arvutiteaduse instituut, Tartu Ülikool

Paralleelsus süvaõppes (LTAT.06.030), 2025/26 kevad


Homework 3

General Instructions & Submission


Release & Deadline
  • Release date: 16 April 2026
  • Deadline: 30 April 2026 (23:59)
Submission Requirements

Each student must submit:

1) Code files

  • All modified scripts used in the homework
  • Must be runnable

2) Report (a single PDF file), including:

  • Answers to all questions
  • Tables of results
  • Short explanations

Base Code

Use the LargeLinearModelMP class code provided in our Lecture 8 practical session.

Use the following script:

  • Download
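The actual class is defined in the Lecture 8 script linked above. As a reference point only, a minimal sketch of what such a two-device pipeline typically looks like is shown below (the constants, the exact layer sizes, and the CPU fallback are assumptions, not the provided code):

```python
import torch
import torch.nn as nn

DIM, DEPTH = 128, 6  # assumed defaults; use the values from the provided script

class LargeLinearModelMP(nn.Module):
    """Sketch of a two-device model-parallel stack of Linear layers."""

    def __init__(self, dim=DIM, depth=DEPTH):
        super().__init__()
        # Use two GPUs when available; fall back to CPU so the sketch runs anywhere.
        if torch.cuda.device_count() >= 2:
            self.devices = [torch.device("cuda", 0), torch.device("cuda", 1)]
        else:
            self.devices = [torch.device("cpu"), torch.device("cpu")]
        self.layers = nn.ModuleList()
        for i in range(depth):
            # The splitting logic referenced in Task 1: first half of the
            # layers on device 0, the rest on device 1.
            device = self.devices[0] if i < depth // 2 else self.devices[1]
            self.layers.append(nn.Linear(dim, dim).to(device))

    def forward(self, x):
        for layer in self.layers:
            layer_device = next(layer.parameters()).device
            x = x.to(layer_device)  # the "baton pass" between devices (Task 5)
            x = layer(x)
        return x

model = LargeLinearModelMP()
out = model(torch.randn(4, DIM))
print(out.shape)  # torch.Size([4, 128])
```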

Task 1: Current Behavior (3 Points)

Focus on the splitting logic: if i < depth // 2:

1) Do: Run the provided code.
2) Answer:

  • Q1) How many layers are assigned to GPU 0 and GPU 1?
  • Q2) In the forward() method, at what exact point does the data move from GPU 0 to GPU 1?

Task 2: Increase Model Size (3 Points)

Focus on the configuration constants: DIM and DEPTH.

  • Do: Increase DEPTH (e.g., from 6 to 12, then 24).
  • Submit: A table comparing DEPTH vs. Step Time.
  • Answer: Does increasing the number of layers improve speed? Explain why or why not based on your observations.
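For the Step Time column, note that GPU kernels launch asynchronously, so the clock must be read only after synchronization (see also the hint at the end of this page). A minimal timing sketch, using a plain nn.Sequential stand-in for the model and assumed hyperparameters:

```python
import time
import torch
import torch.nn as nn

# Sketch: timing one training step. dim/depth/batch size are assumptions;
# substitute the DIM and DEPTH values you are benchmarking.
dim, depth = 64, 6
model = nn.Sequential(*[nn.Linear(dim, dim) for _ in range(depth)])
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(32, dim), torch.randn(32, dim)

if torch.cuda.is_available():
    torch.cuda.synchronize()  # make sure prior GPU work has finished
start = time.perf_counter()

loss = nn.functional.mse_loss(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()

if torch.cuda.is_available():
    torch.cuda.synchronize()  # wait for this step's kernels before stopping the clock
step_time = time.perf_counter() - start
print(f"DEPTH={depth}: {step_time * 1e3:.2f} ms")
```

Averaging over several steps (after a warm-up step) gives more stable numbers for the table.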

Task 3: Use All GPUs (3 Points)

Refactor the model to be dynamic.

  • Do: Modify LargeLinearModelMP so it automatically detects all available GPUs (torch.cuda.device_count()) and distributes the layers evenly across all of them.
  • Submit: Your modified __init__ method code.
  • Answer: Explain the logic you used to calculate which GPU receives which layer index.
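Your distribution scheme is up to you; one common choice is to split the layer stack into (nearly) equal contiguous chunks. The index arithmetic can be sketched in plain Python (the function name is hypothetical):

```python
# Sketch: map layer index i to a GPU id by splitting `depth` layers into
# `n_gpus` contiguous, nearly equal chunks. Pure index math, no torch needed.
def layer_to_gpu(i, depth, n_gpus):
    # floor(i * n_gpus / depth) grows from 0 to n_gpus - 1 as i increases,
    # producing contiguous blocks of roughly depth / n_gpus layers each.
    return i * n_gpus // depth

depth, n_gpus = 20, 5
assignment = [layer_to_gpu(i, depth, n_gpus) for i in range(depth)]
print(assignment)
# [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]
```

Contiguous chunks keep the number of cross-device hops in a forward pass to a minimum, which matters for Task 5.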

Task 4: Run with More GPUs (3 Points)

Use your modified code from Task 3.

  • Do: Set DEPTH = 20. Run the code using:
    • 2 GPUs
    • 5 GPUs
  • Submit: A table showing Step Time and GPU Memory Usage for each configuration.
  • Answer: Are all GPUs being utilized? Does adding more GPUs make the training faster in this specific setup?

Task 5: Data Transfer (3 Points)

Focus on the "Baton Pass" in forward(): x = x.to(layer_device). Note that x.to(layer_device) is executed for every layer, but it is a no-op when x is already on that device; count only the calls that actually move data across devices.

  • Do: Count how many times x.to(layer_device) is triggered in one single forward pass.
  • Submit:
    • Number of transfers with 2 GPUs:
    • Number of transfers with 5 GPUs:
  • Answer: What is the relationship between the number of GPUs and the total communication overhead?
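Assuming a contiguous chunk assignment as in Task 3, the actual transfers can be counted from the device sequence alone; a plain-Python sketch (assignment formula assumed, your scheme may differ):

```python
# Sketch: count actual cross-device moves in one forward pass, assuming
# layers are assigned to GPUs in contiguous chunks (as in the Task 3 sketch).
def count_transfers(depth, n_gpus):
    devices = [i * n_gpus // depth for i in range(depth)]
    # x.to(device) only moves data when consecutive layers live on
    # different devices, i.e. at each chunk boundary.
    return sum(1 for a, b in zip(devices, devices[1:]) if a != b)

for n in (2, 5):
    print(f"{n} GPUs -> {count_transfers(20, n)} transfers")
# 2 GPUs -> 1 transfers
# 5 GPUs -> 4 transfers
```

This excludes the initial move of the input batch onto the first GPU; include it or not in your answer, but state your convention.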

Hints:
  • Visualization: To understand the distribution,
    • you can print the device of each layer after initialization, e.g. print(next(layer.parameters()).device) — note that layer.parameters() returns an iterator, which itself has no .device attribute.
  • Efficiency: Remember that moving data between devices takes time.
    • Use torch.cuda.synchronize() to ensure your Step Time measurements are accurate.
