Institute of Computer Science, University of Tartu
Parallelism in Deep Learning (LTAT.06.030), 2025/26 spring

Final Project

Building a High-Performance Hybrid Parallel Engine

Project Description and Required Tasks

This project is intentionally open-ended. Different implementation approaches are acceptable as long as the technical objectives are addressed and properly analyzed.

In Practical 10, we implemented a "naive" hybrid parallel system. While it worked, it was sequential (Stage 2 waited for Stage 1) and hard-coded (it only worked on a specific 4-GPU setup).

Your Mission: Transform that educational script into a professional Hybrid Parallel Engine. You will move from a conceptual model to a system that supports dynamic scaling, asynchronous execution, and memory optimization.


Technical Objectives

You are required to upgrade the base code to meet these four milestones:

  • Milestone 1: Dynamic Hybrid Parallel Topology (Scaling) (10 Points)
    • The Problem: The current code is hard-coded for a 4-GPU environment.
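    • The Task: Use argparse to let the user define the 3D grid dimensions on the command line: the data-parallel (DP), pipeline-parallel (PP), and model-parallel (MP) sizes.
    • Requirement: The script must automatically assign each GPU rank to its correct DP, PP, and MP process groups for any grid where DP × PP × MP equals the number of launched processes.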
    • Command Example: torchrun --nproc_per_node=8 train.py --dp 2 --pp 2 --mp 2
  • Milestone 2: Pipeline Overlap (Performance) (12 Points)
    • The Problem: The current micro-batch loop is "Stop-and-Wait."
    • The Task: Implement Asynchronous Pipelining.
    • Requirement: Attempt asynchronous pipelining using CUDA streams, RPC, or another overlap strategy. The goal is to reduce idle GPU time by overlapping the execution of different micro-batches whenever possible.
  • Milestone 3: Advanced Optimization (AMP) (8 Points)
    • The Problem: Standard 32-bit training is slow and consumes significant memory.
    • The Task: Integrate Automatic Mixed Precision (AMP) using torch.cuda.amp (exposed as torch.amp in recent PyTorch releases).
    • Requirement: The engine must support float16 training and apply the gradient scaling needed to keep training numerically stable.
  • Milestone 4: Performance Profiling (10 Points)
    • The Task: Conduct a comparative study.
    • Requirement: Compare your optimized engine against the "Base Code" from Practical 10. You must report:
      • Throughput: Samples processed per second.
      • Memory Savings: Maximum batch size achievable with AMP vs. without.
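For Milestone 1, the mapping from a global rank to its three process groups can be sketched in plain Python before any PyTorch code is involved. The rank ordering below (MP index varying fastest, then PP, then DP) is an assumption, not something the project prescribes, and the helper names parse_grid, rank_to_coords, and build_groups are hypothetical.

```python
import argparse
from itertools import product


def parse_grid(argv=None):
    """Parse the --dp/--pp/--mp grid sizes (argv=None reads sys.argv)."""
    parser = argparse.ArgumentParser(description="3D parallel grid (sketch)")
    parser.add_argument("--dp", type=int, default=1, help="data-parallel size")
    parser.add_argument("--pp", type=int, default=1, help="pipeline-parallel size")
    parser.add_argument("--mp", type=int, default=1, help="model-parallel size")
    return parser.parse_args(argv)


def rank_to_coords(rank, pp, mp):
    """Map a global rank to (dp, pp, mp) indices, with mp varying fastest."""
    dp_idx, rest = divmod(rank, pp * mp)
    pp_idx, mp_idx = divmod(rest, mp)
    return dp_idx, pp_idx, mp_idx


def build_groups(dp, pp, mp, world_size):
    """Return the rank lists of every MP, PP, and DP group in the grid."""
    if dp * pp * mp != world_size:
        raise ValueError(
            f"dp*pp*mp = {dp * pp * mp} does not match world size {world_size}"
        )

    def rank_of(d, p, m):
        # Inverse of rank_to_coords under the assumed ordering.
        return (d * pp + p) * mp + m

    return {
        # Ranks that differ only in their mp index form one MP group, etc.
        "mp": [[rank_of(d, p, m) for m in range(mp)]
               for d, p in product(range(dp), range(pp))],
        "pp": [[rank_of(d, p, m) for p in range(pp)]
               for d, m in product(range(dp), range(mp))],
        "dp": [[rank_of(d, p, m) for d in range(dp)]
               for p, m in product(range(pp), range(mp))],
    }


# Explicit argv for the demo; a real script would call parse_grid() and
# take the world size from the WORLD_SIZE variable that torchrun sets.
args = parse_grid(["--dp", "2", "--pp", "2", "--mp", "2"])
groups = build_groups(args.dp, args.pp, args.mp, world_size=8)
for axis, rank_lists in groups.items():
    print(axis, rank_lists)
```

In a real engine, each rank list would typically be passed to torch.distributed.new_group(ranks=...). Every process has to make those calls for every group, in the same order, including groups it does not belong to. Which axis varies fastest is a design choice: with the ordering assumed here, MP groups consist of consecutive ranks, which usually places them on the same node.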
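The gradient scaling required in Milestone 3 exists because float16 cannot represent very small values. The sketch below uses only the standard library to show the effect; it illustrates the idea behind loss scaling and is not the torch.cuda.amp API. The gradient value and scale factor are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


grad = 1e-8              # hypothetical small gradient value
loss_scale = 2.0 ** 16   # hypothetical scale factor (a power of two)

# Stored directly in float16, the gradient is below the smallest
# representable magnitude and becomes zero.
unscaled = to_fp16(grad)

# Scaling the loss scales every gradient by the same factor, which moves
# the value into the representable range of float16.
scaled = to_fp16(grad * loss_scale)

# Dividing by the scale in higher precision recovers the gradient
# up to float16 rounding error.
recovered = scaled / loss_scale

print("stored without scaling:", unscaled)
print("recovered after scaling:", recovered)
```

In PyTorch, torch.cuda.amp.GradScaler automates these steps: it scales the loss, unscales gradients before the optimizer step, skips steps whose gradients contain inf or NaN, and adjusts the scale factor over time.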

Deliverables

Students must submit a .zip file containing:
1) hybrid_engine.py: Your complete, commented source code.
2) A text file containing the torchrun commands used for the experiments.
3) Technical Report (PDF):
  • A diagram of your 3D Grid mapping.
  • An explanation of how you handled the Pipeline Overlap.
  • Performance tables and a brief conclusion on the bottlenecks you observed.

Submission Deadline: The final project must be submitted by 31 May 2026 (23:59). Late submissions may not be accepted unless approved in advance.
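For the performance tables in the report, throughput can be measured with a small timing helper such as the sketch below. The function measure_throughput and its parameters are hypothetical, and the sleeping dummy_step only stands in for a real training step.

```python
import time


def measure_throughput(step_fn, batch_size, steps=20, warmup=3, sync_fn=None):
    """Return samples per second for step_fn, excluding warm-up iterations.

    sync_fn is called before starting and before stopping the timer. On a
    GPU this would typically be torch.cuda.synchronize, because CUDA kernels
    are launched asynchronously and the timer would otherwise stop early.
    """
    for _ in range(warmup):
        step_fn()
    if sync_fn is not None:
        sync_fn()

    start = time.perf_counter()
    for _ in range(steps):
        step_fn()
    if sync_fn is not None:
        sync_fn()
    elapsed = time.perf_counter() - start

    return batch_size * steps / elapsed


def dummy_step():
    """Stand-in for one training step (assumption: about 5 ms of work)."""
    time.sleep(0.005)


samples_per_second = measure_throughput(dummy_step, batch_size=32, steps=10)
print(f"throughput: {samples_per_second:.0f} samples/s")
```

Running the same helper on the Practical 10 base code and on the optimized engine, with identical batch sizes, gives directly comparable rows for the table. For the memory comparison, torch.cuda.max_memory_allocated() reports the peak memory allocated by tensors on a device.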


Getting Started
  • Base Code: Use the script provided in Practical 10 as your starting point.
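  • Environment: Use the same HPC allocation parameters: srun --partition=gpu --gres=gpu:4. With one process per GPU, this allocation fits grids where DP × PP × MP = 4; the 8-process command example in Milestone 1 would typically need an 8-GPU allocation.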
  • Tip: Start by making the configuration dynamic (Milestone 1) before attempting the asynchronous streams (Milestone 2).
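Before writing stream or RPC code for Milestone 2, it can help to estimate how much idle time overlap can remove. The sketch below is a simplified model, not a measurement: it assumes forward passes only, equal compute time on every stage, and zero communication cost. It also assumes the "Stop-and-Wait" baseline runs one micro-batch through all stages before starting the next. The function names are hypothetical.

```python
def sequential_time(num_stages, num_microbatches, stage_time=1.0):
    """Total time when each micro-batch finishes all stages before the next starts."""
    return num_stages * num_microbatches * stage_time


def pipelined_time(num_stages, num_microbatches, stage_time=1.0):
    """Total time when a stage starts a micro-batch as soon as it can.

    finish[s][m] is the time at which stage s finishes micro-batch m.
    A stage can start once its input has arrived from the previous stage
    and it has finished its own previous micro-batch.
    """
    finish = [[0.0] * num_microbatches for _ in range(num_stages)]
    for s in range(num_stages):
        for m in range(num_microbatches):
            input_ready = finish[s - 1][m] if s > 0 else 0.0
            stage_free = finish[s][m - 1] if m > 0 else 0.0
            finish[s][m] = max(input_ready, stage_free) + stage_time
    return finish[-1][-1]


def bubble_fraction(num_stages, num_microbatches):
    """Fraction of each stage's time spent idle in the pipelined schedule."""
    total = pipelined_time(num_stages, num_microbatches)
    busy = float(num_microbatches)  # each stage works one time unit per micro-batch
    return 1.0 - busy / total


stages, microbatches = 2, 4  # e.g. --pp 2 with four micro-batches
print("sequential:", sequential_time(stages, microbatches))
print("pipelined: ", pipelined_time(stages, microbatches))
print("idle share:", bubble_fraction(stages, microbatches))
```

Under these assumptions the pipelined schedule takes (stages + micro-batches - 1) time units, so increasing the number of micro-batches shrinks the idle share. Real measurements will differ because of communication, backward passes, and uneven stage costs.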

Note to Students: This project mimics the real-world challenges faced by engineers at companies like OpenAI and Meta. Focus on the communication between GPUs; that is where the real magic happens.

