Practical 10
Hybrid Parallelism (DP + MP + Pipeline Concept)
Objective
In this practical session, you will:
- Run a hybrid parallel training system
- Identify:
  - Data Parallelism (DP)
  - Model Parallelism (MP)
  - Pipeline concept (micro-batching)
- Modify the system and observe behavior changes
Background
Students should:
- Understand DP, MP, and the pipeline concept (from lecture)
- Have access to:
  - A multi-GPU machine (≥ 4 GPUs)
  - PyTorch with distributed support
Setup Instructions
- Step 1 — Allocate GPUs (HPC):
  srun --partition=gpu --gres=gpu:4 --pty bash
- Step 2 — Run the code:
  torchrun --nproc_per_node=2 python/sample.py
Part 1—Run and Observe
1) Use the following script:
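The full sample.py is not reproduced here; below is a minimal sketch of the structure the later parts assume. The constants BATCH_SIZE and MICRO_BATCHES and the device1 assignment come from Parts 2–5; the two-stage linear model, the stage1/stage2 names, and the tensor dimensions are illustrative assumptions.

```python
# Hypothetical sketch of python/sample.py (not the original listing).
import os

import torch
import torch.distributed as dist
import torch.nn as nn

BATCH_SIZE = 64
MICRO_BATCHES = 4
IN_DIM, HIDDEN, OUT_DIM = 128, 256, 10  # illustrative shapes

def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Model parallelism: each process splits its model across two GPUs.
    device0 = torch.device(f"cuda:{local_rank}")      # stage 1
    device1 = torch.device(f"cuda:{local_rank + 2}")  # stage 2
    stage1 = nn.Linear(IN_DIM, HIDDEN).to(device0)
    stage2 = nn.Linear(HIDDEN, OUT_DIM).to(device1)
    params = list(stage1.parameters()) + list(stage2.parameters())
    opt = torch.optim.SGD(params, lr=0.01)
    loss_fn = nn.MSELoss()

    # Data parallelism: each rank draws its own (here: random) batch.
    x = torch.randn(BATCH_SIZE, IN_DIM)
    y = torch.randn(BATCH_SIZE, OUT_DIM)

    opt.zero_grad()
    # Pipeline concept: the batch is processed as micro-batches.
    for i, (mx, my) in enumerate(zip(x.chunk(MICRO_BATCHES),
                                     y.chunk(MICRO_BATCHES))):
        h = torch.relu(stage1(mx.to(device0)))  # forward, stage 1
        out = stage2(h.to(device1))             # forward, stage 2
        loss = loss_fn(out, my.to(device1)) / MICRO_BATCHES
        loss.backward()                         # gradients accumulate
        print(f"[Rank {rank}] micro-batch {i}: loss {loss.item():.4f}")

    # DP synchronization: average gradients across ranks, then step.
    for p in params:
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad /= dist.get_world_size()
    opt.step()
    print(f"[Rank {rank}] optimizer step done (gradients synchronized)")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Under this device assignment, rank 0 uses GPUs 0 and 2 and rank 1 uses GPUs 1 and 3, so the two processes together cover all four allocated GPUs.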
2) Questions:
- Q1) How many processes are running?
- Q2) Which GPUs does each rank use?
- Q3) How many micro-batches are processed?
- Q4) When does synchronization happen?
Part 2—Modify Micro-batches
1) Change:
MICRO_BATCHES = 4
2) Try:
MICRO_BATCHES = 2
MICRO_BATCHES = 8
3) Questions:
- Q1) What changes in the output?
- Q2) How many forward/backward steps now?
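If the batch is split with torch.Tensor.chunk as in the Part 1 sketch, the number of forward/backward passes per optimizer step equals MICRO_BATCHES. A quick standalone check (hypothetical, mirroring that sketch):

```python
import torch

x = torch.randn(64, 128)  # BATCH_SIZE = 64
for m in (2, 4, 8):
    chunks = x.chunk(m)
    print(f"MICRO_BATCHES={m}: {len(chunks)} micro-batches "
          f"of size {chunks[0].shape[0]}")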
Part 3—Change Batch Size
1) Modify:
BATCH_SIZE = 64
2) Try:
BATCH_SIZE = 32
BATCH_SIZE = 128
3) Questions:
- Q1) Does execution pattern change?
- Q2) What stays the same?
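With MICRO_BATCHES held fixed, changing BATCH_SIZE changes the size of each micro-batch rather than the number of steps, assuming the chunk-based split from the Part 1 sketch:

```python
import torch

for bs in (32, 64, 128):
    chunks = torch.randn(bs, 128).chunk(4)  # MICRO_BATCHES = 4
    print(f"BATCH_SIZE={bs}: {len(chunks)} micro-batches "
          f"of size {chunks[0].shape[0]}")
```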
Part 4—Change Number of Processes (DP)
1) Run: torchrun --nproc_per_node=1 python/sample.py
2) Questions:
- Q1) What happens to [Rank 1]?
- Q2) Is synchronization still happening?
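A hypothetical diagnostic line (assuming the Part 1 sketch) makes the answer visible: print the world size after initialization. With --nproc_per_node=1 only rank 0 exists, and the all-reduce still executes but averages over a single rank.

```python
import torch.distributed as dist

# After init_process_group(): report how many ranks are participating.
print(f"[Rank {dist.get_rank()}] world_size = {dist.get_world_size()}")
```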
Part 5—Break Model Parallelism (Important)
1) Modify the code:
- Replace:
  device1 = torch.device(f"cuda:{local_rank + 2}")
- With:
  device1 = torch.device(f"cuda:{local_rank}")
2) Questions:
- Q1) What changes in output?
- Q2) Are multiple GPUs still used?
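To verify where each stage lives after the change, a hypothetical check using the stage1/stage2 names from the Part 1 sketch:

```python
# Hypothetical check: report which GPU holds each model stage.
print(f"[Rank {rank}] stage1 on {next(stage1.parameters()).device}, "
      f"stage2 on {next(stage2.parameters()).device}")
```

After the modification both stages report cuda:{local_rank}, so each process runs on a single GPU; with two processes, two of the four GPUs remain in use. Data parallelism survives, but model parallelism does not.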