Homework 1 (5 points)
Deadline: Wednesday, September 24, 23:59 (no late submissions)
Watch Andrej Karpathy’s “Let's build GPT: from scratch, in code, spelled out” video and review the accompanying code.
Your task is to
- write down any questions or parts that you found confusing,
- reflect briefly on what you learned,
- answer the following three questions:
  - Why are the attention scores scaled by the square root of the head size?
  - In the video, attention is described as a communication mechanism in which the elements of the sequence can be seen as nodes in a directed graph. What has to change in the self-attention of the decoder-only transformer implemented in the video so that every node has a connection to every other node and to itself (a complete digraph with self-loops)?
  - Why is the `torch.tril` function useful, according to Karpathy?
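As a refresher for the last question, the causal-masking pattern from the video can be sketched as follows (a minimal sketch with uniform logits rather than the full query/key attention; the toy sequence length `T = 4` is an assumption for illustration):

```python
import torch

T = 4  # toy sequence length (assumed for illustration)

# Lower-triangular matrix of ones: entry (i, j) is 1 iff j <= i.
tril = torch.tril(torch.ones(T, T))

# Start from zero attention logits, then forbid attending to future
# positions by setting them to -inf before the softmax.
wei = torch.zeros(T, T)
wei = wei.masked_fill(tril == 0, float("-inf"))
wei = torch.softmax(wei, dim=-1)

print(wei)
# Each row i is a probability distribution over positions 0..i only;
# row 1, for example, is [0.5, 0.5, 0.0, 0.0].
```

Because `softmax` turns `-inf` into exactly zero weight, each position can only aggregate information from itself and earlier positions.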
This assignment is intended to help you become familiar with the inner workings of GPT-style models and Transformers in general. Your submission should be no more than one page.
Grading: Full credit (5 points) will be awarded for complete submissions that address all parts of the task.
Note: Use of AI tools such as ChatGPT is allowed and even encouraged, but must be disclosed (e.g., via a short description or by attaching the relevant conversation history).
Video link: https://www.youtube.com/watch?v=kCc8FmEb1nY&t=107s