Homeworks
Submit both your code and report.
1. Homework 1: Imitation Learning
Solutions can no longer be submitted for this assignment.
2. Homework 2: Policy Gradient, tasks 1-5
Solutions can no longer be submitted for this assignment.
3. Homework 2: Policy Gradient, tasks 6-8
Solutions can no longer be submitted for this assignment.
4. Homework 3: Q-Learning, Actor-Critic
Solutions can no longer be submitted for this assignment.
5. Homework 4: Model-Based RL
Solutions can no longer be submitted for this assignment.
Homework presentations
Fill in your choice here.
| Homework | Person |
| Homework 1 (20.09.2018) | |
| Behavioral Cloning (easy) | Markus Kängsepp |
| DAgger (medium) | Markus Loide |
| Homework 2 tasks 1-5 (11.10.2018) | |
| State-dependent baseline (mathy) | Anton Potapchuk |
| Implement neural network (easy) | Andre Tättar |
| Implement policy gradient (medium) | Novin Shahroudi |
| CartPole (easy) | Sebastian Värv |
| InvertedPendulum (easy) | Kristjan Veskimäe |
| Homework 2 tasks 6-8 (25.10.2018) | |
| Neural network baseline (medium) | Maksym Semikin |
| LunarLander (medium) | Daniel Majoral |
| HalfCheetah (medium) | Kristjan Veskimäe |
| Bonus: implement parallelization (hard?) | Aqeel Labash? |
| Bonus: implement generalized advantage estimation (easy) | Novin Shahroudi |
| Bonus: implement multi-step policy gradient (easy) | |
| Homework 3 (15.11.2018) | |
| Basic Q-learning (medium) | Hannes Liik |
| Double Q-learning (easy) | Laura Ruusmann |
| Hyperparameter search (easy) | |
| Bonus: Actor-Critic (easy?) | Oriol Corcoll |
| Homework 4 (13.12.2018) | |
| Implement dynamics model (medium) | |
| Implement action selection (medium) | |
| Implement model-based reinforcement learning (easy?) | Hasan Sait Arslan |
| Hyperparameter search (easy) | |
| Bonus: use CEM for action selection | |
| Bonus: use multi-step loss | |
The people listed above will present their homework solutions. EVERYBODY is expected to submit their homework the day after the presentation (bonus exercises excluded).