Special Course in Machine Learning: AI Safety
The purpose of this course is to understand the safety risks that stem from misaligned AI. AI alignment refers to the extent to which an AI system advances the goals intended by its developers. A misaligned AI pursues and achieves objectives that differ somewhat (or a lot) from the intended goals.
In this course, we are going to follow a public online course by Richard Ngo (OpenAI), David Krueger (University of Cambridge), Adam Gleave (FAR) and Beth Barnes (Alignment Research Center Evals): https://course.aisafetyfundamentals.com/alignment
Next up: Week 14
- Work on your project according to the plan.
- Record your results and progress in your research journal by Tuesday night at the latest (12.12.2023; approximately 1 page, min 0.5, max 2 pages).
- Seminar on Wednesday (13.12.2023) at 16:15 in Zoom.
- Prepare a short presentation (7 minutes) to present your project results.
Syllabus
- Seminar 1: Introduction to machine learning: Discussion points
- Seminar 2: Artificial General Intelligence: Discussion points
- Seminar 3: Reward misspecification and instrumental convergence: Discussion points
- Seminar 4: Goal misgeneralization: Discussion points
- Seminar 5: Task decomposition for scalable oversight: Discussion points
- Seminar 6: Adversarial techniques for scalable oversight: Discussion points
- Seminar 7: Interpretability: Discussion points
- Seminar 8: Governance: Discussion points
- Seminar 9: Agent foundations: Discussion points
- Seminar 10: Careers and Projects: Discussion points
- Seminar 11: Projects/presentations
- Seminar 12: Projects/presentations
- Seminar 13: Projects/presentations
- Seminar 14: Projects/presentations
Prerequisites
- Prior knowledge of machine learning and deep learning is helpful, although the course may be doable and useful without it.
- If you don't have a background in machine learning, you should take the first week's readings (Introduction to machine learning) especially seriously.
Organization
The first part of the course (10 seminars) consists of a set of weekly learning materials (mostly reading), expected to take 2-4 hours to work through. Students are expected to read the materials for each week before the seminar. In the seminars, we will discuss the readings and the questions, thoughts and ideas they raise.
The last four weeks are planned for a small project. The course page offers plenty of ideas for different levels of expertise to choose from. The seminars are then used to discuss the progress on the projects and the problems raised.
The seminars are held over Zoom on Wednesdays at 16:15-17:45. The seminars are not recorded. Students are expected to participate with video.
The first seminar will take place on Wednesday, September 13th.
Join Zoom Meeting https://ut-ee.zoom.us/j/95523819971?pwd=WmF2QXFYYXBBMGdEeUV1dWpGK1dVQT09
Meeting ID: 955 2381 9971 Passcode: 853134
Passing the course
The goal of this course is to learn together about AI alignment and the related safety problems. The overall success of the course therefore relies on students coming to the seminars prepared and shaping them through active participation. Accordingly, the course is graded on preparation, attendance and active participation.
Each week students are expected to:
- Read/watch the required materials listed on the course page for that week before the seminar.
- Come to the seminar with questions, thoughts and discussion ideas based on the materials you read or watched.
- Actively participate in the seminar discussions.
During project weeks, students are expected to:
- Work on their project and make at least some progress every week.
- Briefly present their progress in the seminar.
- Raise questions and give feedback to others.
Point system
In order to pass the course, you need to collect at least 70 points according to the following system:
Each seminar is worth 7 points, for a total of 7 x 14 = 98 points, which will be rescaled to 100 points at the end.
- 3 points per seminar can be obtained from preparation, demonstrated as contributing at least one meaningful question or discussion point based on the seminar materials.
- 2 points per seminar can be obtained by attending the seminar (and taking part in the small group discussions).
- 2 points per seminar can be obtained by contributing to the general discussion in the large group.
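To estimate where you stand, the point system above can be sketched in a few lines of Python. This is only an illustration, not the official grading procedure: the function names are hypothetical, the rescaling from 98 to 100 points is assumed to be linear, and the 70-point threshold is assumed to apply to the rescaled score.

```python
# Point values per seminar, as listed in the point system above.
PREPARATION_POINTS = 3
ATTENDANCE_POINTS = 2
DISCUSSION_POINTS = 2
NUM_SEMINARS = 14
PASS_THRESHOLD = 70


def seminar_points(prepared: bool, attended: bool, contributed: bool) -> int:
    """Points earned in a single seminar (0 to 7)."""
    points = 0
    if prepared:
        points += PREPARATION_POINTS
    if attended:
        points += ATTENDANCE_POINTS
    if contributed:
        points += DISCUSSION_POINTS
    return points


def final_score(seminars: list[tuple[bool, bool, bool]]) -> float:
    """Rescale the raw total to a 100-point scale.

    ASSUMPTION: the rescaling is linear (raw * 100 / 98); the course page
    only says the total is calibrated to 100 points.
    """
    max_raw = (PREPARATION_POINTS + ATTENDANCE_POINTS + DISCUSSION_POINTS) * NUM_SEMINARS
    raw = sum(seminar_points(*s) for s in seminars)
    return raw * 100 / max_raw


def passes(seminars: list[tuple[bool, bool, bool]]) -> bool:
    """ASSUMPTION: the 70-point threshold applies to the rescaled score."""
    return final_score(seminars) >= PASS_THRESHOLD


if __name__ == "__main__":
    # Ten seminars with full marks, four missed entirely.
    record = [(True, True, True)] * 10 + [(False, False, False)] * 4
    print(round(final_score(record), 1), passes(record))
```

Under these assumptions, full marks in ten of the fourteen seminars give 70 raw points, or about 71.4 after rescaling.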
Materials
We will follow the AI Alignment Course from AI Safety Fundamentals.
Contact
Kairit Sirts, sirts@ut.ee