AI-Safety - Courses - Institute of Computer Science

Special Course in Machine Learning: AI Safety

The purpose of this course is to understand the safety risks that stem from misaligned AI. AI alignment refers to the extent to which an AI system advances the goals intended by its developers. A misaligned AI pursues and achieves objectives that differ somewhat (or a lot) from the intended goals.

In this course, we are going to follow a public online course by Richard Ngo (OpenAI), David Krueger (University of Cambridge), Adam Gleave (FAR) and Beth Barnes (Alignment Research Center Evals): https://course.aisafetyfundamentals.com/alignment

Next up: Week 14

Work on your project according to the plan.
- Record your results and progress in your research journal by Tuesday night at the latest (12.12.2023; approximately 1 page, min 0.5, max 2 pages).
Seminar on Wednesday (13.12.2023) at 16:15 on Zoom.
- Prepare a short presentation (7 minutes) to present your project results.

Syllabus

     Seminar 1: Introduction to machine learning: Discussion points
     Seminar 2: Artificial General Intelligence: Discussion points
     Seminar 3: Reward misspecification and instrumental convergence: Discussion points
     Seminar 4: Goal misgeneralization: Discussion points
     Seminar 5: Task decomposition for scalable oversight: Discussion points
     Seminar 6: Adversarial techniques for scalable oversight: Discussion points
     Seminar 7: Interpretability: Discussion points
     Seminar 8: Governance: Discussion points
     Seminar 9: Agent foundations: Discussion points
     Seminar 10: Careers and Projects: Discussion points
     Seminar 11: Projects/presentations
     Seminar 12: Projects/presentations
     Seminar 13: Projects/presentations
     Seminar 14: Projects/presentations

Prerequisites

Previous knowledge of machine learning and deep learning will definitely be helpful, although the course may still be doable and useful without it.
If you have no background in machine learning, you should take the first week's readings (Introduction to machine learning) especially seriously.

Organization

The first part of the course (10 seminars) consists of a set of weekly learning materials (mostly reading), expected to take 2-4 hours to work through. Students are expected to read the materials for each week before the seminar. In the seminars, we will discuss the readings and the questions, thoughts and ideas they raise.

The last four weeks are planned for a small project. The course page offers plenty of ideas for different levels of expertise to choose from. The seminars are then used to discuss the progress on the projects and the problems raised.

The seminars are held over Zoom on Wednesdays at 16:15-17:45. The seminars are not recorded. Students are expected to participate with video.

The first seminar will take place on Wednesday, September 13th.

Join Zoom Meeting https://ut-ee.zoom.us/j/95523819971?pwd=WmF2QXFYYXBBMGdEeUV1dWpGK1dVQT09

Meeting ID: 955 2381 9971 Passcode: 853134

Passing the course

The goal of this course is to learn together about AI alignment and the related safety problems. The overall success of the course therefore relies on students coming to seminars prepared and shaping them through active participation. For this reason, the course is graded on preparation level, attendance and active participation.

Each week students are expected to:

Read/watch the required materials listed on the course page for that week before the seminar.
Come to the seminar with questions, thoughts and discussion ideas based on those materials.
Actively participate in the seminar discussions.

During project weeks, students are expected to:

Work on their project and make at least some progress every week.
Briefly present their progress in the seminar.
Raise questions and give feedback to others.

Point system

In order to pass the course, you need to collect at least 70 points according to the following system:

Each seminar is worth 7 points, for a total of 7 x 14 = 98 points, which will finally be calibrated to 100 points.

3 points per seminar can be obtained from preparation, demonstrated by contributing at least one meaningful question or discussion point based on the seminar materials.
2 points per seminar can be obtained by attending the seminar (and taking part in the small group discussions).
2 points per seminar can be obtained by contributing to the general discussion in the large group.
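
To make the arithmetic concrete, here is a minimal sketch of how a final score could be computed from the point system above. It rests on two assumptions that the course page does not state: that the calibration from 98 to 100 points is a simple linear scaling, and that the 70-point pass mark applies to the calibrated score. The names `SeminarRecord`, `final_score` and `passes` are hypothetical and used only for illustration.

```python
from dataclasses import dataclass

SEMINARS = 14
MAX_RAW_POINTS = 7 * SEMINARS  # 98 raw points in total
PASS_MARK = 70                 # assumed to apply to the calibrated score


@dataclass
class SeminarRecord:
    prepared: bool     # contributed a question or discussion point (3 points)
    attended: bool     # attended and joined the small group discussion (2 points)
    contributed: bool  # contributed to the large group discussion (2 points)


def seminar_points(record: SeminarRecord) -> int:
    """Raw points earned in one seminar (0 to 7)."""
    return 3 * record.prepared + 2 * record.attended + 2 * record.contributed


def final_score(records: list[SeminarRecord]) -> float:
    """Raw total scaled to 100; linear scaling is an assumption."""
    raw = sum(seminar_points(r) for r in records)
    return raw * 100 / MAX_RAW_POINTS


def passes(records: list[SeminarRecord]) -> bool:
    return final_score(records) >= PASS_MARK
```

Under these assumptions, a student who earns full points in 10 of the 14 seminars and none in the rest collects 70 raw points, which scale to roughly 71.4 and would pass.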

Materials

We will follow the AI Alignment Course from AI Safety Fundamentals.

Contact

Kairit Sirts, sirts@ut.ee

AI-Safety 2023/24 autumn