Software Project: Implementing Mechanistic Interpretability in NLP
Software Project, Saarland University, Winter Semester 2025/26, 2025
Mondays 12:15 to 13:45
Kick-off: 20.10.2025
End: 31.03.2026
ECTS: 8 for MSc students, 12 for BSc students
BSc students will only be admitted to the course with sufficient English skills.
Course Content
This course consists of two parts:
In the first part, we will focus on understanding and using the most important methods of mechanistic interpretability (MI) research. Each week will be dedicated to one method, with a mini-lecture, a hands-on part, and a small homework assignment.
In the second part, students will work independently on a small MI project, with optional weekly office hours, an intermediate presentation, and a final presentation. At the end of the semester, students submit a short report and a GitHub repository.
Preliminary Schedule
| Date | Content |
|---|---|
| 20.10.2025 | Kick-off and Introductory Lecture on Mechanistic Interpretability |
| 27.10.2025 | Topic 1 - Feature Visualization |
| 03.11.2025 | No Meeting |
| 10.11.2025 | No Meeting |
| 17.11.2025 | Topic 2 - Logit Lens |
| 24.11.2025 | Topic 3 - Activation Patching (MLP layers) |
| 01.12.2025 | Topic 4 - Circuit Discovery (attention layers) |
| 08.12.2025 | Topic 5 - Probing |
| 15.12.2025 | Topic 6 - Sparse Autoencoders |
| 22.12.2025 | No Meeting |
| 29.12.2025 | No Meeting |
| 05.01.2026 | Final Project Development |
| 12.01.2026 | Final Project Development |
| 19.01.2026 | Office Hour |
| 26.01.2026 | Office Hour |
| 02.02.2026 | Office Hour |
| 09.02.2026 | Office Hour |
| 16.02.2026 | Intermediate Presentations |
| 23.02.2026 | Office Hour |
| 02.03.2026 | Office Hour |
| 09.03.2026 | Office Hour |
| 16.03.2026 | Final Presentations |
| 31.03.2026 | Deadline: Report and Code |
A well-meaning tip: Please expect this course to be about as much work as a BSc thesis (compare the ECTS credits). Speaking from experience of the last few iterations of this course, doing a software project half-heartedly is not a fun experience. If this sounds like too much work because of other responsibilities, it probably is. However, also speaking from experience, motivated students can learn a LOT in this course, and some have even come out of it with publications. I personally do not care about the level of experience you come in with - as long as you are willing to put in the work and are eager to learn, you are very welcome to participate in the course.