Software Project: Implementing Mechanistic Interpretability in NLP

Software Project, Saarland University, Winter Semester 2025/26

Course in LSF

Mondays 12:15 to 13:45

Kick-off: 20.10.2025

End: 31.03.2026

ECTS: 8 for M.Sc. students, 12 for B.Sc. students

B.Sc. students will only be admitted to the course with sufficient English skills.

Course Content

This course will consist of two parts:

In the first part, we will focus on understanding and using the most important methods of mechanistic interpretability (MI) research. Every week will be dedicated to a method, with a mini-lecture, a hands-on part, and a small homework.

In the second part, students will independently work on a small MI project, with weekly optional office hours, an intermediate presentation, and a final presentation. At the end of the semester, students submit a short report and a GitHub repo.

Preliminary Schedule

Date        Content
20.10.2025  Kick-off and Introductory Lecture on Mechanistic Interpretability
27.10.2025  Topic 1 - Feature Visualization
03.11.2025  No Meeting
10.11.2025  No Meeting
17.11.2025  Topic 2 - Logit Lens
24.11.2025  Topic 3 - Activation Patching (MLP layers)
01.12.2025  Topic 4 - SAEs and Probing (attention layers)
08.12.2025  Topic 5 - Induction Heads and Induction Circuits
15.12.2025  Topic 6 - Path Patching
22.12.2025  No Meeting
29.12.2025  No Meeting
05.01.2026  Final Project Development
12.01.2026  Final Project Development
19.01.2026  Office Hour
26.01.2026  Office Hour
02.02.2026  Office Hour
09.02.2026  Intermediate Presentations
16.02.2026  Office Hour
23.02.2026  Office Hour
02.03.2026  Office Hour
09.03.2026  Office Hour
16.03.2026  Final Presentations
31.03.2026  Deadline: Report and Code

A well-meaning tip: Please expect this course to be about as much work as a B.Sc. thesis (compare the ECTS). Speaking from the experience of the last few iterations of this course, doing a software project half-heartedly is not a fun experience. If this sounds like too much work given your other responsibilities, it probably is. However, also speaking from experience, motivated students can learn a LOT in this course, and some have even come out of it with publications. I personally do not care about the level of experience you come in with - as long as you are willing to put in the work and are eager to learn, you are very welcome to participate in the course.