Software Project: Implementing Mechanistic Interpretability in NLP

Software Project, Saarland University, Winter Semester 2025/26

Course in LSF

Mondays 12:15 to 13:45

Kick-off: 20.10.2025

End: 31.03.2026

ECTS: 8 for MSc students, 12 for BSc students

BSc students will only be admitted to the course if they have sufficient English skills.

Course Content

This course consists of two parts:

In the first part, we will focus on understanding and applying the most important methods of mechanistic interpretability (MI) research. Each week is dedicated to one method, with a mini-lecture, a hands-on part, and a small homework assignment.

In the second part, students will work independently on a small MI project, with optional weekly office hours, an intermediate presentation, and a final presentation. At the end of the semester, students submit a short report and a GitHub repository.

Preliminary Schedule

Date        Content
20.10.2025  Kick-off and Introductory Lecture on Mechanistic Interpretability
27.10.2025  Topic 1 - Feature Visualization
03.11.2025  No Meeting
10.11.2025  No Meeting
17.11.2025  Topic 2 - Logit Lens
24.11.2025  Topic 3 - Activation Patching (MLP layers)
01.12.2025  Topic 4 - Circuit Discovery (attention layers)
08.12.2025  Topic 5 - Probing
15.12.2025  Topic 6 - Sparse Autoencoders
22.12.2025  No Meeting
29.12.2025  No Meeting
05.01.2026  Final Project Development
12.01.2026  Final Project Development
19.01.2026  Office Hour
26.01.2026  Office Hour
02.02.2026  Office Hour
09.02.2026  Office Hour
16.02.2026  Intermediate Presentations
23.02.2026  Office Hour
02.03.2026  Office Hour
09.03.2026  Office Hour
16.03.2026  Final Presentations
31.03.2026  Deadline: Report and Code

A well-meant tip: please expect this course to be about as much work as a BSc thesis (compare the ECTS credits). Speaking from experience with the last few iterations of this course, doing a software project half-heartedly is not a fun experience. If this sounds like too much work because of other responsibilities, it probably is. However, also speaking from experience, motivated students can learn a LOT in this course, and some have even come out of it with publications. I personally do not care about the level of experience you come in with: as long as you are willing to put in the work and are eager to learn, you are very welcome to participate in the course.