Note: This is the syllabus for the 2025w2 offering; some parts might be updated closer to the 2026w1 offering.

CPSC_V 532H: Topics in Artificial Intelligence: Analysis, Control and Interpretation of Models in NLP


Overview

Models for natural language processing have been advancing at an astonishing pace in recent years, but they remain opaque and difficult to reason about. In this course, we will cover different methods of model interpretation, attempting to open these black boxes, and survey some algorithms for controlling specific properties and behaviors of large language models (LLMs). We will also cover common practices in model analysis, including in multilingual settings, and with respect to model biases.


Assessments

Component                               Weight
Written paper comments                  10%
Participation                           5%
Student presentation: main              20%
Student presentation: review            5%
Student presentation: future research   5%
Project pitch                           5%
Project proposal                        10%
Project presentation                    20%
Project report                          20%
Total                                   100%

Attendance

Students are expected to attend classes and actively participate in class discussions.


Paper Comments

For 10 of the papers we read, each student is expected to email the TA 1–2 paragraphs summarizing the paper and raising 2–3 questions. Comments are due by 6pm the day before class. Students cannot submit comments on a paper for which they are giving the main presentation, the review, or the future research presentation.


Student Presentations

Each paper will be presented by a student with a main presentation (40 min), with two other students following with a review (10 min) and proposed future research (10 min). This will be followed by a 20-minute structured group discussion in class.

Each student is expected to:


Projects

Projects will be done in groups of 2–3 students (depending on enrollment) and will consist of four parts: Pitch, Proposal, Final Presentation, and Final Report.

More details and instructions will be provided on February 2nd.


Course Schedule (Tentative)

Date    Session
Jan 5   Lecture 1: Course details and intro
Jan 7   Lecture 2: Control and analysis through the lens of societal impact
Jan 12  Lecture 3
Jan 14  Paper presentation: What do you learn from context? Probing for sentence structure in contextualized word representations
Jan 19  Paper presentation: Attention is not explanation
Jan 21  Paper presentation: Attention is not not explanation
Jan 26  Lecture 4
Jan 28  Paper presentation: Mass-Editing Memory in a Transformer
Feb 2   Paper presentation: Steering Llama 2 via Contrastive Activation Addition
Feb 4   Project introduction + Paper presentation: BERT meets Shapley: Extending SHAP Explanations to Transformer-based Classifiers
Feb 9   Project pitching
Feb 11  Project pitching
Feb 16  Reading week (no class)
Feb 18  Reading week (no class)
Feb 23  Paper presentation: Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Feb 25  In-class project preparation
Mar 2   Guest lecture (Thomasz)
Mar 4   Paper presentation: Why Language Models Hallucinate
Mar 9   Guest lecture (Stella)
Mar 11  In-class project preparation
Mar 16  No class
Mar 18  Guest lecture (Sahil)
Mar 23  Paper presentation: Precise In-Parameter Concept Erasure in Large Language Models
Mar 25  Paper presentation: Do Llamas Work in English? On the Latent Language of Multilingual Transformers
Mar 30  How to read a paper
Apr 1   Project presentations
Apr 6   Easter (no class)
Apr 8   Project presentations

Reading List

The reading list will be finalized after the add/drop deadline.

Interpretation

Control

Analysis