Note: This is the syllabus for the 2025w2 offering; some parts might be updated closer to the 2026w1 offering.
CPSC_V 532H: Topics in Artificial Intelligence: Analysis, Control and Interpretation of Models in NLP
- When: Monday, Wednesday, 10:30am – 11:50am
- Where: SWNG SWNG-308
- Instructor: Hila Gonen
- Office hours: By appointment
- Canvas: https://canvas.ubc.ca/courses/178785
Overview
Models for natural language processing have been advancing at an astonishing pace in recent years, but they remain opaque and difficult to reason about. In this course, we will cover methods of model interpretation that attempt to open these black boxes, and survey algorithms for controlling specific properties and behaviors of large language models (LLMs). We will also cover common practices in model analysis, including analysis in multilingual settings and analysis of model biases.
Assessments
| Component | Weight |
|---|---|
| Written paper comments | 10% |
| Participation | 5% |
| Student presentation — main | 20% |
| Student presentation — review | 5% |
| Student presentation — future research | 5% |
| Project pitch | 5% |
| Project proposal | 10% |
| Project presentation | 20% |
| Project report | 20% |
| Total | 100% |
Attendance
Students are expected to attend classes and actively participate in class discussions.
Paper Comments
For 10 of the papers we read, students are expected to email the TA 1–2 paragraphs summarizing the paper and raising 2–3 questions. Comments are due by 6pm on the day before the class in which the paper is discussed. Students cannot submit comments on a paper for which they have a presentation role (main presentation, review, or future research).
Student Presentations
Each paper will be presented by one student in a main presentation (40 min), followed by two other students who present a review (10 min) and a proposal for future research (10 min). The session ends with a 20-minute structured group discussion in class.
Each student is expected to:
- Fully present one paper
- Present a review of one paper
- Present a future research idea for one paper
Projects
Projects will be done in groups of 2–3 students (depending on enrollment) and will consist of four parts: Pitch, Proposal, Final Presentation, and Final Report.
More details and instructions will be provided on February 2nd.
Course Schedule (Tentative)
| Date | Session |
|---|---|
| Jan 5 | Lecture 1: Course details and intro |
| Jan 7 | Lecture 2: Control and analysis through the lens of societal impact |
| Jan 12 | Lecture 3 |
| Jan 14 | Paper presentation: What do you learn from context? Probing for sentence structure in contextualized word representations |
| Jan 19 | Paper presentation: Attention is not explanation |
| Jan 21 | Paper presentation: Attention is not not explanation |
| Jan 26 | Lecture 4 |
| Jan 28 | Paper presentation: Mass-Editing Memory in a Transformer |
| Feb 2 | Paper presentation: Steering Llama 2 via Contrastive Activation Addition |
| Feb 4 | Project introduction + Paper presentation: BERT meets Shapley: Extending SHAP Explanations to Transformer-based Classifiers |
| Feb 9 | Project pitching |
| Feb 11 | Project pitching |
| Feb 16 | Reading week — no class |
| Feb 18 | Reading week — no class |
| Feb 23 | Paper presentation: Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? |
| Feb 25 | In-class project preparation |
| Mar 2 | Guest lecture (Thomasz) |
| Mar 4 | Paper presentation: Why Language Models Hallucinate |
| Mar 9 | Guest lecture (Stella) |
| Mar 11 | In-class project preparation |
| Mar 16 | No class |
| Mar 18 | Guest lecture (Sahil) |
| Mar 23 | Paper presentation: Precise In-Parameter Concept Erasure in Large Language Models |
| Mar 25 | Paper presentation: Do Llamas Work in English? On the Latent Language of Multilingual Transformers |
| Mar 30 | How to read a paper |
| Apr 1 | Project presentations |
| Apr 6 | Easter — no class |
| Apr 8 | Project presentations |
Reading List
The reading list will be finalized after the add/drop deadline.
Interpretation
- What do you learn from context? Probing for sentence structure in contextualized word representations
- Attention is not explanation
- Attention is not not explanation
- BERT meets Shapley: Extending SHAP Explanations to Transformer-based Classifiers
- Do Llamas Work in English? On the Latent Language of Multilingual Transformers
- Why Language Models Hallucinate
- Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
- Mechanistic?
- Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation
Control
- Mass-Editing Memory in a Transformer
- Steering Llama 2 via Contrastive Activation Addition
- Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment
- Precise In-Parameter Concept Erasure in Large Language Models
- Simple Synthetic Data Reduces Sycophancy in Large Language Models
