Note: This is the syllabus for the 2025w2 offering; some parts might be updated closer to the 2026w1 offering.

CPSC_V 532H: Topics in Artificial Intelligence: Analysis, Control and Interpretation of Models in NLP

When: Monday, Wednesday, 10:30am – 11:50am
Where: SWNG-308
Instructor: Hila Gonen
Office hours: By appointment
Canvas: https://canvas.ubc.ca/courses/178785

Overview

Models for natural language processing have been advancing at an astonishing pace in recent years, but they remain opaque and difficult to reason about. In this course, we will cover different methods of model interpretation, attempting to open these black boxes, and survey some algorithms for controlling specific properties and behaviors of large language models (LLMs). We will also cover common practices in model analysis, including in multilingual settings, and with respect to model biases.

Assessments

Component	Weight
Written paper comments	10%
Participation	5%
Student presentation — main	20%
Student presentation — review	5%
Student presentation — future research	5%
Project pitch	5%
Project proposal	10%
Project presentation	20%
Project report	20%
Total	100%

Attendance

Students are expected to attend classes and actively participate in class discussions.

Paper Comments

For 10 of the papers we will be reading, each student is expected to send 1–2 paragraphs summarizing the paper and raising 2–3 questions. These are due by email to the TA at 6pm the day before the class in which the paper is presented. Students cannot submit paper comments on papers they are presenting themselves (main presentation, review, or future research).

Student Presentations

Each paper will be covered by three students: one gives the main presentation (40 min), and two others follow with a review (10 min) and proposed future research (10 min). This will be followed by a 20-minute structured group discussion in class.

Each student is expected to:

Fully present one paper
Present a review of one paper
Present a future research idea for one paper

Projects

Projects will be done in groups of 2–3 students (depending on the number of students), and will consist of the following parts: Pitch, Proposal, Final Presentation, and Final Report.

More details and instructions will be provided on February 2nd.

Course Schedule (Tentative)

Date	Session
Jan 5	Lecture 1: Course details and intro
Jan 7	Lecture 2: Control and analysis through the lens of societal impact
Jan 12	Lecture 3
Jan 14	Paper presentation: What do you learn from context? Probing for sentence structure in contextualized word representations
Jan 19	Paper presentation: Attention is not explanation
Jan 21	Paper presentation: Attention is not not explanation
Jan 26	Lecture 4
Jan 28	Paper presentation: Mass-Editing Memory in a Transformer
Feb 2	Paper presentation: Steering Llama 2 via Contrastive Activation Addition
Feb 4	Project introduction + Paper presentation: BERT meets Shapley: Extending SHAP Explanations to Transformer-based Classifiers
Feb 9	Project pitching
Feb 11	Project pitching
Feb 16	Reading week — no class
Feb 18	Reading week — no class
Feb 23	Paper presentation: Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Feb 25	In-class project preparation
Mar 2	Guest lecture (Thomasz)
Mar 4	Paper presentation: Why Language Models Hallucinate
Mar 9	Guest lecture (Stella)
Mar 11	In-class project preparation
Mar 16	No class
Mar 18	Guest lecture (Sahil)
Mar 23	Paper presentation: Precise In-Parameter Concept Erasure in Large Language Models
Mar 25	Paper presentation: Do Llamas Work in English? On the Latent Language of Multilingual Transformers
Mar 30	How to read a paper
Apr 1	Project presentations
Apr 6	Easter — no class
Apr 8	Project presentations
