CS 6960 Human-AI Alignment

Instructor: Daniel Brown

Description: This course addresses the problem of how to get AI systems to do what we, as humans, actually want them to do. Topics include active learning, human-in-the-loop reinforcement learning, human intent and preference learning, algorithmic teaching, and AI safety. Classes will mix lectures covering foundational material with hands-on analysis and discussion of both seminal and recent research papers. Students will also carry out a novel research project, culminating in a final presentation and a written technical report. By taking this course, students will develop a broad understanding of the common techniques and unique research challenges involved in building AI systems that learn from, interact with, and assist humans. Additionally, students will learn and practice fundamental research skills, including how to read, write, and review research papers, how to quickly prototype and test research ideas, and how to give technical presentations.

Format: This course combines lectures with student paper presentations and analyses, encouraging both fundamental knowledge acquisition and open-ended discussion. A series of short homework assignments will test concepts learned in class and give students hands-on experience with these ideas. Each student will also carry out an individual or group research project. Weekly paper analyses and presentations will follow a role-playing model, in which students take turns in different roles.

Syllabus: Available on Canvas.

Schedule: Subject to change

#	Date	Topic	Reading	Supplemental
1	Mon Aug 21	Class intro and logistics
2	Wed Aug 23	Sequential Decision Making	Russell and Norvig, MDP chapter; Sutton and Barto book, Sections I.1 Introduction, I.3 The Reinforcement Learning Problem, II.4 Dynamic Programming, and II.6 Temporal Difference Learning
3	Mon Aug 28	Imitation Learning via Behavioral Cloning	Behavioral Cloning from Observation; Implicit Behavioral Cloning	ALVINN
4	Wed Aug 30	Interactive Imitation Learning	DAgger; ThriftyDAgger	SafeDAgger; HG-DAgger
5	Mon Sep 4	Labor Day	NA
6	Wed Sep 6	Interactive Reinforcement Learning 1	Trial without Error: Towards Safe Reinforcement Learning via Human Intervention; TAMER	Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning
		Homework 1 Released	Homework link: BC and BCO	OpenAI Gym; PyTorch
7	Mon Sep 11	Interactive Reinforcement Learning 2	Deep TAMER; COACH	Deep COACH
8	Wed Sep 13	Inverse RL 1	Algorithms for inverse reinforcement learning (Intro only); Apprenticeship learning via inverse reinforcement learning
9	Mon Sep 18	Inverse RL 2	Bayesian inverse reinforcement learning; Maximum entropy inverse reinforcement learning. Homework 1 due by end of day (11:59 MST).
		Homework 2 Released	Homework link: Bayesian IRL
10	Wed Sep 20	Adversarial Imitation Learning	Generative Adversarial Imitation Learning; Adversarial Inverse Reinforcement Learning	f-IRL: Inverse Reinforcement Learning via State Marginal Matching
11	Mon Sep 25	RL from Human Preferences 1	Deep Reinforcement Learning from Human Preferences; Learning to summarize from human feedback	PEBBLE
12	Wed Sep 27	RL from Human Preferences 2	InstructGPT. Homework 2 due by end of day on Wednesday (9/27) (11:59 MST).	ChatGPT Blog
		Homework 3 Released	RLHF
13	Mon Oct 2	Alignment 1	Scalable agent alignment via reward modeling: a research direction
14	Wed Oct 4	Alignment 2	Alignment for Deep Learning. Homework 3 due by end of day on Friday, October 6th (11:59 MST).
	Oct 9-13	Fall Break	No Class
15	Mon Oct 16	Shared Autonomy and Assistance 1	Formalizing Assistive Teleoperation. Paragraph pitch of final project due (11:59 MST).	Shared Autonomy via Hindsight Optimization
16	Wed Oct 18	Shared Autonomy and Assistance 2	Human-in-the-Loop Optimization of Shared Autonomy in Assistive Robotics; Controlling Assistive Robots with Learned Latent Actions	Learning to share autonomy from repeated HRI
17	Mon Oct 23	Self-Calibrating Interfaces 1	Interactive Introduction to Self-Calibrating Interfaces
18	Wed Oct 25	Self-Calibrating Interfaces 2	X2T; First Contact
19	Mon Oct 30	Optimal Teaching 1	Algorithmic and Human Teaching of Sequential Decision Tasks; Machine teaching for IRL
20	Wed Nov 1	Optimal Teaching 2	Cooperative IRL	Pragmatic Pedagogic Value Alignment
21	Mon Nov 6	Multiple Forms of Feedback	Reward rational implicit choice; INQUIRE. Final project proposal and lit review due (11:59 MST).	Unified Learning from Demonstrations, Corrections, and Preferences during Physical Human-Robot Interaction
22	Wed Nov 8	Alignment Verification	Value Alignment Verification
23	Mon Nov 13	Reward Specification Issues	Goal Misgeneralization; Inverse Reward Design
24	Wed Nov 15	Ethics	Computational Ethics
25	Mon Nov 20	Existential AI Risk	Overview of Catastrophic AI Risks
	Wed Nov 22	Day Before Thanksgiving	No Class
26	Mon Nov 27	Guest Lecture (virtual)	Yuchen Cui
27	Wed Nov 29	Guest Lecture (virtual)	Dylan Hadfield-Menell
28	Mon Dec 4	Project presentations (virtual)
29	Wed Dec 6	Project presentations
30	Fri Dec 15	Final Project Report Due	Overleaf template: go to Menu and select "Copy Project".