CS 6960 Human-AI Alignment

Instructor: Daniel Brown 

Description: This course covers the problem of how to get AI systems to do what we, as humans, actually want them to do. Topics include active learning, human-in-the-loop reinforcement learning, human intent and preference learning, algorithmic teaching, and AI safety. Classes will be a mix of lectures covering foundational material and hands-on analysis and exploration of both seminal and recent research papers. Students will also carry out a novel research project, culminating in a final presentation and written technical report. By taking this course, students will develop a broad understanding of the common techniques and unique research challenges involved in building AI systems that learn from, interact with, and assist humans. Additionally, students will learn and practice fundamental research skills, including how to read, write, and review research papers, how to quickly prototype and test research ideas, and how to give technical presentations.

Format: This course combines lectures with paper presentations and analyses by the students, encouraging both fundamental knowledge acquisition and open-ended discussion. There will be a series of short homework assignments that test concepts learned in class and give students hands-on experience with these ideas. Each student will also carry out an individual or group research project. Weekly paper analyses/presentations will follow a role-playing model, where students take turns participating in different roles.

Syllabus: Available on Canvas.

Schedule: Subject to change

# Date Topic Reading Supplemental
1 Mon Aug 21 Class intro and logistics
2 Wed Aug 23 Sequential Decision Making
  • Russell and Norvig, MDP chapter
  • Sutton and Barto book, Sections I.1 Introduction, I.3 The Reinforcement Learning Problem, II.4 Dynamic Programming, and II.6 Temporal Difference Learning
3 Mon Aug 28 Imitation Learning via Behavioral Cloning
  • Behavioral Cloning from Observation
  • Implicit Behavioral Cloning
4 Wed Aug 30 Interactive Imitation Learning
  • DAgger
  • ThriftyDAgger
  • SafeDAgger
  • HG-DAgger
5 Mon Sep 4 Labor Day NA
6 Wed Sep 6 Interactive Reinforcement Learning 1
  • Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
  • Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning
  • Homework 1 Released Homework link: BC and BCO
  • OpenAI Gym
  • PyTorch
7 Mon Sep 11 Interactive Reinforcement Learning 2
  • Deep TAMER
  • Deep COACH
8 Wed Sep 13 Inverse RL 1
  • Algorithms for inverse reinforcement learning (Intro only)
  • Apprenticeship learning via inverse reinforcement learning
9 Mon Sep 18 Inverse RL 2
  • Bayesian inverse reinforcement learning
  • Maximum entropy inverse reinforcement learning
  • Homework 1 Due by end of day (11:59 MST)
  • Homework 2 Released Homework link: Bayesian IRL
10 Wed Sep 20 Adversarial Imitation Learning
  • Generative Adversarial Imitation Learning
  • Adversarial Inverse Reinforcement Learning
  • f-IRL: Inverse Reinforcement Learning via State Marginal Matching
11 Mon Sep 25 RL from Human Preferences 1
  • Deep Reinforcement Learning from Human Preferences
  • Learning to summarize from human feedback
12 Wed Sep 27 RL from Human Preferences 2
  • InstructGPT
  • Homework 2 Due by end of day on Wednesday (9/27) (11:59 MST).
  • ChatGPT Blog
  • Homework 3 Released RLHF
13 Mon Oct 2 Alignment 1
  • Scalable agent alignment via reward modeling: a research direction
14 Wed Oct 4 Alignment 2
  • Alignment for Deep Learning
  • Homework 3 Due by end of day on Friday October 6th (11:59 MST).
Oct 9-13 Fall Break No Class
15 Mon Oct 16 Shared Autonomy and Assistance 1
  • Formalizing Assistive Teleoperation
  • Paragraph Pitch of Final Project Due (11:59 MST)
  • Shared Autonomy via Hindsight Optimization
16 Wed Oct 18 Shared Autonomy and Assistance 2
  • Human-in-the-Loop Optimization of Shared Autonomy in Assistive Robotics
  • Controlling Assistive Robots with Learned Latent Actions
  • Learning to share autonomy from repeated HRI
17 Mon Oct 23 Self-Calibrating Interfaces 1
  • Interactive Introduction to Self-Calibrating Interfaces
18 Wed Oct 25 Self-Calibrating Interfaces 2
  • X2T
  • First Contact
19 Mon Oct 30 Optimal Teaching 1
  • Algorithmic and Human Teaching of Sequential Decision Tasks
  • Machine teaching for IRL
20 Wed Nov 1 Optimal Teaching 2
  • Cooperative IRL
  • Pragmatic Pedagogic Value Alignment
21 Mon Nov 6 Multiple Forms of Feedback
  • Reward rational implicit choice
  • Final Project Proposal and Lit Review Due (11:59 MST)
  • Unified Learning from Demonstrations, Corrections, and Preferences during Physical Human-Robot Interaction
22 Wed Nov 8 Alignment Verification
  • Value Alignment Verification
23 Mon Nov 13 Reward Specification Issues
  • Goal Misgeneralization
  • Inverse Reward Design
24 Wed Nov 15 Ethics
  • Computational Ethics
25 Mon Nov 20 Existential AI Risk
  • Overview of Catastrophic AI Risks
Wed Nov 22 Day Before Thanksgiving No Class
26 Mon Nov 27 Guest Lecture (virtual) Yuchen Cui
27 Wed Nov 29 Guest Lecture (virtual) Dylan Hadfield-Menell
28 Mon Dec 4 Project presentations (virtual)
29 Wed Dec 6 Project presentations
30 Fri Dec 15 Final Project Report Due
  • Overleaf template: go to Menu and select Copy Project.