CS 6530 – Advanced Database Systems

Lectures: TuTh / 10:45AM-12:05PM MT at LNCO 1110

Instructor: Prashant Pandey

  • Email: prashant [dot] pandey [at] utah [dot] edu

  • Office Hours: TuTh / 9:30AM-10:30AM MT at WEB 2686

Teaching Assistants:

  • Yuvaraj Chesetti

    • Email: u1412831 [at] [dot] utah [dot] edu

    • Office Hours: Wednesday / 12:00PM-2:00PM MT at WEB 2780

Course Overview

This course is a comprehensive study of the internals of modern database systems and of the challenges of indexing and querying large-scale data on continuously evolving hardware. It covers the core concepts and fundamentals of indexing and hashing data structures, concurrency control, storage, file organization, and query processing. The course studies both in-memory and disk-based database systems and uses examples from modern key-value stores. All class projects will be carried out in the context of real in-memory and disk-based database systems. The course is appropriate for graduate students in software systems and for advanced undergraduates with systems programming skills.

Prerequisites

Unofficial Prerequisites: CS 5530 (undergraduate databases) and CS 3505 (software practice in C/C++).

You should know C++, or be willing to learn it quickly on your own, for the projects. Here is a good C++ tutorial.

Course Topics

  • In-memory indexing

  • The design space of data structures

  • Row stores vs Column stores

  • Concurrency control

  • Data storage, Buffer management, File organization

  • Key-value stores

  • Logging and recovery

  • Query optimization, execution, compilation

  • Parallel join algorithms

  • External sorting

  • Vector databases

  • Data systems on modern hardware

  • Learned indexes and ML for Databases

Projects

  • Project 1: The first programming assignment is an individual project.

  • Project 2: The second programming assignment is a group project. Each group must have three members unless given prior approval by the instructor.

  • Final project: The main portion of a student's grade in this course is the final group project. Students will organize into groups of three and choose a project to implement that meets all of the following criteria:

    • it is relevant to the material discussed in class,
    • it requires a significant programming effort from all team members,
    • it is unique (i.e., no two groups may choose the same project topic).

    The projects will vary in both scope and topic, but each must satisfy all three criteria above. We will discuss this in more depth during class, though students are encouraged to start thinking early about projects that interest them. If a group is unable to come up with its own project idea, the instructor will suggest interesting topics.

Paper Reading

There is a set of assigned paper readings for the course. The reading list is designed to provide additional information and insight into the current state of the art in database systems research. Each student is required to pick five papers from the reading list and turn in a one-paragraph synopsis of each. There will be five deadlines throughout the semester at which students will be required to submit a synopsis. Late submissions will not be accepted without prior approval from the instructor.

Each review must include the following information:

  • An overview of the main idea and contributions (three sentences).
  • The system used in the implementation (one sentence).
  • The workloads used in the evaluation (one sentence).

These reading reviews must be your own writing. You may not copy from the papers or other sources that you find on the web. Plagiarism will not be tolerated.

Useful Resources

Please refer to The Asymptotic Cheat Sheet, a brief overview of asymptotic notation. It will help you follow the theoretical analyses in the course.

Grading

  • Project 1: 15%

  • Project 2: 25%

  • Final project (Project 3): 30%

  • Paper Reports: 10%

  • Final Exam: 10%

  • Class participation: 10%

Late submission policy

  • No late submissions are allowed. Please plan accordingly based on the submission dates.
  • In case of an emergency, an exception requires prior permission from the instructor.

Collaboration and Plagiarism

Everyone needs to read the SoC Policy on Academic Misconduct.

Working with others on assignments is a good way to learn the material, and we encourage it. However, there are limits to the degree of cooperation that we will permit.

When working on programming assignments, you must work only with others whose understanding of the material is approximately equal to yours. In this situation, working together to find a good approach for solving a programming problem is cooperation; listening while someone dictates a solution is cheating. You must limit collaboration to a high-level discussion of solution strategies, and stop short of actually writing down a group answer. Anything that you hand in, whether it is a paper report or a computer program, must be written in your own words. If you base your solution on any other written solution, you are cheating.

If you collaborate with other students to discuss a problem and then write your own solution, make sure to declare up front in the write-up the names of all the students you collaborated with.

Never look at another student's code or share your code with any other student.

You must not make your code public (on GitHub or by any other means).

Using tools like GitHub Copilot or ChatGPT, or copying code from sites like Stack Overflow, also constitutes cheating. Do not write code with Copilot enabled in this course.

We do not distinguish between cheaters who copy others' work and cheaters who allow their work to be copied. If you cheat, you will be given an E in the course and referred to the University Student Behavior Committee.

Clearly, any attempt to subvert the ordinary grading process constitutes cheating.

If you have any questions about what constitutes cheating, please ask first.