Project #3: Logging and Recovery

Project Overview

The third programming project will teach you how to implement write-ahead logging, checkpointing, and recovery in a disk-based key-value store. The primary goal of this assignment is to become familiar with the low-level implementation details of write-optimized key-value stores and to learn how to implement write-ahead logging (WAL) and recovery to bring the index back to a consistent state after a crash. All the code in this programming assignment must be written in C++. If you have not used C++ before, here's a short tutorial on the language. Even if you are familiar with C++, go over this guide for additional information on writing code in the system.

Here are some resources to learn about write-ahead logging and key-value stores.

This is a group project that will be completed in groups of two or three students. The groups will be based on your responses to the Project2 team quiz. If you plan to change groups, first seek permission from the instructor.

This project will have three deadlines. The first milestone will require the students to submit a design document.

Release date: Friday, October 18
Design Doc Due date: Thursday, October 29
Logging/Checkpointing Doc Due date: Tuesday, November 12
Final Submission (Recovery) Due date: Thursday, November 26

Implementation Details

In this assignment, you will need to add the following functionality to the key-value store:

Logging
Checkpointing
Recovery

In this assignment, you will need to modify the following existing files:

betree.hpp
swap_space.hpp
backing_store.hpp

You can modify the test.cpp benchmark already included in the system locally to verify the correctness and performance of your implementation; however, you will not submit it.

You will also need to write a report and submit that with the final source code.

This project does not involve concurrency or transactions.

We will deal with a single-threaded version of the key-value store. The index used in this project does not support transactions, so each operation is applied independently. However, you will need to think about dependencies among nodes during split/merge operations in the tree and ensure that the tree can be recovered to a consistent state after a crash.

There are four steps to completing this project:

Step #1 - Design the logging/recovery functionality

The first step is to understand how the key-value store works and then write a design document on how you would add the logging and recovery functionality.

You should first build the test benchmark and run ./test to learn about the key-value store:

make 
mkdir tmpdir
./test -m benchmark-upserts -d tmpdir

The test benchmark has a help mode that explains the various arguments required to run it.

To properly understand the key-value store, please read the README file and the comments at the top of the source files. You can also read this paper to understand the internals of the B^ε-tree, which is at the heart of this index.

The design document should include the following items:

New class names, structure, and code files that you would create.
The new API to support logging, checkpointing, and recovery.
What type of logging will you use?
What would be the structure of a log record?
New methods to add in the B^ε-tree code to support logging and recovery.
What new arguments (knobs) related to the logging granularity and checkpointing granularity will be added?
What would be the test cases to verify the correctness of logging and recovery?

Once the group submits the design document, we will provide any required feedback and suggest any changes. We will also provide a test file and a bash script to test the correctness of the recovery process after a crash. The new test file will help guide you on what is expected from the logging and recovery functionality.

Step #2 - Implement write ahead logging (WAL)

This part will require you to implement write-ahead logging. As part of the logging functionality, you will need to create a file-backed logger to which each update operation is appended before it is applied to the key-value store. Every operation that modifies the key-value store's state must be recorded in the log and written to stable storage before the contents of the associated tree nodes can be modified.

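After the changes are appended, the log file must be persisted to disk, and the system may acknowledge an operation to the user only after its changes have been persisted. The logger can persist the file after every update operation or after a fixed number of changes. This log persist granularity affects the performance of the index: persisting after every operation pays for a disk flush on every write, while persisting in batches amortizes that cost but delays acknowledgment until the whole batch is on disk.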

To implement the log file, you can either write your own file-handling code or use the backing_store API already provided in the source code. Here's a short tutorial on file handling in C++.

Step #3 - Implement checkpointing

This part will require you to implement the checkpoint operation using the log file. A "Checkpoint" operation transfers the changes recorded in the write-ahead log into the key-value store. Once the changes from the log file have been inserted into the key-value store and the corresponding nodes have been written back to disk, you need to purge those log entries.

The checkpointing operation needs to be performed at regular intervals. Depending on the checkpointing granularity, users querying the key-value store may see stale results for some time, since changes that have been logged but not yet checkpointed are not yet reflected in the tree. Just as the log persist granularity affects performance, the checkpointing granularity affects the staleness guarantees of the index.
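The following sketch models the checkpointing described in this step: upserts only append to the log, and a checkpoint moves logged changes into the tree, writes the tree back, and then purges the log. In-memory containers stand in for the log file and for nodes on disk, and `CheckpointingStore` and `checkpoint_every` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Sketch of Step 3's model. Containers stand in for files on disk.
class CheckpointingStore {
 public:
  explicit CheckpointingStore(size_t checkpoint_every)
      : checkpoint_every_(checkpoint_every) {}

  void upsert(uint64_t key, std::string value) {
    log_.emplace_back(key, std::move(value));  // assume this append is durable
    if (log_.size() >= checkpoint_every_) checkpoint();
  }

  // Queries read only the tree, so results can be stale until a checkpoint.
  const std::string* query(uint64_t key) const {
    auto it = tree_.find(key);
    return it == tree_.end() ? nullptr : &it->second;
  }

  // Order matters: (1) apply logged changes to the tree, (2) write the tree
  // back to disk, (3) only then purge the log entries it now covers.
  void checkpoint() {
    for (const auto& rec : log_) tree_[rec.first] = rec.second;
    disk_tree_ = tree_;  // stand-in for writing every dirty node back
    log_.clear();
  }

  const std::map<uint64_t, std::string>& disk_tree() const { return disk_tree_; }
  size_t log_size() const { return log_.size(); }

 private:
  size_t checkpoint_every_;
  std::vector<std::pair<uint64_t, std::string>> log_;
  std::map<uint64_t, std::string> tree_;       // in-memory tree
  std::map<uint64_t, std::string> disk_tree_;  // last checkpointed tree
};
```

Purging the log before the tree has been written back would open a window in which a crash loses changes that are in neither the log nor the on-disk tree. A smaller `checkpoint_every` reduces how stale queries can be, at the cost of writing nodes back more often.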

Step #4 - Implement recovery

This part will require you to implement recovery after a crash. Recovery is the first thing that runs when the system comes back up after a crash. The recovery logic inspects the log file to determine which changes have not yet been committed to the key-value store; this can be done by comparing the size of the log file on disk with the last checkpoint index. The recovery function must then replay the remaining changes from the log and update the checkpoint index.
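A sketch of the replay loop, assuming the checkpoint index counts the log records already reflected in the tree on disk (if your design stores a byte offset into the log file instead, the loop would start from that offset). The `std::map` stands in for the tree rebuilt from disk, and `recover`, `Record`, and `Tree` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Record = std::pair<uint64_t, std::string>;  // (key, value) upsert
using Tree = std::map<uint64_t, std::string>;     // stand-in for the B^e-tree

// Replays every log record at position >= checkpoint_index onto the tree
// loaded from disk, then advances the index past the replayed records.
// Returns how many records were replayed.
size_t recover(Tree& tree, const std::vector<Record>& records,
               size_t& checkpoint_index) {
  size_t replayed = 0;
  for (size_t i = checkpoint_index; i < records.size(); ++i) {
    tree[records[i].first] = records[i].second;  // assignment-style upsert
    ++replayed;
  }
  checkpoint_index = records.size();  // a real system must persist this value
  return replayed;
}
```

Replaying assignment-style upserts twice is harmless, but if the tree's update messages are applied as deltas (for example, increments), replaying a record twice would apply it twice, so the checkpoint index must be exact and must be persisted as part of the checkpoint.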

As part of recovery, you must also implement a function in the key-value store that reconstructs the tree by reading its nodes from disk after a crash. This can be done using the serialization/deserialization methods already implemented in the data structure.
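To rebuild the tree, recovery needs to know which root node (and which version of it) to start from and where to resume in the log. One way to provide that, sketched below under assumptions, is to have each checkpoint save a small metadata file atomically using the common write-temp-file, `fsync`, `rename` pattern with POSIX calls. `CheckpointMeta`, `save_meta`, `load_meta`, and the text file format are hypothetical.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical metadata a checkpoint could save so recovery can find the
// checkpointed tree and know where in the log to resume.
struct CheckpointMeta {
  uint64_t root_id;
  uint64_t root_version;
  uint64_t log_index;
};

// Atomically replaces `path`: write a temp file, fsync it, then rename() it
// over the old file. A crash leaves either the old or the new metadata.
bool save_meta(const std::string& path, const CheckpointMeta& m) {
  const std::string tmp = path + ".tmp";
  const std::string text = std::to_string(m.root_id) + " " +
                           std::to_string(m.root_version) + " " +
                           std::to_string(m.log_index) + "\n";
  int fd = ::open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return false;
  // A short write is treated as failure, which is safe: the old file survives.
  bool ok = ::write(fd, text.data(), text.size()) ==
                static_cast<ssize_t>(text.size()) &&
            ::fsync(fd) == 0;
  ok = (::close(fd) == 0) && ok;
  // For full durability, the containing directory should also be fsync'ed
  // after the rename; that step is omitted from this sketch.
  return ok && std::rename(tmp.c_str(), path.c_str()) == 0;
}

bool load_meta(const std::string& path, CheckpointMeta* m) {
  std::ifstream in(path);
  return static_cast<bool>(in >> m->root_id >> m->root_version >> m->log_index);
}
```

During a checkpoint, the metadata would be saved only after every dirty node has been written back, and the log cleared only after the metadata is saved.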

Tips

If you're not sure how to start, try splitting the work into smaller objectives:

Logging
- First, build a log item type that contains enough information to replay one operation on the tree.
- Then, create a system to log every operation on the tree in memory.
- Finally, given a set persistence granularity, push items from this log to disk.
- This creates a write-after log, where operations are logged after they occur. To finish part 2, invert the order so that operations on the tree don’t occur until after their log item has been persisted to disk.
Checkpointing
- Every now and then, we want to checkpoint by persisting the current tree to disk and clearing the log. The order in which these operations occur is very important, so think about it!
- The swap space contains a pair of functions, maybe_evict_something and write_back, that look for unused items to evict and flush an object to disk with a new version ID. How can you extend these to flush the entire tree?
- After the tree is persisted, all items in the log have been applied, so we can clear the log.
- Think about what information is needed to recover from the cleared log: is deleting the log sufficient, or do you need to save something?
Recovery
- This is the most complicated part, so be sure to leave enough time to work on it!
- One of the simpler ways to do recovery is to always keep a valid copy of the tree stored: if the log contains all the operations between when the tree was saved and when the crash occurred, replaying the log on the saved tree is sufficient for recovery.
- It may help to prevent the swap space from deleting old node files until a checkpoint occurs: that way, the checkpointed version of your tree is always available until a new checkpoint has completed.
- You’ll need to augment the backing store: nodes are saved with a version number, and modifying the backing store so that it can find and load the newest/oldest version of a node may help during recovery.
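The backing-store tip can be prototyped on file names alone. This sketch assumes (the handout does not specify it) that each node version is stored in a file named `<node id>_<version>`; `pick_versions` and `parse_u64` are hypothetical helpers, and in practice the names would come from listing the backing store's directory, for example with `std::filesystem::directory_iterator`.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Parses a non-empty string of decimal digits. std::stoull throws
// std::out_of_range if the value does not fit; not handled in this sketch.
static bool parse_u64(const std::string& s, uint64_t& out) {
  if (s.empty() || !std::all_of(s.begin(), s.end(), [](unsigned char c) {
        return std::isdigit(c) != 0;
      }))
    return false;
  out = std::stoull(s);
  return true;
}

// Assumption: each node version lives in a file named "<node id>_<version>".
// Returns, for each node id, the newest (or oldest) version found; names
// that do not match the pattern are skipped.
std::map<uint64_t, uint64_t> pick_versions(const std::vector<std::string>& names,
                                           bool newest) {
  std::map<uint64_t, uint64_t> chosen;
  for (const std::string& name : names) {
    const size_t sep = name.find('_');
    uint64_t id = 0, version = 0;
    if (sep == std::string::npos || !parse_u64(name.substr(0, sep), id) ||
        !parse_u64(name.substr(sep + 1), version))
      continue;
    auto it = chosen.find(id);
    if (it == chosen.end())
      chosen[id] = version;
    else if (newest ? version > it->second : version < it->second)
      it->second = version;
  }
  return chosen;
}
```

For recovery, the version recorded at the last completed checkpoint can be a safer choice than the newest file, since newer versions may belong to a checkpoint that never finished.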

The tree is implemented using copy-on-write semantics. Make sure you understand these semantics; they will help simplify the recovery process.

Instructions

You can download the Project #3 source code (as a zip file) from Canvas, where it is uploaded under Files. You can extract the source code using the following command:

unzip project3.zip

To debug correctness issues, compile the test benchmark with the -g flag and remove the -O3 flag to turn off optimizations.

Make sure to turn the -O3 flag back on when benchmarking.

You will use the CADE cluster to complete this project.

CADE manages clusters that you can use to do your development and testing for all of the class projects. You are free to use other machines and environments, but all grading will be done on these machines. Please test your solutions on these machines.

Check with CADE if you need to set up an account.

CADE machines all share your home directory, so you needn't log in to the same machine each time to continue working.

Once you have an account, choose a machine at random from the lab1 set of machines on the lab status page (that is, lab1-1.eng.utah.edu through lab1-40.eng.utah.edu), for example:

ssh lab1-10.eng.utah.edu

CADE user accounts have tcsh set as their default shell. Each time you log in, run bash before anything else. All instructions, examples, and scripts in this class assume you are using bash as your shell. You'll need to do this each time unless you change your default shell (which I'd recommend); savvy users may prefer their own setups. This step is important: if you don't change your shell, other things will mysteriously break as you work through the labs.

The essential software is installed on all CADE lab1 machines.

Submission

You need to submit a .zip file of your source code to Canvas.

You should also submit a report.pdf. Make sure that report.pdf is submitted separately and is not part of the .zip file. The report should contain:

A design document on how to add the logging and recovery functionality in the key-value store. The design document should be submitted on or before the first milestone deadline.
A brief report describing your implementation, how it safeguards the index against crashes, and what guarantees the index provides.
A complete analysis of how the tunable parameters related to logging and recovery in your implementation (logging and checkpointing granularity) impact the read and write performance of the key-value store.
Plots of the performance of the key-value store (upsert, query, recovery) as the values of the knobs change. You need to comment on the performance: explain why it changes the way it does.
A list of the contributions made by each student in the group. Parts to which multiple students contributed equally should be marked "equal contribution".

We will evaluate the correctness and the performance of your implementation off-line after the project due date.

Collaboration Policy

Students will work in the same group as they specified in the Project2 team quiz in Canvas.
Students are allowed to discuss high-level details about the project with others.
Students are not allowed to copy the contents of a white-board after a group meeting with other students.
Students are not allowed to copy the solutions from another colleague.