Project #1: Concurrency and Contention Hotspot

Project Overview

The first programming project will teach you how to detect a contention hotspot in a reader-writer lock and fix the contention using a distributed counter. The primary goal of this assignment is to become familiar with the low-level implementation details of high-performance reader-writer locks and distributed counters and to learn how to use profiling tools like PERF. All the code in this programming assignment must be written in C. If you have not used C before, here's a short tutorial on the language. Even if you are familiar with C, go over this guide for additional information on writing code in the system.

This is a single-person project that must be completed individually (i.e., no groups).

  • Release date: Tue, August 31

  • Due date: Tue, September 12

Implementation Details

You can refer to the GNU Builtin Atomics and the memory-model-aware atomics documentation for a list of available atomic operations.
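For orientation only, here is a minimal sketch of the GCC __atomic builtins applied to a shared counter. The names (global_count, bump_count, read_count) are hypothetical and not part of the provided code:

#include <stdint.h>

/* Illustrative only: a shared counter updated with the GCC
 * memory-model-aware atomic builtins. */
static uint64_t global_count = 0;

void bump_count(void)
{
    /* Atomically add 1; relaxed ordering is enough when only the final
     * total matters.  Stronger orders (__ATOMIC_ACQUIRE, __ATOMIC_SEQ_CST)
     * are available when ordering between operations matters. */
    __atomic_fetch_add(&global_count, 1, __ATOMIC_RELAXED);
}

uint64_t read_count(void)
{
    return __atomic_load_n(&global_count, __ATOMIC_RELAXED);
}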

You can refer to the PERF examples documentation or the PERF tutorial to learn how to use PERF to profile the system. You can also apply the thread-local storage concept taught in class to reduce contention in the system.
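For reference, thread-local storage in C looks like the sketch below: each thread gets its own private copy of the variable, so updates to it never contend with other threads. The names are hypothetical and not taken from the provided code:

/* _Thread_local is standard C11; GCC also accepts the older __thread
 * keyword.  Each thread sees its own private copy of local_reads. */
static _Thread_local unsigned long local_reads = 0;

void note_read(void)
{
    local_reads++;   /* no lock or atomic needed for thread-private data */
}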

In this assignment, you will only need to modify the following files:

  • src/lock.h
  • src/lock.c

You will not need to make changes to any other files in the system. You may locally modify the benchmark.c already included in the system to verify the correctness and performance of your implementation, but you will not submit that file.

You will also need to write a report on how you implemented the reader-writer lock, how you identified the hotspot with PERF profiling, how you implemented the distributed counter, and how the distributed counter changes the performance.

There are three steps to implement a high-performance reader-writer lock in the DBMS:

  1. Profile the lock and identify the hotspot
  2. Implement a distributed counter
  3. Test, Profile again, and Evaluate the Performance

Step #1 - Profile the lock and identify the hotspot

The first step is to run PERF against the ./benchmark binary:

make benchmark
perf record ./benchmark 100 5 10000 100

The arguments to the benchmark are, in order:

  1. nreaders: How many reader threads to launch
  2. nwriters: How many writer threads to launch
  3. nitems: # of items in the array
  4. niters: # of iterations. In each iteration a thread acquires the lock once

All arguments must be >= 1.

Sample Output:

Running benchmark with 100 readers, 5 writers, 10000 items, 100 iterations
Threads done, stats:
Readers: min 0.000016 ms, max 1.879325 ms, mean 0.008598 ms, std_dev 0.053511
Writers: min 0.000034 ms, max 0.423291 ms, mean 0.011356 ms, std_dev 0.039250

In order to properly observe the contention bottleneck, you will need a machine with 8 cores. CADE machines have 8 physical cores. This particular hotspot generally shows up with more than 8 threads.

The PERF command will generate a result file, perf.data, in the folder where you run the PERF command. You then need to use PERF again to analyze the profiling result:

perf report

The percentages of the sampled on-CPU functions will show up in the window. You should then drill down into the hottest functions in the PERF results and examine the annotated code from PERF. The contention comes from shared resources in the atomic counter that are protected by a lock, but it is your job to identify which resources they are and which lock it is.
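For intuition, a contention hotspot of this kind usually has the shape sketched below, where every reader serializes on a single shared counter behind a single lock. This is purely illustrative; the structures and lock actually used in lock.c may be organized differently, and finding them is part of the assignment:

#include <pthread.h>

/* Hypothetical centralized reader count: every reader thread hits the
 * same lock and the same cache line, so this path becomes the hotspot. */
static pthread_mutex_t guard = PTHREAD_MUTEX_INITIALIZER;
static long active_readers = 0;

void reader_enter(void)
{
    pthread_mutex_lock(&guard);
    active_readers++;
    pthread_mutex_unlock(&guard);
}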

HINT: You will need to submit screenshots of your PERF analysis in your report (see Submission).

Once you have identified the hotspot, move on to Step #2 to fix the contention on those resources in lock.c.

Step #2 - Implement a distributed counter

This is the most important step. We are assuming you have successfully identified the contention hotspot.

You now need to fix the contention hotspot on the resources in lock.c by replacing them with a more concurrent version of the same resource: a distributed counter.

Having per-thread data structures does not mean that there cannot be concurrent operations on those data structures; it just reduces the level of contention. When there are concurrent operations on the per-thread data structures, you still need to protect them appropriately. You will not receive any points for the programming part of the project if your solution does not guarantee correctness.
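A minimal sketch of one possible distributed counter is shown below, assuming a fixed maximum number of threads and hypothetical names (dist_counter, dc_add, dc_read, my_slot); the real layout must follow whatever interface lock.h and lock.c require:

#include <stdint.h>

#define MAX_THREADS 64
#define CACHE_LINE  64

/* One slot per thread, padded so that slots never share a cache line. */
struct dist_counter {
    struct {
        int64_t value;
        char    pad[CACHE_LINE - sizeof(int64_t)];
    } slot[MAX_THREADS];
};

/* Hypothetical per-thread slot index, assigned at thread start. */
static _Thread_local int my_slot;

void dc_add(struct dist_counter *c, int64_t n)
{
    /* Each thread updates only its own slot, but the update is still
     * atomic because another thread may be summing the slots concurrently. */
    __atomic_fetch_add(&c->slot[my_slot].value, n, __ATOMIC_RELAXED);
}

int64_t dc_read(struct dist_counter *c)
{
    int64_t sum = 0;
    for (int i = 0; i < MAX_THREADS; i++)
        sum += __atomic_load_n(&c->slot[i].value, __ATOMIC_RELAXED);
    return sum;
}

Updates become cheap because threads no longer fight over one cache line, but reads now have to walk every slot, and the memory ordering you need depends on how the lock uses the aggregate value. Treat this as a starting point rather than a drop-in solution.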

Step #3 - Test, Profile again, and Evaluate the Performance

You need to make sure that your implementation is correct before proceeding to evaluation. We have included some unit tests and basic benchmarks as part of the benchmark program. You can also extend the tests by writing your own test cases or by scaling up the number of threads in the benchmark.

Then you need to repeat the profiling process from Step #1 to verify that your implementation has reduced the on-CPU percentages of the hot functions in lock.c. Finally, you should compare the performance (throughput) numbers of the benchmark before and after your fix; these are printed to the terminal after you execute the benchmark.

HINT: You also need to submit screenshots of your new PERF analysis in your report (see Submission).

Instructions

You can download the Project #1 source code (as a zip file) from Canvas; it is uploaded under Files. You can extract the source code by uncompressing the zip file using the following command:

unzip p1.zip -d p1/

To debug any correctness issues, you can compile the main benchmark with the D=1 flag to turn off optimizations:

make clean
make D=1 main

You will use the CADE cluster to finish this project.

CADE manages clusters that you can use to do your development and testing for all of the class projects. You are free to use other machines and environments, but all grading will be done on these machines. Please test your solutions on these machines.

Check with CADE if you need to set up an account.

CADE machines all share your home directory, so you needn't log in to the same machine each time to continue working.

After you have an account, choose a machine at random from the lab1- set of machines listed on the lab status page (that is, lab1-1.eng.utah.edu through lab1-40.eng.utah.edu).

ssh lab1-10.eng.utah.edu

CADE user accounts have tcsh set as their default shell. Each time you log in, run bash before anything else. All instructions, examples, and scripts in this class assume you are using bash as your shell. You'll need to do this each time unless you reset your default shell (link), which I'd recommend. Savvy users may have slicker setups. This step is important: if you don't run bash (or reset your shell), other things will mysteriously break as you try to work through the labs.

PERF and other essential software are installed on all CADE lab1 machines.

Submission

You need to submit a tar.gz file of your source code to Canvas.

You should also include a report.pdf in your submission that contains:

  1. A screenshot of the PERF profiling results showing the two hottest functions in the main benchmark before your fix.
  2. A screenshot of the PERF profiling results showing the bottleneck in either of the above two functions, with the annotated code, before your fix.
  3. A brief analysis of how you identified the hotspot and the resources under contention with the help of the above profiling results.
  4. A screenshot of the PERF profiling results showing the new percentages of the on-CPU functions after your fix.
  5. A screenshot of the PERF profiling results showing the new bottleneck in either of the original two hottest functions, with the annotated code, after your fix.
  6. A brief analysis of how your implementation reduces the contention hotspot in the system, using the evidence in the above two screenshots.

We will evaluate the correctness and the performance of your implementation offline after the project due date.

Collaboration Policy

  • Every student has to work individually on this assignment.
  • Students are allowed to discuss high-level details about the project with others.
  • Students are not allowed to copy the contents of a whiteboard after a group meeting with other students.
  • Students are not allowed to copy the solutions from another colleague.