CS6963 Distributed Systems


CS6963 Fall 2016 Project

Proposals due: Fri Oct 28 23:59

Code and write-up due: Thu Dec 8 23:59

Presentations: Thu Dec 8 and Thu Dec 15


You must form a group of three CS6963 students to collaborate on the project. You'll turn in your code and a short write-up describing the design and implementation of your project, and make a short in-class presentation about your work. We will post your write-up and code on the web site after the end of the semester, unless you explicitly talk to us about why you want to keep yours confidential.

Your project should be something interesting and challenging that's closely related to CS6963 core topics, such as fault tolerance. Below you'll find some half-baked ideas that we think could turn into interesting projects, but we haven't given them too much thought.


There are five concrete steps to the final project, as follows:

  1. Form a group and decide on the project you would like to work on. Feel free to use Canvas to find group members and discuss ideas. Course staff will be happy to discuss project ideas via e-mail or in person.
  2. Flesh out the exact problem you will be addressing and how you will go about solving it. By the proposal deadline, you must submit a proposal (less than a page) that lists your group members and describes the problem you want to address, how you plan to address it, and what specifically you propose to design and implement. Submit your proposal to both the TA and the instructor via e-mail. We'll tell you whether we approve and give you feedback. Projects can take almost any form. Here are some high-level templates; more specific ideas appear further below:
    • Use your research area: several students work in labs with distributed systems projects or projects adjacent to distributed systems. If at all possible, leverage that to find a new question related to the work you already do. For example, specifying distributed systems with domain-specific languages, modeling them, and visualizing them are all related to Ganesh's DS2 project. Tackling a small concrete question or producing a related demo is perfect.
    • Extend/improve/measure existing systems: Runway would be a great project to contribute to. The core infrastructure could be improved, but even just providing additional models would be great.
    • Extend the labs: implement Lab 3b, Lab 4, and Lab 5 (true persistence) and run your Raft KVS on a real network on Emulab. Find one unique question or enhancement to assess. Profile the performance and/or find pathologies (starvation due to leader election, asymmetric geographic placement with an unfortunately placed leader, ePaxos-like enhancements, assessing costs/tradeoffs of many/few Raft groups, etc.).
    • A literature review: find and review 3 to 5 papers related to a specific topic/theme (MR/Spark, replication, load balancing, consistency, distributed transactions, etc.) from the most recent top conferences (SOSP, NSDI, VLDB, SIGMOD; '15 and '16). Such a review should compare common approaches/themes or infer a trajectory for that area of research. (For example, read RAMCloud SOSP'15, FaRM NSDI'14 and SOSP'15, and End of Slow Networks VLDB'16; closely compare the performance, data model, fault tolerance, and cost tradeoffs of the different transaction approaches.)
  3. Execute your project: design and build something neat!
  4. Write a document describing the design and implementation of your project, and turn it in along with your project's code by the final deadline. The document should be about 3 pages of text that helps us understand what problem you solved and what your code does. The code and write-ups will be posted online after the end of the semester.
  5. Prepare a short in-class presentation about the work that you have done for your final project. We will provide a projector that you can use to demonstrate your project. Depending on the number of project groups, we may have to limit the total number of presentations, so some groups might not end up presenting.

Half-baked project ideas

Here's a list of ideas to get you started thinking -- but you should feel free to propose your own ideas.

  • Instrument your Raft implementation and visualize it with ShiViz.
  • Model two-phase commit in Runway. See if it can be used to find/debug blocking under certain node failure patterns.
  • Design a strategy for scaling up a memcached cluster (that uses consistent hashing, for example). Measure the impact on cache hit rates and performance when the configuration is changed.
  • Simulate a protocol similar to Lab 4 (a partitioned Raft-based KVS) and compare its tail latency to a Dynamo-based approach (with N=3, R=2, W=2).
  • Port a simple web application (but more interesting than a shopping cart) from a conventional database to only using CRDTs and try running it when two sites span a wide geographic area.
  • Understand the memory fragmentation issues of modern DSMs and design a solution.
  • Port a service to a unikernel; compare the request latency distribution to running on Linux and characterize the differences you see (see Leverich et al.).
  • Look at the dispatch overhead of a modern request-response based service and design some form of lightweight event dispatch to reduce overheads.
  • Specify a simple system in TLA or Coq. State key invariants and prove correctness.
  • Develop a system that transmits responses directly from a data structure without synchronizing with writers, but uses client-side logic to patch up inconsistencies.
  • Simulate a transaction protocol from class, like Thor, under more modern network assumptions and suggest improvements.
  • Build a distributed, decentralized, fault-tolerant reddit.
  • Make the state synchronization protocol (DDP) in Meteor more efficient (e.g., send fewer bytes between server and client) and more fault-tolerant (e.g., a client should be able to tolerate server failures, as long as enough servers remain live).
  • Build a fault-tolerant file service; on the client side, you could use FUSE to run your own client code, or you could have clients talk NFS to your server, as in Harp.
  • Build a better fault-tolerant peer-to-peer tracker for BitTorrent.
  • Build a system for making Node.js applications fault-tolerant, perhaps using some form of replicated execution.
  • Add cross-shard atomic transactions to Lab 4, using two-phase commit and/or snapshots.
  • Build a system with asynchronous replication (like Dynamo or Ficus or Bayou). Perhaps add stronger consistency (as in COPS or Walter or Lynx).
  • Build a file synchronizer (like Unison or Tra).
  • Build a distributed shared memory (DSM) system, so that you can run multi-threaded shared memory parallel programs on a cluster of machines, using paging to give the appearance of real shared memory. When a thread tries to access a page that's on another machine, the page fault will give the DSM system a chance to fetch the page over the network from whatever machine currently stores it.
  • Build a distributed RAID in the style of FAB. Maybe you can get standard operating systems to talk to your network virtual disk using iSCSI or Linux's NBD (network block device).
  • Build a coherent caching system for use by web sites (a bit like memcached), perhaps along the lines of TxCache.
  • Build a distributed cooperative web cache, perhaps along the lines of Firecoral or Maygh.
  • Build a collaborative editor like EtherPad.
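
As a starting point for the memcached scaling idea above, here is a minimal Python sketch (a toy simulation, not a measurement of a real cluster) that compares how many keys change servers when a fifth server joins a four-server pool, under consistent hashing versus naive modulo placement. The names HashRing, modulo_lookup, and moved_fraction, the cache-N server names, and the choice of 100 virtual nodes per server are all hypothetical and chosen only for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (a 64-bit integer)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, servers, vnodes=100):
        # Place each server at `vnodes` pseudo-random points on the ring.
        points = []
        for server in servers:
            for i in range(vnodes):
                points.append((_hash(f"{server}#{i}"), server))
        points.sort()
        self._points = [point for point, _ in points]
        self._owners = [owner for _, owner in points]

    def lookup(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owners[idx]


def modulo_lookup(servers):
    """Naive placement: hash(key) mod number-of-servers."""
    return lambda key: servers[_hash(key) % len(servers)]


def moved_fraction(keys, before, after):
    """Fraction of keys whose server differs between two lookup functions."""
    moved = sum(1 for key in keys if before(key) != after(key))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"key-{i}" for i in range(10000)]
    old = [f"cache-{i}" for i in range(4)]
    new = old + ["cache-4"]  # scale the pool from 4 to 5 servers

    ring = moved_fraction(keys, HashRing(old).lookup, HashRing(new).lookup)
    mod = moved_fraction(keys, modulo_lookup(old), modulo_lookup(new))
    print(f"consistent hashing: {ring:.1%} of keys moved")
    print(f"modulo placement:   {mod:.1%} of keys moved")
```

Running the script prints the two fractions. With reasonably uniform hashing, one would expect consistent hashing to move roughly one key in five (the keys the new server takes over), while modulo placement moves most keys. In a real project, each moved key is a likely cache miss after the configuration change, which is the effect on hit rate and performance you would actually measure.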