In this lab you will build a fault-tolerant key/value storage service using your Raft library from lab 2. You will build your key/value service as a replicated state machine, consisting of several key/value servers that coordinate their activities through the Raft log. Your key/value service should continue to process client requests as long as a majority of the servers are alive and can communicate, in spite of other failures or network partitions.
Your system will consist of clients and key/value servers, where each key/value server also acts as a Raft peer. Clients send Put(), Append(), and Get() RPCs to key/value servers (called kvraft servers), which then place those calls into the Raft log and execute them in order. A client can send an RPC to any of the kvraft servers, but should retry by sending to a different server if the server is not currently a Raft leader, or if there's a failure. If the operation is committed to the Raft log (and hence applied to the key/value state machine), its result is reported to the client. If the operation failed to commit (for example, if the leader was replaced), the server reports an error, and the client retries with a different server.
This lab has two parts, but you only need to implement part A. In part A, you will implement the service without worrying that the Raft log can grow without bound.
You must write all the code you hand in for this class, except for code that we give you as part of the assignment. You are not allowed to look at anyone else's solution, and you are not allowed to look at code from previous years. You may discuss the assignments with other students, but you may not look at or copy each other's code. Please do not publish your code or make it available to future students in this class; for example, please do not make your code visible on GitHub.
Do a git pull to get the latest lab software. We supply you with new skeleton code and new tests in src/kvraft. You will need to modify kvraft/client.go, kvraft/server.go, and perhaps kvraft/common.go.
To get up and running, execute the following commands:
$ cd ~/cs6450/labs
$ git pull
...
$ cd src/kvraft
$ GOPATH=~/cs6450/labs
$ export GOPATH
$ go test
...
$

When you're done, your implementation should reliably pass the tests in the src/kvraft directory through TestPersistPartitionUnreliable():
for t in TestBasic TestConcurrent TestUnreliable TestUnreliableOneKey \
         TestOnePartition TestManyPartitionsOneClient TestManyPartitionsManyClients \
         TestPersistOneClient TestPersistConcurrent TestPersistConcurrentUnreliable \
         TestPersistPartition TestPersistPartitionUnreliable; do
  go test -run $t
done
Test: One client ...
  ... Passed
PASS
ok      kvraft  15.427s
Test: concurrent clients ...
  ... Passed
PASS
ok      kvraft  16.712s
Test: unreliable ...
  ... Passed
PASS
ok      kvraft  16.955s
Test: Concurrent Append to same key, unreliable ...
  ... Passed
PASS
ok      kvraft  2.061s
...
The service supports three RPCs: Put(key, value), Append(key, arg), and Get(key). It maintains a simple database of key/value pairs. Put() replaces the value for a particular key in the database, Append(key, arg) appends arg to key's value, and Get() fetches the current value for a key. An Append to a non-existent key should act like a Put.
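As a small illustration of these semantics (this snippet is not part of the skeleton and not something you hand in), Go's map zero value makes the Append-to-missing-key rule fall out naturally if the database is a map[string]string:

package main

import "fmt"

func main() {
	// Indexing a missing key in a Go map yields the zero value, "",
	// so appending to a non-existent key behaves exactly like a Put.
	data := map[string]string{}
	data["k"] += "hello"  // key absent: same effect as Put("k", "hello")
	data["k"] += " world" // key present: value is now "hello world"
	fmt.Println(data["k"]) // prints "hello world"
}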
You will implement the service as a replicated state machine consisting of several kvservers. Your kvraft client code (Clerk in src/kvraft/client.go) should try different kvservers it knows about until one responds positively. As long as a client can contact a kvraft server that is a Raft leader in a majority partition, its operations should eventually succeed.
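One reasonable Clerk structure, sketched below, remembers the last server that answered and cycles through the others on failure. This is only a sketch under assumptions: the lastLeader field, the WrongLeader reply field, and the "RaftKV.Get" service name are invented for illustration, not dictated by the skeleton.

// Sketch of a Clerk retry loop; see the assumptions noted above.
func (ck *Clerk) Get(key string) string {
	args := GetArgs{Key: key}
	for {
		var reply GetReply
		ok := ck.servers[ck.lastLeader].Call("RaftKV.Get", &args, &reply)
		if ok && !reply.WrongLeader {
			return reply.Value // success; remember this leader for next time
		}
		// Lost RPC, timeout, or not the current leader: try the next server.
		ck.lastLeader = (ck.lastLeader + 1) % len(ck.servers)
	}
}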
Your first task is to implement a solution that works when there are no dropped messages and no failed servers. Your service must ensure that Get(), Put(), and Append() return results that are linearizable. That is, completed application calls to the Clerk.Get(), Clerk.Put(), and Clerk.Append() methods in kvraft/client.go must appear to all clients to have affected the service atomically in some total order, even if there are failures and leader changes. A Clerk.Get(key) that starts after a completed Clerk.Put(key, …) or Clerk.Append(key, …) should see the value written by the most recent Clerk.Put(key, …) or Clerk.Append(key, …) in the total order. Completed calls should have exactly-once semantics.
A reasonable plan of attack may be to first fill in the Op struct in server.go with the "value" information that kvraft will use Raft to agree on (remember that Op field names must start with capital letters, since they will be sent through RPC), and then implement the PutAppend() and Get() handlers in server.go. The handlers should enter an Op in the Raft log using Start(), and should reply to the client when that log entry is committed. Note that you cannot execute an operation until the point at which it is committed in the log (i.e., when it arrives on the Raft applyCh).
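For concreteness, one possible shape for this flow is sketched below. The Op field names, the WrongLeader reply field, and the per-index notification channel (the waitChan helper) are assumptions about one workable design, not the required one:

// Illustrative Op; the names are placeholders. Fields are exported
// (capitalized) so gob can encode them into the Raft log.
type Op struct {
	Kind  string // "Get", "Put", or "Append"
	Key   string
	Value string
	// Client/request identifiers go here later, for duplicate detection.
}

func (kv *RaftKV) PutAppend(args *PutAppendArgs, reply *PutAppendReply) {
	op := Op{Kind: args.Op, Key: args.Key, Value: args.Value}
	index, _, isLeader := kv.rf.Start(op)
	if !isLeader {
		reply.WrongLeader = true
		return
	}
	// waitChan is a hypothetical helper: it returns a channel on which a
	// background goroutine, reading applyCh, sends the entry that was
	// actually applied at this log index.
	committed := <-kv.waitChan(index)
	if committed != op {
		// A different entry committed at our index (the leader changed);
		// report failure so the Clerk retries with another server.
		reply.WrongLeader = true
	}
}

The essential points the sketch captures are that the handler replies only after the operation emerges from applyCh, and that it checks that the committed entry at its index is still the one it submitted.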
You have completed this task when you reliably pass the first test in the test suite: "One client". You may also find that you can pass the "concurrent clients" test, depending on how sophisticated your implementation is.
Your kvraft servers should not directly communicate; they should only interact with each other through the Raft log.
Add code to cope with duplicate client requests, including situations where the client sends a request to a kvraft leader in one term, times out waiting for a reply, and re-sends the request to a new leader in another term. The client request should always execute just once. To pass part A, your service should reliably pass all tests through TestPersistPartitionUnreliable().
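One common scheme, sketched below under the assumption that Op gains ClientID and Seq fields (each Clerk picks a random ClientID at startup and increments Seq for every new request), is for each server to remember the highest sequence number it has applied per client:

// Runs in the applyCh loop on every replica; kv.lastSeq is an assumed
// map[int64]int64 and kv.data an assumed map[string]string, both
// protected by the server's mutex.
func (kv *RaftKV) applyOp(op Op) {
	if kv.lastSeq[op.ClientID] >= op.Seq {
		return // a retried request that already executed; do nothing
	}
	kv.lastSeq[op.ClientID] = op.Seq
	switch op.Kind {
	case "Put":
		kv.data[op.Key] = op.Value
	case "Append":
		kv.data[op.Key] += op.Value // Get needs no state change
	}
}

Because every replica performs this check while applying entries from the Raft log, a duplicate that commits as a second log entry leaves the database untouched, which is what gives completed calls their exactly-once semantics.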
Before submitting, please run the 3A tests one final time (ensuring it works through TestPersistPartitionUnreliable()). You are responsible for making sure your code works. Keep in mind that the more obscure corner cases may not appear on every run, so it's a good idea to run the tests multiple times.
You will receive full credit if your software passes these tests reliably on our machines:
for t in TestBasic TestConcurrent TestUnreliable TestUnreliableOneKey \
         TestOnePartition TestManyPartitionsOneClient TestManyPartitionsManyClients \
         TestPersistOneClient TestPersistConcurrent TestPersistConcurrentUnreliable \
         TestPersistPartition TestPersistPartitionUnreliable; do
  go test -run $t
done
Test: One client ...
  ... Passed
PASS
ok      kvraft  15.427s
Test: concurrent clients ...
  ... Passed
PASS
ok      kvraft  16.712s
Test: unreliable ...
  ... Passed
PASS
ok      kvraft  16.955s
Test: Concurrent Append to same key, unreliable ...
  ... Passed
PASS
ok      kvraft  2.061s
...

All late days must be pre-approved by the TA before the deadline, and if you push after the normal deadline you must notify the TA once you've pushed your submission. If you don't, we won't receive your submission, since we have to pull it manually.
Please post questions on Canvas.
Turn-in will be handled entirely through git. When your solution is working as you'd like, make sure all of your changes are committed in your local repo.
$ git commit -am 'Final changes for lab 3a.'
$ git status
On branch master
nothing to commit, working directory clean
You'll need to push the changes you've made to your working branch in your local repository to a new branch called 'lab3a' in your private gitlab repo. If you haven't explicitly made any local branches, your working branch should be called 'master'. The following command takes the local repo's master branch and pushes it to a branch called lab3a in the repo you cloned from (hopefully your own private gitlab repo, if you've done things right).
$ git push origin master:lab3a
If you discover you'd like to make more updates before the deadline, that's fine. Just make additional commits to your local repo as before, then repeat the above step. The new commits will be pushed to your gitlab repo.
We have a grading script that periodically fetches lab submission branches from each student's gitlab repo. You can check the status of the script: for each lab branch name (lab1, lab2a, etc.) it reports the SHA-1 hash of the commit that will be used for grading if you make no further commits. Note that the grading script stops collecting updates for a particular branch once the submission deadline has passed.