CS 475, Spring 2019, Homework 1, Due Feb 11, 11:00 am

Start by downloading the handout:

Download the handout

Please post questions on Piazza

Revision History:

  • 1/30 Add 2 hints/notes to bottom of part 1
  • 2/4: Draw attention to the fact that you can use, 1: synchronized(theList) for part 2,  2: you can add new fields to your KeyValueLoader, and 3: the semantics of addToList and deleteFromList
  • 2/5: Add note about using the “kvStore” instance in the KeyValueLoader
  • 2/6: If you use an executor, make 100% certain that you do not return from loadFile before the executor is done — remember from the code example in class, we do this by calling executor.shutdown() and then executor.awaitTermination(…), using a timeout that is far greater than might ever be used to actually do the work (say, several minutes) 
  • 2/10: Add hint for part 4

The purpose of this assignment is to get you familiar with basic concurrency control topics in Java. For this assignment, you will build a simple key-value store (KV Store) and client in Java. KV stores are widely used as layers underneath complex distributed systems, and examples include Redis, Voldemort, Dynamo. KV stores are very interesting to study from the perspective of concurrent and distributed software because the interface that they expose to client applications is incredibly simple (appearing effectively to be just a HashMap), but their implementation can be incredibly complex. In particular, once a KV store needs to be able to store more data than can fit in main memory on a single machine, deciding how to swap data to disk can be an interesting problem. If the KV store is then distributed amongst multiple machines, then the specific mechanisms used to keep data in sync can become quite complicated as well.

For this assignment, you will implement a simple KV store that operates entirely within memory, and on a single machine. However, the KV store will be accessed by multiple client threads, and hence, must maintain thread safety. You will also implement a data loader tool, which will read a file in line-by-line, parse each line, and then load that data into the KV store.

General requirements:

We have provided you a baseline implementation of the KV store that has various methods stubbed out, which you will implement. You may not include any additional libraries (e.g. download and and require additional JARs), although feel free to use any of the various libraries included with Java or already included by the starter project. Your project will be compiled and tested using Java 8, so please do not use any Java 9+ exclusive features (the first long-term support version of Java > 8 just came out in September; we’ll switch to Java 11 next semester, sorry).

Your KV Store will be compiled and tested using apache maven, which automatically executes the provided JUnit tests. Please install Maven on your computer. Unfortunately, Maven is not installed on Zeus, however you can download the most recent version (e.g. apache-maven-3.5.2-bin.zip) and unzip it in your home directory. Then, to run maven, type ~/apache-maven-3.5.2/bin/mvn in the snippets below (instead of just mvn). Note that you can easily import maven projects into eclipse and IntelliJ (we suggest IntelliJ over eclipse).

To compile and run your KV store, run mvn package and then run  java -jar target/kvstore-2019.3.1-SNAPSHOT.jar input.dat.

Your KV store will be automatically graded for correctness (note that there will be a manual grading phase to check hard-to-automatically-catch concurrency issues). Included with your handout is a simple version of the (large) test script that we will use to grade your assignment. Upon submitting your assignment, our server will automatically compile and test your assignment and provide you with test results. You can resubmit 50 times before the deadline without penalty. To run these tests, simply execute mvn -Ptest test (of course, if you do this first off, you’ll see that they all fail!)

Note: Your code must compile and run on the autograder, under Java 8. It is unlikely that you will have any difficulties developing on a Mac, Linux or Windows. When you feel satisfied with implementing one phase of the assignment, submit to AutoLab and verify that AutoLab agrees.

Academic honesty reminder: You may NOT share any of your code with anyone else. You may NOT post your code in a publicly viewable place (e.g. in a public GitHub repository).  You may face severe penalties for sharing your code, even “unintentionally.” Please review the course’s academic honesty policy.

Tips and References

You might find it useful to reference the official Java tutorial on concurrency, which does a nice job outlining at a high level how you can create threads and locks. If you find any useful references, please feel free to share them on Piazza and we will update this page as well.

Part 1: The Simplest Key Value Store (20%)

To get started, you’ll implement the basic functionality of the key value store, and ensure that it works in a normal (single-threaded) environment.  This will act as something of a baseline for Java programming without concurrency. The tricky parts will come when we make this functionality thread-safe (in Part 2).

Your KV store will support two different kinds of values: strings and lists. Hence, a key might be mapped to: (1) nothing, (2) a single string, and (3) a list (containing any number of strings).

You will implement this behavior in  edu.gmu.cs475.KeyValueStore, specifically in the following four methods:

Note the following requirements:

  1. You should feel free to add whatever methods you would like to the  KeyValueStore class, but you must not change the names or parameters of the methods listed above.
  2. You must not add any fields to any classes.
  3. You can not use your own HashMap, but instead it is almost as easy: You must call  _get, _set, _remove, and  _listKeys in your  KeyValueStore to access the underlying data store. You must not maintain any mapping yourself between keys and values.

Additional comments/tips:

  • 1/30: set should throw an IllegalArgumentException if the value is not a String or List — this means that the value can be either any instance of the class java.lang.String or any instance of java.util.List (and its subtypes) instanceOf is what you should be using here.
  • 1/30: getType will likely also need to call get and use instanceof

Precise grading breakdown:

  • Automated tests: 15 points
    • 9 JUnit tests, 1 2/3 points each
  • 5 points from manual grading

Part 2: Managing Thread Safety (35%)

Once you are satisfied that your key-value store works correctly in a single-threaded environment, your next task will be to add locking to it. You should implement all of your concurrency control through (1) the this.bigLock object (a  java.util.concurent.locks.Lock object), and (2) by synchronizing on list objects. (2/4: Note this is a hint – you should be making calls to the lock, and also to synchronized(theListObject)).

Note the following concurrency requirements:

  1. You must not allow concurrent calls to any of the following methods:  _get_set, _remove, or  _listKeys
  2. For the operations addToList and deleteFromList: if  addToList (or deleteFromList) are concurrently called for two different keys, you must allow them to execute concurrently (subject to requirement 1). (2/4: Note that a side effect of this requirement is that a key might be deleted from the KV store while addToList is running [after it calls _get and before it calls add] – this is OK).

Precise grading breakdown:

  • 5 JUnit tests, 6 points each
  • 5 points from manual grading

Part 3: Bulk Data Loader (35%)

Once you have gotten the KV store running, the last component to develop will be the data loader (in the KeyValueLoader class). The idea here is to implement a simple component that takes in a file of key=value pairs, then inserts them into the KV store. The semantics of the methods that you need to implement are detailed below:

For this component, you should feel free to add any (non-static) fields or methods to the  KeyValueLoader. (2/4: Hint: you will need to have your own map of locks for each key, and ensure that you acquire and release this lock in loadLine. As per the prior sentence, you should feel free to add new fields or methods to the KeyValueLoader. Start by adding a method getLockForKey(String key), and reason through its correctness first.)

2/5: To pass the tests, you must use the “this.kvStore” as your instance of the KeyValueStore for storing the processed lines.

2/6: If you use an executor, make 100% certain that you do not return from loadFile before the executor is done — remember from the code example in class, we do this by calling executor.shutdown() and then executor.awaitTermination(…), using a timeout that is far greater than might ever be used to actually do the work (say, several minutes) 

Note: you must not use Files.lines(..).parallel().forEach(...) to read the line in and process it in parallel: in Java 8, this stream does not support parallel processing (and it happens sequentially).

Take note of the concurrency issues that you may run into, given that  loadLine will likely be called from multiple threads simultaneously. In particular, take note of your handling of step (4) in the process described for the method (converting an existing string into a list).

Precise grading breakdown:

  • 3 JUnit tests, 10 points each
  • 5 points from manual grading

Part 4: Testing the Tests (10%)

We have included several functional tests that simply test the behavior of each of these methods. We have not provided any tests (for you to see) that test the behavior of your software under concurrency (the tests for parts 2 and 3). One of your tasks is to write such tests.

Note that it is generally impossible to automatically prove that concurrent programs have no races, it is common instead to test for low-hanging fruit. For instance, it is possible to write tests that demonstrate that two events can happen concurrently, and tests that show that it’s unlikely (but not impossible) that two other events do not happen concurrently.

We are asking you to implement one of the tests that we are using to test part 2:  testAddOrDeleteFromDifferentListsShouldBeConcurrent. The goal of this test is to demonstrate that it is possible to call  addToList(key1,value1) and  addToList(key2,value2) concurrently, and for both of the underlying  add operations to actually occur concurrently (and, similarly for deleteFromList). To do so, you can use any concurrency tools that you would like: make as many threads as you need (keep it under 10 though, please), use monitors ( synchronized, with wait and notify), locks, semaphores, latches, etc. There are many ways to do this correctly – it is up to you to decide how to proceed.

Hint (2/5): If you’re unsure where to start, we suggest that you start by using just threads, using Thread.join(timeout), with no extra concurrency control.

Hint (2/10): The illustration below shows the interleaving that we are trying to witness: that while Thread 1’s addToList is still running (that is to say, before addToList returns in Thread 1), Thread 2 successfully starts and adds an item to the same list at the same key as Thread 1. Note that in the graphic we show the two linearization points (from your call to kvStore.get and to list.add, which should both be protected with different locks). Because list.add (or remove) should be protected with a lock (or synchronized block), it should not be possible to see the two list.add’s overlap. 

Implement your test in  Part4TestingTheTestTests. Feel free to add any non-static fields to this class.

Your test will be tested on Autolab on three different KeyValueStores: a correct KVStore (your test should pass on this implementation) and two incorrect KVStores (your test should fail on both of these implementations).

No marks will be awarded for empty tests, or tests that have no assertions (note that otherwise, these tests would all pass on our correct implementation). You will receive full marks only if your test successfully passes on the correct implementation and fails on the buggy versions.

Grading

Your assignment will be graded on a series of functional tests, making sure that it implements the specification above. Your test will be graded by running it on purposely buggy code and verifying that it throws some errors, and then manually inspecting them for completeness. We have provided a (not exhaustive) test suite which should help you judge (on your own) how well you have done with the functional requirements. We may add additional tests beyond what we are providing you with now, so please do not rely entirely on them.

In accordance with the “reasonable person principle,” we reserve the right to audit your code and correct any marks that are improperly assigned, for instance, due to your code incorrectly following the specification, but passing the test.  We would encourage you to spend your time correctly implementing the assignment, and not trying to force it to pass the test, even if it is not following the spirit of the assignment.

Hand In Instructions

You must turn in your assignment using Autolab (You MUST be on the campus network, or connected to the GMU VPN to connect to Autolab). If you did not receive a confirmation email from Autolab to set a password, enter your @gmu.edu (NOT @masonlive) email, and click “forgot password” to get a new password.

Create a zip file of only the src directory in your assignment (please: .zip, not .tgz or .7z etc). When you upload your assignment, Autolab will automatically compile and test it. You should verify that the result that Autolab generates is what you expect. Your code is built and tested in a Linux VM. Assignments that do not compile using our build script will receive a maximum of 50%. Note that we have provided ample resources for you to verify that our view of your assignment is the same as your own: you will see the result of the compilation and test execution for your assignment when you submit it.

You can resubmit your assignment an unlimited number of times before the deadline. Note the course late-submission policy: assignments will be accepted up until 24 hours past the deadline at a penalty of 10%; after 24 hours, no late assignments will be accepted, no exceptions.

Questions

Please post questions on Piazza

 

Contact