CS 475, Fall 2019, Homework 2, Due Oct 7 9:00am

Start by downloading the handout:

Download the handout

Please post questions on Piazza

Revision history:

  • Note 9/25: Your set , remove and   removeDirectory operations will need to add and remove items from directory listings, which they can do by retrieving a reference to the KeyValueStoreDirectory and then accessing the list of children with getChildren() and then eventually calling add or remove on that list.

This assignment will build on your knowledge of locking and concurrency control in Java, extending our KV Store to have a notion of key value pairs  (normal) and directories. Normally, KV stores are “dumb” about keys: there is no relationship between keys. However, some applications demand a notion of hierarchy more similar to file systems. That is, rather than have no relationship between keys, keys are organized like paths on a file system. In this scheme, the key “/a/b/c.txt” represents a file which is stored within a folder (the folder “/a/b/”), which is itself stored within a folder (“/a/”), which is itself stored within a folder (“/”). A user could query the KV store for the contents of “/”, which would return “/a/” — one of the directories that the key “/a/b/c.txt” is stored within. In this assignment, you will implement a concurrent KV store that automatically maintains directory structures for keys.

As in Homework 1, you will implement a simple KV store that operates entirely within memory, and on a single machine. However, the KV store will be accessed by multiple client threads, and hence, must maintain thread safety.

General requirements:

We have provided you a baseline implementation of the KV store that has various methods stubbed out, which you will implement. You may not include any additional libraries (e.g. download and and require additional JARs), although feel free to use any of the various libraries included with Java or already included by the starter project. Your project will be compiled and tested using Java 8, so please do not use any Java 9+ exclusive features.

Your KV Store will be compiled and tested using apache maven, which automatically executes the provided JUnit tests. Please install Maven on your computer. Unfortunately, Maven is not installed on Zeus, however you can download the most recent version (e.g. apache-maven-3.6.1-bin.zip) and unzip it in your home directory. Then, to run maven, type ~/apache-maven-3.6.1/bin/mvn in the snippets below (instead of just mvn). Note that you can easily import maven projects into eclipse and IntelliJ (we suggest IntelliJ over eclipse).

Your KV store will be automatically graded for correctness (note that there will be a manual grading phase to check hard-to-automatically-catch concurrency issues). Included with your handout is all of the automated tests that we will use to test your assignment. Upon submitting your assignment, our server will automatically compile and test your assignment and provide you with test results. We will also for this assignment use a state-of-the-art race detector to check for races in your program – this will run automatically in Autolab. You can resubmit 20 times before the deadline without penalty. To run these tests, simply execute mvn test (of course, if you do this first off, you’ll see that they all fail!)

Note: Your code must compile and run on the autograder, under Java 8. It is unlikely that you will have any difficulties developing on a Mac, Linux or Windows. When you feel satisfied with implementing one phase of the assignment, submit to AutoLab and verify that AutoLab agrees.

Academic honesty reminder: You may NOT share any of your code with anyone else. You may NOT post your code in a publicly viewable place (e.g. in a public GitHub repository).  You may face severe penalties for sharing your code, even “unintentionally.” Please review the course’s academic honesty policy.

Tips and References

You might find it useful to reference the official Java tutorial on concurrency, which does a nice job outlining at a high level how you can create threads and locks. If you find any useful references, please feel free to share them on Piazza and we will update this page as well.

Part 1: The Hierarchical Key Key-Value Store (32%)

To get started, you’ll implement the basic functionality of the key value store, and ensure that it works in a normal (single-threaded) environment.  This will act as something of a baseline for Java programming without concurrency. The tricky parts will come when we make this functionality thread-safe (in Part 2).

Unlike in Homework 1, this KV store will only allow get and set to be called on strings as values (no push or pop, hooray!). However, we’ll have some new methods and new semantics to handle hierarchical keys — and to handle those, you will maintain a new kind of structure – a Directory Node.

Your KVStore will deal in two types of keys: Directory Keys, which end in / and Value Keys, which do not end in /. Both directory and value keys will be internally managed in your code by calling _get and _set. The getset, and  remove operations operate only on non-directory keys, and the  listDirectory and  removeDirectory operations operate only on directory keys.  listKeys returns all keys (both those that are a directory and those that are not). All keys must start with a / – that is, all keys are (at least transitively) contained by the special directory /.

All objects stored in the AbstractKeyValueStore._get()  and _set() map will now be either a KeyValueStoreDirectory or a KeyValueStoreValue (instead of storing Strings directly, they will be wrapped in KeyValueStoreValue objects). These objects both track a ReentrantReadWriteLock for each key: you’ll need to use these locks in Part 2. You may not change the implementation of KeyValueStoreDirectory or KeyValueStoreValue.

Your KVStore will automatically create and maintain directory structures so that it can efficiently implement the  listDirectory and  removeDirectory functions — it must implement these functions without calling  listKeys. You’ll do this by creating a KeyValueStoreDirectory (which, in turn, has a list to keep track of the keys contained by that directory) for the directory key, and storing that object as the value for that directory key. Hence (assuming that the KVStore starts empty), calling  set("/foo/bar/b", "someValueToStoreHere") will result in the following directories being created:

  1. / (with a single child  /foo/)
  2. /foo/ (with a single child  /foo/bar/)
  3. /foo/bar/ (with with a single child  /foo/bar/b)

Hence, valid keys are now:  //foo//foo/bar/ and  /foo/bar/b

There will now be the following keys and value pairs in your KV store:

  1. / (a KeyValueStoreDirectory, with a single child /foo/)
  2. /foo/ (a KeyValueStoreDirectory, with a single child /foo/bar/)
  3. /foo/bar/ (a KeyValueStoreDirectory, with a single child /foo/bar/b)
  4.   /foo/bar/b (a KeyValueStoreValue, with value "someValueToStoreHere") .

To continue this example: after the set operation, we can work out what keys will happen if another set is called: set("/foo/bar/anotherDirectory/anotherFile", "otherValue"), the following new key and value pairs will be created by your KV store:

  1. The existing KeyValueStoreDirectory for /foo/bar/ will be updated, to now have the child  /foo/bar/b (from the prior call) and also now the child  /foo/bar/anotherDirectory/.
  2. /foo/bar/anotherDirectory/ (a KeyValueStoreDirectory, with a single child /foo/bar/anotherDirectory/anotherFile)
  3. /foo/bar/anotherDirectory/anotherFile (a KeyValueStoreValue, with value "otherValue")

You will still use the same interface to store keys and their values:  _set_get, _remove. Since the directory itself is a key, you must store the directory structure using  _set_get, and  _remove. Furthermore, each of the above directories must have their own entry (in the above example, you must call _set to create each of these directories).  Note that although your   get or set functions will not work for directories, directory contents will still be stored in and retrieved using the _get and _set functions. Note 9/25: Your set , remove and   removeDirectory operations will need to add and remove items from directory listings, which they can do by retrieving a reference to the KeyValueStoreDirectory and then accessing the list of children with getChildren() and then eventually calling add or remove on that list.

Your KVStore should implement the following invariants:

  1. All valid keys must have non-null representations in the underlying map (e.g. you will need to call  _set for every key, including nested directories that must be created).
  2. If a key exists, and that key is nested in a directory, then its directory parent must exist as well.

The primary implications of these invariants are that: (1) When creating (or checking to create) nested directories, you must always start at the “bottom” ( /), and work your way up; and (2) your directories must be stored “flat” (the children of directory  /foo/bar/ is stored in the KVStore in the key /foo/bar/).

For the first part of the assignment, you do not need to be meet thread safety requirements.

You will implement this behavior in  edu.gmu.cs475.KeyValueStore, specifically in the following six methods:

Note the following requirements:

  1. You should feel free to add whatever methods you would like to the  KeyValueStore class, but you must not change the names or parameters of the methods listed above or override any other methods of the AbstractKeyValueStore.
  2. You should feel free to add new fields to  KeyValueStore
  3. You can not use your own HashMap, but instead it is almost as easy: You must call  _get, _set, _remove, and  _listKeys in your  KeyValueStore to access the underlying data store. You can call these methods at most once in each of the KeyValueStore methods that you implement. Please do not call  super._set() and instead simply call as _set().

Precise grading breakdown:

  • Automated tests: 32 points
    • 8 JUnit tests, 4 points each

Part 2: Managing Thread Safety (68%)

Once you are satisfied that your key-value store works correctly in a single-threaded environment, your next task will be to add locking to it. You are free to use:  synchronizedReentrantLock,  ReentrantReadWriteLock, and Condition. Our reference solution uses ReentrantLock, ReentrantReadWriteLock and  Condition (although perhaps several of each!). Different from the last assignment, we have ensured that  _get_set,  _remove and  _listKeys will correctly uphold their contract regardless of the locking that you do: so, do not use a big lock around these methods again. You must store your locks in (non-static) fields of your KeyValueStore, and/or use ReentrantReadWriteLocks that each KeyValueStoreDirectory and KeyValueStoreValue have accessible. Do not store them in a static field, and do not store them using _set and _get.

Your implementation of get and set must not make duplicate calls to _get and _set. They may call _get and _set more than once, but they may not call them more than once with the same parameter – this is necessary to check and enforce linearizability in the tests.

Note the following concurrency requirements:

  1. For regular value operations ( get, set, remove): these should be able to occur concurrently if they have different keys. Multiple gets should be permitted for the same key concurrently, but set and  remove are exclusive operations (you must not allow two  set calls on the same key at once, or a  set and a  get, etc.). Hint: Consider the getValue(), _set() and _remove() methods to be your linearization points. This means that we will evaluate whether two set’s are concurrent based on whether the underlying _set calls are concurrent. This means that you must acquire whatever locks you need before calling getValue().
  2. For directory-level operations (list directory contents, remove empty directory and implicitly creating a directory or adding to its contents through set key): These should always be able to occur concurrently if they have different keys and if the two directories are not hierarchically nested (e.g. /my/path/to/key1/ and /my/path/to/key1/key2/ are nested, since key2 is a directory in / my/paty/to/key1/). These operations should be able to occur concurrently on two different keys as long as neither is the parent of the other key, and they do not share the same immediate parent. Multiple threads can concurrently list the contents of the same directory (or of nested directories), but these reads must not overlap concurrently with any write to those directory’s structure (e.g. creating a directory, adding a file to a directory, removing a directory).
  3. For global operations ( listKeys): It must be possible for multiple threads to concurrently call listKeys, but it must not be possible for any operation that creates or removes keys to occur concurrently. Hint for listKeys: Look back at our discussion of the Reader-Writer problem, and of ReadWriteLocks. This requirement is similar to the Reader-Writer problem (applied to some global lock on the KeyValueStore), where the writers are set and remove, and the readers are listKeys. Notice that in the normal Reader-Writer problem, we want to allow any number of readers OR a single writer access to the resource, but in this case, we need to allow any number of readers OR any number of writers.  If you find it helpful, you can create new classes in the edu.gmu.cs475 package to support your implementation.

Another way to think of these semantics, is that each resource that is shared (the listing of keys in each directory, and the value of each key itself) may be protected with its own lock. Multiple threads can concurrently read the same resource, but no two threads can write the same resource at once, or read/write the same resource at once. A single high-level operation, like set may touch multiple such resources: especially in the case of a nested key, where it is necessary for your code to also create the parent directories. Hence, you will have to have a lock for each key, and a single operation might require acquiring multiple locks.

Precise grading breakdown:

  • 12 JUnit tests, 5 points each
  • 8 points from manual grading, including checking for races

Grading

Your assignment will be graded on a series of functional tests, making sure that it implements the specification above. We have provided a (not exhaustive) test suite which should help you judge (on your own) how well you have done with the functional requirements. We may add additional tests beyond what we are providing you with now, so please do not rely entirely on them.

We have also provided you with feedback from the RV-Predict tool, running on your code. You can see this when you submit on Autolab , in the job details. RV-Predict will either report some races, or “No races found.”

In accordance with the “reasonable person principle,” we reserve the right to audit your code and correct any marks that are improperly assigned, for instance, due to your code incorrectly following the specification, but passing the test.  We would encourage you to spend your time correctly implementing the assignment, and not trying to force it to pass the test, even if it is not following the spirit of the assignment.

Hand In Instructions

You must turn in your assignment using Autolab (You MUST be on the campus network, or connected to the GMU VPN to connect to Autolab). If you did not receive a confirmation email from Autolab to set a password, enter your @gmu.edu (NOT @masonlive) email, and click “forgot password” to get a new password.

Create a zip file of only the  src directory in your assignment (please:  .zip, not  .tgz or  .7z etc). When you upload your assignment, Autolab will automatically compile and test it. You should verify that the result that Autolab generates is what you expect. Your code is built and tested in a Linux VM. Assignments that do not compile using our build script will receive a maximum of 50%. Note that we have provided ample resources for you to verify that our view of your assignment is the same as your own: you will see the result of the compilation and test execution for your assignment when you submit it.

You can resubmit your assignment an unlimited number of times before the deadline. Note the course late-submission policy: assignments will be accepted up until 24 hours past the deadline at a penalty of 10%; after 24 hours, no late assignments will be accepted, no exceptions.

Questions

Please post questions on Piazza

 

Contact