A lot of my research has involved Java bytecode instrumentation with ASM and more recently, also JVMTI development. For instance, with VMVM, we instrumented bytecode to enable efficient java class reinitialization. With Phosphor, we instrumented bytecode to track metadata with every single variable, enabling performant and portable dynamic taint tracking. In ElectricTest, we combined bytecode instrumentation with JVMTI to efficiently detect data dependencies between test cases running within the same JVM.
As I built each of these projects, I leaned heavily on code snippets that I found across the internet – while I had done a lot of Java development before starting these projects, I’d done nothing with bytecode instrumentation. I found it particularly difficult to find examples of how to use JVMTI (aside from the basic man-pages, and a few excellent blog posts by Kelly O’Hair [1, 2, 3]).
To try to make it easier for others to use the same tools to build their own systems, I’m compiling some examples that I think might be useful here. I don’t intend for this to serve as a beginner’s resource – there are plenty of bytecode instrumentation tutorials out there – instead, I plan to collect some interesting examples (mostly related to JVMTI), that I think would be useful. If you have any particular requests, please let me know (as a comment here, or via email, or twitter).
Byte code rewriting can be used to change and insert instructions in code, and JVMTI can be used to interact with low level events in the JVM (such as objects being freed, garbage collection beginning, and allows you to assign a relatively efficient tag to any reference type (object or array). Each one of these examples has some interesting trick though, that I thought was worthy to share. Each one is a maven project, and you can build and run the tests with mvn verify (the JVMTI projects should work on Mac OS X and Linux, but are not configured to build on Windows – it’s possible to do but the scripts aren’t there). All of the examples are in a GitHub project. To import them in eclipse, first run mvn eclipse:eclipse in the project to generate eclipse project configuration files.
- Method Coverage recording – efficiently records per-method coverage information (e.g. which methods in an application under test are executed by each test). Byte code is instrumented dynamically as its loaded into the JVM (using a java agent). There is a local cache within each class that records whether a method was hit during the current test case, and a global collection that stores them too. This local + global cache is much more performant than just keeping a global cache, because when each method is executed we can first check a local boolean field (which is easily optimized by the JIT compiler), and if it hasn’t been hit, THEN we store the fact that the method was executed in a global set (which is relatively much more expensive).
- Static Instrumenter – applies byte code instrumentation statically, rather than at load time. This technique is needed if you want to instrument various core JRE classes that would be loaded already (and immutable) after your javaagent gets called.
- Heap Tagging – uses JVMTI and byte code instrumentation to allow you to apply an arbitrary object “label” to every reference type (objects or arrays). Doing this for many instances of classes (objects) is trivial: we just add a field to each class to store the tag, and generate some code to set and fetch it for each class (every class is made to implement the interface Tagged). However, you can’t do this for all classes – the constant pool offsets for some fields of some classes (like Object, Long, Byte, etc.); plus you can’t do this for arrays (which aren’t instances of classes). For this, we use JVMTI’s getTag and setTag functions. Tagger provides an abstraction to get and set the label of an object. The JVMTI code implementation is mostly book-keeping that makes sure that we don’t leak memory from these object labels. The JVMTI code is largely inspired by another excellent example by Kelly O’Hair.
- Heap Walking – uses JVMTI for a slightly contrived (but still somewhat interesting) example of heap walking and tagging. It crawls the heap (using FollowReferences), and for every object, builds a list of the static fields that can reach that object. After crawling, the library can return the list of static fields that point (perhaps indirectly) to the requested object. This example also shows off how to calculate the internal JVM field offsets for classes (which was a pain to write out my first time…).