Debugging Java Bytecode Instrumentation

If you’ve ever tried (or are planning to try) instrumenting the entire JRE, you might have run up against some “fun” debugging challenges. For instance, you might have found out that it is possible to generate bytecode that causes the JVM to segfault while starting. This can be really, really annoying to debug if you aren’t aware of all of the great tools available. Here are some key tricks (going from most straightforward to least), all written down in one place. This isn’t so much a how-to for each of these approaches as it is some pointers down the right path: when I was trying to figure this all out, I had a really tough time finding where to start.

ASM’s CheckClassAdapter

This handy ClassVisitor will do some basic verification of the bytecode you’re outputting. It won’t perform a COMPLETE verification (i.e. some code might pass it and still be wrong), but it will catch a lot, and it’s handy because you can delegate to it (before a ClassWriter, for instance), so you can see exactly where your code is emitting the invalid bytecode.

Remote Debugging

Do you appreciate the Eclipse debugger? Did you know that you can attach it to a remote process… for instance, one that you start in a crazy JVM with a ton of instrumentation? Yes, you can. This can be really helpful for simple debugging.
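
As a rough sketch of what this looks like: the target JVM is started with the standard JDWP agent option, and the debugger then attaches to the chosen port. The helper below only assembles such a command line. The agent jar, main class and port are placeholders made up for illustration, not anything from a real setup.

```java
import java.util.ArrayList;
import java.util.List;

public class DebugLaunch {
    /** Builds a command line that starts a JVM with a java agent and waits for a debugger. */
    public static List<String> command(String agentJar, String mainClass, int port) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        // Standard JDWP options: listen on a socket and stay suspended until a debugger attaches
        cmd.add("-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=" + port);
        // The instrumentation agent under test
        cmd.add("-javaagent:" + agentJar);
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical jar and class names, purely for illustration
        System.out.println(String.join(" ", command("my-agent.jar", "com.example.Main", 5005)));
    }
}
```

With the JVM started this way, a “Remote Java Application” debug configuration in Eclipse pointed at the same host and port should be able to attach. Using suspend=y makes the JVM wait for the debugger, which helps when the code you care about runs very early.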

“Fastdebug” build of OpenJDK

There are a ton of flags available to you when you use a special version of OpenJDK that was compiled with debug support enabled. You can get this by building OpenJDK yourself, and doing ./configure --enable-debug. If you’re on OS X and don’t want to go through the big ball of fun that is building OpenJDK 8 on Mac OS X, here’s a binary build that I made and use myself (1.8.0_71). My absolute #1 favorite flag that you’ll get is -XX:+TraceExceptions. This flag will print out EVERY exception that occurs in the JVM, even if it’s caught and squelched by an app, and yes, even if you are causing an exception while printing an exception (ugh, definitely an unpleasant way to crash the JVM). Example:

Exception <a 'java/lang/NoSuchFieldError': method resolution failed> (0x000000076af86a88)
thrown [/Users/jon/jdk8/hotspot/src/share/vm/prims/methodHandles.cpp, line 1146]
for thread 0x00007fccf9000800
Exception <a 'java/lang/NoSuchFieldError'> (0x000000076af86a88)
thrown in interpreter method <{method} {0x0000000122cb66a8} 'resolve' '(Ljava/lang/invoke/MemberName;Ljava/lang/Class;)Ljava/lang>
at bci 0 for thread 0x00007fccf9000800
Exception <a 'java/lang/NoSuchFieldError'> (0x000000076af86a88)
thrown in interpreter method <{method} {0x0000000122e70520} 'resolve' '(BLjava/lang/invoke/MemberName;Ljava/lang/Class;)Ljava/lan>
at bci 32 for thread 0x00007fccf9000800
Exception <a 'java/lang/NoSuchMethodException'> (0x000000076af88158)
thrown in interpreter method <{method} {0x0000000122e70650} 'resolveOrFail' '(BLjava/lang/invoke/MemberName;Ljava/lang/Class;Ljav>
at bci 51 for thread 0x00007fccf9000800
Exception <a 'java/lang/NoSuchMethodException'> (0x000000076af88158)
thrown in interpreter method <{method} {0x00000001277baef8} 'resolveOrFail' '(BLjava/lang/Class;Ljava/lang/String;Ljava/lang/invo>
at bci 44 for thread 0x00007fccf9000800
Exception <a 'java/lang/NoSuchMethodException'> (0x000000076af88158)
thrown in interpreter method <{method} {0x00000001277b9ac0} 'findStatic' '(Ljava/lang/Class;Ljava/lang/String;Ljava/lang/invoke/M>
at bci 6 for thread 0x00007fccf9000800
Exception <a 'java/lang/NoSuchMethodException'> (0x000000076af88158)
thrown in interpreter method <{method} {0x0000000122e65af8} 'findCollector' '(Ljava/lang/String;ILjava/lang/Class;[Ljava/lang/Cla>
at bci 23 for thread 0x00007fccf9000800
Exception <a 'java/lang/IllegalThreadStateException'> (0x000000076b0349a0)
thrown in interpreter method <{method} {0x0000000127733c98} 'exitValue' '()I' in 'java/lang/UNIXProcess'>
at bci 16 for thread 0x00007fccf9000800
Exception <a 'java/lang/IllegalThreadStateException'> (0x000000076b0349a0)
thrown in interpreter method <{method} {0x0000000127798168} '' '(Lorg/eclipse/debug/core/ILaunch;Ljava/lang/Process;Ljava/l>
at bci 36 for thread 0x00007fccf9000800


You also get -XX:+TraceBytecodes, which will dump EVERY single bytecode out to the console as it’s being executed.


Javap, Krakatau and verification

You’ve probably already figured out javap: the tool included with Java that disassembles .class files and prints out the bytecode as text. Krakatau is an awesome tool written in Python that does this too, but it will ALSO verify your code and print out detailed messages (you MUST use an old version, e.g. 3724c05ba11ff6913c01ecdfe4fde6a0f246e5db). Here’s where this comes in handy. Sometimes the JVM gives us really helpful VerifyErrors, like here:

Exception in thread "main" java.lang.VerifyError: Bad type on operand stack
Exception Details:
Test.<init>()V @26: checkcast
Type uninitializedThis (current frame, stack[0]) is not assignable to 'java/lang/Object'
Current Frame:
bci: @26
flags: { flagThisUninit }
locals: { uninitializedThis }
stack: { uninitializedThis }
0x0000000: b200 0b9a 0016 04b3 000b b200 0e9a 000c
0x0000010: 04b3 000e 1202 b800 142a c000 04b7 0016
0x0000020: b1
Stackmap Table:

From reading the error, we can probably figure out what’s going on, because we see exactly where the problem is: in class Test, method <init>(), at bytecode offset 26. On the other hand, sometimes, you might see a VerifyError like this:

java.lang.VerifyError: (class: org/apache/batik/ext/awt/image/rendered/PadRed, method: handleReplicate signature: (Ljava/awt/image/WritableRaster;)V) Incompatible argument to function

That’s really, really unhelpful, because it turns out that this method, handleReplicate, is huge, and all the error tells us is that somewhere in it there’s an incompatible argument to a function. Why doesn’t it give us all of the helpful information that the error above does (the exact location and the expected stack frame)? We might try javap and look at this method to figure out what’s going on, but there might be dozens or hundreds of call sites in it, and carefully inspecting each one to find the invalid call is annoying.
Enter Krakatau: instead of using javap to disassemble this class, let’s try Krakatau:

$ python experiments/Krakatau/ -path target/Phosphor-0.0.2-SNAPSHOT.jar -path target/batik-inst/jar/batik-all.jar org.apache.batik.ext.awt.image.rendered.PadRed
... (output, eventually error as below) ...
817: invokevirtual(245)
Stack: java/awt/image/WritableRaster, .int, .int, .int, .int, .int, .int, .int, .int, edu/columbia/cs/psl/phosphor/struct/LazyArrayIntTags, .int[]
Locals: org/apache/batik/ext/awt/image/rendered/PadRed, java/awt/image/WritableRaster, org/apache/batik/ext/awt/image/rendered/CachableRed, java/awt/Rectangle, java/awt/Rectangle, .int, .int, .int, .int, .int, .int, .int, .int, .int, .int, .int, java/awt/Rectangle, .int, .int, .int, .int, .int, .int, java/awt/image/WritableRaster, .int, .int, .int, .int, .int, .int, .int

Now, we know specifically which invocation was causing the problem (at bytecode offset 817 in that method), and what the stack and locals are at that point that the verifier is calculating. Then, we can look in our instrumentation and see why we are generating this invalid code.

Dragons not to mess with

If you’re trying to instrument every class, you’ll quickly find that there are some things that you just can’t touch. The JVM has hardcoded offsets to some fields of some classes (namely, Object, Short, Byte, Boolean, StackTraceElement, perhaps a few others), and if you instrument these classes and this changes the layout of these fields, you’ll have a bad day. You might get around this by storing whatever auxiliary data you wanted for these types using JVMTI Object Tagging, or a WeakHashMap. Moreover, there are SOME things that you can do to these classes (you can definitely get away with adding a single boolean or byte field to Byte, Boolean, Short and Character, for instance…).
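
To make the WeakHashMap option concrete, here is a minimal sketch of a side table that keeps auxiliary data outside the object, so the layout of the untouchable classes never changes. The class and method names are invented for this example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

public class SideTable {
    // Weak keys: an entry goes away once its key object is garbage collected
    private static final Map<Object, Object> TAGS =
            Collections.synchronizedMap(new WeakHashMap<Object, Object>());

    /** Associates auxiliary data with an object without adding a field to its class. */
    public static void setTag(Object o, Object tag) {
        TAGS.put(o, tag);
    }

    /** Returns the auxiliary data for an object, or null if none was stored. */
    public static Object getTag(Object o) {
        return TAGS.get(o);
    }

    public static void main(String[] args) {
        Short s = Short.valueOf((short) 7);
        setTag(s, "tainted");
        System.out.println(getTag(s)); // prints "tainted"
    }
}
```

One caveat with this sketch: WeakHashMap compares keys with equals() and hashCode(), not by identity, so two distinct but equal Short objects would share a single entry. JVMTI object tags are attached to the object itself, so they don’t have that problem. The stored value also must not strongly reference its own key, or the entry will never be collected.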

What are your Java instrumentation tips?

Do you have any other debugging techniques for Java bytecode instrumentation? Feel free to share in comments below!

Two Graduate Research Assistant (PhD) Positions in Software Engineering and Software Systems


George Mason University’s Engineering School


I am currently seeking two students interested in software engineering and software systems to join my research group as PhD students at George Mason University starting in Fall 2017. My research interests are in software engineering and software systems, focusing on approaches and tools to make it easier for developers to create reliable software. The positions are for fully funded (tuition and stipend) graduate research assistantships. There are two main projects that I am seeking students for:

Phosphor – Dynamic Dataflow Analysis in off the shelf JVMs


Taint Tracking

Dynamic taint tracking is a form of information flow analysis that identifies relationships between data during program execution. Inputs to the program are labeled with a marker (“tainted”), and these markers are propagated through data flow. Phosphor efficiently implements this dynamic analysis in regular JVMs. I’m seeking a student interested in program analysis who will expand Phosphor to support other, related dynamic analyses, and also to investigate combining Phosphor’s dynamic analysis with related static analyses.

Detecting Behaviorally Similar Code

This project investigates tools that detect similarly behaving code to aid in software engineering tasks. When software engineering researchers discuss “similar” code, they often mean code determined to be syntactically or structurally similar, known as “code clones.” In this project, we mean something different. An emerging body of research has focused on detecting code that looks different, but behaves similarly: “behavioral clones.” But how do we match behaviorally similar code? Consider the four “magic” functions defined below that all sum the contents of an array (and possibly double it) – are these behavioral clones? How do we detect such behavioral clones? Some of my previous work has considered code that produces similar outputs for the same inputs (HitoshiIO), or that has similar execution characteristics (Dyclink). I’m seeking a student to join this project who will extend my previous work in this area, implementing new behavioral similarity detection systems.


int magic1(int[] ar){
	int sum = 0;
	for(int i=0;i<ar.length;i++)
		sum += ar[i];
	return sum;
}
int magic2(int[] ar){
	int sum = 0;
	for(int i=0;i<ar.length;i++)
		sum += 2 * ar[i];
	return sum;
}
int magic3(int[] ar){
	int sum = 0;
	int i = 0;
	try{
		// loop ends when i runs past the end of the array
		while(true)
			sum += ar[i++];
	}
	catch(ArrayIndexOutOfBoundsException ex){}
	return sum;
}
int magic4(int[] ar){
	int sum = 0;
	int i = 0;
	try{
		while(true)
			sum += 2 * ar[i++];
	}
	catch(ArrayIndexOutOfBoundsException ex){}
	return sum;
}

Application deadline:

The deadline for these positions is January 15, 2017. However, applications will be considered immediately upon receipt, so if you are interested, you are strongly encouraged to apply ASAP.

International students are very strongly encouraged to apply by November 15th to ensure timely processing of visa-related materials.


George Mason University is located in Fairfax, VA, approximately 20 minutes outside of Washington, DC.

Expected Skills and Qualifications:

Successful candidates will have a BS and/or MS in Computer Science, with a very strong background in Java. Ideally, the successful candidate would also have some degree of background in program analysis and software testing. Candidates with industrial experience are welcome.

How to apply & for more information:

Interested students should email Prof Jonathan Bell at [email protected] and include a brief statement describing their interest in one or both of the positions above, and a CV. You will also be required to submit the department’s application, which will require GRE, three letters of recommendation, and TOEFL for non-native English speakers. Interested students should email Prof Bell immediately to discuss the GRA position (before filling out the school application).

Java Bytecode and JVMTI Examples

A lot of my research has involved Java bytecode instrumentation with ASM and, more recently, JVMTI development. For instance, with VMVM, we instrumented bytecode to enable efficient Java class reinitialization. With Phosphor, we instrumented bytecode to track metadata with every single variable, enabling performant and portable dynamic taint tracking. In ElectricTest, we combined bytecode instrumentation with JVMTI to efficiently detect data dependencies between test cases running within the same JVM.

As I built each of these projects, I leaned heavily on code snippets that I found across the internet – while I had done a lot of Java development before starting these projects, I’d done nothing with bytecode instrumentation. I found it particularly difficult to find examples of how to use JVMTI (aside from the basic man-pages, and a few excellent blog posts by Kelly O’Hair [1, 2, 3]).

To try to make it easier for others to use the same tools to build their own systems, I’m compiling some examples that I think might be useful here. I don’t intend for this to serve as a beginner’s resource – there are plenty of bytecode instrumentation tutorials out there – instead, I plan to collect some interesting examples (mostly related to JVMTI), that I think would be useful. If you have any particular requests, please let me know (as a comment here, or via email, or twitter).

Byte code rewriting can be used to change and insert instructions in code, and JVMTI can be used to interact with low level events in the JVM (such as objects being freed or garbage collection beginning), and allows you to assign a relatively efficient tag to any reference type (object or array). Each one of these examples has some interesting trick, though, that I thought was worth sharing. Each one is a Maven project, and you can build and run the tests with mvn verify (the JVMTI projects should work on Mac OS X and Linux, but are not configured to build on Windows – it’s possible to do, but the scripts aren’t there). All of the examples are in a GitHub project. To import them in Eclipse, first run mvn eclipse:eclipse in the project to generate Eclipse project configuration files.

  1. Method Coverage recording – efficiently records per-method coverage information (e.g. which methods in an application under test are executed by each test). Byte code is instrumented dynamically as it’s loaded into the JVM (using a java agent). There is a local cache within each class that records whether a method was hit during the current test case, and a global collection that stores them too. This local + global cache is much more performant than just keeping a global cache, because when each method is executed we can first check a local boolean field (which is easily optimized by the JIT compiler), and if it hasn’t been hit, THEN we store the fact that the method was executed in a global set (which is relatively much more expensive).
  2. Static Instrumenter – applies byte code instrumentation statically, rather than at load time. This technique is needed if you want to instrument various core JRE classes that are already loaded (and immutable) by the time your javaagent gets called.
  3. Heap Tagging – uses JVMTI and byte code instrumentation to allow you to apply an arbitrary object “label” to every reference type (objects or arrays). Doing this for many instances of classes (objects) is trivial: we just add a field to each class to store the tag, and generate some code to set and fetch it for each class (every class is made to implement the interface Tagged). However, you can’t do this for all classes: the JVM hardcodes offsets for some fields of some classes (like Object, Long, Byte, etc.), and you can’t do this for arrays (which aren’t instances of classes) either. For these, we use JVMTI’s GetTag and SetTag functions. Tagger provides an abstraction to get and set the label of an object. The JVMTI code implementation is mostly book-keeping that makes sure that we don’t leak memory from these object labels. The JVMTI code is largely inspired by another excellent example by Kelly O’Hair.
  4. Heap Walking – uses JVMTI for a slightly contrived (but still somewhat interesting) example of heap walking and tagging. It crawls the heap (using FollowReferences), and for every object, builds a list of the static fields that can reach that object. After crawling, the library can return the list of static fields that point (perhaps indirectly) to the requested object. This example also shows off how to calculate the internal JVM field offsets for classes (which was a pain to write out my first time…).
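
The local + global cache from the Method Coverage example can be sketched in plain Java roughly as follows. This is hand-written source that stands in for what an instrumenter might generate, not code from the GitHub project; the class names, the method key format and the reset method are all assumptions made for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class CoverageSketch {
    /** Global record of every method hit during the current test (the expensive path). */
    public static final Set<String> GLOBAL_HITS = ConcurrentHashMap.newKeySet();

    /** Stands in for an instrumented application class. */
    public static class Example {
        // Local flag an instrumenter would add, one per method (cheap to check)
        private static boolean fooHit;

        public static int foo(int x) {
            // Check an instrumenter would insert at method entry
            if (!fooHit) {
                fooHit = true;
                // Only the first call per test pays for the global set
                GLOBAL_HITS.add("Example.foo(I)I");
            }
            return x + 1;
        }

        /** Hypothetical hook: clears the local flag so the next test starts clean. */
        public static void resetCoverage() {
            fooHit = false;
        }
    }

    public static void main(String[] args) {
        Example.foo(1);
        Example.foo(2); // second call only reads the local flag
        System.out.println(GLOBAL_HITS.size()); // prints 1
    }
}
```

Between tests, a harness would read out the global set, clear it, and reset the local flags. The flag is deliberately a plain field: if two threads race on it, the worst case is a redundant add to the set.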

Let me know (email, twitter, or comment on this blog post) if you have any questions or requests.

Test Dependencies and The Future of Build Acceleration

tl;dr Tests can depend on other tests’ execution, which can cause them to unexpectedly fail. These dependencies crop up especially when developers try to speed up their slow test suites with test acceleration techniques (like test parallelization or selection). We have created efficient techniques for detecting these dependencies and isolating them, most recently published at FSE 2015, ElectricTest.


More Tests, More Problems

With the proliferation of testing culture, many developers are facing new challenges. As projects are getting started, the focus may be on developing enough tests to maintain confidence that the code is correct. However, as developers write more and more tests, performance and repeatability become growing concerns. As test suites grow, they take longer and longer to run, and it becomes harder to run them with every single change set.

In fact, we did a study of open source Java projects to assess the impact of testing on the overall build process. Our result was startling: in the 351 Maven+Java projects on GitHub that we built, testing consumed on average 41% of build time (that is, total time needed to run “mvn package”). In the projects that took the very longest to build (more than an hour), testing completely dominated the build process – on average, 90% of time was spent running tests! Clearly, running tests can take a very long time, and a reduction in testing time can be a big win for developers. Recent tools can help with everyday testing. For instance, Milos Gligoric’s tool, Ekstazi, is a Maven plugin that keeps track of which files are changed between each execution of the test suite, and which files each test depends on. If a developer makes just one change to the code, Ekstazi will automatically run only the tests that might have a different outcome because of that change. Tests can also be accelerated when running on multicore hardware: Maven and Ant (since 1.9.4) both now can run many tests in parallel.



Unfortunately, sometimes developers notice that some tests fail when running in parallel, or just running a subset of tests, even though they pass when they run the complete test suite normally! This can be caused by test dependencies.

While we might think that each test should be independent (i.e. that a test’s outcome isn’t influenced by the execution of another test), we and others have found many examples in real software projects where tests truly have these dependencies: running tests in a different order causes their outcome to change! If there are test dependencies, then we are forced to always run all tests together, in their normal order. This prevents us from accelerating our test suites with typical approaches.
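
Here is a deliberately tiny, made-up illustration of such a dependency, written as plain static methods rather than real JUnit tests so that it is self-contained. The second “test” only passes if the first one has already written to shared static state.

```java
import java.util.HashMap;
import java.util.Map;

public class OrderDependencyDemo {
    // Shared static state that survives from one test to the next
    public static final Map<String, String> CONFIG = new HashMap<>();

    /** "Test" 1: writes to the shared state, then checks it. */
    public static boolean testSetsTimeout() {
        CONFIG.put("timeout", "30");
        return "30".equals(CONFIG.get("timeout"));
    }

    /** "Test" 2: silently relies on the value written by test 1. */
    public static boolean testReadsTimeout() {
        return CONFIG.get("timeout") != null;
    }

    public static void main(String[] args) {
        // Normal order: both pass
        System.out.println(testSetsTimeout() && testReadsTimeout()); // prints true
        // Simulate a fresh JVM, then run test 2 on its own: it fails
        CONFIG.clear();
        System.out.println(testReadsTimeout()); // prints false
    }
}
```

Running the second test alone, first, or in a different JVM from the first changes its outcome, which is exactly what happens under test selection or parallelization.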

Test Dependencies

Prior to our work, the best chance that developers had to detect these test dependencies was to just run the tests and hope for the best. Some tools, like Maven’s surefire plugin, allow developers to run tests in different orders for each run, which might expose a dependency when tests run out of their normal order. Sai Zhang and his colleagues also developed a tool for JUnit, DTDetector, that executes every possible combination of tests and automatically reports tests that depend on each other. Unfortunately, neither of these approaches is very scalable: even for a project like joda-time, whose tests complete in less than a minute, it would take over 46 days to run all of the possible combinations of the tests.

My advisor, Gail Kaiser, and I joined up with ElectricCloud’s Eric Melski and Mohan Dattatreya to try to tackle this problem. ElectricCloud offers a tool that automatically and safely parallelizes long and complex Makefile-based software building, working with companies like Qualcomm, Cisco, and Huawei. To address this particular challenge of test dependency detection, we built ElectricTest, which detects data dependencies between tests with a reasonable overhead: only taking on average 20 times longer to run than running the test suite normally (this is over 1,000 times faster than the previous approach of running all pairs of all tests and observing their outputs, on average).  ElectricTest reports all data dependencies between tests, indicating which need to be run in order. ElectricTest can provide developers with a complete stack trace showing how the dependency occurred, allowing them to determine if it’s a valid dependency, or if it should be eliminated. ElectricTest can then automatically enable sound test parallelization or selection, by ensuring that tests with order dependencies are always executed in the correct sequence.

I’ll be presenting ElectricTest at ESEC/FSE in September, 2015 (preprint available). ElectricCloud is continuing to develop ElectricTest from our research prototype.

Test isolation: The Price of Reliability

The speedup in test execution when using VMVM instead of traditional isolation

When starting out from scratch (with no existing tests and no dependencies), one easy approach to avoiding dependencies is to simply isolate each test. If there is no way that one test can pollute the state of another, then there will never be dependencies. One way to isolate JUnit tests is to execute each in its own JVM. However, unit tests are often fairly quick (for instance, taking only hundreds of milliseconds), while the time to create a new JVM for each test is relatively steep: 1-2 seconds. Gail and I developed VMVM (ICSE ’14 paper, GitHub, demo), a tool for efficiently isolating tests’ in-memory state without requiring a new JVM for each test. VMVM can reduce the time needed to run an isolated test suite by up to 97% (on average, 62% in our evaluation of 20 open source projects), and is available under an MIT license on GitHub.

Further Reading

Test All The Things

Efficient Dependency Detection for Safe Java Test Acceleration (FSE 2015)
Jonathan Bell, Gail Kaiser, Eric Melski and Mohan Dattatreya
For more information on ElectricTest and efficiently detecting test dependencies.
Alex Gyori, August Shi, Farah Hariri and Darko Marinov
Presents PolDet, a tool for detecting tests that might later become involved in test dependencies.
Unit Test Virtualization with VMVM (ICSE 2014, SIGSOFT Distinguished Paper Award)
Jonathan Bell and Gail Kaiser
For more information on VMVM and efficiently isolating tests.
Sai Zhang, Darioush Jalali, Jochen Wuttke, Kıvanç Muşlu, Wing Lam, Michael D. Ernst and David Notkin
Zhang and colleagues studied the impact of test dependencies and present DTDetector, a tool for running pairwise combinations of tests to find dependencies.
Qingzhou Luo, Farah Hariri, Lamyaa Eloussi, and Darko Marinov
A study of the various factors that might cause tests to behave erratically, and what developers do about them.