Software Engineering and Software Systems Researcher

Jon Bell

About Jon

Jon is an Assistant Professor directing research in Software Engineering and Software Systems at George Mason University. His research makes it easier for developers to create reliable software by improving software testing. Jon’s recent work on accelerating software testing was recognized with an ACM SIGSOFT Distinguished Paper Award (ICSE ’14 – Unit Test Virtualization with VMVM) and formed the basis for an industrial collaboration with Electric Cloud. His program analysis research has resulted in several widely adopted runtime systems for the JVM, including the Phosphor taint tracking system (OOPSLA ’14) and the CROCHET checkpoint/rollback tool (ECOOP ’18). Jon is committed to releasing the software artifacts that accompany his research papers. His research interests lead him to publish at venues such as ICSE, FSE, ISSTA, OOPSLA, OSDI, and EuroSys. Jon serves on a variety of program committees and has been involved in the logistical organization of many recent OOPSLAs, most recently as publicity chair. He also co-organized the PL/SE mentoring workshop at SPLASH in 2017 and 2018.

His other interests include photography and cycling.

Jon is currently recruiting exceptional students at all levels (undergrad, masters and PhD) to join his research group. If you are interested, please send him an email.

More information about Jon, including a complete publications listing, is available in his Curriculum Vitae.


Research Overview

I apply a systems perspective to software engineering challenges, observing the issues that developers face when creating reliable software and then designing new mechanisms to support them. My research focuses on improving existing developer-written tests: making them run faster and more reliably, amplifying them to be more comprehensive, and tracking their overall quality. Rather than focusing solely on finding a handful of high-value “million dollar bugs” in a small pool of high-assurance software, my research aims for very broad impact, helping everyday developers just as much as experts.

In developing new techniques and methodologies to address these real-world developer challenges, I often find that my research contributions span across several different fields, yielding publications in program analysis venues like OOPSLA and ECOOP, software testing venues like ICST, and software systems venues like EuroSys and OSDI — in addition to my home community of software engineering (ICSE, FSE and ASE). In addition to publications, I believe that it is important to disseminate code artifacts, and have made all of my tools publicly available (licensing restrictions permitting).

An up-to-date listing of my publications can be found online, or in my CV and Google Scholar.

You can read more about several selected research projects below:

Continuous integration (CI) aims to improve modern software development by automating software compilation and regression testing. Recent studies report that CI helps developers deploy faster and reduce development cost. Given these success stories, CI has attracted rapidly increasing interest and adoption; for example, Travis CI is used by over 300,000 GitHub projects. Despite this success, developers report that they would like to see improvements in CI. First, they want to obtain regression test results faster. Second, they want better handling of flaky tests: regression tests that can non-deterministically pass or fail, and whose failures negatively affect developers’ productivity. Third, developers report that CI builds do not provide sufficient debugging assistance.

Much of my recent work in this project has focused specifically on flaky tests. Most modern applications are expected to behave nondeterministically, so it is impossible to test them in a purely deterministic way, and hence flaky tests (which pass or fail nondeterministically) are inevitable. While some developers try to eliminate flakiness by replacing nondeterministic program behavior with (deterministic) mocks and stubs, those mocks never truly represent the full range of possible application behaviors, and the resulting tests may not be effective. What developers need instead is an effective way to gauge confidence both in the outcome of each test case (for a given execution) and in the overall quality of their test suite.

DeFlaker begins to bridge this gap: it can mark a test execution as flaky without requiring any reruns. DeFlaker combines historical test results with code coverage and code revision information, marking a test as flaky if its outcome changes (from pass to fail or fail to pass) without executing any code that changed since its prior run. DeFlaker imposes only minimal performance overhead by tracking hybrid differential coverage: it tracks coverage of only the changed code, and it combines multiple granularities of coverage (statement- and file-level). DeFlaker can be trivially added to a Maven-based project, and we evaluated it on a hundred projects that use Travis CI.
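DeFlaker's core decision rule is simple to sketch. The following is a minimal illustration (not DeFlaker's actual implementation; the class name, method names, and string-based coverage encoding are all hypothetical) of marking a test as flaky when its outcome flips without it having executed any changed code:

```java
import java.util.Set;

// Hypothetical sketch of DeFlaker's decision rule: a test whose outcome
// flips between runs without executing any changed code is marked flaky.
public class FlakyMarker {
    /**
     * @param previousOutcomePassed outcome of the test in its prior run
     * @param currentOutcomePassed  outcome of the test in this run
     * @param coveredByTest         statements (or files) the test covered
     * @param changedSinceLastRun   statements (or files) changed since the prior run
     */
    public static boolean isFlaky(boolean previousOutcomePassed,
                                  boolean currentOutcomePassed,
                                  Set<String> coveredByTest,
                                  Set<String> changedSinceLastRun) {
        boolean outcomeFlipped = previousOutcomePassed != currentOutcomePassed;
        // Did the test execute anything that changed since its prior run?
        boolean coveredAChange =
                coveredByTest.stream().anyMatch(changedSinceLastRun::contains);
        return outcomeFlipped && !coveredAChange;
    }

    public static void main(String[] args) {
        Set<String> covered = Set.of("Foo.java:12", "Bar.java:40");
        Set<String> changed = Set.of("Baz.java:7");
        // Outcome flipped (pass -> fail) but no changed code was executed: flaky.
        System.out.println(isFlaky(true, false, covered, changed)); // prints "true"
    }
}
```

If the flipped test had covered any changed code, the failure could plausibly be a real regression, so it would not be marked flaky.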

We have built one-of-a-kind JVM-based runtime systems for dynamic taint tracking and checkpoint/rollback that have enabled many new research contributions in software engineering and security. These systems-oriented contributions address engineering problems that arose while we were working to solve (developer-facing) software engineering problems. Both systems are designed to be extremely portable (using only public APIs to interface with the JVM) and highly performant, allowing them to be embedded as part of larger tools (in our ongoing and future work).

Dynamic taint tracking is a form of information flow analysis that identifies relationships between data during program execution. Inputs to the program are labeled with a marker (“tainted”), and these markers are propagated through data flow. Traditionally, dynamic taint tracking is used for information flow control or for detecting code-injection attacks. Without a performant, portable, and accurate tool for performing dynamic taint tracking in Java, software engineering research is restricted. In Java, associating metadata (such as tags) with arbitrary variables is very difficult: previous techniques have relied on customized JVMs or symbolic execution environments to maintain this mapping, limiting their portability and restricting their application to large and complex real-world software. To close this gap, we created Phosphor (OOPSLA 2014), which provides taint tracking within the Java Virtual Machine (JVM) without requiring any modifications to the language interpreter, VM, or operating system, and without requiring any access to source code.
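To make the idea concrete, here is a minimal, hand-rolled illustration of what taint propagation means. This is not Phosphor's API: Phosphor instruments bytecode so that propagation like this happens automatically, with no changes to application source; all names here are illustrative.

```java
import java.util.HashSet;
import java.util.Set;

// Hand-rolled sketch of dynamic taint tracking: each value carries a set of
// taint labels, and operations on values take the union of their operands'
// labels. (Phosphor achieves this transparently via bytecode instrumentation.)
public class TaintSketch {
    static final class TaintedInt {
        final int value;
        final Set<String> labels;

        TaintedInt(int value, Set<String> labels) {
            this.value = value;
            this.labels = labels;
        }

        // Data flow: the result of an arithmetic op inherits both operands' labels.
        TaintedInt add(TaintedInt other) {
            Set<String> combined = new HashSet<>(labels);
            combined.addAll(other.labels);
            return new TaintedInt(value + other.value, combined);
        }
    }

    public static void main(String[] args) {
        TaintedInt userInput = new TaintedInt(41, Set.of("user-input")); // tainted source
        TaintedInt constant  = new TaintedInt(1, Set.of());             // untainted
        TaintedInt result = userInput.add(constant);
        // The taint label flows to the result along the data-flow edge.
        System.out.println(result.value + " " + result.labels); // prints "42 [user-input]"
    }
}
```

A real tracker must also handle every bytecode instruction, arrays, fields, and method calls, which is exactly the metadata-association problem described above.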

Checkpoint/rollback (CR) tools capture the state of an application and store it in some serialized form, allowing the application to later resume execution by returning to that same state. CR tools have been employed to support many tasks, including fault tolerance, input generation and testing, and process migration. Prior work in JVM checkpointing required specialized, custom JVMs, making it difficult to use in practice. Our goal is to provide efficient, fine-grained, and incremental checkpoint support within the JVM, using only commercial, stock, off-the-shelf, state-of-the-art JVMs (e.g. Oracle HotSpot and OpenJDK). Guided by key insights into JVM Just-In-Time (JIT) compiler behavior and typical object memory layout, we created CROCHET: Checkpoint ROllbaCk with lightweight HEap Traversal for the JVM (ECOOP 2018). CROCHET is a system for in-JVM checkpoint and rollback that provides copy-on-access semantics for individual variables (on the heap and stack), imposes very low steady-state overhead, and requires no modifications to the JVM.
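As a rough illustration of the lazy, copy-on-access idea, the sketch below saves a variable's original value only the first time it is written after a checkpoint, so untouched state is never copied. This is a toy key/value store with an invented API, not CROCHET's mechanism, which works on real heap and stack state inside unmodified JVMs.

```java
import java.util.HashMap;
import java.util.Map;

// Hand-rolled sketch of lazy checkpoint/rollback: after a checkpoint, each
// variable's pre-checkpoint value is saved on its first write, so only the
// state actually touched is copied. (Class and API are purely illustrative.)
public class CheckpointSketch {
    private final Map<String, Integer> state = new HashMap<>();
    private Map<String, Integer> undoLog = null; // non-null while a checkpoint is active

    public void checkpoint() { undoLog = new HashMap<>(); }

    public void put(String var, int value) {
        // Copy-on-first-write: save the pre-checkpoint value before clobbering it.
        if (undoLog != null && !undoLog.containsKey(var)) {
            undoLog.put(var, state.get(var)); // may save null, meaning "was absent"
        }
        state.put(var, value);
    }

    public Integer get(String var) { return state.get(var); }

    public void rollback() {
        for (Map.Entry<String, Integer> saved : undoLog.entrySet()) {
            if (saved.getValue() == null) state.remove(saved.getKey());
            else state.put(saved.getKey(), saved.getValue());
        }
        undoLog = null;
    }

    public static void main(String[] args) {
        CheckpointSketch vm = new CheckpointSketch();
        vm.put("x", 1);
        vm.checkpoint();
        vm.put("x", 99);   // only "x" is copied into the undo log
        vm.rollback();
        System.out.println(vm.get("x")); // prints "1"
    }
}
```

The steady-state cost is a single check per write, which hints at why a copy-on-access design can keep overhead low when checkpoints touch only a fraction of the heap.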

Statement coverage is commonly used as a measure of test suite quality. Coverage is often used as part of a code review process: if a patch decreases overall coverage, or is itself not covered, then the patch is scrutinized more closely. Traditional studies of how coverage changes with code evolution have examined the overall coverage of the entire program, and more recent work directly examines the coverage of patches (changed statements). We performed an evaluation much larger than prior studies and, moreover, considered a new, important kind of change: coverage changes of unchanged statements. At ASE 2018, we presented a large-scale evaluation of code coverage evolution over 7,816 builds of 47 projects written in popular languages including Java, Python, and Scala. We found that in large, mature projects, simply measuring the change to statement coverage does not capture the nuances of code evolution. Going beyond considering statement coverage as a simple ratio, we examined how the set of statements covered evolves between project revisions. We presented and studied new ways to assess the impact of a patch on a project’s test suite quality that both separate coverage of the patch from coverage of the non-patch, and separate changes in coverage from changes in the set of statements covered.
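The distinction between a coverage ratio and a coverage set can be sketched concisely. In this illustrative example (names and the string-based statement encoding are hypothetical, not from our tooling), the ratio is unchanged between two revisions even though the set of covered statements has shifted:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of comparing *sets* of covered statements across revisions, rather
// than a single coverage ratio.
public class CoverageDiff {
    /** Statements covered in the new revision but not the old one. */
    public static Set<String> newlyCovered(Set<String> oldCov, Set<String> newCov) {
        Set<String> gained = new HashSet<>(newCov);
        gained.removeAll(oldCov);
        return gained;
    }

    /** Statements covered in the old revision but no longer in the new one. */
    public static Set<String> newlyUncovered(Set<String> oldCov, Set<String> newCov) {
        Set<String> lost = new HashSet<>(oldCov);
        lost.removeAll(newCov);
        return lost;
    }

    public static void main(String[] args) {
        Set<String> oldCov = Set.of("A:1", "A:2", "B:7");
        Set<String> newCov = Set.of("A:1", "B:7", "C:3");
        // Three statements covered in each revision, so the ratio is flat,
        // but the set of covered statements has changed in both directions:
        System.out.println(newlyCovered(oldCov, newCov));   // prints "[C:3]"
        System.out.println(newlyUncovered(oldCov, newCov)); // prints "[A:2]"
    }
}
```

Partitioning these differences by whether each statement belongs to the patch then separates the patch's coverage from incidental coverage changes elsewhere in the project.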

Slow builds remain a plague for software developers. The frequency with which code can be built (compiled, tested, and packaged) directly impacts the productivity of developers: longer build times mean a longer wait before determining whether a change to the application being built was successful. We have found that in some languages, such as Java, the majority of build time is spent running tests, where dependencies between individual tests are difficult to discover, making many existing test acceleration techniques unsound to deploy in practice. My first approach to accelerating testing, Unit Test Virtualization (published at ICSE, where it received a Distinguished Paper Award), speeds up testing in projects that isolate the in-memory state of each test case in an attempt to prevent dependencies from occurring. However, for projects that do not isolate their tests, not only would VMVM not be applicable, but out-of-the-box test acceleration techniques such as test selection or test parallelization would be unsound: when dependencies go unnoticed, tests can unexpectedly fail when executed out of order, causing unreliable builds. My second approach, ElectricTest (published at FSE), identifies data dependencies between test cases, allowing for automatic and sound test acceleration. While this approach is sound (that is, it never misses a possible dependency), it is not very precise, which may over-constrain test selection and parallelization. Extending ElectricTest, I collaborated with Alessio Gambi and Andreas Zeller to create PraDet (ICST), an approach that refines ElectricTest’s dependency results.
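The kind of data dependency at issue can be illustrated with a toy shared key/value store. In this sketch (an invented API, not ElectricTest's implementation, which observes real heap and file-system state at runtime), test B depends on test A if B reads a shared location that A last wrote:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of detecting data dependencies between tests: a test depends on the
// last test that wrote any shared location it reads. A scheduler that keeps
// such pairs in order can then soundly select or parallelize the rest.
public class DependencySketch {
    // For each shared location, the last test that wrote it.
    private final Map<String, String> lastWriter = new HashMap<>();
    private final Map<String, Set<String>> dependsOn = new HashMap<>();

    public void recordWrite(String test, String location) {
        lastWriter.put(location, test);
    }

    public void recordRead(String test, String location) {
        String writer = lastWriter.get(location);
        if (writer != null && !writer.equals(test)) {
            dependsOn.computeIfAbsent(test, t -> new HashSet<>()).add(writer);
        }
    }

    public Set<String> dependenciesOf(String test) {
        return dependsOn.getOrDefault(test, Set.of());
    }

    public static void main(String[] args) {
        DependencySketch d = new DependencySketch();
        d.recordWrite("testA", "Config.cache"); // testA pollutes shared state
        d.recordRead("testB", "Config.cache");  // testB silently depends on it
        // Running testB before testA (or alone) could change its outcome.
        System.out.println(d.dependenciesOf("testB")); // prints "[testA]"
    }
}
```

Note that flagging every such read-after-write pair is sound but conservative, which mirrors the precision gap between ElectricTest and PraDet described above.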


Teaching

I teach undergraduate and graduate Software Engineering and Systems classes at George Mason University, and I make all of my teaching materials publicly available in the hope that they may be useful to others outside of my institution.

CS 475: Concurrent and Distributed Systems (Undergraduate, Fall 2019)
CS 475: Concurrent and Distributed Systems (Undergraduate, Spring 2019)
SWE 432: Design and Implementation of Software for the Web (Undergraduate, Fall 2018)
CS 475: Concurrent and Distributed Systems (Undergraduate, Spring 2018)
CS/SWE 795: Program Analysis for Software Testing (Graduate, Fall 2017)
SWE 622: Distributed Software Engineering (Graduate, Spring 2017)
SWE 432: Design and Implementation of Software for the Web (Undergraduate, Fall 2016)



Service

A complete, up-to-date list of my service activities, such as journal reviewing (e.g. Empirical Software Engineering), program committee membership (e.g. ASE and ICST), and funding evaluation (e.g. NSF panels), is maintained in my CV. I serve as faculty advisor to GMU’s Student-Run Computing and Technology (SRCT) group, an undergraduate organization that supports students interested in computing. I co-organized the NSF-supported undergraduate Programming Languages Mentoring Workshop (PLMW) at SPLASH in 2017 and 2018, and will continue to do so in 2019; the workshop focuses on broadening the participation of underrepresented groups in PL and SE research. In the two years that I have been involved in its organization, PLMW has reached an incredibly diverse audience and made a demonstrable impact on undergraduate students’ decisions to pursue graduate school in computer science.