Software Engineering and Software Systems Researcher

Jon Bell


About Jon

Jon is an Assistant Professor directing research in Software Engineering and Software Systems at Northeastern University. His research makes it easier for developers to create reliable and secure software by improving software testing and program analysis. Jon’s work on accelerating software testing has been recognized with an ACM SIGSOFT Distinguished Paper Award (ICSE ’14 – Unit Test Virtualization with VMVM), and was the basis for an industrial collaboration with Electric Cloud. His research in flaky tests has led to open source contributions to the Maven build system and the Pit mutation testing framework. His program analysis research has resulted in several widely adopted runtime systems for the JVM, including the Phosphor taint tracking system (OOPSLA ’14) and the CROCHET checkpoint/rollback tool (ECOOP ’18). His contributions to the object-oriented programming community were recognized with the 2020 Dahl-Nygaard Junior Researcher Prize, and he was invited to give a keynote address at SPLASH on this work. His research has been funded by the NSA and the NSF, and he is the recipient of the NSF CAREER award.

At Northeastern, Jon teaches Software Engineering, and previously, at George Mason, Jon received a university-wide Teacher of Distinction award for his courses in distributed systems, web development, and program analysis. Jon serves on a variety of program committees and was recently co-chair of the PLDI 2020 Artifact Evaluation Committee. As part of his efforts to broaden the participation of underrepresented groups in computing, Jon co-organizes the PL/SE mentoring workshop at SPLASH (in 2017, 2018, 2019 and 2020). In Summer 2020, Jon co-founded the Clowdr open source project to help support virtual academic conferences, and subsequently co-founded a startup to provide paid support and development for the project.

His other interests include photography, cooking and cycling.

Jon is currently recruiting exceptional students at all levels (undergrad, masters and PhD) to join his research group. If you are interested, please send him an email.

More information about Jon, including a complete publications listing, is available in his Curriculum Vitae.


Research Overview

I apply a systems perspective to software engineering challenges, observing the issues that developers face when creating reliable software, and then designing new mechanisms to support them. My research focuses on improving existing developer-written tests: making them run faster and more reliably, amplifying them to be more comprehensive, and tracking their overall quality. Rather than focusing solely on finding a handful of high-value “million dollar bugs” in a small pool of high-assurance software, my research aims to have very broad impact, helping everyday developers just as much as experts.

In developing new techniques and methodologies to address these real-world developer challenges, I often find that my research contributions span several fields, yielding publications in program analysis venues like OOPSLA and ECOOP, software testing venues like ICST, and software systems venues like EuroSys and OSDI, in addition to my home community of software engineering (ICSE, FSE and ASE). Beyond publications, I believe that it is important to disseminate code artifacts, and I have made all of my tools publicly available (licensing restrictions permitting).

An up-to-date listing of my publications can be found online, or in my CV and Google Scholar.

You can read more about several selected research projects below:

Testing for Security Vulnerabilities

Code injection vulnerabilities have been exploited in repeated attacks on US election systems, in the theft of sensitive financial data, and in the theft of millions of credit card numbers. My collaborators and I have created a new approach for detecting injection vulnerabilities in applications by harnessing the combined power of both human developers' test suites and automated dynamic analysis. Our new approach, RIVULET, monitors the execution of developer-written functional tests in order to detect information flows that may be vulnerable to attack (using my taint tracking system, Phosphor). Then, RIVULET uses a white-box test generation technique to repurpose those functional tests to check whether any vulnerable flow could be exploited. When applied to the version of Apache Struts exploited in the 2017 Equifax attack, RIVULET quickly identifies the vulnerability, leveraging only the tests that existed in Struts at that time. We compared RIVULET to the state-of-the-art static vulnerability detector Julia on benchmarks, finding that RIVULET produced fewer false positives and fewer false negatives than Julia. We also used RIVULET to detect previously unknown vulnerabilities in Jenkins and iTrust.

Our ongoing work in this area aims to detect more kinds of vulnerabilities with even less reliance on developer-provided tests.

Flaky Tests

Whenever a developer pushes some changes to a repository, tests are run to check whether the changes broke some functionality. Ideally, every new test failure would be due to the latest changes that the developer made, and the developer could focus on debugging these failures. Unfortunately, some failures are not due to the latest changes, but due to flaky tests. A flaky test is a test that can non-deterministically pass or fail when run on the same version of the code (flaky tests might also pass when they should have failed). For most modern applications, flaky tests are inevitable. For instance: consider a system test at Google that involves loading a page that has an ad embedded in it. If the ad serving system is overloaded and unable to serve an ad within a time limit, the test might be served the page without any ads. In this case, the test runner may not be able to distinguish between a broken ad server (which might not serve ads to any client), and a functional ad server that might simply have dropped the request.
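A minimal sketch of how non-determinism creeps into tests (all names here are illustrative, not from any of the systems discussed): a parallel stream that accumulates results via a side effect produces elements in a scheduling-dependent order, so an assertion on element order is flaky, while assertions on order-insensitive properties are stable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class FlakySketch {
    // Collects squares via a side effect on a shared list from a parallel
    // stream, so the element ORDER depends on thread scheduling.
    static List<Integer> squaresUnstableOrder() {
        List<Integer> out = new ArrayList<>();
        IntStream.range(0, 1000).parallel().forEach(i -> {
            synchronized (out) { out.add(i * i); }
        });
        return out;
    }

    public static void main(String[] args) {
        List<Integer> result = squaresUnstableOrder();
        // A flaky assertion would check result.get(0) == 0: it often holds,
        // but is not guaranteed. Robust assertions check properties that do
        // not depend on scheduling, such as size and sum:
        System.out.println(result.size());
        System.out.println(result.stream().mapToLong(Integer::longValue).sum());
    }
}
```

Running this prints the same size and sum every time, even though the underlying order varies from run to run.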

My work in flaky tests began with the problem of test order dependencies: tests that can unexpectedly fail if they are run in a different order. This early work considered how to efficiently isolate tests to prevent this flakiness and how to precisely detect which tests depend on each other (ElectricTest and PraDet), allowing developers to detect which orderings will result in flakiness. Looking at flaky tests more broadly than just test order dependencies, my collaborators and I created DeFlaker, which can mark test outcomes as flaky immediately upon failure (without rerunning the test) by using code coverage results. Further considering the relationship between coverage and flaky tests, we conducted a very large-scale analysis of code coverage, examining how the coverage of individual statements may be non-deterministic and may change over time.
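A test order dependency can be sketched in a few lines (the class and method names below are hypothetical, chosen only to illustrate the pattern): one test silently relies on a side effect of another, so it passes in one run order and fails in another.

```java
public class OrderDependentTests {
    static int counter = 0; // shared mutable state that leaks between tests

    static void testA() { counter = 1; }

    // testB silently depends on testA's side effect.
    static boolean testB() { return counter == 1; }

    public static void main(String[] args) {
        // Run order A then B: the hidden dependency is satisfied, testB passes.
        counter = 0;
        testA();
        System.out.println("A,B: " + (testB() ? "PASS" : "FAIL"));

        // Run testB in isolation (as if the suite were reordered): it fails,
        // exposing the kind of dependency that tools like ElectricTest and
        // PraDet aim to detect.
        counter = 0;
        System.out.println("B: " + (testB() ? "PASS" : "FAIL"));
    }
}
```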

If tests non-deterministically cover different statements, then downstream testing techniques like mutation testing, program repair, and fault localization can be confounded. We found that even when tests do not appear flaky (e.g. their outcome is always "pass"), the set of lines covered by each test may vary non-deterministically (we found 22% of statements across 30 projects to have flaky coverage). As we reported in our ISSTA 2019 paper, this change in coverage can result in a wide variance in mutation scores.

Most recently, we performed a longitudinal study of test flakiness, tracing the origin of 245 flaky tests, and presented this study at OOPSLA 2020. We evaluated all of the revisions of each project containing each flaky test, from the revision that first introduced the test until the revision in which the test was first flaky. We found that 75% of the tests that we studied were flaky when they were first added to the project; the remaining 25% became flaky only after they were first added.

We have several active projects underway that aim to help developers cope with flaky tests.

Runtime Systems for Security and Testing

We have built one-of-a-kind JVM-based runtime systems for dynamic taint tracking and checkpoint-rollback that have enabled many new research contributions in software engineering and security. These systems-oriented contributions address engineering problems that arose while we worked to solve developer-facing software engineering problems. Both of these systems are designed to be extremely portable (using only public APIs to interface with the JVM) and extremely performant, allowing them to be embedded as a part of a larger tool (in our ongoing and future work).

Dynamic taint tracking is a form of information flow analysis that identifies relationships between data during program execution. Inputs to the program are labeled with a marker (“tainted”), and these markers are propagated through data flow. Traditionally, dynamic taint tracking is used for information flow control, or detection of code-injection attacks. Without a performant, portable, and accurate tool for performing dynamic taint tracking in Java, software engineering research can be restricted. In Java, associating metadata (such as tags) with arbitrary variables is very difficult: previous techniques have relied on customized JVMs or symbolic execution environments to maintain this mapping, limiting their portability and restricting their application to large and complex real-world software. To close this gap, we created Phosphor (OOPSLA 2014), which provides taint tracking within the Java Virtual Machine (JVM) without requiring any modifications to the language interpreter, VM, or operating system, and without requiring any access to source code.
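The source/propagate/sink idea behind taint tracking can be made concrete with a toy sketch. Phosphor does this bookkeeping transparently at the bytecode level for every value in the JVM; the class below only illustrates the concept, and every name in it is hypothetical.

```java
import java.util.Objects;

public class TaintSketch {
    // A value paired with a shadow tag recording whether it is tainted.
    static final class Tainted {
        final String value;
        final boolean tainted;
        Tainted(String value, boolean tainted) {
            this.value = Objects.requireNonNull(value);
            this.tainted = tainted;
        }
    }

    // Source: anything read from the user is tagged as tainted.
    static Tainted fromUser(String s) { return new Tainted(s, true); }

    // Propagation: the result of combining values is tainted if either
    // operand is tainted.
    static Tainted concat(Tainted a, Tainted b) {
        return new Tainted(a.value + b.value, a.tainted || b.tainted);
    }

    // Sink: a tainted value reaching a sensitive operation is flagged.
    static boolean safeToExecute(Tainted query) { return !query.tainted; }

    public static void main(String[] args) {
        Tainted prefix = new Tainted("SELECT * FROM users WHERE name = ", false);
        Tainted input = fromUser("'alice' OR 1=1");
        Tainted query = concat(prefix, input);
        // Taint flowed from source to sink, so the query is flagged unsafe.
        System.out.println(safeToExecute(query));
    }
}
```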

Checkpoint/rollback (CR) tools capture the state of an application and store it in some serialized form, allowing the application to later resume execution by returning to that same state. CR tools have been employed to support many tasks, including fault tolerance, input generation and testing, and process migration. Prior work in JVM checkpointing required specialized, custom JVMs, making it difficult to use in practice. Our goal is to provide efficient, fine-grained, and incremental checkpoint support within the JVM, using only commercial, stock, off-the-shelf, state-of-the-art JVMs (e.g. Oracle HotSpot and OpenJDK). Guided by key insights into the JVM Just-In-Time (JIT) compiler behavior and the typical object memory layout, we created CROCHET: Checkpoint ROllbaCk with lightweight HEap Traversal for the JVM (ECOOP 2018). CROCHET is a system for in-JVM checkpoint and rollback, providing copy-on-access semantics for individual variables (on the heap and stack) that imposes very low steady-state overhead and requires no modifications to the JVM.
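The lazy, per-variable flavor of checkpointing can be sketched with a toy key-value store: after a checkpoint, the old value of a variable is saved only the first time that variable is written, so untouched state costs nothing. CROCHET realizes this idea for arbitrary heap and stack variables inside stock JVMs; the class below is only a conceptual sketch with hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class CheckpointSketch {
    private final Map<String, Integer> state = new HashMap<>();
    private Map<String, Integer> undoLog = null; // non-null while checkpointed

    void checkpoint() { undoLog = new HashMap<>(); }

    void put(String key, Integer value) {
        // Lazy copy: record the pre-checkpoint value only on the FIRST write
        // to this key after the checkpoint (null records "was absent").
        if (undoLog != null && !undoLog.containsKey(key)) {
            undoLog.put(key, state.get(key));
        }
        state.put(key, value);
    }

    void rollback() {
        // Undo only the keys that were actually written since the checkpoint.
        for (Map.Entry<String, Integer> e : undoLog.entrySet()) {
            if (e.getValue() == null) state.remove(e.getKey());
            else state.put(e.getKey(), e.getValue());
        }
        undoLog = null;
    }

    Integer get(String key) { return state.get(key); }

    public static void main(String[] args) {
        CheckpointSketch s = new CheckpointSketch();
        s.put("x", 1);
        s.checkpoint();
        s.put("x", 99); // old value of x saved lazily here
        s.put("y", 2);  // y recorded as previously absent
        s.rollback();
        System.out.println(s.get("x")); // restored to pre-checkpoint value
        System.out.println(s.get("y")); // rolled back to absent
    }
}
```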



Teaching

I teach undergraduate and graduate Software Engineering and Systems classes, and make all of my teaching materials publicly available in the hope that they may be useful to others.


CS4530: Fundamentals of Software Engineering (Undergraduate, Spring 2021)

Past (At Northeastern):
CS4530/5500: Fundamentals/Foundations of Software Engineering (Undergraduate/MS, Fall 2020)

Past (At George Mason):

CS 475: Concurrent and Distributed Systems (Undergraduate, Fall 2019)
CS 475: Concurrent and Distributed Systems (Undergraduate, Spring 2019)
SWE 432: Design and Implementation of Software for the Web (Undergraduate, Fall 2018)
CS 475: Concurrent and Distributed Systems (Undergraduate, Spring 2018)
CS/SWE 795: Program Analysis for Software Testing (Graduate, Fall 2017)
SWE 622: Distributed Software Engineering (Graduate, Spring 2017)
SWE 432: Design and Implementation of Software for the Web (Undergraduate, Fall 2016)



Service

A complete, up-to-date list of my service activities, such as journal reviewing (e.g. TOSEM, Empirical Software Engineering, and others), program committee membership (e.g. ASE, ICSE and others) and funding evaluation (e.g. NSF panels), is maintained in my CV. I have served as faculty advisor to GMU’s Student-Run Computing and Technology (SRCT) group, an undergraduate organization that supports students interested in computing. I have co-organized the NSF-supported undergraduate Programming Languages Mentoring Workshop (PLMW) at SPLASH each year from 2017 through 2020, and will continue to do so; the event focuses on broadening the participation of underrepresented groups in PL and SE research. In the years that I have been involved, PLMW has reached an incredibly diverse audience and made a demonstrable impact on undergraduate students’ decisions to pursue graduate school in computer science.

In Summer 2020, when all of our academic conferences needed to quickly pivot from being in-person events to online events, I co-founded an open source project with Crista Lopes and Benjamin Pierce to provide a software solution to support our communities. What began as a stop-gap measure to organize ICSE and ICFP turned into a healthy open source project that powered CSCW and SPLASH. To better meet the needs of academic conference organizers (who are themselves volunteers), we launched a company (specifically, a UK-based Community Interest Company) with full-time developers who deploy and operate our open source conference software for events.