“Continuous Integration” (CI) is now a standard software engineering practice, allowing large test suites to be automatically executed in the cloud.
Rather than running these large test suites weekly or monthly, engineers can run them as often as on every single change to the code under test, with complete traceability between test results and the code that generated them.
At a cost of just a few dollars per run, engineers get faster feedback and identify bugs and performance regressions sooner, which improves productivity.
But CI is also a relatively new process, and it brings many challenges.
I study the problems that open-source software developers face when building software with continuous integration, design novel approaches to address those problems, and validate those approaches on real-world problems.
I release all of my tools and datasets under an open-source license, and have integrated research findings into existing, popular open-source software used for testing Java programs.
Flaky Tests: One major problem faced by developers who adopt CI arises from tests that fail unexpectedly and seemingly at random, so-called “flaky tests.”
Flaky tests undermine efficiency with CI, because developers cannot easily determine whether a test failure is due to their recent changes or to flakiness.
My earliest work on flaky tests studied those caused by dependencies on the order in which tests execute.
My research on flaky tests has since grown to include broader empirical studies of the phenomenon, answering research questions like: “What kinds of code changes cause tests to become flaky?” (Lam, Winter, Wei, Xie, Marinov, and Bell, 2020) and “How can we quickly determine if a test failure is flaky, or a true failure we should investigate?” (Bell, Legunsen, Hilton, Eloussi, Yung, and Marinov, 2018).
To answer these questions, I created “software archaeology” experiments, building software to automatically examine thousands of revisions of dozens of open-source software projects, repeatedly executing each test suite to identify and profile flaky tests (Alshammari, Morris, Hilton, and Bell, 2021).
This dataset of flaky tests has spawned a fast-growing area of research applying machine learning techniques to predict which tests are likely to be flaky.
My ongoing research studies the causes of flaky test failures, allowing us to create new approaches to help developers understand and repair them faster.
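To make order-dependent flakiness concrete, here is a minimal illustrative sketch. It is not the tooling from the cited papers: the shared `cache`, the two toy tests, and the helper names are all hypothetical, and real test suites are far too large to try every ordering exhaustively as this sketch does.

```python
import itertools

# Hypothetical shared state that the tests accidentally communicate through.
cache = {}

def test_write():
    cache["user"] = "alice"
    return True

def test_read():
    # Implicitly assumes test_write has already run.
    return cache.get("user") == "alice"

def run_in_order(tests):
    """Run the tests from a clean state; return {test name: passed?}."""
    cache.clear()
    return {t.__name__: bool(t()) for t in tests}

def find_order_dependent(tests):
    """Return the names of tests whose outcome differs across orderings."""
    outcomes = {t.__name__: set() for t in tests}
    for order in itertools.permutations(tests):
        for name, passed in run_in_order(order).items():
            outcomes[name].add(passed)
    return sorted(name for name, seen in outcomes.items() if len(seen) > 1)

order_dependent = find_order_dependent([test_write, test_read])
print(order_dependent)
```

Here `test_read` passes only when `test_write` runs first, so it is reported as order-dependent, while `test_write` passes in every ordering.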
Fuzzing: Simply having computing resources does not solve testing problems if developers’ test suites are not sufficiently thorough.
Automated test generation, or “fuzz testing,” aims to create a diverse suite of inputs that reveal interesting program behavior, including crashes and security vulnerabilities.
I design new techniques to analyze the behavior of programs while they run, extending program analyses like “dynamic taint tracking” to precisely identify interactions between inputs and outputs (Bell and Kaiser, 2014; Hough and Bell, 2021).
I have used these new analyses to design new approaches to detect code injection vulnerabilities in JVM software (Hough, Welearegai, Hammer, and Bell, 2020) and to generate better tests to reveal otherwise untested program behavior (Kukucka, Pina, Ammann, and Bell, 2022).
Fuzz testing relies on evolutionary algorithms, and it can be difficult to understand why a fuzzer does or doesn’t execute some code path.
My ongoing research studies the scientific foundations for fuzz testing, creating new methods for evaluating and improving fuzzers’ designs.
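As a rough illustration of the evolutionary loop behind coverage-guided fuzzing, the sketch below keeps any mutated input that reaches a new branch and mutates it further. It is a generic textbook-style loop, not the design of any fuzzer from the cited papers; the toy `target`, its branch labels, and the single-byte `mutate` operator are assumptions made up for this example.

```python
import random

def target(data: bytes) -> set:
    """Toy program under test: returns the ids of the branches it executed."""
    covered = {"entry"}
    if len(data) > 0 and data[0] == ord("F"):
        covered.add("b1")
        if len(data) > 1 and data[1] == ord("U"):
            covered.add("b2")
            if len(data) > 2 and data[2] == ord("Z"):
                raise RuntimeError("simulated crash")
    return covered

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def fuzz(seed_input: bytes, iterations: int, rng_seed: int = 0):
    """Return (crashing input or None, corpus of coverage-increasing inputs)."""
    rng = random.Random(rng_seed)
    corpus = [seed_input]
    seen = set(target(seed_input))
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            covered = target(candidate)
        except RuntimeError:
            return candidate, corpus
        if not covered <= seen:
            # New coverage: keep this input as a parent for future mutations.
            seen |= covered
            corpus.append(candidate)
    return None, corpus

crash_input, corpus = fuzz(b"AAAA", iterations=200_000)
print(crash_input, corpus)
```

Without the coverage feedback, a single-byte mutation of the seed could never produce the three-byte prefix that triggers the crash; with it, the fuzzer discovers the prefix one byte at a time. Even in this tiny example, explaining why a given path was or was not reached requires reasoning about both the mutation operator and the feedback signal.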
Software Supply Chain: Modern software built in languages like Java, Python and TypeScript increasingly relies on open-source libraries.
Developers who use third-party libraries must ensure that they are kept up to date and, ideally, that those libraries have no security vulnerabilities.
Developers who maintain those libraries, in turn, must ensure that any new releases that break backwards compatibility are clearly documented.
While developers inside large companies rely on CI processes to automatically test updates for compatibility and to examine updates for security vulnerabilities, the resources needed to create such an infrastructure for the open-source community are tremendous.
My ultimate objective is to create an open-source observatory, analyzing new libraries and updates to existing libraries, detecting breaking changes and security vulnerabilities even before developers publish a release.
My first project in this area focused on formalizing the semantics of dependency resolvers, creating a replacement for NPM that allows developers to reduce bloat and vulnerabilities by optimizing the versions of each dependency (Pinckney, Cassano, Guha, Bell, Culo, and Gamblin, 2023).
To understand the propagation of breaking changes, vulnerabilities, and fixes, we built a massive dataset consisting of all 28,941,927 versions of all 2,663,681 packages on NPM (Pinckney, Cassano, Guha, and Bell, 2023).
This 20 TB dataset is “live”: it is constantly updated as new packages are published on NPM, and it is a significant contribution in itself.
My ongoing research studies automated approaches for detecting updates that introduce vulnerabilities and/or breaking changes.
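For context, semantic versioning is the convention library maintainers use to declare whether a release may break backwards compatibility. The sketch below is illustrative only: the function names are hypothetical, and pre-release and build tags are ignored. It classifies an update purely from its version numbers, which captures what a release claims rather than what its code actually does.

```python
def parse_version(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)

def classify_update(old: str, new: str) -> str:
    """Label the step from old to new by the most significant changed field."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v == old_v:
        return "none"
    if new_v < old_v:
        return "downgrade"
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    return "patch"

def may_break(old: str, new: str) -> bool:
    """True if an upgrade is allowed to break compatibility under semver."""
    kind = classify_update(old, new)
    if kind == "major":
        return True
    # Versions below 1.0.0 make no stability promise under semver.
    return kind in ("minor", "patch") and parse_version(old)[0] == 0

print(classify_update("1.2.3", "1.3.0"), may_break("1.2.3", "1.3.0"))
print(classify_update("1.9.9", "2.0.0"), may_break("1.9.9", "2.0.0"))
```

A check like this cannot tell whether a release labeled as a patch actually changes behavior, which is why detecting such updates calls for analyzing the code itself.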
Flexible and Optimal Dependency Management via Max-SMT. Donald Pinckney, Federico Cassano, Arjun Guha, Jonathan Bell, Massimiliano Culo, Todd Gamblin. Proceedings of the 2023 International Conference on Software Engineering, 2023.
A Large Scale Analysis of Semantic Versioning in NPM. Donald Pinckney, Federico Cassano, Arjun Guha, Jonathan Bell. Proceedings of the 20th International Conference on Mining Software Repositories, 2023.
CONFETTI: Amplifying Concolic Guidance for Fuzzers. James Kukucka, Luis Pina, Paul Ammann, Jonathan Bell. Proceedings of the 2022 International Conference on Software Engineering, 2022.
A Practical Approach for Dynamic Taint Tracking with Control-flow Relationships. Katherine Hough, Jonathan Bell. ACM Transactions on Software Engineering and Methodology, 31(2), 2021.
FlakeFlagger: Predicting Flakiness Without Rerunning Tests. Abdulrahman Alshammari, Christopher Morris, Michael Hilton, Jonathan Bell. Proceedings of the 2021 International Conference on Software Engineering, 2021.
A Large-Scale Longitudinal Study of Flaky Tests. Wing Lam, Stefan Winter, Anjiang Wei, Tao Xie, Darko Marinov, Jonathan Bell. Proceedings of the ACM on Programming Languages, 3(OOPSLA), 2020.
Revealing Injection Vulnerabilities by Leveraging Existing Tests. Katherine Hough, Gere Welearegai, Christian Hammer, Jonathan Bell. Proceedings of the 2020 International Conference on Software Engineering, 2020.
DeFlaker: Automatically Detecting Flaky Tests. Jonathan Bell, Owolabi Legunsen, Michael Hilton, Lamyaa Eloussi, Tifany Yung, Darko Marinov. Proceedings of the 2018 International Conference on Software Engineering, 2018.
Phosphor: Illuminating Dynamic Data Flow in Off-The Shelf JVMs. Jonathan Bell, Gail Kaiser. Proceedings of the 29th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, 2014.
A complete list of my publications is also available.