“Continuous Integration” (CI) is now a standard software engineering practice, allowing large test suites to be automatically executed in the cloud.
Rather than run these large test suites on a weekly or monthly basis, they can be run as frequently as on every single change to the code under test, with complete traceability between test results and the code that generated them.
At a cost of just a few dollars per-run, engineers benefit from faster feedback, identifying bugs and performance regressions sooner and optimizing productivity.
But, CI is also a relatively new process, and it brings many challenges.
I study the problems that open-source software developers face when building software with continuous integration, design novel approaches to address those problems, and validate those solutions on the real-world problem.
I release all of my tools and datasets under an open source license, and have integrated research findings into existing, popular open source software used for testing Java programs.
One major problem faced by developers who adopt CI arises from tests that can seemingly fail entirely unexpectedly and often randomly, so called “flaky tests.”
Flaky tests undermine efficiency with CI, because developers cannot easily determine when a test failure is due to their recent changes or due to flakiness.
My earliest work in flaky tests studied flaky tests caused by dependencies on the execution orders of tests(missing reference).
My research in flaky tests has since grown to include broader empirical studies of the phenomena, answering research questions like: “What kinds of code changes cause tests to become flaky?” (Lam, Winter, Wei, Xie, Marinov, and Bell, 2020) and “How can we quickly determine if a test failure is flaky, or a true failure we should investigate?” (Bell, Legunsen, Hilton, Eloussi, Yung, and Marinov, 2018).
To answer these questions, I created “software archaeology” experiments, building software to automatically examine thousands of revisions of dozens of open-source software projects, repeatedly executing each test suite to identify and profile flaky tests (Alshammari, Morris, Hilton, and Bell, 2021).
This dataset of flaky tests has spawned a fast-growing area of research applying machine learning techniques to predict which tests are likely to be flaky.
My ongoing research studies the causes of flaky test failures, allowing us to create new approaches to help developers understand and repair them faster.
Simply having computing resources does not solve testing problems if developers’ test suites are not sufficiently thorough.
%Software is still released with vulnerabilities and defects: manual testing is insufficient.
Automated test generation, or “fuzz testing” aims to create a diverse suite of inputs that reveal interesting program behavior, including crashes and security vulnerabilities.
I design new techniques to analyze the behavior of programs while they run, extending program analyses like “dynamic taint tracking” to precisely identify interactions between inputs and outputs (Bell and Kaiser, 2014; Hough and Bell, 2021).
I have used these new analyses to design new approaches to detect code injection vulnerabilities in JVM software (Hough, Welearegai, Hammer, and Bell, 2020) and to generate better tests to reveal otherwise untested program behavior (Kukucka, Pina, Ammann, and Bell, 2022).
Fuzz testing relies on evolutionary algorithms, and it can be difficult to understand why a fuzzer does or doesn’t execute some code path.
My ongoing research studies the scientific foundations for fuzz testing, creating new methods for evaluating and improving fuzzers’ designs.
CONFETTI: Amplifying Concolic Guidance for Fuzzers. James Kukucka, Luis Pina, Paul Ammann, Jonathan Bell. Proceedings of the 2022 International Conference on Software Engineering, 2022 [pdf]. [artifact]. [code/git].
A Practical Approach for Dynamic Taint Tracking with Control-flow Relationships. Katherine Hough, Jonathan Bell. ACM Transactions on Software Engineering and Methodology. 2021 ; 31(2). [pdf]. [code/git].
FlakeFlagger: Predicting Flakiness Without Rerunning Tests. Abdulrahman Alshammari, Christopher Morris, Michael Hilton, Jonathan Bell. Proceedings of the 2021 International Conference on Software Engineering, 2021 [pdf]. [artifact]. [code/git].
A Large-Scale Longitudinal Study of Flaky Tests. Wing Lam, Stefan Winter, Anjiang Wei, Tao Xie, Darko Marinov, Jonathan Bell. Proceedings of the ACM on Programming Languages. 2020 ; 3(OOPSLA). [pdf]. [artifact].
Revealing Injection Vulnerabilities by Leveraging Existing Tests. Katherine Hough, Gere Welearegai, Christian Hammer, Jonathan Bell. Proceedings of the 2020 International Conference on Software Engineering, 2020 [pdf]. [artifact]. [code/git].
DeFlaker: Automatically Detecting Flaky Tests. Jonathan Bell, Owolabi Legunsen, Michael Hilton, Lamyaa Eloussi, Tifany Yung, Darko Marinov. Proceedings of the 2018 International Conference on Software Engineering, 2018 [pdf]. [code/git].
Phosphor: Illuminating Dynamic Data Flow in Off-The Shelf JVMs. Jonathan Bell, Gail Kaiser. Proceeding of the 29th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, 2014 [pdf]. [artifact]. [code/git].
A complete list of my publications is also available.