Software Engineering and Software Systems Researcher

Jon Bell

Jon is an Assistant Professor directing research in Software Engineering and Software Systems at Northeastern University.

His research on flaky tests has led to open source contributions to the Maven build system and the Pit mutation testing framework. His program analysis research has resulted in several widely adopted runtime systems for the JVM, including the Phosphor taint tracking system (OOPSLA ’14) and the CROCHET checkpoint/rollback tool (ECOOP ’18). His contributions to the object-oriented programming community were recognized with the 2020 Dahl-Nygaard Junior Researcher Prize. His research has been funded by the NSA and the NSF, and he is the recipient of the NSF CAREER award.

At Northeastern, Jon teaches Software Engineering. Previously, at George Mason, he received a university-wide Teacher of Distinction award for his courses in distributed systems, web development, and program analysis. Jon serves on a variety of program committees and was recently co-chair of the PLDI 2020 Artifact Evaluation Committee. As part of his efforts to broaden the participation of underrepresented groups in computing, Jon co-organized the mentoring workshop at ICSE in 2022 and the PL/SE mentoring workshop at SPLASH (in 2017, 2018, 2019 and 2020). In Summer 2020, Jon co-founded the Clowdr open source project to help support virtual academic conferences, and subsequently co-founded a startup to provide paid support and development for the project.

His other interests include photography, cooking and cycling.

More information about Jon, including a complete publications listing, is available in his Curriculum Vitae.

“Continuous Integration” (CI) is now a standard software engineering practice, allowing large test suites to be automatically executed in the cloud. Rather than running these large test suites on a weekly or monthly basis, engineers can run them as frequently as on every single change to the code under test, with complete traceability between test results and the code that generated them. At a cost of just a few dollars per run, engineers benefit from faster feedback, identifying bugs and performance regressions sooner and optimizing productivity. But CI is also a relatively new process, and it brings many challenges.

I study the problems that open-source software developers face when building software with continuous integration, design novel approaches to address those problems, and validate those solutions on real-world problems. I release all of my tools and datasets under an open source license, and have integrated research findings into existing, popular open source software used for testing Java programs.

Flaky Tests: One major problem faced by developers who adopt CI arises from tests that can fail seemingly unexpectedly and often randomly, so-called “flaky tests.” Flaky tests undermine efficiency with CI, because developers cannot easily determine whether a test failure is due to their recent changes or due to flakiness. My earliest work on flaky tests studied those caused by dependencies on the execution order of tests. My research on flaky tests has since grown to include broader empirical studies of the phenomenon, answering research questions like: “What kinds of code changes cause tests to become flaky?” (Lam, Winter, Wei, Xie, Marinov, and Bell, 2020) and “How can we quickly determine if a test failure is flaky, or a true failure we should investigate?” (Bell, Legunsen, Hilton, Eloussi, Yung, and Marinov, 2018). To answer these questions, I created “software archaeology” experiments, building software to automatically examine thousands of revisions of dozens of open-source software projects, repeatedly executing each test suite to identify and profile flaky tests (Alshammari, Morris, Hilton, and Bell, 2021). This dataset of flaky tests has spawned a fast-growing area of research applying machine learning techniques to predict which tests are likely to be flaky. My ongoing research studies the causes of flaky test failures, allowing us to create new approaches to help developers understand and repair them faster.
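As a toy illustration of flakiness caused by test execution order, the following minimal sketch runs two checks that share state in every possible order and reports any check whose outcome changes. All names here (`cache`, `check_writes_cache`, `check_reads_cache`, `find_order_dependent`) are hypothetical and invented for the example; this is not the tooling from the cited papers, and trying every permutation is only feasible for very small suites.

```python
import itertools

# Hypothetical shared state that both toy checks touch.
cache = {}

def check_writes_cache():
    cache["user"] = "alice"
    return True

def check_reads_cache():
    # Passes only if check_writes_cache ran earlier and populated the cache.
    return cache.get("user") == "alice"

def run_in_order(tests):
    """Run the tests in the given order from a clean state; return name -> passed."""
    cache.clear()
    return {t.__name__: t() for t in tests}

def find_order_dependent(tests):
    """Return the names of tests whose outcome differs across execution orders."""
    outcomes = {t.__name__: set() for t in tests}
    for order in itertools.permutations(tests):
        for name, passed in run_in_order(order).items():
            outcomes[name].add(passed)
    return sorted(name for name, seen in outcomes.items() if len(seen) > 1)

flaky = find_order_dependent([check_writes_cache, check_reads_cache])
print(flaky)  # ['check_reads_cache']
```

The reading check passes when it runs second and fails when it runs first, so a developer who sees it fail in CI cannot tell from the failure alone whether their change caused it.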

Fuzzing: Simply having computing resources does not solve testing problems if developers’ test suites are not sufficiently thorough. Automated test generation, or “fuzz testing,” aims to create a diverse suite of inputs that reveal interesting program behavior, including crashes and security vulnerabilities. I design new techniques to analyze the behavior of programs while they run, extending program analyses like “dynamic taint tracking” to precisely identify interactions between inputs and outputs (Bell and Kaiser, 2014; Hough and Bell, 2021). I have used these new analyses to design new approaches to detect code injection vulnerabilities in JVM software (Hough, Welearegai, Hammer, and Bell, 2020) and to generate better tests to reveal otherwise untested program behavior (Kukucka, Pina, Ammann, and Bell, 2022). Fuzz testing relies on evolutionary algorithms, and it can be difficult to understand why a fuzzer does or doesn’t execute some code path. My ongoing research studies the scientific foundations for fuzz testing, creating new methods for evaluating and improving fuzzers’ designs.
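The idea behind dynamic taint tracking can be sketched in a few lines: values carry labels naming where they came from, operations propagate those labels to their results, and a sensitive sink checks for them. The sketch below is a deliberately simplified assumption, with propagation done by hand through one helper; the `Tainted` type, `concat`, and `run_query` are hypothetical names, and it does not reflect how the systems cited above are implemented.

```python
class Tainted(str):
    """A string that carries a set of taint labels (hypothetical toy type)."""

    def __new__(cls, value, labels=()):
        obj = super().__new__(cls, value)
        obj.labels = frozenset(labels)
        return obj

def labels_of(value):
    """Return the taint labels on a value; plain strings have none."""
    return getattr(value, "labels", frozenset())

def concat(a, b):
    """Concatenate two strings, propagating the union of their taint labels."""
    return Tainted(str(a) + str(b), labels_of(a) | labels_of(b))

def run_query(sql):
    """Hypothetical sink: refuse any query built from labeled (untrusted) input."""
    if labels_of(sql):
        raise ValueError(f"tainted data reached sink: {sorted(labels_of(sql))}")
    return "ok"

# Input from an assumed untrusted source flows into a query string.
user_input = Tainted("1 OR 1=1", labels={"http_param"})
query = concat("SELECT * FROM users WHERE id = ", user_input)

try:
    run_query(query)
except ValueError as err:
    print(err)
```

Because the label survives concatenation, the sink can tell that part of the query originated from the untrusted parameter, which is the kind of input-to-output relationship an injection detector needs.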

Software Supply Chain: Modern software built in languages like Java, Python and TypeScript increasingly relies on open-source libraries. Developers who utilize third-party libraries must ensure that they are kept up-to-date and, ideally, that those libraries have no security vulnerabilities. Developers who maintain those libraries, in turn, must ensure that any new releases that break backwards compatibility are clearly documented. While developers inside large companies rely on CI processes to automatically test updates for compatibility and to examine updates for security vulnerabilities, the resources needed to create such an infrastructure for the open-source community are tremendous. My ultimate objective is to create an open-source observatory, analyzing new libraries and updates to existing libraries, detecting breaking changes and security vulnerabilities even before developers publish a release. My first project in this area focused on formalizing the semantics of dependency resolvers, creating a replacement for NPM that allows developers to reduce bloat and vulnerabilities by optimizing the versions of each dependency (Pinckney, Cassano, Guha, Bell, Culo, and Gamblin, 2023). Towards understanding the propagation of breaking changes, vulnerabilities, and fixes, we built a massive dataset consisting of all 28,941,927 versions of all 2,663,681 packages on NPM (Pinckney, Cassano, Guha, and Bell, 2023). This 20TB dataset is “live” and is constantly updated as new packages are published on NPM, and is a significant contribution in itself. My ongoing research studies automated approaches for detecting updates that introduce vulnerabilities and/or breaking changes.
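To make "optimizing the versions of each dependency" concrete, here is a minimal sketch that treats resolution as choosing one version per package subject to constraints while minimizing a cost. It substitutes exhaustive search for a real solver, and the registry, advisory data, and cost function are all invented assumptions for illustration. As a further simplification, every package in the toy registry is always installed.

```python
from itertools import product

# Hypothetical toy registry: package -> {version: {dependency: allowed versions}}
REGISTRY = {
    "app": {1: {"lib": {1, 2}, "util": {1, 2}}},
    "lib": {1: {"util": {1}}, 2: {"util": {2}}},
    "util": {1: {}, 2: {}},
}
VULNERABLE = {("util", 1)}  # assumed advisory data, not real

def is_consistent(choice):
    """True if every chosen version's dependency constraints are satisfied."""
    for pkg, ver in choice.items():
        for dep, allowed in REGISTRY[pkg][ver].items():
            if choice[dep] not in allowed:
                return False
    return True

def cost(choice):
    """Prefer fewer vulnerable versions first, then newer versions."""
    vulns = sum((p, v) in VULNERABLE for p, v in choice.items())
    staleness = sum(max(REGISTRY[p]) - v for p, v in choice.items())
    return (vulns, staleness)

def resolve():
    """Enumerate every combination of versions and return the cheapest valid one."""
    pkgs = sorted(REGISTRY)
    versions = [sorted(REGISTRY[p]) for p in pkgs]
    candidates = (dict(zip(pkgs, combo)) for combo in product(*versions))
    return min((c for c in candidates if is_consistent(c)), key=cost)

solution = resolve()
print(solution)  # {'app': 1, 'lib': 2, 'util': 2}
```

Only two combinations satisfy the constraints here, and the one that avoids the vulnerable `util` version also happens to be the newest, so it wins on both parts of the cost.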

References

  1. Flexible and Optimal Dependency Management via Max-SMT. Donald Pinckney, Federico Cassano, Arjun Guha, Jonathan Bell, Massimiliano Culo, Todd Gamblin. Proceedings of the 2023 International Conference on Software Engineering, 2023.
  2. A Large Scale Analysis of Semantic Versioning in NPM. Donald Pinckney, Federico Cassano, Arjun Guha, Jonathan Bell. Proceedings of the 20th International Conference on Mining Software Repositories, 2023.
  3. CONFETTI: Amplifying Concolic Guidance for Fuzzers. James Kukucka, Luis Pina, Paul Ammann, Jonathan Bell. Proceedings of the 2022 International Conference on Software Engineering, 2022.
  4. A Practical Approach for Dynamic Taint Tracking with Control-flow Relationships. Katherine Hough, Jonathan Bell. ACM Transactions on Software Engineering and Methodology, 2021; 31(2).
  5. FlakeFlagger: Predicting Flakiness Without Rerunning Tests. Abdulrahman Alshammari, Christopher Morris, Michael Hilton, Jonathan Bell. Proceedings of the 2021 International Conference on Software Engineering, 2021.
  6. A Large-Scale Longitudinal Study of Flaky Tests. Wing Lam, Stefan Winter, Anjiang Wei, Tao Xie, Darko Marinov, Jonathan Bell. Proceedings of the ACM on Programming Languages, 2020; 3(OOPSLA).
  7. Revealing Injection Vulnerabilities by Leveraging Existing Tests. Katherine Hough, Gere Welearegai, Christian Hammer, Jonathan Bell. Proceedings of the 2020 International Conference on Software Engineering, 2020.
  8. DeFlaker: Automatically Detecting Flaky Tests. Jonathan Bell, Owolabi Legunsen, Michael Hilton, Lamyaa Eloussi, Tifany Yung, Darko Marinov. Proceedings of the 2018 International Conference on Software Engineering, 2018.
  9. Phosphor: Illuminating Dynamic Data Flow in Off-The Shelf JVMs. Jonathan Bell, Gail Kaiser. Proceedings of the 29th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, 2014.

A complete list of my publications is also available.

I teach undergraduate and graduate Software Engineering and Systems classes, and make all of my teaching materials publicly available in the hope that they may be useful to others.


I contribute my time to service committees, and when I see an opportunity to make a substantive change that improves the community, I volunteer for leadership positions. I serve on program committees for top conferences in Software Engineering (e.g. ICSE, ASE, FSE, ISSTA) and support the organization of these and other conferences by co-chairing committees. I review for top journals in my field (e.g. ACM’s TOSEM and IEEE’s TSE) and national funding agencies (NSF and DOE). Within my college, I co-chair the PhD admissions committee and lead the CAREER Club (a structured mentoring program for new faculty).

Service Activities in 2023:

Conference/Professional Organization Leadership
  • ISSTA Tools and Demos Track Co-Chair
  • Workshop on Software Engineering Education for the Next Generation at ICSE Workshop Co-Organizer
Conference Technical Program Committee Membership
  • Automated Software Engineering (ASE)
  • Foundations of Software Engineering (ESEC/FSE)
  • International Conference on Program Comprehension (ICPC)
  • International Symposium on Software Testing and Analysis (ISSTA)
Journal Reviewing
  • Empirical Software Engineering
  • ACM Transactions on Software Engineering and Methodology
  • IEEE Transactions on Software Engineering
Other Program Committee Activities
  • ICSE New Ideas and Emerging Results
  • FSE Software Defect Datasets Workshop
  • ISSTA Tool Demonstrations Track
  • ICSE Workshops Program Committee
Funding and Award Committee Activities
  • CRA-E Undergraduate Research Award Committee
  • National Science Foundation Grant Review Panel
Northeastern University, Khoury College Committee Leadership
  • PhD CS Admissions Co-Chair
  • CAREER Club Faculty Lead
Northeastern University, Committee Membership
  • Cadre of University Marshals

Planned Service Commitments for 2024:

Conference/Professional Organization Leadership
  • ICSE Student Mentoring Workshop Co-Chair
Conference Technical Program Committee Membership
  • Automated Software Engineering (ASE)
  • International Conference on Program Comprehension (ICPC)
  • International Conference on Mining Software Repositories (MSR)
  • International Conference on Software Engineering (ICSE)
  • International Symposium on Software Testing and Analysis (ISSTA)
  • Object-oriented Programming, Systems, Languages, and Applications (OOPSLA)
Journal Reviewing
  • Empirical Software Engineering
Other Program Committee Activities
  • ICSE Doctoral Symposium Committee
  • ICSE Test Flakiness Workshop
  • ISSTA Tool Demonstrations Track
Northeastern University, Khoury College Committee Leadership
  • PhD CS Admissions Co-Chair
  • CAREER Club Faculty Lead
Northeastern University, Committee Membership
  • Cadre of University Marshals

All Service Activities:

Conference/Professional Organization Leadership
  • ACM SIGSOFT Open Science Initiative Co-Chair, 2019
  • PLDI Artifact PC Co-Chair, 2020
  • ICSE Student Mentoring Workshop Co-Chair, 2022, 2024
  • SPLASH Student Mentoring Workshop Co-Chair, 2017, 2018, 2019, 2020
  • SPLASH Student Volunteer Co-Chair, 2013, 2014, 2015
  • SPLASH Posters Co-Chair, 2017
  • ISSTA Publicity Co-Chair, 2020
  • SPLASH Publicity Co-Chair, 2018
  • ISSTA Tools and Demos Track Co-Chair, 2023
  • ISSTA Virtualization Chair, 2020
  • ICSE Virtualization Technology Co-Chair, 2020
  • Workshop on Software Engineering Education for the Next Generation at ICSE Workshop Co-Organizer, 2023
  • Workshop on Designing and Running Project-Based Courses in Software Engineering Education at ICSE Workshop Co-Organizer, 2022
  • Workshop on Games and Software Engineering at ICSE Workshop Co-Organizer, 2012
Conference Technical Program Committee Membership
  • Automated Software Engineering (ASE), 2018, 2019, 2020, 2021, 2022, 2023, 2024
  • Foundations of Software Engineering (ESEC/FSE), 2022, 2023
  • IEEE Secure Development Conference, 2021
  • International Conference on Program Comprehension (ICPC), 2023, 2024
  • International Conference on Mining Software Repositories (MSR), 2020, 2024
  • International Conference on Software Engineering (ICSE), 2019, 2020, 2021, 2022, 2024, 2025
  • International Conference on Software Testing (ICST), 2019
  • International Symposium on Software Testing and Analysis (ISSTA), 2021, 2022, 2023, 2024
  • International Symposium on Software Reliability Engineering (ISSRE), 2022
  • Object-oriented Programming, Systems, Languages, and Applications (OOPSLA), 2024
Journal Reviewing
  • Automated Software Engineering, 2021
  • Empirical Software Engineering, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024
  • Journal of Systems and Software, 2016, 2017, 2018, 2019, 2020, 2021, 2022
  • IEEE Software, 2017
  • ACM Transactions on Software Engineering and Methodology, 2020, 2021, 2022, 2023
  • IEEE Transactions on Software Engineering, 2020, 2021, 2022, 2023
  • IEEE Transactions on Reliability, 2020
Other Program Committee Activities
  • ICSME Artifact Evaluation Committee, 2017
  • ISSTA Artifact Evaluation Committee, 2015
  • OOPSLA Artifact Evaluation Committee, 2015, 2016
  • ISSTA Doctoral Symposium Committee, 2022
  • ICSE Doctoral Symposium Committee, 2024
  • ICSE Test Flakiness Workshop, 2024
  • ICSME Late Breaking Ideas Track, 2019
  • MSR Mining Challenge Track, 2017, 2018, 2020
  • ICSE New Ideas and Emerging Results, 2023
  • FSE Release Engineering Workshop, 2016
  • FSE Software Defect Datasets Workshop, 2023
  • ICST Testing Tools Track, 2016, 2020
  • ISSTA Tool Demonstrations Track, 2021, 2022, 2023, 2024
  • ICSE Workshops Program Committee, 2023
Funding and Award Committee Activities
  • CRA-E Undergraduate Research Award Committee, 2021, 2022, 2023
  • Department of Energy Funding Panel, 2021
  • ICSE Student Research Competition Judge, 2019, 2020
  • National Science Foundation Grant Review Panel, 2017, 2018, 2019, 2020, 2022, 2023
Northeastern University, Khoury College Committee Leadership
  • Code4Community Student Group Faculty Advisor, 2021, 2022
  • PhD CS Admissions Co-Chair, 2023, 2024
  • CAREER Club Faculty Lead, 2023, 2024
Northeastern University, Committee Membership
  • Cadre of University Marshals, 2022, 2023, 2024
  • PhD CS Admissions, 2021, 2022
  • Faculty Engagement and Mentoring, 2021, 2022
George Mason University, CS Department Committee Membership
  • Computing, 2016, 2017, 2018, 2019, 2020
  • PhD Recruitment and Evaluation, 2017, 2018, 2019, 2020
  • MS in Software Engineering Admissions, 2016, 2017, 2018, 2019, 2020
  • Student Run Computing & Technology Faculty Advisor, 2018, 2019, 2020
  • Tenure Track Faculty Recruitment, 2018, 2019, 2020
  • Teaching Track Faculty Recruitment, 2017, 2018, 2019
Columbia University, CS Department Committee Leadership
  • Social Committee Chair, 2011, 2012, 2013, 2014, 2015, 2016

Email: [email protected]

Mastodon: [email protected]

GitHub: jon-bell