Introduction · What is BugSwarm?


High Level View

BugSwarm is a toolset for creating a scalable, diverse, realistic, and continuously growing set of reproducible build failures and fixes from real-world open-source systems.

The toolset has two major components: Mining and Reproducing.

Mining

We mine builds from GitHub projects that use the continuous integration service Travis-CI. Specifically, we mine fail-pass build pairs: the first build in the pair fails, and the second, which is the next build chronologically in the Git history of the same branch, passes.
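The pairing rule can be illustrated with a minimal sketch. This is not BugSwarm's actual miner: the `Build` record, its field names, and the `fail_pass_pairs` function are hypothetical, and the sketch assumes each build already carries its branch, its position in that branch's history, and a final status.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Build:
    # Hypothetical record; the field names are assumptions for illustration.
    build_id: int
    branch: str
    order: int   # chronological position within the branch's history
    status: str  # "passed" or "failed"


def fail_pass_pairs(builds: Iterable[Build]) -> List[Tuple[Build, Build]]:
    """Return (failed, passed) pairs of consecutive builds on the same branch."""
    pairs = []
    # Sort so that builds of one branch are adjacent and in chronological order.
    ordered = sorted(builds, key=lambda b: (b.branch, b.order))
    for _, group in groupby(ordered, key=lambda b: b.branch):
        history = list(group)
        # Walk each build together with its immediate successor on the branch.
        for current, following in zip(history, history[1:]):
            if current.status == "failed" and following.status == "passed":
                pairs.append((current, following))
    return pairs
```

Note that in this sketch a failed build followed by another failed build does not form a pair; only the last failure before a passing build is paired with it.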

Reproducing

We obtain the original build environment that Travis-CI used, which is a Docker image, and generate scripts that build the project and run its regression tests for each build. We then compare the reproduced log, a transcript of everything that happens at the command line during building and testing, against the historical build log from Travis-CI. We repeat the reproduction five times to check reproducibility and to detect flakiness.
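One way to picture the repeated reproduction step is the sketch below. It is an illustration under stated assumptions, not BugSwarm's implementation: `run_and_compare` is a hypothetical callable that performs one reproduction and reports whether the reproduced log matched the historical log, and the three labels are assumed names for the possible outcomes.

```python
from typing import Callable


def classify_reproducibility(run_and_compare: Callable[[], bool],
                             attempts: int = 5) -> str:
    """Run the reproduction several times and summarize how often it matched.

    `run_and_compare` is a hypothetical hook: it performs one reproduction
    and returns True when the reproduced log matches the historical log.
    """
    matches = sum(1 for _ in range(attempts) if run_and_compare())
    if matches == attempts:
        return "reproducible"      # every attempt matched the original log
    if matches == 0:
        return "unreproducible"    # no attempt matched
    return "flaky"                 # results differed between attempts
```

The default of five attempts follows the text above; the label names and the all-or-nothing thresholds are assumptions made for this example.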

The toolset has already created a dataset of over 3,000 artifacts.

