BugSwarm REST API
Database schema for artifact metadata
Each artifact is associated with the following attributes. The attributes are described in more detail in the tables that follow.
Database Schema
'added_version': string # e.g. '1.1.2'
'base_branch': string # e.g. 'master'
'branch': string # e.g. 'my-new-feature'
'build_system': string # e.g. 'Maven'
'cached': boolean # e.g. True or False
'ci_service': 'travis' | 'github'
'classification': {
    'build': 'Yes' | 'No' | 'Partial' # e.g. 'No'
    'code': 'Yes' | 'No' | 'Partial' # e.g. 'Yes'
    'test': 'Yes' | 'No' | 'Partial' # e.g. 'Partial'
    'exceptions': string[] # e.g. ['NullPointerException', ...]
}
'creation_time': integer
'current_image_tag': string
'deprecated_version': string # e.g. '1.2.0'
'failed_job': {
    'base_sha': string # e.g. '1234abc'
    'build_id': integer # e.g. 12345678
    'build_job': string # e.g. '140.1'
    'committed_at': timestamp # e.g. '2015-08-10T14:26:08Z'
    'config': dict
    'failed_tests': string # e.g. 'testHelloWorld#testPrintLn'
    'job_id': integer # e.g. 12345679
    'message': string # e.g. '- Updated to 4.4.0\n- Added pulse icon support'
    'mismatch_attrs': string[] # e.g. ['num_tests_run', 'num_tests_failed', ...]
    'num_tests_failed': integer # e.g. 3
    'num_tests_run': integer # e.g. 16
    'patches': dict
    'trigger_sha': string # e.g. '1234xyz'
}
'filtered_reason': string # e.g. 'no head sha'
'image_tag': string # e.g. '74924751'
'is_error_pass': boolean # e.g. True or False
'lang': string # e.g. 'Java'
'match': integer # e.g. 2
'merged_at': timestamp # e.g. '2015-08-18T12:30:27Z'
'metrics': {
    'additions': integer
    'changes': integer
    'deletions': integer
    'num_of_changed_files': integer
}
'passed_job': {
    'base_sha': string # e.g. '5678def'
    'build_id': integer # e.g. 98765432
    'build_job': string # e.g. '141.1'
    'committed_at': timestamp # e.g. '2015-08-10T16:21:24Z'
    'config': dict
    'failed_tests': string # e.g. ''
    'job_id': integer # e.g. 74943870
    'message': string # e.g. 'Replaced tab to white space.'
    'mismatch_attrs': string[] # e.g. ['tr_log_status', ...]
    'num_tests_failed': integer # e.g. 0
    'num_tests_run': integer # e.g. 16
    'patches': dict
    'trigger_sha': string # e.g. '7890uvw'
}
'pr_num': integer # e.g. 379
'repo_mined_version': string
'repo': string # e.g. 'gwtbootstrap3/gwtbootstrap3'
'reproduce_attempts': integer # e.g. 5
'reproduce_successes': integer # e.g. 5
'reproduced': boolean # e.g. True or False
'reproducibility_status': {
    'status': 'Reproducible' | 'Flaky' | 'Unreproducible'
    'time_stamp': timestamp
}
'stability': string # e.g. '5/5'
'status': 'active' | 'candidate' | 'deprecated'
'test_framework': string # e.g. 'JUnit'
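
As an illustration of how the nested schema is typically read, the following minimal Python sketch filters artifact records by status, language, and reproducibility. The function name `is_usable` and the `sample` record are hypothetical; the sample reuses example values from the schema above and is not a real artifact. Only a subset of the attributes is shown.

```python
def is_usable(artifact: dict, lang: str) -> bool:
    """Return True for active, reproducible artifacts in the given language.

    Hypothetical helper: it only reads attributes named in the schema above.
    """
    reproducibility = artifact.get("reproducibility_status", {})
    return (
        artifact.get("status") == "active"
        and artifact.get("lang") == lang
        and reproducibility.get("status") == "Reproducible"
    )


# Illustrative record built from the schema's example values (not real data).
sample = {
    "repo": "gwtbootstrap3/gwtbootstrap3",
    "lang": "Java",
    "status": "active",
    "reproducibility_status": {
        "status": "Reproducible",
        "time_stamp": "2015-08-18T12:30:27Z",
    },
    "failed_job": {"job_id": 12345679, "num_tests_failed": 3},
    "passed_job": {"job_id": 74943870, "num_tests_failed": 0},
}

print(is_usable(sample, "Java"))
```

Using `.get` with defaults keeps the check tolerant of records in which an attribute is missing, which is an assumption about the data rather than something the schema guarantees.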
Attribute Descriptions
The following is a list of the attributes included in the artifact metadata.
Note that the timestamp type refers to a timestamp in the ISO 8601 format (<yyyy>-<mm>-<dd>T<hh>:<mm>:<ss>Z).
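
A value of the timestamp type can be parsed with Python's standard library. The sketch below assumes every value follows the format above exactly, with a trailing Z meaning UTC; the helper name `parse_timestamp` is hypothetical, and the two inputs are the example values shown in the schema.

```python
from datetime import datetime, timezone

# Format string matching <yyyy>-<mm>-<dd>T<hh>:<mm>:<ss>Z
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp string into a timezone-aware UTC datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


# Example values taken from the schema ('merged_at' and passed_job 'committed_at').
merged_at = parse_timestamp("2015-08-18T12:30:27Z")
committed_at = parse_timestamp("2015-08-10T16:21:24Z")

# Whole days between the commit and the merge.
print((merged_at - committed_at).days)
```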
Attribute | Type | Description |
---|---|---|
added_version | string | The version of the dataset in which the artifact was officially added. Null if status is candidate. |
base_branch | string | The branch into which pull request changes are merged. Only valid for pairs from pull requests. |
branch | string | The branch from which pull request changes are merged. Only valid for pairs from pull requests. |
build_system | string | The build system (e.g. Maven) used by the artifact. 'NA' if no build system is used (e.g. for Python artifacts). |
cached | bool | Whether the artifact has been cached. If true, the artifact is present in the bugswarm/cached-images Docker repository. |
ci_service | 'travis', 'github' | The CI service the artifact was mined from. Either travis or github. |
classification.build | 'Yes', 'No', 'Partial' | The patch classification for build-related files. |
classification.code | 'Yes', 'No', 'Partial' | The patch classification for code-related files. |
classification.test | 'Yes', 'No', 'Partial' | The patch classification for test-related files. |
classification.exceptions | string[] | The list of exceptions thrown during the failed job. |
creation_time | integer | The Unix timestamp at which this artifact was created. Note that the API also returns a _created field, which holds the same data in the timestamp format. |
current_image_tag | string | The same as image_tag. |
deprecated_version | string | The version of the dataset in which this artifact was deprecated, or null if the artifact has not been deprecated. |
failed_job | dict | Information relating to the failed job. See failed_job and passed_job. |
filtered_reason | string | If the pair was marked as not suitable for reproducing by PairFilter, this attribute contains a human-readable reason for PairFilter's decision. |
image_tag | string | The tag identifying the Docker image associated with this artifact. |
is_error_pass | bool | Whether the artifact contains an error-pass pair (rather than a fail-pass pair). |
lang | string | The language of the build, as indicated by the project's .travis.yml file or by the repo's language as classified by GitHub. |
match | integer | The match type for the pair. Only valid if reproduced is true. Otherwise, the default value is the empty string ''. |
merged_at | timestamp | The time when the pull request associated with the pair was merged. Only valid for pairs from pull requests. |
metrics.additions | integer | The number of lines added to the code between the failed and passed jobs. |
metrics.changes | integer | The number of lines changed (additions + deletions) between the failed and passed jobs. |
metrics.deletions | integer | The number of lines deleted from the code between the failed and passed jobs. |
metrics.num_of_changed_files | integer | The number of files changed between the failed and passed jobs. |
passed_job | dict | Information relating to the passed job. See failed_job and passed_job. |
pr_num | integer | The number uniquely identifying the pull request within this project. Only valid for pairs from pull requests. The default value is -1 for pairs that are not from pull requests. |
repo | string | The repository slug that identifies a project on GitHub. |
reproduce_attempts | integer | The number of times the Reproducer ran. |
reproduce_successes | integer | The number of times the job completed as expected. |
reproduced | bool | Whether the Reproducer attempted to build the pair. This attribute is false if the pair was marked as not suitable for reproducing by PairFilter. |
reproducibility_status.status | 'Reproducible', 'Flaky', 'Unreproducible' | The artifact's reproducibility: Unreproducible, Flaky, or Reproducible. |
reproducibility_status.time_stamp | timestamp | The time at which reproducibility_status.status was last calculated. |
stability | string | The proportion of times the job completed as expected. The format is reproduce_successes/reproduce_attempts. |
status | 'active', 'candidate', 'deprecated' | The artifact's status in the dataset. One of active (an official artifact), candidate (not officially added to the dataset), or deprecated (removed from the dataset). |
test_framework | string | The test framework for both jobs. Empty string if the Analyzer failed to find the framework. |
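
Several attributes in the table are defined in terms of others: stability is reproduce_successes/reproduce_attempts, metrics.changes is additions + deletions, and creation_time is a Unix timestamp that corresponds to the _created field. The following Python sketch checks these relationships on a single record. The function names and the `sample` record are hypothetical, the metric values are made up for illustration, and the sketch assumes all of the attributes it reads are present.

```python
from datetime import datetime, timezone


def find_inconsistencies(artifact: dict) -> list:
    """Return a list of human-readable problems found in one artifact record."""
    problems = []

    # stability should read '<reproduce_successes>/<reproduce_attempts>'.
    expected = f"{artifact['reproduce_successes']}/{artifact['reproduce_attempts']}"
    if artifact["stability"] != expected:
        problems.append(f"stability is {artifact['stability']!r}, expected {expected!r}")

    # metrics.changes should equal additions + deletions.
    metrics = artifact["metrics"]
    if metrics["changes"] != metrics["additions"] + metrics["deletions"]:
        problems.append("metrics.changes does not equal additions + deletions")

    return problems


def creation_time_to_timestamp(creation_time: int) -> str:
    """Render a Unix timestamp in the ISO 8601 format used by the timestamp type."""
    moment = datetime.fromtimestamp(creation_time, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


# Illustrative record (values are assumptions, not real data).
sample = {
    "reproduce_attempts": 5,
    "reproduce_successes": 5,
    "stability": "5/5",
    "metrics": {"additions": 12, "deletions": 4, "changes": 16, "num_of_changed_files": 2},
    "creation_time": 1439216768,
}

print(find_inconsistencies(sample))
print(creation_time_to_timestamp(sample["creation_time"]))
```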
failed_job and passed_job
The following attributes are contained in the failed_job and passed_job attributes.
Attribute | Type | Description |
---|---|---|
base_sha | string | (PR jobs only) The SHA of the commit that was merged with trigger_sha to create the Travis virtual commit used for the Travis build. |
build_id | integer | The number uniquely identifying the Travis build/GitHub Actions workflow run. |
build_job | string | The dot-separated pair of numbers uniquely identifying the job within this project. |
committed_at | timestamp | The timestamp associated with base_sha. |
config | dict | Job-specific configuration. In Travis artifacts, this is the Travis job config. In GitHub Actions artifacts, this is the contents of the jobs section in the workflow file corresponding to the job that was run. |
failed_tests | string | A list of the tests that failed during this job, separated by the # symbol. |
job_id | integer | The number uniquely identifying this job on Travis/GitHub Actions. |
message | string | The commit message associated with trigger_sha. |
mismatch_attrs | string[] | The attributes, if any, that did not match when extracted from the original build log and the reproduced build log. |
num_tests_failed | integer | The number of tests that failed during the job. |
num_tests_run | integer | The number of tests that ran during the job. |
patches | dict | A log of the patches applied to this artifact to keep it reproducible. Each key is the name of the patch, and each value is the date the patch was applied. |
trigger_sha | string | The SHA of the commit that, after being pushed to GitHub, triggered the Travis build or GitHub Actions run. |
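
Two of the job attributes are strings that encode structured values: failed_tests is a #-separated list, and build_job is a dot-separated pair of numbers. A minimal Python sketch for unpacking them follows. The helper names are hypothetical, and the inputs are the example values from the schema; the sketch does not assume any meaning for the two numbers in build_job beyond what the table states.

```python
def split_failed_tests(failed_tests: str) -> list:
    """Split the '#'-separated failed_tests string into individual entries.

    An empty string (no failed tests) yields an empty list.
    """
    return [entry for entry in failed_tests.split("#") if entry]


def parse_build_job(build_job: str) -> tuple:
    """Split a build_job string such as '140.1' into its two integer parts."""
    first, second = build_job.split(".")
    return int(first), int(second)


print(split_failed_tests("testHelloWorld#testPrintLn"))
print(split_failed_tests(""))
print(parse_build_job("140.1"))
```

Filtering out empty entries means a passed job, whose failed_tests is '', maps to an empty list rather than a list containing one empty string.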