Database schema for artifact metadata

Each artifact is associated with the following attributes. The attributes are described in more detail in the tables that follow.

Database Schema

'added_version':            string                      # e.g. '1.1.2'
'base_branch':              string                      # e.g. 'master'
'branch':                   string                      # e.g. 'my-new-feature'
'build_system':             string                      # e.g. 'Maven'
'cached':                   boolean                     # e.g. True or False
'ci_service':               'travis' | 'github'
'classification': {
    'build':                'Yes' | 'No' | 'Partial'    # e.g. 'No'
    'code':                 'Yes' | 'No' | 'Partial'    # e.g. 'Yes'
    'test':                 'Yes' | 'No' | 'Partial'    # e.g. 'Partial'
    'exceptions':           string[]                    # e.g. ['NullPointerException', ...]
}
'creation_time':            integer
'current_image_tag':        string
'deprecated_version':       string                      # e.g. '1.2.0'
'failed_job': {
    'base_sha':             string                      # e.g. '1234abc'
    'build_id':             integer                     # e.g. 12345678
    'build_job':            string                      # e.g. '140.1'
    'committed_at':         timestamp                   # e.g. '2015-08-10T14:26:08Z'
    'config':               dict
    'failed_tests':         string                      # e.g. 'testHelloWorld#testPrintLn'
    'job_id':               integer                     # e.g. 12345679
    'message':              string                      # e.g. '- Updated to 4.4.0\n- Added pulse icon support'
    'mismatch_attrs':       string[]                    # e.g. ['num_tests_run', 'num_tests_failed', ...]
    'num_tests_failed':     integer                     # e.g. 3
    'num_tests_run':        integer                     # e.g. 16
    'patches':              dict
    'trigger_sha':          string                      # e.g. '1234xyz'
}
'filtered_reason':          string                      # e.g. 'no head sha'
'image_tag':                string                      # e.g. '74924751'
'is_error_pass':            boolean                     # e.g. True or False
'lang':                     string                      # e.g. 'Java'
'match':                    integer                     # e.g. 2
'merged_at':                timestamp                   # e.g. '2015-08-18T12:30:27Z'
'metrics': {
    'additions':            integer
    'changes':              integer
    'deletions':            integer
    'num_of_changed_files': integer 
}
'passed_job': {
    'base_sha':             string                      # e.g. '5678def'
    'build_id':             integer                     # e.g. 98765432
    'build_job':            string                      # e.g. '141.1'
    'committed_at':         timestamp                   # e.g. '2015-08-10T16:21:24Z'
    'config':               dict
    'failed_tests':         string                      # e.g. ''
    'job_id':               integer                     # e.g. 74943870
    'message':              string                      # e.g. 'Replaced tab to white space.'
    'mismatch_attrs':       string[]                    # e.g. ['tr_log_status', ...]
    'num_tests_failed':     integer                     # e.g. 0
    'num_tests_run':        integer                     # e.g. 16
    'patches':              dict
    'trigger_sha':          string                      # e.g. '7890uvw'
}
'pr_num':                   integer                     # e.g. 379
'repo_mined_version':       string
'repo':                     string                      # e.g. 'gwtbootstrap3/gwtbootstrap3'
'reproduce_attempts':       integer                     # e.g. 5
'reproduce_successes':      integer                     # e.g. 5
'reproduced':               boolean                     # e.g. True or False
'reproducibility_status': {
    'status':               'Reproducible' | 'Flaky' | 'Unreproducible'
    'time_stamp':           timestamp
}
'stability':                string                      # e.g. '5/5'
'status':                   'active' | 'candidate' | 'deprecated'
'test_framework':           string                      # e.g. 'JUnit'
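
As a rough illustration of how a record with this shape might be read, the sketch below assembles a trimmed record from the example values given in the schema comments and pulls out a few nested fields. The record is deliberately partial, and the helper name `summarize_artifact` is an assumption made for illustration, not part of the dataset's API.

```python
from typing import Any, Dict

# Hypothetical, trimmed record built from the "e.g." values in the schema above.
# A real artifact carries every attribute listed in the schema.
artifact: Dict[str, Any] = {
    'repo': 'gwtbootstrap3/gwtbootstrap3',
    'image_tag': '74924751',
    'ci_service': 'travis',
    'lang': 'Java',
    'build_system': 'Maven',
    'classification': {
        'build': 'No',
        'code': 'Yes',
        'test': 'Partial',
        'exceptions': ['NullPointerException'],
    },
    'failed_job': {'job_id': 12345679, 'num_tests_run': 16, 'num_tests_failed': 3},
    'passed_job': {'job_id': 74943870, 'num_tests_run': 16, 'num_tests_failed': 0},
    'status': 'active',
}


def summarize_artifact(record: Dict[str, Any]) -> str:
    """Return a one-line summary of a record (illustrative helper only)."""
    failed = record['failed_job']
    passed = record['passed_job']
    return (
        f"{record['repo']} [{record['image_tag']}] "
        f"{record['lang']}/{record['build_system']}: "
        f"failed job {failed['job_id']} had {failed['num_tests_failed']} of "
        f"{failed['num_tests_run']} tests fail; "
        f"passed job {passed['job_id']} had {passed['num_tests_failed']} failures"
    )


print(summarize_artifact(artifact))
```

Nested attributes such as classification.code are reached by indexing the enclosing dict first, for example `artifact['classification']['code']`.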

Attribute Descriptions

The following is a list of the attributes included in the artifact metadata. Note that the timestamp type refers to a timestamp in the ISO 8601 format (<yyyy>-<mm>-<dd>T<hh>:<mm>:<ss>Z).

| Attribute | Type | Description |
| --- | --- | --- |
| added_version | string | The version of the dataset the artifact was officially added in. Null if status = candidate. |
| base_branch | string | The branch into which pull request changes are merged. Only valid on pairs from pull requests. |
| branch | string | The branch from which pull request changes are merged. Only valid on pairs from pull requests. |
| build_system | string | The build system (e.g. Maven) used by the artifact. 'NA' if no build system is used (e.g. for Python artifacts). |
| cached | bool | Whether the artifact has been cached. If true, the artifact is present in the bugswarm/cached-images Docker repository. |
| ci_service | 'travis', 'github' | The CI service the artifact was mined from. Either travis or github. |
| classification.build | 'Yes', 'No', 'Partial' | The patch classification for build-related files. |
| classification.code | 'Yes', 'No', 'Partial' | The patch classification for code-related files. |
| classification.test | 'Yes', 'No', 'Partial' | The patch classification for test-related files. |
| classification.exceptions | string[] | The list of exceptions thrown during the failed job. |
| creation_time | integer | The Unix timestamp at which this artifact was created. Note that the API also returns a _created field, which holds the same data in the timestamp format. |
| current_image_tag | string | The same as image_tag. |
| deprecated_version | string | The version of the dataset that this artifact was deprecated in, or null if the artifact has not been deprecated. |
| failed_job | dict | Information relating to the failed job. See failed_job and passed_job. |
| filtered_reason | string | If the pair was marked as not suitable for reproducing by PairFilter, this attribute contains a human-readable reason for PairFilter's decision. |
| image_tag | string | The tag identifying the Docker image associated with this artifact. |
| is_error_pass | bool | Whether the artifact contains an error-pass pair (rather than a fail-pass pair). |
| lang | string | The language of the build, as indicated by a project's .travis.yml file or the repo's language as classified by GitHub. |
| match | integer | The match type for the pair. Only valid if reproduced is true. Otherwise, the default value is the empty string ''. |
| merged_at | timestamp | The time when the pull request associated with the pair was merged. Only valid on pairs from pull requests. |
| metrics.additions | integer | The number of lines added to the code between the failed and passed jobs. |
| metrics.changes | integer | The number of lines changed (additions + deletions) between the failed and passed jobs. |
| metrics.deletions | integer | The number of lines deleted from the code between the failed and passed jobs. |
| metrics.num_of_changed_files | integer | The number of files changed between the failed and passed jobs. |
| passed_job | dict | Information relating to the passed job. See failed_job and passed_job. |
| pr_num | integer | The number uniquely identifying the pull request within this project. Only valid on pairs from pull requests. The default value is -1 if the pair is not from a pull request. |
| repo | string | The repository slug that identifies a project on GitHub. |
| reproduce_attempts | integer | The number of times the Reproducer ran. |
| reproduce_successes | integer | The number of times the job completed as expected. |
| reproduced | bool | Whether the Reproducer attempted to build the pair. This attribute will be false if a pair was marked as not suitable for reproducing by PairFilter. |
| reproducibility_status.status | 'Reproducible', 'Flaky', 'Unreproducible' | The artifact's reproducibility: Unreproducible, Flaky, or Reproducible. |
| reproducibility_status.time_stamp | timestamp | The date at which reproducibility_status.status was last calculated. |
| stability | string | The proportion of times the job completed as expected. The format is reproduce_successes/reproduce_attempts. |
| status | 'active', 'candidate', 'deprecated' | The artifact's status in the dataset. One of active (an official artifact), candidate (not officially added to the dataset), or deprecated (removed from the dataset). |
| test_framework | string | The test framework for both jobs. Empty string if the Analyzer failed to find the framework. |
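
Several of these attributes are encoded as strings or integers that usually need converting before use. The following is a minimal sketch of such conversions, assuming that creation_time is expressed in seconds and that stability always has the form successes/attempts. The function names are hypothetical helpers, not part of any dataset tooling.

```python
from datetime import datetime, timezone
from typing import Tuple

# Layout of the 'timestamp' type: <yyyy>-<mm>-<dd>T<hh>:<mm>:<ss>Z
ISO_FORMAT = '%Y-%m-%dT%H:%M:%SZ'


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp attribute (e.g. merged_at) into an aware UTC datetime."""
    # The trailing 'Z' denotes UTC, so the timezone is attached explicitly.
    return datetime.strptime(value, ISO_FORMAT).replace(tzinfo=timezone.utc)


def parse_creation_time(value: int) -> datetime:
    """Convert creation_time into an aware UTC datetime.

    Assumption: the Unix timestamp is in seconds, not milliseconds.
    """
    return datetime.fromtimestamp(value, tz=timezone.utc)


def parse_stability(value: str) -> Tuple[int, int]:
    """Split a stability string such as '5/5' into (successes, attempts)."""
    successes, attempts = value.split('/')
    return int(successes), int(attempts)


merged = parse_timestamp('2015-08-18T12:30:27Z')
successes, attempts = parse_stability('5/5')
print(merged.isoformat(), successes, attempts)
```

Keeping the parsed datetimes timezone-aware avoids accidental comparisons between UTC values and local naive datetimes.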

failed_job and passed_job

The following attributes are contained in the failed_job and passed_job attributes.

| Attribute | Type | Description |
| --- | --- | --- |
| base_sha | string | (PR jobs only) The SHA of the commit that was merged with trigger_sha to create the Travis virtual commit used for the Travis build. |
| build_id | integer | The number uniquely identifying the Travis build/GitHub Actions workflow run. |
| build_job | string | The dot-separated pair of numbers uniquely identifying the job within this project. |
| committed_at | timestamp | The timestamp associated with base_sha. |
| config | dict | Job-specific configuration. In Travis artifacts, this is the Travis job config. In GitHub Actions artifacts, this is the contents of the jobs section in the workflow file corresponding to the job that was run. |
| failed_tests | string | A list of the tests that failed during this job, separated by the # symbol. |
| job_id | integer | The number uniquely identifying this job on Travis/GitHub Actions. |
| message | string | The commit message associated with trigger_sha. |
| mismatch_attrs | string[] | The attributes, if any, that did not match when extracted from the original build log and the reproduced build log. |
| num_tests_failed | integer | The number of tests that failed during the job. |
| num_tests_run | integer | The number of tests that ran during the job. |
| patches | dict | A log of the patches applied to this artifact to keep it reproducible. Each key is the name of the patch, and each value is the date the patch was applied. |
| trigger_sha | string | The SHA of the commit that, after being pushed to GitHub, triggered the Travis build or GitHub Actions run. |
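
Two of the job attributes pack several values into one string: failed_tests joins test names with the # symbol, and build_job joins two numbers with a dot. A small sketch of unpacking them follows. The helper names are hypothetical, and the code assumes the separators never occur inside an individual value.

```python
from typing import List, Tuple


def split_failed_tests(failed_tests: str) -> List[str]:
    """Split the '#'-separated failed_tests string into individual entries.

    An empty string (no failed tests, as in a passed job) yields an empty list.
    """
    return [name for name in failed_tests.split('#') if name]


def split_build_job(build_job: str) -> Tuple[int, int]:
    """Split a dot-separated build_job such as '140.1' into its two numbers."""
    first, second = build_job.split('.')
    return int(first), int(second)


print(split_failed_tests('testHelloWorld#testPrintLn'))
print(split_build_job('140.1'))
```

Filtering out empty entries means a passed job's failed_tests value of '' maps to an empty list rather than a list containing one empty string.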
