Skip to content
forked from copernico/msr2019

Dataset and sample code for the paper "A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software"

License

Notifications You must be signed in to change notification settings

sariforo/msr2019

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Open-source vulnerability assessment knowledge base License PRs Welcome

The open-source vulnerability assessment knowledge base aggregates public information about security vulnerabilities in open source components. This information is the fuel required to run the vulnerability assessment tool.

For each security vulnerability, the knowledge base comprises the following information:

  • A vulnerability identifier, typically a CVE
  • The URL of the source code repository of the affected open source component, typically one hosted on GitHub
  • One or more commits fixing the respective vulnerability (the so-called fix commit)

Using the patch-analyzer, one of the components of the vulnerability assessment tool, this information is processed and eventually imported into a PostgrSQL database used for the actual analysis of Java and Python applications. Please refer to the tutorial and manual of the vulnerability assessment tool explaining how to perform the import.

MSR 2019 DATA SHOWCASE SUBMISSION

A description of the dataset and its possible applications (on top of fueling the vulerability assessment tool) can be found in the following paper, please cite it if you use the dataset for your research work:

@MISC{ponta2019dataset,
  author={Serena E. Ponta and Henrik Plate and Antonino Sabetta and Michele Bezzi and C\'edric Dangremont},
  url={https://arxiv.org/pdf/1902.02595.pdf},
  title={A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software},
  year={2019},
  month={February},
}  

The Jupyter notebook used to analyze the dataset and to produce the statistics and the plots shown in the paper can be found here.

Motivation

The motivation to open source this dataset is two-fold:

  • First, researchers have access to a manually curated dataset of high quality in order to conduct further research in the areas of software security and software engineering.
  • Second, users of open source components can use this dataset to update and maintain their local instance of the vulnerability assessment tool as to check whether their Java and Python applications are affected by open source vulnerabilities.

Eventually, we hope that this knowledge base will be maintained in a collaborative manner.

Credits

Note that 3rd party information from NVD and MITRE has been used input for compiling this knowledge base. See MITRE's Terms of Use for more information.

Features

See here to learn about features of the vulnerability assessment tool that become possible due to the dataset available in this knowledge base.

Requirements

To process the information of the knowledge base, one has to have a running local instance of the vulnerability assessment tool.

Furthermore, you need the Java 8 JRE in order to run the patch-analyzer that processes commit information, uploads the analysis results to the rest-backend, which eventually stores the data in PostgreSQL database. The following image provides an overview about all the involved components.

Configuration

Not applicable

Limitations

As of today, this knowledge base only contains information about vulnerabilities in Java and Python open source components. Even though the vulnerability assessment tool has been designed with extensibility in mind, other programming languages are not yet supported.

Known Issues

The list of current issues is available here

How to obtain support

Use the following link to Stack Overflow to search for FAQs or to request help.

Bug reports shall be submitted as GitHub issues, please refer to the next section for more details.

Contributing

Until we have defined a structured process to share the maintenance of the knowledge base, we invite you to just create informal pull requests in order to submit new open source vulnerabilities. Such pull requests should contain a vulnerability identifier, the URL of the source code repository of the affected component and one or more identifiers of the commits used to fix the vulnerability.

To-Do (upcoming changes)

Process description and tooling to support the shared maintenance of the knowledge base, and to support the automated synchronization of local instances with this GitHub repository.

License

Copyright (c) 2019 SAP SE or an SAP affiliate company. All rights reserved.

This project is licensed under the Apache Software License, v.2 except as noted otherwise in the LICENSE file.

About

Dataset and sample code for the paper "A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software"

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 64.1%
  • Python 35.9%