This repository contains source code and data of the paper "Understanding and Remediating Open-Source License Incompatibilities in the PyPI Ecosystem" published at 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE).
- Weiwei Xu, Hao He, Kai Gao, Minghui Zhou. Understanding and Remediating Open-Source License Incompatibilities in the PyPI Ecosystem. ASE'2023
📢 A tool based on SILENCE for compliance analysis and incompatibility remediation for Python projects is now available at https://licenserec.com/#/compliance.
In this paper, we first conduct a large-scale empirical study of license incompatibility in PyPI ecosystem. Inspired by our findings, we propose SILENCE, an SMT-solver-based incompatibility remediator for licenses in the dependency graph. Given a release and its dependency graph with one or more license incompatibilities, SILENCE
- finds alternative licenses that are compatible with the dependency graph, and
- searches for alternative graphs with no license incompatibilities and minimal difference with the original graph. The results are aggregated as a report of recommended remediations (i.e., migrations, removals, version pinnings, or license changes) for developers to consider.
licensing_data_collection
: licensing information collectiondep_resolve
: Python dependency tree resolutionknowledge_base
contains license compatibility matrix, migration patterns, license keywords and so on.analysis.py
: data analysis of empirical studyRQ1_license_distribution.ipynb
: results of RQ1RQ1_license_evolution.ipynb
: results of RQ1RQ2_license_incompatibility.ipynb
: results of RQ2RQ3_license_remediation_practice.md
: note of RQ3remediator.py
: implementation of SILENCE's SMT-based partrelicenser.py
: implementation of SILENCE's relicenserres
contains results and evaluation of SILENCE.data
contains our datasetSILENCE.py
contains the main function of SILENSE.
- The dataset is stored in the
package
collection in thelicense
database. You can get the dataset indata
directory (due to the file size limit on GitHub, please download the dataset at here) and you need to import it into MongoDB. Run:
mongorestore --db=license --gzip data/package.bson.gz
- For results of the empirical study, you can run:
RQ1_license_distribution.ipynb
RQ1_license_evolution.ipynb
RQ2_license_incompatibility.ipynb
- To get remediations of all incompatibilities in top 5,000 downloaded packages, you can run:
python remediator.py all
python relicenser.py
- If you want to get remediations in dependency graph for a specific package version, run:
python SILENCE.py -n name -v version
For example, if you want to get remediations for fiftyone 0.18.0 in the paper, you can run:
python SILENCE.py -n fiftyone -v 0.18.0
you will get the output as follows:
Possible Remediations for fiftyone 0.18.0:
1. Change project license to GPL-3.0-only, GPL-3.0-or-later, or AGPL-3.0-only;
2. Or make the following dependency changes:
a) Remove ndjson;
b) Pin voxel51-eta to 0.1.9;
c) Pin pillow to 6.2.2;
d) Pin imageio to 2.9.0;
e) Pin h11 to 0.11.0.
3. Or make the following dependency changes:
a) Remove voxel51-eta;
b) Remove ndjson;
c) Pin h11 to 0.11.0.
The project is licensed under MulanPubL-2.0.
For citing, please use the following BibTex citation:
@inproceedings{SILENCE2023,
title={Understanding and Remediating Open-Source License Incompatibilities in the PyPI Ecosystem},
author={Xu, Weiwei and He, Hao and Gao, Kai and Zhou, Minghui},
booktitle={2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE)},
pages={178--190},
year={2023},
organization={IEEE}
}