Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce standardized license identifiers #3699

Merged
merged 2 commits into from
Dec 17, 2024
Merged

Conversation

Biscgit
Copy link
Member

@Biscgit Biscgit commented Nov 20, 2024

Introduce standardized license identifiers

With this PR we are moving to use standardized licenses (as listed in SPDX license list) for all records. I also added a test and GH Action to ensure that new records will have the license identifiers set correctly before merging.

Once this has been merged, the compatibility implementations in the portal can be removed safely.

Warning

Only merge this AFTER cernopendata/cernopendata-portal#87 has been deployed to production! Otherwise, it might break the current implementation of displaying licenses.

Edit: This has been deployed by now

Important

Since not all licenses could be clearly transfered to which they concretly refer to (for example which BSD-X-Clause), please especially check the changes in the following comments.

Closes #3111

@Biscgit Biscgit linked an issue Nov 20, 2024 that may be closed by this pull request
@@ -115,7 +115,7 @@
"CMS"
],
"license": {
"attribution": "BSD License"
"attribution": "BSD-3-Clause"
Copy link
Member Author

@Biscgit Biscgit Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No further information on which exact BSD license is used. Assuming most commonly used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into the source code repository linked in that record: https://github.com/cms-outreach/ispy-webgl/blob/master/LICENSE I see that the license is actually MIT now, not BSD.

@@ -129,7 +129,7 @@
"CMS"
],
"license": {
"attribution": "GNU General Public License (GPL) version 3"
"attribution": "GPL-3.0-only"
Copy link
Member Author

@Biscgit Biscgit Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No further information on if only the current GPLv3 license should be used, or also apply to changes in upcoming versions. Assuming only the current license should be valid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should be OK. However if we want to be very precise, you can check the license body that exists in the linked repository: https://github.com/cms-opendata-analyses/pattuples2011/blob/master/COPYING

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to leave it with -only. It should be much easier to expand that sometime in the future if needed, than later revert the changes.

@Biscgit Biscgit requested a review from tiborsimko November 20, 2024 13:35
@Biscgit Biscgit changed the title Standardize license ids Introduce standardized license identifiers Nov 20, 2024
@Biscgit Biscgit force-pushed the standardize-license-ids branch from cc4df0e to 996a6f7 Compare November 20, 2024 14:00
@Biscgit Biscgit force-pushed the standardize-license-ids branch 2 times, most recently from fe2aab2 to 4810ff3 Compare December 12, 2024 13:48
@@ -0,0 +1,92 @@
"""Check if license fields are valid in all records."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: ‏Can you please introduce leading shebang?

#!/usr/bin/env python

And make this script executable by itself?

chmod u+x scripts/check_licenses.py

This will be good for consistency with the other helper scripts.

import pathlib
import time

VALID_LICENSE_IDENTIFIERS = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: ‏I appreciate that you included the checker tool together with fixture changes 👍

run-tests.sh Outdated
@@ -99,13 +99,18 @@ check_isort () {
isort -rc -c -df --profile black -- **/*.py
}

check_licenses () {
python3 scripts/check_licenses.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick (non-blocking): ‏After chmod-ding the check license script, see the other comment, you can remove reading python3 here.


if attr not in VALID_LICENSE_IDENTIFIERS:
recid = record.get("recid", "UNSET")
message = f"Invalid license `{attr}` identifier in file {path.name} with recid {recid}! "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: ‏The order "invalid license FOO identifier" could be inverted for readability: "invalid license identifier FOO".

todo: Instead of "with recid RECID", we should say "for recid RECID".

logging.error(
f"Validation completed with {errors} errors!\n"
f"\tPlease ensure the licenses are one of the following: {VALID_LICENSE_IDENTIFIERS}.\n"
f"\tIn case it is a valid license, please contact the Open Data Team."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: ‏The last phrase may not be clear to users. Perhaps say: "If you use a valid SPDX license string that is not in the above list, please contact [email protected]".

@@ -115,7 +115,7 @@
"CMS"
],
"license": {
"attribution": "BSD License"
"attribution": "BSD-3-Clause"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into the source code repository linked in that record: https://github.com/cms-outreach/ispy-webgl/blob/master/LICENSE I see that the license is actually MIT now, not BSD.

@@ -129,7 +129,7 @@
"CMS"
],
"license": {
"attribution": "GNU General Public License (GPL) version 3"
"attribution": "GPL-3.0-only"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should be OK. However if we want to be very precise, you can check the license body that exists in the linked repository: https://github.com/cms-opendata-analyses/pattuples2011/blob/master/COPYING

@tiborsimko
Copy link
Member

todo: ‏I'd also have some commit log cosmetics comments. The current status looks like:

  • Licenses: Rebased on main branch and adjusted licenses for new delphi datasets
  • checks: improved final log messages
  • license: changed licenses current ids to standardized format
  • checks: fixed missing error exit
  • checks: created new action for running the license check
  • checks: applied linter changes
  • checks: created check for upcoming license changes

It is not necessary to have all these commits separate, since chances are we won't start any branch from those "intermediate" commits that are not really necessary and that may only pollute git commit history. (Which could be counterproductive for later bisecting). This is just cosmetics for the pull request at hand, but may be important for big feature changes for the application. It's good to use the same principles here, for consistency.

You may want to squash everything into basically two different commits:

  • ci(check-licenses): add licence value checker
  • feat(records): switch to SPDX license attribution values

Note also the usage of Conventional Commits style for the commit messages. It's a nice wide-spread standard, we use it for REANA or for cernopendata-client, and I started to use it lately for the open data content repository as well. It make things tidier when one inspects the commit headers of a repository.

@tiborsimko
Copy link
Member

todo: ‏Also please add yourself to the AUTHORS.rst file so that your contributions are nicely and visibly acknowledged 😄

@Biscgit Biscgit force-pushed the standardize-license-ids branch 2 times, most recently from 36c9766 to 7d1843b Compare December 13, 2024 10:16
@Biscgit
Copy link
Member Author

Biscgit commented Dec 13, 2024

I merged today's introduced atlas records into this branch as well 👍

@tiborsimko tiborsimko force-pushed the standardize-license-ids branch from 7d1843b to 1feff5a Compare December 17, 2024 12:48
@tiborsimko tiborsimko force-pushed the standardize-license-ids branch 2 times, most recently from 3843794 to 1a79680 Compare December 17, 2024 13:45
@tiborsimko tiborsimko force-pushed the standardize-license-ids branch from 1a79680 to 69b149e Compare December 17, 2024 13:51
@tiborsimko tiborsimko merged commit 69b149e into master Dec 17, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

records: standardise license strings
2 participants