Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow custom licensee behavior overrides #455

Merged
merged 7 commits into from
Feb 24, 2022
Merged

Conversation

jonabc
Copy link
Contributor

@jonabc jonabc commented Feb 13, 2022

closes #453

cc @villelahdenvuo 👋 this turned out to be really straightforward to implement so I went ahead and made the change rather than putting details in #453

This change allows users to customize licensee's behavior based on configuration set in the licensed configuration file. Currently the only allowed customization is on the confidence threshold, however it could make sense to allow users to disable matching licenses from readme and package manager files if they choose to do so 🤷

Licensee supports changing the confidence threshold via a global setting Licensee.confidence_threshold. I'm setting (and restoring) the global value when evaluating each app config for the licensed cache command. I don't believe any other licensed commands are affected by customizing licensee. Setting the threshold per app config allows users to make use of licensed's inherited configuration and overrides to change the value for all, some, or only one configured app.

I've added documentation for customizing licensee behavior, however I'm intentionally not adding an example to the common configuration values example. Changing this value should be done intentionally, and I don't want anyone setting up a new project that might copy/paste the example configuration shown on that page to inadvertently customize licensee's behavior.

cc @mlinksva @benbalter FYI on this change to licensed<>licensee interactions

@jonabc
Copy link
Contributor Author

jonabc commented Feb 13, 2022

The licensed-ci GitHub Action has created a pull request containing license metadata updates based on the changes in this branch.

Please review the pull request for any additional changes required and merge when ready.

License updates for configurable-confidence
Copy link
Contributor

@villelahdenvuo villelahdenvuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@benbalter
Copy link

This approach makes sense to me and I could see being able to define further configurations down the line, if there was interest (e.g., a more fully supported .licesee.yml pattern that licensed could plug into the Ruby API for). Happy to help with things on the licensee side if/when that happens.

@jonabc
Copy link
Contributor Author

jonabc commented Feb 14, 2022

@benbalter is there an ETA on the next licensee release? I could probably get by on merging and releasing this change without licensee/licensee#533, but I'd prefer to wait if a new release could happen in the near term

@benbalter
Copy link

is there an ETA on the next licensee release?

Looking at the changelog, it looks like it could be a patch release, so I don't see any reason we couldn't get one out this week (which reminds me we should automate that to happen automatically when a tag is pushed up).

@jonabc
Copy link
Contributor Author

jonabc commented Feb 23, 2022

@benbalter 👋 ping on the licensee release. Do you have time to put out a new release this week?

@benbalter
Copy link

Do you have time to put out a new release this week?

Done. https://github.com/licensee/licensee/releases/tag/v9.15.2

@jonabc
Copy link
Contributor Author

jonabc commented Feb 24, 2022

The licensed-ci GitHub Action has created a pull request containing license metadata updates based on the changes in this branch.

Please review the pull request for any additional changes required and merge when ready.

@jonabc jonabc merged commit fbb53f3 into master Feb 24, 2022
@jonabc jonabc deleted the configurable-confidence branch February 24, 2022 18:53
jonabc added a commit that referenced this pull request Feb 24, 2022
## 3.5.0

2022-02-24

### Added

- [Licensee](https://github.com/licensee/licensee) confidence thresholds can be configured in the licensed configuration file (#455)
@jonabc jonabc mentioned this pull request Feb 24, 2022
@@ -0,0 +1,13 @@
# Customize Licensee's behavior

Licensed uses [Licensee](https://github.com/licensee/licensee) to detect and evaluate OSS licenses for project dependencies found during source enumeration. Licensed can optionally [customize Licensee's behavior](https://github.com/licensee/licensee/blob/jonabc-patch-1/docs/customizing.md#customizing-licensees-behavior) based on options set in the configuration file.
Copy link
Contributor

@villelahdenvuo villelahdenvuo Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed you link to your own patch branch, probably should be linked to the main branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeah good call

@villelahdenvuo
Copy link
Contributor

villelahdenvuo commented Feb 28, 2022

@jonabc I just updated to the latest version, but it seems like it's not working.

I'm running the licensed-ci workflow on my repo and getting these results:

Checking cached dependency records for ecom-api
Bundle install step:
  Using licensee 9.15.2
  Using licensed 3.5.0

..................................................................F............................................
...............................F.............F.......F............FF.FF......................F........F..
........F.F.......FF................F...............................................F......................
............................F........F..........FF......F........F........F......FF.F....FF............
.................F....F..............................FF..
Errors:
* ecom-api.npm.crypto-js
  filename: /home/runner/work/ecom-api/ecom-api/.licenses/npm/crypto-js.dep.yml, version: 4.1.1, license: other, allowed: false
    - license needs review: other

* ecom-api.npm.dom-serializer
  filename: /home/runner/work/ecom-api/ecom-api/.licenses/npm/dom-serializer.dep.yml, version: 1.3.2, license: other, allowed: false
    - license needs review: other
...
Licensed config
# spell-checker: disable
root: true
cache_path: ../.licenses

licensee:
  # the confidence threshold is an integer between 1 and 100. the value represents
  # the minimum percentage confidence that Licensee must have to report a matched license
  # https://github.com/licensee/licensee/blob/jonabc-patch-1/docs/customizing.md#adjusting-the-confidence-threshold
  confidence_threshold: 85

sources:
  npm: true

allowed:
  - 0bsd
  - apache-2.0
  - bsd-2-clause
  - bsd-3-clause
  - cc0-1.0
  - isc
  - mit
  - mpl-2.0
  - unlicense
  - wtfpl

ignored:
  npm:
    - '@grano/**' # Duh.
    # Paid licenses.
    - '@fortawesome/**'
    - font-awesome
    - '@devexpress/**'
    - devexpress*
    - '@devextreme/**'
    - devextreme*
    # No license text found
    - ap # MIT/X11
    - map-values # Public Domain
    - jsonify # Public Domain
    # Others
    - fsevents # MIT
    - lodash* # MIT

reviewed:
  npm:
    - '@elastic/elasticsearch' # Apache-2.0
    - amazon-cognito-identity-js # Apache-2.0
    - argparse # Python-2.0
    - color-convert # MIT

However, if I run licensee myself it seems to work:

➜  node_modules git:(develop) ✗ licensee detect --confidence 85 crypto-js
License:        MIT
Matched files:  LICENSE, package.json
LICENSE:
  Content hash:  f3b84bd5191f29915441aeadfdeb905509b97d90
  Confidence:    85.32%
  Matcher:       Licensee::Matchers::Dice
  License:       MIT
  Closest non-matching licenses:
    MIT similarity:    85.32%
    MIT-0 similarity:  66.67%
    NCSA similarity:   60.90%
package.json:
  Confidence:  90.00%
  Matcher:     Licensee::Matchers::NpmBower
  License:     MIT
➜  node_modules git:(develop) ✗ licensee detect --confidence 85 dom-serializer
License:        MIT
Matched files:  LICENSE, package.json
LICENSE:
  Content hash:  224318321aaf83faf7df25efcaad6ce4936fcf43
  Confidence:    89.42%
  Matcher:       Licensee::Matchers::Dice
  License:       MIT
  Closest non-matching licenses:
    MIT similarity:    89.42%
    MIT-0 similarity:  69.42%
    NCSA similarity:   60.54%
package.json:
  Confidence:  90.00%
  Matcher:     Licensee::Matchers::NpmBower
  License:     MIT

P.S. Maybe it would be nice to list all the license matches in the cache output to ease the review process.

@villelahdenvuo
Copy link
Contributor

After running licensed env it looks like the licensee config is ignored:

---
git_repo: true
apps:
- name: ecom-api
  source_path: "/home/runner/work/ecom-api/ecom-api"
  cache_path: "/home/runner/work/ecom-api/ecom-api/.licenses"
  sources:
  - name: ecom-api.npm
  allowed:
  - 0bsd
  - apache-2.0
  - bsd-2-clause
  - bsd-3-clause
  - cc0-1.0
  - isc
  - mit
  - mpl-2.0
  - unlicense
  - wtfpl
  ignored:
    npm:
    - "@grano/**"
    - "@fortawesome/**"
    - font-awesome
    - "@devexpress/**"
    - devexpress*
    - "@devextreme/**"
    - devextreme*
    - ap
    - map-values
    - jsonify
    - fsevents
    - lodash*
  reviewed:
    npm:
    - "@elastic/elasticsearch"
    - amazon-cognito-identity-js
    - argparse
    - color-convert
  version_strategy: git
  root: "/home/runner/work/ecom-api/ecom-api/grano-best-practices"

@jonabc
Copy link
Contributor Author

jonabc commented Feb 28, 2022

👋 @villelahdenvuo the licensed env output is not necessarily accurate, it looks like it's hardcoded to include the information you see ☝️ . I'll take a look at whats going on ASAP

@villelahdenvuo
Copy link
Contributor

@jonabc Did you have a chance to look at this? Also I think you missed my earlier comment about the doc links being broken.

@jonabc
Copy link
Contributor Author

jonabc commented Mar 2, 2022

Sorry no I haven't gotten a chance yet. I'm going to try to look at it today or tomorrow but I can't promise anything, I've been very busy this week. Is this blocking you from being able to use licensed?

@villelahdenvuo
Copy link
Contributor

No worries, in the end I added all the exceptions in the config, so I can use it, but it would be nice to remove as many of the manually checked licenses as possible. If you want I can open a new issue about this so you have it as a reminder.

@jonabc
Copy link
Contributor Author

jonabc commented Mar 2, 2022

@villelahdenvuo I took a quick look - have you recached licenses since updating the license config with the licensee threshold? License determination is only done when caching dependencies, where licensed status is primarily reading/reporting on the cached files that are available.

I tested [email protected] in a repro project. With the confidence threshold value set to 85 it was able to set the "mit" license, vs "other" without the threshold set.

One note - I found a bug while testing this where using the licensed cache --force option will not set the new license classification when the license text is unchanged. So if you changed the confidence threshold and a different license was detected the new license wouldn't be written to the cached file because the existing cached license text wouldn't have changed. The short term workaround is to delete any files that you need updated and let licensed regenerate the file with the new classification 👍

@villelahdenvuo
Copy link
Contributor

That makes sense, I'll delete the cache records and try again.

@villelahdenvuo
Copy link
Contributor

@jonabc Do you think it would make sense to calculate a hash of the licensee config and add it as a part of the license cache records to make the cache invalidation automatic? If so, I can make a new issue about that.

@jonabc
Copy link
Contributor Author

jonabc commented Mar 3, 2022

@villelahdenvuo 🤔 I'm not certain. It's definitely useful when the classification would change from other to non-other, but there's still a chance that invalidating based on licensee config changes could cause a manually classified non-other to be changed to other.

I'm also not sure if you are suggesting that licensed should invalidate the cache on other operations like licensed status. I think in those cases I'd still prefer to avoid updating cached files during a status check.

please do open an issue though! this is an interesting problem. In the meantime I'm going to try to fix the direct issue where --force isn't working as intended.

@villelahdenvuo
Copy link
Contributor

@jonabc I see, I think the use of "cache" here is a bit misleading, because cache is usually considered something that is meant to speed up processing, but I guess with licensed people can use it to manually review licenses, so it's more like a license database.

I thought that the reviewing was meant to be done in the licensed configuration file and the cached records shouldn't be touched.

I'm not sure what kind of issue I should make for this. Maybe it's a documentation issue after all.

@jonabc
Copy link
Contributor Author

jonabc commented Mar 4, 2022

I'm not sure what kind of issue I should make for this. Maybe it's a documentation issue after all.

Yeah I'm not too sure either 😅 . I guess the question I'd ask is "what could be improved?", and if anything comes to mind then please do open an issue. Terminology and words in general can be a hard problem.

@villelahdenvuo
Copy link
Contributor

@jonabc I opened #475 to address the documentation issue. I think for me the biggest issue in the beginning was that there was no "getting started" section, only list of commands I had to guess how they work together to build a workflow (licensed-ci helped there), but it still didn't clarify what's the expected/default workflow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NPM: Some licenses not detected correctly
3 participants