Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dependencies cache all available license files #64

Merged
merged 4 commits into from
Jul 10, 2018
Merged

Conversation

jonabc
Copy link
Contributor

@jonabc jonabc commented Jul 9, 2018

Fixes #22.

This PR updates the licensee usage to cache all license texts associated with a dependency. The implementation changes

  • #license_text to account for using project.license_files over project.license_file
  • #matched_project_file -> project_license, which checks for (in order)
    1. A non-other licensee project.license 👍
    2. Multiple non-unique license files to match as other (should this match to "multiple"?)
    3. Legacy behavior that ends up calling into project.matched_files...
  • removes #project_file and instead moves #remote_license_file usage directly into #license_key

This is not a breaking change, however it will cause a mismatch between license text cached from licensed <= 1.2.0 and license text cached licensed > 1.2.0 for dependencies with multiple license files. In that scenario, the cached license metadata field will not be reused per #21

/cc @benbalter added you as reviewer for changes to how we're using licensee

@jonabc jonabc requested review from mlinksva and benbalter July 9, 2018 21:11
elsif project.license_files.map(&:license).uniq.size > 1
# if there are multiple license types found from license files,
# return 'other'
Licensee::License.find('other')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the elsif branch be deleted if && !project.license.other? were also deleted?

Copy link
Contributor Author

@jonabc jonabc Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it can. The call chain in licensee for this looks like

  • Project#license
  • Project#licenses_without_copyright
  • Project#matched_files
  • Project#project_files

#project_files pulls from #license_files, #package_manager_file and #readme_file. For the purposes of this PR I was calling out a special case when there were multiple, conflicting, license files.

Note that this doesn't include a scenario where there are multiple license files but they're all other. In that scenario there is still only a single license found from license files and we fall into the else case to look at the package manager and readme files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the purposes of this PR I was calling out a special case when there were multiple, conflicting, license files.

Doesn't licensee handle that case directly by itself?

https://github.com/benbalter/licensee/blob/d606b06860fd290693280060faee9c61cbf41c4b/lib/licensee/projects/project.rb#L26-L27

Copy link
Contributor Author

@jonabc jonabc Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really similar, but not exactly the same. What you linked is evaluating licenses from license files, package manager files and readme files. The one in my change is only looking at license files, where I use the term to mean files matching that regex e.g. LICENSE or COPYING.

Before this change, licensed used the first project.matched_files file that had a non-other license (ref). The issue I saw was that when using multiple license files, is it ok to say that one of them is more relevant than another?

For instance if there are both LICENSE and COPYING files that contain different licenses, is it ok to say that LICENSE takes precedence over COPYING?

The logic here is meant to check this scenario. It makes the assumption that rather than checking [license file 1, license file 2, ... , package manager file, README] one at a time as licensed was previously doing, it should be checking [[all license files], package manager file, README].

Though based on ☝️ I should probably replace the existing if check with project.license_files.map(&:license).uniq.size == 1.

Does that make any more sense? 💭 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about https://github.com/github/licensed/pull/64/files/4f04102e07388a2926d8362485aacd40641ee0d7#diff-cbc7d35cbf60fa522f24e7b0a2f73e6dR78

Does project.license there not evaluate licenses from license files, package manager files and readme files? If multiple are found and don't all indicate the same license, other is returned. I'm not sure when the code in 81-84 would obtain a different result, except where project.license returned other due to conflicting info in non-license files rather than multiple license files with conflicting info, but in that case other is still the right answer. I suspect I'm missing something obvious and should walk through more of the code to understand. 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does project.license there not evaluate licenses from license files, package manager files and readme files?

project.license from the if check there does, but project.license_files from the elsif check does not.

I'm not sure when the code in 81-84 would obtain a different result, except where project.license returned other due to conflicting info in non-license files rather than multiple license files with conflicting info, but in that case other is still the right answer.

Yep! That's exactly the case I was trying to describe. I don't know that my logic is correct either in implementation or intentions but maybe an example would be better.

Looking at the tests, I think the relevant test cases that describe this logic are finding licenses from non-license files and how to handle license files that don't match. Are the outcomes correct for those scenarios?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the logic a bit more conservative in ac4b4ce. If there are any potentially conflicting license indicators, we get other, requiring human review. Not attached to implementation of course.

@@ -96,21 +96,28 @@ def remote_license_file
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not part of this change, but just idly wondering why above two lines can't just be

@remote_license_file ||= Licensed.from_github(self["homepage"])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! It's because ||= doesn't stop Licensed.from_github(self["homepage"]) from being called again when a previous call returned nil

@project_license ||= if project.license && !project.license.other?
# if Licensee has determined a project to have a specific license, use it!
project.license
elsif project.license_files.map(&:license).uniq.size > 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea @mlinksva and I were discussing that at #64 (comment).

TL;DR it's not quite the same due to project.license including additional sources over project.license_files

Licensee::License.find("other")
else
# check other project files if we haven't yet found a license type
project.licenses.reject(&:other?).first

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a project is dual licensed, it will pick the first? What if it's dual licensed as GPL and Apache, in that order? Perhaps we should look for a non-copyleft license first to avoid false positives when a project is dual licensed (e.g., "choose" to use Apache over GPL in that case).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added to maintain legacy behavior where licensed previously chose the first non-other license found from project.matched_files. If it's a good idea to intentionally change that behavior, 👍

leave one convenience: pull license text from github if no local
text and license on github is the same as asserted in package manager
@@ -99,25 +89,16 @@ def remote_license_file
# from the local LICENSE-type files, remote LICENSE, or the README, in that order
def license_text
content_files = Array(project.license_files)
content_files << remote_license_file if content_files.empty? && remote_license_file
content_files << remote_license_file if content_files.empty? && remote_license_file && remote_license_file.license.key == license_key
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're not using a remote file to find the license key, maybe we can 🔪 the remote file entirely?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me. As I wrote in the commit message, it is a minor convenience:

leave one convenience: pull license text from github if no local
text and license on github is the same as asserted in package manager

@jonabc
Copy link
Contributor Author

jonabc commented Jul 10, 2018

@mlinksva included a tiny cleanup to remove Dependency#project_license because project.license is already a cached value. If it looks good, want to 👍 ?

I'm going to open a separate PR for removing the remote fetching. That change is a bit more broad and shouldn't be in this PR.

@mlinksva mlinksva self-requested a review July 10, 2018 21:05
@jonabc jonabc merged commit 5d16b95 into master Jul 10, 2018
@jonabc jonabc deleted the cache-all-licenses branch July 10, 2018 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants