-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dependencies cache all available license files #64
Conversation
lib/licensed/dependency.rb
Outdated
elsif project.license_files.map(&:license).uniq.size > 1 | ||
# if there are multiple license types found from license files, | ||
# return 'other' | ||
Licensee::License.find('other') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the elsif branch be deleted if && !project.license.other?
were also deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it can. The call chain in licensee
for this looks like
Project#license
Project#licenses_without_copyright
Project#matched_files
Project#project_files
#project_files
pulls from #license_files
, #package_manager_file
and #readme_file
. For the purposes of this PR I was calling out a special case when there were multiple, conflicting, license files.
Note that this doesn't include a scenario where there are multiple license files but they're all other. In that scenario there is still only a single license found from license files and we fall into the else
case to look at the package manager and readme files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the purposes of this PR I was calling out a special case when there were multiple, conflicting, license files.
Doesn't licensee handle that case directly by itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's really similar, but not exactly the same. What you linked is evaluating licenses from license files, package manager files and readme files. The one in my change is only looking at license files, where I use the term to mean files matching that regex e.g. LICENSE or COPYING.
Before this change, licensed
used the first project.matched_files
file that had a non-other license (ref). The issue I saw was that when using multiple license files, is it ok to say that one of them is more relevant than another?
For instance if there are both LICENSE
and COPYING
files that contain different licenses, is it ok to say that LICENSE
takes precedence over COPYING
?
The logic here is meant to check this scenario. It makes the assumption that rather than checking [license file 1, license file 2, ... , package manager file, README]
one at a time as licensed
was previously doing, it should be checking [[all license files], package manager file, README]
.
Though based on ☝️ I should probably replace the existing if
check with project.license_files.map(&:license).uniq.size == 1
.
Does that make any more sense? 💭 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm talking about https://github.com/github/licensed/pull/64/files/4f04102e07388a2926d8362485aacd40641ee0d7#diff-cbc7d35cbf60fa522f24e7b0a2f73e6dR78
Does project.license
there not evaluate licenses from license files, package manager files and readme files? If multiple are found and don't all indicate the same license, other
is returned. I'm not sure when the code in 81-84 would obtain a different result, except where project.license
returned other
due to conflicting info in non-license files rather than multiple license files with conflicting info, but in that case other
is still the right answer. I suspect I'm missing something obvious and should walk through more of the code to understand. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does project.license there not evaluate licenses from license files, package manager files and readme files?
project.license
from the if check there does, but project.license_files
from the elsif check does not.
I'm not sure when the code in 81-84 would obtain a different result, except where project.license returned other due to conflicting info in non-license files rather than multiple license files with conflicting info, but in that case other is still the right answer.
Yep! That's exactly the case I was trying to describe. I don't know that my logic is correct either in implementation or intentions but maybe an example would be better.
Looking at the tests, I think the relevant test cases that describe this logic are finding licenses from non-license files and how to handle license files that don't match. Are the outcomes correct for those scenarios?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the logic a bit more conservative in ac4b4ce. If there are any potentially conflicting license indicators, we get other
, requiring human review. Not attached to implementation of course.
@@ -96,21 +96,28 @@ def remote_license_file | |||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not part of this change, but just idly wondering why above two lines can't just be
@remote_license_file ||= Licensed.from_github(self["homepage"])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! It's because ||=
doesn't stop Licensed.from_github(self["homepage"])
from being called again when a previous call returned nil
lib/licensed/dependency.rb
Outdated
@project_license ||= if project.license && !project.license.other? | ||
# if Licensee has determined a project to have a specific license, use it! | ||
project.license | ||
elsif project.license_files.map(&:license).uniq.size > 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this logic is already encapsulated in project.license
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea @mlinksva and I were discussing that at #64 (comment).
TL;DR it's not quite the same due to project.license
including additional sources over project.license_files
lib/licensed/dependency.rb
Outdated
Licensee::License.find("other") | ||
else | ||
# check other project files if we haven't yet found a license type | ||
project.licenses.reject(&:other?).first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a project is dual licensed, it will pick the first? What if it's dual licensed as GPL and Apache, in that order? Perhaps we should look for a non-copyleft license first to avoid false positives when a project is dual licensed (e.g., "choose" to use Apache over GPL in that case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added to maintain legacy behavior where licensed
previously chose the first non-other license found from project.matched_files
. If it's a good idea to intentionally change that behavior, 👍
leave one convenience: pull license text from github if no local text and license on github is the same as asserted in package manager
lib/licensed/dependency.rb
Outdated
@@ -99,25 +89,16 @@ def remote_license_file | |||
# from the local LICENSE-type files, remote LICENSE, or the README, in that order | |||
def license_text | |||
content_files = Array(project.license_files) | |||
content_files << remote_license_file if content_files.empty? && remote_license_file | |||
content_files << remote_license_file if content_files.empty? && remote_license_file && remote_license_file.license.key == license_key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're not using a remote file to find the license key, maybe we can 🔪 the remote file entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine with me. As I wrote in the commit message, it is a minor convenience:
leave one convenience: pull license text from github if no local
text and license on github is the same as asserted in package manager
@mlinksva included a tiny cleanup to remove I'm going to open a separate PR for removing the remote fetching. That change is a bit more broad and shouldn't be in this PR. |
Fixes #22.
This PR updates the
licensee
usage to cache all license texts associated with a dependency. The implementation changes#license_text
to account for usingproject.license_files
overproject.license_file
#matched_project_file
->project_license
, which checks for (in order)project.license
👍other
(should this match to "multiple"?)project.matched_files...
#project_file
and instead moves#remote_license_file
usage directly into#license_key
This is not a breaking change, however it will cause a mismatch between license text cached from
licensed <= 1.2.0
and license text cachedlicensed > 1.2.0
for dependencies with multiple license files. In that scenario, the cached license metadata field will not be reused per #21/cc @benbalter added you as reviewer for changes to how we're using
licensee