Override license detection by checksum #1281
Comments
* add new Apache detection rule to handle overlap case found in SignalR such that no unknown is returned * add new Apache detection rule to handle non-standard notice found in SignalR to return a higher score Reported-by: Jonas Öberg @jonasob Signed-off-by: Philippe Ombredanne <[email protected]>
* ms-net-library-2018-11 newly found in SignalR at same older URL https://www.microsoft.com/web/webpi/eula/net_library_eula_ENU.htm This is a modification of the previous ms-net-library license but is not versioned by Microsoft and found exactly at the same URL * also rename and update older version rules Reported-by: Jonas Öberg @jonasob Signed-off-by: Philippe Ombredanne <[email protected]>
* improve detection based on SignalR scans with new rules and/or adding a stored relevance if needed Reported-by: Jonas Öberg @jonasob Signed-off-by: Philippe Ombredanne <[email protected]>
* Assign unknown-license-reference to rules that were unknown by mistake Reported-by: Jonas Öberg @jonasob Signed-off-by: Philippe Ombredanne <[email protected]>
* improve detection based on SignalR repos scans with new rules and/or adding or updating the rule "relevance" as needed Reported-by: Jonas Öberg @jonasob Signed-off-by: Philippe Ombredanne <[email protected]>
@jonasob Thank you for the detailed report! See my answers inline. You wrote:
In these cases, the best way out is to add new license detection rules. This is what I did in this commit adf79c2#diff-8cc1a6e276d5f2058a1ca559b757b055R1 where I added a new rule that covers both Apache and the With this no I also added a few more rule refinements to adapt to some of the peculiarities of this package. So adding new license rules or improving existing ones may be a better way IMHO, as this also works across code changes and would typically apply to a class of packages from the same team or company that uses the same conventions. Now in the special case of things such as
Think of it as a way to de-reference such a license mention to the actual file it is referencing and to the license(s) present in these files. We are not yet there, but not far since all the bits are there data-wise and @gerv (RIP, Gervase!) has shown us that this can be done in his
This is rather involved and can surely work ... until you upgrade your package to a new version. In this case, all the whitelisting work is lost as the sha1 will have quite likely changed in many cases.
Note that @MaJuRG and @johnmhoran have been working on DeltaCode, and tracking and detecting these kinds of changes "in the large" is eventually part of the goal there. (Ping: correct?) Also, one option could be to track the detected rule as well, which will always be more specific than the detected license key. In any case, something is definitely needed in this area, as some files may just have a reference to some file at the root such as here: https://github.com/SignalR/java-client/blob/master/signalr-client-sdk/src/main/java/microsoft/aspnet/signalr/client/ConnectionBase.java#L4
Definitely fits the roadmap, and we are on the same page. The diffing/change tracking could IMHO be best in DeltaCode https://github.com/nexB/deltacode or alternatively as a ScanCode plugin... BUT since in the end this is about storing a conclusion of sorts for a given package, I would consider using AboutCode toolkit for this (see for instance the .ABOUT files in https://github.com/nexB/scancode-toolkit/tree/develop/thirdparty ) and storing there the actual license that you determined to be the right one (and/or in a database-backed system ;) )
* improve detection primarily for license file references and mostly for things from Microsoft Reported-by: Jonas Öberg @jonasob Signed-off-by: Philippe Ombredanne <[email protected]>
@jonasob any feedback on my reply?
@jonasob Several new license rules have been merged in develop. As for the checksum/whitelisting, I am waiting for your feedback.
Thanks @pombredanne! I agree that improving the matching rules is preferred in most cases, though that's slightly more involved than whitelisting, and I do fear it's a bit beyond my capabilities, as I can easily see how changing some matching rules might have unforeseen consequences elsewhere! Regarding where to place the whitelisting, I'm also undecided. I agree that ScanCode doesn't quite fit the bill: I would expect ScanCode to report what it finds, no more and no less, and then other tools can pick that information up and make changes in post-processing if an Upstream First strategy turns out not to work :) So I guess I land in:
I think there is still a need for whitelisting somehow for some use cases.
I don't know the architecture, but while I think an output filter makes sense, I also cannot shake the feeling that the earlier something like that is introduced in the pipeline, the less problematic it becomes. Do we have any structure to hook this into a cache of sorts? If it were possible to hook in configurable cache options, potentially with multiple caches, then we could end up with one r/w cache for caching data between runs, plus separate r/o databases that are essentially whitelists or overrides.
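To make the layered-cache idea a bit more concrete, here is a minimal Python sketch. It is purely hypothetical and not an existing ScanCode API: the LayeredCache class, its get method, and the use of a file's sha1 as the lookup key are all assumptions made for illustration. Read-only override layers are consulted first, then the r/w cache, and only then is the file actually scanned.

```python
from types import MappingProxyType


class LayeredCache:
    """Hypothetical lookup: read-only override layers first, then a r/w cache."""

    def __init__(self, overrides=(), cache=None):
        # Each override layer is copied and frozen so a scan can never modify it.
        # Layers are consulted in the order given (highest priority first).
        self.overrides = [MappingProxyType(dict(layer)) for layer in overrides]
        # The r/w cache holds results computed during this run or earlier runs.
        self.cache = {} if cache is None else cache

    def get(self, sha1, compute):
        """Return the result for sha1, calling compute() only on a full miss."""
        for layer in self.overrides:
            if sha1 in layer:
                return layer[sha1]
        if sha1 not in self.cache:
            self.cache[sha1] = compute()
        return self.cache[sha1]
```

With something like this, a whitelist is just one more override layer, and a file whose checksum appears there would never reach the detection step at all.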
While scanning some repositories, notably SignalR, we routinely come across files identified with an unknown license. Typically, this is because the file both mentions a license and refers to an external license file, and the latter reference is then matched as unknown. See for instance unknown_19.RULE for a popular one in SignalR.

In conjunction with defining a policy with --license-policy, this is not ideal: you hardly want to say either yay or nay to unknown licenses without looking at the files in question individually.

What I've ended up doing is adapting our wrapper around ScanCode with a post-processing step:

* For an unknown license match, check in whitelist.txt, which contains one sha1:license tuple per line.
* If the match is still unknown, output this as part of the report in the same format, for manual identification and addition to whitelist.txt.

This could probably be improved quite a bit if done in ScanCode, but it would move ScanCode further in the direction of being a compliance toolkit rather than a scanner, so it might not fit the roadmap.
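For reference, the post-processing step described above could look roughly like the Python sketch below. It is not our actual wrapper code. The function names are made up for illustration, and it assumes the scan results are a parsed JSON dict with a "files" list in which each entry carries "path", "sha1" and a "licenses" list of matches with a "key" field; that shape, and "unknown" as the key to look for, are assumptions that may not match every ScanCode version or option set.

```python
UNKNOWN_KEY = "unknown"  # assumed license key reported for unknown matches


def load_whitelist(lines):
    """Parse 'sha1:license' lines into a dict, skipping blanks and # comments."""
    whitelist = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sha1, _, license_key = line.partition(":")
        sha1, license_key = sha1.strip().lower(), license_key.strip()
        # Entries without a license are still waiting for manual identification.
        if sha1 and license_key:
            whitelist[sha1] = license_key
    return whitelist


def apply_whitelist(scan, whitelist):
    """Split files with an unknown match into resolved and unresolved.

    Returns (resolved, unresolved): resolved maps path -> whitelisted license,
    unresolved is a list of 'sha1:' lines ready to be completed by hand.
    """
    resolved, unresolved = {}, []
    for entry in scan.get("files", []):
        keys = [match.get("key") for match in entry.get("licenses", [])]
        if UNKNOWN_KEY not in keys:
            continue
        sha1 = (entry.get("sha1") or "").lower()
        if sha1 in whitelist:
            resolved[entry["path"]] = whitelist[sha1]
        else:
            unresolved.append(f"{sha1}:")
    return resolved, unresolved
```

The scan dict would typically come from json.load on the scan output, and the whitelist lines from open("whitelist.txt"). The unresolved lines use the same sha1:license shape with the license left empty, so they can be pasted into whitelist.txt once someone has looked at the file.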