-
Notifications
You must be signed in to change notification settings - Fork 316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve unmanaged file checks #3145
Conversation
analyzer/src/main/kotlin/Analyzer.kt
Outdated
@@ -40,6 +40,7 @@ import org.ossreviewtoolkit.model.VcsInfo | |||
import org.ossreviewtoolkit.model.config.AnalyzerConfiguration | |||
import org.ossreviewtoolkit.model.config.RepositoryConfiguration | |||
import org.ossreviewtoolkit.model.readValue | |||
import org.ossreviewtoolkit.spdx.VCS_DIRECTORIES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe doing this commit as-is
is wrong. Let's consider an example:
project/package.json project/src/include/x.h project/src/include/x.cpp
If the above is a native project (where we do not support any package manager) applying this commit would eliminate the unmnaged project. All files would be associated with the package.json
. Then you want to exclude the package.json
, because the project is native only. If you do so, you effectively exclude also all native code as it gets only associated with the package-json
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it's correct that before this PR an unmanaged project would have been created, the project/src/include/x.h
and project/src/include/x.cpp
files from your example would not be associated to it, but to the NPM project defined by project/package.json
. In general, every file is associated to the "closest" project (defined by a definition file) when walking along the parents in directory tree.
So when you say
All files would be associated with the
package.json
.
then that's correct, but unless I'm terribly mistaken, it's exactly how it was before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would not be associated to it, but to the NPM project defined by
project/package.json
.
Are you sure this is true? I believe the files are assigned to both projects, unmanaged
and project/package.json
.
In general, every file is associated to the "closest" project
mind sharing a reference for this?
then that's correct, but unless I'm terribly mistaken, it's exactly how it was before.
I believe you are mistaken, but it could also be me. Let's figure it out. I should look into it again myself, but I have some urgent things right now todo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually all files are associated to all definition files found in their parent directories. For example if you have:
- README.md
- a
- build.gradle
- b
- package.json
- c
- main.cpp
Then main.cpp
would be associated with the Unmanaged, Gradle and NPM projects, because we never filter subdirectories that contain other projects. I have suggested we do that at least for Unmanaged in the past, because the idea of this fake package manager was to make sure that also files that cannot be associated to a project get scanned, but it was never implemented.
@fviernau The scenario you describe above would be a weird project setup, but if the package.json was in the root directory you would have exactly the same issue already with the existing code. A workaround could be to not exclude the project but only all dependency scopes, and I wonder if that would not be sufficient, because Unmanaged was never intended to solve the problem you describe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scenario you describe above would be a weird project setup
because?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scenario you describe above would be a weird project setup
because?
Because you shouldn't put *.h
/ *.cpp
files which are C++ files in the directory tree of an (unrelated) NPM project where only *.js
files are expected. Or is this some funky-special NPM project that implements a native addon and the *.h
/ *.cpp
files are in fact part of the NPM project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because you shouldn't put *.h / *.cpp files which are C++ files in the directory tree of an (unrelated) NPM project where only *.js files are expected. Or is this some funky-special NPM project that implements a native addon and the *.h / *.cpp files are in fact part of the NPM project?
Agree that one shouldn't do that setup (probably) but should ORT really say that it doesn't support such setups ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that one shouldn't do that setup (probably) but should ORT really say that it doesn't support such setups ?
Actually, I indeed would be fine with ORT not supporting such setups. Also in other areas (like maintaining metdata) we leverage ORT to foster best engineering practices. So why not here?
Anyway, that was also not your original question; your original question was why it's weird. And you seem to agree it's weird now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scenario you describe above would be a weird project setup
because?
Because why do you have a package.json lying around in your native code project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would not be associated to it, but to the NPM project defined by
project/package.json
.Are you sure this is true? I believe the files are assigned to both projects,
unmanaged
andproject/package.json
.
For reference, turns out that @fviernau was right, and scan findings are not only associated to the "closest" project, but to all projects on the way up to the root... also see #5365.
852ebe3
to
0d56264
Compare
I've split out the uncontroversial changes to #3377. |
0d56264
to
e6b9dc7
Compare
e6b9dc7
to
e213806
Compare
e213806
to
7a160f3
Compare
d183daa
to
d7f54ea
Compare
@fviernau I've found a bug in the implementation and have slightly rewritten it for more clarity. Please have another look. |
I gave this some thought again. The only issue I found so far is related to the semantics of Currently the semantics is: "A project is excluded if and only if the definition file is excluded". See [1], [2]. This means that if I only exclude a the definition file of a project, but not the > 0 additionally associated files, This exact situation I believe may break the NOTICE files in our internal setup. I haven't checked the evaluator but I'd suspect an analog issue there. What do you think @sschuberth @mnonnenmacher ? Note: I believe it should be possible to exclude a definition file without excluding all project files. So, change the implementation so that a (project) file should be included in the notices if there is no matching path exclude for it.
[3]
[4]
|
Codecov Report
@@ Coverage Diff @@
## main #3145 +/- ##
=============================================
- Coverage 54.62% 43.92% -10.70%
+ Complexity 1504 513 -991
=============================================
Files 260 186 -74
Lines 13899 9584 -4315
Branches 1960 1467 -493
=============================================
- Hits 7592 4210 -3382
+ Misses 5581 4902 -679
+ Partials 726 472 -254
Continue to review full report at Codecov.
|
@sschuberth Unfortunately we would like to put this on hold from HERE side until we have found another solution for a special case where we have to exclude definition files, but must not exclude the source code from the same directory. Currently the |
8ff2d46
to
5dbbe2f
Compare
As a side note, this PR also fixes the case where two projects (a real one and an unmanaged one) are created when analyzing only a single definition file. I.e.
becomes just
|
5dbbe2f
to
5a21b04
Compare
Concerns are probably obsolete by now.
@mnonnenmacher please have a look again. |
So far, if the directory structure was like root-dir | +-.git | +-project-dir still an unmanaged project would have been created because no definition file is in the root directory, and the project directory is regarded as an unmanaged "file", even though there are not really any unmanaged files in the root. Fix this by a introducing a check that removes all managed paths (or ignored files) from all files in the root. Signed-off-by: Sebastian Schuberth <[email protected]>
5a21b04
to
0d903e4
Compare
I have marked this with the breaking change label, because it could lead to unexpected results if the ort.yml contains configuration for the unmanaged project. |
Please have a look at the individual commit messages for the details.