Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor & restructure LicenseEvidenceFetcher #1

Conversation

jkowalleck
Copy link

@jkowalleck jkowalleck commented Nov 8, 2024

this copies the existing art from https://github.com/CycloneDX/cyclonedx-webpack-plugin/blob/master/src/extractor.ts

I'd rather take it as is - with all possible flaws - and improve it in a later issue/PR, than mixing too many things at a time.

@jkowalleck jkowalleck marked this pull request as draft November 8, 2024 10:26
@jkowalleck jkowalleck changed the title refactor & restructure LicenseEvidenceFetcher [WIP] refactor & restructure LicenseEvidenceFetcher Nov 8, 2024
src/_helpers.ts Show resolved Hide resolved
}
const LICENSE_FILENAME_PATTERN = this.#LICENSE_FILENAME_PATTERN
const logger = this.console
return async function * (pkg: Package): AsyncGenerator<License> {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coppied from webpack implementation with the needed adjustments for yarn

@jkowalleck jkowalleck marked this pull request as ready for review November 8, 2024 11:23
@jkowalleck
Copy link
Author

@AugustusKling reverted most of your work and dump copied over the existing art.

it showes some minor flaws, that i would love to tackle in an independent issue/PR

@jkowalleck jkowalleck changed the title [WIP] refactor & restructure LicenseEvidenceFetcher refactor & restructure LicenseEvidenceFetcher Nov 8, 2024
src/_helpers.ts Show resolved Hide resolved

interface BomBuilderOptions {
omitDevDependencies?: BomBuilder['omitDevDependencies']
metaComponentType?: BomBuilder['metaComponentType']
reproducible?: BomBuilder['reproducible']
shortPURLs?: BomBuilder['shortPURLs']
gatherLicenseTexts: BomBuilder['gatherLicenseTexts']
gatherLicenseTexts?: BomBuilder['gatherLicenseTexts']
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to have this optionally. The parameter definition of Clipanion has a default value of false set for the gather-license-texts parameter. Therefore the boolean is always present, either true or false.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

understood.
I went with the code "style" the existing interface had.
I think it should be changed/improved in a dedicated pullrequest.

Changing everything at once makes review hard; keeping up the changes and intended outcome gets harder when modifying out-of-scope code.


this.console = console_
}

async buildFromWorkspace (workspace: Workspace): Promise<Bom> {
// @TODO make switch to disable load from fs
const fetchManifest: ManifestFetcher = await this.makeManifestFetcher(workspace.project)
const fetchLicenseEvidences: LicenseEvidenceFetcher = await this.makeLicenseEvidenceFetcher(workspace.project)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get your argument from CycloneDX#193 (review) but 2 separate fetchers means that all packages have to be extracted twice. This is a waste resource-wise.

We should therefore have only 1 fetcher but not call it manifest fetcher nor license evidence fetcher. The fetcher is something general that makes package contents available. Depending on the linker used it goes to a real file system or a virtual one like the content of ZIP files.

What do you think about letting the fetcher take multiple extractors? Then the fetcher could open the packages (using fetcher.fetch(...)), call all extractors and close the packages (using releaseFs()). In between one extractor could get the package.json while the other could search for license evidence files.

Copy link
Author

@jkowalleck jkowalleck Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally understand. During development of this patch, I've made attempts of extracting/refactoring the fetcher. I rolled back these results, as I thought they were premature optimizations.

I would like to have the unoptimized most-basic solution in place first, so any optimization is measurable easily, and can be introduced in an isolated pullrequest.
A process for this would be:

  1. introduce a new feature as a minimum viable product (MVP)- merge it
  2. test/benchmark this product and baseline your outcome
  3. identify what can/must be improved - and write a very brief ticket/report for it
  4. pullrequest a scoped fix for the issues

Changing all at once might be appealing, but it makes reviews harder and collaboration processes slower.
This is why I encourage small/scoped increments. :-D

src/builders.ts Outdated
// option `withFileTypes:true` is not supported and casues crashes
files = packageFs.readdirSync(prefixPath)
} catch (e) {
logger?.warn('collecting license evidence in', prefixPath, 'failed:', e)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whilst I prefer to simply call console.warn(...) like here, it does not follow the style of cyclonedx-node-yarn which uses console.warn('WARN | ...) in the other code.

Copy link
Author

@jkowalleck jkowalleck Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for spotting this.
fixed it via 2973e59

src/builders.ts Outdated
try {
// option `withFileTypes:true` is not supported and casues crashes
files = packageFs.readdirSync(prefixPath)
} catch (e) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This catch should not be here as it hides errors. The warning message is not enough.

I'd assume users of cyclonedx-node-yarn would call it as part of a build pipeline and expect a non zero exit code if anything fails that affects the produced SBOM. It's better to explode than to log a human-friendly message.

Copy link
Author

@jkowalleck jkowalleck Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this over from webpack implementation, without thinking too much.
With webpack, the SBOM is a side-artifact that is generated during build time, and it is designed to not break build processes.

the yarn plugin is called explicitly, and - like you said - it is expected to explode on errors.
I will change this accordingly.
PS: done via 643f95e


const contentType = getMimeForTextFile(file)
if (contentType === undefined) {
continue
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not silently skip over files matching LICENSE_FILENAME_PATTERN.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current implementation is a copy/past of existing art.
lets discuss improvements in a different issue/pullrequest (see #1 (comment))

src/builders.ts Outdated
manifest.name = pkg.scope ? `@${pkg.scope}/${pkg.name}` : pkg.name
manifest.version = pkg.version
const component = this.makeComponent(pkg, manifest, type)
if (component instanceof Component) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how makeComponent could conditionally return an instance. I'd guess something is wrong if we do not get an Component instance here and thus the code should abort execution.

Copy link
Author

@jkowalleck jkowalleck Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets discuss the out-of-scope improvements in a different issue/pullrequest (see #1 (comment))

Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
@jkowalleck
Copy link
Author

exising art got some fixes: CycloneDX/cyclonedx-webpack-plugin#1339

will update this very PR accordingly, soon

@jkowalleck
Copy link
Author

updated to match latest existing art - via 1dad58e

@jkowalleck
Copy link
Author

@AugustusKling ,

how do you see the current state of this PR? Would you be okay with merging it and have it published and tested downstream?

I mean, this software is not meant to be final, it is intended to be improved step-by-step over time, right?

@@ -35997,20 +35993,12 @@
],
"evidence": {
"licenses": [
{
"license": {
"name": "file: LICENSE.APACHE2",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be found. The library allows using it with BSD-2-Clause OR MIT OR Apache-2.0

@AugustusKling
Copy link
Owner

@AugustusKling ,

how do you see the current state of this PR? Would you be okay with merging it and have it published and tested downstream?

I mean, this software is not meant to be final, it is intended to be improved step-by-step over time, right?

Firstly, sorry for leaving this for many days. I would like to spend more time to improve this tool but time is sadly limited.

We can merge this in my opinion as it is. When I compare with my PR to your repo https://github.com/CycloneDX/cyclonedx-node-yarn/pull/193/files then I think these things should be addressed soon (before or after merge):

  • Add --gather-license-texts to /README.md file, maybe with a warning that it's not reliable yet
  • Take over changes from /tests/README.md file

These improvement could be addressed in follow-up PRs:

Just make sure to call the license extraction experimental until we can trust it does not silently skip over errors.

@jkowalleck
Copy link
Author

[...] sorry for leaving this for many days.

No need to apologize, I totally understand that everybody has their own time/budget for working on OSS. You are doing great.

We can merge this in my opinion as it is. [...] I think these things should be addressed soon (before or after merge):

Feel free to merge my changes to your own branch, and apply the mentioned changes as needed.
We then might continue discussions in CycloneDX#193 - to identify follow-up tasks and changes for improvement.

merging this PR is expected to automatically update the existing CycloneDX#193

These improvement could be addressed in follow-up PRs. [...]

Fully agree.

Just make sure to call the license extraction experimental until we can trust it does not silently skip over errors.

Thank you for calling this out.
Stating the experimental nature in the README and in the CLI help page is probably the best we can do.

@AugustusKling AugustusKling merged commit 7696ebf into AugustusKling:gather-license-texts Dec 4, 2024
@jkowalleck jkowalleck deleted the jk_gather-license-texts_changes branch December 4, 2024 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants