Add Validate Format functionality #580
Merged
…for a specified file. This is a hidden verb that I imagine being used during development, and in future by DRI to explore an SBOM or troubleshoot a validation failure.
…es section from an SPDX 2.x format SBOM.

This is a naive implementation that does *not* leverage the JSON object streaming mechanism that is used for drop validation. The reason for this is that it is fairly complex to amend that logic to customize the validation behavior. Since we are operating on a very tight timeline, the preliminary redaction implementation will rely on deserializing the entire JSON file at once, on the assumption that will cover a large proportion of our in-the-wild use cases. A follow-on investigation will be conducted to determine the practical limits of this technique and necessity for extending to support larger SBOMs.

The criteria for a "valid" SBOM include:
- well-formed JSON document
- includes required SPDX elements
- SPDX v2.x

Validating SBOMs for SPDX elements covering the NTIA definition of an SBOM is out of scope for this change. Such validation may not be practical when processing 3P SBOMs since an SBOM creator could choose to use different attributes to satisfy the NTIA requirements.

In context of redaction, we will skip deserializing the files section (since removing it is the purpose of redaction anyway - this is also expected to reduce the size of an SBOM by ~50%). We will also require Packages and Relationships, since we need to operate on those in order to successfully redact. These ignore/require expectations are set at runtime but currently hardcoded; making them configurable is out of scope for this feature but definitely feasible.

The JSON validation and SPDX required elements are explicitly and implicitly enforced by the JSON model definition and deserialization. This also implicitly enforces the SPDX version since v3 is so different in format that the deserialization would fail before we ever got to a version check. Nonetheless, we make our expectations explicit by also parsing and verifying the spdxVersion value.

Expanding support for all SPDX 2.x was in scope for this feature, and was accomplished by eliminating the use of enums for deserialization (an earlier change) and adding to the JSON model all properties found in the SPDX 2.3 exemplar document (https://github.com/spdx/spdx-spec/blob/development/v2.3.1/examples/SPDXJSONExample-v2.3.spdx.json). SPDX 2.3 is backwards compatible with earlier 2.x versions. Note that this expanded SPDX version support *only* applies to the Validate Format and Redact functionality, *not* to Generate or Drop Validation. Adding all documented properties to the model also ensures that we do not lose data when deserializing/re-serializing after redaction.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #580      +/-   ##
==========================================
+ Coverage   58.81%   59.36%   +0.54%
==========================================
  Files         254      266      +12
  Lines        7894     8101     +207
  Branches      922      947      +25
==========================================
+ Hits         4643     4809     +166
- Misses       2834     2870      +36
- Partials      417      422       +5

View full report in Codecov by Sentry.
sfoslund approved these changes May 21, 2024
test/Microsoft.Sbom.Api.Tests/FormatValidator/FormatValidatorTests.cs
Remove unreferenced objects from FormatValidationService

Modify MultilineSummary method to initialize the SBOM if not yet initialized. Since the underlying document is lazy-loaded, all public methods should call Initialize before doing their work.
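
The lazy-loading rule in this commit message can be illustrated with a small sketch. It is written in Python for brevity and is not the code from this pull request; the class and method names (`FormatValidator`, `_initialize`, `is_valid`, `multiline_summary`) are hypothetical stand-ins.

```python
import json


class FormatValidator:
    """Hypothetical class; it does not mirror FormatValidationService."""

    def __init__(self, text):
        self._text = text
        self._doc = None  # parsed lazily on first use

    def _initialize(self):
        # Idempotent: the document is parsed once, however many
        # public methods call this.
        if self._doc is None:
            self._doc = json.loads(self._text)

    def is_valid(self):
        self._initialize()
        return "spdxVersion" in self._doc

    def multiline_summary(self):
        self._initialize()  # safe even when called before is_valid()
        return "\n".join(
            f"{key}: {type(value).__name__}" for key, value in self._doc.items()
        )
```

Because every public method calls the initializer first, callers do not need to know which method happens to trigger the parse.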
sfoslund approved these changes May 21, 2024
tarun06 pushed a commit to tarun06/sbom-tool that referenced this pull request Jul 21, 2024
* Wire up a validate-format verb that runs the format validation logic for a specified file. This is a hidden verb that I imagine being used during development, and in future by DRI to explore an SBOM or troubleshoot a validation failure.

* Fix PR comment - removing duplicated parameter validation code.

* Implement a format validation service in support of redacting the Files section from an SPDX 2.x format SBOM.

  This is a naive implementation that does *not* leverage the JSON object streaming mechanism that is used for drop validation. The reason for this is that it is fairly complex to amend that logic to customize the validation behavior. Since we are operating on a very tight timeline, the preliminary redaction implementation will rely on deserializing the entire JSON file at once, on the assumption that will cover a large proportion of our in-the-wild use cases. A follow-on investigation will be conducted to determine the practical limits of this technique and necessity for extending to support larger SBOMs.

  The criteria for a "valid" SBOM include:
  - well-formed JSON document
  - includes required SPDX elements
  - SPDX v2.x

  Validating SBOMs for SPDX elements covering the NTIA definition of an SBOM is out of scope for this change. Such validation may not be practical when processing 3P SBOMs since an SBOM creator could choose to use different attributes to satisfy the NTIA requirements.

  In context of redaction, we will skip deserializing the files section (since removing it is the purpose of redaction anyway - this is also expected to reduce the size of an SBOM by ~50%). We will also require Packages and Relationships, since we need to operate on those in order to successfully redact. These ignore/require expectations are set at runtime but currently hardcoded; making them configurable is out of scope for this feature but definitely feasible.

  The JSON validation and SPDX required elements are explicitly and implicitly enforced by the JSON model definition and deserialization. This also implicitly enforces the SPDX version since v3 is so different in format that the deserialization would fail before we ever got to a version check. Nonetheless, we make our expectations explicit by also parsing and verifying the spdxVersion value.

  Expanding support for all SPDX 2.x was in scope for this feature, and was accomplished by eliminating the use of enums for deserialization (an earlier change) and adding to the JSON model all properties found in the SPDX 2.3 exemplar document (https://github.com/spdx/spdx-spec/blob/development/v2.3.1/examples/SPDXJSONExample-v2.3.spdx.json). SPDX 2.3 is backwards compatible with earlier 2.x versions. Note that this expanded SPDX version support *only* applies to the Validate Format and Redact functionality, *not* to Generate or Drop Validation. Adding all documented properties to the model also ensures that we do not lose data when deserializing/re-serializing after redaction.

* Sanitize test strings to remove PII

  Remove unreferenced objects from FormatValidationService

  Modify MultilineSummary method to initialize the SBOM if not yet initialized. Since the underlying document is lazy-loaded, all public methods should call Initialize before doing their work.
Implement a format validation service in support of redacting the Files section from an SPDX 2.x format SBOM.
This is a naive implementation that does not leverage the JSON object streaming mechanism that is used for drop validation, because amending that logic to customize the validation behavior is fairly complex. Since we are operating on a very tight timeline, the preliminary redaction implementation will rely on deserializing the entire JSON file at once, on the assumption that this will cover a large proportion of our in-the-wild use cases. A follow-on investigation will be conducted to determine the practical limits of this technique and whether it needs to be extended to support larger SBOMs.
The criteria for a "valid" SBOM include:
- well-formed JSON document
- includes required SPDX elements
- SPDX v2.x
Validating SBOMs for SPDX elements covering the NTIA definition of an SBOM is out of scope for this change. Such validation may not be practical when processing 3P SBOMs since an SBOM creator could choose to use different attributes to satisfy the NTIA requirements.
In the context of redaction, we will skip deserializing the files section (since removing it is the purpose of redaction anyway - this is also expected to reduce the size of an SBOM by ~50%). We will also require Packages and Relationships, since we need to operate on those in order to successfully redact. These ignore/require expectations are set at runtime but are currently hardcoded; making them configurable is out of scope for this feature but definitely feasible.
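
As a rough illustration of the ignore/require expectations, here is a minimal Python sketch. It is not the implementation in this PR: the names `IGNORED_SECTIONS`, `REQUIRED_SECTIONS`, and `load_for_redaction` are hypothetical, and it assumes the top-level JSON value is an object. It also differs from the description above in one respect: Python's `json.loads` parses the whole document, so the files section is dropped after parsing rather than skipped during deserialization.

```python
import json

# Hypothetical stand-ins for the hardcoded expectations described above.
IGNORED_SECTIONS = frozenset({"files"})
REQUIRED_SECTIONS = frozenset({"packages", "relationships"})


def load_for_redaction(text, ignored=IGNORED_SECTIONS, required=REQUIRED_SECTIONS):
    """Parse an SBOM, drop ignored sections, and fail if a required one is absent."""
    doc = json.loads(text)  # assumes the top-level value is a JSON object
    missing = sorted(required - set(doc))
    if missing:
        raise ValueError(f"missing required sections: {', '.join(missing)}")
    # Every key outside the ignored set is carried over untouched, so
    # re-serializing the result does not lose unrelated properties.
    return {key: value for key, value in doc.items() if key not in ignored}
```

Passing the two sets as parameters is one way the expectations could later be made configurable without changing the function body.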
The JSON validation and SPDX required elements are explicitly and implicitly enforced by the JSON model definition and deserialization. This also implicitly enforces the SPDX version since v3 is so different in format that the deserialization would fail before we ever got to a version check. Nonetheless, we make our expectations explicit by also parsing and verifying the spdxVersion value.
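
The three validity criteria can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the model-based deserialization used in this PR: `REQUIRED_ELEMENTS` is an illustrative subset chosen for the example, and `validate_format` is a hypothetical helper. The version check assumes `spdxVersion` values of the form `SPDX-2.<minor>`, such as `SPDX-2.3`.

```python
import json
import re
from typing import List

# Illustrative subset only; the actual required elements are defined by
# the JSON model in the pull request.
REQUIRED_ELEMENTS = ("spdxVersion", "SPDXID", "name", "packages", "relationships")
SPDX_2X = re.compile(r"^SPDX-2\.\d+$")


def validate_format(text: str) -> List[str]:
    """Return a list of validation errors; an empty list means the SBOM passed."""
    try:
        doc = json.loads(text)  # whole-document parse, no streaming
    except json.JSONDecodeError as exc:
        return [f"not well-formed JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top-level JSON value is not an object"]

    errors = [
        f"missing required element: {key}"
        for key in REQUIRED_ELEMENTS
        if key not in doc
    ]
    # Explicit version check, in addition to whatever the parse implies.
    version = doc.get("spdxVersion")
    if version is not None and not (isinstance(version, str) and SPDX_2X.match(version)):
        errors.append(f"unsupported spdxVersion: {version!r}")
    return errors
```

Returning a list of errors rather than raising on the first one makes the helper usable for troubleshooting, since a single run reports every problem it finds.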
Expanding support for all SPDX 2.x (previously only SPDX 2.2.1) was in scope for this feature, and was accomplished by eliminating the use of enums for deserialization (an earlier PR) and adding to the JSON model all properties found in the SPDX 2.3 exemplar document (https://github.com/spdx/spdx-spec/blob/development/v2.3.1/examples/SPDXJSONExample-v2.3.spdx.json). SPDX 2.3 is backwards compatible with earlier 2.x versions. Note that this expanded SPDX version support only applies to the Validate Format and Redact functionality, not to Generate or Drop Validation. Adding all documented properties to the model also ensures that we do not lose data when deserializing/re-serializing after redaction.