Move dependency tree storage from the database to S3 #48
Merged
Storing the SBOM dependency tree in the database turned out to be the wrong decision because of performance issues at scale. Previous changes to improve performance reduced the usage of the dependency table to just generating SBOM reports. This change moves the storage of the dependency tree from the database to S3, which removes the need to deconstruct and reconstruct the tree and the overhead that goes along with that. The S3 key is structured so that other SBOM file formats, such as SPDX or CycloneDX, could also be stored alongside.

- Update engine SBOM processing to write the dependency tree to a JSON file in S3 instead of the database. The dependency tree is still processed in order to store component and license information in the database.
- Update the sbom_report Lambda to pull the dependency tree JSON file from S3. If the file is not found in S3, it falls back to pulling the tree from the database. This allows for the gradual migration of the dependency tree data from the database to S3 as new scans are run and old scans are purged by the db_cleanup Lambda.
- Update the db_cleanup Lambda to identify and remove dependency files that were orphaned when their associated scans were deleted. Deleting scans via the ORM will clean up the dependency files from S3. This is a backstop in case a scan is deleted directly or something else prevents the cleanup at deletion time from succeeding.
- Update the localstack config to add an S3 bucket that can store dependency tree files during local testing, and update AWSConnect in artemislib so that it can be configured to use this S3 bucket for scan data.
- Update IAM permissions in the Terraform configuration so that the right things can read and write to the scans/ portion of the S3 bucket.
- Add the sbom_dependency_migration utility to migrate the dependency trees of existing scans from the database to S3. This is useful for testing and also if there are key scans that need the performance improvement and can't wait for the scan replacement and cleanup process.

Unrelated to the SBOM dependency changes but included out of necessity:

- Pin urllib3 version to 1.x because of a compatibility issue with botocore: boto/botocore#2926
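The S3-first read with a database fallback, and the orphan check used as a cleanup backstop, can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the key layout, the function names, and the injected `fetch_object` / `load_from_db` callables are all assumptions, chosen so the sketch runs without AWS access.

```python
import json
from typing import Callable, Iterable, List, Optional


def dependency_tree_key(scan_id: str, fmt: str = "dependencies") -> str:
    # Hypothetical key layout. The PR only states that keys live under
    # scans/ and leave room for other SBOM formats; the path segments
    # used here are assumed for illustration.
    return f"scans/{scan_id}/sbom/{fmt}.json"


def load_dependency_tree(
    scan_id: str,
    fetch_object: Callable[[str], Optional[bytes]],
    load_from_db: Callable[[str], list],
) -> list:
    """Return the tree from object storage, falling back to the database."""
    body = fetch_object(dependency_tree_key(scan_id))
    if body is not None:
        return json.loads(body)
    # File not found: the scan predates the move to S3, so use the DB copy.
    return load_from_db(scan_id)


def find_orphaned_keys(
    stored_keys: Iterable[str], live_scan_ids: Iterable[str]
) -> List[str]:
    """Return stored keys whose scan ID no longer exists."""
    live = set(live_scan_ids)
    # With the assumed layout, the scan ID is the second path segment.
    return [key for key in stored_keys if key.split("/")[1] not in live]
```

A plain dict can stand in for the bucket when trying this out:

```python
bucket = {dependency_tree_key("scan-1"): json.dumps([{"name": "requests"}]).encode()}
tree = load_dependency_tree("scan-1", bucket.get, lambda scan_id: [])
```

Passing the storage and database accessors in as callables keeps the fallback logic testable without localstack; in a real Lambda, `fetch_object` would wrap the S3 client and translate a missing-key error into `None`.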
davakos
reviewed
May 10, 2023
- Do any new tests need to be added here, such as to test SBOM load from file or migration?
- Will this break any of the existing search APIs that use component data?
backend/terraform/modules/analyzer/modules/engine_cluster/permissions.tf
backend/utilities/sbom_dependency_migration/sbom_dependency_migration/main.py
davakos
approved these changes
May 10, 2023
jlegarreta
reviewed
May 10, 2023