Move dependency tree storage from the database to S3 #48

Merged (2 commits) on May 10, 2023

@pizen pizen commented May 10, 2023

Description

  • Update engine SBOM processing to write the dependency tree to a JSON file in S3 instead of the database. The dependency tree is still processed in order to store component and license information in the database.
  • Update sbom_report Lambda to pull the dependency tree JSON file from S3. If the file is not found in S3 it falls back to pulling the tree from the database. This allows for the gradual migration of the dependency tree data from the database to S3 as new scans are run and old scans are purged by the db_cleanup Lambda.
  • Update the db_cleanup Lambda to identify and remove dependency files that were orphaned when their associated scans were deleted. Deleting scans via the ORM cleans up the dependency files from S3; this is a backstop in case a scan is deleted directly or something else prevents the cleanup at deletion time from succeeding.
  • Update localstack config to add an S3 bucket that can store dependency tree files during local testing and update AWSConnect in artemislib so that it can be configured to use this S3 bucket for scan data.
  • Update IAM permissions in Terraform configuration so that the right things can read and write to the scans/ portion of the S3 bucket.
  • Add sbom_dependency_migration utility to migrate the dependency trees of existing scans from the database to S3. This is useful for testing and for key scans that need the performance improvement and can't wait for the scan replacement and cleanup process.
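
The S3-first read with database fallback and the orphan cleanup backstop described above could look roughly like the sketch below. It is not the code from this PR: the key layout in `dependency_key`, the function names, and the in-memory dictionaries standing in for the S3 bucket and the dependency table are all assumptions made for illustration. The only detail taken from the description is that the files live under `scans/`.

```python
import json
from typing import Callable, Iterable, Optional


def dependency_key(scan_id: str) -> str:
    # Hypothetical key layout; the PR only states that files live under scans/.
    return f"scans/{scan_id}/sbom/dependencies.json"


def load_dependency_tree(
    scan_id: str,
    read_s3: Callable[[str], Optional[bytes]],
    read_db: Callable[[str], list],
) -> list:
    """Prefer the JSON file in S3; fall back to the database if it is missing."""
    raw = read_s3(dependency_key(scan_id))
    if raw is not None:
        return json.loads(raw)
    # The scan predates the migration, so use the tree stored in the database.
    return read_db(scan_id)


def find_orphaned_keys(keys: Iterable[str], live_scan_ids: set) -> list:
    """Return keys under scans/ whose scan ID no longer exists."""
    orphans = []
    for key in keys:
        parts = key.split("/")
        # Expect scans/<scan_id>/...; ignore keys that do not match.
        if len(parts) >= 3 and parts[0] == "scans" and parts[1] not in live_scan_ids:
            orphans.append(key)
    return orphans


# In-memory stand-ins for the bucket and the dependency table (illustrative only).
bucket = {
    dependency_key("scan-new"): json.dumps([{"name": "requests", "deps": []}]).encode()
}
database = {"scan-old": [{"name": "flask", "deps": []}]}


def read_db(scan_id: str) -> list:
    return database.get(scan_id, [])


from_s3 = load_dependency_tree("scan-new", bucket.get, read_db)
from_db = load_dependency_tree("scan-old", bucket.get, read_db)
orphans = find_orphaned_keys(
    [dependency_key("scan-new"), dependency_key("scan-deleted")],
    live_scan_ids={"scan-new", "scan-old"},
)
```

Passing the readers in as callables keeps the fallback logic testable without AWS access; in a deployed Lambda they would presumably wrap an S3 client and the ORM.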

Unrelated to the SBOM dependency changes but included out of necessity:

  • Pin urllib3 version to 1.x because of a compatibility issue with botocore: boto/botocore#2926

Motivation and Context

Storing the SBOM dependency tree in the database turned out not to be the right decision because of performance issues at scale. Previous changes to improve performance reduced the usage of the dependency table to just generating SBOM reports. This change moves the storage of the dependency tree from the database to S3, which removes the need to deconstruct and reconstruct the tree along with the overhead that goes with it. The S3 key is structured so that other SBOM file formats, such as SPDX or CycloneDX, could also be stored alongside.
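
As a rough illustration of why a single JSON object avoids the deconstruct/reconstruct step, the sketch below serializes a nested tree once and reads it back unchanged. The key layout in `sbom_key`, the format names, and the `name`/`version`/`deps` fields are hypothetical; the PR does not spell out the actual key structure or schema.

```python
import json


def sbom_key(scan_id: str, fmt: str = "dependencies") -> str:
    # Hypothetical layout: one prefix per scan, one object per SBOM format.
    return f"scans/{scan_id}/sbom/{fmt}.json"


def serialize_tree(tree: list) -> bytes:
    # The nested structure is written as-is; no flattening into rows.
    return json.dumps(tree, separators=(",", ":")).encode("utf-8")


# Illustrative tree; field names and versions are made up for the example.
tree = [
    {
        "name": "flask",
        "version": "2.3.2",
        "deps": [{"name": "werkzeug", "version": "2.3.3", "deps": []}],
    }
]

body = serialize_tree(tree)
restored = json.loads(body)  # reading back yields the same nested structure

# Other formats could sit next to the dependency tree under the same prefix.
keys = [sbom_key("1234", fmt) for fmt in ("dependencies", "spdx", "cyclonedx")]
```

Grouping every object for a scan under one prefix also means cleanup can target that prefix when the scan is deleted.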

How Has This Been Tested?

  • Local testing
  • Tested in live environment

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code conforms to the coding standards.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Unrelated to the SBOM dependency changes but included out of necessity:
- Pin urllib3 version to 1.x because of a compatibility issue with
  botocore: boto/botocore#2926

@davakos davakos left a comment

  1. do any new tests need to be added here, such as to test sbom load from file or migration?
  2. will this break any of the existing search APIs that use component data?

pizen commented May 10, 2023

  1. do any new tests need to be added here, such as to test sbom load from file or migration?
  2. will this break any of the existing search APIs that use component data?

@davakos

  1. I think those would be integration tests that have dependencies that make them outside of the scope of this repository.
  2. No, this does not affect component data.

@pizen pizen merged commit cdcebee into main May 10, 2023
@pizen pizen deleted the pizen/sbom_dependencies branch May 10, 2023 17:06
@breedenc breedenc mentioned this pull request Dec 8, 2023