Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: KMP periodic retrieval with k8s requeue #1625

Merged
merged 13 commits into from
Aug 16, 2024

Conversation

duffney
Copy link
Contributor

@duffney duffney commented Jul 10, 2024

Description

Implements a periodic refresh for KMP resources deployed to Kubernetes by using the controller's re-queue functionality.

What this PR does / why we need it:

Adds two new properties to the KMP CRD spec Interval and Refreshable. Interval determines the amount of time in-between refreshes. Refreshable determines if the resources is eligible for refresh.

Logic from the controller's reconcile method has been abstracted into a new refresh interface. TODO: more details

Which issue(s) this PR fixes (optional, using fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when the PR gets merged):

Fixes #1131

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Helm Chart Change (any edit/addition/update that is necessary for changes merged to the main branch)
  • This change requires a documentation update

How Has This Been Tested?

Unit Tests:

  • CRs with nonrefreshable provider
  • CRs with refreshable provider using default interval (disabled)
  • CRs with an invalid interval
  • CRs with user provided interval
  • CRs with user updated interval

Manual Tests:

  • Key Management Provider with Azure Key Vault Certificate (Version not specified)
apiVersion: config.ratify.deislabs.io/v1beta1
kind: KeyManagementProvider
metadata:
  name: keymanagementprovider-akv
spec:
  type: azurekeyvault
  interval: 1m
  parameters:
    vaultURI: $AKV_URI
    certificates:
      - name: $certName
    tenantID: $tenantID
    clientID: $clientID

Behavior:

  • When the certificate is rotated (new version created) the CR errors unable to pull secrets w/ disabled state
    • pkg/keymanagementprovider/azurekeyvault/provider.go line:147
      disabled-causes-error
  • After refresh certificate version is updated
    cert-refresh-after-rotation
  • artifacts with old cert signatures fail verification (expected)
  • Verification was successful after artifacts were resigned with new certificate 
  • Unsure if previous versions of the certificate are removed from certs & keys maps

  • Key Management Provider with Azure Key Vault Certificate (with Specified Version)
apiVersion: config.ratify.deislabs.io/v1beta1
kind: KeyManagementProvider
metadata:
  name: kmpprovider-akv-versioned
spec:
  type: azurekeyvault
  refreshinterval: 1m
  parameters:
    vaultURI: $AKV
    certificates:
      - name: $certName 
        version: $version
    tenantID: $tenantID
    clientID: $clientID

Behavior:

  • When the KMP is configured with a versioned certificate refreshing the resource does not update the version of the certificate stored in the resource. For example. if a new latest version of the certificate the resource will not update to use the latest version.

Screenshot from 2024-08-09 09-13-20
Screenshot from 2024-08-09 09-13-04

  • Does verification fail when the cert or key is disabled?
    • No, when a the KMP is created with an enabled cert that later becomes disabled it remains in the cert/key maps and allows for verification to succeed. See [Bug] Disabled certificates pass verification  #1699 for details.
    • Yes, when a KMP is created with a disabled certificate the provider returns an error before the cert/key maps are populated.

E2E

  • Validation with refreshed identity permission
  • Validation with refreshed cert/key
  • Validation with disabled cert/key

Checklist:

  • Does the affected code have corresponding tests?
  • Are the changes documented, not just with inline documentation, but also with conceptual documentation such as an overview of a new feature, or task-based documentation like a tutorial? Consider if this change should be announced on your project blog.
  • Does this introduce breaking changes that would require an announcement or bumping the major version?
  • Do all new files have appropriate license header?

Post Merge Requirements

  • MAINTAINERS: manually trigger the "Publish Package" workflow after merging any PR that indicates Helm Chart Change

@akashsinghal akashsinghal changed the title KMP periodic retrieval with Kubernetes requeue feat: KMP periodic retrieval with k8s requeue Jul 10, 2024
@duffney duffney force-pushed the issue-1131/refresher branch from dae0f30 to 234e835 Compare July 17, 2024 14:10
@susanshi
Copy link
Collaborator

Would like to understand the scope of this PR. If docs and e2e tests will be included in this PR

@susanshi
Copy link
Collaborator

susanshi commented Jul 18, 2024

Please complete test coverage section once PR is ready for Review. Here are some suggestion, let us know if you have any questions, thanks!

  • CRs ( various provider) with default provider interval
  • CRs with user provided interval
  • CRs with user updated interval
  • Validation with refreshed identity permission
  • Validation with refreshed cert/key
  • Validation with disabled cert/key
  • backward compat: what's the behavior on existing kmp

@duffney
Copy link
Contributor Author

duffney commented Jul 23, 2024

Would like to understand the scope of this PR. If docs and e2e tests will be included in this PR

My thinking was this PR would include the changes to the KMP resource that enabled the refresh possible for clustered and namedspaced resources of the KMP. I was thinking the e2e tests would be a separate PR but, I don't have a strong opinion. So, if it's preferred to have them in this PR, that's fine with me also.

@susanshi
Copy link
Collaborator

I think e2e can be a separate PR if this PR is covered by manual testing. @duffney , do you think this PR will be ready for review this week?

@duffney duffney force-pushed the issue-1131/refresher branch from ca14514 to 7f195ad Compare July 24, 2024 14:11
@duffney
Copy link
Contributor Author

duffney commented Jul 24, 2024

I think e2e can be a separate PR if this PR is covered by manual testing. @duffney , do you think this PR will be ready for review this week?

@susanshi the PR is ready for another around of review. :) I just pushed changes that add the refresh interface to namespaced resources. I'll work on manually testing these changes and documenting the steps. That'll give me a better idea of how to build the e2e test for the next PR.

@duffney duffney force-pushed the issue-1131/refresher branch 3 times, most recently from a237ec3 to ef0ec99 Compare July 30, 2024 00:01
@susanshi
Copy link
Collaborator

susanshi commented Jul 30, 2024

discussed in PR review , for 1.3 default to empty string ( disable refresh) , in 1.4, once we support N version of cert , default to 1min. (Note the breaking scenario of disabled key/cert)

@yizha1
Copy link
Collaborator

yizha1 commented Jul 30, 2024

@susanshi is the default to 1 min too short? The key/cert is not likely to be rotated in such a short period normally.

@duffney duffney force-pushed the issue-1131/refresher branch from ef0ec99 to 20f09ea Compare July 30, 2024 19:04
@duffney
Copy link
Contributor Author

duffney commented Jul 30, 2024

CRs with user updated interval

@susanshi, can you elaborate on what you're meaning by this test? Does this mean that the user defined and interval and then replaced it with a different value later on?

@susanshi
Copy link
Collaborator

@susanshi is the default to 1 min too short? The key/cert is not likely to be rotated in such a short period normally.

Hi Yi, the scenario in mind how long does customer have to wait for the refresh. In scenarios like changes in permission role assignment, customer might want to wait short amount of time to validate that the cert retrieval succeeded. KV providers have thousands of transaction limit per 10 sec, there is no concern from the request throttling side.

@susanshi susanshi marked this pull request as ready for review August 1, 2024 01:40
@susanshi
Copy link
Collaborator

susanshi commented Aug 2, 2024

CRs with user updated interval

@susanshi, can you elaborate on what you're meaning by this test? Does this mean that the user defined and interval and then replaced it with a different value later on?

thanks for confirming @joshuaphelpsms. Yes that is correct, exactly , "the user defined and interval and then replaced it with a different value later on"

@akashsinghal
Copy link
Collaborator

@duffney thanks for PR. Overall looks good. Do you think we need to add a new interval value in the helm chart for AKV?

@duffney
Copy link
Contributor Author

duffney commented Aug 8, 2024

@duffney thanks for PR. Overall looks good. Do you think we need to add a new interval value in the helm chart for AKV?

You're most welcome! And yes, adding a refreshinterval to the helm chart makes a lot of sense to me. Having it there prevents the user from having to redeploy the CRD configuration to enable refreshing. Would that involve adding a new value to charts/ratify/values.yml and adding an if to the kmp templates?

@duffney duffney force-pushed the issue-1131/refresher branch from a0540d2 to 060fd3e Compare August 9, 2024 17:50
@akashsinghal
Copy link
Collaborator

@duffney thanks for PR. Overall looks good. Do you think we need to add a new interval value in the helm chart for AKV?

You're most welcome! And yes, adding a refreshinterval to the helm chart makes a lot of sense to me. Having it there prevents the user from having to redeploy the CRD configuration to enable refreshing. Would that involve adding a new value to charts/ratify/values.yml and adding an if to the kmp templates?

@duffney yes that's correct. Maybe you can add the refresh interval for AKV KMP here:

azurekeyvault:
. You would also update the default AKV KMP template and add the new value to the README of the charts folder

@susanshi
Copy link
Collaborator

please review factory at , 85d7006

@duffney duffney force-pushed the issue-1131/refresher branch from 08d651f to 1e59e53 Compare August 13, 2024 17:23
Copy link
Collaborator

@akashsinghal akashsinghal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for addressing my comments!

Copy link
Collaborator

@binbin-li binbin-li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks for adding the feature and unit tests!

@binbin-li binbin-li enabled auto-merge (squash) August 16, 2024 08:00
duffney and others added 12 commits August 16, 2024 05:55
feat: refresher interface

fix: formatting and license

test: kuberefresher

test: kuberefresher helpers

refactor: IsRefreshable KMP interface method

fix: add IsRefreshable method

refactor: kmp spec interval as string

mod: rm todo

mod: rm comment

style: use embedded type

feat: namespaced refresher interface

test: mock factory & test refreshable logic

refactor: use kubeRefreshNamespaced in controller

mod: add license blocks

refactor: kmp.spec.interval default to empty string disabling refresh

tests: increase refresher coverage

refactor(kmp): use factory for creating refresher objects

This change introduces a RefresherFactory that is responsible for creating the
appropriate refresher object based on a configuration that's passed in. This decouples the
reconcile logic from the object creation, making the code more modular and easier to test

mod: rename interval to refreshinterval & adds crd descriptions

mod: remove verbose logging from refresher

docs: kmp refresh interval samples

doc: corrected resource name of akv samples

fix: kmp spec json tag using incorrect values & add refresh interval description

fix: add licenses & update interval name in tests

mod: comments for exported interfaces

test: improve refresher test coverage
Signed-off-by: Joshua Duffney <[email protected]>
Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.17.0 to 0.17.1.
- [Release notes](https://github.com/anchore/sbom-action/releases)
- [Commits](anchore/sbom-action@d94f46e...ab9d16d)

---
updated-dependencies:
- dependency-name: anchore/sbom-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Joshua Duffney <[email protected]>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.26.0 to 3.26.1.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@eb055d7...29d86d2)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Joshua Duffney <[email protected]>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.26.1 to 3.26.2.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@29d86d2...429e197)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Joshua Duffney <[email protected]>
auto-merge was automatically disabled August 16, 2024 10:59

Head branch was pushed to by a user without write access

@duffney duffney force-pushed the issue-1131/refresher branch from 3f2eaef to 7a89eac Compare August 16, 2024 10:59
@binbin-li binbin-li merged commit ad92eb8 into ratify-project:dev Aug 16, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No retry mechanism the cert fetch
6 participants