Skip to content
This repository has been archived by the owner on Jul 11, 2023. It is now read-only.

feat(certificates) begin to abstract the cert manager patterns #4580

Merged
merged 1 commit into from
Mar 15, 2022

Conversation

steeling
Copy link
Contributor

@steeling steeling commented Mar 9, 2022

This change pulls out the shared logic between the cert providers into
a new manager struct. This is done in pieces, to avoid a single large PR
with O(1000)'s of LOC changed. The goal is to reduce all of the existing
"providers" to clients that only deal with certificates (no caching, msg broker,
logging, or timing).

Once all of these are gone, we will remove the certificate.Manager interface
and promote the certificate.manager struct to the exported certificate.Manager
(struct).

Eventually the rotor functionality will also be on the certificate.Manager. It looks like this
was done as an artifact of the current split. I had to move rotor.ShouldRotate to certificate
within this PR as opposed to later to avoid circular dependencies.

Signed-off-by: Sean Teeling [email protected]

Please answer the following questions with yes/no.

  1. Does this change contain code from or inspired by another project?
    • Did you notify the maintainers and provide attribution?

No

  1. Is this a breaking change?

No

Part of #4533

@steeling steeling force-pushed the feature/cert-manager-1 branch 2 times, most recently from a1ef803 to 6907a7e Compare March 9, 2022 04:43
@codecov-commenter
Copy link

codecov-commenter commented Mar 9, 2022

Codecov Report

Merging #4580 (f55309e) into main (2d5d065) will decrease coverage by 0.23%.
The diff coverage is 70.96%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4580      +/-   ##
==========================================
- Coverage   69.01%   68.78%   -0.24%     
==========================================
  Files         217      217              
  Lines       14760    14951     +191     
==========================================
+ Hits        10187    10284      +97     
- Misses       4521     4615      +94     
  Partials       52       52              
Flag Coverage Δ
unittests 68.78% <70.96%> (-0.24%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/certificate/types.go 100.00% <ø> (+100.00%) ⬆️
pkg/certificate/manager.go 65.21% <65.21%> (ø)
pkg/certificate/providers/config.go 76.22% <66.66%> (-0.54%) ⬇️
...icate/providers/certmanager/certificate_manager.go 28.70% <90.00%> (-50.90%) ⬇️
pkg/certificate/certificate.go 71.42% <100.00%> (+71.42%) ⬆️
...ertificate/providers/tresor/certificate_manager.go 79.42% <100.00%> (ø)
...certificate/providers/vault/certificate_manager.go 57.89% <100.00%> (ø)
pkg/certificate/rotor/rotor.go 83.78% <100.00%> (-3.18%) ⬇️
pkg/certificate/providers/certmanager/helpers.go 20.40% <0.00%> (-53.07%) ⬇️
... and 20 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2d5d065...f55309e. Read the comment docs.

Copy link
Member

@shashankram shashankram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the refactor makes sense to me. Since this is refactoring sensitive parts of the code pertaining to certificates, could you kindly describe in your commit && PR description the testing that was done? Since we use 24h expiration for short-lived certs by default and we do not have an e2e tests to verify cert rotation, so manually testing cert rotation should suffice. You can do this by changing the serviceCertValidityDuration install option to small duration such as 2m.

pkg/certificate/certificate.go Outdated Show resolved Hide resolved
pkg/certificate/manager.go Outdated Show resolved Hide resolved
pkg/certificate/manager.go Outdated Show resolved Hide resolved
pkg/certificate/manager.go Outdated Show resolved Hide resolved
pkg/certificate/manager.go Outdated Show resolved Hide resolved
pkg/certificate/manager.go Outdated Show resolved Hide resolved
pkg/certificate/providers/config.go Outdated Show resolved Hide resolved
Copy link
Contributor

@nojnhuh nojnhuh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes absolutely make sense, looking forward to seeing the rest.

pkg/certificate/manager.go Outdated Show resolved Hide resolved
Comment on lines 48 to 51
if cert.ShouldRotate() {
log.Trace().Msgf("Certificate found in cache but has expired SerialNumber=%s", cert.GetSerialNumber())
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is how it was before, but why doesn't this check the cert's expiration directly? ShouldRotate() will return true for a period while the cert has not yet expired.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getFromCache is called in IssueCertificate. I thought ShouldRotate() was used instead of the cert's expiration directly to avoid returning a certificate that would need to be rotated soon after being issued.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Jon here, I think returning a soon-to-expired cert is better than no cert.

Particularly, it allows us to bump up the ShouldRotate time to say 1/3 of total lifetime of cert, without affecting this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I spoke to soon. It may make sense to trigger a re-issue on certificates if it is expired, even on a GetCertificate call. Will dig through this a bit more, but since this PR isn't changing existing functionality I propose we don't block. Thoughts?

Will keep this open for further discussion

Comment on lines +112 to +115
m.msgBroker.GetCertPubSub().Pub(events.PubSubMessage{
Kind: announcements.CertificateRotated,
NewObj: newCert,
OldObj: oldCert,
}, announcements.CertificateRotated.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you have a plan for this already, but I'd be interested in seeing if there's a better way we can handle cert rotation events. As-is, there are several mistakes that can be made here:

  • You grab the wrong pubsub instance from the broker instead of GetCertPubSub()
  • You forget or use the wrong Kind in the message
  • You pass the wrong type to NewObj or OldObj which could cause a panic if the subscribers aren't checking for that
  • You forget or use the wrong topic for the extra arguments to .Pub()

I have a feeling something as simple as a plain channel or handler functions passed in (like for k8s informers) totally separate from the message broker would be a lot simpler to deal with while removing most or all of those concerns. At a minimum, much of that could be also be addressed by instead of exposing the pubsub instance directly from the broker, define our own .PublishRotatedCert(*Certificate, *Certificate) method on the broker that will handle those finer details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're going down to a single place where we do this, I'm think we can be less concerned about introducing bugs than the current state where there was one instance per cert provider.

I think you highlight a good refactoring opportunity for the msgbroker though.

It may make more sense to provide more targeted methods, ie: PublishCertRotation, or provide generic methods that type switch based on the incoming type. Haven't looked at this code too much though

pkg/certificate/manager.go Outdated Show resolved Hide resolved
pkg/certificate/manager.go Show resolved Hide resolved
pkg/certificate/manager.go Show resolved Hide resolved
@steeling steeling force-pushed the feature/cert-manager-1 branch 3 times, most recently from a70d7c0 to 95743fd Compare March 15, 2022 16:50
@steeling
Copy link
Contributor Author

Overall the refactor makes sense to me. Since this is refactoring sensitive parts of the code pertaining to certificates, could you kindly describe in your commit && PR description the testing that was done? Since we use 24h expiration for short-lived certs by default and we do not have an e2e tests to verify cert rotation, so manually testing cert rotation should suffice. You can do this by changing the serviceCertValidityDuration install option to small duration such as 2m.

Good call! Done both by adding an automated test, and verifying manually through the demo with cert-manager deployed, and the settings as you suggested. Certs are successfully being rotated and present in envoy configs

This change pulls out the shared logic between the cert providers into
a new manager struct. This is done in pieces, to avoid a single large PR
with O(1000)'s of LOC changed. The goal is to reduce all of the existing
"providers" to clients that only deal with certificates (no caching, msg broker,
logging, or timing).

Once all of these are gone, we will remove the certificate.Manager interface
and promote the certificate.manager struct to the exported certificate.Manager
(struct).

Signed-off-by: Sean Teeling <[email protected]>
@steeling steeling force-pushed the feature/cert-manager-1 branch from 95743fd to ff49723 Compare March 15, 2022 21:30
@nojnhuh nojnhuh merged commit caaa189 into openservicemesh:main Mar 15, 2022
nshankar13 pushed a commit to nshankar13/osm that referenced this pull request Apr 26, 2022
…ervicemesh#4580)

This change pulls out the shared logic between the cert providers into
a new manager struct. This is done in pieces, to avoid a single large PR
with O(1000)'s of LOC changed. The goal is to reduce all of the existing
"providers" to clients that only deal with certificates (no caching, msg broker,
logging, or timing).

Once all of these are gone, we will remove the certificate.Manager interface
and promote the certificate.manager struct to the exported certificate.Manager
(struct).

Signed-off-by: Sean Teeling <[email protected]>
nshankar13 pushed a commit to nshankar13/osm that referenced this pull request Apr 26, 2022
…ervicemesh#4580)

This change pulls out the shared logic between the cert providers into
a new manager struct. This is done in pieces, to avoid a single large PR
with O(1000)'s of LOC changed. The goal is to reduce all of the existing
"providers" to clients that only deal with certificates (no caching, msg broker,
logging, or timing).

Once all of these are gone, we will remove the certificate.Manager interface
and promote the certificate.manager struct to the exported certificate.Manager
(struct).

Signed-off-by: Sean Teeling <[email protected]>
Signed-off-by: nshankar13 <[email protected]>
jaellio pushed a commit that referenced this pull request Apr 26, 2022
Builds on the work done in #4580 - this removes the old
Manager interface, and exports the new manager interface,
adds fake_manager.go to help with testing where needed
- as the previous mockCertManager could not be used
anymore. Rotor was moved into manager.go and is started
whenever NewManager() is called.

Signed-off-by: Sarah Christoff <[email protected]>
Co-authored-by: Jon Huhn <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants