Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Certgen should generate secrets that are compatible with cert-manager #2494

Closed
jpeach opened this issue May 4, 2020 · 12 comments · Fixed by #2547
Closed

Certgen should generate secrets that are compatible with cert-manager #2494

jpeach opened this issue May 4, 2020 · 12 comments · Fixed by #2547
Assignees
Labels
area/deployment Issues or PRs related to deployment tooling or infrastructure. kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes.

Comments

@jpeach
Copy link
Contributor

jpeach commented May 4, 2020

As noted by @tsaarni in #2149, the secret that certgen generates isn't interchangeable with how cert-manager generates secrets.

Maybe we can update certgen to be compatible with cert-manager, which would make it easier for sites to switch.

@jpeach jpeach added kind/feature Categorizes issue or PR as related to a new feature. area/deployment Issues or PRs related to deployment tooling or infrastructure. labels May 4, 2020
@tsaarni
Copy link
Member

tsaarni commented May 12, 2020

Aligning with cert-manager would fix the problem between certgen and cert-manager.

Since there is no standard, the way how secrets are generated is implementation specific and cert-manager is just another take on this, just like certgen. Aligning would allow us to support both certgen and cert-manager, but obviously it still isn't a generic solution - but I doubt there is good generic solutions for this.

Depending how people are using certgen and/or example manifest, there can be backwards compatibility issues to be considered if doing the change.

@jpeach
Copy link
Contributor Author

jpeach commented May 14, 2020

This patch switches contour certgen over to generating standard TLS secrets compatible with cert-manager.

As @tsaarni points out, the trick here is migrating upgrading existing clusters. Since the certgen job only runs once, the xDS secrets from the previous version will not be updated. This means that any changes we make to the deployment will also break since the files referenced by the command flags won't be present.

Since the new command doesn't generate a separate secret for the CA cert, online migration is a problem. The only alternative I can think of is forcing the updated certgen job to run (maybe by adding a version suffix to the job name). This will also force an envoy restart, which might not be desirable.

/cc @youngnick

@youngnick
Copy link
Member

I don't think there's any way around this being a breaking change. The only way here is to just make it all explicit:

  • certgen Job gets a new name
  • secrets get new names
  • Contour deployment and Envoy Daemonset updated to use new secrets

Upgrade instructions include clear direction about what the difference is, why it's been done, why you won't need to do this again (ie now the certs are compatible with cert-manager, can be rotated transparently, etc), and that upgrading requires restarts of Contour and Envoy,

Then, instructions on how to clean up the old stuff.

That way, operators who are using our examples directly will understand what they need to do, and others who aren't will have a clear picture of the steps to do.

@jpeach
Copy link
Contributor Author

jpeach commented May 15, 2020

OK I went down a bit of a rabbit hole on this. You can't use generateName on the certgen job as per kubernetes/kubernetes#44501. However if we publish YAML via kustomize, we can use that to make an equivalent change. If we did this, then certgen could run on each Contour install. Although the TTL Controller is probably not enabled on most clusters, I expect operators would be able to deal with one extra Job object per month.

However, in the kustomize PR, since we place certgen in a different base (to make it more easily decoupled), it's not so easy to change the name of the secrets (which need to be consistent across the certgen, contour and envoy kustomize bases). However, if xDS certificate rotation is working well after landing all of @tsaarni 's changes, maybe each certgen run could overwrite the secrets, which would be a general improvement.

@jpeach
Copy link
Contributor Author

jpeach commented May 18, 2020

I'm working on this, because we probably need to rationalize this in order to ship a release where operators can reload xDS certificates in a reasonable way (i.e. without having to vendor large chunks of Contour YAML).

Basically working, barring a race around doing the update.

  • There's some issue in my kind setup where it doesn't run the right Contour version in the certgen job which means that the new --overwrite flag doesn't work
  • If the new Secret isn't laid down when Envoy is restarted, then it will never pick it up because th bootstrap config is rejected
[2020-05-18 07:14:46.193][1][debug][misc] [source/common/filesystem/posix/filesystem_impl.cc:139] Unable to determine canonical path for /certs/ca.crt: No such file or directory
[2020-05-18 07:14:46.194][1][warning][config] [source/common/config/filesystem_subscription_impl.cc:40] Filesystem config update rejected: Invalid path: /certs/ca.crt

@tsaarni
Copy link
Member

tsaarni commented May 18, 2020

I guess it could be rather complicated to have Envoy started, but remain in a special "uninitialized" state, waiting for cert/key files to appear. Having a new secret name would be easy way to "synchronize" and keep Envoy pods from starting before secret with the a new file layout is created.

However, in the kustomize PR, since we place certgen in a different base (to make it more easily decoupled), it's not so easy to change the name of the secrets (which need to be consistent across the certgen, contour and envoy kustomize bases).

I did not understand it fully, can you explain it a bit more? Aren't all manifest components expected to come from single Contour version and combined together before applying?

@jpeach
Copy link
Contributor Author

jpeach commented May 18, 2020

I did not understand it fully, can you explain it a bit more? Aren't all manifest components expected to come from single Contour version and combined together before applying?

In the kustomize proposal, the Contour deployment is broken into 4 components:

  • base types
  • Contour
  • certgen
  • envoy

The problem is that the xDS secret needs to be configured for both contour and envoy components, leading to an implicit coupling. In kustomize, there is syntax to copy the value of one field into other fields, but that has to be done in the base, and can't retro-actively be applied when you consume the base. You would have to do it all with JSON patches or something.

I guess it could be rather complicated to have Envoy started, but remain in a special "uninitialized" state, waiting for cert/key files to appear.

The bootstrap can actually do this, because it should know whether the files are available.

@tsaarni
Copy link
Member

tsaarni commented May 19, 2020

The problem is that the xDS secret needs to be configured for both contour and envoy components, leading to an implicit coupling. In kustomize, there is syntax to copy the value of one field into other fields, but that has to be done in the base, and can't retro-actively be applied when you consume the base.

I thought that the secret would just be renamed and those new names would still remain hardcoded in contour and envoy volume mounts like they are today (even in the kustomize PR pod specs). The change in certgen code, contour & envoy kustomize components would be done at once.

But I think I'm still missing some complexity here. Why it becomes necessary to retro-actively modify / parametrize the kustomize step?

The bootstrap can actually do this, because it should know whether the files are available.

Envoy knows when files become available and in theory it could lazily initialize xDS upstream cluster at that point. The outcome would be like a chain of dependencies in pending state:

  1. xDS cluster is uninitialized and pending its dependencies
  2. TLS transport socket is uninitialized and pending its dependencies
  3. SDS and the file watch is pending its dependencies (actual cert/key files)

I think currently the file watch is simply associated to reloading cert/key files into existing SSL context and there might not be mechanism to do lazy initialization like this.

@tsaarni
Copy link
Member

tsaarni commented May 19, 2020

Ah sorry, do you mean contour bootstrap would also introduce a file watcher?

@jpeach
Copy link
Contributor Author

jpeach commented May 19, 2020

The problem is that the xDS secret needs to be configured for both contour and envoy components, leading to an implicit coupling. In kustomize, there is syntax to copy the value of one field into other fields, but that has to be done in the base, and can't retro-actively be applied when you consume the base.

I thought that the secret would just be renamed and those new names would still remain hardcoded in contour and envoy volume mounts like they are today (even in the kustomize PR pod specs). The change in certgen code, contour & envoy kustomize components would be done at once.

But I think I'm still missing some complexity here. Why it becomes necessary to retro-actively modify / parametrize the kustomize step?

What I'm saying is that it is hard to do the transform after the fact in kustomize. We can just change the original sources, but in the original statement I was thinking about how we would do it after the fact.

The bootstrap can actually do this, because it should know whether the files are available.

Envoy knows when files become available and in theory it could lazily initialize xDS upstream cluster at that point. The outcome would be like a chain of dependencies in pending state:

  1. xDS cluster is uninitialized and pending its dependencies
  2. TLS transport socket is uninitialized and pending its dependencies
  3. SDS and the file watch is pending its dependencies (actual cert/key files)

I think currently the file watch is simply associated to reloading cert/key files into existing SSL context and there might not be mechanism to do lazy initialization like this.

We don't need to change envoy. I have a 5 line change to the bootstrapped that exits if the files aren't already present. There's no point continuing because envoy will reject the static config update.

@tsaarni
Copy link
Member

tsaarni commented May 19, 2020

Thanks for explaining :)

If we are willing to do non-backwards compatible change, would it be feasible to simply rename the secret in original sources?

@jpeach
Copy link
Contributor Author

jpeach commented May 19, 2020

Thanks for explaining :)

If we are willing to do non-backwards compatible change, would it be feasible to simply rename the secret in original sources?

We don't even need to do that. I have a working change to fix everything, but it does require generating a unique name for the certgen job.

@jpeach jpeach added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label May 21, 2020
jpeach added a commit to jpeach/contour that referenced this issue May 21, 2020
This change adds new options to certgen that will allow it to overwrite
existing Kubernetes Secrets, and to generate Secrets in a format that
is compatible with cert-manager (we call this "compact" format).

The `--overwrite` flag can be used to let certgen update existing secrets,
and this will enable both the transition to the new format and certificate
rotation on a per-release granularity.

The `--secrets-format` flag switches the Secrets output format from the
current format to the compatible "compact" format. Once this is enabled
in the deployment YAML, operators will easily be able to substitute
cert-manager certificates for certgen certificates.

Finally, the bootstrap command is updated to ensure that the specified
certificate files are actually present. This prevents Envoy starting up
and rejecting the static configuration, which is unrecoverable.

This fixes projectcontour#2494.

Signed-off-by: James Peach <[email protected]>
jpeach added a commit to jpeach/contour that referenced this issue May 24, 2020
This change adds new options to certgen that will allow it to overwrite
existing Kubernetes Secrets, and to generate Secrets in a format that
is compatible with cert-manager (we call this "compact" format).

The `--overwrite` flag can be used to let certgen update existing secrets,
and this will enable both the transition to the new format and certificate
rotation on a per-release granularity.

The `--secrets-format` flag switches the Secrets output format from the
current format to the compatible "compact" format. Once this is enabled
in the deployment YAML, operators will easily be able to substitute
cert-manager certificates for certgen certificates.

Finally, the bootstrap command is updated to ensure that the specified
certificate files are actually present. This prevents Envoy starting up
and rejecting the static configuration, which is unrecoverable.

This fixes projectcontour#2494.

Signed-off-by: James Peach <[email protected]>
jpeach added a commit to jpeach/contour that referenced this issue May 26, 2020
This change adds new options to certgen that will allow it to overwrite
existing Kubernetes Secrets, and to generate Secrets in a format that
is compatible with cert-manager (we call this "compact" format).

The `--overwrite` flag can be used to let certgen update existing secrets,
and this will enable both the transition to the new format and certificate
rotation on a per-release granularity.

The `--secrets-format` flag switches the Secrets output format from the
current format to the compatible "compact" format. Once this is enabled
in the deployment YAML, operators will easily be able to substitute
cert-manager certificates for certgen certificates.

Finally, the bootstrap command is updated to ensure that the specified
certificate files are actually present. This prevents Envoy starting up
and rejecting the static configuration, which is unrecoverable.

This fixes projectcontour#2494.

Signed-off-by: James Peach <[email protected]>
jpeach added a commit to jpeach/contour that referenced this issue May 26, 2020
This change adds new options to certgen that will allow it to overwrite
existing Kubernetes Secrets, and to generate Secrets in a format that
is compatible with cert-manager (we call this "compact" format).

The `--overwrite` flag can be used to let certgen update existing secrets,
and this will enable both the transition to the new format and certificate
rotation on a per-release granularity.

The `--secrets-format` flag switches the Secrets output format from the
current format to the compatible "compact" format. Once this is enabled
in the deployment YAML, operators will easily be able to substitute
cert-manager certificates for certgen certificates.

Finally, the bootstrap command is updated to ensure that the specified
certificate files are actually present. This prevents Envoy starting up
and rejecting the static configuration, which is unrecoverable.

This fixes projectcontour#2494.

Signed-off-by: James Peach <[email protected]>
jpeach added a commit that referenced this issue May 26, 2020
This change adds new options to certgen that will allow it to overwrite
existing Kubernetes Secrets, and to generate Secrets in a format that
is compatible with cert-manager (we call this "compact" format).

The `--overwrite` flag can be used to let certgen update existing secrets,
and this will enable both the transition to the new format and certificate
rotation on a per-release granularity.

The `--secrets-format` flag switches the Secrets output format from the
current format to the compatible "compact" format. Once this is enabled
in the deployment YAML, operators will easily be able to substitute
cert-manager certificates for certgen certificates.

Finally, the bootstrap command is updated to ensure that the specified
certificate files are actually present. This prevents Envoy starting up
and rejecting the static configuration, which is unrecoverable.

This fixes #2494.

Signed-off-by: James Peach <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/deployment Issues or PRs related to deployment tooling or infrastructure. kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants