[feature request] multi-certs TLS upstream transport socket #22726

Closed
lambdai opened this issue Aug 16, 2022 · 16 comments
Labels
area/sds (SDS related), area/tls, enhancement (Feature requests. Not bugs or questions.), stale (stalebot believes this issue/PR has not been touched recently)

Comments

@lambdai
Contributor

lambdai commented Aug 16, 2022

scenario

Envoy in our deployment acts as a multi-tenant proxy. Our use case is to select one client TLS cert from ~100 certs according to the downstream request attributes. To understand why multiple certs are needed, see the Delegated Identity API and a previous brainstorm on supporting dynamic certs in Envoy.

I am considering implementing the client cert selection with a multi-certs-tls-transport-socket.

Inspired by the existing downstream-to-upstream relay attributes such as ALPN/proxyprotocol/tunnel, my desired cert selection sequence would be:

  1. The downstream request is instantiated. Some filter states are populated to guide the future transport socket. This can be done via any new L4/L7 extension.

  2. The router selects a cluster and an endpoint is selected. No change is needed.

  3. The filter states are extracted from the downstream request. multi-certs-tls-transport-socket finalizes the upstream cert from the filter states.
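The three steps above can be sketched as a small Python model. This is only an illustration of the proposed flow, not Envoy code: `StreamInfo`, `tenant_filter`, `pick_endpoint`, `finalize_cert`, and the filter state key `upstream.client_cert_name` are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field

# Hypothetical filter state key; the issue does not name one.
CERT_KEY = "upstream.client_cert_name"


@dataclass
class StreamInfo:
    """Stand-in for per-request state shared by filters and the upstream."""
    filter_state: dict = field(default_factory=dict)


def tenant_filter(stream: StreamInfo, tenant: str) -> None:
    # Step 1: a downstream L4/L7 extension records which cert the request wants.
    stream.filter_state[CERT_KEY] = f"{tenant}-client-cert"


def pick_endpoint(endpoints: list) -> str:
    # Step 2: cluster and endpoint selection is unchanged (first host, for brevity).
    return endpoints[0]


def finalize_cert(stream: StreamInfo, certs: dict) -> str:
    # Step 3: the transport socket resolves the cert from the filter state.
    name = stream.filter_state.get(CERT_KEY)
    if name not in certs:
        raise LookupError(f"no client cert configured for {name!r}")
    return certs[name]


certs = {"tenant-a-client-cert": "PEM-A", "tenant-b-client-cert": "PEM-B"}
stream = StreamInfo()
tenant_filter(stream, "tenant-b")
endpoint = pick_endpoint(["10.0.0.1:443", "10.0.0.2:443"])
chosen = finalize_cert(stream, certs)
```

In this model the cert is resolved only after the endpoint is picked, which is the ordering the proposal describes.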

A couple of questions:

  • Does anyone have similar use cases? Namely, selecting a client cert, or more generally selecting among transport sockets.
  • Does the above flow and design generally make sense?
  • Do we want to expand the use case to select the upstream L4 filter chain along with the transport socket? The downstream transport socket and filter chain mapping has been 1:1 for 4 or 5 years. We can reuse the abstraction and part of the implementation.

Alternative solutions

  • Solution 1 Use duplicated clusters.
    The clusters are distinguished only by name and cert.
    You can imagine the same cluster, metrics, and endpoints duplicated 100 times. On-demand clusters in particular offer another potential approach to reduce the number of clusters, but we have to address the control plane SLO concerns before we adopt them.

  • Solution 2 TransportSocketMatch.
    TransportSocketMatch offers the ability to maintain multiple transport sockets. However, the match criteria contain only endpoint metadata. Following this path, we would need to duplicate each endpoint 100 times, with each copy distinguished by metadata. Additionally, we would have to predefine 100 subsets of endpoints and select one of them per downstream request.
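To make the cost of Solution 2 concrete, here is a simplified Python model of matching a transport socket on endpoint metadata alone. It is a sketch under assumptions: `match_transport_socket`, the `tenant` metadata key, and the addresses are hypothetical, and real matching in Envoy has more detail than this first-match loop.

```python
def match_transport_socket(endpoint_metadata: dict, socket_matches: list):
    """Return the first socket whose match criteria all appear in the metadata."""
    for candidate in socket_matches:
        criteria = candidate["match"]
        if all(endpoint_metadata.get(k) == v for k, v in criteria.items()):
            return candidate["name"]
    return None


tenants = [f"tenant-{i}" for i in range(100)]

# One transport socket (one client cert) per tenant.
socket_matches = [{"name": f"tls-{t}", "match": {"tenant": t}} for t in tenants]

# Only endpoint metadata can drive the match, so every real endpoint
# has to be repeated once per tenant.
real_endpoints = ["10.0.0.1:443", "10.0.0.2:443"]
duplicated = [
    {"address": address, "metadata": {"tenant": t}}
    for address in real_endpoints
    for t in tenants
]
```

Two real endpoints become 200 entries, and the request still has to be steered to the right subset.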

lambdai added the enhancement and triage labels Aug 16, 2022
@kyessenov
Contributor

I think solution 2 with a more generalized matching that could use a shared filter state or explicit transport socket options could work without duplication of endpoints.

@ggreenway
Contributor

If you go with the proposed solution, the step "The router selects a cluster and an endpoint is selected. No change is needed." isn't correct: you need to know which cert to use when you get the conn pool. Each possible client cert will need to have its own conn pool. See all the hashing-related bits in ClusterManagerImpl::ThreadLocalClusterManagerImpl::ClusterEntry::httpConnPoolImpl().
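The point about one conn pool per client cert can be illustrated with a toy pool map whose key mixes in a hash of the selected cert. This is a Python sketch, not the logic in `httpConnPoolImpl()`; `ConnPool`, `PoolMap`, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


class ConnPool:
    """Toy connection pool bound to one host and one client cert."""

    def __init__(self, host: str, cert_name: str):
        self.host = host
        self.cert_name = cert_name


class PoolMap:
    """Pools are keyed by host plus a hash of the selected cert name."""

    def __init__(self):
        self._pools = {}

    @staticmethod
    def _key(host: str, cert_name: str):
        digest = hashlib.sha256(cert_name.encode("utf-8")).hexdigest()
        return (host, digest)

    def get_or_create(self, host: str, cert_name: str) -> ConnPool:
        key = self._key(host, cert_name)
        if key not in self._pools:
            self._pools[key] = ConnPool(host, cert_name)
        return self._pools[key]

    def __len__(self) -> int:
        return len(self._pools)


pools = PoolMap()
first = pools.get_or_create("10.0.0.1:443", "tenant-a")
again = pools.get_or_create("10.0.0.1:443", "tenant-a")
other = pools.get_or_create("10.0.0.1:443", "tenant-b")
```

Requests that select the same cert reuse a pool, while a different cert on the same host gets its own pool, so a connection is never shared across identities.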

@lambdai
Contributor Author

lambdai commented Aug 16, 2022

Yeah, the filter state and transport socket options are the input to decide the final transport socket.

The existing transport socket match is evaluated at config update time. It plays no part in downstream request handling.

Although my goal looks similar to TransportSocketMatch at a high level, I am very hesitant to reuse that term.

@lambdai
Contributor Author

lambdai commented Aug 16, 2022

you need to know which cert to use when you get the conn pool. Each possible client cert will need to have it's own conn pool. See all the hashing-related bits in

Thanks. I agree. Within connpool(), an endpoint is selected and then a pool is chosen or created.

I intentionally avoided mentioning the conn pool because of the confusion here.

Yes, a hash is generated during pool selection. I think Kuat already encapsulated the per-transport-socket-option hash calculation in #19809. To some extent, httpConnPoolImpl() can embrace the new hashes with no code change.

lizan added the area/tls and area/sds labels and removed the triage label Aug 25, 2022
@lizan
Member

lizan commented Aug 25, 2022

Will the ongoing certificate provider mechanism #19308 fit in this use case? cc @LuyaoZhong

@LuyaoZhong
Contributor

Will the ongoing certificate provider mechanism #19308 fit in this use case? cc @LuyaoZhong

I think the answer should be yes. This issue focuses on selecting among multiple certs in the upstream transport, and the cert provider mechanism is going to support dynamic certs.

@kyessenov
Contributor

I think #19308 is not sufficient. The missing piece is the selection of the certificate name based on the downstream request. There needs to be some way to:

  1. set a certain filter state for the certificate name in the transport socket options based on the downstream;
  2. use the filter state as the name in the dynamic cert provider

For 2), we need the name field to be not a constant but a variable over the transport socket options, maybe even hard-coded to some well-known filter state name. IMHO, SDS using this variable as the resource name could also work here if the machinery supports that.
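A minimal sketch of what "a variable over the transport socket options" could look like, in Python. Nothing here is an existing Envoy or SDS API: `resolve_secret_name`, `FakeSecretProvider`, and the key `example.client_cert_name` are hypothetical, and the fallback to a static name is one possible design rather than something the thread specifies.

```python
from typing import Optional

# Hypothetical well-known filter state name carried in the socket options.
WELL_KNOWN_KEY = "example.client_cert_name"


def resolve_secret_name(static_name: Optional[str], socket_options: dict) -> str:
    """Prefer the per-request name; fall back to the configured constant."""
    dynamic = socket_options.get(WELL_KNOWN_KEY)
    if dynamic:
        return dynamic
    if static_name:
        return static_name
    raise ValueError("no certificate name configured or supplied")


class FakeSecretProvider:
    """Stand-in for a cert provider or SDS subscription keyed by resource name."""

    def __init__(self, secrets: dict):
        self._secrets = secrets
        self.requested = []

    def fetch(self, name: str) -> str:
        self.requested.append(name)
        return self._secrets[name]


provider = FakeSecretProvider({"default": "PEM-DEFAULT", "tenant-a": "PEM-A"})
per_request = provider.fetch(
    resolve_secret_name("default", {WELL_KNOWN_KEY: "tenant-a"})
)
fallback = provider.fetch(resolve_secret_name("default", {}))
```

With this shape, existing configurations that set only a constant name keep working, and a downstream filter can override the name per request.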

@lambdai
Contributor Author

lambdai commented Aug 30, 2022

@lizan @LuyaoZhong Our goals are very close. We both want a transport socket that can pick a cert from a list.

Because of our use cases, the differences are:

  1. Who owns the candidate certs? My use case is more like multi-tenancy, so the cluster specifies the candidates rather than a centralized security manager configured in the bootstrap. Your current prototype is more advanced than mine regarding cert list updates, whether adding a new cert or removing one.
  2. My concrete use case is to guide the upstream cert selection from the downstream stream info. That's omitted in your current PR.

It would be great if we can build a single mechanism for both our use cases!

@kyessenov
Contributor

@lambdai In your use case, do you expect the set of all certificates per one cluster to be relatively stable? I think it's much easier to pre-fetch all certs during cluster warming, but that also means adding a new cert means a cluster update.

@lizan
Member

lizan commented Aug 30, 2022

2. My concrete use case to guide the upstream cert selection from the downstream stream info. That's omitted in your current PR.

I think that's what OnDemandMetadataUpdate does (though the name is super confusing): it provides stream info to the certificate provider, and the callback gets the cert back.

@LuyaoZhong
Contributor

  1. My concrete use case to guide the upstream cert selection from the downstream stream info. That's omitted in your current PR.

I think that's the OnDemandMetadataUpdate does, while the name is super confusing, to provide stream info to certificate provider, and the callback gets the cert back.

The current stream info might fit this use case. For the bumping case, though, we need the real server cert information to mimic a cert for the downstream, and that does not exist in stream info for now. So if we use stream info, could we change stream info according to the needs of different features?

@LuyaoZhong
Contributor

@lambdai The cert provider only provides multiple certs in one transport socket dynamically, whereas what you proposed is client cert selection. Those are two separate parts of supporting your multi-tenant use case.

Does this proposal cover only cert selection in the upstream transport? If so, I think it has nothing to do with the cert provider.
If the proposal also needs some component to help generate multiple certs, then the cert provider might need to get involved.

The bumping case has two parts as well:
The first part is dynamic cert generation: the bumping filter and the cert provider work together, with the bumping filter invoking the cert provider to generate certs and the cert provider applying the certs to the downstream transport.
The second part is cert selection based on SNI, for which we have SNI-based cert selection.

PS, TLS bumping

@lambdai
Contributor Author

lambdai commented Aug 30, 2022

@lambdai In your use case, do you expect the set of all certificates per one cluster to be relatively stable?

Not too stable. In our case, the cluster (or my proposed transport socket) handles all the clients on the same host. A new client, along with its client cert, is added or removed when the client pod is started or stopped.

I think it's much easier to pre-fetch all certs during cluster warming, but that also means adding a new cert means a cluster update.

Exactly! A natural solution could be a smart cluster update, similar to the smart listener update.

Or we avoid updating the cluster. @LuyaoZhong has a great example of leaving the cluster untouched:
instead, the cluster refers to a volatile provider.

@lambdai
Contributor Author

lambdai commented Aug 30, 2022

@lizan @LuyaoZhong
I agree the source of truth can be stream_info.

I want to clarify the gap between the stream info and the list of certs.

At a high level, the cluster MUST specify the range of certs it is interested in. It's OK if a singleton bootstrap extension receives all the certs. The subset of certs is not only about performance; security is the key concern.

Another requirement is the ability to opt out of on-demand fetching. The cluster can announce itself warmed only once the given cert list is ready, as the current TLSUpstreamTransportSocket promises.
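The two requirements above, a per-cluster range of allowed certs and warming that waits for all of them, can be sketched in Python as follows. `ClusterCerts` and its methods are hypothetical names for illustration and do not correspond to existing Envoy classes.

```python
class ClusterCerts:
    """Tracks which of a cluster's declared certs have been delivered."""

    def __init__(self, allowed_names):
        # The cluster declares up front which certs it may ever use.
        self.allowed = frozenset(allowed_names)
        self.loaded = {}

    def on_cert(self, name: str, pem: str) -> bool:
        # Certs outside the declared range are ignored, even if a shared
        # provider happens to deliver them.
        if name not in self.allowed:
            return False
        self.loaded[name] = pem
        return True

    @property
    def warmed(self) -> bool:
        # Warm only once every declared cert is present (no on-demand gap).
        return self.allowed.issubset(self.loaded)


cluster = ClusterCerts(["tenant-a", "tenant-b"])
accepted_a = cluster.on_cert("tenant-a", "PEM-A")
warm_after_one = cluster.warmed
rejected_c = cluster.on_cert("tenant-c", "PEM-C")
accepted_b = cluster.on_cert("tenant-b", "PEM-B")
warm_after_all = cluster.warmed
```

The allowlist check is what keeps one tenant's cluster from ever presenting another tenant's cert, independent of how many certs the shared provider holds.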

The last thing is more about SDS. Currently the cluster works well with an SDS server. Correct me if I am wrong: Luyao needs more from stream info while subscribing to certs, because your potential certs can be unlimited and the secret name is not the best descriptor. For my case, a proven SDS server is all I need outside of Envoy.

@github-actions

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions bot added the stale label Sep 30, 2022
@github-actions

github-actions bot commented Oct 7, 2022

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@github-actions github-actions bot closed this as completed Oct 7, 2022
5 participants