TLS bumping: decrypting communications between internal and external services #18928

Open
LuyaoZhong opened this issue Nov 8, 2021 · 71 comments
@LuyaoZhong
Contributor

LuyaoZhong commented Nov 8, 2021

TLS Bumping in Envoy Design Doc

2022.10.31

PoC: #23192
README and configurations are in tls_bumping subdir


2022.07.13
4 work items were worked out.

  1. Certificate Provider framework
    Implement CertificateProvider mechanism #19308
    Implement Certificate Provider Framework (#19308) #19582
  2. SNI-based cert selection in tls transport socket
    SNI-based cert selection in TLS transport socket #21739
    SNI-based cert selection during TLS handshake #22036
  3. A new network filter - BumpingFilter
    Add a bumping filter for cert mimicking #22581
    WiP Add Bumping filter #22582
  4. Certificate Provider instance - LocalMimicCertProvider
    WIP: Local cert provider #23063

2022.04.24 update

Mimicking certs based only on the SNI is probably not enough. We need the real server certificate so that we can copy its subject, subject alt name and extensions, match the RSA key strength, and more. The original proposal set up the secure connection client-first; to meet the requirements above we need to set it up server-first.

Therefore, we expect the workflow like this:

  1. The downstream needs to access an external website such as "google.com", and the traffic is routed to Envoy.
  2. Envoy receives the CLIENT_HELLO but does not handshake with the downstream until step 5.
  3. Envoy connects to "google.com" (the upstream) and gets the real server certificate.
  4. Envoy copies the subject, subject alt name, extensions, etc. from the real server certificate and generates a mimic certificate.
  5. Envoy does the TLS handshake with the downstream using the mimic certificate.
  6. Traffic is decrypted and goes through the Envoy network filters, especially HCM. There are many HTTP filters, and users can easily add more with WASM to plug in security functions.
  7. Traffic is encrypted and sent to the upstream.
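
To make step 4 concrete, here is a minimal sketch of what "mimicking" could mean at the data level. It assumes a simplified, hypothetical `CertFields` struct rather than real X.509 objects, and `makeMimicFields` is an illustrative name, not an Envoy function. Real code would also generate a key pair and sign with the CA key.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified view of the certificate fields named in step 4.
// A real implementation would operate on X.509 structures instead.
struct CertFields {
  std::string subject;
  std::vector<std::string> subject_alt_names;
  std::vector<std::string> extensions;
  int rsa_key_bits = 0;
  std::string issuer;
};

// Build the fields of a mimic certificate: identity fields are copied from
// the real server certificate, while the issuer is replaced by the bumping CA
// that the downstream client trusts.
CertFields makeMimicFields(const CertFields& real_server_cert,
                           const std::string& bumping_ca_name) {
  CertFields mimic = real_server_cert; // subject, SANs, extensions, key strength
  mimic.issuer = bumping_ca_name;      // signed by the local CA, not the real issuer
  return mimic;
}
```

The point of the sketch is only that everything the client inspects stays the same except the issuer, which is why the client must trust the bumping CA.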

Original Proposal

Title: decrypting communications between internal and external services

Description:

When Envoy works as a sidecar or egress gateway in a service mesh, Istio takes responsibility for certificate generation and pushes the configs to Envoy via xDS. But when it works as a typical proxy, the internal services on the edge might access many different external websites such as Google or Bing, and Envoy doesn't provide the ability to terminate this kind of TLS traffic.
For this scenario, we propose a method to let Envoy generate certs dynamically and do the TLS handshake. Then, if the client trusts the root CA that signed the certs, it can access external services under the control of Envoy.

Changes (straw man)

  1. introduce an API to enable this feature and configure the CA cert and key for signing
  2. get the SNI from the tls inspector (we need the SNI to generate certs; just utilize the tls inspector, probably no changes)
  3. generate certs according to the SNI
  4. set the certs on the SSL object and then do the handshake

Any comments are welcome.

@LuyaoZhong LuyaoZhong added enhancement Feature requests. Not bugs or questions. triage Issue requires triage labels Nov 8, 2021
@rojkov rojkov added area/tls and removed triage Issue requires triage labels Nov 15, 2021
@rojkov
Member

rojkov commented Nov 15, 2021

/cc @lizan @asraa @ggreenway

@ggreenway
Contributor

Can you please elaborate on the desired traffic flow (client, envoy's position, server, which connections are TLS vs plaintext)?

@lambdai
Contributor

lambdai commented Nov 16, 2021

I am curious what kind of cert is needed for the google/bing access.

If the upstream is google/bing, envoy doesn't terminate TLS but initiates TLS.

The straw man flow confuses me: is the cert applied in downstream connection or upstream connection?

@LuyaoZhong
Contributor Author

LuyaoZhong commented Nov 16, 2021

@ggreenway @lambdai
The desired traffic flow is like this:
<downstream/internal service> ---- TLS(mimic cert generated by Envoy) ---- < Envoy> ---- TLS ---- <upstream/external service>

I mean envoy needs to terminate downstream TLS first, so that we can apply filters to control how internal services access the external network; after that, envoy initiates TLS to the upstream. The mimic cert is applied to the downstream connection. There is no change to the upstream connection.
I'm not sure whether "terminate" is the proper word; if not, please correct me.

Thanks for your comments.

@ggreenway
Contributor

Ok, I think I understand now. Let me paraphrase to make sure I understand: you'd like for envoy to have a CA cert/key, trusted by the downstream client, and for envoy to dynamically generate a TLS cert signed by the CA cert/key for whatever name is in the SNI of a connection?

@LuyaoZhong
Contributor Author

LuyaoZhong commented Nov 17, 2021

@ggreenway Yes, exactly. Does that make sense to you?

@LuyaoZhong
Contributor Author

I wrote some PoC code for dynamically generating a cert, and I tested the downstream TLS handshake using the mimic cert.

For the API change, envoy currently requires certs (static or SDS) to be set in the config yaml file, and the code path doesn't take the case I mentioned into consideration. To support this feature I need a proper API that indicates we will do the TLS handshake using a dynamic cert. I would like your help with this new API definition. I'm thinking about adding "tls_root_certificates" to CommonTlsContext, valid only when the CommonTlsContext is part of a DownstreamTlsContext:

[extensions.transport_sockets.tls.v3.CommonTlsContext]

{
"tls_params": "{...}",
"tls_certificates": [],
"tls_root_certificates": [],
"tls_certificate_sds_secret_configs": [],
"validation_context": "{...}",
"validation_context_sds_secret_config": "{...}",
"combined_validation_context": "{...}",
"alpn_protocols": [],
"custom_handshaker": "{...}"
}

[extensions.transport_sockets.tls.v3.TlsRootCertificate]

{
"root_ca_cert": "{...}",
"root_ca_key": "{...}"

}

Do you think it is reasonable?

@ggreenway
Contributor

I think a more general approach would be to implement this as a listener filter. It could either run after tls_inspector (which reads the SNI value), or re-implement that part. It can then generate the needed cert, and we can add an API for a listener filter to signal to the TLS transport_socket which certificate to use.

There have been other feature requests to support extremely large numbers of fixed/pre-generated certs and to choose the correct one at runtime, and this implementation could support that use case as well.

Does that sound workable to you?
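
As a rough illustration of "choose the correct one at runtime", the sketch below picks a certificate for an SNI value from a store of pre-generated or cached certs. `CertStore` and `selectCertBySni` are hypothetical names for this example only, and the matching rule (exact name first, then a wildcard for the left-most label) is an assumption, not Envoy's actual selection logic.

```cpp
#include <map>
#include <string>

// Hypothetical store mapping a server name (or a "*.example.com" wildcard)
// to a certificate identifier; not an Envoy type.
using CertStore = std::map<std::string, std::string>;

// Pick a certificate for the SNI value: exact match first, then a wildcard
// covering the left-most label. Returns nullptr when nothing matches.
const std::string* selectCertBySni(const CertStore& store, const std::string& sni) {
  auto it = store.find(sni);
  if (it != store.end()) {
    return &it->second;
  }
  const auto dot = sni.find('.');
  if (dot != std::string::npos) {
    it = store.find("*" + sni.substr(dot));
    if (it != store.end()) {
      return &it->second;
    }
  }
  return nullptr;
}
```

A nullptr result is the point where a dynamic provider could generate a new cert, or where the connection could fall back to a default cert.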

@LuyaoZhong
Contributor Author

Generating certs in a listener filter sounds workable. But an API for a listener filter might not be enough: the existing DownstreamTlsContext still requires the user to set TLS certificates, so we can't avoid touching DownstreamTlsContext or its sub-APIs.

@ggreenway
Contributor

I think we could add a FilterState from the listener filter which contains the cert/key to use, and have SslSocket check for its presence and set the cert on the SSL* (not SSL_CTX*).

@LuyaoZhong
Contributor Author

LuyaoZhong commented Nov 19, 2021

Yes, it's SSL*(not SSL_CTX).

Let me list several questions and answers to make the design clear:

  1. Where to generate certs?
    After deliberation, I think tls_inspector is not a good place for generating certs, because we don't want to dynamically generate certs for every SNI. We want tls_inspector to detect the SNI first and then dispatch to different filter chains according to the SNI. This is more flexible, since we can have a different cert config policy for each filter chain: static, SDS or dynamic.
    In my PoC I generate the certs in SslSocket::setTransportSocketCallbacks [1].

[1]

void SslSocket::setTransportSocketCallbacks(Network::TransportSocketCallbacks& callbacks) {

  2. Why can't we avoid touching the DownstreamTlsContext API?
    [2] shows that Envoy requires the user to set TLS certificates, otherwise it exits during bootstrap. I went through some code; an easy way is to introduce an API to indicate that it has the capability to provide certificates [3].

[2]

throw EnvoyException("No TLS certificates found for server context");

[3]
if (!capabilities().provides_certificates) {

  3. Where to set the CA cert/key?
    Since we have to modify DownstreamTlsContext (2nd question), I prefer that it is set per transport socket rather than per listener. What do you think?

@lizan
Member

lizan commented Nov 19, 2021

> 1. Where to generate certs?
>    After deliberation, I think tls_inspector is not a good place for generating certs, because we don't want to dynamically generate certs for every SNI. We want tls_inspector to detect the SNI first and then dispatch to different filter chains according to the SNI. This is more flexible, since we can have a different cert config policy for each filter chain: static, SDS or dynamic.

Yeah, this all makes sense; having the generating part in the transport socket sounds reasonable to me. We might need a cache to store generated certs so they aren't generated for every connection.

@lambdai
Contributor

lambdai commented Nov 20, 2021

Perhaps SDS should act as that counted cache. RDS/ECDS/EDS maintain an N:1 mapping (N subscriptions to 1 config), so it would not be surprising to introduce the same to SDS.

@LuyaoZhong My understanding is that your PoC generates a CSR. If this functionality can be moved to SDS, some of the existing SDS servers could be leveraged.

@LuyaoZhong
Contributor Author

@lizan @lambdai
Thanks for your comments. A cache sounds good.
SDS could be one option for caching the dynamic certs. We are supposed to support both a local cache and SDS, right?
If so, I want to start with a local cache and introduce SDS later. Does that make sense to you?

LuyaoZhong pushed a commit to LuyaoZhong/envoy that referenced this issue Nov 30, 2021
1. Introduce API to set root CA cert/key to enable this feature

   e.g.
   common_tls_context:
     tls_root_ca_certificate:
       cert: {"filename": "root-ca.pem"}
       private_key: {"filename": "root-ca.key"}

2. Generate/reuse dynamic certificates pair in TLS transport socket and set SSL*

   a. if there is no corresponding cached certs, create CSR and
      create certs signed from root CA, then cache the generated
      certs to local cache looked up by host name
   b. if there is corresponding cached certs, reuse them

Signed-off-by: Luyao Zhong <[email protected]>
@LuyaoZhong
Contributor Author

I investigated the API, related classes and workflow, and completed the first version of code, see #19137.

In this code version, we have done:

  1. introduce an API to set root CA cert/key
     common_tls_context:
       tls_root_ca_certificate:
         cert: {"filename": "root-ca.pem"}
         private_key: {"filename": "root-ca.key"}
  2. implement a local cache to store generated cert pairs
  3. generate/reuse dynamic certificate pairs in the TLS transport socket and set them on the SSL*
    a. if there are no corresponding cached certs, create certs signed by the root CA, then store the generated certs in the local cache
    b. if there are corresponding cached certs, reuse them according to host name
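
The generate-or-reuse behaviour in the last item can be sketched as a small host-keyed cache. This is not the code from #19137: `CertPair`, `LocalCertCache` and the generator callback are hypothetical stand-ins, and the sketch deliberately ignores thread safety, expiry and eviction, which a real transport socket shared across worker threads would have to handle.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical certificate/key pair; a real one would hold X.509 and key objects.
struct CertPair {
  std::string cert_pem;
  std::string key_pem;
};

// Local cache keyed by host name. The generator callback stands in for
// "create certs signed by the root CA" and is only invoked on a cache miss.
class LocalCertCache {
public:
  using Generator = std::function<CertPair(const std::string& host)>;

  explicit LocalCertCache(Generator generator) : generator_(std::move(generator)) {}

  // Case b: return the cached pair. Case a: generate, store, then return it.
  const CertPair& getOrGenerate(const std::string& host) {
    auto it = cache_.find(host);
    if (it == cache_.end()) {
      it = cache_.emplace(host, generator_(host)).first;
    }
    return it->second;
  }

  std::size_t size() const { return cache_.size(); }

private:
  Generator generator_;
  std::map<std::string, CertPair> cache_;
};
```

Passing the generator in as a callback keeps the cache independent of how certs are produced, so the same structure would work whether certs are minted locally or fetched from elsewhere.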

I'll split the patch, polish the code, reword the original proposal description after some design details settle down.
Could you help review the design items I listed above. What's your suggestion?

@github-actions

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Dec 30, 2021
@LuyaoZhong
Contributor Author

LuyaoZhong commented Dec 31, 2021

This is not stale.

@github-actions github-actions bot removed the stale stalebot believes this issue/PR has not been touched recently label Dec 31, 2021
@github-actions

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Jan 30, 2022
@github-actions

github-actions bot commented Feb 6, 2022

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@ggreenway
Contributor

@LuyaoZhong I think you're missing a chunk of work in your proposed solution: You will need a way to delay the TLS handshake until you have the cert. This will probably involve a custom handshaker, which will have the integration points with your other code that fetches/generates the cert.

@LuyaoZhong
Contributor Author

LuyaoZhong commented Jun 17, 2022

> I think both SDS (I don't think on-demand SDS should be an issue) and a certificate provider would work for you. As I understand it, an extension is always fine for a non-core use case, and the certificate provider seems to already have another use case (#21292), so it becomes a reasonable extension point.

> The certificate provider you defined https://github.com/envoyproxy/envoy/pull/19582/files#diff-57c305aa5cc3e7196c5c808a13ff7819ab9dd089cabffda303d885dfde43ce13R19 seems strange to me, or maybe I didn't understand it correctly.

> I think you needn't define a new custom certificate provider interface. The custom certificate provider should implement the existing interface

> using TlsCertificateConfigProvider =
> SecretProvider<envoy::extensions::transport_sockets::tls::v3::TlsCertificate>;

@soulxu
We definitely need a new interface for the certificate provider, see the protobuf API in #18928 (comment). The certificate provider needs to provide certificates based on a cert name.

SDS cannot satisfy my requirement. The lack of on-demand support in Istio is one of the gaps, and Envoy does not support it either. Besides, each SDS config corresponds to one secret provider in the transport socket, which can only fetch one single extensions.transport_sockets.tls.v3.TlsCertificate. If we need multiple certificates, we would require the control plane to distribute more SDS configs, and I don't know how to implement this functionality. How to carry information from the data plane to the SDS server is another problem: this information is used for mimicking certs, but the xDS protocol limits the request format.

> @LuyaoZhong I think you're missing a chunk of work in your proposed solution: You will need a way to delay the TLS handshake until you have the cert. This will probably involve a custom handshaker, which will have the integration points with your other code that fetches/generates the cert.

@ggreenway We can delay the TLS handshake until we have the cert with the current proposal. I give more details about how it works and address your comments in the bumping doc.

@soulxu
Member

soulxu commented Jun 17, 2022

> @soulxu We definitely need a new interface for the certificate provider, see the protobuf API in #18928 (comment). The certificate provider needs to provide certificates based on a cert name.

Thanks! Not sure I understand correctly. Is the problem that each TlsCertificateConfigProvider currently returns only one secret? I'm not sure whether it is possible to change TlsCertificateConfigProvider so that it can return multiple secrets.

Actually, I'm thinking it would be great if the custom certificate provider could return the same TLS certificate config (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/transport_sockets/tls/v3/common.proto#envoy-v3-api-msg-extensions-transport-sockets-tls-v3-tlscertificate)

This would be consistent with the SDS provider and the static cert provider, so I think you could reuse most of the other code. For example, you could also configure the private key provider, and the rest of the TLS transport socket would help make it work.

Apologies if I still do not understand correctly.

> SDS cannot satisfy my requirement. The lack of on-demand support in Istio is one of the gaps, and Envoy does not support it either. Besides, each SDS config corresponds to one secret provider in the transport socket, which can only fetch one single extensions.transport_sockets.tls.v3.TlsCertificate. If we need multiple certificates, we would require the control plane to distribute more SDS configs, and I don't know how to implement this functionality. How to carry information from the data plane to the SDS server is another problem: this information is used for mimicking certs, but the xDS protocol limits the request format.

I got it. It seems your key requirement is mimicking certs on demand, which led you to consider on-demand SDS. I'm just thinking that the admin or operator could pre-define the sites allowed to be accessed, and then the control plane generates those mimic certs before deploying Envoy.

But yes, I'm not sure whether that matches your original requirement.

Just curious: in your use case, would you allow your admin/operator to control which sites can be mimicked?

@LuyaoZhong
Contributor Author

LuyaoZhong commented Jun 17, 2022

> Actually, I'm thinking it would be great if the custom certificate provider could return the same TLS certificate config (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/transport_sockets/tls/v3/common.proto#envoy-v3-api-msg-extensions-transport-sockets-tls-v3-tlscertificate)

Yes, my plan is to let the cert provider return a TLS certificate config list, see #19582 (comment). The cert provider PR will be updated accordingly.

> I got it. It seems your key requirement is mimicking certs on demand, which led you to consider on-demand SDS. I'm just thinking that the admin or operator could pre-define the sites allowed to be accessed, and then the control plane generates those mimic certs before deploying Envoy.

The control plane cannot mimic the certs based on the real server cert, so this must be handled in Envoy after connecting to the upstream.

> But yes, I'm not sure whether that matches your original requirement.

> Just curious: in your use case, would you allow your admin/operator to control which sites can be mimicked?

Yes, we will allow the admin/operator to set a bumping list, or a list of sites that we don't want to bump.
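
One way such lists could be evaluated is sketched below. The thread does not define the semantics, so everything here is an assumption for illustration: `BumpingPolicy` and `shouldBump` are hypothetical names, the no-bump list is given precedence, and an empty bump list is treated as "bump everything else".

```cpp
#include <set>
#include <string>

// Hypothetical operator policy; the names are illustrative, not an Envoy API.
struct BumpingPolicy {
  std::set<std::string> bump_list;    // if non-empty, only these hosts are bumped
  std::set<std::string> no_bump_list; // hosts that must never be bumped
};

// Decide whether a connection to `host` should be decrypted. The no-bump list
// takes precedence; an empty bump list means every other host is bumped.
bool shouldBump(const BumpingPolicy& policy, const std::string& host) {
  if (policy.no_bump_list.count(host) > 0) {
    return false;
  }
  return policy.bump_list.empty() || policy.bump_list.count(host) > 0;
}
```

Hosts that are not bumped would simply be proxied as opaque TLS, with no mimic certificate generated for them.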

@LuyaoZhong
Contributor Author

> @LuyaoZhong I think you're missing a chunk of work in your proposed solution: You will need a way to delay the TLS handshake until you have the cert. This will probably involve a custom handshaker, which will have the integration points with your other code that fetches/generates the cert.

> @ggreenway We can delay the TLS handshake until we have the cert with the current proposal. I give more details about how it works and address your comments in the bumping doc.

@ggreenway kindly ping :)

@LuyaoZhong
Contributor Author

@ggreenway Thanks for your comments. Please have a look at my reply in the bumping doc.

@LuyaoZhong
Contributor Author

LuyaoZhong commented Jun 23, 2022

@ggreenway Thanks for your comments, which helped make the design details clear. I replied in the bumping doc. Please have a look and let me know if you have other concerns about this design.

@LuyaoZhong
Contributor Author

> @ggreenway Thanks for your comments, which helped make the design details clear. I replied in the bumping doc. Please have a look and let me know if you have other concerns about this design.

@ggreenway kindly ping :)

@ggreenway
Contributor

I don't have any other comments. I think more will come out as you explore implementation.

@LuyaoZhong
Contributor Author

@ggreenway Thanks. We will explore the implementation based on the design clarified in the bumping doc.

4 work items were worked out:

  1. Certificate Provider framework
  2. SNI-based cert selection in tls transport socket
  3. A new network filter - BumpingFilter
  4. Certificate Provider instance - LocalMimicCertProvider

We will create an issue or PR for each of these work items; let's discuss the implementation details there.

cc @mattklein123 @lambdai @soulxu Thanks for your comments.

@rohrit

rohrit commented Jul 20, 2022

Hi @LuyaoZhong, thanks for raising this as it is an important security use case. I have a question about the proposal and the intended support - would this work with HTTP CONNECT like you describe in your presentation video (https://events.istio.io/istiocon-2022/sessions/introducing-tls-bumping/) as well?

@LuyaoZhong
Contributor Author

LuyaoZhong commented Jul 20, 2022

> Hi @LuyaoZhong, thanks for raising this as it is an important security use case. I have a question about the proposal and the intended support - would this work with HTTP CONNECT like you describe in your presentation video (https://events.istio.io/istiocon-2022/sessions/introducing-tls-bumping/) as well?

Yes, making this work with HTTP CONNECT is one of our goals. We are going to use the internal listener feature to handle HTTP CONNECT. The data flow looks like this: Client -> ListenerA(HCM with CONNECT termination) -> InternalListenerB(TLS Bumping) -> Upstream server.

Would you mind sharing your use case or scenario? We'd like to know more about real-world requirements.

@rohrit

rohrit commented Jul 20, 2022

Forward proxy is the main use case, where the user points the browser/system proxy at Envoy. One of the things that is not optimal in Envoy is the need for the internal listener for the bumping TLS termination, but I guess a new custom filter would be needed to handle both CONNECT termination and the payload TLS termination and mimicking.

@LuyaoZhong
Contributor Author

LuyaoZhong commented Jul 20, 2022

> Forward proxy is the main use case, where the user points the browser/system proxy at Envoy. One of the things that is not optimal in Envoy is the need for the internal listener for the bumping TLS termination, but I guess a new custom filter would be needed to handle both CONNECT termination and the payload TLS termination and mimicking.

A new custom filter was an alternative. We had #19346 as a PoC before, but it can only work for HTTP/1.0. To support HTTP/2 or even QUIC, a complicated network filter is required, and we can predict that we would need to copy a lot of logic from HCM to process the HTTP protocol, while the filter would eventually just handle CONNECT. So we need to evaluate whether a new filter is worth it.

Ideally CONNECT termination and TLS bumping would be done together, but due to limitations of the Envoy framework, that is hard to implement currently. :(

@amthorn

amthorn commented Oct 21, 2022

@LuyaoZhong do you know if this implementation will be compatible with the custom handshaker extension ?? I.E. to be able to send some of the relevant server cert information (SANs, CN, etc...) or the certificate itself to a custom handshaker extension??

@LuyaoZhong
Contributor Author

@amthorn Sorry for the late response, I missed your comment. I think the answer is yes: just let the custom handshaker extension interact with the cert provider.

@amthorn

amthorn commented Nov 1, 2022

@LuyaoZhong What if my custom handshaker already implemented a cert-provider interface?? Can i disable the envoy one and use the one my handshaker provides?? (We pass the SNI currently which forwards to the cert provider, but with this change we'd forward the SANs/CN into the handshaker instead of the SNI)

@LuyaoZhong
Contributor Author

@amthorn If you are asking whether the cert provider can work with a custom handshaker in general: yes, you need to implement a custom handshaker which can fetch certs from a cert provider instance. We can discuss on #19308 and #19582.

If you are asking whether TLS bumping can work with a custom handshaker, I think it depends on the custom handshaker. In our current design we rely on Envoy to load certs and create a default handshaker that selects the cert based on SNI. When switching to a custom handshaker, we need to consider implementing similar functionality to support TLS bumping.

@amthorn

amthorn commented Nov 7, 2022

I was asking about the latter. You answered my question. I think changes would be required to a handshaker that already does its own SNI-based certificate lookup (it is not enabled for certificate bumping).

If I understand correctly, the two options we have are:

  1. The TLS bumping logic could send the SNI to custom handshakers to generate the certificate (but would then need to ignore any certificate providers within the TLS bumping logic, since that would be provided by the handshaker), which is something that is not implemented in the current plan for TLS bumping.
  2. We would need to change the custom handshaker to pull out the certificate provider logic and implement it against envoy's certificate provider interface, decoupling the handshaker from the certificate provider.

I'm not expecting a response, just stating my understanding so that you can correct it if I'm wrong.

@epk

epk commented Jan 18, 2023

I built a poc with xDS and ALS which kind of does what this is supposed to do: https://github.com/epk/envoy-egress-mitm:

  • Starts off as a TCP/Dynamic Forward Proxy
  • Using Access Log Service, mints certificates based on the SNI
  • Using xDS, creates L7 filter chains + clusters for the said SNI

There's a suite in e2e which tests it against top 500 domains.

@vermajit

vermajit commented Apr 5, 2023

@LuyaoZhong Are you guys still pursuing this?

@BenAgai

BenAgai commented Aug 14, 2023

Hi @vermajit, @epk ,
I'm currently considering using Envoy as a forward proxy for one of my projects.
Is TLS bumping already supported, or is the only usable code the PoC at the link below?
https://github.com/epk/envoy-egress-mitm

Thanks in advance!

@chris-windscribe

Also looking at Envoy as a forward proxy; however, it's unclear to me whether TLS bumping is supported or not.

@soulxu
Member

soulxu commented Dec 14, 2023

> Also looking at Envoy as a forward proxy; however, it's unclear to me whether TLS bumping is supported or not.

We don't support it today.

@geraldstanje

geraldstanje commented May 22, 2024

hi @LuyaoZhong @soulxu @amthorn @epk and team is there any progress? what is the plan? any alternatives as of may 2024?
