Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vector leaks SSL_CERT_FILE and SSL_CERT_DIR env variables to exec processes #18103

Closed
hhromic opened this issue Jul 28, 2023 · 12 comments · Fixed by #18229
Closed

Vector leaks SSL_CERT_FILE and SSL_CERT_DIR env variables to exec processes #18103

hhromic opened this issue Jul 28, 2023 · 12 comments · Fixed by #18229
Labels
type: bug A code related bug.

Comments

@hhromic
Copy link
Contributor

hhromic commented Jul 28, 2023

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

During the development of a data ingestion component with Vector and the exec source, we discovered that Vector is leaking the SSL_CERT_FILE and SSL_CERT_DIR environment variables (used by OpenSSL) to the child processes started by exec. Note that these variables are only independently overwritten by Vector if not previously set in the environment.

In our use case, we are using curl inside a shell script as the exec process and setting a custom SSL_CERT_FILE variable in the environment. While Vector passes the correct value down to the child process, it is also still passing the SSL_CERT_DIR variable as well. Due to an issue in curl, if both SSL_CERT_FILE and SSL_CERT_DIR are passed, curl will not honour SSL_CERT_FILE.

While the actual bug is in curl by not honouring SSL_CERT_FILE alone, I think Vector should not be leaking those variables to child processes of exec, and instead should run the child process using the same environment in which Vector is running.


Potential Solution Discussion

This behaviour was introduced in #904, where the openssl-probe crate was added to Vector and then init_ssl_cert_env_vars() is called during Vector init to populate these variables for OpenSSL: https://github.com/vectordotdev/vector/blob/0f13b22a4cebbba000444bdb45f02bc820730a13/src/app.rs#L387C23-L392

This is necessary because Vector is using the vendored feature of the openssl crate, which does not load system CAs:

The vendored copy will not be configured to automatically find the system’s root certificates, but the openssl-probe crate can be used to do that instead.

There are three potential approaches to address the described problem:

  1. Instead of setting the SSL_CERT_FILE and SSL_CERT_DIR variables within Vector with init_ssl_cert_env_vars(), Vector could use the probe() function in openssl-probe which only probes suitable SSL CA file/dir locations on the system but does not actually set any variables itself. Then, with the probing results, Vector should configure OpenSSL directly.

    • This is normally accomplished in OpenSSL by using the SSL_CTX_load_verify_locations() function.
    • Unfortunately, The openssl crate does not provide complete access to this function (see source code), because it only sets the CAfile argument and leaves CApath always set to null.
  2. For the exec source, Vector could unset SSL_CERT_FILE or SSL_CERT_DIR if and only if they were set by init_ssl_cert_env_vars() before launching the child process.

    • The openssl-probe crate has a try_init_ssl_cert_env_vars() function to help with this.

      Returns true if any certificate file or directory was found while probing. Combine this with has_ssl_cert_env_vars() to check whether previously configured environment variables are valid.

    • A potential problem with this approach is that Vector would need to propagate this information (if the variables were set or not by openssl-probe) down to the exec source implementation.

  3. Vector can conditionally call init_ssl_cert_env_vars() if and only if SSL_CERT_FILE and SSL_CERT_DIR are not already present in the environment. This equivalent to Vector fully honouring those variables.

    • The openssl-probe crate has a has_ssl_cert_env_vars() function to easily accomplish this.

      if !openssl_probe::has_ssl_cert_env_vars() {
          openssl_probe::init_ssl_cert_env_vars();
      }
    • This approach would still leak SSL_CERT_FILE and SSL_CERT_DIR into exec child proceses, but only if none of theme was already set in the environment. This is better than leaking one or the other partially.

While approach (2) is the optimal solution, approach (3) could be a good-enough solution in the meantime.

Configuration

sources:
  exec:
    type: exec
    command: ["env"]
    mode: scheduled

sinks:
  console:
    type: console
    inputs: ["exec"]
    encoding:
      codec: text

# run in the following manner and observe how the console shows an environment with
# SSL_CERT_FILE and SSL_CERT_DIR set despite them not existing in the original environment
# $ unset SSL_CERT_FILE SSL_CERT_DIR
# $ vector -c vector.yaml
# (...)
# SSL_CERT_FILE=/usr/lib/ssl/cert.pem
# SSL_CERT_DIR=/usr/lib/ssl/certs

Version

vector 0.31.0 (x86_64-unknown-linux-gnu 0f13b22 2023-07-06 13:52:34.591204470)

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

@hhromic hhromic added the type: bug A code related bug. label Jul 28, 2023
@hhromic
Copy link
Contributor Author

hhromic commented Jul 30, 2023

As a proof of concept, I implemented approach (3) above:

diff --git a/src/app.rs b/src/app.rs
index 63d2a53bb..c7413270d 100644
--- a/src/app.rs
+++ b/src/app.rs
@@ -387,7 +387,9 @@ impl FinishedApplication {
 }
 
 pub fn init_global() {
-    openssl_probe::init_ssl_cert_env_vars();
+    if !openssl_probe::has_ssl_cert_env_vars() {
+        openssl_probe::init_ssl_cert_env_vars();
+    }
 
     #[cfg(not(feature = "enterprise-tests"))]
     metrics::init_global().expect("metrics initialization failed");

And it works as expected.

No SSL_CERT_* variables defined keeps existing behaviour:

# ./vector -c vector.yaml
2023-07-30T20:23:32.152994Z  INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=info,rdkafka=info,buffers=info,lapin=info,kube=info"
2023-07-30T20:23:32.158345Z  INFO vector::app: Loading configs. paths=["vector.yaml"]
2023-07-30T20:23:32.161562Z  INFO vector::topology::running: Running healthchecks.
2023-07-30T20:23:32.163438Z  INFO vector: Vector has started. debug="false" version="0.32.0" arch="x86_64" revision=""
2023-07-30T20:23:32.163562Z  INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
2023-07-30T20:23:32.166603Z  INFO vector::topology::builder: Healthcheck passed.
HOSTNAME=081b626e7f71
PWD=/target/release
HOME=/root
TERM=xterm
SHLVL=1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
OLDPWD=/etc
_=./vector
SSL_CERT_FILE=/usr/lib/ssl/cert.pem
SSL_CERT_DIR=/usr/lib/ssl/certs

Defining any of the SSL_CERT_* variables to an INVALID VALUE still keeps the existing behaviour:

# SSL_CERT_FILE=/non-existing.crt ./vector -c vector.yaml
2023-07-30T20:24:16.145016Z  INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=info,rdkafka=info,buffers=info,lapin=info,kube=info"
2023-07-30T20:24:16.151138Z  INFO vector::app: Loading configs. paths=["vector.yaml"]
2023-07-30T20:24:16.154488Z  INFO vector::topology::running: Running healthchecks.
2023-07-30T20:24:16.154780Z  INFO vector::topology::builder: Healthcheck passed.
2023-07-30T20:24:16.154852Z  INFO vector: Vector has started. debug="false" version="0.32.0" arch="x86_64" revision=""
2023-07-30T20:24:16.154928Z  INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
SSL_CERT_FILE=/usr/lib/ssl/cert.pem
HOSTNAME=081b626e7f71
PWD=/target/release
HOME=/root
TERM=xterm
SHLVL=1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
OLDPWD=/etc
_=./vector
SSL_CERT_DIR=/usr/lib/ssl/certs

Defining any of the SSL_CERT_* variables to a VALID VALUE prevents Vector on overriding both of them:

# SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt ./vector -c vector.yaml
2023-07-30T20:24:45.924590Z  INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=info,rdkafka=info,buffers=info,lapin=info,kube=info"
2023-07-30T20:24:45.929437Z  INFO vector::app: Loading configs. paths=["vector.yaml"]
2023-07-30T20:24:45.933917Z  INFO vector::topology::running: Running healthchecks.
2023-07-30T20:24:45.934245Z  INFO vector: Vector has started. debug="false" version="0.32.0" arch="x86_64" revision=""
2023-07-30T20:24:45.934239Z  INFO vector::topology::builder: Healthcheck passed.
2023-07-30T20:24:45.934392Z  INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt
HOSTNAME=081b626e7f71
PWD=/target/release
HOME=/root
TERM=xterm
SHLVL=1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
OLDPWD=/etc
_=./vector

If approach (2), the ideal way, is not viable or difficult to implement, I think this approach (3) would be good enough.

@jszwedko
Copy link
Member

jszwedko commented Aug 1, 2023

Thanks for trying that out @hhromic

I agree with the assertion that Vector should execute subprocesses in the same environment it was started in rather than a modified one. Given that, option (1) seems like the "most correct" to me. Is the lack of a set_ca_dir a gap in the openssl crate? I'm wondering if we should open an issue upstream.

Absent the ability to do (1), then (2) seems more correct, but (3) seems like a definite, and smaller, improvement so I'm happy to see a PR with it if it sufficient for you.

Incidentally it looks like the curl bug was fixed.

@hhromic
Copy link
Contributor Author

hhromic commented Aug 7, 2023

Hi @jszwedko !

Incidentally it looks like the curl bug was fixed.

Yup, unfortunately it will take a while for it to land on some package repositories, specially Debian.
At the time being we found out that curl itself has a CURL_CA_BUNDLE variable for this purpose so we are good.

Nevertheless I would like to see this behaviour fixed in Vector anyway to avoid future confusions.

I agree with the assertion that Vector should execute subprocesses in the same environment it was started in rather than a modified one. Given that, option (1) seems like the "most correct" to me. Is the lack of a set_ca_dir a gap in the openssl crate? I'm wondering if we should open an issue upstream.

I investigated this more and I was wrong with option (1). It turns out that the SSL_CTX_load_verify_locations() function only operates on specific SSL contexts created by the application. For the global verify locations, OpenSSL does not have any functions apart from setting these environment variables. The OpenSSL source code only sets the default verify locations either from compile-time defaults or environment variables. This effectively eliminates option (1) and leaves option (2) or (3) only.

For option (2) I actually thought of a better approach. On startup, before running openssl-probe, Vector can copy its original environment space into a map stored into the application state struct. Then, the exec source can simply use this original state to set the children's environment using the envs() method of Command. Moreover, as per the docs:

Child processes will inherit environment variables from their parent process by default. Environment variables explicitly set using Command::envs take precedence over inherited variables.

This would effectively shield exec from any Vector environment manipulation, not just OpenSSL.
The major work would be then plumbing the component builder to have access to the Application state data.

If the Vector team is willing to accept such contribution, this looks like a fun challenge to pursue for me. Let me know!

@jszwedko
Copy link
Member

For option (2) I actually thought of a better approach. On startup, before running openssl-probe, Vector can copy its original environment space into a map stored into the application state struct. Then, the exec source can simply use this original state to set the children's environment using the envs() method of Command. Moreover, as per the docs:

I like this as it should guard against any other environment changes by Vector as well. I'd be happy to see a PR for it if you are up to the challenge :) It will require some threading through of the environment captured at start-up to components.

@jszwedko
Copy link
Member

Ah, I see you already opened a PR taking a slightly different approach 😄

@hhromic
Copy link
Contributor Author

hhromic commented Aug 11, 2023

Ah, I see you already opened a PR taking a slightly different approach 😄

Hehe yes and no :)
As I was working on a POC for option (2), I realized that I could easily add support for env variables to exec, so I did in that PR. I think that feature (which is independent of this issue because it doesn't fully solve it) is good to have in Vector anyway.

I already found an easy way to capture and store the original environment at Vector startup, and now I'm trying to devise how to best pass this to the exec source. I don't want to pollute the existing Config structure with it, nor to have to pass the original env HashMap everywhere through the topology builders either. I'm thinking perhaps on creating a back-reference from ApplicationConfig to Application, so components can get back to that from a config instance.

This is indeed a fun challenge, but not so pleasant to work on because Vector truly takes a while to compile on my machine :(
But maybe is not so hard to get it implemented after all. If you have any ideas, I'm happy to hear them!

So far my plan is to propagate the original env Hashmap over this chain, but def want to do it as clean as possible:

@jszwedko
Copy link
Member

jszwedko commented Aug 11, 2023

Ah, I see you already opened a PR taking a slightly different approach 😄

Hehe yes and no :) As I was working on a POC for option (2), I realized that I could easily add support for env variables to exec, so I did in that PR. I think that feature (which is independent of this issue because it doesn't fully solve it) is good to have in Vector anyway.

Aha 😄 Agreed it is a useful feature in its own right.

This is indeed a fun challenge, but not so pleasant to work on because Vector truly takes a while to compile on my machine :(
But maybe is not so hard to get it implemented after all. If you have any ideas, I'm happy to hear them!

One common tip than can cut down compilation times a lot is to only include the components you are interacting with. For example, cargo check --no-default-features --features sources-exec.

Otherwise that plan makes sense to me!

@hhromic
Copy link
Contributor Author

hhromic commented Aug 11, 2023

One common tip than can cut down compilation times a lot is to only include the components you are interacting with. For example, cargo check --no-default-features --features sources-exec.

Ah that's good to know. What features do I need to enable to compile src/app.rs and src/topology/*.rs ?
Those are the files I will need to compile and re-compile a lot.

I also realised that for some reason, cargo vdev ... and cargo build tend to refresh the crates index very often and is quite slow. I helped that a bit by exporting CARGO_NET_OFFLINE=true.

@jszwedko
Copy link
Member

One common tip than can cut down compilation times a lot is to only include the components you are interacting with. For example, cargo check --no-default-features --features sources-exec.

Ah that's good to know. What features do I need to enable to compile src/app.rs and src/topology/*.rs ? Those are the files I will need to compile and re-compile a lot.

I think those will be compiled even if you just disable all features with cargo check --no-default-features.

I also realised that for some reason, cargo vdev ... and cargo build tend to refresh the crates index very often and is quite slow. I helped that a bit by exporting CARGO_NET_OFFLINE=true.

Ah, that's odd, but 👍 to that work around 😄

@hhromic
Copy link
Contributor Author

hhromic commented Aug 12, 2023

@jszwedko after thinking this a lot more, I decided to drop the idea of copying the original environment to pass it to the exec source. The reason is that actually multiple components in Vector use the environment, and all of them would need to be adapted to use the copied original environment so all remains consistent. This is of course very ugly and impractical.

The truth is that it is very unfortunate that the global truststore in OpenSSL can only be configured via process environment variables, forcing things like openssl-probe to be necessary.

That being said, I think for now the best solution is what I proposed in this comment (only calling the openssl-probe function if the respective SSL variables are not already set).

However, I decided on a slightly different implementation. See the linked PR.

@hhromic
Copy link
Contributor Author

hhromic commented Aug 15, 2023

@dsmith3197 @jszwedko
To address backwards compatibility concerns (see the PR comments), I thought today about another candidate option.

How about an opt-in environment variable (with a corresponding CLI option if you want) that specifically enables (default) or disables the openssl-probe functionality in Vector? In this way we don't need to make any assumptions. For example:

VECTOR_OPENSSL_NO_PROBE=true|false

If the value is set to true, then Vector simply will not run the probe and leave the environment as-is. If the value is set to false, or the variable is not provided, then Vector will continue to behave as currently (default). The users have full control.

I thought of that variable name due to (1) this being an OpenSSL-specific functionality and (2) the existence of another OpenSSL-related env variable in Vector: VECTOR_OPENSSL_LEGACY_PROVIDER.

Of course, this would be documented properly in the Vector environment docs including its use-case.

What do you think of that idea? I can update the PR accordingly if you prefer that approach.

@dsmith3197
Copy link
Contributor

What do you think of that idea? I can update the PR accordingly if you prefer that approach.

I'm in favor of this approach, as I think it is the clearest in terms of the behavior. Also, it is self-documented, meaning that others will be able to discover the feature, whereas that was not true in the approach where we would conditionally probe based upon the existence of other env variables. Thanks for all your effort on this issue so far @hhromic!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug A code related bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants