Vector does not skip healthchecks on config validation - just ignores their results causing a hang #22339

Open
0x25CBFC4F opened this issue Jan 31, 2025 · 6 comments
Labels
domain: config Anything related to configuring Vector type: bug A code related bug.

Comments

@0x25CBFC4F

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

vector validate does not honor the --skip-healthchecks argument: it still checks the health of components and merely ignores the results.
This causes a 130-second hang on my build machine, which validates the configuration file before deploying it. The hang is a connection timeout: the build machine, by design, has no access to the Elasticsearch cluster.

docker build output:

#12 [syntax-validator 5/6] COPY ./vector.yaml /val/vector.yaml
#12 DONE 0.0s
#13 [syntax-validator 6/6] RUN ["/usr/bin/vector", "validate", "--deny-warnings", "--skip-healthchecks", "--config-yaml", "/val/vector.yaml"]
#13 0.507 2025-01-31T11:59:06.147293Z  WARN vector::config: Source has acknowledgements enabled by a sink, but acknowledgements are not supported by this source. Silent data loss could occur. source="startup_event_raw" sink="elastic"
#13 0.507 √ Loaded ["/val/vector.yaml"]
#13 131.6 2025-01-31T12:01:17.237003Z  WARN sink{component_kind="sink" component_id=elastic component_type=elasticsearch}: vector::internal_events::http_client: HTTP error. error=error trying to connect: tcp connect error: Connection timed out (os error 110) error_type="request_failed" stage="processing" internal_log_rate_limit=true
#13 131.6 2025-01-31T12:01:17.238480Z  WARN sink{component_kind="sink" component_id=elastic component_type=elasticsearch}: vector::sinks::elasticsearch::common: Failed to determine Elasticsearch API version. Please fix the reported error or set an API version explicitly via `api_version`. assumed_version=8 error=Failed to get Elasticsearch API version: Failed to make HTTP(S) request: error trying to connect: tcp connect error: Connection timed out (os error 110)
#13 131.7 √ Component configuration
#13 131.7 -----------------------------
#13 131.7                     Validated
#13 DONE 131.8s

Configuration

# The only part of my config that actually matters to reproduce the error:

sinks:
  elastic:
    type: elasticsearch
    mode: bulk
    compression: gzip
    bulk:
      index: myindexname
    inputs:
      - myinputname
    acknowledgements:
      enabled: true
    endpoints:
      - http://elastic.somewhere:9200

Version

vector 0.43.1 (x86_64-unknown-linux-gnu e30bf1f 2024-12-10 16:14:47.175528383)

Debug Output

Not needed.

Example Data

Not needed.

Additional Context

Not needed.

References

No response

@0x25CBFC4F 0x25CBFC4F added the type: bug A code related bug. label Jan 31, 2025
@0x25CBFC4F 0x25CBFC4F changed the title from "Vector does not skip healthchecks on config validation - just ignored their results causing a hang" to "Vector does not skip healthchecks on config validation - just ignores their results causing a hang" Jan 31, 2025
@pront pront added the domain: config Anything related to configuring Vector label Jan 31, 2025
@pront
Member

pront commented Feb 3, 2025

I suspect that this get version request is the problem:

let response = get(
    base_url,
    auth,
    #[cfg(feature = "aws-core")]
    service_type,
    request,
    client,
    "/",
)


Per https://vector.dev/docs/administration/validating/, you can also pass --no-environment.

Config:

...

sinks:
  elastic:
    type: elasticsearch
    mode: bulk
    compression: gzip
    bulk:
      index: myindexname
    inputs:
      - s0
    acknowledgements:
      enabled: true
    endpoints:
      - http://elastic.somewhere:9200

Validation:

cargo run --color=always --profile dev -- validate healthcheck-hang.yaml --skip-healthchecks --no-environment
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.67s
     Running `target/debug/vector validate healthcheck-hang.yaml --skip-healthchecks --no-environment`
2025-02-03T17:25:56.252068Z  WARN vector::config: Source has acknowledgements enabled by a sink, but acknowledgements are not supported by this source. Silent data loss could occur. source="s0" sink="elastic"
√ Loaded ["healthcheck-hang.yaml"]
--------------------------------------------------------------------------------------------
                                                                                   Validated

Process finished with exit code 0

@0x25CBFC4F
Author

0x25CBFC4F commented Feb 4, 2025

Much appreciated. So, to skip actual connections from sinks I need to pass both --skip-healthchecks --no-environment, correct? Would no-environment hurt other parts of config validation, like syntax?

@0x25CBFC4F
Author

It seems that it did. Syntax is no longer validated.

@pront
Member

pront commented Feb 4, 2025

Much appreciated. So, to skip actual connections from sinks I need to pass both --skip-healthchecks --no-environment, correct? Would no-environment hurt other parts of config validation, like syntax?

Yes, they are different:

vector/src/validate.rs

Lines 171 to 180 in ff77761

async fn validate_environment(opts: &Opts, config: &Config, fmt: &mut Formatter) -> bool {
    let diff = ConfigDiff::initial(config);
    let mut pieces = if let Some(pieces) = validate_components(config, &diff, fmt).await {
        pieces
    } else {
        return false;
    };
    opts.skip_healthchecks || validate_healthchecks(opts, config, &diff, &mut pieces, fmt).await
}

If --no-environment isn't set, then we build all components. If --skip-healthchecks isn't set, then we also execute healthcheck validations on the built components.
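
To make the interaction of the two flags concrete, here is a minimal sketch of the gating described above. It is a simplified model, not Vector's actual code: the Opts struct below and the stages function are hypothetical names, and it assumes that --no-environment skips the whole validate_environment step (component build plus healthchecks).

```rust
/// Hypothetical, simplified stand-in for the `vector validate` CLI options.
#[derive(Clone, Copy)]
struct Opts {
    no_environment: bool,
    skip_healthchecks: bool,
}

/// Returns the names of the validation stages that would run, in order.
/// Assumption: the environment step is skipped entirely with --no-environment.
fn stages(opts: Opts) -> Vec<&'static str> {
    // The config file is always loaded and parsed.
    let mut run = vec!["load config"];
    if !opts.no_environment {
        // Building components may itself perform network I/O.
        run.push("build components");
        if !opts.skip_healthchecks {
            // Healthchecks only run on components that were built.
            run.push("run healthchecks");
        }
    }
    run
}

fn main() {
    let only_skip = Opts { no_environment: false, skip_healthchecks: true };
    let both = Opts { no_environment: true, skip_healthchecks: true };
    println!("--skip-healthchecks only: {:?}", stages(only_skip));
    println!("both flags:               {:?}", stages(both));
}
```

With only --skip-healthchecks, the "build components" stage still runs in this model, which is where the Elasticsearch version lookup, and therefore the timeout, would occur.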

@0x25CBFC4F
Author

Well obviously. But --skip-healthchecks still does a request to elasticsearch API to resolve the version and then it's ignored anyways. This sounds like a bug.

@pront
Member

pront commented Feb 6, 2025

still does a request to elasticsearch API to resolve the version

Yes, that happens during config build(). If you specify the version in your config, it won't attempt to auto-resolve.

and then it's ignored anyways

This is not correct. You can see the full logic here:

ElasticsearchApiVersion::Auto => {

It's a good question, though, whether we should allow such requests when building the config.
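
The behaviour discussed in this comment can be illustrated with a small model. This is only a sketch under stated assumptions, not the code behind the link: ApiVersion, resolve and the probe closure are hypothetical names, the probe stands in for the HTTP request made during build, and the fallback to version 8 is taken from the assumed_version=8 field in the log output above.

```rust
/// Hypothetical model of the sink's `api_version` setting.
enum ApiVersion {
    /// Ask the cluster which version it runs.
    Auto,
    /// Version pinned in the config.
    Explicit(u8),
}

/// Resolves the API major version to use.
/// `probe` stands in for the network request to the cluster; it is only
/// invoked when the setting is `Auto`.
fn resolve(setting: &ApiVersion, probe: impl FnOnce() -> Result<u8, String>) -> u8 {
    match setting {
        // A pinned version needs no network access at all.
        ApiVersion::Explicit(v) => *v,
        ApiVersion::Auto => match probe() {
            // The probed version is used, not discarded.
            Ok(v) => v,
            // Assumption based on the log line `assumed_version=8`.
            Err(err) => {
                eprintln!("failed to determine API version ({}); assuming 8", err);
                8
            }
        },
    }
}

fn main() {
    // Pinned version: the probe must never run.
    let pinned = resolve(&ApiVersion::Explicit(8), || panic!("no request expected"));
    // Auto with an unreachable cluster: falls back after the probe fails.
    let fallback = resolve(&ApiVersion::Auto, || Err("connection timed out".to_string()));
    println!("pinned = {}, fallback = {}", pinned, fallback);
}
```

In this model, pinning the version removes the only network call made while building the sink, which matches the suggestion to specify the version in the config.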
