
Implement dynamic probes #1098

Closed
wants to merge 10 commits into from

Conversation

jkroepke
Contributor

@jkroepke jkroepke commented Jul 13, 2023

This is useful on platforms where a team offers an observability stack that other users consume, e.g. to define custom fail-regex matches without creating a dedicated module for each use case. In combination with the Probe CRD, it would be a powerful option.

@jkroepke jkroepke force-pushed the dynamic_probe branch 2 times, most recently from dede325 to 2223f56 on July 13, 2023 23:51
Signed-off-by: Jan-Otto Kröpke <[email protected]>
@jkroepke jkroepke marked this pull request as ready for review July 14, 2023 07:33
@jurgenhaas

Very nice, that's exactly what I was looking for. Hope this gets accepted and merged soon.

@jkroepke jkroepke marked this pull request as draft July 14, 2023 14:52
@jkroepke jkroepke marked this pull request as ready for review July 14, 2023 15:50
Signed-off-by: Jan-Otto Kröpke <[email protected]>
@dswarbrick
Contributor

I would recommend drawing attention to the possibility that this could be abused by malicious actors. Currently, all aspects of a probe are predefined by a (presumably) responsible sysadmin. As such, blackbox_exporters may be used to probe targets that would not normally be accessible (e.g. behind some network perimeter).

This PR would open up the blackbox_exporter to injection of all manner of arbitrary headers/bodies in requests to arbitrary targets, and this really accentuates the need to adequately secure the blackbox_exporter from unauthorised use.

@jurgenhaas

this really accentuates the need to adequately secure the blackbox_exporter from unauthorised use.

Isn't that what should be done in any case, even without this enhancement?

@dswarbrick
Contributor

dswarbrick commented Jul 17, 2023

Isn't that what should be done in any case, even without this enhancement?

Arguably, yes. But if the blackbox_exporter has only been configured with a concise set of known-innocent probes, probably a lot of instances are not secured (albeit hopefully not accessible from public networks).

If this new functionality is enabled and available by default, a lot of unsecured blackbox_exporters would suddenly become a much bigger security risk, and this should really be communicated in big bold letters.

@jkroepke
Contributor Author

jkroepke commented Jul 17, 2023

I can understand the points from @dswarbrick, and I agree this feature should be opt-in by default, which means a sysadmin has to explicitly enable it.

Then this should not introduce new issues by default, and targets behind the blackbox_exporter remain secured.

@dswarbrick do you think this would be enough?

@dswarbrick
Contributor

@jkroepke I certainly think that the feature should be opt-in.

Signed-off-by: Jan-Otto Kröpke <[email protected]>
@SuperQ
Member

SuperQ commented Jul 17, 2023

Yes, we already get enough security reports about the blackbox_exporter allowing the target param to turn it into a proxy.

@jkroepke
Contributor Author

There is always a trade-off between unsecured blackbox exporter instances and a well-secured environment.

@SuperQ To support both sides, could an opt-in toggle be a compromise?

@jkroepke
Contributor Author

jkroepke commented Jul 22, 2023

@electron0zero Do you have an opinion here?

Since this is opt-in by default, it should be fine. For additional network restrictions, I would recommend NetworkPolicies.

@electron0zero
Member

@electron0zero Do you have an opinion here?

Since this is opt-in by default, it should be fine. For additional network restrictions, I would recommend NetworkPolicies.

this looks like a valid use-case to me, but since it's a fairly big change I would like to hear what the other maintainers have to say, and ideally have a discussion about the feature, and how we want to implement it :)

cc @mem @roidelapluie

jkroepke and others added 2 commits July 22, 2023 20:04
Signed-off-by: Jan-Otto Kröpke <[email protected]>
main.go
Signed-off-by: Jan-Otto Kröpke <[email protected]>
@jkroepke
Contributor Author

jkroepke commented Aug 4, 2023

@mem @roidelapluie do you have an opinion here?

@jkroepke
Contributor Author

@electron0zero it seems like the other maintainers do not have an opinion here.

How do we want to proceed?

@jc36

jc36 commented Sep 11, 2023

I have already tested and am using these dynamic probes. Thank you very much for the work done!
I also added query:"timeout" here, as it is necessary for some probes (units are nanoseconds, so 5s = 5000000000).

@jkroepke
Contributor Author

@jc36 the blackbox exporter should respect the scrape_timeout from Prometheus.

func getTimeout(r *http.Request, module config.Module, offset float64) (timeoutSeconds float64, err error) {

You can test this behavior via curl, by using curl -H "X-Prometheus-Scrape-Timeout-Seconds: 10"

Do you think an explicit timeout query is still useful?

@jkroepke
Contributor Author

@electron0zero do you have an opinion on how we can proceed here?

@jc36

jc36 commented Sep 13, 2023

I didn't really understand where I should add the header "X-Prometheus-Scrape-Timeout-Seconds" in the scrape-config job settings so that blackbox-exporter waits for a response from the target for no more than the specified time.

  scrape_configs:
    - job_name: blackbox_exporter_tcp
      params:
        prober: ['tcp']
        timeout: ['5000000000'] # timeout to wait for a response from the target
        tcp.preferred_ip_protocol: ['ip4']
      scrape_interval: 30s
      scrape_timeout: 10s # timeout to wait for a response from the blackbox exporter
      metrics_path: /probe/dynamic
      scheme: http
      static_configs:
        - targets:
            - some-target.com:443
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - target_label: __address__
          replacement: extmon-blackbox-exporter:9115

In any case, specifying the timeout together with the other parameters would be clearer.

@jkroepke
Contributor Author

It will be set by Prometheus based on the scrape_timeout configuration.

https://github.com/prometheus/prometheus/blob/4419399e4e831052bc9b8b4f26a8e7cf337091b0/scrape/scrape.go#L826

@jkroepke
Contributor Author

@SuperQ @electron0zero @mem @roidelapluie Can I assume that we have a consensus here?

Signed-off-by: Jan-Otto Kröpke <[email protected]>
@GuridMa

GuridMa commented Oct 24, 2023

Approximately when will this feature be released, and in which version?

@mem
Contributor

mem left a comment

Hi,

Thank you for taking the time to contribute to blackbox_exporter.

I have been looking at this code. It seems mostly OK. The goal of the change is not clear, though. It's hard to give you a good code review without knowing what the goal is.

The bit added to README.md tells the user how to use the feature, but not why they would want to use it, e.g. when it's appropriate to use one configuration method over the other. I find the example confusing, since it shows something you can already do with the regular configuration file. Other than embedding the configuration in the Prometheus configuration, I don't see much advantage in this method. In fact, things that are simple with two configuration files become a little awkward: params is a sequence of key/value pairs, so while it still looks like you are writing YAML, you are in fact not writing what you think you are. For example, prober: [http] has to be written like that instead of just prober: http; I was a bit confused as to why that worked until I remembered how it is being parsed.

Lacking a description of the goal of this change, my guess is that you have some highly dynamic system and you are obtaining the configuration parameters from some other system. If that's the case, I still don't see why it's not possible to generate the regular configuration file with modules and pass the parameters as usual and then issue a reload when the configuration changes.

@mem
Contributor

mem commented Nov 10, 2023

Also, I'm not following how this addresses #1050.

@mem
Contributor

mem commented Nov 10, 2023

Also, I'm not following how this addresses #1050.

I should be more verbose.

That issue is talking about not storing the API key in the configuration file.

While I do see that this change would address that in the sense that the configuration file is not required, you would still need to write this in the Prometheus configuration file, as shown in the example.

That issue is basically asking for a secret to be read from a different location (e.g. vault) and then passed to BBE in some way. This change implements "some way". But if you want to scrape BBE using Prometheus, you would still need to write that information to the configuration file, wouldn't you?

@jkroepke
Contributor Author

jkroepke commented Nov 10, 2023

Goal: as part of a platform team, I want to provide a generic probe service which can be used by any team. Each team can set up probes without being required to preregister them on the probe service. With this PR, teams can use the Prometheus Operator Scrape CR to configure any probe supported by the blackbox exporter, including expecting specific strings, setting headers, and expecting status codes.

But if you want to scrape BBE using Prometheus, you would still need to write that information to the configuration file, wouldn't you?

Yes, but within a Prometheus Operator ecosystem, it can be securely abstracted. The values can alternatively be delivered by auto-discovery.

If users are running a standalone/static Prometheus, it may have no benefit.

@jkroepke jkroepke requested a review from mem November 10, 2023 14:43
@SuperQ
Member

SuperQ commented Nov 10, 2023

With this PR, teams can use the Prometheus Operator Scrape CR to configure any probe supported by the blackbox exporter, including expecting specific strings, setting headers, and expecting status codes.

I agree with @mem, this seems like an XY Problem.

It sounds like what you need is a "ProbeConfig" CRD that allows teams to dynamically configure the exporter. This way a matching Probe would be simpler to maintain.

@jkroepke
Contributor Author

jkroepke commented Nov 10, 2023

Another goal of the PR is to provide a generic request interface that eliminates the additional abstraction of predefined probes. I would like to have one place where I can configure probe targets and probe options, like the Nagios check check_http in the old days. The configuration of probes can be fully shifted to the Prometheus configuration.

It sounds like what you need is "ProbeConfig" CRD

With this PR, a ProbeConfig CRD is no longer needed, since the configuration can be included in Probe CRs. Of course, a blackbox exporter operator would be one solution, but I would prefer to remove the abstraction of predefined probe configs, which eliminates the need for such an operator.

The alternative to the Prometheus Operator would be classic annotation-based service discovery. Annotations could be used to configure additional probe settings, e.g. the expected status code.

Signed-off-by: Jan-Otto Kröpke <[email protected]>
@jkroepke
Contributor Author

jkroepke commented Dec 9, 2023

I resolved the merge conflicts.

One more use case: since the blackbox exporter is embedded into grafana/agent, it would be helpful there, too.

Instead of requiring two distinct configs to be maintained (probes + probe targets), the PR aims to avoid this.

@SuperQ @electron0zero @mem How can we proceed here? Are we comfortable merging this?

@SuperQ
Member

SuperQ commented Dec 9, 2023

No, sorry, I don't think we want this feature.

@electron0zero
Member

I am in agreement with @SuperQ and @mem on this :)

@jkroepke
Contributor Author

It makes me unhappy to read this.

It removes a lot of complexity in a lot of setups by shifting the probe configs into Prometheus (or upstream SD).

Features like http.fail_if_body_matches_regexp are mostly valid for single endpoints, and in the worst case I have to configure a module for each probe I observe (in cases where I have to check more than the status code).

@SuperQ I would ask you once more to consider the user's point of view here. We have the Prometheus Operator ecosystem, where someone can define probes via Kubernetes Custom Resources, but we do not have a corresponding blackbox_exporter operator for Kubernetes, and I expect that in the next 12-24 months there won't be one.

Looking at exporters like https://github.com/webdevops/azure-metrics-exporter#default-template, they allow each input as GET parameters, which integrates perfectly into the existing Prometheus Operator ecosystem.

@SuperQ
Member

SuperQ commented Dec 25, 2023

We have the Prometheus Operator ecosystem, where someone can define probes via Kubernetes Custom Resources, but we do not have a corresponding blackbox_exporter operator for Kubernetes, and I expect that in the next 12-24 months there won't be one.

Maybe you can contribute them.

@SuperQ
Member

SuperQ commented Dec 25, 2023

It removes a lot of complexity in a lot of setups by shifting the probe configs into Prometheus (or upstream SD).

That's just shifting complexity around. It doesn't actually solve the underlying problem, and it makes things worse for everyone, since there is now a much larger abuse potential.

The XY Problem here is that you need to make the probes self-service. That's a problem with the blackbox_exporter config, not Prometheus.

I recommend checking the Prometheus Operator issue tracker and filing an issue there first.

@SuperQ
Member

SuperQ commented Dec 25, 2023

Since all of the maintainers agree that we cannot accept this, I am going to close this PR.


Successfully merging this pull request may close these issues.

[Feature]probe support params of module
8 participants