Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Confusing timeout behaviour #259

Closed
acdha opened this issue Nov 9, 2017 · 4 comments · Fixed by #509
Closed

Confusing timeout behaviour #259

acdha opened this issue Nov 9, 2017 · 4 comments · Fixed by #509

Comments

@acdha
Copy link

acdha commented Nov 9, 2017

I was surprised by the behaviour of timeouts when troubleshooting an internal healthcheck for a very slow service with configuration like this:

    https_2xx:
        prober: http
        timeout: 15s
        http:
            fail_if_not_ssl: true
            preferred_ip_protocol: "ip4"

The first problem was that the timeout values are not applied by default:

$ curl 'http://localhost:9999/probe?module=https_2xx&target=https://example.org/very/slow/healthcheck'
…
probe_duration_seconds 9.500304772

Note that while the configuration specifies 15s the actual timeout was just under the default 10 seconds. Adding -H X-Prometheus-Scrape-Timeout-Seconds:120 allows reaching the configured timeout:

probe_duration_seconds 14.500166973
@brian-brazil
Copy link
Contributor

The timeout is a maximum, and we use a default of 10 as that's the default over in Prometheus - that header wasn't always sent.

Did you try using the debug output for this particular issue?

@acdha
Copy link
Author

acdha commented Nov 13, 2017

No - I'd used log-level=debug for the server process but didn't run with debug=true. In this case, I'm not sure it would changed matters too much (unlike #260) since the part I really needed was a line in the documentation saying that the timeout in the module config could lower the cap but not raise it.

@brian-brazil
Copy link
Contributor

Would you like to send a PR to clarify that?

@dswarbrick
Copy link
Contributor

dswarbrick commented May 4, 2018

I also find the timeout behaviour puzzling. The following code effectively enforces a maximum timeout of 10s for the module timeout if not specified by the Prometheus scrape_interval:

	if timeoutSeconds == 0 {
		timeoutSeconds = 10
	}

	if module.Timeout.Seconds() < timeoutSeconds && module.Timeout.Seconds() > 0 {
		timeoutSeconds = module.Timeout.Seconds()
	}

However, just a few lines before that, the X-Prometheus-Scrape-Timeout-Seconds header is parsed, and blackbox will merrily use whatever value that successfully parses, even if greater than 10s.

This makes it somewhat of a hassle to test very slow probes (> 10s) via the blackbox web UI, since one would have to coerce the browser into setting the all-powerful X-Prometheus-Scrape-Timeout-Seconds header (or use curl). This seems a bit unintuitive to me, and the documentation is also a bit ambiguous.

brian-brazil added a commit that referenced this issue Aug 16, 2019
The current default of 10s is problematic when testing slower probes
from a browser. Cancellation will still work as a fallback,
including for older Prometheus servers that don't send the timeout
header, so there shouldn't be a big buildup of ongoing probes.

Fixes #259

Signed-off-by: Brian Brazil <[email protected]>
brian-brazil added a commit that referenced this issue Aug 16, 2019
The current default of 10s is problematic when testing slower probes
from a browser. Cancellation will still work as a fallback,
including for older Prometheus servers that don't send the timeout
header, so there shouldn't be a big buildup of ongoing probes.

Fixes #259

Signed-off-by: Brian Brazil <[email protected]>
brian-brazil added a commit that referenced this issue Aug 16, 2019
The current default of 10s is problematic when testing slower probes
from a browser. Cancellation will still work as a fallback,
including for older Prometheus servers that don't send the timeout
header, so there shouldn't be a big buildup of ongoing probes.

Fixes #259

Signed-off-by: Brian Brazil <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants