Confusing timeout behaviour #259

acdha · 2017-11-09T21:41:19Z

I was surprised by the behaviour of timeouts when troubleshooting an internal healthcheck for a very slow service with configuration like this:

    https_2xx:
        prober: http
        timeout: 15s
        http:
            fail_if_not_ssl: true
            preferred_ip_protocol: "ip4"

The first problem was that the timeout values are not applied by default:

    $ curl 'http://localhost:9999/probe?module=https_2xx&target=https://example.org/very/slow/healthcheck'
    …
    probe_duration_seconds 9.500304772

Note that while the configuration specifies 15s the actual timeout was just under the default 10 seconds. Adding -H X-Prometheus-Scrape-Timeout-Seconds:120 allows reaching the configured timeout:

    probe_duration_seconds 14.500166973
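For anyone reproducing this without curl, here is a minimal Go sketch of the same request. The helper name `newProbeRequest` is made up for illustration; the port, module name and target are simply the ones from the curl command above. It only builds the request and prints it, so nothing is sent over the network.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// newProbeRequest is a hypothetical helper (not part of blackbox_exporter).
// It builds a GET /probe request and sets the scrape-timeout header that
// Prometheus would normally send.
func newProbeRequest(baseURL, module, target string, scrapeTimeoutSeconds float64) (*http.Request, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return nil, err
	}
	u.Path = "/probe"

	// Encode module and target as query parameters.
	q := url.Values{}
	q.Set("module", module)
	q.Set("target", target)
	u.RawQuery = q.Encode()

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	// Same effect as: curl -H X-Prometheus-Scrape-Timeout-Seconds:120
	req.Header.Set("X-Prometheus-Scrape-Timeout-Seconds",
		strconv.FormatFloat(scrapeTimeoutSeconds, 'f', -1, 64))
	return req, nil
}

func main() {
	req, err := newProbeRequest("http://localhost:9999", "https_2xx",
		"https://example.org/very/slow/healthcheck", 120)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.String())
	fmt.Println(req.Header.Get("X-Prometheus-Scrape-Timeout-Seconds"))
}
```

To actually run the probe, pass the request to `http.DefaultClient.Do(req)` with a client-side timeout longer than the value in the header.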

brian-brazil · 2017-11-10T10:21:46Z

The timeout is a maximum, and we use a default of 10 as that's the default over in Prometheus - that header wasn't always sent.

Did you try using the debug output for this particular issue?

acdha · 2017-11-13T14:39:09Z

No - I'd used log-level=debug for the server process but didn't run with debug=true. In this case, I'm not sure it would have changed matters too much (unlike #260), since the part I really needed was a line in the documentation saying that the timeout in the module config can lower the cap but not raise it.

brian-brazil · 2017-11-13T15:42:53Z

Would you like to send a PR to clarify that?

dswarbrick · 2018-05-04T21:50:11Z

I also find the timeout behaviour puzzling. The following code effectively enforces a maximum timeout of 10s for the module timeout if not specified by the Prometheus scrape_timeout:

	if timeoutSeconds == 0 {
		timeoutSeconds = 10
	}

	if module.Timeout.Seconds() < timeoutSeconds && module.Timeout.Seconds() > 0 {
		timeoutSeconds = module.Timeout.Seconds()
	}

However, just a few lines before that, the X-Prometheus-Scrape-Timeout-Seconds header is parsed, and blackbox will merrily use whatever value that successfully parses, even if greater than 10s.

This makes it somewhat of a hassle to test very slow probes (> 10s) via the blackbox web UI, since one would have to coerce the browser into setting the all-powerful X-Prometheus-Scrape-Timeout-Seconds header (or use curl). This seems a bit unintuitive to me, and the documentation is also a bit ambiguous.

The current default of 10s is problematic when testing slower probes from a browser. Cancellation will still work as a fallback, including for older Prometheus servers that don't send the timeout header, so there shouldn't be a big buildup of ongoing probes.

Fixes #259

Signed-off-by: Brian Brazil <[email protected]>

brian-brazil added the good first issue label Aug 6, 2019

brian-brazil mentioned this issue Aug 16, 2019

Bump default timeout to 2m #509

Merged

brian-brazil closed this as completed in #509 Aug 16, 2019
