Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add regexp matching of HTTP response headers to the http probe #419

Merged
merged 11 commits into from
Feb 21, 2019
Merged

Add regexp matching of HTTP response headers to the http probe #419

merged 11 commits into from
Feb 21, 2019

Conversation

gvsmirnov
Copy link
Contributor

I needed to add the verification of header values for some of the targets probed by the blackbox exporter, but apparently matching is currently only supported for body content.

This pull request adds two new config options: fail_if_header_matches_regexp and fail_if_header_not_matches_regexp, mirroring the existing fail_if_matches_regexp and fail_if_not_matches_regexp for the response body. I considered renaming these ones to fail_if_body_matches_regexp and fail_if_body_not_matches_regexp for consistency, but that would be a breaking change and thus does not seem like a good idea.

I am fairly new to golang, so I just tried to follow the style and apparent conventions in the rest of this project. Please let me know if something should be changed.

I also tested this extensively manually, it worked in all the corner cases I tried.

@gvsmirnov gvsmirnov changed the title Verify http headers Add regexp matching of HTTP response headers to the http probe Feb 16, 2019
@gvsmirnov
Copy link
Contributor Author

gvsmirnov commented Feb 16, 2019

Oh dear, I just checked the list of open pull requests and found not one, but TWO others with the same changes: #62 and #332 😕 Apparently they were abandoned by their authors before all the comments were addressed.

Looking at the comments from @brian-brazil, it seems like all of them are addressed in this PR now. The "check that header is missing" can be achieved by

fail_if_header_matches_regexp:
  - header: Not-Supposed-To-Be-Here
    allow_missing: true
    regexp: '.*'

config/config.go Outdated
Method string `yaml:"method,omitempty"`
Headers map[string]string `yaml:"headers,omitempty"`
FailIfMatchesRegexp []string `yaml:"fail_if_matches_regexp,omitempty"`
FailIfNotMatchesRegexp []string `yaml:"fail_if_not_matches_regexp,omitempty"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these two above be renamed to FailIfBodyMatchesRegexp and FailIfBodyNotMatchesRegexp ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would rename them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@SuperQ
Copy link
Member

SuperQ commented Feb 17, 2019

Thanks for picking up where those other PRs went stale.

Per your above comment, I think you want the regexp to match .+ in order to enforce some header text. Or maybe I'm misunderstanding.

fail_if_header_matches_regexp:
  - header: Not-Supposed-To-Be-Here
    allow_missing: true
    regexp: '.+'

EDIT: Are headers with no value after the header name valid?

@SuperQ SuperQ requested a review from brian-brazil February 17, 2019 17:04
@gvsmirnov
Copy link
Contributor Author

I think you want the regexp to match .+ in order to enforce some header text.
EDIT: Are headers with no value after the header name valid?

If I read RFC 7230 section 3.2 correctly, header value is not optional, but an empty value is valid. That seems to be the generally accepted interpretation, judging by the discussions on the Internet that I found. However, http.Header.Get() returns an empty string ("") if the header is not set, which is ambiguous. Thankfully, the mapping can be accessed directly, so it's possible to just do something like len(textproto.MIMEHeader(header)[key]) > 0.

This leads to another consideration: as per section section 3.2.2, there may be multiple header fields with the same field name. Judging by the doc on http.Response.Header, they may be concatenated with comma delimiters. It is not clear to me in which cases it would be populated by the http client as a list with multiple string values, and in which cases it would be in a single string value as a comma-separated list.

I would propose the following behaviour:

  1. Verify the allow_missing by checking if the header with given name is present at least once. If one or more values are empty, do not consider it to be missing.
  2. Verify the regexp against all the header values for a given header name. For "fail if matches", at least one match fails the probe. For "fail if does not match", at least one mismatch fails the probe.
  3. If multiple values are concatenated into a single comma-separated string, do not do any special handling. Let the user deal with such cases on their own.

I have doubts about the behaviour in (2) for the "fail if does not match" for multi-value headers. "fail if none match" may be more desirable than "fail if any does not match".

What do you think?

@gvsmirnov
Copy link
Contributor Author

gvsmirnov commented Feb 17, 2019

After some consideration, the most common use of multi-value headers is probably Set-Cookie (the RFC even mentions it explicitly). The following use cases seem to be plausible:

  1. Verify that a specific cookie is set: the user would want the probe to fail if a desired cookie is not present, but would not want the probe to fail if there are other cookies:
fail_if_header_not_matches_regexp:
  header: Set-Cookie
  regexp: '.*Domain=\.example\.com.*'
  1. Verify that no cookies are set
fail_if_header_matches_regexp:
  header: Set-Cookie
  allow_missing: true
  regexp: '.*'

Another common multi-value header is Accept-Encoding, but in practice it's mostly represented as a single comma-separated value, not as multiple values.

I pushed the changes implementing the behaviour described above. Please let me know if you think it is reasonable.

Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency, but that would be a breaking change and thus does not seem like a good idea.

It's probably worth it.

We can always reconsider handling of multiple headers if there's use cases.

prober/http.go Outdated
@@ -173,6 +236,11 @@ func ProbeHTTP(ctx context.Context, target string, module config.Module, registr
Help: "Indicates if probe failed due to regex",
})

probeFailedDueToHeaders = prometheus.NewGauge(prometheus.GaugeOpts{
Name: "probe_failed_due_to_headers",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probe_http_

falied_due_regex should already cover this

Copy link
Contributor Author

@gvsmirnov gvsmirnov Feb 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probe_http

Thee are several other labels that do not have the http: probe_ssl_earliest_cert_expiry and probe_failed_due_to_regex. Should they be changed, too? Or can I just ditch the newly introduced probe_failed_due_to_headers and keep the rest as is?

EDIT: formatting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should be changed, but at least the ssl one is commonly used for alerting so we need to be a little careful so not now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will not touch the existing labels.

But I am not sure if representing the failed header matches with probe_failed_due_to_regex is the right way. It would be helpful if the user could see immediately from the metrics where the failure originated from. I do not see any drawbacks to this either (the extra disk usage would be negligible).

If you do insist, I can reuse probe_failed_due_to_regex. Do you insist?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest reusing, either that or have two more clearly named metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, reused existing metric. Renaming existing ones would be a more dangerous breaking change indeed.

prober/http.go Outdated
}
}
for _, headerMatchSpec := range httpConfig.FailIfHeaderNotMatchesRegexp {
values := textproto.MIMEHeader(header)[headerMatchSpec.Header]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't canonicalise the key

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed and added tests

@gvsmirnov
Copy link
Contributor Author

It's probably worth it.

Alright, done. The breaking change would be caught at the configuration parsing stage, so it's a fail-fast scenario.

config/config.go Outdated
Headers map[string]string `yaml:"headers,omitempty"`
FailIfBodyMatchesRegexp []string `yaml:"fail_if_body_matches_regexp,omitempty"`
FailIfBodyNotMatchesRegexp []string `yaml:"fail_if_body_not_matches_regexp,omitempty"`
FailIfHeaderMatchesRegexp []HeaderMatch `yaml:"fail_if_header_matches_regexp,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a string, so I'd remove the _regexp from the name to avoid confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where would the confusion arise from? Is there a convention that properties ending with _regexp are always strings that represent regular expression?

I could rename the properties so it would be like this:

# ...
fail_if_body_matches_regexp:
  - 'Thou shall not pass'
fail_if_header_matches:
  - header: Host
    regexp: example.com

It would lose the consistency between property names for header and body, though. Is it worth it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the data structures are different

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@gvsmirnov
Copy link
Contributor Author

@brian-brazil @SuperQ is there anything else I can do to get this merged? I would love to use the upstream version instead of own fork.

CONFIGURATION.md Outdated
[ - <regex>, ... ]

# Probe fails if response header matches regex. For headers with multiple values, fails if *at least one* matches
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full stop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added here and in other places

CONFIGURATION.md Outdated
# Probe fails if response header matches regex. For headers with multiple values, fails if *at least one* matches
fail_if_header_matches:
[ - [ header: <string>,
[ regex: <regex>, ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation seems off here

Copy link
Contributor Author

@gvsmirnov gvsmirnov Feb 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, does not look quite good. I got it from the prometheus config documentation, e.g. here when the same level contains both optional and non-optional parameters, then it looks like this (comments stripped):

job_name: <job_name>
[ scrape_interval: <duration> | default = <global_config.scrape_interval> ]

In here, though, the regex should be named regexp and be made non-optional, so it would look like this:

  # Probe fails if response header matches regex. For headers with multiple values, fails if *at least one* matches.
  fail_if_header_matches:
    [ - [ header: <string>,
          regexp: <regex>,
          [ allow_missing: <boolean> | default = false ]
        ], ...
    ]

Would that be OK?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking around, the only similar one we have is some of pagerduty for the AM. There we break it out as a different type.

Copy link
Contributor Author

@gvsmirnov gvsmirnov Feb 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another example right in this repo is query_response in tcp_probe, it does not break it out as a separate type. That's why I did not do it here. Can extract both or keep inlined like above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For query_response they're all optional though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, extracted only the http header match spec.

prober/http.go Outdated
level.Error(logger).Log("msg", "Missing required header", "header", headerMatchSpec.Header)
return false
} else {
continue // no need to match any regex on missing headers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"No", and full stop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed here and in the other place.

Signed-off-by: Gleb Smirnov <[email protected]>
@gvsmirnov
Copy link
Contributor Author

Huh, travis build failed when fetching dependencies. Is it possible to rerun it?

@gvsmirnov
Copy link
Contributor Author

@brian-brazil @SuperQ sorry to pester, but I ask again: anything else I can do to get this merged?


```yml
header: <string>,
regexp: <regex>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is optional

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it optional? It is not possible to match a header without a regex to match against.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you allow it to be missing

Copy link
Contributor Author

@gvsmirnov gvsmirnov Feb 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if you allow it to be missing, then what is the point of having a match rule at all?

fail_if_header_matches:
  - header: Transfer-Encoding
    allow_missing: true
fail_if_header_not_matches:
  - header: Transfer-Encoding
    allow_missing: true

Neither of these rules make sense to me. But looking at the checks made in config.go, apparently they used to. So I think I should change the config check, so it always required regexp to be set, regardless of allow_missing.

Would that be OK?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a stroke of speculative execution, did just that.

@brian-brazil brian-brazil merged commit c453ee3 into prometheus:master Feb 21, 2019
@brian-brazil
Copy link
Contributor

Thanks!

@gvsmirnov
Copy link
Contributor Author

Thanks for a thorough code review! I appreciate it.

Is there any estimate of when the next version will be released? I’m fine building from upstream source, asking for the sake of completeness.

@gvsmirnov gvsmirnov deleted the verify-http-headers branch February 21, 2019 16:00
@brian-brazil
Copy link
Contributor

It's not going to be today, and there's a breaking change in there so Friday is a bad idea. Probably early next week.

@piotrkochan
Copy link

piotrkochan commented Mar 11, 2019

So now example.yml should be fixed, because for "http" prober option "fail_if_body_matches_regexp" fails and this is very confusing:

msg="Error loading config" err="Error parsing config file: yaml: unmarshal errors:\n  line 8: field fail_if_body_matches_regexp not found in type config.plain"

@gvsmirnov
Copy link
Contributor Author

@piotrkochan how should it be fixed? example.yml is in line with the renamed config parameter. It used to be fail_if_matches_regexp, now it is fail_if_body_matches_regexp.

Where do you get this error? I would guess that you are running the old version of blackbox_exporter with the new version of config. However, the new version of blackbox_exporter has not yet been released, so the documentation is out of sync with the available distributions.

@brian-brazil any way I can help with making a release? This is indeed confusing now.

@brian-brazil
Copy link
Contributor

#382 (comment) is what's holding a release up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants