Fix issue #673: incorrect content length reporting #674
Conversation
Update github.com/prometheus/common to v0.11.1. This is necessary so that BBE has access to the original content length as reported by the server.
I can't see anything relevant in that code, can you explain?
prober/http.go (Outdated)
@@ -450,13 +474,37 @@ func ProbeHTTP(ctx context.Context, target string, module config.Module, registr
	}
}

if resp != nil && !requestErrored && strings.EqualFold(resp.Header.Get("Content-Encoding"), "gzip") {
I don't like that we're special-casing gzip compression; can we not let Go handle this for us?
Sorry, I meant to reply to a different comment, and I wrote it here. Editing this comment to avoid confusion.
gzip compression is special-cased because that's what the underlying transport would be doing. See below.
The change to github.com/prometheus/[email protected] is my bad: while chasing where the -1 was coming from I misread a diff that was doing something different. The change is not necessary.
A little bit of a braindump here...
In github.com/prometheus/[email protected]/config/http_config.go there's:
// NewRoundTripperFromConfig returns a new HTTP RoundTripper configured for the
// given config.HTTPClientConfig. The name is used as go-conntrack metric label.
func NewRoundTripperFromConfig(cfg HTTPClientConfig, name string, disableKeepAlives bool) (http.RoundTripper, error) {
newRT := func(tlsConfig *tls.Config) (http.RoundTripper, error) {
// The only timeout we care about is the configured scrape timeout.
// It is applied on request. So we leave out any timings here.
var rt http.RoundTripper = &http.Transport{
// ...
DisableCompression: true,
// ...
}
this function is called from prober/http.go:

client, err := pconfig.NewClientFromConfig(httpClientConfig, "http_probe", true)

which is calling the above:

rt, err := NewRoundTripperFromConfig(cfg, name, disableKeepAlives)

That DisableCompression setting has been there for quite a while. My bad.
That said, the code eventually hits this (in net/http's transport):
if cs.requestedGzip && res.Header.Get("Content-Encoding") == "gzip" {
res.Header.Del("Content-Encoding")
res.Header.Del("Content-Length")
res.ContentLength = -1
res.Body = &gzipReader{body: res.Body}
res.Uncompressed = true
}
requestedGzip is set, among other conditions, when DisableCompression is false, so that path is never taken for BBE. This bit of code is relevant because the res.Body bit near the end is what causes the body to be automatically decompressed.

So with the current code BBE could be getting a compressed body (if the server provides one, even if one was not requested), Content-Length should be populated in the common case (even for compressed bodies), and Content-Encoding could be populated (with various values, but the only one that is handled by the existing code is gzip).
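To make that concrete, here's a minimal sketch (not BBE code) of the effect being described: with DisableCompression set, Go's transport never adds Accept-Encoding: gzip on its own, so the automatic-decompression path quoted above is never taken and the raw body and headers come through untouched.

package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	client := &http.Client{
		// Mirrors what prometheus/common sets up for BBE.
		Transport: &http.Transport{DisableCompression: true},
	}

	resp, err := client.Get("https://prometheus.io/")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// With DisableCompression: true, no Accept-Encoding header is added,
	// Uncompressed stays false, and Content-Encoding (if any) is preserved.
	fmt.Println(resp.ContentLength, resp.Uncompressed, resp.Header.Get("Content-Encoding"))
}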
Thinking this through, I must be hitting some servers where Content-Length is not reported, and that's why I'm seeing a -1:
$ curl --include --silent https://prometheus.io/
HTTP/2 200
date: Thu, 13 Aug 2020 22:21:23 GMT
content-type: text/html; charset=UTF-8
set-cookie: __cfduid=daaa32207a8b1cc2df0d447face9aa8a41597357283; expires=Sat, 12-Sep-20 22:21:23 GMT; path=/; domain=.prometheus.io; HttpOnly; SameSite=Lax
cache-control: public, max-age=0, must-revalidate
age: 2605
vary: Accept-Encoding
x-nf-request-id: d59078e9-d2ba-4178-98c3-2d05e5b46e3c-5951011
cf-cache-status: DYNAMIC
cf-request-id: 048b82ef9600001297350f5200000001
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
strict-transport-security: max-age=2592000; preload
x-content-type-options: nosniff
server: cloudflare
cf-ray: 5c25d42c2c4e1297-MIA
In this case probe_http_content_length is reported as -1 and probe_http_uncompressed_body_length is reported as whatever number of bytes were read.
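That is, scraping the probe for a target like the one above yields something along these lines (the body-length value here is made up for illustration; only the -1 comes from the observation above):

probe_http_content_length -1
probe_http_uncompressed_body_length 26543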
> So with the current code BBE could be getting a compressed body (if the server provides one, even if one was not requested),
That's a bug on the part of the server then, and not something we should worry about. Any failures due to this are true positives. I don't see why we need any special-casing here, and for just one compression algorithm.
I understand.

The reason why this is needed is that browsers pay attention to the Content-Encoding header and decompress the body even if they didn't ask for compression, so there are servers out there relying on this behavior; if we want to apply the regular expression validations, we need to decompress the body.

I have added support for decompressing the currently known values except for "compress" (mostly because the RFCs that talk about it don't specify the compression parameters that should be used, and those are necessary during decompression; also because it lost support from browsers around 20 years ago).
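Something along the lines of this sketch (not necessarily the exact PR code; the helper name is illustrative, and the brotli package is the one added to go.mod further down):

package prober

import (
	"compress/flate"
	"compress/gzip"
	"fmt"
	"io"

	"github.com/andybalholm/brotli"
)

// getDecompressionReader returns a reader that decompresses the response
// body according to the Content-Encoding header. "compress" is deliberately
// not supported, for the reasons given above.
func getDecompressionReader(encoding string, body io.Reader) (io.Reader, error) {
	switch encoding {
	case "", "identity":
		return body, nil
	case "gzip":
		return gzip.NewReader(body)
	case "deflate":
		return flate.NewReader(body), nil
	case "br":
		return brotli.NewReader(body), nil
	default:
		return nil, fmt.Errorf("unsupported Content-Encoding %q", encoding)
	}
}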
Going back to RFC 7231 just to make sure what's conformant behavior and what isn't, I see:

5.3.4. Accept-Encoding

A server tests whether a content-coding for a given representation is acceptable using these rules:

1. If no Accept-Encoding field is in the request, any content-coding is considered acceptable by the user agent.

2. If the representation has no content-coding, then it is acceptable by default unless specifically excluded by the Accept-Encoding field stating either "identity;q=0" or "*;q=0" without a more specific entry for "identity".

3. If the representation's content-coding is one of the content-codings listed in the Accept-Encoding field, then it is acceptable unless it is accompanied by a qvalue of 0. (As defined in Section 5.3.1, a qvalue of 0 means "not acceptable".)

4. If multiple content-codings are acceptable, then the acceptable content-coding with the highest non-zero qvalue is preferred.
So not specifying Accept-Encoding is equivalent to Accept-Encoding: * per 5.3.4.1. I was in error when I said that servers rely on the browser's behavior.

Since we are not adding an Accept-Encoding header to the request, and the prometheus/common code is setting DisableCompression to true, what happens is:

- Accept-Encoding is not present, so the server will treat that as Accept-Encoding: *, sending back compressed content if it chooses to.
- We might get compressed content, but we don't decompress it, so we are not able to apply regular expression validations.
- If we were to set DisableCompression to false, the Go client would only handle gzip for us.
Sorry about the confusion I created, I should have started by going to the RFC to see why I was getting compressed content when I wasn't requesting it.
Hmm, it was better worded in RFC 2616: "If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding. In this case, if "identity" is one of the available content-codings, then the server SHOULD use the "identity" content-coding..."
> prometheus/common code is setting DisableCompression to true

I'm not sure why that is set; it's not immediately clear from the history, but it's the right setting for the blackbox exporter.

If a server sends us arbitrary encodings in the hope that we understand them, I don't think we have any obligation to try and decode them, and the user can always specify headers explicitly. For an exporter like this I'd prefer not to be magic; this should be as raw HTTP as is practical.
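As a concrete example of the explicit-header route, a module along these lines should work (a sketch; the module name is made up, headers is the http prober's header map in the exporter's config):

modules:
  http_2xx_gzip:
    prober: http
    http:
      headers:
        Accept-Encoding: gzip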
OK, cool, I'll remove the content-encoding handling from the PR.
I have updated the PR and removed the Content-Encoding handling. If the server replies with compressed content, the regular expression validations, if any, will be applied to the compressed content and therefore they will not match. I will open another issue so we can discuss what to do with compressed responses.

I have left the code in the tests that verifies the behavior for compressed responses as a safeguard against the behavior of underlying libraries changing.
The new issue is #684.
prober/http.go (Outdated)
@@ -226,6 +228,28 @@ func (t *transport) GotFirstResponseByte() {
	t.current.responseStart = time.Now()
}

// chainedReadCloser implements an io.ReadCloser that refers to two io.ReadCloser instances, forwarding all Read
A reader that counted the bytes that passed through it would seem a bit simpler than an unusual tee.
The intention behind that code is to be able to read from the gzip Reader, but at the same time be able to close both Closers, if they exist. The original body needs to be closed to be able to reuse the TCP connection and free the file descriptor; the gzip reader needs to be closed in order to check if there are decompression errors.

I think your comment refers to the use of io.TeeReader in a different part of the change. I'll fix that, because it's true that I'm abusing the fact that matchRegularExpressions is using ioutil.ReadAll.
My point is more that this chainedReadCloser should not exist, it seems excessive for the problem at hand.
OK, removed.
prober/http.go (Outdated)
// number of bytes it has read.
type byteCounter struct {
	rc io.ReadCloser
	n *int64 // the location used to keep the running count
You don't really need this comment.
Why is this a pointer?
I removed that.
Basically I was trying to keep the code using resp.Body and not having to type-cast to another type to be able to access the count. Obviously that makes the code less clear, so I changed it.
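For reference, a minimal sketch of the non-pointer shape being described (a reconstruction based on this thread and the commit message, not a verbatim quote of the final code):

package prober

import "io"

// byteCounter wraps an io.ReadCloser and keeps a running count of the
// number of bytes read through it.
type byteCounter struct {
	io.ReadCloser
	n int64
}

func (bc *byteCounter) Read(p []byte) (int, error) {
	n, err := bc.ReadCloser.Read(p)
	bc.n += int64(n)
	return n, err
}

The probe can then wrap the body with bc := &byteCounter{ReadCloser: resp.Body}, read through bc, and report bc.n once the response is fully consumed, holding the wrapper directly instead of going through resp.Body.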
The comment is still here.
Really fixed now.
go.mod (Outdated)
@@ -1,6 +1,7 @@
module github.com/prometheus/blackbox_exporter

require (
	github.com/andybalholm/brotli v1.0.0
This is needed to support decompressing br.
This is no longer needed.
I need this for the tests.

I thought about just reporting Content-Encoding: br and not providing a body that's actually compressed using that, but the intention behind leaving those tests there is to catch behavior changes in underlying libraries (like automatically decompressing content), so I need to provide something that is actually compressed like that.
Let's not be pulling in dependencies just for tests of things we don't support.
OK, done
prober/http_test.go (Outdated)
expectError: true,
msg: msg,
contentLenght: len(buf),
uncompressedBodyLength: len(buf), // compressed lenght is reported
is this ok, or is it better to report -1?
Hmm, hard to tell. We can't decode it so we can't know.
I'm going to be away from the keyboard for ~1 week. Apologies for going silent.

Don't worry about it, there's no rush here.
Force-pushed from 5551425 to 58e365f.
prober/http_test.go (Outdated)
handler http.HandlerFunc
}

// These test cases cover known encodings (brotli, deflate,
These tests don't relate to this change.
Removed.
prober/http_test.go
Outdated
"deflate": func() testdata { | ||
msg := strings.Repeat("hello world", 10) | ||
var buf bytes.Buffer | ||
// the only error path is an invalid compression level |
Comments should be full sentences with capital letters and full stops.
Thanks.
This part of the code was removed because of the other comments.
prober/http_test.go
Outdated
fw.Close() | ||
return testdata{ | ||
msg: msg, | ||
contentLenght: buf.Len(), // content lenght is the length of the compressed buffer |
length
Thanks. Always gets me.
Force-pushed from d036260 to 8fe262b.
prober/http_test.go (Outdated)
@@ -257,12 +353,15 @@ func TestFailIfBodyMatchesRegexp(t *testing.T) {
	t.Fatal(err)
}
expectedResults := map[string]float64{
	"probe_failed_due_to_regex": 1,
	"probe_failed_due_to_regex": 1,
	"probe_http_content_length": float64(len(msg)), // issue #673, check that this is correctly populated with regex validations
All these comments need formatting too.
Thanks for spotting this.
While fixing this I noticed that the other two cases should be checking the results, too. I added that, and since the code was getting a little repetitive, I reorganized the code in the test a little. I simply moved the first block of code into an anonymous function and added t.Run to call that for each test case.
Solved.
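A sketch of that reorganization pattern (the shape only; case names, fields, and bodies here are illustrative, not the actual test code):

package prober

import "testing"

func TestFailIfBodyMatchesRegexp(t *testing.T) {
	testcases := map[string]struct {
		regexp      string
		body        string
		expectRegex float64 // expected value of probe_failed_due_to_regex
	}{
		"matches":        {regexp: "could not connect", body: "could not connect to database", expectRegex: 1},
		"does not match": {regexp: "could not connect", body: "all systems operational", expectRegex: 0},
	}

	for name, tc := range testcases {
		// The first block of the original test, moved into the subtest and
		// driven by tc, including the probe_http_content_length expectation
		// added for issue #673.
		t.Run(name, func(t *testing.T) {
			_ = tc
		})
	}
}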
Thanks!
Without regexp validations, the current code reads the response from the server in order to figure out the number of bytes we received. With regular expression validations, it's necessary to read the whole response in order to verify whether it matches or not. In this case, since the response is completely consumed, the current code is reporting a length of 0.

Modify the code so that we can keep track of the content length in either case: wrap the io.ReadCloser that corresponds to the response body with another io.ReadCloser that keeps track of the number of bytes read. After completely consuming the response, the wrapper is able to report the total number of bytes read.

Add TestContentLength that checks the content length is reported correctly. This includes checking cases with content encoding values we might get back from servers (brotli, deflate, gzip and identity) to make sure we don't change behavior if one of the underlying libraries does.

Add code in existing tests dealing with the regular expression validations to avoid regressing issue 673.

Fixes #673

Signed-off-by: Marcelo E. Magallon [email protected]