-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add metrics to reliable socket and reconnecting libs #3242
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 15 of 15 files at r1.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @scrye)
go/lib/prom/prom.go, line 75 at r1 (raw file):
1.28, 2.56, 5.12, 10.24} // DefaultSizeBuckets 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384 DefaultSizeBuckets = []float64{1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192,
I think on the lower bound we could start at 32 or something.
go/lib/sock/reliable/reliable.go, line 402 at r1 (raw file):
return prom.Success } if opError, ok := err.(*net.OpError); ok && opError.Timeout() {
would serrors.IsTimeout(err)
not work for this case?
go/lib/sock/reliable/internal/metrics/metrics.go, line 1 at r1 (raw file):
// Copyright 2019 ETH Zurich
Please add tests for metrics labels.
promtest.CheckLabelsStruct(t, DialLabels{})
etc.
go/lib/sock/reliable/reconnect/errors.go, line 21 at r1 (raw file):
var ( ErrDispatcherDead = serrors.New("dispatcher dead") ErrReconnecterTimeoutExpired = serrors.New("timeout expired")
I wonder if that should be a timeout error so that serrors.IsTimeout
would return true.
go/lib/sock/reliable/reconnect/internal/metrics/metrics.go, line 62 at r1 (raw file):
// Timeouts returns a counter for timeout errors. func (m metrics) Timeouts(_ Labels) prometheus.Counter {
I think we could just skip the Labels for this. just Timeouts().Inc()
on the call site.
Metrics: ``` # HELP lib_reliable_dials_total Total number of Dial calls. # TYPE lib_reliable_dials_total counter lib_reliable_dials_total{result="err_not_classified"} 1 lib_reliable_dials_total{result="ok_success"} 3 # HELP lib_reliable_registers_total Total number of Register calls. # TYPE lib_reliable_registers_total counter lib_reliable_registers_total{result="ok_success",svc="CS"} 1 lib_reliable_registers_total{result="ok_success",svc="UNKNOWN"} 3 # HELP lib_reliable_reads_total Total number of Read calls # TYPE lib_reliable_reads_total histogram lib_reliable_reads_total_bucket{result="err_not_classified",le="1"} 2 lib_reliable_reads_total_bucket{result="err_not_classified",le="+Inf"} 2 lib_reliable_reads_total_sum{result="err_not_classified"} 0 lib_reliable_reads_total_count{result="err_not_classified"} 2 lib_reliable_reads_total_bucket{result="ok_success",le="1"} 0 lib_reliable_reads_total_bucket{result="ok_success",le="+Inf"} 422 lib_reliable_reads_total_sum{result="ok_success"} 148709 lib_reliable_reads_total_count{result="ok_success"} 422 # HELP lib_reliable_writes_total Total number of Write calls # TYPE lib_reliable_writes_total histogram lib_reliable_writes_total_bucket{result="ok_success",le="1"} 0 lib_reliable_writes_total_bucket{result="ok_success",le="+Inf"} 403 lib_reliable_writes_total_sum{result="ok_success"} 85732 lib_reliable_writes_total_count{result="ok_success"} 403 # HELP lib_reconnect_retries_total Total number of reconnection attempt retries # TYPE lib_reconnect_retries_total counter lib_reconnect_retries_total 2 # HELP lib_reconnect_timeouts_total Total number of reconnection attempt timeouts # TYPE lib_reconnect_timeouts_total counter lib_reconnect_timeouts_total 0 ``` Fixes #3183.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 15 files reviewed, 5 unresolved discussions (waiting on @lukedirtwalker)
go/lib/prom/prom.go, line 75 at r1 (raw file):
Previously, lukedirtwalker (Lukas Vogel) wrote…
I think on the lower bound we could start at 32 or something.
Done.
go/lib/sock/reliable/reliable.go, line 402 at r1 (raw file):
Previously, lukedirtwalker (Lukas Vogel) wrote…
would
serrors.IsTimeout(err)
not work for this case?
Yes, it should work. Done.
go/lib/sock/reliable/internal/metrics/metrics.go, line 1 at r1 (raw file):
Previously, lukedirtwalker (Lukas Vogel) wrote…
Please add tests for metrics labels.
promtest.CheckLabelsStruct(t, DialLabels{})
etc.
Done.
go/lib/sock/reliable/reconnect/errors.go, line 21 at r1 (raw file):
Previously, lukedirtwalker (Lukas Vogel) wrote…
I wonder if that should be a timeout error so that
serrors.IsTimeout
would return true.
It probably should, but it's not something I would change in this PR because it can break stuff, since it interferes with execution paths.
Added a FIXME instead.
go/lib/sock/reliable/reconnect/internal/metrics/metrics.go, line 62 at r1 (raw file):
Previously, lukedirtwalker (Lukas Vogel) wrote…
I think we could just skip the Labels for this. just
Timeouts().Inc()
on the call site.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r2.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @scrye)
go/lib/sock/reliable/internal/metrics/metrics.go, line 1 at r1 (raw file):
Previously, scrye (Sergiu Costea) wrote…
Done.
The file is missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3.
Reviewable status:complete! all files reviewed, all discussions resolved
Metrics:
Fixes #3183.
This change is![Reviewable](https://camo.githubusercontent.com/1541c4039185914e83657d3683ec25920c672c6c5c7ab4240ee7bff601adec0b/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)