Metrics Refactor and record metrics around L1 Server-Timings #70
Conversation
pool.go
Outdated
@@ -17,6 +17,10 @@ import (
	"github.com/serialx/hashring"
)

var (
	maxPoolSize = 300
should this be a const rather than a var?
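A minimal sketch of the suggested change, assuming `maxPoolSize` is never reassigned at runtime (illustrative only, not the actual patch):

```go
package main

import "fmt"

// Since maxPoolSize never changes at runtime, declaring it const lets
// the compiler reject accidental reassignment. (Sketch; value taken
// from the diff above.)
const maxPoolSize = 300

func main() {
	// maxPoolSize = 400 // would now fail to compile
	fmt.Println(maxPoolSize)
}
```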
pool.go
Outdated
goLogger.Infow("trimmed pool list to 200", "first", n[0].url, "first_weight",
	n[0].weight, "last", n[199].url, "last_weight", n[199].weight)
n = n[:maxPoolSize]
goLogger.Infow(fmt.Sprintf("trimmed pool size to %d", maxPoolSize), "first", n[0].url, "first_weight",
why the combination of Infow and Sprintf? can't you have "size", maxPoolSize as another pair of keys directly to Infow?
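The suggestion is to pass the size as a structured key/value pair rather than pre-formatting it into the message string. A minimal sketch of that call shape, using a hypothetical `infow` stand-in (the real `goLogger` appears to be a zap SugaredLogger, which is not imported here):

```go
package main

import "fmt"

// infow is a tiny stand-in for zap's SugaredLogger.Infow, here only to
// show the suggested call shape (hypothetical helper, not the real API).
func infow(msg string, keysAndValues ...interface{}) string {
	out := msg
	for i := 0; i+1 < len(keysAndValues); i += 2 {
		out += fmt.Sprintf(" %v=%v", keysAndValues[i], keysAndValues[i+1])
	}
	return out
}

func main() {
	maxPoolSize := 300
	// Instead of fmt.Sprintf("trimmed pool size to %d", maxPoolSize),
	// pass the size as another key so it stays machine-queryable:
	fmt.Println(infow("trimmed pool size", "size", maxPoolSize))
	// → trimmed pool size size=300
}
```

Keeping the size as a key (rather than baked into the message) means log aggregators can filter and aggregate on it directly.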
Help:    "Latency observed during failed caboose car fetches from a single peer",
Buckets: durationPerCarHistogram,
fetchDurationBlockFailureMetric = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name: prometheus.BuildFQName("ipfs", "caboose", "fetch_duration_block_failure"),
renaming current metrics will break current dashboards - why is this changing?
@willscott Not all of the metrics have been renamed; only the ones that were incorrectly named or that have taken the place of repetitive ones.
I'll ensure the Caboose/Bifrost dashboards and metrics use the correct names. We're revamping both dashboards anyway.
fetcher.go
Outdated
@@ -284,6 +284,11 @@ func updateSuccessServerTimingMetrics(timingHeaders []string, resourceType strin
	fetchDurationPerPeerSuccessCacheMissTotalLassieMetric.WithLabelValues(resourceType).Observe(float64(m.Duration.Milliseconds()))
case "nginx":
	fetchDurationPerPeerSuccessTotalL1NodeMetric.WithLabelValues(resourceType, cache_status).Observe(float64(m.Duration.Milliseconds()))
	networkTimeMs := totalTimeMs - m.Duration.Milliseconds()
	fetchNetworkSpeedPerPeerSuccessMetric.WithLabelValues(resourceType).Observe(float64(recieved) / float64(networkTimeMs))
what about when networkTimeMs is 0? That leads to a division by zero.
Fixed and added more sanity checks.
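A minimal sketch of the kind of guard the fix implies; the helper name and the exact check are assumptions, not the merged code:

```go
package main

import "fmt"

// networkSpeed returns bytes per millisecond and whether the sample is
// meaningful. It guards the division-by-zero flagged in review: when the
// peer-reported duration equals the total time, networkTimeMs is 0 (or
// negative under clock skew), so the observation should be skipped.
// (Hypothetical helper for illustration.)
func networkSpeed(receivedBytes, totalTimeMs, peerTimeMs int64) (float64, bool) {
	networkTimeMs := totalTimeMs - peerTimeMs
	if networkTimeMs <= 0 {
		return 0, false
	}
	return float64(receivedBytes) / float64(networkTimeMs), true
}

func main() {
	if speed, ok := networkSpeed(2048, 100, 60); ok {
		fmt.Printf("%.1f bytes/ms\n", speed) // 2048 / 40 = 51.2
	}
	if _, ok := networkSpeed(2048, 60, 60); !ok {
		fmt.Println("skipped zero-duration sample")
	}
}
```

Returning an ok flag (rather than clamping to some sentinel value) keeps degenerate samples out of the histogram entirely, which avoids skewing the speed distribution.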
- Refactor/fix the metrics code for correctness, grokability, and DRY-ness
- Add more dimensions/labels to metrics to better capture the block/car and cache-hit/cache-miss cases
- Record Prometheus metrics around important L1 server-timings