Track distribution of row write rates with new HashBucketHistogram. #261
Conversation
Tested locally and seems to work.
Modulo one question
@@ -414,6 +426,7 @@ func (c *AWSStore) calculateDynamoWrites(userID string, chunks []Chunk) (map[str
	entries := 0
	for _, bucket := range c.bigBuckets(chunk.From, chunk.Through) {
		hashValue := hashValue(userID, bucket.bucket, metricName)
		rowWrites.Observe(hashValue, uint32(len(chunk.Metric)))
I want to double check: is this likely to end up with us distributing across hash buckets in a way that's aligned to how we're distributing across dynamo? If so, won't that make the measurements less useful?
is this likely to end up with us distributing across hash buckets in a way that's aligned to how we're distributing across dynamo?
That's unknown, as we don't know how DynamoDB distributes across partitions.
If so, won't that make the measurements less useful?
Maybe less useful, but I still think quite useful: this will give us good information on the distribution of our write load within the hash space, and any massive outliers should show up.
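To make that concrete, here is a minimal, self-contained sketch of the idea, not the PR's code: each row write is hashed, the hash selects one of a fixed number of buckets, and the bucket accumulates the writes, so a hot row shows up as an outlier bucket. The numBuckets constant and fnv32 helper are illustrative assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const numBuckets = 1024 // assumed bucket count, purely illustrative

// fnv32 is an illustrative stand-in for however the row key gets hashed.
func fnv32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func main() {
	buckets := make([]uint32, numBuckets)
	// Each row write lands in the bucket its hash falls into, independently
	// of how DynamoDB happens to partition that same hash key.
	for _, row := range []string{"user1:day1:metricA", "user2:day1:metricB", "user1:day1:metricA"} {
		buckets[fnv32(row)%numBuckets]++
	}
	fmt.Println(buckets[fnv32("user1:day1:metricA")%numBuckets]) // the hot row's bucket: 2
}
```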
rowWrites = util.NewHashBucketHistogram(util.HashBucketHistogramOpts{
	HistogramOpts: prometheus.HistogramOpts{
		Namespace: "cortex",
		Name:      "chunk_store_row_write_total",
_total is a suffix reserved for counters by convention, not for histograms, which get additional _sum and _count counters created automatically. I'd call this chunk_store_row_writes_distribution or something.
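A sketch of what that rename could look like, reusing the fields from the snippet above; the Help string is an illustrative addition, not code from the PR:

```go
rowWrites = util.NewHashBucketHistogram(util.HashBucketHistogramOpts{
	HistogramOpts: prometheus.HistogramOpts{
		Namespace: "cortex",
		// no _total suffix; the histogram's _sum and _count are added automatically
		Name: "chunk_store_row_writes_distribution",
		Help: "Distribution of chunk store writes across hash rows.",
	},
})
```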
Cool, will rename
// Collect implements prometheus.Collector.
func (h *hashBucketHistogram) Collect(c chan<- prometheus.Metric) {
	for i := range h.buckets {
		h.Histogram.Observe(float64(atomic.SwapUint32(&h.buckets[i], 0)))
So what your histogram observes depends totally on your scrape rate, missed scrapes, and whether you scrape from multiple Prometheus servers in parallel. This is obviously funky and not normally recommended Prometheus metrics usage, but I guess you know that?
Also, the xxx_count will be off.
Yeah, this is a bit of a hack for now; suggestions to make it more robust are welcome! I'm thinking of a goroutine which dumps the buckets into the histogram and resets them every second or something?
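A rough sketch of that suggestion, assuming the hashBucketHistogram fields shown in this PR (the atomic uint32 buckets and the embedded Histogram); the flushLoop name and ticker handling are illustrative:

```go
// flushLoop is a hypothetical helper: drain the buckets into the underlying
// histogram on a fixed tick instead of inside Collect, so what the histogram
// observes no longer depends on the scrape interval.
func (h *hashBucketHistogram) flushLoop(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		for i := range h.buckets {
			h.Histogram.Observe(float64(atomic.SwapUint32(&h.buckets[i], 0)))
		}
	}
}
```

Collect would then just delegate to the embedded histogram rather than swapping the buckets itself.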
Yes, that would be better!
Also, to normalise this a little, I think it would be useful to count all writes (a single counter), and then express each bucket's writes as a proportion of that.
Or better yet, to cancel out the effect the number of buckets has on that, multiply by the number of buckets too, so 1 would be perfectly load balanced, more than 1 skewed, and so on.
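A back-of-the-envelope sketch of that normalisation, with all names illustrative: each bucket's writes as a share of the total, scaled by the number of buckets, so 1.0 is a perfectly even spread and larger values flag hot buckets.

```go
// normalisedLoad is a hypothetical helper sketching the suggested metric:
// bucketWrites[i] / total * numBuckets, so 1.0 means perfectly balanced.
func normalisedLoad(bucketWrites []uint32) []float64 {
	var total uint64
	for _, w := range bucketWrites {
		total += uint64(w)
	}
	out := make([]float64, len(bucketWrites))
	if total == 0 {
		return out
	}
	n := float64(len(bucketWrites))
	for i, w := range bucketWrites {
		out[i] = float64(w) / float64(total) * n
	}
	return out
}
```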
Part of #254