Track distribution of row write rates with new HashBucketHistogram. #261

Merged (2 commits into master, Feb 2, 2017)

Conversation

tomwilkie (Contributor):

Part of #254

@tomwilkie (Author):

Tested locally and seems to work.

@tomwilkie tomwilkie requested a review from jml February 2, 2017 13:45
@jml (Contributor) left a review:

Modulo one question

@@ -414,6 +426,7 @@ func (c *AWSStore) calculateDynamoWrites(userID string, chunks []Chunk) (map[str
 	entries := 0
 	for _, bucket := range c.bigBuckets(chunk.From, chunk.Through) {
 		hashValue := hashValue(userID, bucket.bucket, metricName)
+		rowWrites.Observe(hashValue, uint32(len(chunk.Metric)))
Reviewer (Contributor), on the diff:
I want to double check: is this likely to end up with us distributing across hash buckets in a way that's aligned to how we're distributing across dynamo? If so, won't that make the measurements less useful?

@tomwilkie (Author):

> is this likely to end up with us distributing across hash buckets in a way that's aligned to how we're distributing across dynamo?

That's unknown, as we don't know how DynamoDB distributes across partitions.

> If so, won't that make the measurements less useful?

Less useful, maybe, but I still think quite useful: this will give us good information on the distribution of our write load within the hash space, and any massive outliers should show up.
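The point of the discussion can be illustrated with a minimal, hypothetical stand-in for util.HashBucketHistogram's bucketing (the real internals may differ): the 32-bit hash space is split evenly across a fixed slice of counters, entirely independently of how DynamoDB partitions the same keys, so a hot spot in one space need not be a hot spot in the other.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// hashBuckets is a hypothetical sketch of the hash-bucket counter: the
// 32-bit hash space is divided into len(buckets) equal ranges, and each
// Observe atomically adds the write weight to the range the hash lands in.
type hashBuckets struct {
	buckets []uint32
}

// Observe records `weight` writes against the bucket owning hashValue.
func (h *hashBuckets) Observe(hashValue uint32, weight uint32) {
	// Multiply-then-shift maps [0, 2^32) evenly onto [0, len(buckets)).
	i := int(uint64(hashValue) * uint64(len(h.buckets)) >> 32)
	atomic.AddUint32(&h.buckets[i], weight)
}

func main() {
	h := &hashBuckets{buckets: make([]uint32, 8)}
	h.Observe(0x00000000, 1) // lowest hash lands in bucket 0
	h.Observe(0xFFFFFFFF, 1) // highest hash lands in bucket 7
	fmt.Println(h.buckets)   // → [1 0 0 0 0 0 0 1]
}
```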

@tomwilkie tomwilkie merged commit f598d31 into master Feb 2, 2017
@tomwilkie tomwilkie deleted the hash-bucket-histogram branch February 2, 2017 17:31
rowWrites = util.NewHashBucketHistogram(util.HashBucketHistogramOpts{
	HistogramOpts: prometheus.HistogramOpts{
		Namespace: "cortex",
		Name:      "chunk_store_row_write_total",
Reviewer (Contributor):

_total is a suffix reserved for counters by convention, not for histograms, which get additional _sum and _count counters created automatically. I'd call this chunk_store_row_writes_distribution or something.

@tomwilkie (Author):

Cool, will rename
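For reference, with the suggested rename the snippet would look something like this (a sketch only; remaining fields of the opts struct are not shown in the diff). A Prometheus histogram with this base name automatically exports `_bucket`, `_sum`, and `_count` series, which is why a `_total` suffix, reserved for counters, would clash:

```go
rowWrites = util.NewHashBucketHistogram(util.HashBucketHistogramOpts{
	HistogramOpts: prometheus.HistogramOpts{
		Namespace: "cortex",
		Name:      "chunk_store_row_writes_distribution",
		// Exported series become:
		//   cortex_chunk_store_row_writes_distribution_bucket{le="..."}
		//   cortex_chunk_store_row_writes_distribution_sum
		//   cortex_chunk_store_row_writes_distribution_count
	},
```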

// Collect implements prometheus.Metric
func (h *hashBucketHistogram) Collect(c chan<- prometheus.Metric) {
	for i := range h.buckets {
		h.Histogram.Observe(float64(atomic.SwapUint32(&h.buckets[i], 0)))
	}
}
Reviewer (Contributor):

So what your histogram observes depends totally on your scrape rate, missed scrapes, and whether you scrape from multiple Prometheus servers in parallel. This is obviously funky and not normally recommended Prometheus metrics usage, but I guess you know that?

Also, the xxx_count will be off.

@tomwilkie (Author):

Yeah, this is a bit of a hack for now; suggestions to make it more robust are welcome! I'm thinking of a goroutine which dumps the buckets into the histogram and resets them every second or so?

Reviewer (Contributor):

Yes, that would be better!
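The suggested fix might be sketched as follows (all names hypothetical; the `observer` interface stands in for the embedded prometheus.Histogram so the sketch stays dependency-free). Flushing on a fixed ticker means each observation covers exactly one interval, instead of depending on the scrape rate, missed scrapes, or parallel scrapers:

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// observer matches prometheus.Histogram's Observe method.
type observer interface {
	Observe(float64)
}

// observeFunc adapts a plain function to the observer interface.
type observeFunc func(float64)

func (f observeFunc) Observe(v float64) { f(v) }

type hashBucketHistogram struct {
	hist    observer
	buckets []uint32
	quit    chan struct{}
}

// flushOnce drains every bucket into the histogram and resets it to zero.
func (h *hashBucketHistogram) flushOnce() {
	for i := range h.buckets {
		// SwapUint32 reads and zeroes the bucket atomically, so writes
		// arriving between flushes are never lost or double-counted.
		h.hist.Observe(float64(atomic.SwapUint32(&h.buckets[i], 0)))
	}
}

// flushLoop flushes every interval until quit is closed.
func (h *hashBucketHistogram) flushLoop(interval time.Duration) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			h.flushOnce()
		case <-h.quit:
			return
		}
	}
}

func main() {
	var sum float64
	h := &hashBucketHistogram{
		hist:    observeFunc(func(v float64) { sum += v }),
		buckets: make([]uint32, 4),
		quit:    make(chan struct{}),
	}
	h.buckets[1] = 5
	h.flushOnce()
	fmt.Println(sum, h.buckets[1]) // → 5 0
}
```

Collect would then only read the histogram, leaving the buckets untouched.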

@tomwilkie (Author):

Also, to normalise this a little, I think it would be useful to count all writes (a single counter), and then express each bucket's writes as a proportion of that.

@tomwilkie (Author):

Or better yet, to cancel out the effect the number of buckets has on that, multiply by the number of buckets too: a score of 1 would be perfectly load balanced, more than 1 skewed, and so on.
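The proposed normalisation works out like this (a sketch with hypothetical names): each bucket's share of total writes, multiplied by the bucket count, so a perfectly balanced bucket scores exactly 1.0 and a hot bucket scores above it.

```go
package main

import "fmt"

// skewScores returns, for each bucket, its share of total writes multiplied
// by the number of buckets: 1.0 means perfectly balanced load, values above
// 1.0 mean the bucket is taking more than its fair share.
func skewScores(buckets []uint32) []float64 {
	var total uint64
	for _, b := range buckets {
		total += uint64(b)
	}
	scores := make([]float64, len(buckets))
	if total == 0 {
		return scores // no writes yet: report all zeros
	}
	n := float64(len(buckets))
	for i, b := range buckets {
		scores[i] = float64(b) / float64(total) * n
	}
	return scores
}

func main() {
	// 4 buckets, 100 writes total: bucket 0 takes half the load.
	fmt.Println(skewScores([]uint32{50, 20, 20, 10})) // → [2 0.8 0.8 0.4]
}
```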

3 participants