Track distribution of row write rates with new HashBucketHistogram. #261
Conversation
Tested locally and seems to work.
Modulo one question
@@ -414,6 +426,7 @@ func (c *AWSStore) calculateDynamoWrites(userID string, chunks []Chunk) (map[str
	entries := 0
	for _, bucket := range c.bigBuckets(chunk.From, chunk.Through) {
		hashValue := hashValue(userID, bucket.bucket, metricName)
		rowWrites.Observe(hashValue, uint32(len(chunk.Metric)))
I want to double check: is this likely to end up with us distributing across hash buckets in a way that's aligned to how we're distributing across dynamo? If so, won't that make the measurements less useful?
is this likely to end up with us distributing across hash buckets in a way that's aligned to how we're distributing across dynamo?
That's unknown, as we don't know how DynamoDB distributes across partitions.
If so, won't that make the measurements less useful?
Maybe less useful, but I still think quite useful: this will give us good information on the distribution of our write load within the hash space, and any massive outliers should show up.
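To make that concrete, here is a minimal, self-contained sketch of the idea, not the PR's code: each row write is hashed, the hash selects one of a fixed number of buckets, and the bucket accumulates the writes, so a hot row shows up as an outlier bucket. The numBuckets constant and fnv32 helper are illustrative assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const numBuckets = 1024 // assumed bucket count, purely illustrative

// fnv32 is an illustrative stand-in for however the row key gets hashed.
func fnv32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func main() {
	buckets := make([]uint32, numBuckets)
	// Each row write lands in the bucket its hash falls into, independently
	// of how DynamoDB happens to partition that same hash key.
	for _, row := range []string{"user1:day1:metricA", "user2:day1:metricB", "user1:day1:metricA"} {
		buckets[fnv32(row)%numBuckets]++
	}
	fmt.Println(buckets[fnv32("user1:day1:metricA")%numBuckets]) // the hot row's bucket: 2
}
```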
rowWrites = util.NewHashBucketHistogram(util.HashBucketHistogramOpts{
	HistogramOpts: prometheus.HistogramOpts{
		Namespace: "cortex",
		Name:      "chunk_store_row_write_total",
_total is a suffix reserved for counters by convention, not for histograms, which get additional _sum and _count counters created automatically. I'd call this chunk_store_row_writes_distribution or something.
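A sketch of what that rename could look like, reusing the fields from the snippet above; the Help string is an illustrative addition, not code from the PR:

```go
rowWrites = util.NewHashBucketHistogram(util.HashBucketHistogramOpts{
	HistogramOpts: prometheus.HistogramOpts{
		Namespace: "cortex",
		// no _total suffix; the histogram's _sum and _count are added automatically
		Name: "chunk_store_row_writes_distribution",
		Help: "Distribution of chunk store writes across hash rows.",
	},
})
```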
Cool, will rename
// Collect implements prometheus.Collector.
func (h *hashBucketHistogram) Collect(c chan<- prometheus.Metric) {
	for i := range h.buckets {
		h.Histogram.Observe(float64(atomic.SwapUint32(&h.buckets[i], 0)))
So what your histogram observes depends totally on your scrape rate, missed scrapes, and whether you scrape from multiple Prometheus servers in parallel. This is obviously funky and not normally recommended Prometheus metrics usage, but I guess you know that?
Also, the xxx_count will be off.
Yeah, this is a bit of a hack for now; suggestions to make it more robust are welcome! I'm thinking of a goroutine which dumps the buckets into the histogram and resets them every second or something?
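A rough sketch of that suggestion, assuming the hashBucketHistogram fields shown in this PR (the atomic uint32 buckets and the embedded Histogram); the flushLoop name and ticker handling are illustrative:

```go
// flushLoop is a hypothetical helper: drain the buckets into the underlying
// histogram on a fixed tick instead of inside Collect, so what the histogram
// observes no longer depends on the scrape interval.
func (h *hashBucketHistogram) flushLoop(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		for i := range h.buckets {
			h.Histogram.Observe(float64(atomic.SwapUint32(&h.buckets[i], 0)))
		}
	}
}
```

Collect would then just delegate to the embedded histogram rather than swapping the buckets itself.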
Yes, that would be better!
Also, to normalise this a little, I think it would be useful to count all writes (a single counter), and then express each bucket's writes as a proportion of that.
Or better yet, to cancel out the effect the number of buckets has on that, multiply by the number of buckets too, so 1 would be perfectly load balanced, more than 1 skewed, and so on.
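A back-of-the-envelope sketch of that normalisation, with all names illustrative: each bucket's writes as a share of the total, scaled by the number of buckets, so 1.0 is a perfectly even spread and larger values flag hot buckets.

```go
// normalisedLoad is a hypothetical helper sketching the suggested metric:
// bucketWrites[i] / total * numBuckets, so 1.0 means perfectly balanced.
func normalisedLoad(bucketWrites []uint32) []float64 {
	var total uint64
	for _, w := range bucketWrites {
		total += uint64(w)
	}
	out := make([]float64, len(bucketWrites))
	if total == 0 {
		return out
	}
	n := float64(len(bucketWrites))
	for i, w := range bucketWrites {
		out[i] = float64(w) / float64(total) * n
	}
	return out
}
```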
Part of #254