
BigTable storage backend for Cortex #468

Merged 5 commits on Jul 13, 2017

Conversation

tomwilkie
Contributor

@tomwilkie commented Jun 19, 2017

Includes #463

  • Currently stores chunks in BigTable only.
  • Minimal amount of refactoring, had to break out the storage client factory into its own package to avoid import loops.

@tomwilkie changed the title from "[WIP] BigTable storage backend for Cortex" to "BigTable storage backend for Cortex" on Jul 4, 2017
@jml
Contributor

jml commented Jul 4, 2017

Thanks @tomwilkie. Can you please rebase this before we review it?

@tomwilkie
Contributor Author

Rebased and squashed, thanks @jml

@@ -122,6 +122,10 @@ update-gazelle:
gazelle -go_prefix github.com/weaveworks/cortex -external vendored \
-build_file_name BUILD.bazel

update-vendor:
Contributor Author

@aaron7 here's the update-vendor target we discussed.

Contributor

@jml left a comment

Wonderful, thanks. Minor niggles.

I've only done a "can I understand the code?" level review, rather than a review of how we are actually using BigTable. If you want the latter, it'd probably be easier to review a design doc than the code.


// DynamoDB latency seems to range from a few ms to a few sec and is
// important. So use 8 buckets from 64us to 8s.
Buckets: prometheus.ExponentialBuckets(0.000128, 4, 8),
Contributor

Either comment or call to ExponentialBuckets is wrong. When I evaluate, I get:

[0.000128, 0.000512, 0.002048, 0.008192, 0.032768, 0.131072, 0.524288, 2.097152]

So:

  • not 8s
  • not from 64us

Contributor Author

This is just copied from DynamoDB. Have fixed comment in both cases.

b.tables[tableName] = rows
}

// TODO the hashValue should actually be hashed.
Contributor

Why not just do this now? Is it more than just picking a hash function & calling it?

Contributor Author

Currently have data written in this format, so I need to do a proper migration for it. Will add this to the comment.

Contributor Author

It's also not clear this will affect load balancing on BigTable, as it dynamically splits ranges (completely different to how DynamoDB load-balances data).

}, bigtable.RowFilter(bigtable.FamilyFilter(columnFamily)))
}

type bigtableReadBatch bigtable.Row
Contributor

Maybe put a comment here explaining the purpose of this, giving some hint as to why Len() always is 1 etc.

Contributor Author

Done - it's just a side effect of smashing these two interfaces together. BigTable returns rows one-by-one.

}

// NewTableClient returns a new TableClient.
func NewTableClient(ctx context.Context, cfg Config) (chunk.TableClient, error) {
Contributor

I thought you believed in returning structs, not interfaces?

Contributor Author

I found a blog about this recently! I thought of you. The reason they gave is that it makes it easier to find the implementation. Meh.

The struct is private anyway, so exposing it would give a lint error.

- Currently stores chunks in BigTable only.
- Minimal amount of refactoring, had to:
  - break out the storage client factory into its own package to avoid import loops.
  - export a few methods on Chunk (ExternalKey and Decode).
- Add stack traces to some of the chunk store errors, and log them on queries.
- Instrument the BigTable gRPC client.

Also, fix some logic in the previous BigTable storage engine:

- Return correct part of the row key as the range key
- Correctly stop reading bigtable results when trying to read to the end of a 'row'.
- Log the range key we read from bigtable.
- Chunks are being written before calculating their checksum, which shouldn't happen.
- Don't return empty chunks for BigTable backend.
@tomwilkie
Contributor Author

Thanks for the review @jml!

Hold on merging just yet, I'm still tracking down one last bug (I hope).

@tomwilkie
Contributor Author

Okay fixed the last bug. All good to go now!

@tomwilkie
Contributor Author

@jml ping?

@jml
Contributor

jml commented Jul 13, 2017

Thanks. Sorry for the delay—it's been a very full week.

@jml merged commit fa9cf5a into cortexproject:master on Jul 13, 2017
@tomwilkie deleted the cortex-gcp branch on Jul 13, 2017
@tomwilkie
Contributor Author

No worries; thank you!

@garye

garye commented Jul 28, 2017

@tomwilkie Whoa, this is awesome. I work on the Bigtable team so please let me know if there's anything we can do to help with this.

@tomwilkie
Contributor Author

tomwilkie commented Jul 28, 2017 via email

@garye

garye commented Jul 28, 2017

Well that's awesome to hear and I look forward to the blog post!

@mbrukman
Contributor

@tomwilkie – thank you, that's music to my ears! I'm the PM for Bigtable at Google, and am also eagerly awaiting your blog post.

@mbrukman
Contributor

@tomwilkie – how does one configure Cortex to use Bigtable as a storage backend? Are there docs or a howto one can follow?

@tomwilkie
Contributor Author

tomwilkie commented Jul 29, 2017 via email
