
BigTable storage backend for Cortex #468

Merged 5 commits on Jul 13, 2017

Conversation

tomwilkie
Contributor

@tomwilkie commented Jun 19, 2017

Includes #463

  • Currently stores chunks in BigTable only.
  • Minimal amount of refactoring, had to break out the storage client factory into its own package to avoid import loops.

@tomwilkie changed the title from "[WIP] BigTable storage backend for Cortex" to "BigTable storage backend for Cortex" on Jul 4, 2017
@jml
Contributor

jml commented Jul 4, 2017

Thanks @tomwilkie. Can you please rebase this before we review it?

@tomwilkie
Contributor Author

Rebased and squashed, thanks @jml

@@ -122,6 +122,10 @@ update-gazelle:
gazelle -go_prefix github.com/weaveworks/cortex -external vendored \
-build_file_name BUILD.bazel

update-vendor:
Contributor Author

@aaron7 here's the update-vendor target we discussed.

Contributor

@jml left a comment

Wonderful, thanks. Minor niggles.

I've only done a "can I understand the code?" level review, rather than a review of how we are actually using BigTable. If you want the latter, it'd probably be easier to review a design doc than the code.


// DynamoDB latency seems to range from a few ms to a few sec and is
// important. So use 8 buckets from 64us to 8s.
Buckets: prometheus.ExponentialBuckets(0.000128, 4, 8),
Contributor

Either comment or call to ExponentialBuckets is wrong. When I evaluate, I get:

[0.000128, 0.000512, 0.002048, 0.008192, 0.032768, 0.131072, 0.524288, 2.097152]

So:

  • not 8s
  • not from 64us

Contributor Author

This is just copied from DynamoDB. Have fixed comment in both cases.

b.tables[tableName] = rows
}

// TODO the hashValue should actually be hashed.
Contributor

Why not just do this now? Is it more than just picking a hash function & calling it?

Contributor Author

Currently have data written in this format, so I need to do a proper migration for it. Will add this to the comment.

Contributor Author

It's also not clear this will affect load balancing on BigTable, as it dynamically splits ranges (completely different to how DynamoDB load-balances data).

}, bigtable.RowFilter(bigtable.FamilyFilter(columnFamily)))
}

type bigtableReadBatch bigtable.Row
Contributor

Maybe put a comment here explaining the purpose of this, giving some hint as to why Len() always is 1 etc.

Contributor Author

Done - it's just a side effect of smashing these two interfaces together. BigTable returns rows one-by-one.

}

// NewTableClient returns a new TableClient.
func NewTableClient(ctx context.Context, cfg Config) (chunk.TableClient, error) {
Contributor

I thought you believed in returning structs, not interfaces?

Contributor Author

I found a blog about this recently! I thought of you. The reason they gave is that it makes it easier to find the implementation. Meh.

The struct is private anyway, so exposing it would give a lint error.

- Currently stores chunks in BigTable only.
- Minimal amount of refactoring, had to:
  - break out the storage client factory into its own package to avoid import loops.
  - export a few methods on Chunk (ExternalKey and Decode).
- Add stack traces to some of the chunk store errors, and log them on queries.
- Instrument the BigTable gRPC client.

Also, fix some logic in the previous BigTable storage engine:

- Return correct part of the row key as the range key
- Correctly stop reading bigtable results when trying to read to the end of a 'row'.
- Log the range key we read from bigtable.
- Chunks are being written before calculating their checksum, which shouldn't happen.
- Don't return empty chunks for BigTable backend.
@tomwilkie
Contributor Author

Thanks for the review @jml!

Hold on merging just yet, I'm still tracking down one last bug (I hope).

@tomwilkie
Contributor Author

Okay fixed the last bug. All good to go now!

@tomwilkie
Contributor Author

@jml ping?

@jml
Contributor

jml commented Jul 13, 2017

Thanks. Sorry for the delay—it's been a very full week.

@jml merged commit fa9cf5a into cortexproject:master on Jul 13, 2017
@tomwilkie deleted the cortex-gcp branch on Jul 13, 2017
@tomwilkie
Contributor Author

No worries; thank you!

@garye

garye commented Jul 28, 2017

@tomwilkie Whoa, this is awesome. I work on the Bigtable team so please let me know if there's anything we can do to help with this.

@tomwilkie
Contributor Author

tomwilkie commented Jul 28, 2017 via email

@garye

garye commented Jul 28, 2017

Well that's awesome to hear and I look forward to the blog post!

@mbrukman
Contributor

@tomwilkie – thank you, that's music to my ears! I'm the PM for Bigtable at Google, and am also eagerly awaiting your blog post.

@mbrukman
Contributor

@tomwilkie – how does one configure Cortex to use Bigtable as a storage backend? Are there docs or a howto one can follow?

@tomwilkie
Contributor Author

tomwilkie commented Jul 29, 2017 via email
