
Chunk Store docs for Bigtable #274

Closed
mwitkow opened this issue Feb 7, 2017 · 5 comments

Comments

@mwitkow

mwitkow commented Feb 7, 2017

Hey Tom,

We're trying out Cortex as our long-term Prometheus storage. However, we'll probably consider moving it to GCE's Bigtable.

A couple of questions from the chunk_store implementation, before we start the spike:

Chunk storage:
What's the size of the ChunkData? We're considering storing chunks inside Bigtable along with the index. What's the latency impact of reading chunks from blob storage (S3, GCS)? Why did you choose to store them in S3? Because of DynamoDB write sizes?

Schemas:
I see that compositeSchema switches between different versions. Is v4 the one currently used? Any particular tips on why you chose v4 layout? We're thinking about a tall-but-narrow scheme for bigtable that is a concatenation of your v3 hash key and range key.

I expect the only real thing that would need implementing is the chunk.Store interface? Is there anything else we're missing?

Thanks,
Michal

@tomwilkie
Contributor

tomwilkie commented Feb 7, 2017

Hi Michal! Great news. I have a local branch which makes the underlying storage interface much more abstract (I was playing with putting this on Cassandra). I'll tidy this up and push this.

What's the size of the ChunkData?

Chunks are all 1KB. This can be changed, but increasing it can make compression less effective, so there's a tradeoff to be had here.

What's the latency impact of reading from blob storage (S3, GCS) of Chunks?

S3 is sloooow for small random reads. There is a memcache layer, and we ensure it's sized big enough to have virtually a 100% hit rate. Only queries for very old data will hit S3, and we deem it acceptable for these queries to be slightly slower. S3 is massively parallelisable (and the chunk store does indeed fetch in parallel), so the net effect is that queries which hit S3 take ~1s at the 99th percentile.
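The parallel-fetch idea can be sketched in Go. Note this is an illustrative sketch, not Cortex's actual code: `fetchChunk` is a hypothetical stand-in for the real S3/GCS GET.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchChunk is a hypothetical stand-in for an S3/GCS GET of one chunk;
// the real store issues these over the network.
func fetchChunk(key string) string {
	return "data-for-" + key
}

// fetchParallel retrieves all chunks concurrently, mirroring the idea that
// many small blob-store reads are tolerable when issued in parallel: total
// latency approaches the slowest single read, not the sum of all reads.
func fetchParallel(keys []string) []string {
	out := make([]string, len(keys))
	var wg sync.WaitGroup
	for i, k := range keys {
		wg.Add(1)
		go func(i int, k string) {
			defer wg.Done()
			out[i] = fetchChunk(k) // each slot is written by exactly one goroutine
		}(i, k)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(fetchParallel([]string{"a", "b"}))
}
```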

Why did you choose to store them in S3?

DynamoDB is expensive. If we put the chunks in there, it would make the system cost too much to run (for us to sell). So it's a trade-off. They would have fit in DynamoDB easily...

I'm planning on introducing the idea of "super chunks", where 10 or so chunks worth of data is written to S3 in a single object. I need to experiment with sub-object get first, but this should reduce the cost of running the system further without any perceptible performance impact. (#141)

I see that compositeSchema switches between different versions. Is v4 the one currently used?

This was literally merged yesterday! We're not running v4 in prod yet - it's in dev, and it seems to have held up for ~9:45hrs...

Any particular tips on why you chose v4 layout?

Yes - v3 and all the priors hot spotted DynamoDB too much. See #254.

Initial results are promising with v4. You should use it; I expect you'll have similar hotspotting issues with Bigtable.
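The anti-hotspotting idea (not the literal v4 layout - see #254 for the real design) is to shard one logical hash key across several physical partitions, so a single hot metric doesn't land on one DynamoDB/Bigtable partition. A minimal sketch, with an assumed shard count:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const shards = 16 // assumed shard count; the real schema's choice differs

// shardedHashKey spreads rows for one logical key across several
// partitions by prefixing a shard number onto the partition key.
func shardedHashKey(metric string, shard uint32) string {
	return fmt.Sprintf("%02d:%s", shard%shards, metric)
}

// shardFor picks a shard deterministically from a label value, so writes
// for the same series always land in the same shard; reads for a whole
// metric must fan out across all shards.
func shardFor(labelValue string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(labelValue))
	return h.Sum32() % shards
}

func main() {
	fmt.Println(shardedHashKey("up", shardFor("instance-1")))
}
```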

We're thinking about a tall-but-narrow scheme for bigtable that is a concatenation of your v3 hash key and range key.

I'll have to read up on this, and how Cloud BigTable differs from internal BigTable, but if it doesn't differ, you'll need to be careful with the row (range) keys as it'll hotspot the tabletservers. Let me get back to you on this.

I expect the only real thing that would need implementing is the chunk.Store interface?

I wouldn't cut it at this interface - there is a whole lot of encoded knowledge in the chunk store about how to match selectors (see #220 for an example). I would pick the interface below the chunk store, between it and DynamoDB/S3 - this is the branch I was referring to above. That way you can reuse all the chunk store logic (and monitoring) without too much reimplementation.
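A rough sketch of what an interface below the chunk store might look like - the names here are illustrative, not the actual API from #275 - plus a toy in-memory implementation of the kind that's handy for tests:

```go
package main

import (
	"fmt"
	"strings"
)

// IndexEntry and StorageClient are hypothetical names sketching the shape
// of a storage layer below the chunk store: index ops plus chunk blobs.
type IndexEntry struct {
	HashKey  string
	RangeKey string
}

type StorageClient interface {
	WriteIndex(entries []IndexEntry) error
	QueryIndex(hashKey, rangePrefix string) ([]IndexEntry, error)
	PutChunk(key string, data []byte) error
	GetChunk(key string) ([]byte, error)
}

// memClient is a toy in-memory implementation of the interface.
type memClient struct {
	index  map[string][]IndexEntry
	chunks map[string][]byte
}

func newMemClient() *memClient {
	return &memClient{index: map[string][]IndexEntry{}, chunks: map[string][]byte{}}
}

func (m *memClient) WriteIndex(entries []IndexEntry) error {
	for _, e := range entries {
		m.index[e.HashKey] = append(m.index[e.HashKey], e)
	}
	return nil
}

func (m *memClient) QueryIndex(hashKey, rangePrefix string) ([]IndexEntry, error) {
	var out []IndexEntry
	for _, e := range m.index[hashKey] {
		if rangePrefix == "" || strings.HasPrefix(e.RangeKey, rangePrefix) {
			out = append(out, e)
		}
	}
	return out, nil
}

func (m *memClient) PutChunk(key string, data []byte) error { m.chunks[key] = data; return nil }

func (m *memClient) GetChunk(key string) ([]byte, error) { return m.chunks[key], nil }

func main() {
	var c StorageClient = newMemClient()
	c.WriteIndex([]IndexEntry{{HashKey: "h1", RangeKey: "metric"}})
	entries, _ := c.QueryIndex("h1", "met")
	fmt.Println(len(entries))
}
```

The point of cutting here is that selector matching, schema versioning, and monitoring all stay in the shared chunk store code; a Bigtable port only supplies these four operations.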

Let me tidy up and push this branch today and then get back to you - do you want to come over to the weaveworks office for a coffee and chat at some point?

Is there anything else we're missing?

  • Are you planning on running this on k8s? Cortex has turned into a bit of a proliferation of jobs, so running it "manually" is a bit of a pain. I'll grab a copy of our k8s yamls and put them in the repo too, so you've got something to get started with.

  • UI: Cortex doesn't include any UI components right now, it's just an API. I removed the upstream UI code as we have a custom (tab completion etc.) UI in a separate component. You'll probably want to resurrect this as a separate job until we get round to open-sourcing the cortex-ui. Take a peek:

https://drive.google.com/file/d/0BwqTw528sZRINXlRZmNQN2M2RnM/view?usp=sharing

@tomwilkie
Contributor

Hi Michal - here's the PR I was referring to: #275

This is the interface I'd start with: https://github.com/weaveworks/cortex/pull/275/files#diff-1f696bedb83075f4980d18328df839a7R8

It's not 100% finished yet, and I still need to abstract out the chunk storage, but it's a start. I'll probably have it mostly done today.

@mwitkow
Author

mwitkow commented Feb 10, 2017

Thanks for responding quickly and doing the interfaces. Apologies for the slow response, we're a wee bit snowed under :)

Cloud Bigtable is very similar to the internal one. Thanks for sharing the knowledge about hotspotting, it will definitely be interesting to see how it plays out.

We'd be deploying to K8s, and we'll try it on our AWS clusters first. It'd be great to have some YAMLs, if you've got them :) Later, if we find that it fits our existing multi-tenant, multi-stage monitoring pipeline, we'll do the Bigtable port for the primary clusters we use on GCE.

@tomwilkie
Contributor

Cloud Bigtable is very similar to the internal one. Thanks for sharing the knowledge about hotspotting, it will definitely be interesting to see how it plays out.

If that's the case you'll definitely want to use the v4 schema. Experiments show it's pretty good for load balancing; we're getting good results. I'd hash the hashKey on the client side, stick that in the row key, and use the rangeKey as the columnKey. Cell contents should be empty.

Doesn't appear you can do prefix matching on the row key, so you might want to do that client side. Shame the index doesn't make it more efficient.
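Putting those two suggestions together, a minimal sketch - the helper names are mine, and MD5 is just an assumed client-side hash, not a recommendation:

```go
package main

import (
	"crypto/md5"
	"fmt"
	"strings"
)

// rowKey hashes the v3-style hash key client side so lexically adjacent
// series don't become adjacent Bigtable rows (avoiding tablet hotspots).
func rowKey(hashKey string) string {
	return fmt.Sprintf("%x", md5.Sum([]byte(hashKey)))
}

// matchesPrefix is the client-side substitute for prefix matching: the
// rangeKey lives in the column qualifier (cells empty), so we scan the
// row and filter qualifiers ourselves.
func matchesPrefix(columnKey, prefix string) bool {
	return strings.HasPrefix(columnKey, prefix)
}

func main() {
	fmt.Println(rowKey(`fake_metric{job="api"}`), matchesPrefix("abc-def", "abc"))
}
```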

We'd be deploying to K8S, and we're try it on our AWS clusters first. It'd be great to have some YAMLs, if you got them :)

Here you go, with some instructions: #284

@bboreham
Contributor

A Bigtable back-end was added in #468
