[WIP] TSI Proposal 2 #7186
## Format
┌────────────────────────────────────────────────────────────────┐ |
Can you put the diagrams in triple back-ticks so they show up properly on the markdown document?
Done.
This looks great @jwilder. It will have a smaller footprint than #7174 but needs more intensive decompress/seek operations, so it's unclear (to me at least) which of the two proposals gives the best bang-for-byte. I think we might want to prototype simple versions of both?
One other question: what will we need to scan and maintain in memory on startup with this approach? Will we need to keep an in-memory hash table for each Hash Index?
## Hash Indexes
The `Measurement`, `Tags` and `Values` sections all contain hash indexes that hash the key to an offset in the sorted data section. The hashing algorithm used would be _Robin Hood Hashing_ [1], which minimizes worst-case search time and is also more CPU/disk cache friendly. This allows for O(1) lookups when using exact filtering (very common) and more closely matches the current in-memory structure.
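To make the Robin Hood idea concrete, here is a minimal in-memory sketch of a key-to-offset table. It is not the proposal's on-disk layout: the `Table` type, the FNV-1a hash, and the power-of-two capacity are all assumptions chosen for illustration. It shows the core rule, which is that an inserted entry displaces any resident that sits closer to its home slot, so probe distances stay short and lookups can stop early.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// entry is one slot: a key and the offset it maps to (hypothetical layout).
type entry struct {
	key    string
	offset uint32
	used   bool
}

// Table is a fixed-capacity open-addressing table using Robin Hood insertion.
type Table struct {
	slots []entry
	mask  uint64
}

// NewTable allocates a table; capacity must be a power of two.
func NewTable(capacity int) *Table {
	return &Table{slots: make([]entry, capacity), mask: uint64(capacity - 1)}
}

// hashKey uses FNV-1a purely as a stand-in hash function.
func hashKey(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64()
}

// dist reports how far slot pos is from the home slot of key.
func (t *Table) dist(key string, pos uint64) uint64 {
	home := hashKey(key) & t.mask
	return (pos - home) & t.mask
}

// Put inserts or updates key. The caller must keep at least one slot free.
func (t *Table) Put(key string, offset uint32) {
	cur := entry{key: key, offset: offset, used: true}
	pos := hashKey(key) & t.mask
	for d := uint64(0); ; d++ {
		s := &t.slots[pos]
		if !s.used {
			*s = cur
			return
		}
		if s.key == cur.key {
			s.offset = cur.offset
			return
		}
		// Robin Hood rule: a resident closer to its home gives up the slot,
		// and we continue probing on behalf of the displaced entry.
		if sd := t.dist(s.key, pos); sd < d {
			cur, *s = *s, cur
			d = sd
		}
		pos = (pos + 1) & t.mask
	}
}

// Get returns the offset stored for key, if present.
func (t *Table) Get(key string) (uint32, bool) {
	pos := hashKey(key) & t.mask
	for d := uint64(0); d <= t.mask; d++ {
		s := &t.slots[pos]
		// An empty slot, or a resident closer to home than we are, means
		// the key cannot be further along the probe sequence.
		if !s.used || t.dist(s.key, pos) < d {
			return 0, false
		}
		if s.key == key {
			return s.offset, true
		}
		pos = (pos + 1) & t.mask
	}
	return 0, false
}

func main() {
	t := NewTable(8)
	t.Put("cpu", 100)
	t.Put("mem", 200)
	off, ok := t.Get("cpu")
	fmt.Println(off, ok)
}
```

An on-disk variant would presumably store fixed-width slots so that a probe is a simple offset calculation into the mapped file, but that detail is not specified here.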
If we're using 4 bytes to store an offset then we are limiting the maximum size of a TSI file to somewhere under 4GB. It's hard to say where because we have to store the `Series Dictionary` as well as the other offset-containing blocks.
Is that OK? This comes back to the question of addressable space on 32bit systems. If we're supporting 32bit systems then I guess it's fine.
32-bit systems aren't going to be able to support memory mapping files over 2GB.
@benbjohnson yeah I meant if we're supporting 32bit systems then we will be able to support all offsets with a 4 byte value since we won't go anywhere near 4GB per file 😃
I reworked this doc from a slightly different idea where the index was appended to the existing TSM files. 4 bytes makes sense in that case because the max TSM file size is 2GB. Most of these could be 8-byte pointers too.
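The arithmetic behind this thread can be sketched as follows: a 4-byte unsigned offset addresses positions up to 2^32 - 1, so it covers a 2GB file but not anything at or beyond 4GB. The helper names and the big-endian byte order below are assumptions for illustration only; the proposal does not specify either.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// ErrOffsetTooLarge is returned when an offset cannot be stored in 4 bytes.
var ErrOffsetTooLarge = errors.New("offset does not fit in 4 bytes")

// putOffset32 writes off into dst (len >= 4) as a 4-byte big-endian value.
// Hypothetical helper: it rejects anything outside [0, 2^32-1].
func putOffset32(dst []byte, off int64) error {
	if off < 0 || off > math.MaxUint32 {
		return ErrOffsetTooLarge
	}
	binary.BigEndian.PutUint32(dst, uint32(off))
	return nil
}

// getOffset32 reads back an offset written by putOffset32.
func getOffset32(src []byte) int64 {
	return int64(binary.BigEndian.Uint32(src))
}

func main() {
	buf := make([]byte, 4)

	// A 2GB position (the TSM file size limit mentioned above) fits.
	fmt.Println(putOffset32(buf, 2<<30), getOffset32(buf))

	// A 4GB position does not.
	fmt.Println(putOffset32(buf, 4<<30))
}
```

Switching to 8-byte offsets would remove the limit at the cost of doubling the size of every offset-bearing block.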
584f504 to 986e567
Overall I think the index makes good use of space on disk but I have two concerns:
└─────────┴─────────────────┴─────────┴─────────────────┘ | ||
``` | ||
The `Series Dictionary` contains two sections. The first is an array of unique terms for all series keys. For example, `cpu,host=server0` would create an array `3cpu4host7server0`, where each term is prefixed with its variable-byte encoded length.
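As a rough illustration of the term array, the sketch below writes each term with a variable-byte length prefix and reads the array back. It assumes the prefix is the uvarint encoding from Go's `encoding/binary`; the proposal only says "variable byte encoded", so the exact scheme is a guess. The function names `appendTerm` and `readTerms` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// appendTerm appends term to dst, prefixed with its uvarint-encoded length.
func appendTerm(dst []byte, term string) []byte {
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(term)))
	dst = append(dst, lenBuf[:n]...)
	return append(dst, term...)
}

// readTerms decodes every length-prefixed term in buf.
func readTerms(buf []byte) ([]string, error) {
	var terms []string
	for len(buf) > 0 {
		n, sz := binary.Uvarint(buf)
		if sz <= 0 || n > uint64(len(buf)-sz) {
			return nil, errors.New("malformed term array")
		}
		buf = buf[sz:]
		terms = append(terms, string(buf[:n]))
		buf = buf[n:]
	}
	return terms, nil
}

func main() {
	var arr []byte
	for _, term := range []string{"cpu", "host", "server0"} {
		arr = appendTerm(arr, term)
	}
	fmt.Printf("%q\n", arr)

	terms, err := readTerms(arr)
	fmt.Println(terms, err)
}
```

Note that the lengths in `3cpu4host7server0` are shown as digits for readability; in this encoding they are the raw bytes 0x03, 0x04 and 0x07.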
Does it make sense to inline terms which are not used as frequently? For example, a `pid` of `123` might only show up a couple of times and it's short, so the cost of a dictionary lookup for it might outweigh the benefit. Msgpack uses a mixed-type, variable-length encoding scheme that would let us use integers for dictionary keys and strings for inline values.
https://github.com/msgpack/msgpack/blob/master/spec.md
Possibly. I was thinking that measurement and tag keys are in the dictionary before tag values because they would be more commonly seen. Having them earlier in the dictionary increases their chance of having a single (or double) byte representation.
There are other dictionary encoding schemes we could look at too. I only added this just for discussion on whether compressing the dictionary would be a good idea or not.
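The point about earlier dictionary positions getting shorter representations can be checked with a quick sketch. It assumes dictionary references are stored as uvarints, which the proposal does not state explicitly; under that assumption, positions 0 to 127 take one byte and positions 128 to 16383 take two. `idSize` is a hypothetical helper.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// idSize returns how many bytes a dictionary position occupies as a uvarint.
func idSize(id uint64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutUvarint(buf[:], id)
}

func main() {
	for _, id := range []uint64{0, 127, 128, 16383, 16384} {
		fmt.Printf("id %d -> %d byte(s)\n", id, idSize(id))
	}
}
```

So placing the most frequently referenced terms in the first 128 slots would keep most references to a single byte.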
The series in the
Yeah, we'd need to unmarshal it via: https://godoc.org/github.com/RoaringBitmap/roaring#Bitmap.UnmarshalBinary. For most queries, I'd expect maybe 1-4 bitmaps to read.
We could roll over to a new file if necessary. The bitmaps are just one idea for discussion for encoding the series list.
@e-dard @benbjohnson We may want to combine some of the ideas from this and #7174. I'm concerned that the binary search in #7174 for measurement, tags and values is going to be too slow considering we're using an in-memory map currently. Something like:
might end up being 5 binary searches.
986e567 to 193b414
This is another proposal for an on-disk, inverted index format building on some of the ideas in #7174.
A few things different in this proposal:
@benbjohnson @e-dard