[WIP] TSI Proposal 2 #7186

@jwilder jwilder commented Aug 22, 2016

This is another proposal for an on-disk, inverted index format building on some of the ideas in #7174.

A few things are different in this proposal:

  • Hash lookups of measurement, tags, and values instead of binary search
  • Sorted iteration over measurement, tags and values
  • Efficient series boolean operations over compressed bitmaps
  • Compressed series keys

@benbjohnson @e-dard


## Format

┌────────────────────────────────────────────────────────────────┐

Contributor

Can you put the diagrams in triple back-ticks so they show up properly on the markdown document?

Contributor Author

Done.

e-dard commented Aug 22, 2016

This looks great @jwilder. Because it's going to provide a smaller footprint than #7174 but with more intensive decompress/seek operations, it's unclear (to me at least) which of the two proposals is going to give the best bang-for-byte.

I think we might want to prototype simple versions of both?

e-dard commented Aug 22, 2016

One other question I have: what will we need to scan and maintain in memory on startup with this approach? Will we need to maintain an in-memory hash table for each Hash Index?


## Hash Indexes

The `Measurement`, `Tags` and `Values` sections all contain hash indexes that map each key to an offset in the sorted data section. The hashing algorithm used would be _Robin Hood Hashing_ [1], which minimizes worst-case search time and is also friendlier to CPU and disk caches. This allows for O(1) lookups when using exact filtering (very common) and more closely matches the current in-memory structure.
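
To make the lookup idea above concrete, here is a minimal in-memory sketch of a Robin Hood hash table that maps a key to a 4-byte offset. It is not the proposed on-disk layout: the `HashIndex` type, its methods, the FNV-1a hash, and the fixed capacity are all assumptions made for illustration.

```go
package main

import "hash/fnv"

// slot is one entry of the open-addressed table (illustrative layout only).
type slot struct {
	key    string
	offset uint32 // offset of the key's entry in the sorted data section
	used   bool
}

// HashIndex is a hypothetical fixed-capacity Robin Hood hash table.
type HashIndex struct {
	slots []slot
	count int
}

// NewHashIndex creates a table; capacity must be greater than zero.
func NewHashIndex(capacity int) *HashIndex {
	return &HashIndex{slots: make([]slot, capacity)}
}

// home returns the slot a key would occupy if there were no collisions.
func (idx *HashIndex) home(key string) int {
	h := fnv.New64a()
	h.Write([]byte(key))
	return int(h.Sum64() % uint64(len(idx.slots)))
}

// dist returns how far position pos is from the home slot of key.
func (idx *HashIndex) dist(key string, pos int) int {
	n := len(idx.slots)
	return (pos - idx.home(key) + n) % n
}

// find returns the position of key, or -1 if it is absent. The probe stops
// early once it meets a resident that sits closer to its own home than the
// probe has travelled, which is what bounds the search length.
func (idx *HashIndex) find(key string) int {
	n := len(idx.slots)
	pos := idx.home(key)
	for d := 0; d < n; d++ {
		s := &idx.slots[pos]
		if !s.used || idx.dist(s.key, pos) < d {
			return -1
		}
		if s.key == key {
			return pos
		}
		pos = (pos + 1) % n
	}
	return -1
}

// Lookup returns the offset stored for key.
func (idx *HashIndex) Lookup(key string) (uint32, bool) {
	if pos := idx.find(key); pos >= 0 {
		return idx.slots[pos].offset, true
	}
	return 0, false
}

// Insert adds or updates key. It returns false if the table is full.
func (idx *HashIndex) Insert(key string, offset uint32) bool {
	if pos := idx.find(key); pos >= 0 {
		idx.slots[pos].offset = offset
		return true
	}
	if idx.count == len(idx.slots) {
		return false
	}
	cur := slot{key: key, offset: offset, used: true}
	pos, d := idx.home(key), 0
	for {
		s := &idx.slots[pos]
		if !s.used {
			*s = cur
			idx.count++
			return true
		}
		// If the resident is closer to its home than the entry being placed,
		// swap them and keep going with the displaced entry.
		if sd := idx.dist(s.key, pos); sd < d {
			cur, *s = *s, cur
			d = sd
		}
		pos = (pos + 1) % len(idx.slots)
		d++
	}
}
```

With this sketch, `idx.Insert("cpu", 40)` followed by `idx.Lookup("cpu")` returns the stored offset after a short probe, and a lookup for an absent key stops as soon as the probe distance exceeds that of the resident entry.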

@e-dard e-dard Aug 22, 2016

If we're using 4 bytes to store an offset then we are limiting the maximum size of a TSI file to somewhere under 4GB. It's hard to say where because we have to store the Series Dictionary as well as the other offset-containing blocks.

Is that OK? This comes back to the question of addressable space on 32bit systems. If we're supporting 32bit systems then I guess it's fine.

Contributor

32-bit systems aren't going to be able to support memory mapping files over 2GB.

Contributor

@benbjohnson yeah I meant if we're supporting 32bit systems then we will be able to support all offsets with a 4 byte value since we won't go anywhere near 4GB per file 😃

Contributor Author

I reworked this doc from a slightly different idea where the index was appended to the existing TSM files. 4 bytes makes sense in that case because the max TSM file size is 2GB. Most of these could be 8-byte pointers too.

@jwilder jwilder force-pushed the jw-cardinality-index branch from 584f504 to 986e567 Compare August 22, 2016 15:16
@benbjohnson

Overall I think the index makes good use of space on disk but I have two concerns:

  1. Iterators need to be sorted by series at query time. I'm concerned that we'll either need to decompress the series key into the heap to do the comparisons (which will be huge for large queries) or we'll need to do decompression in the Less() function (which will be CPU intensive and cause heavy memory paging).
  2. The Go roaring bitmap implementation doesn't support memory-mapping like the Java implementation does. We'll need to implement it ourselves. Also, it uses uint32 values so we're limited to 4 billion series.

```
└─────────┴─────────────────┴─────────┴─────────────────┘
```

The `Series Dictionary` contains two sections. The first is an array of unique terms for all series keys. For example, `cpu,host=server0` would create an array `3cpu4host7server0`, where each term is prefixed with its variable-byte-encoded length.
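
As a rough illustration of the term array described above, the sketch below writes each term with a length prefix and reads it back. It assumes that "variable byte encoded" means Go's unsigned varint from `encoding/binary`, which the proposal does not state; the `AppendTerm` and `ReadTerm` helpers are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// AppendTerm appends term to buf, prefixed with its uvarint-encoded length.
// Assumption: the dictionary uses Go's uvarint for the length prefix.
func AppendTerm(buf []byte, term string) []byte {
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(term)))
	buf = append(buf, lenBuf[:n]...)
	return append(buf, term...)
}

// ReadTerm decodes one length-prefixed term from the front of buf and
// returns the term together with the remaining bytes.
func ReadTerm(buf []byte) (string, []byte, error) {
	length, n := binary.Uvarint(buf)
	if n <= 0 {
		return "", nil, errors.New("invalid length prefix")
	}
	buf = buf[n:]
	if uint64(len(buf)) < length {
		return "", nil, errors.New("truncated term")
	}
	return string(buf[:length]), buf[length:], nil
}
```

Appending `cpu`, `host`, and `server0` in that order yields the bytes `\x03cpu\x04host\x07server0`; the digits in `3cpu4host7server0` stand for those one-byte length prefixes. Terms shorter than 128 bytes need only a single prefix byte under this encoding.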

Contributor

Does it make sense to inline terms which are not used as frequently? For example, a pid of 123 might only show up a couple of times and it's short, so having to do a dictionary lookup for it might outweigh the benefit. Msgpack uses a mixed-type, variable-length encoding scheme that would let us use integers for dictionary keys and strings for inline values.

https://github.com/msgpack/msgpack/blob/master/spec.md

Contributor Author

Possibly. I was thinking that measurement and tag keys would come before tag values in the dictionary because they would be more commonly seen. Having them earlier in the dictionary increases their chance of having a single (or double) byte representation.

There are other dictionary encoding schemes we could look at too. I only added this just for discussion on whether compressing the dictionary would be a good idea or not.

jwilder commented Aug 22, 2016

> Iterators need to be sorted by series at query time. I'm concerned that we'll either need to decompress the series key into the heap to do the comparisons (which will be huge for large queries) or we'll need to do decompression in the Less() function (which will be CPU intensive and cause heavy memory paging).

The series in the Series Dictionary section are already sorted, so a lower series ID means the series key sorts lower as well. After merging the bit sets, we would return them in order (already sorted).

> The Go roaring bitmap implementation doesn't support memory-mapping like the Java implementation does.

Yeah, we'd need to unmarshal it via: https://godoc.org/github.com/RoaringBitmap/roaring#Bitmap.UnmarshalBinary. For most queries, I'd expect maybe 1-4 bitmaps to read.

> We'll need to implement it ourselves. Also, it uses uint32 values so we're limited to 4 billion series.

We could roll-over to a new file if necessary.

The bitmaps are just one idea for discussion for encoding the series list.
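
The ordering argument in this comment can be sketched without any bitmap library. The snippet below uses plain sorted `uint32` slices as a stand-in for the compressed bitmaps (an assumption made to keep the example dependency-free), and `IntersectSorted` is a hypothetical helper. Because series IDs are assumed to be assigned in series-key order, the result of the AND is already in series-key order and needs no sort or key decompression.

```go
package main

// IntersectSorted returns the IDs present in both a and b.
// Both inputs must be in ascending order without duplicates, which is the
// order a bitmap iterator would yield them in. The output keeps that order.
func IntersectSorted(a, b []uint32) []uint32 {
	out := make([]uint32, 0)
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}
```

For example, if `host = 'server-01'` matched series IDs 1, 3, 5, 7, 9 and `location = 'us-east1'` matched 3, 4, 5, 9, 10, the intersection is 3, 5, 9, already in ascending ID order.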

jwilder commented Aug 22, 2016

@e-dard @benbjohnson We may want to combine some of the ideas from this and #7174. I'm concerned that the binary search in #7174 for measurement, tags and values is going to be too slow considering we're using an in-memory map currently.

Something like:

`SELECT count(value) FROM cpu WHERE host = 'server-01' AND location = 'us-east1'`

might end up being 5 binary searches.

@jwilder jwilder force-pushed the jw-cardinality-index branch from 986e567 to 193b414 Compare August 22, 2016 17:55
@benbjohnson benbjohnson mentioned this pull request Aug 25, 2016
@e-dard e-dard added the RFC label Sep 12, 2016
@jwilder jwilder closed this Jan 30, 2018
@jwilder jwilder deleted the jw-cardinality-index branch January 30, 2018 16:50