[WIP] TSI Proposal 2 #7186

jwilder · 2016-08-22T06:06:16Z

This is another proposal for an on-disk, inverted index format building on some of the ideas in #7174.

A few things different in this proposal:

- Hash lookups of measurement, tags, and values instead of binary search
- Sorted iteration over measurement, tags and values
- Efficient series boolean operations over compressed bitmaps
- Compressed series keys

e-dard · 2016-08-22T09:16:24Z

docs/tsm/TSM_FORMAT.md

+
+## Format
+
+┌────────────────────────────────────────────────────────────────┐


Can you put the diagrams in triple back-ticks so they show up properly on the markdown document?

e-dard · 2016-08-22T10:31:13Z

This looks great @jwilder. Because it's going to provide a smaller footprint than #7174 but with more intensive decompress/seek operations, it's unclear (to me at least) which of the two proposals is going to give the best bang-for-byte.

I think we might want to prototype simple versions of both?

e-dard · 2016-08-22T11:06:59Z

One other question I have is what will we need to scan and maintain in memory on startup with this approach? Will we need to maintain a hash table for each Hash Index in-memory?

e-dard · 2016-08-22T11:11:52Z

docs/tsm/TSM_FORMAT.md

+
+## Hash Indexes
+
+The `Measurement`, `Tags` and `Values` sections all contain hash indexes hashing the key to an offset in the sorted data section.  The hashing algorithm used would be _Robin Hood Hashing_ [1] which minimizes worst case search time and is also more CPU/Disk cache friendly.  This allows for O(1) lookups when using exact filtering (very common) and more closely matches the current in-memory structure.


If we're using 4 bytes to store an offset then we are limiting the maximum size of a TSI file to somewhere under 4GB. It's hard to say where because we have to store the Series Dictionary as well as the other offset-containing blocks.

Is that OK? This comes back to the question of addressable space on 32bit systems. If we're supporting 32bit systems then I guess it's fine.

32-bit systems aren't going to be able to support memory mapping files over 2GB.

@benbjohnson yeah I meant if we're supporting 32bit systems then we will be able to support all offsets with a 4 byte value since we won't go anywhere near 4GB per file 😃

I reworked this doc from a slightly different idea where the index was appended to the existing TSM files. 4 bytes makes sense in that case because the maximum TSM file size is 2GB. Most of these could be 8 byte pointers too.
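
For readers unfamiliar with the technique named in the quoted doc, here is a minimal in-memory sketch of Robin Hood hashing that maps a key to a 4-byte offset. It is an illustration only, not the proposed on-disk layout: the names `RHIndex`, `Put`, and `Get` are hypothetical, FNV-1a is an arbitrary choice of hash function, and deletion is left out on the assumption that the index is built once and then only read.

```go
package main

import "hash/fnv"

// entry is one slot: a key and the offset of its record in the sorted
// data section (a 4-byte offset, as discussed in this thread).
type entry struct {
	key    string
	offset uint32
	used   bool
}

// RHIndex is a hypothetical fixed-capacity open-addressing table that
// uses Robin Hood probing. Capacity must be greater than zero.
type RHIndex struct {
	slots []entry
	count int
}

func NewRHIndex(capacity int) *RHIndex {
	return &RHIndex{slots: make([]entry, capacity)}
}

// hashKey uses FNV-1a purely for illustration.
func hashKey(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64()
}

// dist returns how far pos is from the ideal slot of key.
func (t *RHIndex) dist(key string, pos int) int {
	n := len(t.slots)
	ideal := int(hashKey(key) % uint64(n))
	return (pos - ideal + n) % n
}

// find returns the slot index holding key, or -1 if it is absent.
func (t *RHIndex) find(key string) int {
	n := len(t.slots)
	pos := int(hashKey(key) % uint64(n))
	for d := 0; d < n; d++ {
		s := t.slots[pos]
		// An empty slot, or a resident closer to its ideal slot than we
		// are to ours, proves the key cannot be stored further along.
		if !s.used || t.dist(s.key, pos) < d {
			return -1
		}
		if s.key == key {
			return pos
		}
		pos = (pos + 1) % n
	}
	return -1
}

// Get returns the offset stored for key.
func (t *RHIndex) Get(key string) (uint32, bool) {
	if i := t.find(key); i >= 0 {
		return t.slots[i].offset, true
	}
	return 0, false
}

// Put inserts or updates key. It returns false if the table is full.
func (t *RHIndex) Put(key string, offset uint32) bool {
	if i := t.find(key); i >= 0 {
		t.slots[i].offset = offset
		return true
	}
	if t.count == len(t.slots) {
		return false
	}
	n := len(t.slots)
	cur := entry{key: key, offset: offset, used: true}
	pos := int(hashKey(key) % uint64(n))
	for d := 0; ; d++ {
		s := &t.slots[pos]
		if !s.used {
			*s = cur
			t.count++
			return true
		}
		// "Rob from the rich": if the resident is closer to its ideal
		// slot than the entry being placed, swap and keep probing with
		// the displaced entry.
		if sd := t.dist(s.key, pos); sd < d {
			cur, *s = *s, cur
			d = sd
		}
		pos = (pos + 1) % n
	}
}
```

Because entries are kept ordered by probe distance, a lookup can stop as soon as it meets a slot whose resident is closer to home than the probe, which is what bounds the worst-case search. With `uint32` offsets the table can address at most 4GB, which is the limit raised above; switching the field to `uint64` would lift it at the cost of 4 more bytes per slot.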

benbjohnson · 2016-08-22T15:16:55Z

Overall I think the index makes good use of space on disk but I have two concerns:

1. Iterators need to be sorted by series at query time. I'm concerned that we'll either need to decompress the series key into the heap to do the comparisons (which will be huge for large queries) or we'll need to do decompression in the Less() function (which will be CPU intensive and cause heavy memory paging).
2. The Go roaring bitmap implementation doesn't support memory-mapping like the Java implementation does. We'll need to implement it ourselves. Also, it uses uint32 values so we're limited to 4 billion series.

benbjohnson · 2016-08-22T15:23:53Z

docs/tsm/TSM_FORMAT.md

+└─────────┴─────────────────┴─────────┴─────────────────┘
+```
+
+The `Series Dictionary` contains two sections.  The first is an array of unique terms for all series keys.  For example, `cpu,host=server0`, would create an array `3cpu4host7server0` where each term is prefixed with its variable byte encoded length.


Does it make sense to inline terms which are not used as frequently? For example, a pid of 123 might only show up a couple of times and it's short, so the cost of a dictionary lookup for it might outweigh the benefit. Msgpack uses a mixed type, variable length encoding scheme that would let us use integers for dictionary keys and strings for inline values.

https://github.com/msgpack/msgpack/blob/master/spec.md

Possibly. I was thinking that measurement and tag keys are in the dictionary before tag values because they would be more commonly seen. Having them earlier in the dictionary increases their chance of having a single (or double) byte representation.

There are other dictionary encoding schemes we could look at too. I only added this just for discussion on whether compressing the dictionary would be a good idea or not.
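
As a rough illustration of the term array in the quoted doc, the sketch below writes each term prefixed with its variable-byte length and reads the terms back. It assumes Go's `encoding/binary` uvarint as the variable-byte encoding, which the doc does not specify, and `AppendTerm` and `ReadTerms` are hypothetical names. Note that `3cpu4host7server0` in the doc shows the lengths as digits for readability; with a uvarint prefix the bytes would be `0x03 c p u 0x04 h o s t ...`.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// AppendTerm appends term to buf, prefixed with its uvarint-encoded length.
func AppendTerm(buf []byte, term string) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], uint64(len(term)))
	buf = append(buf, tmp[:n]...)
	return append(buf, term...)
}

// ReadTerms decodes every length-prefixed term in buf, in order.
func ReadTerms(buf []byte) ([]string, error) {
	var terms []string
	for len(buf) > 0 {
		l, n := binary.Uvarint(buf)
		// n <= 0 means a malformed varint; the second check rejects a
		// length that runs past the end of the buffer.
		if n <= 0 || uint64(len(buf)-n) < l {
			return nil, errors.New("corrupt term dictionary")
		}
		terms = append(terms, string(buf[n:n+int(l)]))
		buf = buf[n+int(l):]
	}
	return terms, nil
}
```

A term's position in this array is what a series key would reference, so terms placed earlier get smaller indexes, which in turn encode in fewer bytes if the references are also stored as varints.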

jwilder · 2016-08-22T15:42:03Z

> Iterators need to be sorted by series at query time. I'm concerned that we'll either need to decompress the series key into the heap to do the comparisons (which will be huge for large queries) or we'll need to do decompression in the Less() function (which will be CPU intensive and cause heavy memory paging).

The series in the Series Dictionary section are already sorted, so a lower series ID also means the series key sorts lower. After merging the bit sets, we would return the series in ID order, which is already key order, so no separate sort is needed.

> The Go roaring bitmap implementation doesn't support memory-mapping like the Java implementation does.

Yeah, we'd need to unmarshal it via: https://godoc.org/github.com/RoaringBitmap/roaring#Bitmap.UnmarshalBinary. For most queries, I'd expect maybe 1-4 bitmaps to read.

> We'll need to implement it ourselves. Also, it uses uint32 values so we're limited to 4 billion series.

We could roll-over to a new file if necessary.

The bitmaps are just one idea for discussion for encoding the series list.
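
To make the ordering argument concrete, here is a small sketch of boolean operations over series ID sets. It deliberately swaps in plain sorted `[]uint32` slices for the roaring bitmaps so that it needs nothing outside the standard library; `Intersect` and `Union` are hypothetical helper names. The point it illustrates is that if IDs are assigned in series key order, the output of either operation is already in key order.

```go
package main

// Intersect returns the IDs present in both a and b (an AND of two tag
// filters). Inputs must be sorted ascending with no duplicates; the
// output is sorted ascending too.
func Intersect(a, b []uint32) []uint32 {
	var out []uint32
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

// Union returns the IDs present in either a or b (an OR of two tag
// filters), under the same input and output ordering as Intersect.
func Union(a, b []uint32) []uint32 {
	out := make([]uint32, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			out = append(out, a[i])
			i++
		case a[i] > b[j]:
			out = append(out, b[j])
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}
```

A compressed bitmap would perform the same set operations on a more compact representation, and iterating its set bits would likewise yield IDs in ascending order.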

jwilder · 2016-08-22T15:59:58Z

@e-dard @benbjohnson We may want to combine some of the ideas from this and #7174. I'm concerned that the binary search in #7174 for measurement, tags and values is going to be too slow considering we're using an in-memory map currently.

Something like:

SELECT count(value) FROM cpu WHERE host = 'server-01' AND location = 'us-east1'

might end up being 5 binary searches.

jwilder added the area/tsm label Aug 22, 2016

e-dard reviewed Aug 22, 2016

jwilder force-pushed the jw-cardinality-index branch from 584f504 to 986e567 Compare August 22, 2016 15:16

benbjohnson reviewed Aug 22, 2016

Add TSM indexing format doc

193b414

jwilder force-pushed the jw-cardinality-index branch from 986e567 to 193b414 Compare August 22, 2016 17:55

benbjohnson mentioned this pull request Aug 25, 2016

TSI Proposal #7174

Closed

jwilder mentioned this pull request Aug 30, 2016

Support High Cardinality Tags and Series #7151

Closed

e-dard added the RFC label Sep 12, 2016

pauldix added in progress and removed in progress labels Jan 25, 2017

jwilder added the area/tsi label Apr 3, 2017

jwilder closed this Jan 30, 2018

jwilder deleted the jw-cardinality-index branch January 30, 2018 16:50
