Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How does the index system work? #36

Closed
xeoncross opened this issue Aug 21, 2018 · 1 comment
Closed

How does the index system work? #36

xeoncross opened this issue Aug 21, 2018 · 1 comment
Assignees
Labels

Comments

@xeoncross
Copy link

xeoncross commented Aug 21, 2018

Could you expand your description of the index system in the readme or wiki?

Looks like you store all indexes in the_index bucket and then each key in the index has a gobencoded slice of []byte values that are the actual references back to whatever objects are indexed.

I don't quite understand the lookup code other than it supports gt, ge, & eq operations.

@timshannon timshannon self-assigned this Aug 21, 2018
@timshannon
Copy link
Owner

I think I'm going to write up a more detailed blog post on all of this later, but here's the gist:

Lets say you have a Type called KeyColor and data like this:

Key Color
1 Blue
2 Red
3 Blue
4 Blue
5 Red

If you add an index on Color, Bolthold will create a bucket called _index:KeyColor:Color per this.

The contents of that index will look like this:

Key Value
Blue [1, 3, 4]
Red [2, 5]

When Bolthold runs a query, it loops through all the records using an iterator and checks if the current record matches the query.

for k, v := iter.Next(); k != nil; k, v = iter.Next() {

In a non indexed query, the iterator simply returns all of the keys in the store, and BoltHold needs to loop through ALL the records to see if any of them match the query's criteria. So in the dataset above, it'll need to loop 5 times without an index.

In an indexed query, the iterator instead returns the values in the index (which is a list of keys) for each index that matches the index criteria. So if the query criteria was .Where("Color").Eq("Red").Index("Color") BoldHold only needs to loop through and test records 2 and 5.

The piece of code you linked to above (https://github.com/timshannon/bolthold/blob/master/index.go#L295) is simply a short cut, so we don't need to cycle through all of the keys. If we know the starting range of the key, we can simply seek directly to a starting key value. For example; if you have a set of records with keys that range from 1 to 100, and your query is where the key is >= 87, you don't need to check the first 86 records, you can skip right to 87.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants