Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should search on primary key be as fast as an indexed field? #310

Closed
cbrake opened this issue Nov 2, 2020 · 5 comments · Fixed by #367
Closed

Should search on primary key be as fast as an indexed field? #310

cbrake opened this issue Nov 2, 2020 · 5 comments · Fixed by #367
Assignees
Labels
bug Something isn't working
Milestone

Comments

@cbrake
Copy link

cbrake commented Nov 2, 2020

What version of Genji are you using?

latest master: v0.8.1-0.20201031100226-a42cf4e06da3

Does this issue reproduce with the latest release?

Not sure -- did not try, but could if you think that is valuable information.

What did you do?

I created a test app that inserts ~100,000 records into a bolt database:

https://github.com/simpleiot/simpleiot/blob/feature-genji2/cmd/genji-test/main.go

This program inserts documents and then runs a number of queries on non-indexed, indexed, and primary key fields.

Results:

Search on indexed field:

2020/11/02 12:31:14 SELECT * FROM users WHERE email = "[email protected]": documents found: 1, time: 93.02µs

Search on non-indexed field:

2020/11/02 12:31:14 SELECT * FROM users WHERE firstname = "Mary": documents found: 1, time: 104.352365ms

Search on primary key:

2020/11/02 12:31:15 SELECT * FROM users WHERE id = 100001: documents found: 1, time: 85.627572ms

What did you expect to see?

A search on primary key would be fast -- similar to indexed field. Ideally we would not have to index the primary key to get fast searches.

What did you see instead?

A search on primary key took almost as long as a search on a non-indexed field.

What Go version and environment are you using?

v1.15.2

@cbrake cbrake added the bug Something isn't working label Nov 2, 2020
@cbrake
Copy link
Author

cbrake commented Nov 3, 2020

Running a simple experiment:

create table t1 (id text primary key);
insert into t1 values {id:'1', v:'2'};
insert into t1 values {id:'3', v:'4'};
insert into t1 values {id:'3', v:'4'};
duplicate document

So, it does not allow a duplicate key, which is good.

Looking at the above data in boltbrowser:

image

It appears the ID is duplicated in the key and the value. This is probably necessary for scanning data into data structures.

A select on primary key and regular value does not show any difference in the plan, so perhaps the DB is not optimized yet for searching on primary key.

genji> explain select * from t1 where id = "1";
{
  "plan": "Table(t1) -> σ(cond: id = \"1\") -> ∏(*)"
}
genji> explain select * from t1 where v = "1";
{
  "plan": "Table(t1) -> σ(cond: v = \"1\") -> ∏(*)"
}

@asdine
Copy link
Collaborator

asdine commented Nov 4, 2020

@cbrake Thanks for the detailed report! Actually, the code that does the optimization is already there but it is unplugged and will be re-added soon to the optimizer. It was unplugged when we changed to the new query planner, now all that's left is to write add a rule to the optimizer to deal with primary keys.

In Genji, primary keys don't use indexes, they are actual keys of the stores, which means that they can be slightly faster than indexes. However, they require their own lookup and iteration logic.

In the meantime, I suggest to simply create a unique index on the primary key, that way you'll get the speed benefit of indexes. Once this issue is fixed you only need to remove that index.

@cbrake
Copy link
Author

cbrake commented Nov 4, 2020

thanks @asdine -- that is great news!

@asdine asdine added this to the v0.10.0 milestone Nov 12, 2020
@asdine asdine self-assigned this Nov 16, 2020
@asdine asdine modified the milestones: v0.10.0, v0.11.0 Jan 26, 2021
@asdine asdine linked a pull request Mar 7, 2021 that will close this issue
@asdine
Copy link
Collaborator

asdine commented Mar 7, 2021

Fixed by #367
Closing this, please feel free to reopen if you notice any weird behavior

@asdine asdine closed this as completed Mar 7, 2021
@cbrake
Copy link
Author

cbrake commented Mar 8, 2021

Looks good -- searching on primary key is now fast:

before:

2020/11/02 12:31:15 SELECT * FROM users WHERE id = 100001: documents found: 1, time: 85.627572ms

Now:

2021/03/08 11:34:11 SELECT * FROM users WHERE id = 100001: documents found: 1, time: 71.5µs

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants