-
-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(feat) use view table for query operations #116
Conversation
8b90829
to
3cb6b66
Compare
update: moved release plan to the first message |
3cb6b66
to
32ad048
Compare
They will be enabled back with #116
I am already using this branch and must say... it feels so snappy! Absolutely love it! |
They will be enabled back with #116
b0089d5
to
2d3b6ce
Compare
I am still not happy with performance query functions on a set of 9k+ notes. Partially, the culprit here is that horrific SQL. The more tables I want to multiply, the worse performance becomes. So idea here is simple. Instead of doing multiplication of matrices during query operation, use a view table where every row is a full `vulpea-note` (in reality multiple notes because of aliases, but that's irrelevant here). This new view table is used in the following functions: - `vulpea-db-query` - `vulpea-db-query-by-ids` - `vulpea-db-get-by-id` This transitively affects all specialized queries. The way view table is being built is not optimal as it parses the buffer for the second time in addition to `org-roam` routine. Ideally this feature should be proposed to `org-roam` thus improving build time. Benchamarks =========== Count of notes: 9554. General benchmark with regular `org-roam` tables ------------------------------------------------ | test | result size | generic | specialized | | ------------- | ----------- | ------------------ | ------------ | | `tags-some` | 30 notes | 4.6693460650999995 | 0.0605194177 | | `tags-every` | 3168 notes | 4.7333844436999996 | 1.0618131127 | | `links-some` | 1657 notes | 4.8095771283 | 1.362214091 | | `links-every` | 92 notes | 4.5517473337999995 | 0.1707312557 | General benchmark with view table --------------------------------- | test | result size | generic | specialized | | ------------- | ----------- | ------------------ | ------------------ | | `tags-some` | 30 notes | 1.0112478712 | 0.0066033426 | | `tags-every=` | 3168 notes | 1.0059819176 | 0.5709392964999999 | | `links-some` | 1657 notes | 1.0462236128999999 | 0.4248580532 | | `links-every` | 92 notes | 1.0204833089 | 0.0545313596 | Comparison of `vulpea-db-query` ------------------------------- | test | result size | regular | view table | ratio | | ------------- | ----------- | ------------------ | ------------------ | --------- | | `tags-some` | 30 notes | 4.6693460650999995 | 1.0112478712 | 4.6174100 | | `tags-every` | 3168 notes | 4.7333844436999996 | 1.0059819176 | 4.7052381 | | `links-some` | 1657 notes | 4.8095771283 | 1.0462236128999999 | 4.5970833 | | `links-every` | 92 notes | 4.5517473337999995 | 1.0204833089 | 4.4603839 | Since `vulpea-db-query` loads everything into memory, no surprise that improvement is pretty much the same across test scenarios. The good part here is that on average new implementation performs 4.595 times faster. And since it stays around 1 second for 9554 notes it makes it quite practical. Comparison of specialized queries --------------------------------- | test | result size | regular | view table | ratio | | ------------- | ----------- | ------------ | ------------------ | --------- | | `tags-some` | 30 notes | 0.0605194177 | 0.0066033426 | 9.1649671 | | `tags-every` | 3168 notes | 1.0618131127 | 0.5709392964999999 | 1.8597653 | | `links-some` | 1657 notes | 1.362214091 | 0.4248580532 | 3.2062805 | | `links-every` | 92 notes | 0.1707312557 | 0.0545313596 | 3.1308821 | Since specialized query performance depends on result size (e.g. how many notes it needs to load into memory) the improvement is not stable. But even for big slices it outperforms previous implementation. On average, new implementation performs 4.34 times faster. Not really useful metric in this case, but still. Comparison of db sync --------------------- Now the real question is how synchronisation is affected. We have the following test scenarios. 1. Sync one small sized note. 2. Sync one medium sized note. 3. Sync one big sized note. 4. Sync vulpea-test-notes (e.g. 9554 relatively small sized notes). | test | regular | view table | diff | ratio | | ------------------- | ------------------ | -------------- | ------------ | --------- | | `vulpea-test-notes` | 172.79389154999998 | 337.61603822 | 164.82215 | 1.9538656 | | `small` | 0.000354079889 | 0.000416262194 | 6.2182305e-5 | 1.1756166 | | `medium` | 0.000492199416 | 0.000539389997 | 4.7190581e-5 | 1.0958770 | | `huge` | 0.1732851848 | 0.2240508243 | 0.050765640 | 1.2929601 | As you can see, it affects dramatically full rebuild, but when it comes to singular updates, the difference is pretty small. Conclusion ---------- View table improves query performance dramatically. Since most of the time I am using query functions and reading notes, this small difference on syncing small to medium files is not critical for release of view tables. Though full rebuild penalty is quite big and should improved in the future iterations. P. S. I am leaving the full sync test case disabled because I don't want to add 8.5 minutes to the tests. If needed, just replace xit with it.
This reverts commit 5f5f2c8.
It takes around 8.5 minutes to run this particular benchmark and running it every time provides little gain.
2d3b6ce
to
b7dada5
Compare
I will merge it today closer to evening. |
Wow, these performance differences are so amazing! That must be added to Do I understand correctly that this view table also contains the links? |
I would love to work on that actually. First I wanted to investigate this idea for some time to see if it works. And then ask @jethrokuan if he is willing to accept something like this. But since this conversation started to happen, then... @jethrokuan, what do you think? :)
Yes. Basically it started from #106. Plus I wanted to provide similar functionality (e.g. to search for links with |
BTW, inclusion to |
Release plan:
After some time (e.g. maybe one week or more), I release new version of
vulpea
.I am still not happy with performance query functions on a set of 9k+
notes. Partially, the culprit here is that horrific SQL. The more
tables I want to multiply, the worse performance becomes.
So idea here is simple. Instead of doing multiplication of matrices
during query operation, use a view table where every row is a full
vulpea-note
(in reality multiple notes because of aliases, butthat's irrelevant here). This new view table is used in the following
functions:
vulpea-db-query
vulpea-db-query-by-ids
vulpea-db-get-by-id
This transitively affects all specialized queries.
The way view table is being built is not optimal as it parses the
buffer for the second time in addition to
org-roam
routine. Ideallythis feature should be proposed to
org-roam
thus improving buildtime.
Benchamarks
Count of notes: 9554.
General benchmark with regular
org-roam
tablestags-some
tags-every
links-some
links-every
General benchmark with view table
tags-some
tags-every=
links-some
links-every
Comparison of
vulpea-db-query
tags-some
tags-every
links-some
links-every
Since
vulpea-db-query
loads everything into memory, no surprise thatimprovement is pretty much the same across test scenarios. The good
part here is that on average new implementation performs 4.595 times
faster. And since it stays around 1 second for 9554 notes it makes it
quite practical.
Comparison of specialized queries
tags-some
tags-every
links-some
links-every
Since specialized query performance depends on result size (e.g. how
many notes it needs to load into memory) the improvement is not
stable. But even for big slices it outperforms previous
implementation. On average, new implementation performs 4.34 times
faster. Not really useful metric in this case, but still.
Comparison of db sync
Now the real question is how synchronisation is affected. We have the
following test scenarios.
vulpea-test-notes
small
medium
huge
As you can see, it affects dramatically full rebuild, but when it
comes to singular updates, the difference is pretty small.
Conclusion
View table improves query performance dramatically. Since most of the
time I am using query functions and reading notes, this small
difference on syncing small to medium files is not critical for
release of view tables. Though full rebuild penalty is quite big and
should improved in the future iterations.
P. S. I am leaving the full sync test case disabled because I don't
want to add 8.5 minutes to the tests. If needed, just replace xit with
it.