(feat) use view table for query operations #116

d12frosted · 2021-11-12T07:01:19Z

Release plan:

1. Finish benchmarks.
2. Document this change.
3. Use it privately for some time.
4. If everything is fine, merge to master.

After some time (e.g. a week or more), I will release a new version of vulpea.

I am still not happy with the performance of query functions on a set
of 9k+ notes. Part of the culprit is that horrific SQL: the more
tables I want to multiply, the worse performance becomes.

So the idea here is simple. Instead of multiplying matrices (i.e.
joining tables) during a query operation, use a view table where every
row is a full `vulpea-note` (in reality multiple notes because of
aliases, but that's irrelevant here). This new view table is used in
the following functions:

- `vulpea-db-query`
- `vulpea-db-query-by-ids`
- `vulpea-db-get-by-id`

This transitively affects all specialized queries.
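To make the "multiplication of matrices" concrete, here is a minimal sketch using Python's built-in `sqlite3`. The schema (`nodes`, `tags`, `links`, `notes`), the JSON encoding of list columns, and the helper names are all hypothetical simplifications, not the actual org-roam or vulpea tables (vulpea itself is Emacs Lisp on top of emacsql). It contrasts two read paths: assembling notes with joins at query time, where a single note with 3 tags and 2 links already yields 6 joined rows, versus reading one pre-assembled row per note from a view table, which costs an extra write per note.

```python
import json
import sqlite3

# Hypothetical, heavily simplified schema; NOT the real org-roam or vulpea tables.
SCHEMA = """
CREATE TABLE nodes (id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE tags  (node_id TEXT, tag TEXT);
CREATE TABLE links (source TEXT, dest TEXT);
-- The "view table": one pre-assembled row per note.
CREATE TABLE notes (id TEXT PRIMARY KEY, title TEXT, tags TEXT, links TEXT);
"""


def insert_note(db, note_id, title, tags, links):
    """Write to the normalized tables AND to the view table (the extra write cost)."""
    db.execute("INSERT INTO nodes VALUES (?, ?)", (note_id, title))
    db.executemany("INSERT INTO tags VALUES (?, ?)", [(note_id, t) for t in tags])
    db.executemany("INSERT INTO links VALUES (?, ?)", [(note_id, d) for d in links])
    db.execute(
        "INSERT INTO notes VALUES (?, ?, ?, ?)",
        (note_id, title, json.dumps(sorted(tags)), json.dumps(sorted(links))),
    )


def query_with_joins(db):
    """Assemble notes at read time; joins yield len(tags) * len(links) rows per note."""
    rows = db.execute(
        """SELECT n.id, n.title, t.tag, l.dest
           FROM nodes n
           LEFT JOIN tags t ON t.node_id = n.id
           LEFT JOIN links l ON l.source = n.id"""
    ).fetchall()
    notes = {}
    for note_id, title, tag, dest in rows:
        note = notes.setdefault(
            note_id, {"id": note_id, "title": title, "tags": set(), "links": set()}
        )
        if tag is not None:
            note["tags"].add(tag)
        if dest is not None:
            note["links"].add(dest)
    result = [
        {**n, "tags": sorted(n["tags"]), "links": sorted(n["links"])}
        for n in notes.values()
    ]
    return sorted(result, key=lambda n: n["id"]), len(rows)


def query_view_table(db):
    """Read pre-assembled rows: exactly one row per note, no joins."""
    rows = db.execute("SELECT id, title, tags, links FROM notes ORDER BY id").fetchall()
    return [
        {"id": i, "title": t, "tags": json.loads(tg), "links": json.loads(ln)}
        for i, t, tg, ln in rows
    ]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
insert_note(db, "a", "Note A", ["x", "y", "z"], ["b", "c"])
insert_note(db, "b", "Note B", [], ["a"])
insert_note(db, "c", "Note C", ["x"], [])

joined, joined_rows = query_with_joins(db)
assert joined == query_view_table(db)  # both paths produce the same notes
print(joined_rows)  # 8 joined rows for only 3 notes
```

The trade-off in this toy matches the shape of the benchmarks: reads get cheaper because the assembly happens once at write time, and every sync pays for the extra write.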

The way the view table is built is not optimal: it parses the buffer
a second time, in addition to the `org-roam` routine. Ideally this
feature should be proposed to `org-roam`, which would improve build
time.

Benchmarks

Count of notes: 9554.

General benchmark with regular `org-roam` tables

test	result size	generic (s)	specialized (s)
`tags-some`	30 notes	4.6693460650999995	0.0605194177
`tags-every`	3168 notes	4.7333844436999996	1.0618131127
`links-some`	1657 notes	4.8095771283	1.362214091
`links-every`	92 notes	4.5517473337999995	0.1707312557

General benchmark with view table

test	result size	generic (s)	specialized (s)
`tags-some`	30 notes	1.0112478712	0.0066033426
`tags-every`	3168 notes	1.0059819176	0.5709392964999999
`links-some`	1657 notes	1.0462236128999999	0.4248580532
`links-every`	92 notes	1.0204833089	0.0545313596

Comparison of `vulpea-db-query`

test	result size	regular (s)	view table (s)	ratio
`tags-some`	30 notes	4.6693460650999995	1.0112478712	4.6174100
`tags-every`	3168 notes	4.7333844436999996	1.0059819176	4.7052381
`links-some`	1657 notes	4.8095771283	1.0462236128999999	4.5970833
`links-every`	92 notes	4.5517473337999995	1.0204833089	4.4603839

Since `vulpea-db-query` loads all notes into memory, it is no surprise
that the improvement is pretty much the same across test scenarios.
The good part is that, on average, the new implementation is 4.595
times faster. And since it stays around 1 second for 9554 notes, it is
quite practical.
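The ratio column is the regular time divided by the view table time, and the 4.595 figure is the arithmetic mean of those four ratios. A quick check, with the timings (in seconds) copied from the table above:

```python
# Timings in seconds, copied from the vulpea-db-query comparison table.
regular = {
    "tags-some": 4.6693460650999995,
    "tags-every": 4.7333844436999996,
    "links-some": 4.8095771283,
    "links-every": 4.5517473337999995,
}
view_table = {
    "tags-some": 1.0112478712,
    "tags-every": 1.0059819176,
    "links-some": 1.0462236128999999,
    "links-every": 1.0204833089,
}

# ratio = regular time / view table time, i.e. how many times faster the view table is.
ratios = {name: regular[name] / view_table[name] for name in regular}
mean_ratio = sum(ratios.values()) / len(ratios)

for name, ratio in ratios.items():
    print(f"{name:12} {ratio:.7f}")
print(f"mean         {mean_ratio:.3f}")  # 4.595
```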

Comparison of specialized queries

test	result size	regular (s)	view table (s)	ratio
`tags-some`	30 notes	0.0605194177	0.0066033426	9.1649671
`tags-every`	3168 notes	1.0618131127	0.5709392964999999	1.8597653
`links-some`	1657 notes	1.362214091	0.4248580532	3.2062805
`links-every`	92 notes	0.1707312557	0.0545313596	3.1308821

Since specialized query performance depends on the result size (i.e.
how many notes it needs to load into memory), the improvement varies
between scenarios. But even for big slices, the new implementation
outperforms the previous one. On average, it is 4.34 times faster,
which is not a really useful metric in this case, but still.
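The same arithmetic for the specialized queries reproduces the 4.34 average. Sorting by result size also hints at why the mean says little here: a single outlier (`tags-some`, about 9.2x) pulls it well above the other three ratios. The median is printed only to illustrate that spread; it is not a figure from the original benchmarks.

```python
from statistics import mean, median

# (test, result size in notes, regular seconds, view table seconds), from the table above.
rows = [
    ("tags-some", 30, 0.0605194177, 0.0066033426),
    ("tags-every", 3168, 1.0618131127, 0.5709392964999999),
    ("links-some", 1657, 1.362214091, 0.4248580532),
    ("links-every", 92, 0.1707312557, 0.0545313596),
]

# ratio = regular time / view table time for each specialized query.
ratios = {name: regular / view for name, _size, regular, view in rows}
for name, size, _regular, _view in sorted(rows, key=lambda r: r[1]):
    print(f"{name:12} {size:5} notes  {ratios[name]:.2f}x")

values = list(ratios.values())
# The arithmetic mean is dominated by the tags-some outlier;
# the median is shown only to illustrate that spread.
print(f"mean   {mean(values):.2f}")    # 4.34
print(f"median {median(values):.2f}")  # 3.17
```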

Comparison of db sync

Now the real question is how synchronisation is affected. We have the
following test scenarios:

1. Sync one small note.
2. Sync one medium-sized note.
3. Sync one big note.
4. Sync vulpea-test-notes (i.e. 9554 relatively small notes).

test	regular (s)	view table (s)	diff (s)	ratio
`vulpea-test-notes`	172.79389154999998	337.61603822	164.82215	1.9538656
`small`	0.000354079889	0.000416262194	6.2182305e-5	1.1756166
`medium`	0.000492199416	0.000539389997	4.7190581e-5	1.0958770
`huge`	0.1732851848	0.2240508243	0.050765640	1.2929601

As you can see, it dramatically affects full rebuilds, but when it
comes to single-file updates, the difference is pretty small.
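The diff and ratio columns can be re-derived from the raw timings (in seconds): diff is the view table time minus the regular time, and ratio is the view table time divided by the regular time, so values above 1 mean syncing got slower.

```python
# (test, regular seconds, view table seconds), from the sync table above.
rows = [
    ("vulpea-test-notes", 172.79389154999998, 337.61603822),
    ("small", 0.000354079889, 0.000416262194),
    ("medium", 0.000492199416, 0.000539389997),
    ("huge", 0.1732851848, 0.2240508243),
]

# diff: extra seconds spent because of the view table; ratio: > 1 means slower sync.
results = {name: (view - regular, view / regular) for name, regular, view in rows}

for name, (diff, ratio) in results.items():
    print(f"{name:18} +{diff:.3g} s  {ratio:.2f}x")
```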

Conclusion

The view table improves query performance dramatically. Since most of
the time I am using query functions and reading notes, the small
difference when syncing small to medium files is not critical for the
release of view tables. Though the full rebuild penalty is quite big
and should be improved in future iterations.

P.S. I am leaving the full sync test case disabled because I don't
want to add 8.5 minutes to the tests. If needed, just replace `xit`
with `it`.
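The 8.5-minute figure lines up with running the full `vulpea-test-notes` sync once for each setup. This is an assumption, since the PR does not spell it out, but adding the regular and view table timings from the sync table gives roughly that number:

```python
# Full vulpea-test-notes sync timings in seconds, from the sync table above.
regular_full_sync = 172.79389154999998
view_table_full_sync = 337.61603822

# Assumption: the disabled test benchmarks both setups back to back.
total_minutes = (regular_full_sync + view_table_full_sync) / 60
print(f"{total_minutes:.1f} minutes")  # 8.5 minutes
```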

d12frosted · 2021-11-12T16:36:16Z

update: moved release plan to the first message


d12frosted · 2021-11-14T09:01:41Z

I am already using this branch and must say... it feels so snappy! Absolutely love it!


d12frosted · 2021-11-15T07:37:20Z

I will merge it today closer to evening.

publicimageltd · 2021-11-17T08:34:33Z

Wow, these performance differences are so amazing! That must be added to org-roam, please! (And if I have an extra wish, make it accessible with an API so that Delve does not have to replicate the queries...)

Do I understand correctly that this view table also contains the links?

d12frosted · 2021-11-17T08:46:02Z

> That must be added to org-roam, please!

I would love to work on that, actually. First I wanted to investigate this idea for some time to see if it works, and then ask @jethrokuan whether he is willing to accept something like this. But since this conversation has started... @jethrokuan, what do you think? :)

> Do I understand correctly that this view table also contains the links?

Yes. Basically it started from #106. Plus I wanted to provide similar functionality (e.g. searching for links up to depth N) for the https://github.com/d12frosted/vino library. For now you can use `vulpea-db-query-by-links-some` and `vulpea-db-query-by-links-every`, which search with depth = 1. I want to spend some time optimizing deeper searches.
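Deeper link search is not part of this PR. One way such a query could be expressed in SQL is a recursive common table expression. The sketch below uses Python's `sqlite3` with a hypothetical two-column `links` table and a made-up helper `notes_linked_from`; it is neither vulpea's API nor its schema. With `max_depth=1` it should behave like a depth-1 search, and the depth bound keeps it from looping forever on cyclic links.

```python
import sqlite3

# Hypothetical simplified link table; NOT the actual vulpea/org-roam schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE links (source TEXT, dest TEXT)")
db.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("c", "a"), ("x", "y")],
)


def notes_linked_from(db, start, max_depth):
    """Return {note_id: depth} for notes reachable from `start` within `max_depth` hops.

    The recursive CTE walks outgoing links one hop per iteration; the
    `depth < :max_depth` bound guarantees termination on cyclic graphs,
    and MIN(depth) keeps the shortest distance to each note.
    """
    rows = db.execute(
        """
        WITH RECURSIVE reachable(id, depth) AS (
          SELECT dest, 1 FROM links WHERE source = :start
          UNION
          SELECT l.dest, r.depth + 1
          FROM links l JOIN reachable r ON l.source = r.id
          WHERE r.depth < :max_depth
        )
        SELECT id, MIN(depth) FROM reachable
        WHERE id != :start
        GROUP BY id
        """,
        {"start": start, "max_depth": max_depth},
    ).fetchall()
    return dict(rows)


print(notes_linked_from(db, "a", 1))  # {'b': 1}
print(notes_linked_from(db, "a", 3))
```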

d12frosted · 2021-11-17T08:46:35Z

BTW, inclusion in org-roam would easily solve the write performance issue.

d12frosted force-pushed the feature/vulpea-view-table branch 6 times, most recently from 8b90829 to 3cb6b66 November 12, 2021 16:33

d12frosted force-pushed the feature/vulpea-view-table branch from 3cb6b66 to 32ad048 November 12, 2021 16:36

d12frosted added a commit that referenced this pull request Nov 14, 2021

(dev) temporarily disable performance tests

ce25964

They will be enabled back with #116

d12frosted added a commit that referenced this pull request Nov 14, 2021

(dev) temporarily disable performance tests

5f5f2c8

They will be enabled back with #116

d12frosted force-pushed the feature/vulpea-view-table branch 3 times, most recently from b0089d5 to 2d3b6ce November 14, 2021 16:05

d12frosted added 10 commits November 14, 2021 18:27

(dev) use emacsql prepared statements to insert links

be6e499

(dev) revert "temporarily disable performance tests"

d4da50a

This reverts commit 5f5f2c8.

(dev) update performance tests for latest org-roam and test notes

31d1eac

(dev) remove org-roam-v2-ack

209a333

(dev) benchmark full vulpea-test-notes synchronisation

c801b48

(dev) benchmark synchronisation of single file

a3473b0

(dev) disable full test notes sync performance benchmark

9bad196

It takes around 8.5 minutes to run this particular benchmark and running it every time provides little gain.

(dev) align file approach with generated notes

27fcf07

(dev) remove data from notes table when file is removed

b7dada5

d12frosted force-pushed the feature/vulpea-view-table branch from 2d3b6ce to b7dada5 November 14, 2021 17:32

d12frosted added 2 commits November 15, 2021 08:16

(doc) update readme

a74404f

(doc) update changelog

ffd96ea

d12frosted closed this Nov 15, 2021

d12frosted reopened this Nov 15, 2021

d12frosted mentioned this pull request Nov 15, 2021

prepare vulpew view table d12frosted/vulpea-test-notes#1

Merged

d12frosted merged commit a743b7a into master Nov 15, 2021

d12frosted deleted the feature/vulpea-view-table branch November 15, 2021 14:33

d12frosted mentioned this pull request Dec 5, 2021

Materialized view proposal org-roam/org-roam#1997

Open
