Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(feat) use view table for query operations #116

Merged
merged 12 commits into from
Nov 15, 2021

Conversation

d12frosted
Copy link
Owner

@d12frosted d12frosted commented Nov 12, 2021

Release plan:

  1. Finish benchmarks.
  2. Document this change.
  3. Use it privately for some time.
  4. If everything is fine, merge to master.

After some time (e.g. maybe one week or more), I release new version of vulpea.


I am still not happy with performance query functions on a set of 9k+
notes. Partially, the culprit here is that horrific SQL. The more
tables I want to multiply, the worse performance becomes.

So idea here is simple. Instead of doing multiplication of matrices
during query operation, use a view table where every row is a full
vulpea-note (in reality multiple notes because of aliases, but
that's irrelevant here). This new view table is used in the following
functions:

  • vulpea-db-query
  • vulpea-db-query-by-ids
  • vulpea-db-get-by-id

This transitively affects all specialized queries.

The way view table is being built is not optimal as it parses the
buffer for the second time in addition to org-roam routine. Ideally
this feature should be proposed to org-roam thus improving build
time.

Benchamarks

Count of notes: 9554.

General benchmark with regular org-roam tables

test result size generic specialized
tags-some 30 notes 4.6693460650999995 0.0605194177
tags-every 3168 notes 4.7333844436999996 1.0618131127
links-some 1657 notes 4.8095771283 1.362214091
links-every 92 notes 4.5517473337999995 0.1707312557

General benchmark with view table

test result size generic specialized
tags-some 30 notes 1.0112478712 0.0066033426
tags-every= 3168 notes 1.0059819176 0.5709392964999999
links-some 1657 notes 1.0462236128999999 0.4248580532
links-every 92 notes 1.0204833089 0.0545313596

Comparison of vulpea-db-query

test result size regular view table ratio
tags-some 30 notes 4.6693460650999995 1.0112478712 4.6174100
tags-every 3168 notes 4.7333844436999996 1.0059819176 4.7052381
links-some 1657 notes 4.8095771283 1.0462236128999999 4.5970833
links-every 92 notes 4.5517473337999995 1.0204833089 4.4603839

Since vulpea-db-query loads everything into memory, no surprise that
improvement is pretty much the same across test scenarios. The good
part here is that on average new implementation performs 4.595 times
faster. And since it stays around 1 second for 9554 notes it makes it
quite practical.

Comparison of specialized queries

test result size regular view table ratio
tags-some 30 notes 0.0605194177 0.0066033426 9.1649671
tags-every 3168 notes 1.0618131127 0.5709392964999999 1.8597653
links-some 1657 notes 1.362214091 0.4248580532 3.2062805
links-every 92 notes 0.1707312557 0.0545313596 3.1308821

Since specialized query performance depends on result size (e.g. how
many notes it needs to load into memory) the improvement is not
stable. But even for big slices it outperforms previous
implementation. On average, new implementation performs 4.34 times
faster. Not really useful metric in this case, but still.

Comparison of db sync

Now the real question is how synchronisation is affected. We have the
following test scenarios.

  1. Sync one small sized note.
  2. Sync one medium sized note.
  3. Sync one big sized note.
  4. Sync vulpea-test-notes (e.g. 9554 relatively small sized notes).
test regular view table diff ratio
vulpea-test-notes 172.79389154999998 337.61603822 164.82215 1.9538656
small 0.000354079889 0.000416262194 6.2182305e-5 1.1756166
medium 0.000492199416 0.000539389997 4.7190581e-5 1.0958770
huge 0.1732851848 0.2240508243 0.050765640 1.2929601

As you can see, it affects dramatically full rebuild, but when it
comes to singular updates, the difference is pretty small.

Conclusion

View table improves query performance dramatically. Since most of the
time I am using query functions and reading notes, this small
difference on syncing small to medium files is not critical for
release of view tables. Though full rebuild penalty is quite big and
should improved in the future iterations.

P. S. I am leaving the full sync test case disabled because I don't
want to add 8.5 minutes to the tests. If needed, just replace xit with
it.

@d12frosted d12frosted force-pushed the feature/vulpea-view-table branch 6 times, most recently from 8b90829 to 3cb6b66 Compare November 12, 2021 16:33
@d12frosted
Copy link
Owner Author

d12frosted commented Nov 12, 2021

update: moved release plan to the first message

d12frosted added a commit that referenced this pull request Nov 14, 2021
They will be enabled back with #116
@d12frosted
Copy link
Owner Author

I am already using this branch and must say... it feels so snappy! Absolutely love it!

d12frosted added a commit that referenced this pull request Nov 14, 2021
They will be enabled back with #116
@d12frosted d12frosted force-pushed the feature/vulpea-view-table branch 3 times, most recently from b0089d5 to 2d3b6ce Compare November 14, 2021 16:05
I am still not happy with performance query functions on a set of 9k+
notes. Partially, the culprit here is that horrific SQL. The more
tables I want to multiply, the worse performance becomes.

So idea here is simple. Instead of doing multiplication of matrices
during query operation, use a view table where every row is a full
`vulpea-note` (in reality multiple notes because of aliases, but
that's irrelevant here). This new view table is used in the following
functions:

- `vulpea-db-query`
- `vulpea-db-query-by-ids`
- `vulpea-db-get-by-id`

This transitively affects all specialized queries.

The way view table is being built is not optimal as it parses the
buffer for the second time in addition to `org-roam` routine. Ideally
this feature should be proposed to `org-roam` thus improving build
time.

Benchamarks
===========

Count of notes: 9554.

General benchmark with regular `org-roam` tables
------------------------------------------------

| test          | result size |            generic |  specialized |
| ------------- | ----------- | ------------------ | ------------ |
| `tags-some`   | 30 notes    | 4.6693460650999995 | 0.0605194177 |
| `tags-every`  | 3168 notes  | 4.7333844436999996 | 1.0618131127 |
| `links-some`  | 1657 notes  |       4.8095771283 |  1.362214091 |
| `links-every` | 92 notes    | 4.5517473337999995 | 0.1707312557 |

General benchmark with view table
---------------------------------

| test          | result size |            generic |        specialized |
| ------------- | ----------- | ------------------ | ------------------ |
| `tags-some`   | 30 notes    |       1.0112478712 |       0.0066033426 |
| `tags-every=` | 3168 notes  |       1.0059819176 | 0.5709392964999999 |
| `links-some`  | 1657 notes  | 1.0462236128999999 |       0.4248580532 |
| `links-every` | 92 notes    |       1.0204833089 |       0.0545313596 |

Comparison of `vulpea-db-query`
-------------------------------

| test          | result size |            regular |         view table |     ratio |
| ------------- | ----------- | ------------------ | ------------------ | --------- |
| `tags-some`   | 30 notes    | 4.6693460650999995 |       1.0112478712 | 4.6174100 |
| `tags-every`  | 3168 notes  | 4.7333844436999996 |       1.0059819176 | 4.7052381 |
| `links-some`  | 1657 notes  |       4.8095771283 | 1.0462236128999999 | 4.5970833 |
| `links-every` | 92 notes    | 4.5517473337999995 |       1.0204833089 | 4.4603839 |

Since `vulpea-db-query` loads everything into memory, no surprise that
improvement is pretty much the same across test scenarios. The good
part here is that on average new implementation performs 4.595 times
faster. And since it stays around 1 second for 9554 notes it makes it
quite practical.

Comparison of specialized queries
---------------------------------

| test          | result size |      regular |         view table |     ratio |
| ------------- | ----------- | ------------ | ------------------ | --------- |
| `tags-some`   | 30 notes    | 0.0605194177 |       0.0066033426 | 9.1649671 |
| `tags-every`  | 3168 notes  | 1.0618131127 | 0.5709392964999999 | 1.8597653 |
| `links-some`  | 1657 notes  |  1.362214091 |       0.4248580532 | 3.2062805 |
| `links-every` | 92 notes    | 0.1707312557 |       0.0545313596 | 3.1308821 |

Since specialized query performance depends on result size (e.g. how
many notes it needs to load into memory) the improvement is not
stable. But even for big slices it outperforms previous
implementation. On average, new implementation performs 4.34 times
faster. Not really useful metric in this case, but still.

Comparison of db sync
---------------------

Now the real question is how synchronisation is affected. We have the
following test scenarios.

1. Sync one small sized note.
2. Sync one medium sized note.
3. Sync one big sized note.
4. Sync vulpea-test-notes (e.g. 9554 relatively small sized notes).

| test                |            regular |     view table |         diff |     ratio |
| ------------------- | ------------------ | -------------- | ------------ | --------- |
| `vulpea-test-notes` | 172.79389154999998 |   337.61603822 |    164.82215 | 1.9538656 |
| `small`             |     0.000354079889 | 0.000416262194 | 6.2182305e-5 | 1.1756166 |
| `medium`            |     0.000492199416 | 0.000539389997 | 4.7190581e-5 | 1.0958770 |
| `huge`              |       0.1732851848 |   0.2240508243 |  0.050765640 | 1.2929601 |

As you can see, it affects dramatically full rebuild, but when it
comes to singular updates, the difference is pretty small.

Conclusion
----------

View table improves query performance dramatically. Since most of the
time I am using query functions and reading notes, this small
difference on syncing small to medium files is not critical for
release of view tables. Though full rebuild penalty is quite big and
should improved in the future iterations.

P. S. I am leaving the full sync test case disabled because I don't
want to add 8.5 minutes to the tests. If needed, just replace xit with
it.
It takes around 8.5 minutes to run this particular benchmark and
running it every time provides little gain.
@d12frosted
Copy link
Owner Author

I will merge it today closer to evening.

@d12frosted d12frosted closed this Nov 15, 2021
@d12frosted d12frosted reopened this Nov 15, 2021
@d12frosted d12frosted merged commit a743b7a into master Nov 15, 2021
@d12frosted d12frosted deleted the feature/vulpea-view-table branch November 15, 2021 14:33
@publicimageltd
Copy link

Wow, these performance differences are so amazing! That must be added to org-roam, please! (And if I have an extra wish, make it accessible with an API so that Delve does not have to replicate the queries...)

Do I understand correctly that this view table also contains the links?

@d12frosted
Copy link
Owner Author

That must be added to org-roam, please!

I would love to work on that actually. First I wanted to investigate this idea for some time to see if it works. And then ask @jethrokuan if he is willing to accept something like this. But since this conversation started to happen, then... @jethrokuan, what do you think? :)

Do I understand correctly that this view table also contains the links?

Yes. Basically it started from #106. Plus I wanted to provide similar functionality (e.g. to search for links with N depth) for https://github.com/d12frosted/vino library. For now you can use vulpea-db-query-by-links-some and vulpea-db-query-by-links-every - it allows to search for depth = 1. I want to spend some time to optimize more deep search.

@d12frosted
Copy link
Owner Author

BTW, inclusion to org-roam would easily solve write performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants