Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add network column to indices #1174

Merged
merged 4 commits into from
Feb 19, 2024
Merged

Conversation

morph-dev
Copy link
Collaborator

What was wrong?

Some of our queries were very slow because our db indices weren't setup correctly.
For example, the TOTAL_DATA_SIZE_QUERY_DB query took 20-25 seconds on 4GB database.

How was it fixed?

I added network column to the indices. Based on Kolby's measurements on the same db, we noticed following improvements:

query before (sec) after (sec)
TOTAL_DATA_SIZE_QUERY_DB 23.087 0.008
XOR_FIND_FARTHEST_QUERY_NETWORK 6.154 0.017

To-Do

@morph-dev morph-dev self-assigned this Feb 16, 2024
Copy link
Member

@KolbyML KolbyML left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing find 🚀.

@KolbyML
Copy link
Member

KolbyML commented Feb 16, 2024

Looking at our sql queries should we keep the index for content_id_long?

@morph-dev
Copy link
Collaborator Author

I looked more online and again into our queries and made few changes.

Looking at our sql queries should we keep the index for content_id_long?

I confirmed both by manual inspection and by this post that index is not needed for the primary key or unique column (and it might hurt) because they are automatically created.

So I also dropped the index for primary key from lc_update table.

I also figured out that:

  • content_id_short_idx_2 should include content_id_long
    • alternative is to put LIMIT 1 to the XOR_FIND_FARTHEST_QUERY_NETWORK query. Currently we only read first item, but potentially we could read more and delete more in a batch in which case the index is useful
  • the (network, content_id_long_idx) was not needed. Instead I simplified the TOTAL_ENTRY_COUNT_QUERY_NETWORK query so it can rely on network_idx
  • the we needed index for content_key because of the PAGINATE_QUERY_DB query (this one shouldn't have network column)

Please take another look

@morph-dev morph-dev requested a review from KolbyML February 17, 2024 08:31
Copy link
Collaborator

@njgheorghita njgheorghita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

trin-storage/src/sql.rs Outdated Show resolved Hide resolved
@morph-dev morph-dev merged commit ebc372a into ethereum:master Feb 19, 2024
8 checks passed
@morph-dev morph-dev deleted the db_indices branch February 19, 2024 08:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants