
Skip updating certain secondary indexes during replay #683

Closed
abitmore opened this issue Feb 20, 2018 · 10 comments


@abitmore
Member

Some (if not all) secondary indexes can be generated from the current chain state alone, so they don't need to be continuously updated during replay.

@abitmore abitmore added this to the Future Non-Consensus-Changing Release milestone Feb 20, 2018
@jmjatlanta
Contributor

jmjatlanta commented Mar 19, 2018

Note: I am building requirements. I am not claiming this issue. Please comment on this post, and I'll update it with your changes:

According to libraries/chain/db_init.cpp, the account_index has two secondary indexes: account_member_index and account_referrer_index. The proposal_index has one secondary index, required_approval_index.

The grouped_orders plugin also adds a secondary index to limit_order_index called limit_order_group_index.

These indexes are kept up to date during the replay process, but this is unnecessary: they are only used once the chain state is current. Deferring updates to these indexes should therefore improve replay performance.

It has yet to be proven that none of these four indexes is used during the replay process. Each index should be examined to verify that it is not consulted while replay is in progress; only if it is unused should updates to it be skipped during replay.

It appears that most of the replay process is contained in libraries/chain/db_management.cpp. The process should be modified to:

  1. Be aware that the replay process is in progress.
  2. Skip updating the secondary indexes that are not used during the replay process.
  3. At the end of the replay process, build these secondary indexes.

Edit: Added limit_order_group_index, added the fact that we must verify that each index is not used within the replay process.

@abitmore
Member Author

I didn't check how many secondary indexes there are in the code. However, in the grouped_orders plugin I did add one more.

We do need to make sure that they're not used in replay.

@pmconrad
Contributor

If possible, make a test run without secondary indexes first, to see how big the savings would be.

@jmjatlanta
Contributor

jmjatlanta commented Mar 19, 2018

If possible, make a test run without secondary indexes first, to see how big the savings would be.

@pmconrad I will attempt to.

We do need to make sure that they're not used in replay.

@abitmore Do you have suggestions for what should happen if they are? I'm thinking of the scenario where a replay is running and a client connects and makes an API call that requires a secondary index. Can that happen (I think clients can connect during replay, but I'm unsure)? If so, should the call block until replay completes, or return an error?

@abitmore
Member Author

Clients can't connect during replay. For a simple test, we can remove the related code from db_init.cpp, then try a replay and compare the elapsed time with the result from running the old code. If the indexes are needed, the replay should fail.

@jmjatlanta
Contributor

jmjatlanta commented Mar 19, 2018

Clients can't connect during replay.

Awesome. Here are my numbers:
Started witness node as:
witness_node --data-dir data/my_datadir --replay --rpc-endpoint "127.0.0.1:8090" --max-ops-per-account 1000 --partial-operations true

After 2 runs with secondary indexes, 3154084 blocks: average 241.1075 secs, with the two runs differing by less than 0.5 secs.

After 2 runs without secondary indexes, 3154084 blocks: average 234.9055 secs, with the two runs differing by less than 1.19 secs.

Between the runs with and without secondary indexes, the difference is about 2.6%. So replaying 3154084 blocks from genesis until the first of February, 2016 costs an extra 6 seconds.

Note: As indexes grow, insertion times can be longer (although usually not linearly, heavily dependent on implementation). So interpolating based on number of current total blocks may not be accurate.

Therefore, I tested again with a larger number of blocks (see further down):

Here are the details with the smaller number of blocks:
Try 1 (with secondary indexes):
99.5535% 3140000 of 3154084
2188164ms th_a db_management.cpp:78 reindex ] Writing database to disk at block 3144084
2188380ms th_a db_management.cpp:80 reindex ] Done
99.8705% 3150000 of 3154084
2189139ms th_a db_management.cpp:122 reindex ] Done reindexing, elapsed time: 240.82684299999999666 sec
Try 2 (with secondary indexes):
99.5535% 3140000 of 3154084
2540865ms th_a db_management.cpp:78 reindex ] Writing database to disk at block 3144084
2541077ms th_a db_management.cpp:80 reindex ] Done
99.8705% 3150000 of 3154084
2541826ms th_a db_management.cpp:122 reindex ] Done reindexing, elapsed time: 241.33825699999999870 sec
Try 3 (without secondary indexes):
99.5535% 3140000 of 3154084
3540592ms th_a db_management.cpp:78 reindex ] Writing database to disk at block 3144084
3540819ms th_a db_management.cpp:80 reindex ] Done
99.8705% 3150000 of 3154084
3541556ms th_a db_management.cpp:122 reindex ] Done reindexing, elapsed time: 235.50080600000001141 sec
Try 4 (without secondary indexes):
99.5535% 3140000 of 3154084
225939ms th_a db_management.cpp:78 reindex ] Writing database to disk at block 3144084
226161ms th_a db_management.cpp:80 reindex ] Done
99.8705% 3150000 of 3154084
226901ms th_a db_management.cpp:122 reindex ] Done reindexing, elapsed time: 234.31027000000000271 sec

Try 1 with secondary indexes and a larger number of blocks (25380257 blocks):
99.999% 25380000 of 25380257
1115270ms th_a db_management.cpp:122 reindex ] Done reindexing, elapsed time: 6762.67697399999997288 sec
Try 2 without secondary indexes and a larger number of blocks (25380257 blocks):
99.999% 25380000 of 25380257
1144517ms th_a db_management.cpp:122 reindex ] Done reindexing, elapsed time: 6601.85906400000021677 sec

A difference of 160.8179 seconds, which is 2.38%.

@abitmore
Member Author

Just found that I have some statistics about replay here: bitshares/bitshares-fc#20 (comment).

Replay time with and without the grouped_orders plugin (which has a secondary index) is 5951 seconds vs 5603 seconds, a difference of about 6%.

@jmjatlanta
Contributor

jmjatlanta commented Mar 21, 2018

My interpretation of the tests above:

  • Running with 3 secondary indexes increases the replay time by 2 to 3 percent. At the current blockchain size, running on my machine, it was demonstrated to add an extra 2 minutes and 40 seconds to a 110 minute process.
  • Running without the secondary indexes will require an additional step at the end to generate those indexes (not tested, so I am unsure how long that will take).
  • Moving some indexes to a plugin (as was done in the grouped_orders plugin, and as suggested in issue Move account_member_index to a plugin #682) is another way to mitigate the performance issue for some end-users.

With respect to the results above, I look forward to your comments, questions, and advice on how to proceed.

@pmconrad
Contributor

IMO a 2-3% performance gain does not justify the risks associated with getting it wrong.

@abitmore
Member Author

Fixed by #1918.
