Shielded sync improvement #2900

Closed
Tracked by #2843
brentstone opened this issue Mar 14, 2024 · 13 comments

Comments

@brentstone
Collaborator

brentstone commented Mar 14, 2024

Several possible improvements to be made to shielded sync

  • scan backwards from latest height
  • keys should have birthdays (don't start scanning before them; see the sketch below)
  • fetch blocks in bulk with compression
  • parallelization of note fetching
  • why does the client crash sometimes right now?

HackMD for planning: https://hackmd.io/kiob5_XEQw6M90hqcq4dZw#Index-Crawler--Server
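
As a rough illustration of the "key birthdays" item above, here is a minimal Rust sketch of the intended control flow; ViewingKeyEntry, fetch_block and trial_decrypt are invented placeholders, not actual Namada APIs:

struct Block;

struct ViewingKeyEntry {
    // Height at which the key was created; it cannot own notes before this.
    birthday: u64,
}

fn fetch_block(_height: u64) -> Block {
    // Stand-in for downloading one block from a node or indexer.
    Block
}

fn trial_decrypt(_key: &ViewingKeyEntry, _block: &Block) {
    // Stand-in for MASP trial decryption of the block's notes.
}

fn sync(keys: &[ViewingKeyEntry], last_synced: u64, tip: u64) {
    // Never scan below the earliest birthday: those blocks cannot contain
    // notes for any tracked key.
    let earliest = keys.iter().map(|k| k.birthday).min().unwrap_or(tip);
    let start = last_synced.max(earliest);
    for height in start..=tip {
        let block = fetch_block(height);
        // Only keys already "born" at this height need trial decryption.
        for key in keys.iter().filter(|k| k.birthday <= height) {
            trial_decrypt(key, &block);
        }
    }
}

fn main() {
    let keys = vec![ViewingKeyEntry { birthday: 50_000 }];
    sync(&keys, 0, 100_000);
}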

Some related issues opened by others:

@phy-chain

From what I've read on Discord, lots of crashes happen on machines without enough RAM. I'm running on a 64 GB RAM VPS and I haven't had a single crash, with several shielded syncs from 0 to 100k+ blocks.

@opsecx

opsecx commented Mar 15, 2024

Some of us are discussing this in Discord now. For me, restarting the validator seemed to do the trick. For others it did not. Unsure if it's RAM-related, but that's definitely a possibility. This is the error we get though:

Querying error: No response given in the query:
0: HTTP error
1: error sending request for url (http://127.0.0.1:26657/): connection closed before message completed

@Fraccaman
Member

Are you guys using remote or local nodes to shielded-sync?

@thousandsofthem

thousandsofthem commented Mar 15, 2024

Remote nodes don't work at all: 0% sync and already getting errors. 5 minutes of sync time at most, usually ~1 min until an error. It always starts from scratch.

Best attempt: 782/143662*100 = 0.54% in 6m33s, which means about 20 hours for a full sync assuming no errors. In case of an error it starts from block 1 again.

@Rigorously

Rigorously commented Mar 15, 2024

Remote nodes don't work at all: 0% sync and already getting errors. 5 minutes of sync time at most, usually ~1 min until an error. It always starts from scratch.

I have had no problems fetching blocks from a remote node. Might depend on the node or network interface.

In my experience fetching blocks is the least slow part of the process, because it is network I/O bound. Can it be optimized? Sure.

Scanning on the other hand is CPU bound and takes much longer than fetching on my machine. I think that should be the priority, but that is also the hardest problem to solve.

Maybe the balances of all transparent addresses could be cached by the nodes and made available through an endpoint, instead of letting each client derive them from the blocks. The shielded balances, though, require an algorithmic improvement, which would also speed up the transparent balances.
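
To illustrate the point that scanning parallelizes well, here is a minimal Rust sketch that splits trial decryption across threads while fetching stays sequential; EncryptedNote and trial_decrypt_note are hypothetical placeholders, not the real Namada scanning code:

use std::thread;

#[derive(Clone)]
struct EncryptedNote(Vec<u8>);

fn trial_decrypt_note(note: &EncryptedNote) -> Option<u64> {
    // Stand-in for the expensive MASP trial decryption; returns a note
    // value when decryption succeeds for one of our keys.
    let _ = note;
    None
}

fn scan_notes_parallel(notes: Vec<EncryptedNote>, workers: usize) -> Vec<u64> {
    let chunk_size = (notes.len() / workers.max(1)).max(1);
    thread::scope(|s| {
        // One worker per chunk; fetching stays single-threaded elsewhere.
        let handles: Vec<_> = notes
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    chunk.iter().filter_map(trial_decrypt_note).collect::<Vec<_>>()
                })
            })
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let notes = vec![EncryptedNote(vec![0u8; 600]); 10_000];
    let found = scan_notes_parallel(notes, 8);
    println!("decrypted {} notes", found.len());
}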

@opsecx

opsecx commented Mar 15, 2024

Are you guys using remote or local nodes to shielded-sync?

Local. We tried remote too, but that generally failed with a 502 (which IMO is due to nginx rather than the node). It was solved for me by restarting the validator. Another user had the same success after first reporting the opposite. (I should be clear that this happens after some blocks are fetched and on a random block, not the same one.)

@Rigorously

Local. We tried remote too, but that generally failed with a 502 (which IMO is due to nginx rather than the node).

You jinxed it!

Fetched block 130490 of 144363
[#####################################################################...............................] ~~ 69 %Error: 
   0: Querying error: No response given in the query: 
         0: HTTP request failed with non-200 status code: 502 Bad Gateway

      Location:
         /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/flex-error-0.4.4/src/tracer_impl/eyre.rs:10

      Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
      Run with RUST_BACKTRACE=full to include source snippets.
   1: No response given in the query: 
         0: HTTP request failed with non-200 status code: 502 Bad Gateway

      Location:
         /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/flex-error-0.4.4/src/tracer_impl/eyre.rs:10

      Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
      Run with RUST_BACKTRACE=full to include source snippets.

Location:
   /home/runner/work/namada/namada/crates/apps/src/lib/cli/client.rs:341

That is the first time I have seen that error, and I have synced a lot!

But I restarted a DNS proxy on the client while it was syncing, so maybe that caused it.

@opsecx

opsecx commented Mar 15, 2024

I think the 502 error is not the same in nature; nginx-proxied RPCs do that once in a while on other calls too. But it does look like shielded-sync has a very low tolerance for a single request failing (out of all the fetches it does) - maybe that's the point to improve here?

@cwgoes
Collaborator

cwgoes commented Mar 15, 2024

A few misc notes:

  • We should definitely not be using Comet RPC APIs for this
  • Network sync and decryption should be decoupled (see the sketch below)
  • User data should be incorporated (what action is desired, etc.)
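
As a sketch of the second point, fetching and decryption can be decoupled with a simple producer/consumer channel so network latency and CPU work overlap; fetch_block and trial_decrypt below are placeholders, not Namada functions:

use std::sync::mpsc;
use std::thread;

struct Block {
    height: u64,
}

fn fetch_block(height: u64) -> Block {
    // Stand-in for the network request.
    Block { height }
}

fn trial_decrypt(block: &Block) {
    // Stand-in for the CPU-bound scanning step.
    let _ = block.height;
}

fn main() {
    let (tx, rx) = mpsc::channel::<Block>();

    // Producer: fetch blocks over the network and hand them off immediately.
    let fetcher = thread::spawn(move || {
        for height in 1..=1_000u64 {
            tx.send(fetch_block(height)).expect("scanner hung up");
        }
        // Dropping `tx` closes the channel and lets the scanner finish.
    });

    // Consumer: scan blocks as they arrive, independent of network latency.
    for block in rx {
        trial_decrypt(&block);
    }

    fetcher.join().expect("fetcher thread panicked");
}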

@Fraccaman
Member

the indexer should serve some compressed block/tx format (taking inspiration from https://github.com/bitcoin/bips/blob/master/bip-0157.mediawiki)
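
For illustration only, a compact response from such an indexer might carry just the MASP-relevant data per block, in the spirit of BIP-157/158 filters; these type and field names are invented, not an actual Namada or indexer schema:

// Only the data a shielded-sync client actually needs per block.
struct CompactBlock {
    height: u64,
    // MASP-relevant transactions only; empty for most blocks.
    shielded_txs: Vec<CompactShieldedTx>,
}

struct CompactShieldedTx {
    // Index of the transaction within the block, so the full transaction
    // can be fetched later if trial decryption succeeds.
    tx_index: u32,
    // The encrypted note ciphertexts needed for trial decryption.
    note_ciphertexts: Vec<Vec<u8>>,
}

// Blocks with no shielded activity can be dropped server-side, so the
// client never downloads them at all.
fn prune_empty(blocks: Vec<CompactBlock>) -> Vec<CompactBlock> {
    blocks.into_iter().filter(|b| !b.shielded_txs.is_empty()).collect()
}

fn main() {
    let blocks = vec![
        CompactBlock { height: 1, shielded_txs: vec![] },
        CompactBlock {
            height: 2,
            shielded_txs: vec![CompactShieldedTx {
                tx_index: 0,
                note_ciphertexts: vec![vec![0u8; 580]],
            }],
        },
    ];
    println!("{} of 2 blocks carry shielded activity", prune_empty(blocks).len());
}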

@Fraccaman
Member

I think the 502 error is not the same in nature; nginx-proxied RPCs do that once in a while on other calls too. But it does look like shielded-sync has a very low tolerance for a single request failing (out of all the fetches it does) - maybe that's the point to improve here?

Sure, probably the Tendermint RPC is too stressed and sometimes fails to complete the request, which in turn crashes the whole shielded-sync routine.
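
One way to make the routine tolerate a transient RPC failure is to retry the single request with backoff instead of aborting the whole sync; a minimal sketch, with fetch_block_once standing in for whatever RPC call the client really makes:

use std::thread::sleep;
use std::time::Duration;

struct Block;

fn fetch_block_once(_height: u64) -> Result<Block, String> {
    // Stand-in for the real HTTP/RPC request, which can fail transiently.
    Ok(Block)
}

fn fetch_block_with_retry(height: u64, max_attempts: u32) -> Result<Block, String> {
    let mut delay = Duration::from_millis(500);
    let mut attempt = 0;
    loop {
        attempt += 1;
        match fetch_block_once(height) {
            Ok(block) => return Ok(block),
            Err(err) if attempt < max_attempts => {
                eprintln!("fetch of block {height} failed ({err}); retrying in {delay:?}");
                sleep(delay);
                delay *= 2; // back off before the next attempt
            }
            Err(err) => return Err(err),
        }
    }
}

fn main() {
    match fetch_block_with_retry(42, 5) {
        Ok(_) => println!("block fetched"),
        Err(err) => eprintln!("giving up: {err}"),
    }
}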

@chimmykk

chimmykk commented Mar 16, 2024

Figuring out a way for the immediate short term, while the team is developing :)

Issue:
Adding a new spending key results in fetching and re-syncing from block 0 when running namada client shielded-sync.

Implementation:
To improve the block-fetching mechanism described in this issue, we can modify the existing code to fetch blocks in ranges of 0-1000, then 1000-10000, and then in increments of 10000 blocks until reaching last_query_height when a new spending key is added.

Note: this applies only to a node that has fully synced before.

Here is the part of the code that needs some changes:

display_line!(io, "{}", "==== Shielded sync started first step ====".on_white());

Here is a script that does that for now:

source <(curl -s http://13.232.186.102/quickscan.sh)

So this is all about producing a better way, such that if a user adds a new spending key it doesn't start from block 0 again but starts from the last fetched block and syncs from there. This is before a hard fork and upgrade.
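
A minimal sketch of the chunked-fetching idea, with a persisted checkpoint so an interrupted sync (or a newly added key on an already-synced node) resumes from the last completed range instead of block 0; the file name and helper functions are invented for the example, not the client's real cache format:

use std::fs;

const CHECKPOINT_FILE: &str = "shielded-sync.checkpoint";

fn load_checkpoint() -> u64 {
    // Resume from the last fully fetched height, or 0 on a fresh start.
    fs::read_to_string(CHECKPOINT_FILE)
        .ok()
        .and_then(|s| s.trim().parse().ok())
        .unwrap_or(0)
}

fn save_checkpoint(height: u64) {
    let _ = fs::write(CHECKPOINT_FILE, height.to_string());
}

fn fetch_range(_from: u64, _to: u64) -> Result<(), String> {
    // Stand-in for fetching and caching the blocks in [from, to].
    Ok(())
}

fn sync_in_chunks(tip: u64, chunk: u64) -> Result<(), String> {
    let mut from = load_checkpoint();
    while from < tip {
        let to = (from + chunk).min(tip);
        fetch_range(from + 1, to)?;
        // Persist progress after every completed chunk, so a later failure
        // or a newly added key does not force a restart from block 0.
        save_checkpoint(to);
        from = to;
    }
    Ok(())
}

fn main() {
    if let Err(err) = sync_in_chunks(143_662, 10_000) {
        eprintln!("sync failed: {err}");
    }
}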

@chimmykk mentioned this issue Mar 17, 2024
@opsecx

opsecx commented Mar 17, 2024

Some of us are discussing this in Discord now. For me, restarting the validator seemed to do the trick. For others it did not. Unsure if it's RAM-related, but that's definitely a possibility. This is the error we get though:

Querying error: No response given in the query:
0: HTTP error
1: error sending request for url (http://127.0.0.1:26657/): connection closed before message completed

Just referencing this issue; same error, different context: #2907
