
Fix(store-sync): Handle unsynced RPC nodes #2901

Open · wants to merge 1 commit into base: main
Conversation

dhvanipa
Contributor

@dhvanipa dhvanipa commented Jun 13, 2024

Context: https://indexsupply.com/shovel/docs/#unsynchronized-ethereum-nodes

  • Third-party Ethereum API providers may load-balance RPC requests across a set of unsynchronized nodes. In rare cases, an eth_getLogs request is routed to a node that doesn't yet have the latest block.
  • This issue has been observed multiple times on Redstone.

Solution: Batch-fetch the latest block number together with the logs, and ensure that the latest block number is at or past the toBlock. Otherwise, wait for the RPC node to catch up to the latest block.

  • This fix relies on the client making a batch call to the RPC, which clients need to enable in their viem HTTP transport.
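For reference, batching is an option on viem's HTTP transport. A minimal sketch of a client with batching enabled (the RPC URL is a placeholder, and this is client configuration rather than part of the PR itself):

```typescript
import { createPublicClient, http } from "viem";

// Batching must be enabled so that eth_getLogs and the latest-block-number
// request land in the same JSON-RPC batch, and therefore hit the same node.
const publicClient = createPublicClient({
  transport: http("https://rpc.example.com", { batch: true }),
});
```

Without `batch: true`, the two requests go out as separate HTTP calls and a load balancer may route them to different nodes, defeating the check.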

For the sync stack, the fromBlock comes from a local variable and the toBlock comes from a subscription to the latest block number. So we need an additional filter on the latest-block-number subscription so it does not emit when the toBlock would fall behind the fromBlock, which happens with unsynchronized RPCs.

  • There is currently a silent failure when this happens: when toBlock is less than fromBlock, no error is thrown and no logs are fetched, because this while loop resolves to false. So in this PR, we add a filter to not emit incorrect toBlocks, plus a safety check that throws if this case occurs.
    • Because the batch call only seems to fix the HTTP transport, I think this is the real issue causing incorrect logs with the sync stack (since the default is to use WS). This is in line with what we're seeing in the logs of the Biomes indexer pod, where the logs don't report 0 logs fetched, but the storageAdapter is never called.
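The two guards described above can be sketched as plain functions. The names `shouldEmitBlockNumber` and `assertValidRange` are hypothetical; the PR implements this logic inline in the RxJS pipeline and in fetchLogs:

```typescript
// Filter guard: don't emit a latest block number that would produce a
// toBlock behind what we've already processed (an unsynchronized RPC
// node can appear to go "backwards").
function shouldEmitBlockNumber(
  blockNumber: bigint,
  lastBlockNumberProcessed: bigint | null,
): boolean {
  return lastBlockNumberProcessed == null || blockNumber > lastBlockNumberProcessed;
}

// Safety check: a `while (fromBlock <= toBlock)` loop silently does
// nothing when toBlock < fromBlock, so throw instead of failing silently.
function assertValidRange(fromBlock: bigint, toBlock: bigint): void {
  if (toBlock < fromBlock) {
    throw new Error(`toBlock (${toBlock}) is behind fromBlock (${fromBlock})`);
  }
}
```

The filter prevents the bad range from being constructed at all; the assertion turns any remaining occurrence into a loud error rather than a silent no-op.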


changeset-bot bot commented Jun 13, 2024

⚠️ No Changeset found

Latest commit: 0aa7cc6

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


const latestBlock$ = createBlockStream({ publicClient, blockTag: followBlockTag }).pipe(shareReplay(1));
const latestBlockNumber$ = latestBlock$.pipe(
  map((block) => block.number),
  tap((blockNumber) => {
    debug("on block number", blockNumber, "for", followBlockTag, "block tag");
  }),
  filter((blockNumber) => {
    return lastBlockNumberProcessed == null || blockNumber > lastBlockNumberProcessed;
Contributor Author

unsure if we want to add a debug log here

@@ -90,7 +90,18 @@ export async function* fetchLogs<TAbiEvents extends readonly AbiEvent[]>({
     try {
       const toBlock = fromBlock + blockRange;
       debug("getting logs", { fromBlock, toBlock });
-      const logs = await publicClient.getLogs({ ...getLogsOpts, fromBlock, toBlock, strict: true });
+      const [latestBlockNumber, logs] = await Promise.all([
Contributor Author


This only works if batch is enabled in the HTTP transport. That's decided by clients though, so maybe this should go into the docs?

Member


I wonder if we should have an option to enable the "validate unsynchronized nodes" behavior, where we require batch to be enabled or, ideally, do the batch call here directly rather than relying on some client configuration.

Since this approach isn't useful unless the call is atomic (going to the same RPC node), and I suspect not all RPC providers support batching, it seems like we should put this behind a flag to avoid doubling RPC calls when syncing data from RPC. Could make this the default approach with a way to opt out of the new behavior.

+        publicClient.getLogs({ ...getLogsOpts, fromBlock, toBlock, strict: true }),
+      ]);
+      if (latestBlockNumber < toBlock) {
+        const blockTimeInSeconds = 2;
Contributor Author


Not sure what the best way to handle this is, i.e. should it be a deploy config? I don't think there's a way to detect the block time from the chosen chain?

Member

@holic holic Jun 20, 2024


we could use the client's pollingInterval here?

either way, we should probably align this with the other retry logic below
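A sketch of what waiting at the client's pollingInterval might look like, with the block-number fetcher injected so the wait logic stands alone. The names here are hypothetical illustrations, not the PR's code:

```typescript
// Poll until the RPC node reports a block number at or past toBlock,
// sleeping `pollingInterval` milliseconds between attempts. In practice
// `getBlockNumber` would be publicClient.getBlockNumber and
// `pollingInterval` would come from the viem client.
async function waitForBlock(
  getBlockNumber: () => Promise<bigint>,
  toBlock: bigint,
  pollingInterval: number,
): Promise<void> {
  while ((await getBlockNumber()) < toBlock) {
    await new Promise((resolve) => setTimeout(resolve, pollingInterval));
  }
}
```

This replaces the hardcoded `blockTimeInSeconds = 2` with a value the client already configures, and keeps the retry cadence consistent with the client's own polling.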
