getLogs seems to be missing events when run at head #12078
Comments
Is this possibly a reorg issue?

Update: @ashwinphatak says the block hash is the same.
@ashwinphatak are you able to provide details of the actual call being made? "the block hash is the same"

The API gives the ability to specify a precise tipset with the

The behaviour is very different whether you're using

If it's not using a
Here is the equivalent curl request:
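The request body itself didn't survive the paste above. For reference, a minimal sketch of what a hash-pinned eth_getLogs JSON-RPC payload generally looks like (the hash and the helper name below are illustrative, not the actual request from this issue):

```python
import json

def get_logs_request(block_hash, address=None, topics=None, request_id=1):
    """Build an eth_getLogs JSON-RPC payload pinned to a single block hash.
    (Illustrative helper; `blockHash` is mutually exclusive with
    fromBlock/toBlock in the Ethereum JSON-RPC filter object.)"""
    flt = {"blockHash": block_hash}
    if address is not None:
        flt["address"] = address
    if topics is not None:
        flt["topics"] = topics
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "eth_getLogs",
        "params": [flt],
        "id": request_id,
    })

# Placeholder hash; POST this body to the node's Eth RPC endpoint with curl.
body = get_logs_request("0x" + "ab" * 32)
```

Pinning the filter to a block hash (rather than a block number) is what lets a client detect that a reorg replaced the block it thought it was querying.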
Just so I understand this correctly: are you saying that a query like the curl request above is only reliable after 900 epochs?

Edit: We index the events by the block hash returned by getLogs, so effectively, when processing, we're always using the block hash.
"HEAD-20 epochs is generally considered safe (this is also what WindowPoSt is based on)"

HEAD-20 is 10 minutes. Hard finality in Filecoin is HEAD-900, or 7.5 hours. But 10 minutes is also apparently statistically OK to use - @rvagg please confirm! (This will change when Fast Finality comes to Filecoin soon.)
Yes, so 900 is the point at which clients consider the chain strictly canonical and irreversible. It's the "I never want to even think about chain reorgs" option. But there's a gradient from there to 0.

I'm going to have to try and find someone better qualified who can give a firmer answer about the probabilistic gradient between 900 and 0, but it gets significantly less probable that you'll encounter a reorg the further back from head you go. Apparently some large users are going with 100 as "safe enough", and I think some might be going with 30 as an option. It's going to depend on what risk you're willing to deal with vs the recency you need.

Ultimately, unless you're going with 900+, you really should be writing your application (and this goes for almost any blockchain app) with the assumption that consensus is a bit squishy and that you may have to deal with head reorganisation. You can sample from the head, but you should probably go back at some point, re-sample that same spot, and make sure that what you have is canonical.

I was shown this today: https://docs.axelar.dev/learn/txduration#casper-friendly-finality-gadget-eg-eth-pos - Axelar's definition of finality across various chains. They're putting Filecoin at 100.

The F3 project, which is currently under heavy development and targeting deployment later this year, will radically change that. The FIP has a bunch of really interesting background on how Filecoin does finality that might be worth a read.

In other APIs, such as
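The lag options discussed in this thread can be summarised in a small helper (a sketch only; the constants reflect the numbers quoted above, and `safe_query_epoch` is an invented name):

```python
# Epoch offsets behind head discussed in this thread; pick per risk tolerance.
HARD_FINALITY = 900   # strictly canonical, "never think about reorgs"
CONSERVATIVE = 100    # the offset Axelar reportedly uses for Filecoin
MODERATE = 30         # FRC-0089 results suggest this is generally very safe
WINDOW_POST = 20      # ~10 minutes; what WindowPoSt is based on

def safe_query_epoch(head_epoch, lag=HARD_FINALITY):
    """Latest epoch to query if you want `lag` epochs of reorg protection."""
    return max(0, head_epoch - lag)
```

For example, with head at epoch 3976539 (the tipset from this issue), a hard-finality query would target epoch 3975639.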
Have a look at FRC-0089. The ECFC hasn't been implemented in lotus, so you can't use that directly, but results show 30 epochs is generally very safe, though how safe depends on system conditions. We'll have a paper soon with more extensive evaluation.
Using lotus, how can I check/confirm if block hash
@rvagg do you understand how completely unlike Ethereum this behavior is? We understand finality and reorg safety just fine. The problem is that Filecoin does a very poor job of pretending to be compatible when it's not. FEVM is acutely NOT EVM-compatible, given that it half-exposes its finality model, which is compatible with neither PoW nor PoS Ethereum.
I'm sorry folks, this is pretty insane: you provide an Ethereum blockhash, but the data it represents is mutable? How is that even possible? How can we pretend this is anything but a deep violation of system invariants? If what Rod is saying is true, how could that blockhash have been valid in both cases?
"I'm sorry folks this is pretty insane, you provide an Ethereum blockhash but the data it represents is mutable"

What do you mean here when you say that the "data represented by a blockhash is mutable"?

The blockhash pointed to by an epoch is mutable: the same epoch can point to a different block if there is a re-org, and so point to different events, which is what you are observing here. However, the data/events a blockhash points to do not get mutated.

The problem here is that you are fetching events by specifying a certain epoch and then indexing those events by the hash of the block. However, if the block at that epoch gets re-orged, you will see a different set of events for the same epoch.
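The distinction being drawn here can be made concrete with a toy model (all hashes and events below are invented): the epoch-to-block mapping can change under a reorg, while the block-to-events mapping never does.

```python
# Toy chain state (invented data): events are keyed by block hash, not epoch.
events_by_block = {
    "0xaaa": ["Transfer#1"],   # block initially canonical at epoch 100
    "0xbbb": ["Transfer#2"],   # block that replaces it after a reorg
}
canonical_block_at_epoch = {100: "0xaaa"}

def logs_at_epoch(epoch):
    """What a getLogs query by epoch number effectively resolves to."""
    return events_by_block[canonical_block_at_epoch[epoch]]

before = logs_at_epoch(100)              # ["Transfer#1"]
canonical_block_at_epoch[100] = "0xbbb"  # a reorg re-points the epoch
after = logs_at_epoch(100)               # ["Transfer#2"]: same epoch, new events
# The events behind each individual hash never changed:
assert events_by_block["0xaaa"] == ["Transfer#1"]
```

Querying by epoch is querying a mutable pointer; querying by block hash is querying immutable content.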
The issue is that we have a workaround that works for geth-based systems but doesn't work for Lotus. Basically, with geth, if you get a blockhash from a blocknumber and then immediately call getLogs for that number, you'll get the logs for the blockhash at that number. With Lotus, it appears that no logs are returned in that case, not because of a reorg, but because the logs have yet to be processed. This hypothesis fits the data I have at the moment better than "a reorg happened in the interval of time between the calls". If the logs were non-zero, then we could positively assert a reorg. But because we don't have a blockhash with the 0 result, we can't confirm a reorg; this is an issue with the Ethereum "spec".
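The workaround being described — resolve the number to a hash, fetch the logs, then re-check that the hash is unchanged — can be sketched like this (`FakeClient` and `logs_with_reorg_check` are illustrative names, not real Lotus or geth APIs):

```python
class FakeClient:
    """Stand-in for a JSON-RPC client (invented for illustration)."""
    def __init__(self, chain):
        self.chain = chain  # block_number -> (block_hash, logs)

    def block_hash(self, number):
        return self.chain[number][0]

    def get_logs(self, number):
        return self.chain[number][1]

def logs_with_reorg_check(client, number):
    """Fetch logs for a block number, but only trust them if the block's
    hash is unchanged after the fetch (the pattern described above)."""
    h_before = client.block_hash(number)
    logs = client.get_logs(number)
    h_after = client.block_hash(number)
    if h_before != h_after:
        raise RuntimeError("reorg during fetch; retry")
    return h_before, logs
```

The failure mode in this issue is that Lotus can return an empty log set even when the hash check passes, so an empty result is indistinguishable from a genuinely log-free block.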
#12078 (comment) takes this conversation in a confusing direction. Please positively assert to us that the block number we provided was reorg'ed, instead of everyone just assuming it was. We have other reorgs in our database that don't exhibit this behavior. Furthermore, we should also see the block with zero logs that was reorg'ed out. Thanks.

EDIT: There was not a fork at that block height:

Again, maybe Lotus reporting
It looks like
I think this is what @aarshkshah1992 is working on here: https://filecoinproject.slack.com/archives/CP50PPW2X/p1718124022999279
This is consistent with what the downstream processes (watchers) saw. The watcher code is fully reorg-aware. The first screenshot above already shows it found only one block at height 3976539. The following screenshot shows the reorgs it ran into and how it correctly pruned out the non-canonical blocks.
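A minimal sketch of the kind of pruning a reorg-aware watcher performs (illustrative only, not the actual watcher code): entries indexed by (height, hash) are dropped whenever their hash is no longer the canonical one at that height.

```python
def prune_non_canonical(indexed, canonical):
    """Drop indexed entries whose hash is no longer canonical at its height.

    `indexed` maps (height, block_hash) -> events;
    `canonical` maps height -> block_hash.
    Returns a new, pruned index. (Sketch; names are invented.)"""
    return {
        (height, bh): events
        for (height, bh), events in indexed.items()
        if canonical.get(height) == bh
    }
```

Indexing by (height, hash) rather than height alone is what makes this pruning possible: after a reorg, both the winning and losing block are still distinguishable in the index.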
Closed by #12080. Will go out in the next Lotus release. |
Has this fix made it into a release yet? If not, when is it expected? I believe this issue exists even when getLogs is called with a block hash. Can you confirm?
It didn't make it into 1.27.1, but we discussed a follow-up with a 1.27.2 to get this in. No ETA at the moment because there's so much nv23 activity focused on 1.28.0. And off the top of my head, I believe this deals with the block hash case too.
@ashwinphatak Yes, this fix will work for
Checklist
Lotus component
Lotus Version
Repro Steps
We don't yet have steps to reproduce. The issue showed up in long running downstream processes and only seems to happen once in a while.
Describe the Bug
So far we've found at least two instances where lotus didn't return matching events during a getLogs RPC call at head (https://filfox.info/en/tipset/3976539). The same calls returned data when made a couple of days later. These missing logs were only discovered in retrospect, because the downstream service started throwing errors when processing later events. The downstream service code remained unchanged during these tests.
This screenshot shows a db table during the original run, in which the block that was processed returned no matching events:
This screenshot shows the result of running the same getLogs call after a couple of days:
To confirm that this issue only occurs when getLogs is called at HEAD, we've left the service running with changes to stay 16 blocks behind head at all times. It's been running for more than two days so far without similar problems.
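The stay-N-blocks-behind workaround can be sketched as a small scheduling helper (illustrative; `next_target` is an invented name, and 16 is the lag used in the test above):

```python
def next_target(head, last_processed, lag=16):
    """Next block number to process while staying `lag` blocks behind head.
    Returns None when the consumer has caught up to head - lag.
    (Sketch of the workaround described above, not the actual service code.)"""
    target = head - lag
    if last_processed >= target:
        return None
    return last_processed + 1
```

By the time a block is `lag` epochs deep, it has had `lag` epochs' worth of chances to be reorged out, which is why the lagged service stopped seeing missing logs.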
Tooling
Configuration Options
run script: