getLogs seems to be missing events when run at head #12078
Comments
Is this possibly a reorg issue?

Update: @ashwinphatak says the block hash is the same.
@ashwinphatak are you able to provide details of the actual call being made? "the block hash is the same"

The API gives the ability to specify a precise tipset with the

The behaviour is very different whether you're using

If it's not using a
Here is the equivalent curl request:
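The request body itself didn't survive the paste above. For reference, a minimal sketch of what a hash-pinned eth_getLogs JSON-RPC payload generally looks like (the hash and the helper name below are illustrative, not the actual request from this issue):

```python
import json

def get_logs_request(block_hash, address=None, topics=None, request_id=1):
    """Build an eth_getLogs JSON-RPC payload pinned to a single block hash.
    (Illustrative helper; `blockHash` is mutually exclusive with
    fromBlock/toBlock in the Ethereum JSON-RPC filter object.)"""
    flt = {"blockHash": block_hash}
    if address is not None:
        flt["address"] = address
    if topics is not None:
        flt["topics"] = topics
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "eth_getLogs",
        "params": [flt],
        "id": request_id,
    })

# Placeholder hash; POST this body to the node's Eth RPC endpoint with curl.
body = get_logs_request("0x" + "ab" * 32)
```

Pinning the filter to a block hash (rather than a block number) is what lets a client detect that a reorg replaced the block it thought it was querying.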
Just so I understand this correctly: are you saying that a query like the curl request above is only reliable after 900 epochs?

Edit: We index the events by the block hash returned by getLogs, so effectively, when processing, we're always using the block hash.
"HEAD-20 epochs is generally considered safe (this is also what WindowPoSt is based on)"

HEAD-20 is 10 minutes. Hard finality in Filecoin is HEAD-900, or 7.5 hours. But 10 minutes is also apparently statistically OK to use - @rvagg please confirm! (This will change when Fast Finality comes to Filecoin soon.)
Yes, so 900 is the point at which clients consider the chain strictly canonical and irreversible. It's the "I never want to even think about chain reorgs" option. But there's a gradient from there to 0.

I'm going to have to try and find someone better qualified who can give a firmer answer about the probabilistic gradient between 900 and 0, but it gets significantly less probable that you'll encounter a reorg the further back from head you go. Apparently some large users are going with 100 as "safe enough", and I think some might be going with 30 as an option. It's going to depend on what risk you're willing to deal with vs the recency you need.

Ultimately, unless you're going with 900+, you really should be writing your application (and this goes for almost any blockchain app) with the assumption that consensus is a bit squishy and that you may have to deal with head reorganisation. You can sample from the head, but you should probably go back at some point, re-sample that same spot, and make sure that what you have is canonical.

I was shown this today: https://docs.axelar.dev/learn/txduration#casper-friendly-finality-gadget-eg-eth-pos - Axelar's definition of finality across various chains. They're putting Filecoin at 100.

The F3 project, which is currently under heavy development and targeting deployment later this year, will radically change that. The FIP has a bunch of really interesting background on how Filecoin does finality that might be worth a read.

In other APIs, such as
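The lag options discussed in this thread can be summarised in a small helper (a sketch only; the constants reflect the numbers quoted above, and `safe_query_epoch` is an invented name):

```python
# Epoch offsets behind head discussed in this thread; pick per risk tolerance.
HARD_FINALITY = 900   # strictly canonical, "never think about reorgs"
CONSERVATIVE = 100    # the offset Axelar reportedly uses for Filecoin
MODERATE = 30         # FRC-0089 results suggest this is generally very safe
WINDOW_POST = 20      # ~10 minutes; what WindowPoSt is based on

def safe_query_epoch(head_epoch, lag=HARD_FINALITY):
    """Latest epoch to query if you want `lag` epochs of reorg protection."""
    return max(0, head_epoch - lag)
```

For example, with head at epoch 3976539 (the tipset from this issue), a hard-finality query would target epoch 3975639.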
Have a look at FRC-0089. The ECFC hasn't been implemented in lotus, so you can't use that directly, but results show 30 epochs is generally very safe, though how safe depends on system conditions. We'll have a paper soon with more extensive evaluation.
Using lotus, how can I check/confirm if block hash
@rvagg do you understand how completely unlike Ethereum this behavior is? We understand finality and reorg safety just fine. The problem is that Filecoin does a very poor job of pretending to be compatible when it's not. FEVM is acutely NOT EVM-compatible, given that it half-exposes its finality model, which is compatible with neither PoW nor PoS Ethereum.
I'm sorry folks, this is pretty insane: you provide an Ethereum blockhash, but the data it represents is mutable? How is that even possible? How can we pretend this is anything but a deep violation of system invariants? If what Rod is saying is true, how could that blockhash have been valid in both cases?
"I'm sorry folks this is pretty insane, you provide an Ethereum blockhash but the data it represents is mutable"

What do you mean here when you say that the "data represented by a blockhash is mutable"?

The blockhash pointed to by an epoch is mutable: the same epoch can point to a different block if there is a re-org, and so point to different events, which is what you are observing here. However, the data/events a blockhash points to do not get mutated.

The problem here is that you are fetching events by specifying a certain epoch and then indexing those events by the hash of the block. However, if the block at that epoch gets re-orged, you will see a different set of events for the same epoch.
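The distinction being drawn here can be made concrete with a toy model (all hashes and events below are invented): the epoch-to-block mapping can change under a reorg, while the block-to-events mapping never does.

```python
# Toy chain state (invented data): events are keyed by block hash, not epoch.
events_by_block = {
    "0xaaa": ["Transfer#1"],   # block initially canonical at epoch 100
    "0xbbb": ["Transfer#2"],   # block that replaces it after a reorg
}
canonical_block_at_epoch = {100: "0xaaa"}

def logs_at_epoch(epoch):
    """What a getLogs query by epoch number effectively resolves to."""
    return events_by_block[canonical_block_at_epoch[epoch]]

before = logs_at_epoch(100)              # ["Transfer#1"]
canonical_block_at_epoch[100] = "0xbbb"  # a reorg re-points the epoch
after = logs_at_epoch(100)               # ["Transfer#2"]: same epoch, new events
# The events behind each individual hash never changed:
assert events_by_block["0xaaa"] == ["Transfer#1"]
```

Querying by epoch is querying a mutable pointer; querying by block hash is querying immutable content.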
The issue is that we have a workaround that works for geth-based systems but doesn't work for Lotus. Basically, with geth, if you get a blockhash from a blocknumber and then immediately call getLogs for that number, you'll get the logs for the blockhash at that number. With Lotus, it appears that no logs are returned in that case, not because of a reorg, but because the logs have yet to be processed. This hypothesis fits the data I have at the moment better than "a reorg happened in the interval of time between the calls". If the logs were non-zero, then we could positively assert a reorg. But because we don't have a blockhash with the 0 result, we can't confirm a reorg; this is an issue with the Ethereum "spec".
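The workaround being described — resolve the number to a hash, fetch the logs, then re-check that the hash is unchanged — can be sketched like this (`FakeClient` and `logs_with_reorg_check` are illustrative names, not real Lotus or geth APIs):

```python
class FakeClient:
    """Stand-in for a JSON-RPC client (invented for illustration)."""
    def __init__(self, chain):
        self.chain = chain  # block_number -> (block_hash, logs)

    def block_hash(self, number):
        return self.chain[number][0]

    def get_logs(self, number):
        return self.chain[number][1]

def logs_with_reorg_check(client, number):
    """Fetch logs for a block number, but only trust them if the block's
    hash is unchanged after the fetch (the pattern described above)."""
    h_before = client.block_hash(number)
    logs = client.get_logs(number)
    h_after = client.block_hash(number)
    if h_before != h_after:
        raise RuntimeError("reorg during fetch; retry")
    return h_before, logs
```

The failure mode in this issue is that Lotus can return an empty log set even when the hash check passes, so an empty result is indistinguishable from a genuinely log-free block.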
#12078 (comment) takes this conversation in a confusing direction. Please positively assert to us that the block number we provided was reorg'ed, instead of everyone just assuming it was. We have other reorgs in our database that don't exhibit this behavior. Furthermore, we should also see the block with zero logs that was reorg'ed out. Thanks.

EDIT: There was not a fork at that block height:

Again, maybe Lotus reporting
It looks like
I think this is what @aarshkshah1992 is working on here: https://filecoinproject.slack.com/archives/CP50PPW2X/p1718124022999279
This is consistent with what the downstream processes (watchers) saw. The watcher code is fully reorg-aware. The first screenshot above already shows it found only one block at height 3976539. The following screenshot shows the reorgs it ran into and how it correctly pruned out the non-canonical blocks.
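A minimal sketch of the kind of pruning a reorg-aware watcher performs (illustrative only, not the actual watcher code): entries indexed by (height, hash) are dropped whenever their hash is no longer the canonical one at that height.

```python
def prune_non_canonical(indexed, canonical):
    """Drop indexed entries whose hash is no longer canonical at its height.

    `indexed` maps (height, block_hash) -> events;
    `canonical` maps height -> block_hash.
    Returns a new, pruned index. (Sketch; names are invented.)"""
    return {
        (height, bh): events
        for (height, bh), events in indexed.items()
        if canonical.get(height) == bh
    }
```

Indexing by (height, hash) rather than height alone is what makes this pruning possible: after a reorg, both the winning and losing block are still distinguishable in the index.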
Closed by #12080. Will go out in the next Lotus release. |
Has this fix made it into a release yet? If not, when is it expected? I believe this issue exists even when getLogs is called with a block hash. Can you confirm?
It didn't make it into 1.27.1, but we discussed a follow-up with a 1.27.2 to get this in. No ETA at the moment because there's so much nv23 activity focused on 1.28.0. And off the top of my head, I believe this deals with the block hash case too.
@ashwinphatak Yes, this fix will work for
Checklist
Lotus component
Lotus Version
Repro Steps
We don't yet have steps to reproduce. The issue showed up in long running downstream processes and only seems to happen once in a while.
Describe the Bug
So far we've found at least two instances where lotus didn't return matching events during a getLogs RPC call at head (https://filfox.info/en/tipset/3976539). The same calls returned data when made a couple of days later. These missing logs were only discovered in retrospect, because the downstream service started throwing errors when processing later events. The downstream service code remained unchanged during these tests.
This screenshot shows a db table during the original run, in which the block that was processed returned no matching events:
This screenshot shows the result of running the same getLogs call after a couple of days:
To confirm that this issue only occurs when getLogs is called at HEAD, we've left the service running with changes to stay 16 blocks behind head at all times. It's been running for more than two days so far without similar problems.
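The stay-N-blocks-behind workaround can be sketched as a small scheduling helper (illustrative; `next_target` is an invented name, and 16 is the lag used in the test above):

```python
def next_target(head, last_processed, lag=16):
    """Next block number to process while staying `lag` blocks behind head.
    Returns None when the consumer has caught up to head - lag.
    (Sketch of the workaround described above, not the actual service code.)"""
    target = head - lag
    if last_processed >= target:
        return None
    return last_processed + 1
```

By the time a block is `lag` epochs deep, it has had `lag` epochs' worth of chances to be reorged out, which is why the lagged service stopped seeing missing logs.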
Tooling
Configuration Options
run script: