-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: chain: cancel long operations upon ctx cancelation #11206
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I identified this issue. The reason I focused on an RPC API crash is because Filecoin pays $500k for them: Or do they? I was just offered $6k instead of $500k for this bug. It should be documented in this PR that the Filecoin team absolutely, without question, is trying to scam us - but they won't get away with it and the issue halts block production against the target node, desyncing it, and any dapps/wallets/explorers/etc. associated with the node w/ API open (e.g. public API nodes which are CRITICAL for network functionality) |
CI failure has been fixed by #11212 |
Hi @visualbasic6 , After carefully reviewing the information provided, we determined that the attack vector provided does not affect block production. The public API endpoints mentioned, such as halting dapps, wallets, and the public JSON RPC API, will still not impact block production, as they do not participate in consensus or block production. The root cause of this issue will never lead to block production disruption unless the majority of the network block producers change their node default setting and mis-configure their JSON RPC API endpoints’ accessibility. It is recommended that Storage Providers participating in block production do not expose their JSON RPC API endpoints to the public. See Lotus documentation, Setting the Listen Address. As you can see from the Lotus documentation, the listen address is bound to the local loopback interface by default. To open access to the miner API for other machines, you have to set this to the IP address of the network interface you want to use. You can also set it to 0.0.0.0 to allow all interfaces. This may affect security, as is also explained in the documentation. Moreover, API access is protected by JWT tokens, and under the Lotus documentation, it should not be open to the internet. Accordingly, after the Filecoin Security Team reviewed and discussed that situation internally, our assessment was that the impact of the reported vulnerability on network consensus and block production does not apply. It's important to note that the bounty table clarifies the types of impacts we are interested in and how severity is categorized. However, not every vulnerability will necessarily result in such an impact. The proposed reward was for the computation issue which could lead to an out-of-memory (OOM) problem on a lotus node that has JSON RPC API endpoints exposed to the public. The main Lotus users that may be impacted by this bug are public API service providers or nodes that support applications. The bounty table states that a vulnerability is considered critical if it causes an "RPC API crash capable of impacting block production," which does not apply in this case. Accordingly, the Filecoin Security Team has evaluated the significance of the disclosed vulnerability, determined it was low severity, and assigned a bounty pursuant to our policy. The Filecoin Security team remains committed to working with the security community through the bug bounty programs. If you still disagree with our assessment, we would be happy to review and evaluate a further submission that shows an attack vector that would have an impact on block production. Otherwise, we are ready to process the bounty once you provide us with the necessary information asked via email. |
You are deflecting. There's no such thing as a block production node with RPC API exposed unless the operator exposes it haphazardly. It just doesn't exist. Here your position becomes that your bug bounty program doesn't make sense. Public RPC APIs are CRITICALLY important to developers for any meaningful blockchain network, e.g. the ones listed in your official docs which are all vulnerable to this attack: https://docs.filecoin.io/networks/mainnet/rpcs/ It should also be mentioned that it DOES halt block production on the target node. Our submission satisfies ALL of criteria for the $500k bug. Suddenly pretending that public RPC API isn't important to get out of paying a bug bounty is questionable. Also questionable is how little you've paid on the Immunefi platform despite boasting huge payouts: Release the funds or publicly admit to false advertising and siphoning exploits for free or near-free - even if they clearly, in no uncertain terms and by all comp-sci academic standards, qualify as $500k bugs in the Filecoin bug bounty program. Immunefi recently moved RPC API crashes from Critical to High - but many programs, including yours, have it listed as Critical. And for good reason. You're pretending that this doesn't exist? Sorry - you'll need to do better than this. This isn't going to fly. |
Related Issues
Currently there are some operations that are relatively long winded and can be triggered through a remote RPC call and that will carry out even in case of a context cancellation.
Sadly, the ratelimiting done in
proxy_fil.go
is not enough since it doesn't take into account the fact that some operations can take many minutes to finish.Proposed Changes
Given how Context are passed to all operations, a RPC call that is being cancelled by the client will cause a Context cancellation that trickles down to the lengthy operations, therefore checking for such context cancellation in the lengthy operations allows us to cancel early when we can.
Additional Info
Sadly it seems this area of the code is heavily untested and I could not make a test for it. I have however tested it locally as well as on a mainnet node running 1.23.3 and confirmed it did now cancel the operations upon cancelling the RPC calls.
Should we do the same for more operations? I noticed these two that are fairly expensive, but there might be other I did not spot?
A better fix might be to have a limit on the max amount of concurrent expensive operations we agree to do, instead of just ratelimiting the incoming requests, but this would require significant changes.
Checklist
Before you mark the PR ready for review, please make sure that:
<PR type>: <area>: <change being made>
fix: mempool: Introduce a cache for valid signatures
PR type
: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, testarea
, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps