fix: chain: cancel long operations upon ctx cancelation #11206

AnomalRoil · 2023-08-24T15:05:23Z

Related Issues

Currently there are some operations that are relatively long winded and can be triggered through a remote RPC call and that will carry out even in case of a context cancellation.

Sadly, the ratelimiting done in proxy_fil.go is not enough since it doesn't take into account the fact that some operations can take many minutes to finish.

Proposed Changes

Given how Context are passed to all operations, a RPC call that is being cancelled by the client will cause a Context cancellation that trickles down to the lengthy operations, therefore checking for such context cancellation in the lengthy operations allows us to cancel early when we can.

Additional Info

Sadly it seems this area of the code is heavily untested and I could not make a test for it. I have however tested it locally as well as on a mainnet node running 1.23.3 and confirmed it did now cancel the operations upon cancelling the RPC calls.

Should we do the same for more operations? I noticed these two that are fairly expensive, but there might be other I did not spot?

A better fix might be to have a limit on the max amount of concurrent expensive operations we agree to do, instead of just ratelimiting the incoming requests, but this would require significant changes.

Checklist

Before you mark the PR ready for review, please make sure that:

Commits have a clear commit message.
PR title is in the form of of <PR type>: <area>: <change being made>
- example: fix: mempool: Introduce a cache for valid signatures
- PR type: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, test
- area, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps
New features have usage guidelines and / or documentation updates in
- Lotus Documentation
- Discussion Tutorials
Tests exist for new functionality or change in behavior
CI is green

arajasek

LGTM

visualbasic6 · 2023-08-26T01:23:41Z

I identified this issue. The reason I focused on an RPC API crash is because Filecoin pays $500k for them:

Or do they?

I was just offered $6k instead of $500k for this bug. It should be documented in this PR that the Filecoin team absolutely, without question, is trying to scam us - but they won't get away with it

and the issue halts block production against the target node, desyncing it, and any dapps/wallets/explorers/etc. associated with the node w/ API open (e.g. public API nodes which are CRITICAL for network functionality)

arajasek · 2023-08-28T17:32:59Z

CI failure has been fixed by #11212

parthshah1 · 2023-08-29T23:03:29Z

Hi @visualbasic6 ,
Thank you for your vulnerability submission. We understand that you are dissatisfied with the outcome, and in particular the significance of the vulnerability and specific bounty amount assigned by the Filecoin Security Team. While, as you know, that determination is at the sole discretion of the Filecoin Security Team, we hope that the following discussion helps you and the community understand why we believe that this was not a critical vulnerability.

After carefully reviewing the information provided, we determined that the attack vector provided does not affect block production. The public API endpoints mentioned, such as halting dapps, wallets, and the public JSON RPC API, will still not impact block production, as they do not participate in consensus or block production. The root cause of this issue will never lead to block production disruption unless the majority of the network block producers change their node default setting and mis-configure their JSON RPC API endpoints’ accessibility. It is recommended that Storage Providers participating in block production do not expose their JSON RPC API endpoints to the public. See Lotus documentation, Setting the Listen Address.

As you can see from the Lotus documentation, the listen address is bound to the local loopback interface by default. To open access to the miner API for other machines, you have to set this to the IP address of the network interface you want to use. You can also set it to 0.0.0.0 to allow all interfaces. This may affect security, as is also explained in the documentation. Moreover, API access is protected by JWT tokens, and under the Lotus documentation, it should not be open to the internet. Accordingly, after the Filecoin Security Team reviewed and discussed that situation internally, our assessment was that the impact of the reported vulnerability on network consensus and block production does not apply.

It's important to note that the bounty table clarifies the types of impacts we are interested in and how severity is categorized. However, not every vulnerability will necessarily result in such an impact. The proposed reward was for the computation issue which could lead to an out-of-memory (OOM) problem on a lotus node that has JSON RPC API endpoints exposed to the public. The main Lotus users that may be impacted by this bug are public API service providers or nodes that support applications. The bounty table states that a vulnerability is considered critical if it causes an "RPC API crash capable of impacting block production," which does not apply in this case.

Accordingly, the Filecoin Security Team has evaluated the significance of the disclosed vulnerability, determined it was low severity, and assigned a bounty pursuant to our policy.

The Filecoin Security team remains committed to working with the security community through the bug bounty programs. If you still disagree with our assessment, we would be happy to review and evaluate a further submission that shows an attack vector that would have an impact on block production. Otherwise, we are ready to process the bounty once you provide us with the necessary information asked via email.

visualbasic6 · 2023-08-29T23:05:30Z

You are deflecting.

There's no such thing as a block production node with RPC API exposed unless the operator exposes it haphazardly. It just doesn't exist. Here your position becomes that your bug bounty program doesn't make sense.

Public RPC APIs are CRITICALLY important to developers for any meaningful blockchain network, e.g. the ones listed in your official docs which are all vulnerable to this attack: https://docs.filecoin.io/networks/mainnet/rpcs/

It should also be mentioned that it DOES halt block production on the target node. Our submission satisfies ALL of criteria for the $500k bug. Suddenly pretending that public RPC API isn't important to get out of paying a bug bounty is questionable.

Also questionable is how little you've paid on the Immunefi platform despite boasting huge payouts:

and see: https://user-images.githubusercontent.com/38997186/263425194-4eed6ccf-0a9d-4dbb-8034-966f87049ba9.png

Release the funds or publicly admit to false advertising and siphoning exploits for free or near-free - even if they clearly, in no uncertain terms and by all comp-sci academic standards, qualify as $500k bugs in the Filecoin bug bounty program.

Immunefi recently moved RPC API crashes from Critical to High - but many programs, including yours, have it listed as Critical. And for good reason. You're pretending that this doesn't exist?

Sorry - you'll need to do better than this. This isn't going to fly.

fix: cancel long operations upon ctx cancelation

13ade62

AnomalRoil requested a review from a team as a code owner August 24, 2023 15:05

arajasek approved these changes Aug 24, 2023

View reviewed changes

magik6k approved these changes Aug 25, 2023

View reviewed changes

arajasek merged commit c4214e2 into filecoin-project:master Aug 28, 2023

visualbasic6 mentioned this pull request Aug 28, 2023

ci: Use larger executor for cli tests #11212

Merged

7 tasks

AnomalRoil deleted the fix/ctxonlongops branch August 30, 2023 07:23

simlecode mentioned this pull request Aug 31, 2023

fix: chain: cancel long operations upon ctx cancelation filecoin-project/venus#6144

Merged

6 tasks

Fatman13 mentioned this pull request Sep 4, 2023

[pm] product planning / 需求收集 ipfs-force-community/venus-core-devs#8

Closed

21 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: chain: cancel long operations upon ctx cancelation #11206

fix: chain: cancel long operations upon ctx cancelation #11206

AnomalRoil commented Aug 24, 2023

arajasek left a comment

visualbasic6 commented Aug 26, 2023 •

edited

Loading

arajasek commented Aug 28, 2023

parthshah1 commented Aug 29, 2023

visualbasic6 commented Aug 29, 2023 •

edited

Loading

fix: chain: cancel long operations upon ctx cancelation #11206

fix: chain: cancel long operations upon ctx cancelation #11206

Conversation

AnomalRoil commented Aug 24, 2023

Related Issues

Proposed Changes

Additional Info

Checklist

arajasek left a comment

Choose a reason for hiding this comment

visualbasic6 commented Aug 26, 2023 • edited Loading

arajasek commented Aug 28, 2023

parthshah1 commented Aug 29, 2023

visualbasic6 commented Aug 29, 2023 • edited Loading

visualbasic6 commented Aug 26, 2023 •

edited

Loading

visualbasic6 commented Aug 29, 2023 •

edited

Loading