Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Consider switching to panic="abort" #10874

Open
pepyakin opened this issue Feb 17, 2022 · 6 comments
Open

Consider switching to panic="abort" #10874

pepyakin opened this issue Feb 17, 2022 · 6 comments

Comments

@pepyakin
Copy link
Contributor

At the moment of writing, we have panic = "unwind" requirement for all substrate binaries.

This is a historical artifact of us having a native runtime. After paritytech/polkadot-sdk#62 and #10872 we migth remove the native runtime.

In that case we might be able to lift the panic = "unwind" requirement. Binaries with panic = "abort" should be smaller and more performant, although it's not clear on how much.

Also, since we had this requirement for so long, there is a chance that we relied on this property somewhere either here or in polkadot. Before jumping the gun we should check whether there are any such places.

@arkpar
Copy link
Member

arkpar commented Feb 17, 2022

I have not measured performance yet, but I recall that the performance hit was about 10% longer block import time when panic = "unwind" was turned on.
As for the binary size, substrate shrinks from 60Mb to 46Mb simply by switching to "abort" in current master.

Also, since we had this requirement for so long, there is a chance that we relied on this property somewhere either here or in polkadot. Before jumping the gun we should check whether there are any such places.

This should be as simple as searching for any custom panic handlers, right?

@pepyakin
Copy link
Contributor Author

This should be as simple as searching for any custom panic handlers, right?

Yes, likely, but I won't bet on that.

@koute
Copy link
Contributor

koute commented Feb 24, 2022

Hmm.... switching the whole binary to panic="abort" might be somewhat risky and might open us up to a nasty denial of service attack. It'd just take one out-of-bounds array access or a stray unwrap() to crash the whole node, and we need to remember that we do have, like, a thousand dependencies and our nodes are exposed to the network.

So I don't think this is necessarily a good idea to do globally due to the risks involved. That said, if we could maybe split a part of the executable into a separate process (e.g. run the WASM executor in a separate process?) which would be well isolated (especially from the network) and whose potential failure we could handle from the outside (e.g. quickly restart it in case of an abort) then I think it'd make sense to consider using panic="abort" over there.

@arkpar
Copy link
Member

arkpar commented Feb 24, 2022

Current panic handling looks like a bit of a mess currently. Main thread sets a custom panic handler here which aborts. Spawned tasks are split into "essential" that abort on panic, and "non-essentials" somewhat arbitrary. E.g. transaction pool is considered "essential" but libp2p networking and block proposer are not 🤔

@bkchr
Copy link
Member

bkchr commented Feb 24, 2022

Any panic outside of the runtime was already always an abort?

@arkpar
Copy link
Member

arkpar commented Feb 24, 2022

Any panic outside of the runtime was already always an abort?

I assumed so, but looking at the code this does not seem to be the case. Only "essential" tasks abort everything.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants