Use wasm start-function linking and __cxa_atexit #25

sunfishcode · 2017-12-03T15:05:53Z

.init_array/.fini_array and .ctors/.dtors have a history of being exploitable. For example, "Abusing .CTORS and .DTORS For FUN and PROFIT", and other examples are easy to find. Modern ELF systems have mitigated it with RELRO which makes these sections read-only. However, WebAssembly doesn't support read-only memory. This may change in the future (or perhaps we could use a separate linear memory space), though at present there are no proposals. If we implement traditional .init_array/.fini_array support now, we'd be opening up an attack vector.

WebAssembly would be less vulnerable than traditional ELF sysystems without RELRO, because wasm's indirect calls can only call into defined function entry points, and type signatures have to match, however theoretical exploits are still possible.

I propose to avoid .init_array and .fini_array, and instead:

use wasm start functions for initialization code, and add support for linking them, rather than use .init_array, as suggested here.
lower destructor code, such as __attribute__((destructor)), into functions registered with __cxa_atext by initializers. I've implemented this in LLVM here.

To be sure, with current __cxa_atexit implementations, function pointers are still stored in writeable linear memory, so the problem already exists. However, .init_array/.fini_array are dense arrays of pointers, making them easier to hit, and they're more likely to live at a predictable address.

__cxa_atexit is also generally more robust than .fini_array because

If cleanups are registered dynamically while global ctors are being run, it properly orders all the cleanups, including the dynamically registered ones, in the reverse order.
If the process exits before all the global constructors have run, it doesn't run destructors that haven't been registered yet.

This approach also doesn't preclude implementing .init_array and .fini_array in the future. We could always add support for .fini_array in the tools without breaking the ABI.

The main downside of this approach is that it's different from how other platforms work, and would require more adaptation when porting libc and other low-level tools. However, I believe it's worth the effort.

The text was updated successfully, but these errors were encountered:

dschuff · 2017-12-04T18:12:17Z

I'm a little ambivalent about this.
I guess we have 2 separate proposals: to have the linker synthesize a start function, containing calls to each of the start functions of the object files; and to use __cxa_atexit or equivalent to call destructors (instead of a mechanism that mirrors init).

First, destruction because to me that's a simpler argument.
In favor of .fini_array:

Symmetric with .init_array
(not so much in favor of fini_array but against the proposed alternative): As written in the patch it requires the program to link __cxa_atexit (and have it use the C calling convention) even in C or another non-C++ language. Or failing that, the language/runtime author has to roll their own equivalent.

In favor of the alternative:

Shutdown ordering works correctly out of the box
Since wasm does have first-class construction support but not destruction, it makes some sense to expose that asymmetry.

On balance I think it makes sense to go with the alternative for now but I don't have a strong opinion.

Construction is more interesting:
First, the argument in favor of .init_array:

Consistency with ELF, ease of porting system/language runtimes
It's "declarative" in the sense that neither the linker nor other tools need to synthesize or understand any code, nor even to move code around or create code sections (this is one of the advantages .init_array has over the older .init scheme). All the startup and shutdown code is under the control of the system library author. I think this is an advantage that we haven't considered in the discussion so far.

In favor of the alternative:

The security mitigation.
Constructors do not need to be address-taken, reducing the size of the wasm table.

The security issue is an interesting one. It's basically in the general class of "don't have read-write function pointers anywhere it can be helped", and specifically "don't have read-write init function pointers." (and possibly fini function pointers, in the case of .fini_array). In general this is a great idea. I'm a little skeptical about how much protecting the init functions buys us though. The init functions are called before any user code is run, and never again (and if the attacker controls the contents of the binary as in the linked example, then it's already over anyway). So modifying them once the program has started doesn't matter. (We can protect the original startup function that calls them by not exporting it or taking its address at all). There are of course the fini functions; although if those are handled using atexit then pointers to them are stored in some library data structure. So in this particular case I'm not sure I buy the security argument.

Making the constructors not address-taken might be useful though?
To me it kind of comes down to how we feel about making the linker generate code. In abstract principle I think linkers shouldn't be required to have that sort of intelligence, but 1) wasm is a bit different in that regard and we will probably end up having some of that in LLD anyway, and 2) there's even precedent for linker-generated code in ELF where the benefit is big enough (e.g. TLS model reduction).

dschuff · 2017-12-04T18:18:14Z

I guess one other advantage of synthesizing a start function would be that if we do LTO, we could inline the init functions into it (and move it to the front of the code section). That seems like a win. And I guess a smart VM could also take advantage by e.g. not running the slow optimized codegen or something.

dschuff · 2017-12-04T18:21:03Z

And while I still don't really buy the security advantage of protecting the init process itself, keeping the init functions out of the table is probably still good as a means of reducing the amount of code that an attacker could cause to be called in some other context.

Ok, so... maybe I'm sold.

kripken · 2017-12-04T18:22:55Z

Any concerns about the reentrancy issues with the start function, that forced us to introduce __post_instantiate for dynamic linking?

sbc100 · 2017-12-04T18:26:54Z

I think we should consider separately whether or not to use the wasm start function. How this function is called can be a different discussion than whether or not this start function is linker synthesized.

sunfishcode · 2017-12-04T18:28:17Z

I agree with @sbc100; if it turns out that we can't use the wasm start function, we can use the __post_instantiate convention or so instead, and still do the rest of what's proposed here.

See also WebAssembly/design#1160 for another report of this issue.

sunfishcode · 2017-12-05T16:48:17Z

One issue is that plain start-function linking doesn't have an obvious way to implement init function priorities.

It could be implemented with linker metadata. It would consist of a list of (function index, priority) pairs. This isn't quite as pretty, but it does retain most of the advantages discussed above.

Optionally, it might be nice to continue to use the wasm start function, as a special case for priority 65535, which is the default priority for C++ constructors and __attribute__((constructor)), so it'd be what most user code uses.

dschuff · 2017-12-07T01:05:23Z

Related: The new data-segment initialization in the current threads proposal (see specifically https://github.com/WebAssembly/threads/blob/master/proposals/threads/ConditionalSegmentInitialization.md and the discussion at WebAssembly/threads#62) switches data segment initialization from being done implicitly by the VM to being done explicitly, in the start function. This means we'll have to at least have some initialization function that's called during instantiation (i.e. from the start function), at least for threaded code. Maybe we want to just have the same mechanism for that, as for constructors.

sunfishcode · 2017-12-07T14:16:58Z

https://reviews.llvm.org/D40759 is now updated with a patch implementing the LLVM side of both the ctor and dtor parts of this proposal, including support for init priorities.

Summary: This change lays the groundwork lowering of @llvm.global_ctors and @llvm.global_dtors for the wasm object format. Some parts of this patch are subset of: https://reviews.llvm.org/D40759 See WebAssembly/tool-conventions#25 Subscribers: jfb, dschuff, jgravelle-google, aheejin, sunfish Differential Revision: https://reviews.llvm.org/D41208 llvm-svn=320742

Summary: This change lays the groundwork lowering of @llvm.global_ctors and @llvm.global_dtors for the wasm object format. Some parts of this patch are subset of: https://reviews.llvm.org/D40759 See WebAssembly/tool-conventions#25 Subscribers: jfb, dschuff, jgravelle-google, aheejin, sunfish Differential Revision: https://reviews.llvm.org/D41208 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320742 91177308-0d34-0410-b5e6-96231b3b80d8

Summary: - lowers @llvm.global_dtors by adding @llvm.global_ctors functions which register the destructors with `__cxa_atexit`. - impements @llvm.global_ctors with wasm start functions and linker metadata See [here](WebAssembly/tool-conventions#25) for more background. Subscribers: jfb, dschuff, mgorny, jgravelle-google, aheejin, sunfish Differential Revision: https://reviews.llvm.org/D41211 llvm-svn=320774

Summary: - lowers @llvm.global_dtors by adding @llvm.global_ctors functions which register the destructors with `__cxa_atexit`. - impements @llvm.global_ctors with wasm start functions and linker metadata See [here](WebAssembly/tool-conventions#25) for more background. Subscribers: jfb, dschuff, mgorny, jgravelle-google, aheejin, sunfish Differential Revision: https://reviews.llvm.org/D41211 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320774 91177308-0d34-0410-b5e6-96231b3b80d8

This change create a new synthetic function in the final output binary which calls the static constructors in sequence. See: WebAssembly/tool-conventions#25 Differential Revision: https://reviews.llvm.org/D41893 llvm-svn=322388

kripken mentioned this issue Dec 5, 2017

__ATINIT__ defined in JS emscripten-core/emscripten#5841

Closed

sunfishcode added the Linking label Dec 7, 2017

sbc100 closed this as completed Jan 17, 2018

daboross mentioned this issue Jan 30, 2019

Support static constructors (ctors) rustwasm/wasm-bindgen#1216

Open

theidiotmachine mentioned this issue Feb 9, 2020

Error in WASM_INIT_FUNCS docs #137

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use wasm start-function linking and __cxa_atexit #25

Use wasm start-function linking and __cxa_atexit #25

sunfishcode commented Dec 3, 2017 •

edited

Loading

dschuff commented Dec 4, 2017 •

edited

Loading

dschuff commented Dec 4, 2017

dschuff commented Dec 4, 2017

kripken commented Dec 4, 2017

sbc100 commented Dec 4, 2017

sunfishcode commented Dec 4, 2017

sunfishcode commented Dec 5, 2017

dschuff commented Dec 7, 2017

sunfishcode commented Dec 7, 2017

Use wasm start-function linking and __cxa_atexit #25

Use wasm start-function linking and __cxa_atexit #25

Comments

sunfishcode commented Dec 3, 2017 • edited Loading

dschuff commented Dec 4, 2017 • edited Loading

dschuff commented Dec 4, 2017

dschuff commented Dec 4, 2017

kripken commented Dec 4, 2017

sbc100 commented Dec 4, 2017

sunfishcode commented Dec 4, 2017

sunfishcode commented Dec 5, 2017

dschuff commented Dec 7, 2017

sunfishcode commented Dec 7, 2017

sunfishcode commented Dec 3, 2017 •

edited

Loading

dschuff commented Dec 4, 2017 •

edited

Loading