-
Notifications
You must be signed in to change notification settings - Fork 261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Roadmap to determinism in WASI #190
Comments
Looks good to me! Should we also address potential races with other processes and/or threads or just leave that unaddressed as out-of-scope-for-now? |
Here are a few more ideas: Filesystems:
Other:
|
cookies and fds could be fixed by using anyref, right? |
Yeah. To handle cookies with anyref we'd probably need to reorganize the |
@kubkon I probably meant a method which implied using multiple threads, but are such methods in scope for now? If we have multiple threads, then poll could be leveraged to get a safe race condition (i.e. a race condition which doesn't lead to undefined behavior), but this required multiple threads. There are other places where one could get a safe race conditions, e.g. IMO, there are other, more fundamental ways of achieving non-determinism in a single-threaded WASI program: (I'm not sure if the runtime takes any special care about any of these issues)
As for clocks, we could probably expose a logical clock, where every query would increase the time by (say) 1 second. This may be useful for programs which use clocks in some minor way. As for randomness, disabling it completely may be too heavy-handed an approach. Some applications require pseudo-randomness, in particular some numerical simulations or Monte Carlo methods. Some simple approach would be to allow passing a seed value for runtime and use a seeded, thread-specific deterministic rng with the seed provided. |
Some of these items are not issues in wasm:
|
|
The issue of filenames is so incredibly complex that I wonder if it is even possible in the first place without exposing some details of the OS to the software, or if the only solution that would give complete determinism is to expose a virtual filesystem inside of an image. Take case mapping for example. I'm fairly sure that both Apple's and Microsoft's implementations of it would allow certain filename pairs that the other wouldn't. A cross-platform set of rules which doesn't allow anything which would be forbidden on any major platform might work for a while, but in the case of Windows the case mapping isn't stable, and is subject to change with updates. After such an update, it would be possible for OS-specific behaviour to suddenly become visible. |
Yes, the only possibility to make this deterministic is by using a virtual FS layer for WASI that does not even have text as its paths but an array of numeric VFS node identifiers.
So the safest strategy would be to go numeric. Have a virtual node Note that this is just all a maximum caution suggestion post. Whatever people may come up with here may look very differently. |
@Evrey Even then, normal OS paths would have to be read-only, to prevent experimentation with file naming rules. My thoughts on this would be that it would make sense to have two tiers of filesystem access. A virtual but completely deterministic filesystem, which could be used for things like reproducible builds, and nondeterministic access to the real filesystem. |
We could make a capability that allows this two-tiered access - when used, filename validation would be skipped by the runtime so that any filename would be allowed (as long as the underlying os allows it) |
This thread exposed a lot more potential problems than I expected, some of which could potentially also impact portability IMHO, so I'm glad I started this discussion. Kudos to everyone for active participation and ideas, this is highly appreciated! :-) In terms of the problem of nondeterminism, I was wondering, if we were to constrain the problem space somewhat more to the application of simple compute functions (I guess you could think AWS Lambda or similar) and worry only about deterministic result of the computation, then I figured we potentially wouldn't have to touch so many avenues where everything might go wrong. In essence, since one of the cornerstones of WASI is preopens, I was wondering if we could toy with the idea of "preopening" any sort of a byte stream which then would get assigned an Here, such a byte stream could be anything from a preopened and preloaded file from host to a (named) pipe to a socket. Now that I mention all this, I believe someone already mentioned something similar in the past. @sunfishcode was it you or perhaps @pchickey? In any case, IMHO this kind of abstraction would lend itself nicely to the problem of edge/serverless computing, or in fact, any environment where we'd like to run a MapReduce style problems in a distributed fashion (deterministic or not). In any case, I hope this makes sense, if not, please shout out. I'd love to hear your thoughts on this! |
@kubkon Not entirely by accident, this aligns well with the design principles I'm proposing, specifically the parts about shared filesystem views :-). |
@sunfishcode Oh oops, my apologies! Somehow, I forgot about this completely! Anyhow, this is excellent and indeed aligns itself really well the your proposal in #192. Very exciting! :-) |
Out of curiosity and in preparation for my talk about determinism in WASI in distributed settings at FOSDEM2020, I've started tinkering a little bit with @sunfishcode I'd love your input on this and how this could perhaps be a starting point for discussions towards facilitating (even as a draft) your proposal on the design principles! |
Perhaps |
On the topic of cryptographic uses of randomness versus determinism, I filed WebAssembly/wasi-crypto#30 to investigate whether wasi-crypto is deterministic. It's interesting because it allows applications to do crypto without interacting with private key data directly. Beyond that, thinking about randomness APIs, I think we can make two top-level categories that we care about:
|
I think this discussion would be helped by incorporating the interplay between determinism and reproducibility. Determinism is not something that is desirable in most deployments, as it can lead to DoS attacks and screws over crypto in ways that may be difficult to fix. If a programmer wants to introduce non-determinism, they could exploit side channels or simply engage in feature detection. Usually when someone says they want determinism, what they really want is reproducibility. A CPU or bytecode should be fully deterministic, but a runtime should be reproducible. For example, if you need a seed for hash tables that is externally difficult to guess but not cryptographically secure, then an I would still lockout any non-deterministic APIs. So a real |
@indolering Deterministic math (arithmetic, trigonometry, and random) is important in game applications. Synchronizing PRNG for accurate real-time simulation via entity replication over network (multi-player) is often one use case. The question isn't so much whether one or the other should be default, rather how interfaces for both could become standard. That's because it's a lot of work to make the math deterministic AND cross-platform. Certainly in the case of cryptographic security, you would not use the deterministic functions (ie. a game application would use separate RNG functions for each case.) |
The discussion about enforcing (ensuring?) determinism in WASI has already been started and touched upon in a couple of issues here and there (#185, #118, bytecodealliance/wasmtime#748, if I missed any, please feel free to mention it in this thread). I'd like to gather all the knowledge, ideas, perceived issues, etc. here creating essentially a meta-issue that we could use to track this, and come up with solutions, or at least guidance as to what direction to take.
I'll try and describe all potential sources of nondeterminism below leaving out sockets for now though. Feel free to correct me, add more, etc.
Randomness and entropy
This is an obvious one, and from what I understand, the current consensus is to have it require a capability (see #185 and bytecodealliance/wasmtime#748 for more details).
random_get
also will get its own module in the upcoming WASI snapshot: wasi_ephemeral_random.witx.Clocks
Access to system/thread/process clocks will also lead to nondeterminism, and as far as I understand, like in the randomness case, the consensus is to have it require a capability (see #118 and bytecodealliance/wasmtime#748 for more details). Also as in the randomness case,
clock_time_get
will get its own module in the upcoming WASI snapshost: wasi_ephemeral_clock.witx.File access/modification/change times
This one concerns four WASI syscalls that may introduce nondeterminism into the picture, namely:
fd_filestat_get
,fd_filestat_set_times
,path_filestat_get
, andpath_filestat_set_times
. The nondeterminism may sneak in if the client app makes use in some way of the access (atim
), modification (mtim
) or change (ctim
) times of a file descriptor which can change between any two runs of the app.I'm not sure what the best approach to handle this would be, so I'd like to start some brain storming on this. Could we only perhaps populate the filestat time values if a clock capability was requested/provided? Or introduce a different type of capability?
Readdir
This one is potentially of lesser importance/impact since, I assume that on the same host,
fd_readdir
syscall should return the same ordering of the entries---according to the macOS man ofreaddir
I assume similar will hold on all *nixes, so as long as the same host with the same filesystem is used, the order should be the same between the app runs. The problem, however, may become more pronounced in distributed settings where we'd like to execute an app on two unknown and potentially different hosts, and expect deterministic, comparable results on both.
There already was some discussion about ordering of results, seeking, and
fd_readdir
in general in #61.Poll
I'm adding that one in since I remember having a discussion with @marmistrz about this one, and he was convinced he could generate entropy with a clever use of
poll_oneoff
, hence bringing in nondeterminism into the picture. @marmistrz perhaps you could shed more light on this?The text was updated successfully, but these errors were encountered: