Atomic and thread operators #1019
Conversation
We need a way to identify post-MVP features. ⚛️ is an easy one here! We then need a page that identifies all these symbols (MVP, each post-MVP, and 🦄 as "future").
This is also missing the specification of how non-atomic accesses behave. We could just wait for the JS work to finish, though?
How do we intend to move such proposals forward?
Semantics.md
Outdated
@@ -42,7 +42,7 @@ environment such as a browser, a trap results in throwing a JavaScript exception
 If developer tools are active, attaching a debugger before the
 termination would be sensible.

-## Stack Overflow
+## Stack overflow
?
Capitalization was inconsistent in the file, but seemed to be "Upper lower lower." Probably should be a different change though.
Semantics.md
Outdated
@@ -81,7 +81,7 @@ Note that the value types `i32` and `i64` are not inherently signed or
 unsigned. The interpretation of these types is determined by individual
 operators.

-## Linear Memory
+## Linear memory
?
Semantics.md
Outdated
memory if the module's memory import doesn't specify that it allows shared
memory.

### Linear memory accesses
Capitalization.
Semantics.md
Outdated
| `i64.atomic_xchg32_u` | 4 bytes, zero-extended i32 to i64 | nop | 4 bytes, wrapped from i64 to i32 |
| `i64.atomic_xchg` | 8 bytes | nop | 8 bytes |

The atomic compare exchange RMW operators take three operands: a linear memory
Kinda weird to classify cmpxchg as RMW. It's a 3rd category IMO.
Done.
Semantics.md
Outdated
@@ -211,7 +356,7 @@ Thus, it is recommend that WebAssembly producers align frequently-used data to
 permit the use of natural alignment access, and use loads and stores with the
 greatest alignment values practical, while always avoiding misaligned accesses.

-### Out of Bounds
+### Out of bounds
?
Semantics.md
Outdated
@@ -461,7 +606,7 @@ supported (including NaN values of all possible bit patterns).
 * `f32.const`: produce the value of an f32 immediate
 * `f64.const`: produce the value of an f64 immediate

-## 32-bit Integer operators
+## 32-bit integer operators
?
Semantics.md
Outdated
## Thread operators

* `is_lock_free`: return 1 if an atomic memory access of a given size can be
  completed without the execution engine acquiring a lock. If not, return `0`.
Can you import the wording from https://tc39.github.io/ecma262/ ?
Done.
Semantics.md
Outdated
@@ -678,6 +823,48 @@ outside the range which rounds to an integer in range) traps.
 This trap is intended to be impossible for user code to catch or handle, even in the future when it may be possible to
 handle some other kinds of traps or exceptions.

+## Thread operators
+
+* `is_lock_free`: return 1 if an atomic memory access of a given size can be
Code-quote `1`.
Done.
Semantics.md
Outdated
memory.

The wait operators take three operands: a linear memory index, an
expected value, and a relative timeout in milliseconds. The wait operation
Can it be negative?
Described below, but I moved it up.
Good point, perhaps we should open a separate issue to discuss?
Yeah, I was considering just referencing the ecma262 memory model, but it's not immediately obvious to a reader how it would map.
I seem to remember there being a plan about this. Little help, anyone? :-)

I think we're still figuring out what, post-MVP, is the process for proposing extensions. Also, with the concurrent emergence of a proper spec in the spec repo, the shape of a PR/proposal might be quite different.

I'm not sure we want to block everything on the spec repo at this point in time.

I guess it depends on how long each of these things (process for evolving post-MVP, spec repo, threads) take.

Agree we don't want to block if possible.
BinaryEncoding.md
Outdated
## Atomic compare exchange operators ([described here](Semantics.md#atomic-memory-accesses))

| `i32.atomic_cmpxchg` | `0xe61c` | `memory immediate` | atomic compare exchange with memory |
The formatting on this table is broken, possible because of the lack of a heading.
Oops, thanks!
BinaryEncoding.md
Outdated
| `i32.atomic_xor8_s` | `0xe612` | `4` `memory immediate` | atomic xor with memory |
| `i32.atomic_xor8_u` | `0xe613` | `4` `memory immediate` | atomic xor with memory |
| `i32.atomic_xor16_s` | `0xe614` | `4` `memory immediate` | atomic xor with memory |
| `i32.atomic_xor16_u` | `0xe615` | `4` `memory immediate` | atomic xor with memory |
Names like `i32.atomic_xor16_s` are a little awkward with `_s`/`_u` apparently associated with otherwise sign-agnostic operators. What would you think of any of the following conventions?

* `i32.atomic16_s.xor`
* `i32.atomic_rmw16_s.xor`
* `i32.atomic.rmw16_s.xor`

(in this case, following `atomic` with `.` could be a broader convention)

These would make it clear that the xor isn't signed itself; it's just the loaded value from the rmw that's being sign-extended. This isn't super important with the current opcodes, but could be in the future if wasm ever wants atomic rotate, popcnt, or other operators which care about the number of bits in the actual arithmetic part.

Such a clarification would also align with an idea anticipated here concerning the possibility of adding a real `add_s` and similar operators (which could trap or do other things on overflow) in the future.
I like `i32.atomic.rmw16_s.xor`, it nicely matches the proposed opcode layout too. Though I wonder about `xchg` and `cmpxchg`: `i32.atomic.rmw16_s.xchg`? And as @jfbastien mentioned, cmpxchg isn't really RMW, so `i32.atomic.cmpxchg16_s`? And I suppose for the non-extending forms: `i32.atomic.rmw.xor`?
`i32.atomic.cmpxchg16_s` and `i32.atomic.rmw.xor` seem to fit the pattern nicely.
Ok, done
@flagxor I'm not sure what would be better without knowing what the intended process is for proposing and making changes. Since threads are a high priority, we could just cheat and carry on like we did pre-MVP.
BinaryEncoding.md
Outdated
| Name | Opcode | Immediate | Description |
| ---- | ---- | ---- | ---- |
| `i32.atomic_load` | `0xe628` | `memory immediate` | atomic load from memory |
| `i64.atomic_load` | `0xe629` | `memory immediate` | atomic load from memory |
For all these load/store ops, could we reuse the existing load/store ops, adding a "consistency" annotation to the `memory_immediate.flags`?
Yes, and I believe that's what was discussed before. But I had a thought it might be nice to have all atomic/threading operators use the same prefix, which led me down this route. One additional nicety is that the opcodes for atomic load/store operators match the non-atomic ones, e.g. `i32.atomic_load` is `0xe628` and `i32.load` is `0x28`, etc.
Since the opcode specifies that it is atomic, this also means that we don't need a special "non-atomic" value for the memory immediate for RMW ops too.
But I'm not married to the idea, just thought it was kinda neat. :-)
Hm, those are fair points. I guess we are already adding a bunch of operators, so it's not a huge difference, and there would be a natural extension for using the atomic prefix on global ops (when we eventually allow shared globals).
I'm partial to calling the "atomic" prefix the "lock" prefix, and numbering it `0xF0`.
@jfbastien can't tell if serious...
Semantics.md
Outdated
| Name | Read | Modify | Write |
| ---- | ---- | ---- | ---- |
| `i32.atomic_cmpxchg8_s` | 1 byte, sign-extended i8 to i32 | nop | (conditional) 1 byte, wrapped from i32 to i8 |
One detail that's not clear from this presentation is whether the comparison is the sign-extended value with the full 32 bits of the expected value, or the expected value wrapped to 8 bits with the 8-bit loaded value.
I tried to clarify, ptal
What about atomic operations on globals?

@AndrewScheidecker I think first we need to define a new
Semantics.md
Outdated
| Name | Read | Modify | Write |
| ---- | ---- | ---- | ---- |
| `i32.atomic.rmw8_s.add` | 1 byte, sign-extended i8 to i32 | sign-agnostic addition | 1 byte, wrapped from i32 to i8 |
The cmpxchg clarification looks good. So good, that I'm now wondering about a Return column for the rmw table too :-). This is admittedly a really minor nit at this point because none of the current rmw operators care whether their inputs are extended or wrapped, but if we add other rmw operators in the future, they could potentially care.
Semantics.md
Outdated
## Thread operators

* `is_lock_free`: If the atomic step of an atomic primitive (see
  [Atomic Memory Accesses](#atomic-memory-accesses)) on a datum of size N
Where is N coming from, is that an i32 operand?
Done.
Semantics.md
Outdated
* `f32.atomic.store`: (no conversion) atomically store 4 bytes
* `f64.atomic.store`: (no conversion) atomically store 8 bytes

Atomic read-modify-write (RMW) operators atomically read a value from a memory
Can you specify in which order the operands are expected on the stack for these instructions?
I couldn't find another place in the doc to crib from, so I just specified it by describing the operand in order. Should I be more explicit?
Semantics.md
Outdated
| `i64.atomic.rmw32_u.xchg` | 4 bytes | nop | 4 bytes | zero-extended i32 to i64 |
| `i64.atomic.rmw.xchg` | 8 bytes | nop | 8 bytes | as i64 |

The atomic compare exchange operators take three operands: a linear memory
Nit: the linear memory is an immediate, not an operand.
Whoops, I meant address. :-)
Semantics.md
Outdated
change. It is a validation error to use these operators on non-shared linear
memory.

The wait operators take three operands: a linear memory index, an expected
Nit: as above, I suppose the memory is an immediate.
BinaryEncoding.md
Outdated
| Name | Opcode | Immediate | Description |
| ---- | ---- | ---- | ---- |
| `is_lock_free` | `0xe600` | | |
Can we allocate prefix opcodes at the end of the opcode space instead of the middle? So 0xff
for threads being the first such proposal.
Done.
Semantics.md
Outdated
### Wake

The wake operators take two operands: a linear memory index, and a wake count.
Nit: same here.
BinaryEncoding.md
Outdated
| `i32.wait` | `0xe601` | `offset` | |
| `i64.wait` | `0xe602` | `offset` | |
| `i32.wake` | `0xe603` | `offset` | |
| `i64.wake` | `0xe604` | `offest` | |
Don't these all have memory immediates?
Done.
BinaryEncoding.md
Outdated
| `i64.atomic.rmw16_u.xor` | `0xe619` | `4` `memory immediate` | atomic xor with memory |
| `i64.atomic.rmw32_s.xor` | `0xe61a` | `4` `memory immediate` | atomic xor with memory |
| `i64.atomic.rmw32_u.xor` | `0xe61b` | `4` `memory immediate` | atomic xor with memory |
| `i32.atomic.rmw.xchg` | `0xe610` | `5` `memory immediate` | atomic exchange with memory |
Is xchg actually a RMW operation? Seems like it could be its own group, with names like `i32.atomic.xchg16_s`, in line with `cmpxchg`.
Semantics.md
Outdated
| `i64.atomic.rmw32_s.xor` | 4 bytes | 32-bit sign-agnostic bitwise exclusive or | 4 bytes | sign-extended i32 to i64 |
| `i64.atomic.rmw32_u.xor` | 4 bytes | 32-bit sign-agnostic bitwise exclusive or | 4 bytes | zero-extended i32 to i64 |
| `i64.atomic.rmw.xor` | 8 bytes | 64-bit sign-agnostic bitwise exclusive or | 8 bytes | as i64 |
| `i32.atomic.rmw8_s.xchg` | 1 byte | nop | 1 byte | sign-extended i8 to i32 |
Is xchg actually a RMW operation? It seems like it would fit better with the `cmpxchg` group -- it's an unconditional version of it. With analogous names like `i32.atomic.xchg16_s`. Or as its own group.
xchg is usually considered an RMW operation.
I'm fine with it either way, but it probably shouldn't be grouped with `cmpxchg` since those have 3 operands.
I guess I'm primarily wondering by what criterion xchg is RMW but cmpxchg is not. Some consistency would be nice.
Bikeshedding more generally, why the complicated names in the first place? Wouldn't i32.atomic.add8_s etc work just fine? Sure it's RMW, but that's kind of implied.
`cmpxchg` (compare-and-swap) is fundamentally more powerful than other RMW operations. Between that, and having a distinct type signature, it commonly gets its own category, even if it is also an RMW operation.

`cmpxchg` isn't just a conditional version of `xchg`. For example, an abstract formal semantics wouldn't be able to reduce `cmpxchg` to `xchg` when the compare succeeds; that would imply the comparison is a separate step from the swap, breaking atomicity.

`add8_s` under wasm's existing naming scheme would seem to imply that the `add`, `or`, `and`, `xor`, etc., are signed/unsigned, which is inconsistent with the existing operators.
@sunfishcode, interesting, thanks for the pointer!
Regarding the naming, I see your argument. My only remaining quibble is that, even being in different expressive categories, it would still be nice to have a somewhat more symmetric naming scheme between cmpxchg and the other RMW operators.
`i32.atomic.rmw8_s.cmpxchg` would be the maximally symmetric convention, it seems. I don't think it's critical, but I wouldn't be opposed to it.
Done.
Semantics.md
Outdated
The atomic compare exchange operators take three operands: a linear memory
index, an `expected` value, and a `replacement` value. The operators
conditionally store the `replacement` value, but only if the `loaded` value is
equal to the `expected` value.
The logic of this sentence seems a little strange to me. "X conditionally does Y, but only when Z".. I think what you are trying to say is that its always conditional, and then describe said condition.
Done.
I think we've got a great opportunity to call the "atomic" prefix the "lock" prefix, and number it
BinaryEncoding.md
Outdated
| `i64.wake` | `0xc3` | `offest` | SAB like wake, takes an index + `offset` |
| `i32.thread_create` | `0xc4` | `table` | Creates a thread, stores thread in `table` at an i32 index. Returns 0 on success or non-zero otherwise. |
| `i32.thread_join` | `0xc5` | `table` | Waits to join to thread stored at an i32 index in `table`. Returns 0 on success or non-zero otherwise. |
| `thread_exit` | `0xc6` | | Halts current thread. |
These three thread ops were dropped in 4b94ac6#diff-7c076c52f23f4a770974fb209259f34bL962 seemingly without comment (that I can find). Is there somewhere else they went?
It would be good to document this! Is the intent to start off with workers, mimic JS' API, and then add better capabilities later?
Threads.md
Outdated
threads of execution. The shared memory can be imported or defined in the
module. It is a validation error to attempt to import shared linear
memory if the module's memory import doesn't specify that it allows shared
memory.
It'd be good to flesh this out a bit more, including these sorts of details.
I've added some of the details, PTAL (still no JS.md stuff yet)
We need to specify:

OK, I've added a little bit here about agents/agent clusters, mostly copied from https://tc39.github.io/ecma262/#sec-agents. Not sure how much detail we want to go into here. Gonna write up some JS.md stuff next.
...

If linear memory is marked as shared, the
[maximum memory size](Modules.md#linear-memory-section) must be specified.
Same restriction for JS.md WebAssembly.Memory.
`grow_memory` of a shared linear memory is allowed, but may fail if the new
size is greater than the specified maximum size, or if reserving the additional
memory fails. All agents in the executing agent's cluster will have access to
the additional linear memory.
Do we need to discuss neutering?
Yay, bikeshedding! Count me in! Reserving the last row of 16 bytes for extension prefixes sounds like a good idea to me. In particular because the potential cost of some future version running out of possibilities for extension seems far higher than the cost of putting some of the more rare prospective one byte opcodes behind a prefix due to lack of space in the lower bytes.

Putting the first extension byte at 0xf0 would be a nice way to mark off that reserved range. Opcode space fragmentation would then be a feature here, not a bug, keeping room for (possibly unanticipated) future extensions. The only freedom gained by going backwards from 0xff (i.e. defragmentation) seems to be to allow for the possibility of one byte opcodes 'bleeding' into that reserved range, which I think is exactly what one would like to prevent.

Also, extension opcodes could still grow from 0xf0 backwards if at some point in the future it becomes clear that more than 16 prefixes are needed and one byte opcodes haven't filled the lower bytes yet. So one doesn't give up any growth potential for extensions here compared to starting from 0xff and going backwards, the only difference is the explicit reservation of the last row of bytes.

It also feels kind of awkward to spec all the extension opcodes backwards, in a list of operations one would either end up with features running from most obscure to most common, or otherwise with opcodes being listed backwards for extension prefixes and the other way around for the rest... just feels icky :)
One question that also needs answering is how shared memories interact with instantiation. In particular, what is the amount of atomicity, if any, that is provided for writes performed for data sections?

Good point. Ordering of these as well (i.e. are they nonatomic writes, from start to end, or are they sequentially consistent?). I assume we don't intend to allow shared Tables for now?
w.r.t. misaligned atomic accesses trapping:

Some ISAs "just work", others tear (sometimes just when crossing cachelines), others silently align, and others trap. It's unclear which of these ISAs we care about.

No: there's no "polyfill" possible. Having a side-lock for misaligned accesses won't do the right thing.

We'd need to quantify how many atomic instructions actually occur. I expect very few, so low cost. Would be good to confirm.
What about just x86/x64/ARM32/ARM64/MIPS?

Well, not a polyfill that provides atomicity, just a path that does the load non-atomically w/o trapping. Auto-aligning via mask (like asm.js+SAB would do) would be another option.

The behavior I described is covered by just x86 and ARM. Which ARM is another question. The auto-align and trapping ones aren't the mainstream ones AFAIK (or rather, the ones which run browsers, definitely true for "embedded" types ARMs).

If the intent is to have atomicity, except the code is OK without atomicity, then... why even bother with atomicity? The polyfill you describe makes no sense. We could auto-align, but I don't see how that's useful compared to trapping. Trapping is much more consistent with what WebAssembly otherwise does, and as I mentioned above I think the cost is minimal given how few of these instructions I expect to see.

Again, I'm not talking about a "polyfill"; I'm wondering whether we'd get into a situation (as we've seen with the trapping conversion ops) where a compiler is forced to keep code "working" (i.e., not trapping) by adding its own branches/masks so that unaligned atomic accesses don't trap.

I'm totally not suggesting that; I'm anti-suggesting that, even. I'm trying to determine whether it's better for us to not trap since, it sounds like, on all our tier 1 platforms, that's what the ISA gives us and trapping just adds several minor sources of overhead.

That's what I was calling a polyfill. I don't think it can be made to work.

Sure, I'm listing possibilities.

I understand what you're asking. I just don't understand which alternative you prefer out of the ones I listed. All our tier 1 platforms don't have the same behavior.

It depends what you mean by "work". If "work" means "loads the bytes as if by a non-atomic load", then sure it can.

Ok, maybe I've misunderstood the cases here. I assumed there are three: trap, auto-align (boo!), do the unaligned load (without atomicity guarantees). Is that right? If so, then when you said above "The auto-align and trapping ones aren't the mainstream ones AFAIK", I assumed that meant all the tier 1 platforms are in the last case.

But then it's non-atomic. Why would a user bother with atomics if their code is OK not being atomic? I'm totally happy making that type of code slower.

AFAIK x86 "just works" and will be atomic even across cache lines, whereas "mainstream" mobile ARMs will tear. These are "tier 1". Other ARMs (e.g. for embedded) will trap or auto-align. I agree we should tune for tier 1. We shouldn't ignore tier 2 if it's easy. In this case tier 1 differs, and trapping is cheap enough that I'd go with that because it prevents "I tested on Chrome on x86 and it worked, ship it!". Silently tearing is really bad! It's hard to reliably observe.

If the compiler has to blindly work around it (like it had to do with float conversion traps), it would penalize the whole app, not just the dumb code. Then again, maybe explicit atomics + unaligned bugs are a rare enough combination in practice that this situation wouldn't arise in practice.

Yeah, I think that is the central tradeoff: making all tier-1 fast paths slower for some amount of bug-catching. It'd certainly help to have some measurements here.
If the loaded value is not equal to the expected value, the operator returns 1
("not-equal"). If the values are equal, the agent is suspended. If the agent
is woken, the wait operator returns 0 ("ok"). If the timeout expires before
another agent wakes this one, this operator returns 2 ("timed-out").
As you know, in the HTML embedding of ES2017 we do not allow Atomics.wait to wait on the main thread of a window (or, in Firefox at present, on the main thread of a SharedWorker). In ES, a call to Atomics.wait on such a thread results in a TypeError being thrown. Wasm can be used on the main thread, so is there going to be the same restriction there?
I never really liked this AgentCanSuspend compromise since it seems like it would just lead to spin-locking instead. That being said, it seems like wasm should be symmetric with JS here, so either we make the restriction too or we try to remove the restriction from JS. If we make the restriction, it seems like we'd throw a `RuntimeError`.
It will lead to some spin-locking, but those that dabble in that will quickly discover that the browser locks up, as the main thread is not able to service requests made on behalf of other parts of the system. (Though I sort of doubt that will be apparent to many of the people bitten by this problem.)
(Edited to add:)
IMO we should expect to live with this problem and to ease the creation of content that lives with it, too, by providing structures for easily running Wasm off-main-thread.
Good point, I think we need to be symmetric with JS here.
The embedder is also permitted to suspend or wake an agent. A suspended agent
can be woken by the embedder or the wake operator, regardless of how the agent
was suspended (e.g. via the embedder or a wait operator).
Is this attempting to formalize the push-into-history / restore-from-history mechanism of browsers, or to allow spurious wakeups (which are not in the ES version of Atomics.wait)?
Oh, it was allowing spurious wakeups, but of course you're right, that's not allowed.
Oh, sorry -- this is meant to allow the thread to be suspended/woken by JavaScript as well as WebAssembly. But it's a good point that it is similar to a spurious wakeup. I wonder if there's a good way to make a distinction.
Can you reference the abstract WakeWaiter operation? If the wasm `wake` operator uses that, that'll unify wasm with how the embedder can explicitly wake up agents.
I created a new repo for this: https://github.com/WebAssembly/threads

Moved here: WebAssembly/threads#1, I'll close this for now. Please move relevant comments there!
This is an initial pass at adding some atomic and thread operators, based on some work that @flagxor did earlier.
There's still a lot that needs to be fleshed out, but I wanted to get something going.