-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Normative: Strengthen Atomics.wait/notify synchronization to the level of other Atomics operations #1127
Conversation
Why is this marked needs consensus? My understanding is that, although this is a normative change, it really reflects how any reasonable implementation would do things; it's more like a bug fix. |
I'm not in a position to judge whether this needs consensus, but I briefly discuss in the implementation schemes section why this proposal might not match existing implementations, depending on how defensively they're currently synchronizing. In short, if implementations have been treating wait/wake as like ECMAScript atomic loads/stores (in the absence of formal spec indication), they should already be conformant to this change. But implementations delegating to the OS directly without any barriers of their own might become non-conformant on weaker architectures. To tie in to issue #1119, the existing spec is much weaker than bare OS calls. This proposal is slightly stronger, in the same way that ECMAScript atomic load/stores are slightly stronger than the relaxed atomic load/stores of ARM/Power. |
My understanding is that any normative change needs consensus, unless in the editor's judgement it doesn't need consensus. I lack the context on Atomics to know one way or the other; I'm hoping @bterlson will have an opinion. |
I suppose it's arguable whether it is a bug fix. The strengthened behavior in the PR disallows some compiler transforms that the spec technically allows, e.g., moving non-atomic stores across I am comfortable treating this as a bug fix. But to fully cover our bases, we could discuss very quickly in March with other interested parties (Waldemar). I'll defer to editor group. I'll make time to review this before next week. |
If it's appropriate, I could also attend any discussion in person. I'm in London giving a presentation at the Formal Methods Meets JavaScript meeting the day before TC39 anyway. I don't know what the rules are for external contributors attending TC39 meetings. |
I should also point out that the current semantics of the spec are far more troubling than I've perhaps motivated in my earlier examples. In #1119 I gave an example which could look like an over-eager compiler transformation of non-atomics, but for the following program, an execution where thread 1 reads 0 at line 2 (assuming tA initially zeroed) is allowed, which seems completely incoherent, and certainly won't be observable in practice.
This all stems from the lack of happens-before between wake and wait, meaning that the store and load look racy to the axiomatic model. |
Going back to a fairly random earlier draft, eg, https://github.com/tc39/ecmascript_sharedmem/blob/7c10268005c7ad11552aa333dc06229f29fd6c54/tc39/spec.html (mid 2016), we find that there are definitions of synchronizes-with and happens-before that include the wake -> wait edge in the synchronization order as a special case. That draft also has a provision for "... an embedding-specific synchronizing event that has a sender and a receiver (such as sending a SharedArrayBuffer from one agent to another), the sending action synchronizes-with the receiving action". That seems reflected in current prose in the So it would appear that the wake -> wait edge was truly left out by accident, not being a memory operation and not being a host operation. That is, this is simply a bug. Which of course should not dissuade anyone from discussing it. |
@conrad-watt To attend TC39 as a subject matter expert and present on a proposal, one prerequisite is to sign this form to license any IPR related to your presentation for use in the ECMAScript specification. We've historically had TC39 members invite experts to participate; not sure if any more formal signoff would be needed here, cc @ecmageneva @RexJaeschke. It looks like you're associated with Cambridge University--Ecma offers free memberships to academic institutions which could make it easier to participate in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach and logic looks good to me.
In addition to the specific changes below, please reorganize the commits into an editorial commit for the EventsRecords rename, and a normative commit for the strengthening. Their commit messages should be prefixed with "Normative:" and "Editorial:".
spec.html
Outdated
</tr> | ||
<tr> | ||
<td>[[Order]]</td> | ||
<td>`"SeqCst"`</td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intended future-proofing for release-acquire?
The reason RMW events have a fixed [[Order]] field is because they're quantified over along with other Shared Data Blocks that refer to [[Order]] generically. Synchronize events don't participate in that stuff, so we could leave it out for simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partially this, and also to typographically give Synchronize events their own table structure on par with SDB events. I've changed to a textual description, which doubles as an explanatory note.
spec.html
Outdated
@@ -39885,6 +39929,18 @@ <h1>ValueOfReadEvent( _execution_, _R_ )</h1> | |||
1. Return ComposeWriteEventBytes(_execution_, _R_.[[ByteIndex]], _Ws_). | |||
</emu-alg> | |||
</emu-clause> | |||
|
|||
<emu-clause id="sec-agentssynchronizewith" aoid="AgentsSynchronizeWith"> | |||
<h1>AgentsSynchronizeWith( _execution_ )</h1> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having both AgentsSynchronizeWith and AgentSynchronizesWith is confusing to me.
In conjunction with my above comment below making [[AgentSynchronizesWith]] a List, I recommend inline the logic in this AO directly into the synchronizes-with section, should be a 1-liner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed a comment.
spec.html
Outdated
@@ -39722,6 +39750,11 @@ <h1>Agent Events Records</h1> | |||
<td>A List of events</td> | |||
<td>Events are appended to the list during evaluation.</td> | |||
</tr> | |||
<tr> | |||
<td>[[AgentSynchronizesWith]]</td> | |||
<td>A Relation on Synchronize events</td> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change this to a List.
The framework I've tried to adhere to is that the operational semantics appends events directly to Lists, and Relations are built by the axiomatic semantics from those Lists. (Also the Relations are not constructed by AOs to avoid implying that the axiomatic semantics is a step-by-step thing.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done - I may submit a separate editorial PR to make the spec's general use of the word "pair" in different parts of the memory model slightly more consistent.
rebased + comments addressed, PTAL |
I've been told that if I want to attend any discussion on this at next week's meeting (pending sign-off from the ECMA SG), I should note time constraints/preferences here. I would prefer the 20th, although on that day I would have to leave after 2pm. Failing that I'm available any other day for the duration. I'm assuming this still needs to pass through the committee as a formality, in the absence of indications to the contrary. Longer term I will look at convincing my department/university to affiliate with TC39 officially. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, I like the simplifications.
Please address the one editorial comment and merge this fixup into the previous normative commit, and it should be good to go on my end.
spec.html
Outdated
@@ -40009,7 +39979,7 @@ <h1>synchronizes-with</h1> | |||
</ul> | |||
</li> | |||
<li>For each pair (_E_, _D_) in _execution_.[[HostSynchronizesWith]], (_E_, _D_) is in _execution_.[[SynchronizesWith]].</li> | |||
<li>For each pair of Synchronize events (_S_, _Sw_) in AgentsSynchronizeWith(_execution_), (_S_, _Sw_) is in _execution_.[[SynchronizesWith]].</li> | |||
<li>For each element _eventsRecord_ of _execution_.[[EventsRecords]], for each pair (_S_, _Sw_) in _eventsRecord_.[[AgentSynchronizesWith]], (_S_, _Sw_) is in _execution_.[[SynchronizesWith]].</li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Editorial nit: I think it'd be a clearer to have the [[AgentSynchronizesWith]] addition be its own <li>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done + rebased
Hi @conrad-watt @syg - I've been working on adding test coverage to Atomics.wait and Atomics.wake. What implications (if any) does this change have on test coverage? |
@nomadtechie here are some representative program fragments for the new behaviour. All assume tA initially zeroed. I should stress that it's very unlikely that any of these were observable before - it would have required a very specific combination of architecture + compilation scheme that I don't have any evidence actually existed in the "wild". To what extent is test262 the right venue for these kind of concurrent "litmus" tests? They're widely used in the experimental validation of architectural models in academia, but they're highly probabilistic and are often run hundreds of times in an attempt to enumerate the space of observable behaviours. Example 1: nonsensical, would require specific (crazy) compiler transformations to observe:
Before: x must be 0 or 1 Example 2: store buffering across wake, would require a very unusual compilation scheme to observe (or memory model sensitive compiler transformations). Prevented by even underlying OS barriers on all architectures.
Before: Thread 1 may get stuck, but if it does not, x must be 0 or 1 This example isn't a good test, since it can potentially fail to terminate. Below is a modification which always terminates, but may fail to detect some miscompilations of wake due to extra barriers potentially introduced by the atomic store.
Before: x must be 0 or 1 Example 3: load buffering across wake: my understanding is that, taking our best formal models of ARM, this is theoretically observable even with OS barriers (requires additional barriers in the compilation scheme). Of the people I talked to at TC39, everyone who was in a position to check said they're already doing the right barriers.
Before: x must be 0 or 1, thread 1 may get stuck Terminating version (same caveats):
Before: x must be 0 or 1 |
We discussed this some here, but the issue seems to have stalled. One issue is that the tool generated many litmus tests ("The litmus tests cover all possible memory interactions of sizes 1-4. They take up 202MB compressed and about 1.1GB uncompressed."). Another issue, as you mention, is that they are inherently probabilistic. It seemed as though there was value to adding them to test262, though, perahps as an addendum. |
Thanks for the analysis of tests, @binji and @conrad-watt. Seems to me like this fix is ready to land without blocking on tests. |
I think I do not need to review this. Can I painlessly withdraw as a reviewer of this? |
@erights yup, just wanted to make sure since I was curious about this after your spectre talk and saw @syg 's comment about ordering. Was more curious if you agreed on that issue in particular. However, if nothing seems controversial we should land this at this point. |
@conrad-watt would you mind rebasing this on the latest master? |
done |
It does look like this change goes strictly in the right direction --- towards a smaller space of possible outcomes, where the remaining outcomes are more intuitive than the ones ruled out. So on these grounds I am in favor. But I have not dug into this enough to consider my positive reaction a review; nor do I expect to. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@conrad-watt now that #1220 has landed, wake
is likely no longer a term we should be using. Could you update the section i've indicated to use the updated terminology?
spec.html
Outdated
1. Let _execution_ be the [[CandidateExecution]] field of the surrounding agent's Agent Record. | ||
1. Let _eventsRecord_ be the Agent Events Record in _execution_.[[EventsRecords]] whose [[AgentSignifier]] is AgentSignifier(). | ||
1. Let _agentSynchronizesWith_ be _eventsRecord_.[[AgentSynchronizesWith]]. | ||
1. Let _wakerEventList_ be _eventsRecord_.[[EventList]]. |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
(linking to #1220) |
See issue #1119 for background on the current behaviour.
This proposal strengthens the synchronization between a waking thread and its wakees as though they were each a coinciding atomic write/read (respectively) to the same memory location, in line with some previous comments by @lars-t-hansen on the informal language of an earlier draft of the memory model proposal.
Since each waker/wakee pair is determined purely operationally, there is no need to struggle to reassociate the events from scratch in the axiomatic semantics (a la the ReadsBytesFrom relation for Shared Data Block events). Instead, each agent tracks which waking events it is responsible for in a new [[AgentSynchronizesWith]] field of its Events Record, which is then unified with the global synchronizes-with relation.
I chose to introduce a rather general new "Synchronize" action rather than abuse any of the existing ones, since this spec machinery would be useful to specify a future fork or similar thread spawning operation, or explicit barrier operations, either in ECMAScript or in WebAssembly. This also limits the potential for unfortunate interactions with other parts of the spec which deal with Shared Data Block events.
I also rename the [[EventLists]] property of candidate executions to [[EventsRecords]] since Events Records now contain more than just EventLists, with the added bonus that it makes sentences which formerly read as "Let eventList be the [[EventList]] field of the element in execution.[[EventLists]] whose [[AgentSignifier]] is AgentSignifier()." easier to parse, considering the element of [[EventLists]] in question is actually an Events Record.
Some initial discussion points
1 -
Is "Synchronize" the right name for the new action? I mulled over "Scheduling" or something similar, but as I mentioned earlier these actions could potentially also be used to specify barriers in the future.
2 -
This change should not invalidate any user code - it is a strict subsetting of the previously allowed observable behaviour.
3 -
A scattering of informative notes throughout the PR may still be required.
4 -
On second thoughts I don't believe this is true
This new specification is still slightly weak. It doesn't incorporate all of the guarantees you might get from an OS barrier in Atomics.wait. For example, an execution for the following program is still allowed where x = 1, y = 1, z = 0.In a real implementation using OS calls for wake/wait, if x = 1 I'd expect to only observe y = z = 1. However this certainly isn't a property I'd expect as many people to care about or try to rely on. In particular, the execution where x = 0, y = 1, z = 0 must be allowed by the model, so it seems that the only programs that could benefit from a further strengthened model would be rather pathological.5 -
On x86, this specification is satisfied by implementing Atomics.wake as a regular ECMAScript atomic load followed by a bare Linux wait, and Atomics.wake as a bare Linux wake, but (I believe) it is stronger than the same on ARM/Power, which will necessitate an extra read barrier at least (see below). I'm also not familiar enough with the barrier behaviour of other OS's to know what extra barriers are required, although I would be surprised if they do not at least issue a write barrier upon waking, as Linux does. It should be noted that this is not new - ECMAScript atomic writes are stronger than the (relaxed) atomic writes of ARM/Power, and this is reasonably well-known.
6 -
Implementation schemes - a "defensive" implementation scheme to ensure conformance after this change, assuming nothing about the underlying architecture/OS, is simply to do a full barrier immediately before any call to Atomics.wake and immediately after any call to Atomics.wait - this will always produce the correct behaviour, and is analogous to the "naive code generation scheme" suggested in the ECMAScript standard for atomic loads/stores.
An implementation wishing to do better than this (although I hope there are no benchmarks that depend on a very fast Atomics.wake!) must make use of knowledge about its particular architecture/OS combination. Below is my to-the-best-of-my-knowledge opinion on possible improvements.
Atomics.wait
On x86, a read barrier after Atomics.wait is all that is required for correctness. On ARM/Power, a full barrier is required.
On Linux a full barrier is carried out by the OS call to sleep, so no additional barrier is required by the ECMAScript implementation no matter the architecture. I don't know the behaviour of other OS's off the top of my head, but I imagine similar guarantees could be determined about their barrier behaviour.
Atomics.wake
No barrier at all is required at Atomics.wake if the call to Atomics.wake does not wake any waiting threads, so the cases afterwards will assume it wakes at least one.
On x86, a write barrier is sufficient for correctness, and therefore on an OS where the underlying syscall already performs this (Linux, at least, when at least one thread is woken) no additional barrier is required in the ECMAScript implementation.
On ARM/Power, a full barrier is required overall, but if the OS performs a write barrier already, the ECMAScript implementation need only perform an additional read barrier.
@lars-t-hansen, @syg, and @rossberg PTAL.