Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normative: Strengthen Atomics.wait/notify synchronization to the level of other Atomics operations #1127

Merged
merged 2 commits into from
Aug 23, 2018
Merged

Normative: Strengthen Atomics.wait/notify synchronization to the level of other Atomics operations #1127

merged 2 commits into from
Aug 23, 2018

Conversation

conrad-watt
Copy link
Contributor

@conrad-watt conrad-watt commented Mar 5, 2018

See issue #1119 for background on the current behaviour.

This proposal strengthens the synchronization between a waking thread and its wakees as though they were each a coinciding atomic write/read (respectively) to the same memory location, in line with some previous comments by @lars-t-hansen on the informal language of an earlier draft of the memory model proposal.

Since each waker/wakee pair is determined purely operationally, there is no need to struggle to reassociate the events from scratch in the axiomatic semantics (a la the ReadsBytesFrom relation for Shared Data Block events). Instead, each agent tracks which waking events it is responsible for in a new [[AgentSynchronizesWith]] field of its Events Record, which is then unified with the global synchronizes-with relation.

I chose to introduce a rather general new "Synchronize" action rather than abuse any of the existing ones, since this spec machinery would be useful to specify a future fork or similar thread spawning operation, or explicit barrier operations, either in ECMAScript or in WebAssembly. This also limits the potential for unfortunate interactions with other parts of the spec which deal with Shared Data Block events.

I also rename the [[EventLists]] property of candidate executions to [[EventsRecords]] since Events Records now contain more than just EventLists, with the added bonus that it makes sentences which formerly read as "Let eventList be the [[EventList]] field of the element in execution.[[EventLists]] whose [[AgentSignifier]] is AgentSignifier()." easier to parse, considering the element of [[EventLists]] in question is actually an Events Record.

Some initial discussion points
1 -
Is "Synchronize" the right name for the new action? I mulled over "Scheduling" or something similar, but as I mentioned earlier these actions could potentially also be used to specify barriers in the future.

2 -
This change should not invalidate any user code - it is a strict subsetting of the previously allowed observable behaviour.

3 -
A scattering of informative notes throughout the PR may still be required.

4 -
On second thoughts I don't believe this is true
This new specification is still slightly weak. It doesn't incorporate all of the guarantees you might get from an OS barrier in Atomics.wait. For example, an execution for the following program is still allowed where x = 1, y = 1, z = 0.

Line Thread 1 Thread 2
1 tA[2] = 1 Atomics.store(tA, 0, 1)
2 tA[1] = 1 var x = Atomics.wake(tA, 0, 1)
3 Atomics.wait(tA, 0, 0) var y = tA[1]
4 - var z = tA[2]

In a real implementation using OS calls for wake/wait, if x = 1 I'd expect to only observe y = z = 1. However this certainly isn't a property I'd expect as many people to care about or try to rely on. In particular, the execution where x = 0, y = 1, z = 0 must be allowed by the model, so it seems that the only programs that could benefit from a further strengthened model would be rather pathological.

5 -
On x86, this specification is satisfied by implementing Atomics.wake as a regular ECMAScript atomic load followed by a bare Linux wait, and Atomics.wake as a bare Linux wake, but (I believe) it is stronger than the same on ARM/Power, which will necessitate an extra read barrier at least (see below). I'm also not familiar enough with the barrier behaviour of other OS's to know what extra barriers are required, although I would be surprised if they do not at least issue a write barrier upon waking, as Linux does. It should be noted that this is not new - ECMAScript atomic writes are stronger than the (relaxed) atomic writes of ARM/Power, and this is reasonably well-known.

6 -
Implementation schemes - a "defensive" implementation scheme to ensure conformance after this change, assuming nothing about the underlying architecture/OS, is simply to do a full barrier immediately before any call to Atomics.wake and immediately after any call to Atomics.wait - this will always produce the correct behaviour, and is analogous to the "naive code generation scheme" suggested in the ECMAScript standard for atomic loads/stores.

An implementation wishing to do better than this (although I hope there are no benchmarks that depend on a very fast Atomics.wake!) must make use of knowledge about its particular architecture/OS combination. Below is my to-the-best-of-my-knowledge opinion on possible improvements.

Atomics.wait
On x86, a read barrier after Atomics.wait is all that is required for correctness. On ARM/Power, a full barrier is required.

On Linux a full barrier is carried out by the OS call to sleep, so no additional barrier is required by the ECMAScript implementation no matter the architecture. I don't know the behaviour of other OS's off the top of my head, but I imagine similar guarantees could be determined about their barrier behaviour.

Atomics.wake
No barrier at all is required at Atomics.wake if the call to Atomics.wake does not wake any waiting threads, so the cases afterwards will assume it wakes at least one.

On x86, a write barrier is sufficient for correctness, and therefore on an OS where the underlying syscall already performs this (Linux, at least, when at least one thread is woken) no additional barrier is required in the ECMAScript implementation.

On ARM/Power, a full barrier is required overall, but if the OS performs a write barrier already, the ECMAScript implementation need only perform an additional read barrier.

@lars-t-hansen, @syg, and @rossberg PTAL.

@ljharb ljharb added normative change Affects behavior required to correctly evaluate some ECMAScript source text needs consensus This needs committee consensus before it can be eligible to be merged. labels Mar 5, 2018
@ljharb ljharb requested a review from syg March 5, 2018 06:05
@littledan
Copy link
Member

Why is this marked needs consensus? My understanding is that, although this is a normative change, it really reflects how any reasonable implementation would do things; it's more like a bug fix.

@conrad-watt
Copy link
Contributor Author

conrad-watt commented Mar 5, 2018

I'm not in a position to judge whether this needs consensus, but I briefly discuss in the implementation schemes section why this proposal might not match existing implementations, depending on how defensively they're currently synchronizing.

In short, if implementations have been treating wait/wake as like ECMAScript atomic loads/stores (in the absence of formal spec indication), they should already be conformant to this change. But implementations delegating to the OS directly without any barriers of their own might become non-conformant on weaker architectures.

To tie in to issue #1119, the existing spec is much weaker than bare OS calls. This proposal is slightly stronger, in the same way that ECMAScript atomic load/stores are slightly stronger than the relaxed atomic load/stores of ARM/Power.

@ljharb
Copy link
Member

ljharb commented Mar 5, 2018

My understanding is that any normative change needs consensus, unless in the editor's judgement it doesn't need consensus. I lack the context on Atomics to know one way or the other; I'm hoping @bterlson will have an opinion.

@syg
Copy link
Contributor

syg commented Mar 5, 2018

I suppose it's arguable whether it is a bug fix.

The strengthened behavior in the PR disallows some compiler transforms that the spec technically allows, e.g., moving non-atomic stores across Atomic.wake calls. (I say compiler because I don't think there are any OSes that would make such a thing observable.) Those transforms cause astonishment, and the broad consensus @conrad-watt referred to is that in e-mail correspondence, we agreed among us that those transforms were probably intended to be disallowed from the beginning, but slipped through the cracks.

I am comfortable treating this as a bug fix. But to fully cover our bases, we could discuss very quickly in March with other interested parties (Waldemar). I'll defer to editor group.

I'll make time to review this before next week.

@conrad-watt
Copy link
Contributor Author

If it's appropriate, I could also attend any discussion in person. I'm in London giving a presentation at the Formal Methods Meets JavaScript meeting the day before TC39 anyway.

I don't know what the rules are for external contributors attending TC39 meetings.

@conrad-watt
Copy link
Contributor Author

conrad-watt commented Mar 6, 2018

I should also point out that the current semantics of the spec are far more troubling than I've perhaps motivated in my earlier examples. In #1119 I gave an example which could look like an over-eager compiler transformation of non-atomics, but for the following program, an execution where thread 1 reads 0 at line 2 (assuming tA initially zeroed) is allowed, which seems completely incoherent, and certainly won't be observable in practice.

Line Thread 1 Thread 2
1 Atomics.wait(tA, 0, 0) Atomics.store(tA, 0, 1)
2 var x = Atomics.load(tA, 0) Atomics.wake(tA, 0, 1)

This all stems from the lack of happens-before between wake and wait, meaning that the store and load look racy to the axiomatic model.

@lars-t-hansen
Copy link
Contributor

Going back to a fairly random earlier draft, eg, https://github.com/tc39/ecmascript_sharedmem/blob/7c10268005c7ad11552aa333dc06229f29fd6c54/tc39/spec.html (mid 2016), we find that there are definitions of synchronizes-with and happens-before that include the wake -> wait edge in the synchronization order as a special case.

That draft also has a provision for "... an embedding-specific synchronizing event that has a sender and a receiver (such as sending a SharedArrayBuffer from one agent to another), the sending action synchronizes-with the receiving action". That seems reflected in current prose in the host-synchronizes-with relation.

So it would appear that the wake -> wait edge was truly left out by accident, not being a memory operation and not being a host operation. That is, this is simply a bug. Which of course should not dissuade anyone from discussing it.

@ljharb ljharb added spec bug and removed needs consensus This needs committee consensus before it can be eligible to be merged. labels Mar 6, 2018
@littledan
Copy link
Member

@conrad-watt To attend TC39 as a subject matter expert and present on a proposal, one prerequisite is to sign this form to license any IPR related to your presentation for use in the ECMAScript specification. We've historically had TC39 members invite experts to participate; not sure if any more formal signoff would be needed here, cc @ecmageneva @RexJaeschke.

It looks like you're associated with Cambridge University--Ecma offers free memberships to academic institutions which could make it easier to participate in the future.

Copy link
Contributor

@syg syg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach and logic looks good to me.

In addition to the specific changes below, please reorganize the commits into an editorial commit for the EventsRecords rename, and a normative commit for the strengthening. Their commit messages should be prefixed with "Normative:" and "Editorial:".

spec.html Outdated
</tr>
<tr>
<td>[[Order]]</td>
<td>`"SeqCst"`</td>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended future-proofing for release-acquire?

The reason RMW events have a fixed [[Order]] field is because they're quantified over along with other Shared Data Blocks that refer to [[Order]] generically. Synchronize events don't participate in that stuff, so we could leave it out for simplicity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partially this, and also to typographically give Synchronize events their own table structure on par with SDB events. I've changed to a textual description, which doubles as an explanatory note.

spec.html Outdated
@@ -39885,6 +39929,18 @@ <h1>ValueOfReadEvent( _execution_, _R_ )</h1>
1. Return ComposeWriteEventBytes(_execution_, _R_.[[ByteIndex]], _Ws_).
</emu-alg>
</emu-clause>

<emu-clause id="sec-agentssynchronizewith" aoid="AgentsSynchronizeWith">
<h1>AgentsSynchronizeWith( _execution_ )</h1>
Copy link
Contributor

@syg syg Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having both AgentsSynchronizeWith and AgentSynchronizesWith is confusing to me.

In conjunction with my above comment below making [[AgentSynchronizesWith]] a List, I recommend inline the logic in this AO directly into the synchronizes-with section, should be a 1-liner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@syg syg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed a comment.

spec.html Outdated
@@ -39722,6 +39750,11 @@ <h1>Agent Events Records</h1>
<td>A List of events</td>
<td>Events are appended to the list during evaluation.</td>
</tr>
<tr>
<td>[[AgentSynchronizesWith]]</td>
<td>A Relation on Synchronize events</td>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change this to a List.

The framework I've tried to adhere to is that the operational semantics appends events directly to Lists, and Relations are built by the axiomatic semantics from those Lists. (Also the Relations are not constructed by AOs to avoid implying that the axiomatic semantics is a step-by-step thing.)

Copy link
Contributor Author

@conrad-watt conrad-watt Mar 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - I may submit a separate editorial PR to make the spec's general use of the word "pair" in different parts of the memory model slightly more consistent.

@conrad-watt
Copy link
Contributor Author

rebased + comments addressed, PTAL

@conrad-watt
Copy link
Contributor Author

I've been told that if I want to attend any discussion on this at next week's meeting (pending sign-off from the ECMA SG), I should note time constraints/preferences here. I would prefer the 20th, although on that day I would have to leave after 2pm. Failing that I'm available any other day for the duration.

I'm assuming this still needs to pass through the committee as a formality, in the absence of indications to the contrary.

Longer term I will look at convincing my department/university to affiliate with TC39 officially.

@IgnoredAmbience

Copy link
Contributor

@syg syg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, I like the simplifications.

Please address the one editorial comment and merge this fixup into the previous normative commit, and it should be good to go on my end.

spec.html Outdated
@@ -40009,7 +39979,7 @@ <h1>synchronizes-with</h1>
</ul>
</li>
<li>For each pair (_E_, _D_) in _execution_.[[HostSynchronizesWith]], (_E_, _D_) is in _execution_.[[SynchronizesWith]].</li>
<li>For each pair of Synchronize events (_S_, _Sw_) in AgentsSynchronizeWith(_execution_), (_S_, _Sw_) is in _execution_.[[SynchronizesWith]].</li>
<li>For each element _eventsRecord_ of _execution_.[[EventsRecords]], for each pair (_S_, _Sw_) in _eventsRecord_.[[AgentSynchronizesWith]], (_S_, _Sw_) is in _execution_.[[SynchronizesWith]].</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Editorial nit: I think it'd be a clearer to have the [[AgentSynchronizesWith]] addition be its own <li>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done + rebased

@ljharb ljharb added the has consensus This has committee consensus. label Mar 20, 2018
@nomadtechie
Copy link

Hi @conrad-watt @syg - I've been working on adding test coverage to Atomics.wait and Atomics.wake. What implications (if any) does this change have on test coverage?

@ajklein
Copy link
Contributor

ajklein commented Mar 21, 2018

@binji @dtig @flagxor heads-up that this change reached consensus at TC39, want to make sure V8 folks have seen it.

@conrad-watt
Copy link
Contributor Author

conrad-watt commented Mar 21, 2018

@nomadtechie here are some representative program fragments for the new behaviour. All assume tA initially zeroed. I should stress that it's very unlikely that any of these were observable before - it would have required a very specific combination of architecture + compilation scheme that I don't have any evidence actually existed in the "wild".

To what extent is test262 the right venue for these kind of concurrent "litmus" tests? They're widely used in the experimental validation of architectural models in academia, but they're highly probabilistic and are often run hundreds of times in an attempt to enumerate the space of observable behaviours.

Example 1: nonsensical, would require specific (crazy) compiler transformations to observe:

Line Thread 1 Thread 2
1 Atomics.wait(tA, 0, 0) Atomics.store(tA, 0, 1)
2 var x = Atomics.load(tA, 0) Atomics.wake(tA, 0, 1)

Before: x must be 0 or 1
Now: x must be 1

Example 2: store buffering across wake, would require a very unusual compilation scheme to observe (or memory model sensitive compiler transformations). Prevented by even underlying OS barriers on all architectures.

Line Thread 1 Thread 2
1 Atomics.wait(tA, 0, 0) tA[1] = 1
2 var x = tA[1] Atomics.wake(tA, 0, 1)

Before: Thread 1 may get stuck, but if it does not, x must be 0 or 1
Now: If thread 1 doesn't get stuck, x must be 1

This example isn't a good test, since it can potentially fail to terminate. Below is a modification which always terminates, but may fail to detect some miscompilations of wake due to extra barriers potentially introduced by the atomic store.

Line Thread 1 Thread 2
1 Atomics.wait(tA, 0, 0) tA[1] = 1
2 var x = tA[1] Atomics.store(tA, 0, 1)
3 - Atomics.wake(tA, 0, 1)

Before: x must be 0 or 1
Now: x must be 1

Example 3: load buffering across wake: my understanding is that, taking our best formal models of ARM, this is theoretically observable even with OS barriers (requires additional barriers in the compilation scheme). Of the people I talked to at TC39, everyone who was in a position to check said they're already doing the right barriers.

Line Thread 1 Thread 2
1 Atomics.wait(tA, 0, 0) var x = tA[1]
2 tA[1] = 1 Atomics.wake(tA, 0, 1)

Before: x must be 0 or 1, thread 1 may get stuck
After: x must be 0, thread 1 may get stuck

Terminating version (same caveats):

Line Thread 1 Thread 2
1 Atomics.wait(tA, 0, 0) var x = tA[1]
2 tA[1] = 1 Atomics.store(tA, 0, 1)
3 - Atomics.wake(tA, 0, 1)

Before: x must be 0 or 1
Now: x must be 0

@binji
Copy link

binji commented Mar 26, 2018

To what extent is test262 the right venue for these kind of concurrent "litmus" tests?

We discussed this some here, but the issue seems to have stalled. One issue is that the tool generated many litmus tests ("The litmus tests cover all possible memory interactions of sizes 1-4. They take up 202MB compressed and about 1.1GB uncompressed."). Another issue, as you mention, is that they are inherently probabilistic.

It seemed as though there was value to adding them to test262, though, perahps as an addendum.

@littledan
Copy link
Member

Thanks for the analysis of tests, @binji and @conrad-watt. Seems to me like this fix is ready to land without blocking on tests.

@ljharb ljharb requested review from bterlson, ljharb and bmeck April 4, 2018 14:10
@bmeck bmeck requested a review from erights August 9, 2018 18:35
@erights
Copy link

erights commented Aug 9, 2018

I think I do not need to review this. Can I painlessly withdraw as a reviewer of this?

@bmeck
Copy link
Member

bmeck commented Aug 10, 2018

@erights yup, just wanted to make sure since I was curious about this after your spectre talk and saw @syg 's comment about ordering. Was more curious if you agreed on that issue in particular. However, if nothing seems controversial we should land this at this point.

@ljharb ljharb removed the request for review from erights August 10, 2018 17:32
@ljharb
Copy link
Member

ljharb commented Aug 10, 2018

@conrad-watt would you mind rebasing this on the latest master?

@conrad-watt
Copy link
Contributor Author

done

@erights
Copy link

erights commented Aug 11, 2018

It does look like this change goes strictly in the right direction --- towards a smaller space of possible outcomes, where the remaining outcomes are more intuitive than the ones ruled out. So on these grounds I am in favor. But I have not dug into this enough to consider my positive reaction a review; nor do I expect to. Thanks.

@ljharb ljharb removed their request for review August 22, 2018 18:59
@ljharb ljharb assigned ljharb and unassigned bterlson Aug 22, 2018
Copy link
Member

@ljharb ljharb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@conrad-watt now that #1220 has landed, wake is likely no longer a term we should be using. Could you update the section i've indicated to use the updated terminology?

spec.html Outdated
1. Let _execution_ be the [[CandidateExecution]] field of the surrounding agent's Agent Record.
1. Let _eventsRecord_ be the Agent Events Record in _execution_.[[EventsRecords]] whose [[AgentSignifier]] is AgentSignifier().
1. Let _agentSynchronizesWith_ be _eventsRecord_.[[AgentSynchronizesWith]].
1. Let _wakerEventList_ be _eventsRecord_.[[EventList]].

This comment was marked as resolved.

This comment was marked as resolved.

@ljharb
Copy link
Member

ljharb commented Aug 22, 2018

(linking to #1220)

@ljharb ljharb changed the title Normative: Strengthen Atomics.wait/wake synchronization to the level of other Atomics operations Normative: Strengthen Atomics.wait/notify synchronization to the level of other Atomics operations Aug 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has consensus This has committee consensus. normative change Affects behavior required to correctly evaluate some ECMAScript source text spec bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.