Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR for blog post on async-iterators #1

Open
wants to merge 14 commits into
base: blog-post-baseline
Choose a base branch
from
Open

Conversation

jcouv
Copy link
Owner

@jcouv jcouv commented Jan 2, 2019

No description provided.

## Consumption with `await foreach`

The `await foreach` statement offers the most straightforward way of enumerating an async-enumerable's items.
It is very similar to `foreach` for enumerables. The main difference is that `await foreach` awaits each item (and also disposal).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awaits each item => awaits the retrieval of each item?

As the code continues to execute in the background:
- if it reaches a `yield return`, the background execution completes and the task we've previously handed to the caller is fulfilled with `true`,
- if it reaches the end of the method, that task is fulfilled with `false`,
- if it reaches an `await`, execution continues in the background.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This last bullet isn't really necessary. From the caller's perspective, this isn't visible, separate from the lifecycle of the ValueTask<bool> that's returned to them.


That task is complete (rather than pending). When the caller tries to access its result the task will throw the exception it holds.

When background execution is moving the method forward and an exception is thrown without handling, the exception is similarly caught and passed onto the caller via the task.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, exceptions that go unhandled in the method are caught and stored onto the task returned from MoveNextAsync.

When background execution is moving the method forward and an exception is thrown without handling, the exception is similarly caught and passed onto the caller via the task.


### Parallels with iterator and async control flow (optional)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe avoid the word "parallel", as it has a meaning very related to concurrency, which is applicable to this discussion, and it might confuse readers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: "Relationship between iterator and async control flow"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or "Comparison"


The execution of iterator methods shifts the control between the caller and the method:
- the caller repeatedly calls `MoveNext()` (which executes the method),
- the method yields control back to the caller (when reaching a `yield return`).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"when reaching a yield return"... or the end of the method, or a yield break, or an unhandled exception

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this a more comfortable explanation of what is happening than the earlier discussion of "continues in background"


The execution of async methods starts by the caller giving control to the method and background execution.
From there on, the control shifts between the method and background execution:
- the background execution resumes the method after a long-running task completes,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"long-running task"... "LongRunning" is a specific concept in the Task APIs, so I'd avoid using that term here. Background execution resumes the method after an awaited task completes asynchronously.

From there on, the control shifts between the method and background execution:
- the background execution resumes the method after a long-running task completes,
- the method yields control back to the background execution (when reaching an `await`).
Only when the method completes with a `return` is the control fully returned to the caller.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? Control is returned for yield returns, etc... do you just mean there's still additional work to be done if the caller calls back into the iterator? While they should do so, they're fully in control of whether they do so.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, this does'nt seem true given exceptions. and, while pedantic, i'd like these docs to be accurate and to cover cancellation/exceptions as well as i view both are so core to the async/task world. Thanks!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, await only yields control to the caller when the task is still pending. It does not yield control if the awaited task completes synchronously.


#### Locals

Although I won't go into much details on this, it is worth mentioning that all the local variables in the method are converted to fields on a compiler-generated type, so they maintain their values across suspensions/resumptions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"all the local variables in the method are converted to fields"... all? Do we not optimize this to avoid lifting locals that don't need to be?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, only locals that are used across suspensions get hoisted to a field. Thanks

```

This relies on locals (instead of the stack) to store the two results.
So when we rewrite async or async-iterator methods, we can assume that all awaits have been spilled.
Copy link

@stephentoub stephentoub Jan 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But temp2 in this example doesn't actually need to be lifted to the state machine, right? It should be able to remain a local. In fact, it shouldn't need a local, either, should it? The awaiter will need to be stored in the state machine, but temp2 is just the result of calling GetResult on the awaiter, and that can be on the stack to be passed directly to Method1.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In M1(M2(), await expr); the result of M2() needs to be saved and restored during the await suspension. (example)

But in M1(await expr, M2()) there is no need for that. We just put the result of the await on the stack, the result of M2() on the stack, then invoke M1. (example)

This returns a `ValueTask<bool>` where the `bool` represents the presence or absence of an item.
When you call `MoveNextAsync()`:
- if the code reaches a `yield return` statement, you immediately get a `true`,
- if the code reaches the end of the method, you immediately get a `false`,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about yield break?

When you call `MoveNextAsync()`:
- if the code reaches a `yield return` statement, you immediately get a `true`,
- if the code reaches the end of the method, you immediately get a `false`,
- if you reach an `await`, you get a pending task and the code continues to execute in the background.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about exceptions? would also like to know how exceptions work before/after the first yield and first await.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BillWagner Is this "continues to execute in the background" consistent with the way we talk about await elsewhere?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to avoid the term "execute in the background" for async tasks. As others have said, many readers conflate "parallel" with "async".

Only when the method completes with a `return` is the control fully returned to the caller.

The execution of async-iterator methods mixes both of those patterns, with the control shifting between caller (repeatedly calls `MoveNextAsync()`), method (gives control to the caller when reaching a `yield return` and to the background execution when reaching an `await`) and background execution (resumes the method after an `await`).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: if it gets very redundant to keep on having to mention yield break or exception info, it would be fine with me if there was a preliminary paragraph mentioning that these are important, but will be elided for brevity in the following paragraphs, but will then have dedicated info explaining them later.

{
/* set state to N */
/* save awaiter temp */
/* kick off background execution */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wouldn't mind explanations of these. if those explanations come later, can the comment mention that fact? For me, i like having the pseudo-code, knowing that each bit will be expanded on more fully. Thanks!

... code after `await expr` ...
```

Note this is the same pattern used to lower awaits in async methods.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would mention this before the pseudo-code. effectively "we do the same as normal async methods, but for expository purposes, we'll show what we do for both".

/* set state to N */
/* save awaiter temp */
/* kick off background execution */
return;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this confuses me a bit. if the sig of the method is that it gives back a Task, then clearly this 'return' isn't following that. So, presumably this is in some other method that relates more to the control flow logic to eventually move the original task back to the completed/faulted state.

When either the caller or background execution need to move the method forward, execution will resume from the right label because the state variable was set to `N` and the dispatching logic will jump to that label.

Because there is a cost to starting background execution, suspending and resuming, the lowering pattern is optimized: if the task was already completed, we avoid that overhead and instead just move ahead with the result.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be useful to show the pseudocode for that.

return;

resumeLabelN:
/* reset state */
Copy link

@CyrusNajmabadi CyrusNajmabadi Jan 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does it mean to 'reset state', and why is it necessary? why can't the value of state just be 'N' until the next time we need to suspend, at which point we give it the new value?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just means state = -1; . I agree it's not super useful, but it's something we do for iterator and async methods. It indicates "running" state.
I spelled out the state diagram here.

As far as I can tell this "RunningState" state value is currently only used to throw an exception when calling DisposeAsync() when the method is already running.


resumeLabelN:
/* reset state */
/* restore awaiter temp */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does thsi mean?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever there is a suspension (return in lowered code) we must save all locals so they can be restored when coming back. In this case awaiterTemp.

- the `state` value,
- persisting `awaiterTemp`,
- the `current` value, (see section on "yield return" suspension)
- the machinery for background execution and continuation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if these will get sections explaining them, consider having links here to those sections. this feedback applies to this entire doc.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stopping now. it's too late for my brain to take on hte more complex stuff :)

Copy link

@BillWagner BillWagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm about half way through this, and it's really detailed.

I left a few suggestions. I'll read more later and add any more suggestions.

| `IAsyncEnumerator<T>` used with `await MoveNextAsync()` and `Current` | `IEnumerator<T>` used with `MoveNext()` and `Current` |
| `IAsyncDisposable` used with `await DisposeAsync()` | `IDisposable` used with `Dispose()` |

So it is unsurprising that the lowering for `await foreach` is very similar to that of a `foreach`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: instead of "lowering for" use "code generated from"

The `await foreach` statement offers the most straightforward way of enumerating an async-enumerable's items.
It is very similar to `foreach` for enumerables. The main difference is that `await foreach` awaits each item (and also disposal).

The parallel jumps out in the interfaces involved:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: change this sentence to "The API for async enumerables should be familiar to developers that have used enumerables".

This sentence was awkward for me on first read. I think it's because of the word "parallel" which made me wonder what parallel computing was involved.


When the caller is moving the method forward and an exception is thrown (without being caught by user code in the async-iterator method), `MoveNextAsync()` will return a task as normal.

That task is complete (rather than pending). When the caller tries to access its result the task will throw the exception it holds.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of saying the task is complete, would it be more correct to say the task is faulted? /cc @stephentoub

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IsCompleted property on Task/ValueTask returns true if the task is in a final state, meaning it's either RanToCompletion/Faulted/Canceled, so technically "complete" is correct, though faulted would be, too.

When background execution is moving the method forward and an exception is thrown without handling, the exception is similarly caught and passed onto the caller via the task.


### Parallels with iterator and async control flow (optional)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: "Relationship between iterator and async control flow"

From there on, the control shifts between the method and background execution:
- the background execution resumes the method after a long-running task completes,
- the method yields control back to the background execution (when reaching an `await`).
Only when the method completes with a `return` is the control fully returned to the caller.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, await only yields control to the caller when the task is still pending. It does not yield control if the awaited task completes synchronously.

The execution of async-iterator methods mixes both of those patterns, with the control shifting between caller (repeatedly calls `MoveNextAsync()`), method (gives control to the caller when reaching a `yield return` and to the background execution when reaching an `await`) and background execution (resumes the method after an `await`).


## Lowering

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider everything starting here as a second blog post. Everything above provides a great mental model for any developer using C# that wants to use IAsyncEnumerable (either consuming them, or creating methods that return them.) Everything that follows is a discussion of how the compiler does this magic. Mixing the two in one post runs the risk that devs wanting to learn how to use the new feature get discouraged, feeling "this is really hard" after reading.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stoping for now. (I got to the "quick recap so far" heading.)

This may be more than 2 posts. There's a lot of advanced topics to cover in one read. I'll review more later.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include a one sentence, or a few word clause, to define lowering

I'd rather this be named something akin to "What the compiler does to implement async iterators"


Because the method has a type parameter, the generated type will have one as well (let's call it `T2`). As part of rewriting the body of the method, all references to `T` will be converted to references to `T2`:
```C#
class UnspeakableType<T2>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆 I'd always used AnonymousMumbleMumble as my types for these compiler generated classes.

Copy link

@KathleenDollard KathleenDollard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This post has two distinct sections:

  • Understanding async iterators so you can use them
  • Understanding how they are implemented

I would definitely like to see these split as they have a different audience.

I think the first will have a wide audience and it would be good to incorporate something like this in docs. For that purpose, it can use a bit of grammatical smoothing.

When you call `MoveNextAsync()`:
- if the code reaches a `yield return` statement, you immediately get a `true`,
- if the code reaches the end of the method, you immediately get a `false`,
- if you reach an `await`, you get a pending task and the code continues to execute in the background.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BillWagner Is this "continues to execute in the background" consistent with the way we talk about await elsewhere?

- if it reaches the end of the method, that task is fulfilled with `false`,
- if it reaches an `await`, execution continues in the background.

Note that the method is moved forward either by the caller or by background execution, but never both. Calling `MoveNextAsync()` before the task from the previous call complets yield unspecified results.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling: completes

When background execution is moving the method forward and an exception is thrown without handling, the exception is similarly caught and passed onto the caller via the task.


### Parallels with iterator and async control flow (optional)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or "Comparison"


The execution of iterator methods shifts the control between the caller and the method:
- the caller repeatedly calls `MoveNext()` (which executes the method),
- the method yields control back to the caller (when reaching a `yield return`).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this a more comfortable explanation of what is happening than the earlier discussion of "continues in background"

- the caller repeatedly calls `MoveNext()` (which executes the method),
- the method yields control back to the caller (when reaching a `yield return`).

The execution of async methods starts by the caller giving control to the method and background execution.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it would add clarity to say

"giving control to the iterator method.."

The execution of async-iterator methods mixes both of those patterns, with the control shifting between caller (repeatedly calls `MoveNextAsync()`), method (gives control to the caller when reaching a `yield return` and to the background execution when reaching an `await`) and background execution (resumes the method after an `await`).


## Lowering

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include a one sentence, or a few word clause, to define lowering

I'd rather this be named something akin to "What the compiler does to implement async iterators"

Copy link

@BillWagner BillWagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finished reading. This covers a lot of information well, and in detail.

I added additional comments as I finished reading.

A few additional writing style comments:

  • Overall, you define terms well when they are first used. I caught a couple where you didn't. When a final draft is ready, we should look for those again.
  • You sometimes use marker sentences to call out what you don't explain. "There are more details here, but I won't cover them." Consider deleting those. Don't alert readers to what you're not covering. Focus on what you are covering. You may want to add a note at the summary where you say there is additional complexity that you may cover later.

Then I'll explain `yield return` suspensions, disposal and yield breaks.


### `await`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section (and the examples that follow would really benefit from a simple concrete example. The labels and the code in the comments would be much more clear by showing an example where the reader can build a mental map from what a C# developer would write to what the C# compiler generates.


Note this is the same pattern used to lower awaits in async methods.

As you can see, we're introducing a suspension and resumption in the middle of the method.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: bold suspension and resumption on first use here. You define them well, and the bold will help readers note that these are important terms you use throughout the remainder of the article.

The resumption is the code starting from label `resumeLabelN:`.

You might wonder how execution resumes from that label. The lowering also adds logic at the beginning of the method to jumps to `resumeLabelN:` when the **state** is `N`.
So `{ ... method body... }` is lowered to start with a **dispatching** `switch` statement:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a prime example of my earlier comment regarding a concrete example. I think readers will understand this concept much better by seeing where blocks of code are rearranged to resumeLabel1, resumeLabel2 and so on

```


#### Spilling (optional)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should start this section with a sentence that defines "spilling".

}
```

This pattern can be expanded to extract `catch` handlers: each `catch` block is replaced with logic to pend the exception (save the exception into a local, remember which exception handler we were in) and the original handlers are moved out into the extracted block.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to look up "pend". Is that a common term? Is there a more well-known substitute?

`Dispose()` should execute different `finally` block(s) depdending on the state the method was suspended at:
- No code should execute when disposing an iterator that was not moved forward.
- After calling `MoveNext()` once or twice (ie. after reaching `yield return 1;` or `yield return 2;`) the `finally` should be evaluated.
- After three or four steps (ie. after reaching `yield return 3;` or the end of the method) the `finally` should not be evaluated.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: it might help readers to point out that Dispose should not execute the finally block in this case because the finally block has already been executed.

A few notes before getting started:
- Although both iterator and async-iterator methods produce disposal logic, they are implemented quite differently in the Roslyn compiler. I'll focus on disposal of async-iterator methods.
- One reason for this difference is that unlike disposal of iterator methods, the disposal of async-iterator methods can encounter suspensions from awaits. For instance, an `await` inside a `finally`.
- The caller should only invoke `DisposeAsync()` when the async-iterator method is suspended on a `yield return` statement. Invoking it at other times during the method's execution produces unspecified behavior.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This confuses me. I would think that DisposeAsync would be called after AsyncMoveNext returns false. But, I would think the async-iterator method wouldn't be suspended in that case. What am I missing?


The pseudo-code below shows what we generate. It was manually edited for readability. In particular, it more closely reflects the lowering patterns and resembles the original code structure than what you would normally see from decompilation.

You can see that:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an excellent example, and a great summary of the main concepts.

@jcouv
Copy link
Owner Author

jcouv commented Jan 10, 2019

Thanks all. I'll go over the feedback this weekend.

genlu pushed a commit that referenced this pull request Mar 7, 2019
jcouv pushed a commit that referenced this pull request Mar 16, 2019
made sync class view to get document once workspace is fully loaded
jcouv pushed a commit that referenced this pull request Jan 4, 2020
jcouv pushed a commit that referenced this pull request Jul 13, 2020
Forward integration from dotnet/roslyn
jcouv pushed a commit that referenced this pull request Feb 16, 2022
The computation of completion items is divided into two tasks:
1. "Core" items(i.e.non - expanded) which should be included in the list regardless of the selection of expander. Right now this includes all items except those from unimported namespaces.
2 .Expanded items which only show in the completion list when expander is selected, or by default if the corresponding features are enabled. Right now only items from unimported namespaces are associated with expander.

 #1 is the essence of completion so we'd always wait until its task is completed and return the results. However, because we have a really tight perf budget in completion, and computing those items in #2 could be expensive especially in a large solution (e.g.requires syntax / symbol indices and/ or runs in OOP,) we decide to kick off the computation in parallel when completion is triggered, but only include its results if it's completed by the time task #1 is completed, otherwise we don't wait on it and  just return items from #1 immediately. Task #2 will still be running in the background (until session is dismissed/committed,) and we'd check back to see if it's completed whenever we have a chance to update the completion list, i.e.when user typed another character, a filter was selected, etc. If so, those items will be added as part of the refresh.

 The reason of adopting this approach is we want to minimize typing delays. There are two ways user might perceive a delay in typing. First, they could see a delay between typing a character and completion list being displayed if they want to examine the items available. Second, they might be typing continuously w/ o paying attention to completion list, and simply expect the completion to do the "right thing" when a commit char is typed(e.g.commit "cancellationToken" when typing 'can$TAB$'). However, the commit could be delayed if completion is  still waiting on the computation of all available items, which manifests as UI delays and in worst case timeouts in commit which results in unexpected behavior(e.g.typing 'can$TAB$' results in a 250ms UI freeze and still ends up with "can" instead of "cancellationToken".)

This approach would ensure the computation of #2 will not be the cause of such delays, with the obvious trade off of potentially not providing expanded items until later(or never) in a completion session even if the feature is enabled.Note that in most cases we'd expect task #2 to finish in time and complete result would be available from the start of the session.However, even in the case only partial result is returned at the start, we still believe this is acceptable given how critical perf is in typing scenario. Additionally, expanded items are usually considered complementary. The need for them only rise occasionally(it's rare when users need to add imports,) and when they are needed, our hypothesis is because of their more intrusive nature(adding an import to the document) users would more likely to contemplate such action thus typing slower before commit and / or spending more time examining the list, which give us some opportunities to still provide those items later before they are truly required.

In this commit we have to handle adding those delayed expanded items into completion list in Roslyn. This is because the `CompletionContext.IsIncomplete` flag isn't fully supported in classic mode. Will need to move to that API once it's implemented properly.
jcouv pushed a commit that referenced this pull request Apr 2, 2024
jcouv pushed a commit that referenced this pull request Oct 23, 2024
…tionAPI

Access services on Solution rather than Workspace
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants