[BUG] EventHubListener causes message lost in shutdown #41784
Comments
@JoshLove-msft: Would you please advise on whether this is related to the ask on the Functions team to expose whether an execution would be retried if the host is not shutting down? If so, please transfer to the Functions host repo.
Thank you for your feedback. Tagging and routing to the team member best able to assist.
@yfujiwara-sansan, I couldn't reproduce having _functionExecutionToken be canceled with linkedCts still not being cancelled. That said, it looks like you are correct that cancellation is not atomic when it comes to linked token sources, so it is possible that one of the sources can be canceled while the linked source is still not canceled. I will make a fix to explicitly check each of the token sources when making the checkpointing decision.
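For illustration, a minimal sketch of what "explicitly check each of the token sources" could look like. The names CheckpointGuard and ShouldCheckpoint are hypothetical and are not taken from the extension's source; the sketch only shows the idea of consulting the source tokens directly instead of trusting the linked token alone.

```csharp
using System.Threading;

// Hypothetical helper, not part of Microsoft.Azure.WebJobs.Extensions.EventHubs.
public static class CheckpointGuard
{
    // Returns true only when neither the linked token nor any of the
    // tokens it was built from has been canceled. Checking the source
    // tokens directly covers the window in which a source is already
    // canceled but the linked source has not been notified yet.
    public static bool ShouldCheckpoint(
        CancellationToken linkedToken,
        params CancellationToken[] sourceTokens)
    {
        if (linkedToken.IsCancellationRequested)
        {
            return false;
        }

        foreach (CancellationToken token in sourceTokens)
        {
            if (token.IsCancellationRequested)
            {
                return false;
            }
        }

        return true;
    }
}
```

A caller would pass linkedCts.Token together with every token that was linked into it (for example, _functionExecutionToken) and skip the checkpoint when the result is false.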
@JoshLove-msft Thank you for your work! I guess that the repro code sometimes fails to reproduce the issue because of how continuations are scheduled. By the way, as far as I can tell from reading #38067 again, it looks that |
The semantics of the token passed to the function are that it is signaled only when shutting down in a way that is not guaranteed to allow the function to complete execution. In terms of whether it makes sense to also cancel this token when partition ownership is lost, I would have to defer to @jsquire on that.
I understand the semantics of the token passed to the app. The cancellation behavior on ownership loss should be discussed as a separate issue because it is intentional behavior. Thank you for your answer! My colleagues and I are waiting for the race condition to be fixed.
@JoshLove-msft : The token that the processor passes to |
I don't think there was a good reason - it may have just been an oversight. Updated the PR to pass linkedCts token to TryExecute.
Thank you for the fix, but let me confirm just in case. Is it intentional to fix only the single dispatch path? It looks like the following arguments should be fixed, too:
|
You are right again. Apologies for the oversight. I will add these updates in.
Library name and version
Microsoft.Azure.WebJobs.Extensions.EventHubs 6.0.2 and 5.5.0
Describe the bug
EventHubListener.PartitionProcessor advances the checkpoint even when the application is shutting down (for example, on a configuration change or scaling). This causes message loss. Note that this issue occurs only occasionally.
My colleague and I believe that this issue was fixed by #36432 but reintroduced by #38067, based on the investigation results below.
Our investigation results
In this situation, while _functionExecutionToken.IsCancellationRequested had become true, linkedCts.IsCancellationRequested had not yet become true. So, the checkpoint was advanced even though the application execution had been cancelled.
From the LinkedCancellationTokenSource source code, the following facts were found:
- LinkedCancellationTokenSource (linkedCts) is cancelled via a callback of the "linked" cancellation token (_functionExecutionToken's source).
- Task.Delay(CancellationToken) implementation.
By watching the call stack at the checkpointing, the following sequence was observed:
- WebHost calls the listener's StopAsync()
- CancellationTokenSource.Cancel() (source of _functionExecutionToken)
- linkedCts.IsCancellationRequested to true as described above. So, the checkpoint is advanced because linkedCts.IsCancellationRequested has not become true yet.
Expected behavior
The checkpoint is never advanced when the application process is shutting down (_functionExecutionToken is cancelled).
Actual behavior
The checkpoint is occasionally advanced.
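The window described in the investigation can be illustrated outside the extension with plain CancellationTokenSource objects. This is a sketch under assumptions, not the listener's code: LinkedTokenGapDemo and ObserveGap are hypothetical names, and the order in which cancellation callbacks run is an implementation detail of the runtime, so the probe is registered both before and after the linked source is created.

```csharp
using System;
using System.Threading;

// Hypothetical demo class; it does not use any Event Hubs types.
public static class LinkedTokenGapDemo
{
    // Returns true if a callback observed the source as canceled while
    // the linked source was still not canceled.
    public static bool ObserveGap()
    {
        using (var source = new CancellationTokenSource())
        {
            CancellationTokenSource linked = null;
            bool sawGap = false;

            // Runs inside source.Cancel(), on the canceling thread.
            Action probe = () =>
            {
                if (source.IsCancellationRequested
                    && linked != null
                    && !linked.IsCancellationRequested)
                {
                    sawGap = true;
                }
            };

            // Register on both sides of the linked source's own
            // registration, so that whichever probe runs before the
            // linked callback can see the intermediate state.
            source.Token.Register(probe);
            linked = CancellationTokenSource.CreateLinkedTokenSource(source.Token);
            source.Token.Register(probe);

            // The source is marked as canceled first; its callbacks,
            // including the one that cancels the linked source, run
            // afterwards.
            source.Cancel();

            linked.Dispose();
            return sawGap;
        }
    }
}
```

Any code that runs in that window and consults only linkedCts.IsCancellationRequested, such as a checkpointing decision, would still see false even though the source token is already canceled.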
Reproduction Steps
- Recieve method started, press Ctrl + C to shut down the process.
- linkedCts and _functionExecutionToken states with a breakpoint in the checkpointing.
Environment