-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure operators are always closed #13721
Conversation
I do not understand why in the previous code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you find the issue? Is it a regression? Would it be possible to write a test?
Yeah, it's a tricky one. I tried my best to explain it with words in the description, but I can see how it can still be difficult to understand. Let me include some pictures to better display the race. Let's assume a thread that was executing a task that throws has executed the At the same very moment some other thread calls But since the first thread throws and won't execute the post script: As a result a |
@sopel39 The issue is discovered when debugging flakiness from #11275. I was trying to write a test, but it become very cumbersome very fast, as it requires rather tricky interactions. To write a test I would need to stop one thread after executing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank @arhimondr for images. This is super clear now. And indeed the fix looks valid.
f3a62ff
to
4401c1f
Compare
throw new RuntimeException(failure); | ||
} | ||
|
||
verify(result != null, "result is null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary. The Optional.of()
construction will fail if it's null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this more as a documentation so the intention is clearly stated. Otherwise when some task starts returning null
somebody may interpret the Optional.of
as a mistake and switch it to Optional.ofNullable
instead of fixing the actual problem.
Due to a race condition in Driver#tryWithLock there was a chance an operator might have end up not being properly closed upon completion. - Driver#process executes an operator under the Driver#exclusiveLock - An operator throws an exception and triggers a task failure - A TaskStateMachine listener closes all drivers calling Driver#close - Driver#close is not able to acquire the Driver#exclusiveLock and assumes the driver will be terminated by the lock owner - The lock owner throws an exception and never runs post execution code in Driver#tryWithLock that was expected to close the operators
4401c1f
to
20037c9
Compare
Description
Due to a race condition in Driver#tryWithLock there was a chance an
operator might have end up not being properly closed upon completion.
assumes the driver will be terminated by the lock owner
execution code in Driver#tryWithLock that was expected to close
the operators
Fix
Core engine
N/A
Related issues, pull requests, and links
#11275
Documentation
(X) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(X) No release notes entries required.
( ) Release notes entries required with the following suggested text: