Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter out fully acknowledged split assignments #14668

Merged

Conversation

arhimondr
Copy link
Contributor

@arhimondr arhimondr commented Oct 17, 2022

Here is a sequence of interaction that leads to a failure at the
!this.noMoreSplits.get() || noMoreSplits check in
DriverSplitRunnerFactory#enqueueSplits:

  • The coordinator sends a TaskUpdateRequest[1] containing some splits and a noMoreSplits
    set to false
  • A TaskUpdateRequest[1] times out on coordinator and coordinator
    re-sends it as TaskUpdateRequest[1']
  • The TaskUpdateRequest[1'] succeeds and the coordinator sends a TaskUpdateRequest[2]
    with noMoreSplits set to true.
  • The TaskUpdateRequest[2] succeeds
  • The original TaskUpdateRequest[1] arrives to the worker node. The
    splits in this requests are getting filtered out by the logic in the updateSplitAssignments
  • DriverSplitRunnerFactory#enqueueSplits is called with an empty list of
    splits, so the first (really important check) "!this.noMoreSplits.get() || splits.isEmpty()"
    passes, but the second check (not so important) "!this.noMoreSplits.get() || noMoreSplits" fails
    what leads to a task failure

To avoid a failure of the check mentioned we filter out fully
acknowledged SplitAssignment's (ones that contain no unacknowledged
splits)

Description

Non-technical explanation

  • Fix rare query failures with com.google.common.base.VerifyException: cannot unset noMoreSplits

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Core engine
* Fix rare query failures with `com.google.common.base.VerifyException: cannot unset noMoreSplits`

@@ -589,7 +589,6 @@ public void enqueueSplits(Set<ScheduledSplit> splits, boolean noMoreSplits)
verify(driverFactory.getSourceId().isPresent(), "not a source driver");
verify(!this.noMoreSplits.get() || splits.isEmpty(), "cannot add splits after noMoreSplits is set");
queuedSplits.addAll(splits);
verify(!this.noMoreSplits.get() || noMoreSplits, "cannot unset noMoreSplits");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you like to add a document here to explain the problem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - please add a sentence of explanation that here we can get non empty splits list even if we already transitioned to noMoreSplits state, and this is not a big deal as those will be filtered out later.
Actually we can probably drop it here already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait - those are filtered out before, not after the call :) Please just note that in the comment.

- Rename local variable
- Use toImmutableSet collector to avoid creating a copy in the
  SplitAssignment constructor
Here is a sequence of interaction that leads to a failure at the
 `!this.noMoreSplits.get() || noMoreSplits` check in
DriverSplitRunnerFactory#enqueueSplits:

- The coordinator sends a TaskUpdateRequest[1] containing some splits and a noMoreSplits
set to false
- A TaskUpdateRequest[1] times out on coordinator and coordinator
re-sends it as TaskUpdateRequest[1']
- The TaskUpdateRequest[1'] succeeds and the coordinator sends a TaskUpdateRequest[2]
with noMoreSplits set to true.
- The TaskUpdateRequest[2] succeeds
- The original TaskUpdateRequest[1] arrives to the worker node. The
splits in this requests are getting filtered out by the logic in the updateSplitAssignments
- DriverSplitRunnerFactory#enqueueSplits is called with an empty list of
splits, so the first (really important check) "!this.noMoreSplits.get() || splits.isEmpty()"
passes, but the second check (not so important) "!this.noMoreSplits.get() || noMoreSplits" fails
what leads to a task failure

To avoid a failure of the check mentioned we filter out fully
acknowledged SplitAssignment's (ones that contain no unacknowledged
splits)
@arhimondr arhimondr force-pushed the remove-unnecessary-no-more-splits-check branch from 71b9fc2 to 8dbddb2 Compare October 20, 2022 20:01
@arhimondr arhimondr changed the title Remove incorrect noMoreSplits check in DriverSplitRunnerFactory Filter out fully acknowledged split assignments Oct 20, 2022
@arhimondr
Copy link
Contributor Author

Instead of writing an elaborative comment I decided to simply fix the issue. Now I'm filtering out split assignments that are fully acknowledged. @linzebing @losipiuk Please let me know what you think.

@zhengxingmao zhengxingmao self-requested a review October 21, 2022 02:16
@arhimondr arhimondr merged commit 924dfdd into trinodb:master Oct 21, 2022
@arhimondr arhimondr deleted the remove-unnecessary-no-more-splits-check branch October 21, 2022 14:46
@github-actions github-actions bot added this to the 401 milestone Oct 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants