Make scheduling of remote accessible splits with addresses more strict #22190

Merged
merged 1 commit into from
Jun 11, 2024

Member

@sopel39 sopel39 commented May 29, 2024

Description

UniformNodeSelector or the FTE scheduler will schedule remotely accessible splits on the selected nodes if such nodes are available, and will only fall back to other nodes if the preferred nodes are no longer part of the cluster. A connector might have stale node information while creating splits, which could result in selecting nodes that are now offline. Additionally, in FTE mode nodes can go down, so split addresses might no longer be valid when a task is restarted.

Additionally, this commit simplifies UniformNodeSelector optimizedLocalScheduling, which was hard to reason about and was not taking advantage of recent improvements such as adaptive split queue length.
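
To illustrate the stricter rule described above, here is a minimal sketch. It is not the actual Trino implementation: `StrictSplitPlacement`, `SplitInfo`, and `candidateNodes` are hypothetical names, and nodes are modeled as plain host strings. It only shows the intended ordering: stay on preferred nodes while any of them is still in the cluster, and fall back to other nodes only when none remain and the split is remotely accessible.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class StrictSplitPlacement
{
    // Hypothetical stand-in for a split; not Trino's Split class.
    record SplitInfo(List<String> addresses, boolean remotelyAccessible) {}

    // Returns the nodes a split may be scheduled on under the stricter rule.
    static Set<String> candidateNodes(SplitInfo split, Set<String> activeNodes)
    {
        // Keep only the preferred addresses that are still part of the cluster.
        Set<String> preferred = new LinkedHashSet<>(split.addresses());
        preferred.retainAll(activeNodes);
        if (!preferred.isEmpty()) {
            // Strict: preferred nodes win whenever at least one is alive.
            return preferred;
        }
        if (split.remotelyAccessible()) {
            // All preferred nodes are gone (for example stale connector
            // information or a node lost before a task restart): use any node.
            return activeNodes;
        }
        throw new IllegalStateException("No active node for a split that is not remotely accessible");
    }

    public static void main(String[] args)
    {
        Set<String> active = Set.of("worker-1", "worker-2");
        // One preferred node is alive, so only that node is a candidate.
        System.out.println(candidateNodes(new SplitInfo(List.of("worker-2", "worker-9"), true), active));
        // No preferred node is alive, so every active node is a candidate.
        System.out.println(candidateNodes(new SplitInfo(List.of("worker-9"), true), active));
    }
}
```

How a real scheduler picks among the candidates (queue length, load) is out of scope for this sketch.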

Extracted from: #21888

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# General
* Make scheduling of splits more strict when preferred workers are present. ({issue}`issuenumber`)

@sopel39 sopel39 force-pushed the ks/remote_accessible branch from bf5f078 to 445636a Compare May 29, 2024 21:29
Contributor

@dekimir dekimir left a comment

The commit message definitely needs improving. This is an important change, and it deserves a careful description.

@sopel39 sopel39 force-pushed the ks/remote_accessible branch from 445636a to 215e0dc Compare June 5, 2024 22:06
@sopel39 sopel39 requested a review from dekimir June 5, 2024 22:06
@sopel39 sopel39 force-pushed the ks/remote_accessible branch from 215e0dc to 2c5c0e6 Compare June 7, 2024 13:33
@@ -139,7 +139,7 @@ public AssignmentResult assign(PlanNodeId planNodeId, ListMultimap<Integer, Spli
         .orElse(ImmutableSet.of());
 assignment.addPartition(new Partition(
         taskPartitionId,
-        new NodeRequirements(catalogRequirement, hostRequirement)));
+        new NodeRequirements(catalogRequirement, hostRequirement, true)));
Member Author

@losipiuk @dekimir can a hash distribution split be remotely accessible even if sourcePartitioningScheme.getNodeRequirement is not empty?
Should it be conditional on hostRequirement.isEmpty()?
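
The two options in this question can be sketched side by side. This is a simplified illustration, not Trino's `NodeRequirements` class: the record below, its field names, and the meaning of the third argument as a "remotely accessible" flag are assumptions, and hosts are modeled as plain strings.

```java
import java.util.Optional;
import java.util.Set;

public class NodeRequirementsSketch
{
    // Hypothetical simplification; field names are assumptions.
    record NodeRequirements(Optional<String> catalog, Set<String> hosts, boolean remotelyAccessible) {}

    // Option A: always mark the partition as remotely accessible,
    // matching the literal `true` passed in the diff above.
    static NodeRequirements alwaysRemote(Optional<String> catalog, Set<String> hosts)
    {
        return new NodeRequirements(catalog, hosts, true);
    }

    // Option B: treat a non-empty host requirement as a hard constraint,
    // so the partition is remotely accessible only when no host is required.
    static NodeRequirements remoteOnlyWithoutHosts(Optional<String> catalog, Set<String> hosts)
    {
        return new NodeRequirements(catalog, hosts, hosts.isEmpty());
    }
}
```

With option A a partition whose required host has left the cluster could still run elsewhere; with option B it could not.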

@@ -83,7 +83,7 @@ public AssignmentResult finish()
 if (!partitionAdded) {
     partitionAdded = true;
     result
-            .addPartition(new Partition(0, new NodeRequirements(Optional.empty(), hostRequirement)))
+            .addPartition(new Partition(0, new NodeRequirements(Optional.empty(), hostRequirement, true)))
Member Author

and here @losipiuk @dekimir

@sopel39 sopel39 force-pushed the ks/remote_accessible branch from 2c5c0e6 to 48841b2 Compare June 10, 2024 19:47
UniformNodeSelector or the FTE scheduler will schedule remotely accessible splits
on the selected nodes if such nodes are available, and will only fall back to other
nodes if the preferred nodes are no longer part of the cluster. A connector might have stale
node information when creating splits, which could result in choosing offline nodes.
Additionally, in FTE mode nodes can go down, so split addresses might no longer
be valid when a task is restarted.

Additionally, this commit simplifies UniformNodeSelector optimizedLocalScheduling,
which was hard to reason about and was not taking advantage of
recent improvements such as adaptive split queue length.

Co-authored-by: Karol Sobczak <[email protected]>
@sopel39 sopel39 force-pushed the ks/remote_accessible branch from 48841b2 to 8e8ca54 Compare June 11, 2024 09:53
@sopel39 sopel39 merged commit 2bedffb into trinodb:master Jun 11, 2024
95 checks passed
@sopel39 sopel39 deleted the ks/remote_accessible branch June 11, 2024 12:14
@github-actions github-actions bot added this to the 450 milestone Jun 11, 2024
@colebow
Member

colebow commented Jun 11, 2024

We don't really document that there are selected nodes as part of FTE... so I'm not sure how user-visible this change would be or if a release note is necessary?

@sopel39
Member Author

sopel39 commented Jun 11, 2024

> We don't really document that there are selected nodes as part of FTE... so I'm not sure how user-visible this change would be or if a release note is necessary?

It's more for devs, so it should probably go in the SPI section.

6 participants