
Ensure dynamic filters are always delivered to coordinator #13695

Merged · 4 commits merged into trinodb:master on Sep 7, 2022

Conversation

arhimondr (Contributor)

Description

In certain scenarios, dynamic filters were not delivered to the coordinator as expected. This was causing problems with tests on CI (#13507).

For consistency reasons, the tests are configured to wait for dynamic filters indefinitely; if the filters are never delivered, the query never finishes.

Currently, dynamic filters are delivered via DynamicFilterFetcher. When a TaskStatus contains an updated dynamic filter version, the DynamicFilterFetcher is responsible for fetching the new dynamic filters from a worker.

In certain cases the dynamic filter version was never updated:

  1. The task was transitioned to FINISHED before LocalDynamicFilterConsumer#setPartitionCount was called (addressed by the "Transition task to FINISHED after noMoreOperators is set" commit).
  2. The task reported a blank TaskStatus due to a race in SqlTaskExecution creation (addressed by the "Update task holder before starting task execution" commit).
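The version-based protocol described above can be sketched as follows. This is a minimal, hypothetical illustration (the class and method names are not Trino's actual code): the coordinator polls TaskStatus and issues a fetch only when the worker reports a dynamic filter version newer than the one already collected, so a version that is never bumped means a fetch is never triggered.

```java
// Hypothetical sketch of version-gated dynamic filter fetching.
// The coordinator compares the version reported in TaskStatus against
// the version it has already collected; a fetch is triggered only when
// the reported version is newer.
public class DynamicFilterVersionTracker {
    private long collectedVersion; // version already fetched from the worker

    // Returns true when a fetch should be issued for the reported version.
    public synchronized boolean shouldFetch(long reportedVersion) {
        return reportedVersion > collectedVersion;
    }

    // Record that filters up to the given version were fetched.
    public synchronized void acknowledge(long fetchedVersion) {
        collectedVersion = Math.max(collectedVersion, fetchedVersion);
    }

    public synchronized long getCollectedVersion() {
        return collectedVersion;
    }
}
```

This makes the failure mode concrete: if a task reaches FINISHED with the reported version still at its initial value, `shouldFetch` never returns true and the coordinator never retrieves the filters.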

Is this change a fix, improvement, new feature, refactoring, or other?

Fix

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Core engine

How would you describe this change to a non-technical end user or system administrator?

N/A

Related issues, pull requests, and links

#13507
#12152

Documentation

(X) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(X) No release notes entries required.
( ) Release notes entries required with the following suggested text:

sopel39 (Member) commented Aug 17, 2022

Is this a regression? Should this be marked as bug or release-blocker?

raunaqmorarka (Member) commented Aug 17, 2022

> Is this a regression? Should this be marked as bug or release-blocker?

> Task was getting transitioned to FINISHED before LocalDynamicFilterConsumer#setPartitionCount was called (addressed by the "Transition task to FINISHED after noMoreOperators is set" commit).

This part is definitely a regression, as we weren't relying on noMoreOperators for setPartitionCount before #12594. It should probably be a release blocker, since we rely on blocking the probe side by default for JDBC dynamic filtering.

sopel39 added the bug (Something isn't working) and RELEASE-BLOCKER labels on Aug 17, 2022
sopel39 (Member) commented Aug 17, 2022

> This part is definitely a regression, as we weren't relying on noMoreOperators for setPartitionCount before #12594. It should probably be a release blocker, since we rely on blocking the probe side by default for JDBC dynamic filtering.

I guess if we had JDBC dynamic filtering in master, we would have spotted that.

arhimondr (Contributor, author) commented

It shouldn't matter for pipelined execution, as with pipelined execution dynamic filters are reported before the task is finished. For fault-tolerant execution there is a small possibility that the execution might get stuck if the underlying connector waits indefinitely for dynamic filters. Is this the case for JDBC-based connectors?

sopel39 (Member) commented Aug 17, 2022

> Is this the case for JDBC-based connectors?

JDBC will wait until the timeout. However, waiting until the timeout for every query (even when the build side completed quickly) would introduce regressions.

arhimondr (Contributor, author) commented

@sopel39 The likelihood of it happening is rather small. As long as a query is able to unblock on timeout, it shouldn't be a big issue.
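The timeout behavior discussed here can be sketched as bounded blocking on the probe side. This is an illustrative example only (the names, the pass-through value, and the use of CompletableFuture are assumptions, not the JDBC connector's actual implementation): wait up to a deadline for the filter, then fall back to a predicate that prunes nothing so the query can still make progress.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative sketch of waiting for a dynamic filter with a timeout:
// block until the filter future completes or the deadline elapses,
// then proceed with a pass-through predicate ("TRUE") instead of
// waiting indefinitely.
public class DynamicFilterWait {
    public static String awaitOrPassThrough(CompletableFuture<String> filter, long timeoutMillis) {
        try {
            // Filter arrived in time: push it down to the data source.
            return filter.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Deadline elapsed: give up on pruning and scan everything.
            return "TRUE";
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The tradeoff raised in the thread is visible here: a filter that never arrives costs the full timeout on every affected query, which is why relying on the timeout as the normal path would be a regression.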

Review thread on the following code:

```java
}

@GuardedBy("this")
private PerPipelineStatus per(int pipelineId)
```

raunaqmorarka (Member): Earlier, updates to the statuses of different pipelines could not be done in parallel; now they can be (which appears to be a good thing). Was this level of synchronization unnecessary?

arhimondr (Contributor, author): It looks like it, unless it was needed back when grouped execution was still there.
Remove DriverSplitRunnerFactory#Status

Currently dynamic filters are delivered via DynamicFilterFetcher. When a TaskStatus contains an updated dynamic filter version, the DynamicFilterFetcher is responsible for fetching the new dynamic filters from a worker. However, tasks were getting transitioned to FINISHED before LocalDynamicFilterConsumer#setPartitionCount was called and the dynamic filters were updated. When a task is transitioned to FINISHED, the TaskStatusFetcher no longer tries to update the TaskStatus, so the coordinator is not able to learn that there are dynamic filters to be fetched.

Otherwise there is a chance that the task may finish before the task holder is updated, resulting in an incomplete final TaskStatus being created. Since the TaskStatus is used to signal the DynamicFilterFetcher to get dynamic filters, the coordinator may never learn that there are dynamic filters to be retrieved.
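The task-holder race described above comes down to publication order. A minimal sketch, with hypothetical names (not the actual SqlTaskExecution code): the execution must be published to the holder before it is started, so that a status request racing with task completion can never observe a started task through a blank holder.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch of the ordering fix: publish the execution to the
// holder first, then start it. If the order were reversed, a task could
// finish (and its final status be built) while the holder was still
// empty, producing an incomplete final TaskStatus.
public class TaskHolder {
    private final AtomicReference<String> execution = new AtomicReference<>();

    public void createAndStart(String newExecution, Runnable start) {
        execution.set(newExecution); // publish before starting
        start.run();                 // now the task may run and finish
    }

    // A status built from the holder; "blank" is only possible if the
    // execution was never published.
    public String status() {
        String e = execution.get();
        return e == null ? "blank" : e;
    }
}
```

With this ordering, any status observed after `start.run()` begins necessarily reflects the published execution, which is what lets the coordinator learn about pending dynamic filters from the final TaskStatus.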
arhimondr (Contributor, author) commented

@raunaqmorarka Updated

@arhimondr arhimondr merged commit e5f6e10 into trinodb:master Sep 7, 2022
@arhimondr arhimondr deleted the fix-df-task-race branch September 7, 2022 18:27
@github-actions github-actions bot added this to the 395 milestone Sep 7, 2022
Labels: bug (Something isn't working), cla-signed