Enable failure recovery for Delta connector #11666

Merged
merged 9 commits into trinodb:master from lo/delta-failure-recovery
Mar 30, 2022

Conversation

losipiuk
Member

Description

Enable failure recovery for the Delta connector and add test coverage for it.

Is this change a fix, improvement, new feature, refactoring, or other?

improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Delta connector

Related issues, pull requests, and links

Fixes: #11591

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Delta
* Allow running queries performing DML on Delta tables with fault-tolerant
  execution. ({issue}`11591`)

@@ -187,6 +191,7 @@ public static DistributedQueryRunner createDockerizedDeltaLakeQueryRunner(
.build();

DistributedQueryRunner.Builder<?> builder = DistributedQueryRunner.builder(session);
coordinatorProperties.forEach(builder::setSingleCoordinatorProperty);
Member

why not setCoordinatorProperties?

(also, move after extra)

Member Author

> why not setCoordinatorProperties?

So that we are not overriding props for keys we do not care about, should there be some.
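
The difference behind this reply can be sketched with a toy builder. This is only an illustration under stated assumptions: `RunnerBuilderSketch`, `setAll`, `setSingle`, and the keys `preexisting.key` and `example.key` are hypothetical names invented here, not the real `DistributedQueryRunner.Builder` API. It assumes a whole-map setter replaces whatever was configured earlier, while a single-key setter leaves other keys alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a query runner builder; NOT the real Trino class.
final class RunnerBuilderSketch
{
    private Map<String, String> coordinatorProperties = new LinkedHashMap<>();

    RunnerBuilderSketch()
    {
        // Pretend an earlier step already configured a key the caller does not know about.
        coordinatorProperties.put("preexisting.key", "kept");
    }

    // Whole-map setter (assumed behavior): replaces everything configured before.
    RunnerBuilderSketch setAll(Map<String, String> properties)
    {
        this.coordinatorProperties = new LinkedHashMap<>(properties);
        return this;
    }

    // Single-key setter: touches only the given key.
    RunnerBuilderSketch setSingle(String key, String value)
    {
        coordinatorProperties.put(key, value);
        return this;
    }

    Map<String, String> build()
    {
        return new LinkedHashMap<>(coordinatorProperties);
    }

    // Mirrors the shape of `coordinatorProperties.forEach(builder::setSingleCoordinatorProperty)`.
    static Map<String, String> viaSingle(Map<String, String> extra)
    {
        RunnerBuilderSketch builder = new RunnerBuilderSketch();
        extra.forEach(builder::setSingle);
        return builder.build();
    }

    // The alternative the reviewer asked about, under the replace-all assumption.
    static Map<String, String> viaAll(Map<String, String> extra)
    {
        return new RunnerBuilderSketch().setAll(extra).build();
    }
}
```

Under these assumptions, `viaSingle` keeps `preexisting.key` next to the new entry, while `viaAll` drops it.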

{
Session defaultSession = getQueryRunner().getDefaultSession();
return Session.builder(defaultSession)
.setSystemProperty(ENABLE_DYNAMIC_FILTERING, Boolean.toString(enabled))
.setSystemProperty(JOIN_REORDERING_STRATEGY, NONE.name())
.setSystemProperty(JOIN_DISTRIBUTION_TYPE, PARTITIONED.name())
.setCatalogSessionProperty(defaultSession.getCatalog().orElseThrow(), "dynamic_filtering_wait_timeout", "1h")
Member

This was added in 504c9cf ....

@@ -812,14 +812,13 @@ private static StageStats getRootStage(MaterializedResult result)
return requireNonNull(statementStats.getRootStage(), "root stage is null");
}

- private Session enableDynamicFiltering(boolean enabled)
+ protected Session enableDynamicFiltering(boolean enabled)
Member

Isn't DF enabled by default?
Why do we have this method?

Why is it also disabling CBO?

Member Author

@raunaqmorarka can you comment on this one? You introduced this test initially.

Member

@raunaqmorarka raunaqmorarka Mar 25, 2022

CBO is disabled in DF tests so that the join order remains the syntactic one. We usually want to follow the test writer's intent about which table they put on probe and build for DF tests (e.g. sometimes we want to test with large build tables).
DF is enabled by default, except when task retries are enabled. I added this method so that I can test joins with and without DF explicitly in the retries mode.
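
The default described here can be written down as a tiny decision function. This is a sketch of the rule as stated in the comment, not engine code: `DynamicFilteringDefaultSketch`, `effectiveDynamicFiltering`, and the local `RetryPolicy` enum are hypothetical names, and the assumption is that an explicit session setting always wins over the default.

```java
import java.util.Optional;

// Hypothetical model of the rule from the comment above; NOT Trino's implementation.
final class DynamicFilteringDefaultSketch
{
    // Local enum used only for this illustration.
    enum RetryPolicy { NONE, QUERY, TASK }

    private DynamicFilteringDefaultSketch() {}

    // An explicit setting wins (assumption); otherwise DF is on unless task retries are enabled.
    static boolean effectiveDynamicFiltering(Optional<Boolean> explicitSetting, RetryPolicy retryPolicy)
    {
        return explicitSetting.orElse(retryPolicy != RetryPolicy.TASK);
    }
}
```

With task retries and no explicit setting the function returns false, which is why a test that wants to cover joins both with and without DF in retries mode has to set the property itself.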

Comment on lines +35 to +36
public class TestDeltaTaskFailureRecoveryTest
extends BaseDeltaFailureRecoveryTest
Member

Is this the only subclass of BaseDeltaFailureRecoveryTest?

Member Author

Yeah - I considered adding also TestDeltaQueryFailureRecoveryTest.java but I am not convinced it is necessary. @arhimondr if you agree I will flatten the hierarchy.

@losipiuk losipiuk force-pushed the lo/delta-failure-recovery branch 4 times, most recently from b4b8d9a to 5bf6906 Compare March 28, 2022 12:34
@losipiuk losipiuk force-pushed the lo/delta-failure-recovery branch 2 times, most recently from 1914711 to 7719cfa Compare March 29, 2022 06:44
@losipiuk losipiuk force-pushed the lo/delta-failure-recovery branch from 7719cfa to 1cc5d83 Compare March 29, 2022 08:51
@losipiuk losipiuk force-pushed the lo/delta-failure-recovery branch from 1cc5d83 to b43e7c1 Compare March 29, 2022 14:45
Session session = super.enableDynamicFiltering(enabled);
return Session.builder(session)
// Ensure probe side scan wait until DF is collected
.setCatalogSessionProperty(session.getCatalog().orElseThrow(), "dynamic_filtering_wait_timeout", "1h")
Member

After #11709 lands, dynamic_filtering_wait_timeout can go back to being in BaseFailureRecoveryTest and this override won't be needed.
fyi @ebyhr
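
The override pattern under discussion (a base test builds the session, a connector-specific subclass layers one extra catalog property on top) can be sketched without any Trino types. Everything here is a simplified assumption: sessions are modeled as plain string maps, and `BaseRecoveryTestSketch` and `DeltaRecoveryTestSketch` are hypothetical class names. The property keys are copied from the snippets in this review.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical base test: models the session as a plain map of properties.
class BaseRecoveryTestSketch
{
    // Non-private so that subclasses can override it, as in the visibility change under review.
    protected Map<String, String> enableDynamicFiltering(boolean enabled)
    {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("ENABLE_DYNAMIC_FILTERING", Boolean.toString(enabled));
        properties.put("JOIN_REORDERING_STRATEGY", "NONE");
        properties.put("JOIN_DISTRIBUTION_TYPE", "PARTITIONED");
        return properties;
    }
}

// Hypothetical connector-specific test: reuses the base session and adds one property.
class DeltaRecoveryTestSketch
        extends BaseRecoveryTestSketch
{
    @Override
    protected Map<String, String> enableDynamicFiltering(boolean enabled)
    {
        Map<String, String> properties = new LinkedHashMap<>(super.enableDynamicFiltering(enabled));
        // Make the probe-side scan wait until dynamic filters are collected.
        properties.put("dynamic_filtering_wait_timeout", "1h");
        return properties;
    }
}
```

If the wait timeout moved back into the base class, as suggested above, the subclass override in this sketch would become redundant and could be deleted.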

@losipiuk losipiuk merged commit 8b6888c into trinodb:master Mar 30, 2022
@github-actions github-actions bot added this to the 376 milestone Mar 30, 2022
Development

Successfully merging this pull request may close these issues.

Enable query retries for Delta Lake Connector
3 participants