Support lazy dynamic filtering in hive connector #4991

Merged
merged 2 commits into from
Sep 9, 2020

Conversation

raunaqmorarka
Member

No description provided.

@cla-bot cla-bot bot added the cla-signed label Aug 26, 2020
@raunaqmorarka raunaqmorarka force-pushed the hive_df_delay branch 15 times, most recently from 345f0f0 to f673884 Compare August 31, 2020 15:24
Member

@sopel39 sopel39 left a comment

looks good

@raunaqmorarka raunaqmorarka force-pushed the hive_df_delay branch 2 times, most recently from f9e631f to 9f21a18 Compare September 3, 2020 06:45
Contributor

@rzeyde-varada rzeyde-varada left a comment

LGTM :)

@raunaqmorarka raunaqmorarka force-pushed the hive_df_delay branch 3 times, most recently from cfbd57d to 80f43f9 Compare September 6, 2020 06:25
Member

@sopel39 sopel39 left a comment

lgtm % comments


OperatorStats probeStats = searchScanFilterAndProjectOperatorStats(result.getQueryId(), "tpch:" + PARTITIONED_LINEITEM);
// Probe-side is partially scanned
assertLessThan(probeStats.getInputPositions(), countRows("lineitem"));
Member

Question: is the scanned row count the same on every run?

Member Author

Yes. We use the text storage format and partitioned joins, so the only pruning here should come from dynamic filtering (DF), and it should be consistent because of the probe blocking logic. I've updated the test to assert on a specific row count.

Member

@sopel39 sopel39 left a comment

lgtm + please add test case for large build side % minor comments

assertEquals(dynamicFiltersStats.getTotalDynamicFilters(), 2L);
assertEquals(dynamicFiltersStats.getLazyDynamicFilters(), 2L);
assertEquals(dynamicFiltersStats.getReplicatedDynamicFilters(), 0L);
assertBetweenInclusive(dynamicFiltersStats.getDynamicFiltersCompleted(), 1, 2);
Member Author

@sopel39 FYI, I tweaked this test a bit to assert on completion of only one filter. Without this I was seeing some flakiness, because the probe side is unblocked after receiving one filter and the query can sometimes finish before DynamicFilterService is able to collect the second filter. If I set experimental.dynamic-filtering-refresh-interval to a very low value such as 20ms, the flakiness goes away.

Member Author

After changing to wait on the full DF, the above tweak is no longer needed.
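
For readers following this thread, the difference between unblocking the probe side on the first filter and waiting on all filters can be illustrated with plain CompletableFuture. This is only a sketch of the race described above, not the code in this PR: the class and method names (ProbeBlockingSketch, unblockOnAny, unblockOnAll, completedCount) are hypothetical, and each future simply stands in for one dynamic filter.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration; none of these names come from the Trino code base.
public class ProbeBlockingSketch {
    // Completes as soon as ANY one filter is ready (the flaky variant in the thread).
    static CompletableFuture<Void> unblockOnAny(List<CompletableFuture<String>> filters) {
        return CompletableFuture.anyOf(filters.toArray(new CompletableFuture<?>[0]))
                .thenAccept(ignored -> {});
    }

    // Completes only once ALL filters are ready (the "wait on full DF" variant).
    static CompletableFuture<Void> unblockOnAll(List<CompletableFuture<String>> filters) {
        return CompletableFuture.allOf(filters.toArray(new CompletableFuture<?>[0]));
    }

    // Number of filters that have completed so far.
    static long completedCount(List<CompletableFuture<String>> filters) {
        return filters.stream().filter(CompletableFuture::isDone).count();
    }

    public static void main(String[] args) {
        CompletableFuture<String> first = new CompletableFuture<>();
        CompletableFuture<String> second = new CompletableFuture<>();
        List<CompletableFuture<String>> filters = List.of(first, second);

        CompletableFuture<Void> any = unblockOnAny(filters);
        CompletableFuture<Void> all = unblockOnAll(filters);

        // Only the first filter arrives.
        first.complete("filter-1");
        System.out.println("unblocked on any: " + any.isDone());
        System.out.println("unblocked on all: " + all.isDone());
        System.out.println("filters completed: " + completedCount(filters));

        // The second filter arrives later.
        second.complete("filter-2");
        System.out.println("unblocked on all: " + all.isDone());
    }
}
```

With the "any" variant, work can proceed (and finish) while only one filter is complete, which is why a completed-filter count could be 1 or 2 in that version of the test. With the "all" variant, the count is always 2 by the time the probe is unblocked.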

@sopel39 sopel39 merged commit 1feaa0f into trinodb:master Sep 9, 2020
Member

sopel39 commented Sep 9, 2020

merged, thanks!

@sopel39 sopel39 mentioned this pull request Sep 9, 2020
9 tasks
@raunaqmorarka raunaqmorarka deleted the hive_df_delay branch September 9, 2020 16:01
@martint martint added this to the 342 milestone Sep 24, 2020
4 participants