Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mitigate testDynamicFiltering* flakiness #18323

Conversation

ssheikin
Copy link
Contributor

@ssheikin ssheikin commented Jul 18, 2023

Mitigate testDynamicFiltering* flakiness

testDynamicFiltering* reuses global QueryTracker state which can
pruneExpiredQueries()
BaseJdbcConnectorTest is huge. Test methods are executed in parallel.
There could be more than query.max-history queries between
executeWithQueryId and SqlQueryManager.getFullQueryInfo invocations.

Executing testDynamicFiltering* methods even in almost single threaded
environment (executing these methods after all other in the test class)
eliminates this issue.

    private static final int LAST = 1000;
    @Test(priority = LAST)

However, community is not ready to introduce TestNG specific magic as
test ordering via priority.

Retrying assertions for unknown reason mitigates this issue as well.
So adding @flaky to make CI more solid.

Description

Additional context and related issues

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Jul 18, 2023
@ssheikin ssheikin requested review from kokosing, wendigo and hashhar July 18, 2023 09:49
@@ -2026,6 +2035,7 @@ private void assertDynamicFiltering(@Language("SQL") String sql, JoinDistributio
noDynamicFilteringResultWithQueryId.getResult(),
"For query: \n " + sql);

// Flaky: Reuses global QueryTracker state which can pruneExpiredQueries()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this assertion is flaky by its nature, then maybe we should retry in it or maybe we should use io.trino.testing.AbstractTestQueryFramework#executeExclusively(java.lang.Runnable)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not flaky in traditional sense. The query history gets purged so once this assertion fails it will keep failing even on retries. the other option is to retry entire test (either via @Flaky or wrapping the entire test in assertEventually).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's wrap test in assertEventually

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertEventually is widely used but it does not have maxRetries. 🤷
Does it worth to use failsafe here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertEventually has a timeout duration. it should be enough

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like usage of assertEventually because you retry entire test and not only wait for certain event that may happen eventually after the "when" phase of the test. In such this case I think usage of @Flaky is more justified.

I believe here using executeExclusively is the best as that eliminates impact of any other test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertEventually and @Flaky have the same logical meaning.
They retry test.
Without test retry the correct result will not be obtained eventually.
So assertEventually and @Flaky are equally incorrect.
I agree that executeExclusively would be a better fit. So I'll try that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kokosing executeExclusively would work only if all other tests which use the same runner would check for the lock too (e.g. obtain a read lock). This is not a case here.
Also we are interested in executeExclusively tests which require QueryTracker not only between themselves, but exclusively to all other tests as well.

    protected void executeExclusively(Runnable executionBlock)
    {
        runner.getExclusiveLock().lock();
        try {
            executionBlock.run();
        }
        finally {
            runner.getExclusiveLock().unlock();
        }
    }

So I'll stick with assertEventually implementation.

Also, as an alternative solution to choose I can propose to annotate methods with:

@Flaky
@Test(priority = LAST)

@kokosing @hashhar @wendigo please choose what do you like more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I prefer this more: #18468

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong about executeExclusively, so #18850

@kokosing executeExclusively would work only if all other tests which use the same runner would check for the lock too (e.g. obtain a read lock).

True.

This is not a case here.

Wrong. It IS a case here.
All methods of query runner check for this lock.

Also we are interested in executeExclusively tests which require QueryTracker not only between themselves, but exclusively to all other tests as well.

And it is, because All methods of query runner check for this lock.

@ssheikin ssheikin force-pushed the ssheikin/38/trino/flaky_testDynamicFiltering branch 2 times, most recently from 929ca1b to 66eefab Compare July 21, 2023 16:41
@@ -2017,6 +2020,19 @@ private void assertNoDynamicFiltering(@Language("SQL") String sql)
}

private void assertDynamicFiltering(@Language("SQL") String sql, JoinDistributionType joinDistributionType, boolean expectDynamicFiltering)
{
assertEventually(
new Duration(10, MINUTES),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 minutes is a lot

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember we had

Method io.trino.plugin.jdbc.BaseJdbcConnectorTest.testDynamicFiltering() didn't finish within the time-out 60000

that's why I've decided 10 minutes.

@@ -2026,6 +2035,7 @@ private void assertDynamicFiltering(@Language("SQL") String sql, JoinDistributio
noDynamicFilteringResultWithQueryId.getResult(),
"For query: \n " + sql);

// Flaky: Reuses global QueryTracker state which can pruneExpiredQueries()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like usage of assertEventually because you retry entire test and not only wait for certain event that may happen eventually after the "when" phase of the test. In such this case I think usage of @Flaky is more justified.

I believe here using executeExclusively is the best as that eliminates impact of any other test.

@ssheikin ssheikin force-pushed the ssheikin/38/trino/flaky_testDynamicFiltering branch from 66eefab to e111df0 Compare July 31, 2023 15:07
@ssheikin
Copy link
Contributor Author

@kokosing @hashhar @wendigo please approve
or choose what do you like more #18323 (comment)

@ssheikin ssheikin force-pushed the ssheikin/38/trino/flaky_testDynamicFiltering branch 2 times, most recently from c0f5efb to 0e5f7e1 Compare August 2, 2023 11:56
Copy link
Member

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

% comment

* This method relies on global state of QueryTracker. It may fail because of QueryTracker.pruneExpiredQueries()
* You must ensure that query was issued and this method invoked in isolation -
* which guarantees that there is less other queries between query creation and obtaining query info than `query.max-history`
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you add TODO and mention #18499? I would do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo added.

@@ -1943,6 +1949,7 @@ public void testDynamicFilteringWithLimit()
"ON a.orderkey = b.orderkey AND b.totalprice < 1000");
}

@Flaky(issue = "https://github.com/trinodb/trino/issues/18499", match = ".*SqlQueryManager.getFullQueryInfo.*")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested this match is truely solving the flakiness?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. 20/20 runs passed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

`testDynamicFiltering*` reuses global `QueryTracker` state which can
`pruneExpiredQueries()`
`BaseJdbcConnectorTest` is huge. Test methods are executed in parallel.
There could be more than `query.max-history` queries between
`executeWithQueryId` and `SqlQueryManager.getFullQueryInfo` invocations.

Executing `testDynamicFiltering*` methods even in almost single threaded
environment (executing these methods after all other in the test class)
eliminates this issue.
```
    private static final int LAST = 1000;
    @test(priority = LAST)
```
However, community is not ready to introduce TestNG specific magic as
test ordering via `priority`.

Retrying assertions for unknown reason mitigates this issue as well.
So adding @flaky to make CI more solid.
@ssheikin ssheikin force-pushed the ssheikin/38/trino/flaky_testDynamicFiltering branch from 0e5f7e1 to 05889f5 Compare August 2, 2023 13:00
@ssheikin
Copy link
Contributor Author

ssheikin commented Aug 2, 2023

@kokosing all tests have passed. Please merge.

@kokosing kokosing merged commit f3c41a4 into trinodb:master Aug 2, 2023
@kokosing
Copy link
Member

kokosing commented Aug 2, 2023

Thank you!

@github-actions github-actions bot added this to the 423 milestone Aug 2, 2023
@ssheikin ssheikin deleted the ssheikin/38/trino/flaky_testDynamicFiltering branch August 2, 2023 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants