Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: de-flake TestDistSQLFlowsVirtualTables #75638

Conversation

irfansharif
Copy link
Contributor

Fixes #75208 (speculatively). Wasn't able to repro the original failure,
so here's a blind stab at fixing it. See this for the theory; I think
it's possible for the query this test fires off to error out + get
dequeued concurrently with when the assertions are made about it being
enqueued. We should relax our assertions accordingly.

Release note: None

Fixes cockroachdb#75208 (speculatively). Wasn't able to repro the original failure,
so here's a blind stab at fixing it. See [1] for the theory; I think
it's possible for the query this test fires off to error out + get
dequeued concurrently with when the assertions are made about it being
enqueued. We should relax our assertions accordingly.

[1]: github.com/cockroachdb/issues/75208#issuecomment-1023800242

Release note: None
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@irfansharif
Copy link
Contributor Author

Aside: I'm not really sure how #73876 caused this to flake more often. After staring at this test for hours I'm not really sure it's uncovering bugs in the span configs infra as much as shaking out latent flakiness in the test itself (which I still can't repro, 4hrs later). I'd be happy to hand off further investigation here to folks more familiar with this area if this PR ends up being insufficient.

@irfansharif
Copy link
Contributor Author

Actually I'm pretty sure it's that, this test has a pretty aggressive timeout for the query and the failing runs certainly take longer:

// Limit the execution of remote flows and shorten the timeout.
const flowStreamTimeout = 1 // in seconds

We could alternatively make the timeout less aggressive.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I think this 1 second timeout is the cause of flakiness - in all failures in #75208 we see an output like

FAIL: TestDistSQLFlowsVirtualTables/MaxRunningFlows=0 (1.53s)

and the test run time is always greater than 1 second.

I'm also not sure how span config work caused this to flake more often, and I agree that it looks like the test problem, so apologies for pushing this flake to you. Thanks for figuring it out though!

:lgtm:

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @arulajmani)

@irfansharif
Copy link
Contributor Author

Thanks!

bors r+

craig bot pushed a commit that referenced this pull request Jan 28, 2022
75638: sql: de-flake TestDistSQLFlowsVirtualTables r=irfansharif a=irfansharif

Fixes #75208 (speculatively). Wasn't able to repro the original failure,
so here's a blind stab at fixing it. See [this](github.com//issues/75208#issuecomment-1023800242) for the theory; I think
it's possible for the query this test fires off to error out + get
dequeued concurrently with when the assertions are made about it being
enqueued. We should relax our assertions accordingly.

Release note: None

Co-authored-by: irfan sharif <[email protected]>
@craig
Copy link
Contributor

craig bot commented Jan 28, 2022

Build failed:

@irfansharif
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 28, 2022

Build succeeded:

@craig craig bot merged commit 39d728b into cockroachdb:master Jan 28, 2022
@irfansharif irfansharif deleted the 220121.deflake-TestDistSQLFlowsVirtualTables branch February 1, 2022 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: TestDistSQLFlowsVirtualTables failed
3 participants