Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

copy_both: handle Future cancellation case #27

Merged
merged 1 commit into from
Oct 17, 2024

Conversation

petrosagg
Copy link

@petrosagg petrosagg commented Oct 17, 2024

When a user starts a CopyBoth query a state machine is created that is responsible for driving the CopyBoth protocol forward regardless of what the user does with the returned handles. The state machine starts in the Setup phase and only gives control back to the user (in the form of the returned handles) after having finished with the Setup. Therefore the code assumed (incorrectly) that the handles will never be dropped while in Setup mode.

It turns out there is one more way of dropping the handle even while in the Setup phase, and that is to cancel the entire Future object returned by the copy_both method at just the right time. This resulted in the state machine observing that the handles were dropped while still in the Setup phase, which was asserted to never happen.

This PR fixes the problem by handling the case where a handles are dropped while still in the Setup phase.

Fixes: https://github.com/MaterializeInc/database-issues/issues/8604

@petrosagg petrosagg requested a review from rjobanp October 17, 2024 09:15
Copy link

@def- def- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we test this deterministically by adding a failpoint with a sleep?
Something like this in workflow_sink_kafka_restart:

Materialized(
            environment_extra=["FAILPOINTS=kafka_sink_commit_transaction=sleep(5000)"]
        )

@petrosagg
Copy link
Author

Could we test this deterministically by adding a failpoint with a sleep?

We could but then this makes this PR non-upstreamable. I just had an idea that might work in a unit test though, will report back in a moment

@petrosagg petrosagg force-pushed the handle-cancellation branch from aea9b1b to 3a603fa Compare October 17, 2024 10:11
@petrosagg
Copy link
Author

@def- added a unit test and confirmed that it fails without this PR

@petrosagg petrosagg force-pushed the handle-cancellation branch from 3a603fa to 067a1d0 Compare October 17, 2024 10:18
When a user starts a CopyBoth query a state machine is created that is
responsible for driving the CopyBoth protocol forward regardless of what
the user does with the returned handles. The state machine starts in the
`Setup` phase and only gives control back to the user (in the form of
the returned handles) after having finished with the Setup. Therefore
the code assumed (incorrectly) that the handles will never be dropped
while in Setup mode.

It turns out there is one more way of dropping the handle even while in
the Setup phase, and that is to cancel the entire Future object returned
by the `copy_both` method at just the right time. This resulted in the
state machine observing that the handles were dropped while still in the
Setup phase, which was asserted to never happen.

This PR fixes the problem by handling the case where a handles are
dropped while still in the `Setup` phase.

Signed-off-by: Petros Angelatos <[email protected]>
@petrosagg petrosagg force-pushed the handle-cancellation branch from 067a1d0 to f66d842 Compare October 17, 2024 10:21
@petrosagg petrosagg merged commit 548f00c into MaterializeInc:master Oct 17, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants