
sql: fix cluster setting propagation flake take 2 #95583

Merged (1 commit) on Jan 20, 2023

Conversation

@cucaroach (Contributor)

Previously we tried to fix this with one retry but that was
insufficient. Extend it to all queries in this section of the test.

Release note: None
Epic: CRDB-20535
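
A minimal sketch of the pattern this change applies, assuming the logic test framework's retry option (the setting and query here are illustrative, not the actual diff):

statement ok
SET CLUSTER SETTING sql.spatial.experimental_box2d_comparison_operators.enabled = on

# retry re-runs the query with backoff until the expected output matches,
# absorbing the delay before the setting propagates to the gateway node.
query T retry
SHOW CLUSTER SETTING sql.spatial.experimental_box2d_comparison_operators.enabled
----
on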

@cucaroach changed the title from "sql: fix cluster setting propagation flake take #2" to "sql: fix cluster setting propagation flake take 2" on Jan 20, 2023
@cucaroach marked this pull request as ready for review on January 20, 2023 15:31
@cucaroach requested a review from yuzefovich on January 20, 2023 15:31
@cucaroach (Contributor Author)

This test now runs 10k times under stress without flaking.
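
(For reference, a hedged example of the kind of stress invocation meant here; the package, test name, and run count are assumptions, not taken from the thread:)

make stress PKG=./pkg/sql/logictest TESTS=TestLogic STRESSFLAGS='-maxruns 10000'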

@yuzefovich (Member) left a comment


:lgtm:

Re: can we ensure that all nodes in the cluster see the updated cluster setting value? Perhaps we could adjust the logic test to explicitly read the cluster setting from each node (with the nodeidx directive) and with a retry option to wait until the setting is propagated, but we run these tests in 1- and 3-node configs, so we'd also need to skip some of those reads. In short, adding the retries seems like the easiest option.

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach)
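
A hedged sketch of the per-node read described above (nodeidx and retry are the logic test options the comment refers to; the setting is illustrative):

# Read the setting from node index 2 and retry until it shows the new value.
# This directive would have to be skipped in the 1-node configs, which is
# the complication the comment mentions.
query T nodeidx=2,retry
SHOW CLUSTER SETTING sql.spatial.experimental_box2d_comparison_operators.enabled
----
on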

@cucaroach (Contributor Author)

Re: can we ensure that all nodes in the cluster see the updated cluster setting value? Perhaps we could adjust the logic test to explicitly read the cluster setting from each node (with the nodeidx directive) and with a retry option to wait until the setting is propagated, but we run these tests in 1- and 3-node configs, so we'd also need to skip some of those reads. In short, adding the retries seems like the easiest option.

So basically the read side is the problem: there's a rangefeed on every node that updates the in-memory settings cache. What I'd like to see is a primitive to wait for the rangefeed invocations to be done, e.g.:

# Bounding box operations.
statement ok
SET CLUSTER SETTING sql.spatial.experimental_box2d_comparison_operators.enabled = on

statement ok
SELECT crdb_internal.await_range_feed_progress_on_all_nodes(crdb_internal.get_latest_timestamp('system.settings'))

But I have no idea how that would work; presumably we'd read the latest timestamp from that table and then check that all the other nodes' rangefeeds have advanced to that timestamp. There's some discussion here:

#87201

Basically I agreed with Yahor that retries are the easiest option, but this feels like a problem that could come up again and could use a better solution.
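
(One purely hypothetical refinement of the sketch above: the hidden crdb_internal_mvcc_timestamp column is real and could supply the timestamp of the last settings write, while the await function remains imaginary:)

statement ok
SELECT crdb_internal.await_range_feed_progress_on_all_nodes(
  (SELECT max(crdb_internal_mvcc_timestamp) FROM system.settings)
)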

@cucaroach (Contributor Author)

bors r+

@craig (Contributor)

craig bot commented Jan 20, 2023

Build succeeded.
