Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: TestDataDriven/restore-grants hangs when run within a tenant #90444

Closed
msbutler opened this issue Oct 21, 2022 · 2 comments · Fixed by #106115
Closed

backupccl: TestDataDriven/restore-grants hangs when run within a tenant #90444

msbutler opened this issue Oct 21, 2022 · 2 comments · Fixed by #106115
Assignees
Labels
A-disaster-recovery C-test-failure Broken test (automatically or manually discovered). T-disaster-recovery

Comments

@msbutler
Copy link
Collaborator

msbutler commented Oct 21, 2022

TestDataDriven/restore-grants began to hang after the test began to get run within a tenant, due to this commit fd7f48d introduced in this PR #88271

To reproduce the regression, checkout the commit, and run the following command
./dev test pkg/ccl/backupccl -f TestDataDriven/restore-grants --stress --test-args='-test.timeout 1m'

Running this command on one previous commit shows the test running smoothly.

Jira issue: CRDB-20774

@msbutler msbutler added C-test-failure Broken test (automatically or manually discovered). T-disaster-recovery labels Oct 21, 2022
@blathers-crl
Copy link

blathers-crl bot commented Oct 21, 2022

cc @cockroachdb/disaster-recovery

msbutler added a commit to msbutler/cockroach that referenced this issue Oct 21, 2022
TestDataDriven unit tests were recently enabled to run within tenants(cockroachdb#88271).
This was quickly disabled as it caused TestDataDriven/restore-grants to flake
(tracked in cockroachdb#90444). This patch re-enables TestDataDriven unit tests to run
within a tenant, except for TestDataDriven/restore-grants. Further
investigation is needed to understand why restore-grants hangs when run within
a tenant.

Informs cockroachdb#90444

Release note: None
@stevendanna
Copy link
Collaborator

My guess is that this is because this mitigation:

+# TODO(ssd): We reset the closed timestamp configurables to avoid schema
+# change transactions entering a retry loop with the lease acquisition
+# transactions. See https://github.com/cockroachdb/cockroach/issues/89900
+exec-sql
+SET CLUSTER SETTING kv.closed_timestamp.target_duration = '3s';
+----
+
+exec-sql
+SET CLUSTER SETTING kv.closed_timestamp.side_transport_interval= '200ms';
+----
+

doesn't work when we are talking to a tenant.

msbutler added a commit to msbutler/cockroach that referenced this issue Oct 24, 2022
TestDataDriven unit tests were recently enabled to run within tenants(cockroachdb#88271).
This was quickly disabled as it caused TestDataDriven/restore-grants to flake
(tracked in cockroachdb#90444). This patch re-enables TestDataDriven unit tests to run
within a tenant, except for TestDataDriven/restore-grants. Further
investigation is needed to understand why restore-grants hangs when run within
a tenant.

Informs cockroachdb#90444

Release note: None
craig bot pushed a commit that referenced this issue Oct 24, 2022
90446: backupccl: re-enable data driven tests to run within a tenant r=adityamaru,stevendanna a=msbutler

TestDataDriven unit tests were recently enabled to run within tenants(#88271). This was quickly disabled as it caused TestDataDriven/restore-grants to flake (tracked in #90444). This patch re-enables TestDataDriven unit tests to run within a tenant, except for TestDataDriven/restore-grants. Further investigation is needed to understand why restore-grants hangs when run within a tenant.

Informs #90444

Release note: None

Co-authored-by: Michael Butler <[email protected]>
craig bot pushed a commit that referenced this issue Jul 6, 2023
106115: backupccl: enable restore-grants test on tenant r=rhu713 a=stevendanna

This test passed under stress for 5 hours, which seems sufficient to unskip it.

The underlying issue described in #89900 appears to be much harder to reproduce and may now be resolved.

Informs #89900
Fixes #90444

Epic: none

Release note: None

Co-authored-by: Steven Danna <[email protected]>
@craig craig bot closed this as completed in a55a341 Jul 6, 2023
blathers-crl bot pushed a commit that referenced this issue Jul 6, 2023
This test passed under stress for 5 hours, which seems sufficient to
unskip it.

The underlying issue described in #89900 appears to be much harder to
reproduce and may now be resolved.

Informs #89900
Fixes #90444

Epic: none

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-disaster-recovery C-test-failure Broken test (automatically or manually discovered). T-disaster-recovery
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants