jobs: TestPauseReason failed #92112

Closed
cockroach-teamcity opened this issue Nov 18, 2022 · 4 comments · Fixed by #92121

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Nov 18, 2022

jobs.TestPauseReason failed with artifacts on master @ a035d2e6e7ea57b30115ecc8ee1a5d553a9e3412:

=== RUN   TestPauseReason
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/000f42e66f4bc5220cdea84be2328068/logTestPauseReason3234956688
    test_log_scope.go:79: use -show-logs to present logs inline
    jobs_test.go:3308: condition failed to evaluate within 45s: still waiting for claim to clear
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/000f42e66f4bc5220cdea84be2328068/logTestPauseReason3234956688
--- FAIL: TestPauseReason (47.23s)

Parameters: TAGS=bazel,gss,deadlock

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/jobs

This test on roachdash | Improve this report!

Jira issue: CRDB-21581

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Nov 18, 2022
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Nov 18, 2022
@blathers-crl blathers-crl bot added the T-jobs label Nov 18, 2022
@stevendanna stevendanna self-assigned this Nov 18, 2022
@cockroach-teamcity
Member Author

jobs.TestPauseReason failed with artifacts on master @ 8357abb668a5adaff781343b394b162fb1b66c6e:

=== RUN   TestPauseReason
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/000f42e66f4bc5220cdea84be2328068/logTestPauseReason317876674
    test_log_scope.go:79: use -show-logs to present logs inline
    jobs_test.go:3308: condition failed to evaluate within 45s: still waiting for claim to clear
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/000f42e66f4bc5220cdea84be2328068/logTestPauseReason317876674
--- FAIL: TestPauseReason (46.55s)

Parameters: TAGS=bazel,gss,deadlock


@cockroach-teamcity
Member Author

jobs.TestPauseReason failed with artifacts on master @ f0554bc215e31ff53e644936ba645bd895ec0d15:

=== RUN   TestPauseReason
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/000f42e66f4bc5220cdea84be2328068/logTestPauseReason1934995415
    test_log_scope.go:79: use -show-logs to present logs inline
    jobs_test.go:3308: condition failed to evaluate within 45s: still waiting for claim to clear
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/000f42e66f4bc5220cdea84be2328068/logTestPauseReason1934995415
--- FAIL: TestPauseReason (47.63s)

Parameters: TAGS=bazel,gss,deadlock


@cockroach-teamcity
Member Author

jobs.TestPauseReason failed with artifacts on master @ dca415eddac0d659ae6d76b4e3dfdf4076adbd34:

=== RUN   TestPauseReason
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/000f42e66f4bc5220cdea84be2328068/logTestPauseReason1214204557
    test_log_scope.go:79: use -show-logs to present logs inline
    jobs_test.go:3291: condition failed to evaluate within 45s: still waiting for claim to clear
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/000f42e66f4bc5220cdea84be2328068/logTestPauseReason1214204557
--- FAIL: TestPauseReason (47.56s)

Parameters: TAGS=bazel,gss,deadlock


craig bot pushed a commit that referenced this issue Nov 22, 2022
92121: jobs: clear claim for already-dead paused jobs r=ajwerner a=stevendanna

Previously we only cleared the claim after the state machine returned and only if the status wasn't pause-requested or cancel-requested. This filter on status, however, was unnecessary.

The job may still be in the cancel-requested or pause-requested state when we go to clear the claim because the transaction that resulted in the canceled context may not have completed. But it is still fine to clear the claim. One of two cases applies:

1) The transaction that cancelled us fails, so we are still in the
   cancel-requested or pause-requested state with no claim. This is
   fine. The claim-jobs loop will claim the job and we will then move
   the state to paused or reverting, just with no context to cancel.

2) The transaction succeeds and we are in paused or reverting without
   a claim set. Just as we wanted.

Here we remove the where clause to always clear the claim when we return from the state machine.

In the case of (1), when processing the cancel-requested or pause-requested state the second time, we may still want the claim cleared. Here, we make sure it gets cleared even in the case where there is no running job that actually needs to be canceled.

Fixes #92112

Epic: None

Release note: None

Co-authored-by: Steven Danna <[email protected]>
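
The difference between the old and new claim-clearing rule described in the commit message can be illustrated with a small, self-contained sketch. This is not the code from #92121: `jobRow`, `clearClaimFiltered`, and `clearClaimAlways` are hypothetical names, and the job record is modeled as an in-memory struct rather than a row updated by SQL. It only shows why a job left in pause-requested or cancel-requested would keep its claim under the old filter, which is consistent with the test waiting for the claim to clear.

```go
package main

import "fmt"

// Status is a simplified stand-in for a job's status (assumption: the real
// system stores this in a table; here it is just a string).
type Status string

const (
	StatusPauseRequested  Status = "pause-requested"
	StatusCancelRequested Status = "cancel-requested"
	StatusPaused          Status = "paused"
	StatusReverting       Status = "reverting"
)

// jobRow is a hypothetical in-memory model of a job record and its claim.
type jobRow struct {
	status  Status
	claimed bool
}

// clearClaimFiltered models the old behavior: the claim is released only if
// the job is not in pause-requested or cancel-requested.
func clearClaimFiltered(j *jobRow) {
	if j.status == StatusPauseRequested || j.status == StatusCancelRequested {
		return // claim is left in place
	}
	j.claimed = false
}

// clearClaimAlways models the new behavior: the claim is released whenever
// the state machine returns, regardless of status.
func clearClaimAlways(j *jobRow) {
	j.claimed = false
}

func main() {
	statuses := []Status{
		StatusPauseRequested, StatusCancelRequested, StatusPaused, StatusReverting,
	}
	for _, s := range statuses {
		oldRow := jobRow{status: s, claimed: true}
		newRow := oldRow
		clearClaimFiltered(&oldRow)
		clearClaimAlways(&newRow)
		fmt.Printf("%s: filtered claimed=%t, always claimed=%t\n",
			s, oldRow.claimed, newRow.claimed)
	}
}
```

In this model the two rules differ only for the two "requested" statuses, which are exactly the states the commit message says a job can still be in when the cancelling transaction has not completed.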
@craig craig bot closed this as completed in 79ad3fd Nov 22, 2022
stevendanna added a commit to stevendanna/cockroach that referenced this issue Dec 12, 2022
Previously we only cleared the claim after the state machine returned
and only if the status wasn't pause-requested or
cancel-requested. This filter on status, however, was unnecessary.

The job may still be in the cancel-requested or pause-requested state
when we go to clear the claim because the transaction that resulted in
the canceled context may not have completed. But it is still fine to
clear the claim. One of two cases applies:

1) The transaction that cancelled us fails, so we are still in the
   cancel-requested or pause-requested state with no claim. This is
   fine. The adoption loop will adopt the job and move the state to
   paused or reverting, just with no context to cancel.

2) The transaction succeeds and we are in paused or reverting without
   a claim set. Just as we wanted.

Here we remove the where clause to always clear the claim when we
return from the state machine.

In the case of (1), when processing the cancel-requested or
pause-requested state the second time, we may still want the claim
cleared. Here, we make sure it gets cleared even in the case where
there is no running job that actually needs to be canceled.

Fixes cockroachdb#92112

Release note: None
stevendanna added a commit to stevendanna/cockroach that referenced this issue Jan 3, 2023
@wenyihu6
Contributor

wenyihu6 commented Aug 2, 2023

This flaked on an unrelated PR.
