Debugging hangs #1238

Draft: wants to merge 9 commits into base: master

msimberg
Collaborator

No description provided.

@msimberg msimberg self-assigned this Dec 13, 2024
@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

I suspect https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/4700071344751697/7514005670787789/-/jobs/8650776698#L2248 might be another case of a task yielding while a SCOPED_TRACE is active. I'm going to wait for pika 0.31.0 and pika-org/pika#1373 before testing again. That PR should avoid yielding on async_rw_mutex access if the previous access has already been waited for.
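
To illustrate why yielding inside a SCOPED_TRACE scope can go wrong, here is a minimal sketch. It assumes that the trace messages live in a per-thread stack (my understanding of gtest, not verified against its source here) and that a task which yields may resume on a different OS thread. Everything below (`ToyScopedTrace`, `toy_trace_stack`, `simulate_migrated_task`) is hypothetical and uses neither gtest nor pika. The "migration" is faked by destroying the guard on a second `std::thread`.

```cpp
#include <atomic>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Toy stand-in for a per-thread trace stack. This is NOT gtest's code.
thread_local std::vector<std::string> toy_trace_stack;

// Set when a guard is destroyed on a different OS thread than it was created on.
std::atomic<bool> toy_migration_detected{false};

// RAII guard that pushes a message on construction and pops it on destruction,
// remembering which OS thread it was created on.
class ToyScopedTrace {
public:
    explicit ToyScopedTrace(std::string message)
        : owner_(std::this_thread::get_id()) {
        toy_trace_stack.push_back(std::move(message));
    }
    ToyScopedTrace(const ToyScopedTrace&) = delete;
    ToyScopedTrace& operator=(const ToyScopedTrace&) = delete;

    ~ToyScopedTrace() {
        if (std::this_thread::get_id() != owner_) {
            // Popping here would modify another thread's stack (possibly an
            // empty one), so only record the problem.
            toy_migration_detected = true;
            return;
        }
        toy_trace_stack.pop_back();
    }

private:
    std::thread::id owner_;
};

// Fakes a task that creates the guard, "yields", and resumes on another thread.
bool simulate_migrated_task() {
    toy_migration_detected = false;
    auto trace = std::make_unique<ToyScopedTrace>("inside task");
    std::thread other([t = std::move(trace)]() mutable { t.reset(); });
    other.join();
    toy_trace_stack.clear();  // drop the stale entry left on the creating thread
    return toy_migration_detected;
}

// Same guard, but created and destroyed on one thread: nothing is flagged.
bool simulate_pinned_task() {
    toy_migration_detected = false;
    { ToyScopedTrace trace("inside task"); }
    return toy_migration_detected;
}
```

In this sketch `simulate_migrated_task()` reports a mismatch and `simulate_pinned_task()` does not. Without the thread-id check, the destructor would call `pop_back()` on a stack that never received the matching push, which is the kind of corruption that could plausibly show up as a segfault.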

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

With pika 0.30.1 I've been able to reproduce two segfaults in test_bt_reduction_to_band: https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/4700071344751697/7514005670787789/-/jobs/8686983455 and https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/4700071344751697/7514005670787789/-/jobs/8686963129. The backtraces in these cases don't explicitly mention gtest, so I can't be 100% sure they're related to SCOPED_TRACE, but the symptoms otherwise look the same.

So far I haven't been able to reproduce anything with pika 0.31.0, which is a good sign, but I'll run more tests.

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

It seems like pika 0.31.0 is still able to trigger something related to gtest/scoped_trace: https://gitlab.com/cscs-ci/ci-testing/webhook-ci/mirrors/4700071344751697/7514005670787789/-/jobs/8688842361.

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

So far so good with the pika branch I'm working on here: pika-org/pika#1379. Compared to pika-org/pika#1373, that PR removes another potential source of yielding from async_rw_mutex. I'll keep rerunning tests here, but pika-org/pika#1379 will likely be available in pika 0.32.0 anyway.

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

msimberg commented Jan 6, 2025

cscs-ci run

@msimberg
Collaborator Author

msimberg commented Jan 6, 2025

cscs-ci run

@msimberg
Collaborator Author

msimberg commented Jan 7, 2025

cscs-ci run

@msimberg
Collaborator Author

msimberg commented Jan 7, 2025

cscs-ci run

@msimberg
Collaborator Author

msimberg commented Jan 8, 2025

cscs-ci run

@msimberg
Collaborator Author

msimberg commented Jan 8, 2025

cscs-ci run

@msimberg
Collaborator Author

msimberg commented Jan 8, 2025

cscs-ci run

@msimberg
Collaborator Author

msimberg commented Jan 8, 2025

cscs-ci run

@msimberg
Collaborator Author

msimberg commented Jan 9, 2025

cscs-ci run

@msimberg
Collaborator Author

msimberg commented Jan 9, 2025

cscs-ci run

@msimberg
Collaborator Author

msimberg commented Jan 9, 2025

cscs-ci run

@msimberg
Collaborator Author

cscs-ci run

@msimberg
Collaborator Author

I've separated one fix into #1257. I still need to find a good solution for the debug bloat so that we can use pika 0.31.0 in CI as well.

@msimberg
Collaborator Author

A patch to have gtest detect yielding during ScopedTrace is here: #1258.

Labels
None yet
Projects
Status: In Progress