Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check VecGeom BVH device pointers at setup and run time #1481

Merged
merged 12 commits into from
Nov 6, 2024

Conversation

sethrj
Copy link
Member

@sethrj sethrj commented Nov 4, 2024

This is an attempt to truly nail down the failure mode for #1462 . It adds a kernel launch that simply copies dBVH, the global inline BVH device pointer, back to the host. It also adds code for comparing it against the result of cudaMemcpyFromSymbol, which in all the tested configurations (combinations of static/shared) returns a null pointer, and against the actual VecGeom-allocated device pointer (which depends on https://gitlab.cern.ch/VecGeom/VecGeom/-/merge_requests/1239).

If the kernel-returned dBVH is a null pointer (or one inconsistent with the VecGeom-returned pointer if available), we print an error log message; otherwise we print a debug message. A new assertion in the BVH navigator also checks for some additional null pointers.

@sethrj sethrj added bug Something isn't working external Dependencies and framework-oriented features labels Nov 4, 2024
Copy link
Contributor

@drbenmorgan drbenmorgan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can only really comment on the CMake side and use of the new VECGEOM_BVHMANAGER_DEVICE symbol that'll come from upstream. These look o.k., as the checks/tests look to protect against the case where this won't be available in older versions.

I guess the only other thing is whether it's possible (without significant effort) to create a build setup/environment that would reproduce the issue, but that's maybe for when we've nailed down the cause(s) better?

Copy link

github-actions bot commented Nov 4, 2024

Test summary

 3 371 files   5 194 suites   3m 52s ⏱️
 1 577 tests  1 549 ✅ 28 💤 0 ❌
17 384 runs  17 318 ✅ 66 💤 0 ❌

Results for commit 09901ed.

@sethrj
Copy link
Member Author

sethrj commented Nov 6, 2024

Ping @esseivaju @pcanal @mrguilima , can one of you take a look at this and approve? It seems like @esseivaju already confirmed it does the job of telling us that the BVH pointer is wrong...

Copy link
Contributor

@esseivaju esseivaju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sethrj sethrj enabled auto-merge (squash) November 6, 2024 16:35
@sethrj sethrj merged commit f20c082 into celeritas-project:develop Nov 6, 2024
32 checks passed
@sethrj sethrj deleted the vecgeom-pointers branch November 6, 2024 22:57
@esseivaju esseivaju mentioned this pull request Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working external Dependencies and framework-oriented features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants