Add workaround for corrupted nsys GPU utilization data #104
Comments
The corresponding JIRA issue has been closed. @GregoryKimball Can this be closed?
Thank you @bdice for checking in. I'm sorry to say that there is still a problem here at the intersection of When running this command on RAPIDS devel image
We receive this nsys diagnostics error instead of valid GPU Utilization metrics:
As of cudf 23.02, we find the "GPU Metrics event chronological order was broken" error for every libcudf microbenchmark that uses
The error only occurs with
Closed by rapidsai/cudf#12728
libcudf benchmarks run using nvbench show a conflict with Nsight Systems when collecting GPU utilization data. This issue is tracked in the Nsight Systems Jira board (Slack thread, Jira Issue).
The current consensus is that the root cause lies in how Nsight Systems collects utilization data. I'm opening this issue to request that nvbench investigate a workaround. C++ Google Benchmark and Python pytest benchmarks have no issues collecting GPU utilization data with Nsight Systems, so there must be some way for an nvbench user using the --profile flag to access GPU utilization data.
Reference profile with nvbench:
Reference profile with gbench: