
Improve CUDAMonitoringService #506

Merged

Conversation

makortel

PR description:

This PR improves the memory reporting of CUDAMonitoringService by

  • printing the numbers after each module as well (configurable)
  • including CachingDeviceAllocator numbers in the printout (live, free, total); see the sketch after this list
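
A minimal sketch of the kind of per-module printout this enables, under stated assumptions: `cudaGetDevice()` and `cudaMemGetInfo()` are real CUDA runtime calls, while the helper name `printDeviceMemory` and the message layout are hypothetical and not the actual CMSSW code.

```cpp
// Hedged sketch, not the actual CUDAMonitoringService code: query free/total
// device memory with the CUDA runtime and print it after a module has run.
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

void printDeviceMemory(const char* moduleLabel) {
  int device = 0;
  cudaGetDevice(&device);

  std::size_t freeBytes = 0, totalBytes = 0;
  cudaMemGetInfo(&freeBytes, &totalBytes);  // free and total memory on the current device

  constexpr std::size_t MB = 1024 * 1024;
  std::printf("CUDA device %d after module %s: %zu MB used, %zu MB free, %zu MB total\n",
              device, moduleLabel,
              (totalBytes - freeBytes) / MB, freeBytes / MB, totalBytes / MB);
}
```

In the actual service the per-module printout is made configurable; the configuration parameter name is not shown here.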

PR validation:

Tested privately

makortel (Author)

By the way, should I change the target branch to 11_2_X? Or open a separate PR?


fwyzard commented Jul 19, 2020

> By the way, should I change the target branch to 11_2_X? Or open a separate PR?

Ah, yes. I've changed the target branch.
Can you fix the conflict (it seems simple) and/or rebase this onto CMSSW_11_2_X_Patatrack, please?

I'd rather freeze the 11.1.x branch as soon as we are done with the ongoing bug fixes, and move development to 11.2.x.

fwyzard changed the base branch from CMSSW_11_1_X_Patatrack to CMSSW_11_2_X_Patatrack on July 19, 2020 08:53
makortel force-pushed the cudaMemoryMonitor branch from c19a092 to 04cec0a on July 21, 2020 00:04
makortel (Author)

Rebased on top of CMSSW_11_2_X_Patatrack.

fwyzard merged commit ed5a12c into cms-patatrack:CMSSW_11_2_X_Patatrack on Jul 31, 2020

fwyzard pushed a commit that referenced this pull request on Aug 8, 2020, with the message:
CachingDeviceAllocator:
  - add the device allocator status to the public interface
  - monitor the requested amount of bytes in addition to the allocated amount

CUDAMonitoringService:
  - print CUDA memory information after each module, including stats from caching allocator
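
A minimal sketch of the idea in the commit message above, assuming a hypothetical `DeviceAllocatorStatus` struct and `cacheStatus()` accessor; the real CachingDeviceAllocator in CMSSW differs in detail, and all names here are illustrative only.

```cpp
// Hedged sketch, not the actual CachingDeviceAllocator: expose per-device
// counters (live, requested, allocated bytes) through a public accessor so
// a monitoring service can include them in its printout.
#include <cstddef>
#include <map>
#include <mutex>

struct DeviceAllocatorStatus {
  std::size_t liveBytes = 0;       // bytes currently handed out to clients
  std::size_t requestedBytes = 0;  // bytes the clients actually asked for
  std::size_t allocatedBytes = 0;  // bytes reserved on the device (after bin rounding)
};

class CachingDeviceAllocatorSketch {
public:
  // In the spirit of "add the device allocator status to the public interface":
  // a snapshot of the counters, keyed by device id.
  std::map<int, DeviceAllocatorStatus> cacheStatus() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return status_;
  }

  // Bookkeeping hook called on every allocation: record the requested size
  // next to the (possibly larger) binned size actually reserved.
  void noteAllocation(int device, std::size_t requested, std::size_t binned) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto& s = status_[device];
    s.liveBytes += binned;
    s.requestedBytes += requested;
    s.allocatedBytes += binned;
  }

private:
  mutable std::mutex mutex_;
  std::map<int, DeviceAllocatorStatus> status_;
};
```

A monitoring service could then iterate over the map returned by `cacheStatus()` after each module and print these counters alongside the `cudaMemGetInfo()` numbers.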