Use TrackingResourceAdaptor to get better debug info #1079
Codecov Report
Base: 87.17% // Head: 63.31% // Project coverage decreases (-23.86% per the coverage diff below)

Additional details and impacted files

@@            Coverage Diff             @@
##    branch-23.02    #1079       +/-   ##
==========================================
- Coverage    87.17%   63.31%   -23.86%
==========================================
  Files           18       26        +8
  Lines         2253     3127      +874
==========================================
+ Hits          1964     1980       +16
- Misses         289     1147      +858
This looks really nice, @madsbk. I can see this being very useful in the long term. I've left a few minor change requests.
dask_cuda/proxify_host_file.py
Outdated
StatisticsResourceAdaptor and TrackingResourceAdaptor that
can report the current allocated bytes. Returns None, if
Suggested change:
- StatisticsResourceAdaptor and TrackingResourceAdaptor that
- can report the current allocated bytes. Returns None, if
+ ``StatisticsResourceAdaptor`` and ``TrackingResourceAdaptor`` that
+ can report the current allocated bytes. Returns ``None``, if
I think we should only use a single ` for functions, variables, etc.
https://numpydoc.readthedocs.io/en/latest/format.html#common-rest-concepts
TBH, I'm not 100% confident I know the difference between a single ` and a double ``. IIRC the change to use double was started by @charlesbluca when he was working on RTD. Could you remind us why that change was made?
In any case, I won't block this PR on this right now; if need be, we can address it in a follow-up PR.
LGTM now, thanks @madsbk!
/merge
For better out-of-memory messages, JIT-unspill now checks the current RMM resource stack for resources such as `StatisticsResourceAdaptor` and `TrackingResourceAdaptor` that can report the currently allocated bytes.

Enable by running `dask-cuda-worker` with `--rmm-track-allocations=True` or calling `dask_cuda.LocalCUDACluster` with `rmm_track_allocations=True`.

This is very useful for debugging RMM fragmentation.
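
The resource-stack check described above can be illustrated with a small, GPU-free sketch. This is not the code from this PR: the classes `BaseResource`, `PoolAdaptor`, and `TrackingAdaptor`, the helper `find_allocated_bytes`, and the `upstream_mr` / `get_allocated_bytes` names are hypothetical stand-ins chosen for illustration, so that the idea (walk the stack from the outermost resource inwards and return `None` when nothing can report allocated bytes) can be run anywhere.

```python
from typing import Optional


class BaseResource:
    """Hypothetical stand-in for a plain memory resource with no counters."""


class PoolAdaptor:
    """Hypothetical adaptor that wraps an upstream resource but cannot
    report allocated bytes itself."""

    def __init__(self, upstream_mr):
        self.upstream_mr = upstream_mr


class TrackingAdaptor:
    """Hypothetical adaptor that can report the bytes currently allocated."""

    def __init__(self, upstream_mr, allocated_bytes: int = 0):
        self.upstream_mr = upstream_mr
        self._allocated_bytes = allocated_bytes

    def get_allocated_bytes(self) -> int:
        return self._allocated_bytes


def find_allocated_bytes(mr) -> Optional[int]:
    """Walk the resource stack from the outermost resource inwards.

    Returns the allocated bytes reported by the first resource that can
    report them, or None if no resource in the stack can.
    """
    while mr is not None:
        getter = getattr(mr, "get_allocated_bytes", None)
        if callable(getter):
            return getter()
        # Step down to the wrapped resource; None ends the walk.
        mr = getattr(mr, "upstream_mr", None)
    return None


# A stack where a tracking adaptor sits below another adaptor.
stack_with_tracking = PoolAdaptor(TrackingAdaptor(BaseResource(), allocated_bytes=1024))
# A stack with no resource that can report allocated bytes.
stack_without_tracking = PoolAdaptor(BaseResource())

print(find_allocated_bytes(stack_with_tracking))     # 1024
print(find_allocated_bytes(stack_without_tracking))  # None
```

Returning `None` rather than raising keeps the out-of-memory message usable when tracking is not enabled: the message can simply omit the allocated-bytes figure.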