Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use TrackingResourceAdaptor to get better debug info #1079

Merged

Conversation

madsbk
Copy link
Member

@madsbk madsbk commented Jan 11, 2023

For better out of memory message, JIT-unspill now check the current RMM resource stack for resources such as StatisticsResourceAdaptor and TrackingResourceAdaptor that can report the current allocated bytes.

Enable by running dask-cuda-worker with --rmm-track-allocations=True or calling dask_cuda.LocalCUDACluster with rmm_track_allocations=True.

This is very useful for debugging RMM fragmentation.

@madsbk madsbk added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 11, 2023
@github-actions github-actions bot added the python python code needed label Jan 11, 2023
@codecov-commenter
Copy link

Codecov Report

Base: 87.17% // Head: 63.31% // Decreases project coverage by -23.85% ⚠️

Coverage data is based on head (e3e3a4e) compared to base (bdb7b56).
Patch coverage: 84.21% of modified lines in pull request are covered.

Additional details and impacted files
@@                Coverage Diff                @@
##           branch-23.02    #1079       +/-   ##
=================================================
- Coverage         87.17%   63.31%   -23.86%     
=================================================
  Files                18       26        +8     
  Lines              2253     3127      +874     
=================================================
+ Hits               1964     1980       +16     
- Misses              289     1147      +858     
Impacted Files Coverage Δ
dask_cuda/local_cuda_cluster.py 89.01% <ø> (ø)
dask_cuda/proxify_host_file.py 93.15% <84.21%> (-1.58%) ⬇️
dask_cuda/benchmarks/local_cudf_merge.py 0.00% <0.00%> (ø)
dask_cuda/benchmarks/local_cudf_shuffle.py 0.00% <0.00%> (ø)
dask_cuda/benchmarks/local_cupy_map_overlap.py 0.00% <0.00%> (ø)
dask_cuda/benchmarks/common.py 0.00% <0.00%> (ø)
dask_cuda/benchmarks/local_cupy.py 0.00% <0.00%> (ø)
dask_cuda/benchmarks/local_cudf_groupby.py 0.00% <0.00%> (ø)
dask_cuda/benchmarks/utils.py 0.00% <0.00%> (ø)
... and 1 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@madsbk madsbk marked this pull request as ready for review January 11, 2023 13:17
@madsbk madsbk requested a review from a team as a code owner January 11, 2023 13:17
Copy link
Member

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really nice @madsbk , I can see this being very useful in the long-term. I've left a few minor change requests.

dask_cuda/benchmarks/utils.py Outdated Show resolved Hide resolved
Comment on lines 54 to 55
StatisticsResourceAdaptor and TrackingResourceAdaptor that
can report the current allocated bytes. Returns None, if
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
StatisticsResourceAdaptor and TrackingResourceAdaptor that
can report the current allocated bytes. Returns None, if
``StatisticsResourceAdaptor`` and ``TrackingResourceAdaptor`` that
can report the current allocated bytes. Returns ``None``, if

Copy link
Member Author

@madsbk madsbk Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should only use a single ` for function, variables etc.
https://numpydoc.readthedocs.io/en/latest/format.html#common-rest-concepts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I'm not 100% confident I know what's the difference from a single ` to double ``, IIRC the change to use double was started by @charlesbluca when he was working on RTD, could you remind us why that change was made?

In any case, I won't block this PR for this right now, if needed be we can address this on a follow-up PR.

dask_cuda/proxify_host_file.py Outdated Show resolved Hide resolved
@madsbk madsbk requested a review from pentschev January 11, 2023 15:42
Copy link
Member

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, thanks @madsbk !

@pentschev
Copy link
Member

/merge

@rapids-bot rapids-bot bot merged commit 0957418 into rapidsai:branch-23.02 Jan 11, 2023
@madsbk madsbk deleted the jit_unspill_better_oom_warning branch January 12, 2023 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python python code needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants