Fix deadlock and simplify proxy tracking #712

Merged: 27 commits into rapidsai:branch-21.10 from simplify_proxy_tracking on Sep 8, 2021

@madsbk madsbk commented Aug 26, 2021

This PR introduces a ProxyManager that replaces the current implementation of proxy tracking:

class ProxyManager:
    """
    This class together with Proxies, ProxiesOnHost, and ProxiesOnDevice
    implements the tracking of all known proxies and their total host/device
    memory usage. It turns out having to re-calculate memory usage continuously
    is too expensive.

    The idea is to have the ProxifyHostFile or the proxies themselves update
    their location (device or host). The manager then tallies the total memory usage.

    Notice, the manager only keeps weak references to the proxies.
    """
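
The bookkeeping described in the docstring can be illustrated with a small, self-contained sketch. It is not dask_cuda's implementation: `ToyProxy`, `ToyProxyManager`, `move`, and `usage` are hypothetical names invented for illustration. It only shows the general idea of holding weak references while keeping running host/device totals, so nothing has to be recomputed on each query.

```python
import threading
import weakref


class ToyProxy:
    """Hypothetical stand-in for a proxy object (not dask_cuda's ProxyObject)."""

    def __init__(self, nbytes):
        self.nbytes = nbytes


class ToyProxyManager:
    """Hypothetical manager: weak references plus running memory totals."""

    def __init__(self):
        self.lock = threading.RLock()
        self._refs = {}      # id(proxy) -> weak reference to the proxy
        self._entries = {}   # id(proxy) -> (location, nbytes)
        self._usage = {"host": 0, "device": 0}

    def move(self, proxy, location):
        """Record that `proxy` now lives on `location` ("host" or "device")."""
        with self.lock:
            key = id(proxy)
            # Drop any previous record (and its weakref callback) first.
            self._forget(key)
            self._refs[key] = weakref.ref(
                proxy, lambda _ref, key=key: self._on_collect(key)
            )
            self._entries[key] = (location, proxy.nbytes)
            self._usage[location] += proxy.nbytes

    def _on_collect(self, key):
        # Called when a tracked proxy is garbage collected.
        with self.lock:
            self._forget(key)

    def _forget(self, key):
        entry = self._entries.pop(key, None)
        if entry is not None:
            location, nbytes = entry
            self._usage[location] -= nbytes
        self._refs.pop(key, None)

    def usage(self, location):
        """Return the tallied memory usage without re-scanning the proxies."""
        with self.lock:
            return self._usage[location]
```

Because the manager never holds a strong reference, a proxy that goes out of scope is dropped from the totals by the weak-reference callback rather than by an explicit unregister call.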

Additionally, this PR fixes a rare deadlock by having all proxies and the ProxyManager use the same lock. Finally, this PR will make it much easier to implement spilling to disk: #708.
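
As a rough illustration of why a single shared lock helps, here is a minimal sketch under stated assumptions. The PR does not describe how the deadlock arose; one common cause in designs with one lock per object is two threads acquiring those locks in opposite orders. The classes `SharedLockManager` and `SharedLockProxy` are hypothetical and not part of dask_cuda. Every proxy reuses the manager's reentrant lock, so there is only one lock to acquire and no ordering to get wrong.

```python
import threading


class SharedLockManager:
    """Hypothetical manager that owns the single lock used by every proxy."""

    def __init__(self):
        # Reentrant, so a thread already holding the lock may call into a proxy.
        self.lock = threading.RLock()
        self.device_bytes = 0

    def evict_all(self, proxies):
        with self.lock:
            for proxy in proxies:
                proxy.spill()  # re-enters the same lock, which is safe


class SharedLockProxy:
    """Hypothetical proxy that borrows the manager's lock."""

    def __init__(self, manager, nbytes):
        self.manager = manager
        self.lock = manager.lock  # shared lock, not one lock per proxy
        self.nbytes = nbytes
        self.on_device = True
        with self.lock:
            manager.device_bytes += nbytes

    def spill(self):
        with self.lock:
            if self.on_device:
                self.on_device = False
                self.manager.device_bytes -= self.nbytes
```

The trade-off of this design is coarser locking: operations on unrelated proxies serialize on the same lock, which is the price paid for ruling out lock-ordering problems.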

Notice, from the user's perspective, this PR shouldn't change anything.

@github-actions github-actions bot added the python python code needed label Aug 26, 2021
@madsbk madsbk added 2 - In Progress Currently a work in progress improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed python python code needed labels Aug 26, 2021
@github-actions github-actions bot added the python python code needed label Aug 26, 2021
@madsbk madsbk changed the title Simplify proxy tracking [WIP] Simplify proxy tracking Aug 26, 2021
@madsbk madsbk force-pushed the simplify_proxy_tracking branch 2 times, most recently from 6e9079d to 11e36c1 Compare August 27, 2021 11:57
codecov-commenter commented Aug 27, 2021

Codecov Report

Merging #712 (4225364) into branch-21.10 (8e6ab70) will decrease coverage by 1.38%.
The diff coverage is 87.30%.

@@               Coverage Diff                @@
##           branch-21.10     #712      +/-   ##
================================================
- Coverage         87.63%   86.25%   -1.39%     
================================================
  Files                15       15              
  Lines              1658     1717      +59     
================================================
+ Hits               1453     1481      +28     
- Misses              205      236      +31     
| Impacted Files | Coverage Δ |
|---|---|
| dask_cuda/cuda_worker.py | 77.64% <ø> (ø) |
| dask_cuda/get_device_memory_objects.py | 70.00% <0.00%> (+1.94%) ⬆️ |
| dask_cuda/local_cuda_cluster.py | 77.88% <50.00%> (ø) |
| dask_cuda/utils.py | 81.10% <55.26%> (-6.18%) ⬇️ |
| dask_cuda/proxify_host_file.py | 92.96% <91.50%> (-6.44%) ⬇️ |
| dask_cuda/proxy_object.py | 89.51% <98.11%> (-0.13%) ⬇️ |
| dask_cuda/device_host_file.py | 70.00% <100.00%> (-0.17%) ⬇️ |
| dask_cuda/explicit_comms/dataframe/shuffle.py | 98.03% <100.00%> (ø) |
| dask_cuda/initialize.py | 88.88% <100.00%> (ø) |
| dask_cuda/is_device_object.py | 62.96% <100.00%> (ø) |

... and 4 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@madsbk madsbk force-pushed the simplify_proxy_tracking branch from 11e36c1 to 531932f Compare September 6, 2021 07:57
@madsbk madsbk changed the title [WIP] Simplify proxy tracking Fix deadlock and simplify proxy tracking Sep 7, 2021
@madsbk madsbk marked this pull request as ready for review September 7, 2021 11:26
@madsbk madsbk requested a review from a team as a code owner September 7, 2021 11:26
@madsbk madsbk added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Sep 7, 2021
@pentschev pentschev left a comment

Overall looks good, great work @madsbk. I have only a few minor comments/suggestions.

dask_cuda/proxify_host_file.py (outdated)
Comment on lines +248 to +252
for dev_buf, proxies in self.get_dev_buffer_to_proxies().items():
    last_access = max(p._obj_pxy.get("last_access", 0) for p in proxies)
    size = sizeof(dev_buf)
    dev_buf_access.append((last_access, size, proxies))
    total_dev_mem_usage += size
Member

I don't know how expensive this is, but would it make sense to update this information when there's a change instead (i.e., when something is added/removed/accessed)? Also not necessarily in this PR, just an idea for the future.

Member Author

Good point, I will consider that in a future PR.

Member Author

@madsbk madsbk Sep 8, 2021

But notice, this is only executed when we actually have to serialize, which is probably going to dominate the performance anyway.

Member

But what if you have to serialize many small objects? Maybe that can become a bottleneck at some point?

Member Author

Right, that could be a potential problem.
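
The idea raised in this thread, updating the bookkeeping when something is added, removed, or accessed instead of rebuilding it before each eviction, could look roughly like the sketch below. It is an assumption-laden illustration, not code from this PR: `DeviceBufferTracker` and its methods are hypothetical, and keys stand in for device buffers.

```python
from collections import OrderedDict


class DeviceBufferTracker:
    """Hypothetical incremental tracker of buffer sizes and access order."""

    def __init__(self):
        # Maps key -> size; iteration order is least recently accessed first.
        self._sizes = OrderedDict()
        self.total_bytes = 0

    def add(self, key, size):
        """Register a buffer (or re-register it with a new size)."""
        self.remove(key)
        self._sizes[key] = size
        self.total_bytes += size

    def touch(self, key):
        """Mark a buffer as just accessed; constant-time reorder."""
        self._sizes.move_to_end(key)

    def remove(self, key):
        """Forget a buffer and subtract its size from the running total."""
        size = self._sizes.pop(key, None)
        if size is not None:
            self.total_bytes -= size

    def eviction_order(self):
        """Keys ordered from least to most recently accessed."""
        return list(self._sizes)
```

With this layout the total and the eviction order are available without scanning every proxy, at the cost of a small amount of work on each add, remove, and access.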

madsbk commented Sep 8, 2021

> Overall looks good, great work @madsbk. I have only a few minor comments/suggestions.

Thanks @pentschev for the review, I think I have addressed all your suggestions.

@pentschev pentschev left a comment

LGTM, thanks @madsbk!

@pentschev

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 33e5d3e into rapidsai:branch-21.10 Sep 8, 2021
@madsbk madsbk deleted the simplify_proxy_tracking branch September 17, 2021 06:32