Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update test_spill to use cuDF's memory_usage() #767

Merged

Conversation

pentschev
Copy link
Member

@pentschev pentschev commented Oct 28, 2021

With the cuDF move from using __sizeof__() to memory_usage() we need to update spilling tests to check against the new method. Requires rapidsai/cudf#9544 .

Together with this change we fix #79 too.

@pentschev pentschev requested a review from a team as a code owner October 28, 2021 18:18
@github-actions github-actions bot added the python python code needed label Oct 28, 2021
@pentschev pentschev added 3 - Ready for Review Ready for review by team bug Something isn't working improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed bug Something isn't working labels Oct 28, 2021
@pentschev
Copy link
Member Author

rerun tests

Copy link
Member

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @pentschev

@madsbk madsbk added 5 - Merge After Dependencies Depends on another PR: do not merge out of order and removed 3 - Ready for Review Ready for review by team labels Nov 10, 2021
@pentschev
Copy link
Member Author

rerun tests

2 similar comments
@pentschev
Copy link
Member Author

rerun tests

@pentschev
Copy link
Member Author

rerun tests

@pentschev
Copy link
Member Author

Tests are still failing because gpuCI isn't picking latest dask-cudf packages, waiting for that to be resolved.

@pentschev
Copy link
Member Author

rerun tests

1 similar comment
@quasiben
Copy link
Member

rerun tests

@pentschev
Copy link
Member Author

I wasted far too much time figuring this out. To summarize, the problem is gpuCI installs python-snappy which isn't installed by other means, such as when following official RAPIDS Conda Install Instructions. When python-snappy is installed, it gets automatically used by Dask for compressing files to disk, as distributed.comm.compression="auto" by default, which then breaks checking for spilled size. We should be able to disable it via dask.config.set or dask.config.update, but for whatever reason neither is respected and compression happens even when disabling it. The only way I found that actually is respected is disabling via environment variables, which is what I've done now.

If anybody knows why this happens or has ideas to fix, please let me know, otherwise I'll merge it as is in 24 hours provided tests pass.

@pentschev
Copy link
Member Author

Actually, this is supposed to be merged before the release, so it has to be merged by EOD today.

@madsbk
Copy link
Member

madsbk commented Nov 30, 2021

I wasted far too much time figuring this out. To summarize, the problem is gpuCI installs python-snappy which isn't installed by other means, such as when following official RAPIDS Conda Install Instructions. When python-snappy is installed, it gets automatically used by Dask for compressing files to disk, as distributed.comm.compression="auto" by default, which then breaks checking for spilled size. We should be able to disable it via dask.config.set or dask.config.update, but for whatever reason neither is respected and compression happens even when disabling it. The only way I found that actually is respected is disabling via environment variables, which is what I've done now.

If anybody knows why this happens or has ideas to fix, please let me know, otherwise I'll merge it as is in 24 hours provided tests pass.

Great debugging work @pentschev, thanks!

@ajschmidt8 ajschmidt8 merged commit 72eaad5 into rapidsai:branch-21.12 Nov 30, 2021
@pentschev pentschev deleted the test-spill-cudf-memory_usage branch October 4, 2023 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Merge After Dependencies Depends on another PR: do not merge out of order improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python python code needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix spilling test assertion that fails on rare occasions
5 participants