Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory usage during culling for shuffling and merge #8197

Merged
merged 4 commits into from
Sep 20, 2023

Conversation

phofl
Copy link
Collaborator

@phofl phofl commented Sep 20, 2023

Closes #8196

  • Tests added / passed
  • Passes pre-commit run --all-files

Not sure if we can add tests here that would actually help. memory spike is gone on this branch

@phofl phofl requested a review from fjetter as a code owner September 20, 2023 11:36
Copy link
Member

@fjetter fjetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If CI doesn't mind, happy to merge

distributed/shuffle/_merge.py Outdated Show resolved Hide resolved
parts_out = parts_out or self._keys_to_parts(keys)
keys = {(self.name_input_left, i) for i in range(self.npartitions)}
keys |= {(self.name_input_right, i) for i in range(self.npartitions)}
keys = frozenset(keys)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a comment why this is a frozenset. Using a frozenset is a bit uncommon and if we're honest this is just paranoia / testing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@github-actions
Copy link
Contributor

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       21 files  ±       0         21 suites  ±0   9h 40m 1s ⏱️ - 1h 19m 13s
  3 813 tests ±       0    3 703 ✔️ +    1     107 💤 ±  0  3  - 1 
35 806 runs   - 1 037  34 054 ✔️  - 980  1 749 💤  - 55  3  - 2 

For more details on these failures, see this check.

Results for commit e4cd8f9. ± Comparison against base commit 1650ceb.

@fjetter fjetter merged commit 2858930 into dask:main Sep 20, 2023
@phofl phofl deleted the memory_usage_culling branch September 20, 2023 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Culling in HashJoinP2PLayer can blow up memory
2 participants