Register partd encode dispatch in dask_cudf #14287

Merged
108 commits
19f5174
Merge pull request #4714 from rapidsai/branch-0.13
raydouglass Mar 30, 2020
a2804c3
REL v0.13.0 release
GPUtester Mar 31, 2020
fef2a2b
REL v0.13.0 CHANGELOG Updates
mike-wendt Apr 1, 2020
ab00eb0
Merge pull request #5310 from rapidsai/branch-0.14
raydouglass Jun 3, 2020
b34b838
REL v0.14.0 release
GPUtester Jun 3, 2020
9ff9cdb
update master references
ajschmidt8 Jul 14, 2020
789d19b
REL DOC Updates for main branch switch
mike-wendt Jul 16, 2020
819f514
Merge pull request #6079 from rapidsai/branch-0.15
raydouglass Aug 26, 2020
3a0f214
REL v0.15.0 release
GPUtester Aug 26, 2020
f947393
Merge pull request #6101 from rapidsai/branch-0.15
raydouglass Aug 27, 2020
71cb8c0
REL v0.15.0 release
GPUtester Aug 27, 2020
7ef8174
Merge pull request #6547 from rapidsai/branch-0.16
raydouglass Oct 21, 2020
2b8298f
REL v0.16.0 release
GPUtester Oct 21, 2020
d72b1eb
Merge pull request #6935 from rapidsai/branch-0.17
ajschmidt8 Dec 10, 2020
f56ef85
REL v0.17.0 release
GPUtester Dec 10, 2020
b7e1a85
Merge pull request #7405 from rapidsai/branch-0.18
raydouglass Feb 24, 2021
20778e5
REL v0.18.0 release
GPUtester Feb 24, 2021
042c20f
Merge pull request #7585 from rapidsai/branch-0.18
raydouglass Mar 15, 2021
999be56
REL v0.18.1 release
raydouglass Mar 15, 2021
2391864
Merge pull request #7969 from rapidsai/branch-0.18
raydouglass Apr 15, 2021
3341561
REL v0.18.2 release
raydouglass Apr 15, 2021
6573759
Merge pull request #7626 from rapidsai/branch-0.19
raydouglass Apr 21, 2021
f07b251
REL v0.19.0 release
GPUtester Apr 21, 2021
61e5a20
REL Changelog update
ajschmidt8 Apr 21, 2021
a13e8dc
Merge pull request #8037 from rapidsai/branch-0.19
raydouglass Apr 22, 2021
a9f3453
REL v0.19.1 release
GPUtester Apr 22, 2021
2089fc9
Merge pull request #8100 from rapidsai/branch-0.19
raydouglass Apr 28, 2021
ab3b3f6
REL v0.19.2 release
GPUtester Apr 28, 2021
f9d5e2e
Merge pull request #8418 from rapidsai/branch-21.06
raydouglass Jun 9, 2021
ae44046
REL v21.06.00 release
GPUtester Jun 9, 2021
3b831c3
Merge pull request #8488 from rapidsai/branch-21.06
ajschmidt8 Jun 10, 2021
d56ac1d
Merge pull request #8542 from rapidsai/branch-21.06
raydouglass Jun 17, 2021
cddc64f
REL v21.06.01 release
GPUtester Jun 17, 2021
101fc0f
REL Merge pull request #8544 from rapidsai/branch-21.06
raydouglass Jun 17, 2021
e9dabf8
Merge pull request #8840 from rapidsai/branch-21.08
raydouglass Aug 4, 2021
106039c
REL v21.08.00 release
GPUtester Aug 4, 2021
8055721
Merge pull request #8986 from rapidsai/branch-21.08
raydouglass Aug 6, 2021
e0a8114
REL v21.08.01 release
GPUtester Aug 6, 2021
a7391e6
Merge pull request #8990 from rapidsai/branch-21.08
raydouglass Aug 6, 2021
f6d31fa
REL v21.08.02 release
GPUtester Aug 6, 2021
dff45e5
Merge pull request #9116 from rapidsai/branch-21.08
ajschmidt8 Sep 16, 2021
e4313b6
REL v21.08.03 release
GPUtester Sep 16, 2021
5638329
Merge pull request #9301 from rapidsai/branch-21.10
ajschmidt8 Oct 6, 2021
072fd86
REL v21.10.00 release
GPUtester Oct 6, 2021
8cfb8e5
Merge pull request #9420 from rapidsai/branch-21.10
raydouglass Oct 12, 2021
a1d2d13
REL v21.10.01 release
GPUtester Oct 12, 2021
3ceb0c0
Merge pull request #9689 from rapidsai/branch-21.12
raydouglass Dec 3, 2021
f1ef2d2
REL v21.12.00 release
GPUtester Dec 3, 2021
fd04831
Merge pull request #9880 from rapidsai/branch-21.12
raydouglass Dec 9, 2021
a0a0a3a
REL v21.12.01 release
GPUtester Dec 9, 2021
c74e24f
Merge pull request #9924 from rapidsai/branch-21.12
raydouglass Dec 16, 2021
06540b9
REL v21.12.02 release
GPUtester Dec 16, 2021
f39f559
Merge pull request #10101 from rapidsai/branch-22.02
raydouglass Feb 2, 2022
774d859
REL v22.02.00 release
GPUtester Feb 2, 2022
803c42a
Merge pull request #10512 from rapidsai/branch-22.04
raydouglass Apr 6, 2022
8bf0520
REL v22.04.00 release
GPUtester Apr 6, 2022
0363197
REL Merge pull request #10633 from rapidsai/branch-22.04
raydouglass Apr 11, 2022
89c7736
Merge pull request #10969 from rapidsai/branch-22.06
raydouglass Jun 7, 2022
5658c5b
REL v22.06.00 release
GPUtester Jun 7, 2022
a1fe591
Merge pull request #11208 from rapidsai/branch-22.06
raydouglass Jul 6, 2022
0dab0f8
REL v22.06.01 release
GPUtester Jul 6, 2022
a7f8de5
Merge pull request #11444 from rapidsai/branch-22.08
raydouglass Aug 17, 2022
b71873c
REL v22.08.00 release
GPUtester Aug 17, 2022
aa58765
pin numpy version (#11824)
galipremsagar Sep 29, 2022
78d3655
Merge pull request #11826 from rapidsai/branch-22.08
raydouglass Sep 29, 2022
31337c9
REL v22.08.01 release
GPUtester Sep 29, 2022
b466b6a
Merge pull request #11858 from rapidsai/branch-22.10
raydouglass Oct 12, 2022
8ffe375
REL v22.10.00 release
GPUtester Oct 12, 2022
432fb37
Merge pull request #12061 from rapidsai/branch-22.10
raydouglass Nov 3, 2022
d90f7e9
REL v22.10.01 release
GPUtester Nov 3, 2022
ca9a422
REL Merge pull request #12069 from rapidsai/branch-22.10
raydouglass Nov 4, 2022
a7dcfdf
Merge pull request #12200 from rapidsai/branch-22.12
raydouglass Dec 8, 2022
baae3a6
REL v22.12.00 release
GPUtester Dec 8, 2022
b2dfcdf
Merge pull request #12346 from rapidsai/branch-22.12
raydouglass Dec 8, 2022
f700408
REL v22.12.01 release
GPUtester Dec 8, 2022
93c5b34
Merge pull request #12660 from rapidsai/branch-23.02
raydouglass Feb 9, 2023
d5b59a2
Merge pull request #12746 from rapidsai/branch-23.02
raydouglass Feb 9, 2023
5ad4a85
REL v23.02.00 release
raydouglass Feb 9, 2023
471fa64
Merge pull request #13038 from rapidsai/branch-23.04
raydouglass Apr 12, 2023
cd71208
REL v23.04.00 release
raydouglass Apr 12, 2023
4d31a6f
REL v23.04.00 release
raydouglass Apr 12, 2023
d023acc
Merge pull request #13197 from rapidsai/branch-23.04
raydouglass Apr 21, 2023
7e070fc
REL v23.04.01 release
raydouglass Apr 21, 2023
88cb6db
REL Merge pull request #13280 from rapidsai/branch-23.04
raydouglass May 3, 2023
4548010
Merge remote-tracking branch 'upstream/branch-23.06'
raydouglass Jun 7, 2023
f881d40
REL v23.06.00 release
raydouglass Jun 7, 2023
7d33d20
Merge pull request #13640 from rapidsai/branch-23.06
raydouglass Jun 29, 2023
6a548b0
REL v23.06.01 release
raydouglass Jun 29, 2023
d9589b7
Merge pull request #13781 from rapidsai/branch-23.08
raydouglass Aug 9, 2023
8150d38
REL v23.08.00 release
raydouglass Aug 9, 2023
562f70e
Merge pull request #14224 from rapidsai/branch-23.10
raydouglass Oct 11, 2023
9f0c2f4
REL v23.10.00 release
raydouglass Oct 11, 2023
383fe3d
register partd_encode_dispatch to enable basic disk shuffle support
rjzamora Oct 16, 2023
2c499e0
Merge remote-tracking branch 'upstream/branch-23.12' into register-pa…
rjzamora Oct 16, 2023
a52ae74
roll back changelog
rjzamora Oct 16, 2023
51e7e05
Merge branch 'branch-23.12' into register-partd_encode_dispatch
rjzamora Oct 16, 2023
053e205
Merge branch 'branch-23.12' into register-partd_encode_dispatch
rjzamora Oct 16, 2023
2c33593
Merge branch 'branch-23.12' into register-partd_encode_dispatch
rjzamora Oct 17, 2023
7a7d125
skip test for old dask version
rjzamora Oct 19, 2023
3abb7f6
Merge branch 'branch-23.12' into register-partd_encode_dispatch
rjzamora Oct 19, 2023
6403fe3
Merge branch 'branch-23.12' into register-partd_encode_dispatch
rjzamora Oct 20, 2023
7db8d15
Merge branch 'branch-23.12' into register-partd_encode_dispatch
rjzamora Oct 23, 2023
6a07517
Merge branch 'branch-23.12' into register-partd_encode_dispatch
rjzamora Oct 25, 2023
bc4cd11
Merge branch 'branch-23.12' into register-partd_encode_dispatch
rjzamora Oct 25, 2023
5b39ad6
Merge branch 'branch-23.12' into register-partd_encode_dispatch
rjzamora Oct 30, 2023
e90ba5c
Merge branch 'branch-23.12' into register-partd_encode_dispatch
rjzamora Nov 3, 2023
36929ec
Merge branch 'branch-23.12' into register-partd_encode_dispatch
rjzamora Nov 6, 2023
47e82ce
Merge branch 'branch-23.12' into register-partd_encode_dispatch
rjzamora Nov 6, 2023
25 changes: 25 additions & 0 deletions python/dask_cudf/dask_cudf/backends.py
@@ -481,6 +481,31 @@ def sizeof_cudf_series_index(obj):
    return obj.memory_usage()


# TODO: Remove try/except when cudf is pinned to dask>=2023.10.0
try:
    from dask.dataframe.dispatch import partd_encode_dispatch

    @partd_encode_dispatch.register(cudf.DataFrame)
    def _simple_cudf_encode(_):
        # Basic pickle-based encoding for a partd k-v store
        import pickle
        from functools import partial

        import partd
Contributor

Can we add partd to our package requirements and conda recipes?

Contributor

I think it comes in transitively through dask? Which is a dependency of dask-cudf.

Member Author

Right exactly, dask depends on partd. I think it should be safe for us to let dask worry about the partd dependency. If dask suddenly stops using partd for shuffle="disk", it will also stop using partd_encode_dispatch.


        def join(dfs):
            if not dfs:
                return cudf.DataFrame()
            else:
                return cudf.concat(dfs)

        dumps = partial(pickle.dumps, protocol=pickle.HIGHEST_PROTOCOL)
        return partial(partd.Encode, dumps, pickle.loads, join)

except ImportError:
    pass
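
The encoder registered above combines three pieces: a pickle-based `dumps`, `pickle.loads`, and a `join` that tolerates an empty list. As a rough illustration of how those pieces fit together, here is a minimal stdlib-only sketch. It is not how `partd.Encode` is implemented: the `store` dict and the `append`/`get` helpers are hypothetical stand-ins for a key-value store, and plain lists stand in for `cudf.DataFrame`.

```python
import pickle
from functools import partial


def join(chunks):
    # Same shape as join() in the diff: an empty input yields an empty
    # container, otherwise the chunks are concatenated in order.
    if not chunks:
        return []
    return [row for chunk in chunks for row in chunk]


# Same serializer setup as in the diff.
dumps = partial(pickle.dumps, protocol=pickle.HIGHEST_PROTOCOL)
loads = pickle.loads

# Hypothetical stand-in for a key-value store: key -> list of byte blobs.
store = {}


def append(key, chunk):
    # Serialize each chunk on write, as an encoding layer would.
    store.setdefault(key, []).append(dumps(chunk))


def get(key):
    # Deserialize every blob under the key and join the pieces.
    return join([loads(blob) for blob in store.get(key, [])])


append("part-0", [1, 2])
append("part-0", [3])
print(get("part-0"))  # [1, 2, 3]
print(get("part-1"))  # []
```

The empty-list branch in `join` matters because a key may have received no chunks at all, in which case there is nothing to concatenate.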


def _default_backend(func, *args, **kwargs):
    # Utility to call a dask.dataframe function with
    # the default ("pandas") backend
11 changes: 11 additions & 0 deletions python/dask_cudf/dask_cudf/tests/test_sort.py
@@ -114,3 +114,14 @@ def test_sort_values_empty_string(by):
    if "a" in by:
        expect = df.sort_values(by)
        assert dd.assert_eq(got, expect, check_index=False)


def test_disk_shuffle():
    try:
        from dask.dataframe.dispatch import partd_encode_dispatch  # noqa: F401
    except ImportError:
        pytest.skip("need a version of dask that has partd_encode_dispatch")
    df = cudf.DataFrame({"a": [1, 2, 3] * 20, "b": [4, 5, 6, 7] * 15})
    ddf = dd.from_pandas(df, npartitions=4)
    got = dd.DataFrame.shuffle(ddf, "a", shuffle="disk")
    dd.assert_eq(got, df)
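
The test above skips itself when the installed dask does not provide `partd_encode_dispatch`. The same availability check could be factored into a small helper, sketched below with the standard library only. `has_symbol` is a hypothetical name and is not part of dask, cudf, or pytest. It also differs slightly from a `from ... import ...` statement, which can additionally resolve submodules that have not been imported yet.

```python
import importlib


def has_symbol(module_name, attr):
    """Return True if `module_name` imports and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError too, which subclasses ImportError.
        return False
    return hasattr(module, attr)


print(has_symbol("functools", "partial"))         # True
print(has_symbol("functools", "no_such_name"))    # False
print(has_symbol("no_such_module_xyz", "thing"))  # False
```

In a test suite, a check like this could feed `pytest.mark.skipif` so the skip condition is declared on the test rather than inside its body.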