
[BUG] Dask cuDF groupby-aggregations slower with UCX, particularly with split_out #403

Closed
beckernick opened this issue Jan 29, 2020 · 1 comment


beckernick commented Jan 29, 2020

Perhaps related to #402 , we see slowdowns with groupby-aggregations when using UCX. This is particularly relevant when using split_out, which is often necessary for at-scale groupby-aggregations with many unique keys.
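For readers unfamiliar with split_out: by default Dask reduces a groupby-aggregation down to a single output partition, which can exhaust memory when there are many unique keys; split_out=N instead hashes each key into one of N output partitions. A minimal pure-Python sketch of that idea (the helper name and data are illustrative, not Dask internals):

```python
from collections import defaultdict

def grouped_mean_split_out(rows, split_out):
    """Hash each group key into one of `split_out` output buckets,
    then compute per-key means inside each bucket independently."""
    buckets = [defaultdict(lambda: [0.0, 0]) for _ in range(split_out)]
    for key, value in rows:
        acc = buckets[hash(key) % split_out][key]
        acc[0] += value  # running sum
        acc[1] += 1      # running count
    # Each bucket corresponds to one output partition; no single
    # structure ever holds all of the unique keys at once.
    return [{k: s / n for k, (s, n) in b.items()} for b in buckets]

rows = [("a", 1.0), ("b", 3.0), ("a", 3.0), ("b", 5.0)]
parts = grouped_mean_split_out(rows, split_out=2)
merged = {k: v for part in parts for k, v in part.items()}
# merged == {"a": 2.0, "b": 4.0}
```

The trade-off is that each of the N output partitions requires its own shuffle of data between workers, which is exactly where the communication protocol (TCP vs. UCX) matters.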

I see these rough timings on a DGX-2:

%%time

res = df.groupby(['a_0', 'a_1']).a_non_merge_2.mean()
res = res.persist()
_ = wait(res)

TCP

  • 3.75s

UCX + NVLink

  • 6.41s

%%time

res = df.groupby(['a_0', 'a_1']).a_non_merge_2.mean(split_out=16)
res = res.persist()
_ = wait(res)

TCP

  • 7.87s

UCX + NVLink

  • 19.9s
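
For anyone reproducing this outside Jupyter, where %%time is unavailable, a small context manager gives equivalent wall-clock numbers. This helper is my own sketch, not part of the repro script below:

```python
import time
from contextlib import contextmanager

@contextmanager
def wall_clock(label):
    # Measure wall-clock time around a block, like %%time in Jupyter.
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.2f}s")

# Stand-in workload; replace with persist() + wait() on the real query.
with wall_clock("groupby mean"):
    total = sum(i * i for i in range(100_000))
```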

Code to reproduce:

import cudf
import dask.array as da
import dask.dataframe as dd
import dask_cudf

from dask_cuda import LocalCUDACluster
from dask.distributed import Client, wait
from dask.utils import parse_bytes

protocol = "tcp"
enable_nvlink = False

# protocol = "ucx"
# enable_nvlink = True


def create_random_data(n_rows=1_000, n_parts=10, n_keys_index_1=100_000,
                       n_keys_index_2=100, n_keys_index_3=100, col_prefix='a'):
    chunks = n_rows // n_parts

    df = dd.concat([
        da.random.random(n_rows, chunks=chunks).to_dask_dataframe(columns=col_prefix + '_non_merge_1'),
        da.random.random(n_rows, chunks=chunks).to_dask_dataframe(columns=col_prefix + '_non_merge_2'),
        da.random.random(n_rows, chunks=chunks).to_dask_dataframe(columns=col_prefix + '_non_merge_3'),
        da.random.randint(0, n_keys_index_1, size=n_rows, chunks=chunks).to_dask_dataframe(columns=col_prefix + '_0'),
        da.random.randint(0, n_keys_index_2, size=n_rows, chunks=chunks).to_dask_dataframe(columns=col_prefix + '_1'),
        da.random.randint(0, n_keys_index_3, size=n_rows, chunks=chunks).to_dask_dataframe(columns=col_prefix + '_2'),
    ], axis=1).persist()

    # Convert the pandas-backed partitions to cuDF on the workers.
    gdf = df.map_partitions(cudf.from_pandas)
    gdf = gdf.persist()
    _ = wait(gdf)
    return gdf


def setup_rmm_pool(client):
    # Create a 26 GB RMM memory pool on each worker up front,
    # avoiding per-allocation overhead during the benchmark.
    client.run(
        cudf.set_allocator,
        pool=True,
        initial_pool_size=parse_bytes("26GB"),
        allocator="default",
    )
    return None

cluster = LocalCUDACluster(
    protocol=protocol,
    enable_nvlink=enable_nvlink
)
client = Client(cluster)

setup_rmm_pool(client)

rows, parts = 10_000_000, 250
df = create_random_data(n_rows=rows, n_parts=parts, col_prefix='a', n_keys_index_2=10000)
res = df.groupby(['a_0', 'a_1']).a_non_merge_2.mean(split_out=16)
res = res.persist()
_ = wait(res)

Relevant Conda Environment:

cudf                      0.13.0a200129          py37_748    rapidsai-nightly
dask                      2.10.0                     py_0    conda-forge
dask-core                 2.10.0                     py_0    conda-forge
dask-cuda                 0.13.0a200129            py37_2    rapidsai-nightly
dask-cudf                 0.13.0a200129          py37_748    rapidsai-nightly
dask-xgboost              0.2.0.dev28      cuda10.1py37_0    rapidsai-nightly
distributed               2.10.0                     py_0    conda-forge
libcudf                   0.13.0a200129      cuda10.1_743    rapidsai-nightly
ucx                       1.7.0dev+g9d06c3a    cuda10.1_129    rapidsai-nightly
ucx-py                    0.13.0a200123+g896e60b         py37_12    rapidsai-nightly
beckernick changed the title to "[BUG] Dask cuDF groupby-aggregations slower with UCX, particularly with split_out" on Jan 29, 2020
pentschev commented

This indeed used to be an issue but I believe it was resolved by UCX upstream. New numbers with RAPIDS 0.20 and UCX 1.9:

TCP: 1.07s
UCX: 1.36s

TCP split_out=16: 1.97s
UCX split_out=16: 1.95s

It seems that for this particular workflow there isn't much gain with UCX, but the underlying slowdown appears to be resolved. Closing this for now; please reopen if you encounter this again.
