[BUG] Import of cuML leads to strangeness #3750

Closed
quasiben opened this issue Apr 15, 2021 · 4 comments
Labels: bug (Something isn't working)

Comments

@quasiben
Member

quasiben commented Apr 15, 2021

Describe the bug
Importing cuML appears to trigger some monkeypatching that leads to unintended host->device transfers. For example, the snippet below shows that with cuml imported (but never used), a dask dataframe is converted to a dask-cudf dataframe:

import cuml
import pandas as pd
import dask.dataframe as dd
from cuml.dask.neighbors import NearestNeighbors

# Build a CPU-backed dask dataframe; cuml is imported above but never used.
df = dd.from_pandas(pd.DataFrame({'author_id': [1, 2], 'embedding': [[1, 2], [3, 4]]}), npartitions=1)
df.head(2)

print(type(df))

# Expand each embedding list into one float32 column per dimension.
emb_col = 'embedding'
emb_dim = 2
feature_columns = ['c' + str(i) for i in range(emb_dim)]
meta = {'c' + str(i): 'float32' for i in range(emb_dim)}
emb = df[emb_col].apply(pd.Series, index=feature_columns, meta=meta)
print(type(emb))

Steps/Code to reproduce bug
Follow this guide http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports to craft a minimal bug report. This helps us reproduce the issue you're having and resolve the issue more quickly.

Expected behavior
A clear and concise description of what you expected to happen.

Environment details (please complete the following information):

  • Environment location: Bare-metal
  • Linux Distro/Architecture: Ubuntu 17.04
  • GPU Model/Driver: [V100 and driver 460.39]
  • CUDA: 11.2
  • Method of cuDF & cuML install: con
    • If method of install is [conda], run conda list and include results here
# Name                    Version                   Build                          Channel
cuml                      0.20.0a210413             cuda11.2_py38_g5f61a3519_74    rapidsai-nightly
libcuml                   0.20.0a210413             cuda11.2_g5f61a3519_74         rapidsai-nightly
libcumlprims              0.20.0a210408             cuda11.2_g7f19636_2            rapidsai-nightly
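
When the full conda list output is too long to paste, the relevant package versions can also be gathered from Python. The following is a minimal sketch using only the standard library. `collect_versions` is a hypothetical helper name, and the distribution names passed to it are assumptions that may not match how conda registers the RAPIDS packages.

```python
from importlib import metadata


def collect_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names below are assumptions; adjust them to your environment.
    for name, version in collect_versions(["cuml", "cudf", "dask-cudf", "dask"]).items():
        print(f"{name:<12} {version or 'not installed'}")
```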
quasiben added the "? - Needs Triage" (Need team to review and classify) and "bug" (Something isn't working) labels on Apr 15, 2021
@viclafargue
Contributor

viclafargue commented Apr 15, 2021

This is probably related to cudf/#7946. The cuML code imports dask_cudf, and the problem seems to be reproducible by importing dask_cudf alone.
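
One way to check this in isolation is to run the same probe before and after the import in a fresh interpreter. The sketch below is a hypothetical helper, not part of cuML, cuDF, or Dask: `probe_import_side_effect` and `embedding_frame_type` are names introduced here for illustration, and the probe simply mirrors the snippet from the report.

```python
import importlib
import sys


def probe_import_side_effect(module_name, probe):
    """Run probe() before and after importing module_name and return both results.

    Call this in a fresh interpreter: if the module is already loaded,
    the "before" result is not a clean baseline.
    """
    already_loaded = module_name in sys.modules
    before = probe()
    importlib.import_module(module_name)
    after = probe()
    return {"already_loaded": already_loaded, "before": before, "after": after}


def embedding_frame_type():
    """Hypothetical probe: type name of the expanded embedding frame from the report."""
    import pandas as pd
    import dask.dataframe as dd

    pdf = pd.DataFrame({"author_id": [1, 2], "embedding": [[1, 2], [3, 4]]})
    df = dd.from_pandas(pdf, npartitions=1)
    meta = {"c0": "float32", "c1": "float32"}
    emb = df["embedding"].apply(pd.Series, index=["c0", "c1"], meta=meta)
    return f"{type(emb).__module__}.{type(emb).__qualname__}"
```

In a fresh interpreter, `probe_import_side_effect("dask_cudf", embedding_frame_type)` returns the type name seen before and after the import. Differing values would indicate that the import alone changes the result.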

viclafargue removed the "? - Needs Triage" (Need team to review and classify) label on Apr 15, 2021
@AliceChenyy

Hi Benjamin, this bug is seriously blocking GPU adoption for our machine learning pipeline platform. The platform has only used CPUs so far. To accelerate our machine learning algorithms on GPUs, we want to use kNN from cuML.

@cyy857

cyy857 commented May 17, 2021

Hello, is there any update on this issue? Thanks!

@viclafargue
Contributor

cudf/#8342 has just been merged and should fix the issue.

dantegd closed this as completed on Jul 29, 2021