[BUG] knn.predict() results in CudaRuntimeError: Error! cudaErrorIllegalAddress reason='an illegal memory access was encountered' extraMsg='Stream sync' #1685

Closed
rnyak opened this issue Feb 14, 2020 · 8 comments
Labels
? - Needs Triage (Need team to review and classify), bug (Something isn't working)

Comments

@rnyak
Contributor

rnyak commented Feb 14, 2020

Describe the bug
Hello, I am trying to run the cuML KNeighborsClassifier. This is my code:

import cudf
import cuml
from cuml.preprocessing.model_selection import train_test_split
from cuml.neighbors import KNeighborsClassifier

#80/20 split
X_train, X_test, y_train, y_test = train_test_split(gdf, 'label', train_size=0.8)

X_train.shape, y_train.shape, X_test.shape, y_test.shape
((1934289, 15), (1934289,), (483573, 15), (483573,))

knn = KNeighborsClassifier(n_neighbors=5)

%time knn.fit(X_train, y_train)

cuml_result =knn.predict(X_test)

Steps/Code to reproduce bug
When I run knn.predict(X_test) or knn.predict_proba(X_test) after fitting the model, I get the error below:

---------------------------------------------------------------------------
CudaRuntimeError                          Traceback (most recent call last)
<ipython-input-25-282c92244bf8> in <module>
----> 1 cuml_result =knn.predict_proba(X_test)

cuml/neighbors/kneighbors_classifier.pyx in cuml.neighbors.kneighbors_classifier.KNeighborsClassifier.predict_proba()

cuml/common/handle.pyx in cuml.common.handle.Handle.sync()

CudaRuntimeError: Error! cudaErrorIllegalAddress reason='an illegal memory access was encountered' extraMsg='Stream sync'

Expected behavior
knn.predict() should work without any issue.

Environment details (please complete the following information):

  • Environment location: [Docker]
  • Linux Distro/Architecture: [Ubuntu 16.04]
  • GPU Model/Driver: [GV100 and driver 418.39]
  • CUDA: [10.1]
  • Method of cuDF & cuML install: Docker 0.12 release

Additional context
I also ran the same example on 8x V100-32 GB using Docker. The same issue occurred.

@rnyak rnyak added ? - Needs Triage Need team to review and classify bug Something isn't working labels Feb 14, 2020
@cjnolet
Member

cjnolet commented Feb 14, 2020

@rnyak, thank you for filing an issue about this.

Your example does not indicate how the data was created/loaded, but I've used what you've provided to generate data randomly. On the current cuML nightly, I am able to execute the following code end-to-end without error.

%%time
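import numpy as np   # needed for the random host data below
import pandas as pd  # needed for the host-side DataFrames below
import cudf
import cuml
from cuml.preprocessing.model_selection import train_test_split
from cuml.neighbors import KNeighborsClassifier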

n_samples = 4000000
n_features = 15

X_host_train = pd.DataFrame(np.random.uniform(0, 1,
                                              (n_samples, n_features)))
y_host_train = pd.DataFrame(np.random.randint(0, 5, (n_samples, 1)))

X_device_train = cudf.DataFrame.from_pandas(X_host_train)
X_device_train["labels"] = cudf.DataFrame.from_pandas(y_host_train)


X_train, X_test, y_train, y_test = train_test_split(X_device_train, "labels", train_size=0.8)

# X_train.shape, y_train.shape, X_test.shape, y_test.shape
# ((1934289, 15), (1934289,), (483573, 15), (483573,))

knn = KNeighborsClassifier(n_neighbors=5)

knn.fit(X_train, y_train)

cuml_result = knn.predict(X_test)

The KNeighborsClassifier code has not been touched much in this release, at least not in a way that I believe would have fixed a crash like this.

Can you run my modified example and see if the crash is still happening on version 0.12? If so, can you see if the code runs successfully on the nightly? If you are using conda, you can install our nightly with conda install -c rapidsai-nightly cuml

If you aren't able to run it successfully on either, we'll need to dig a little deeper.
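
One way to start digging deeper, offered here as a hedged sketch rather than a confirmed fix: CUDA kernel launches are asynchronous, so an illegal memory access is usually reported at a later synchronization point (here the 'Stream sync' in Handle.sync()) rather than at the call that launched the faulty kernel. The standard CUDA environment variable `CUDA_LAUNCH_BLOCKING=1` makes launches synchronous, which can bring the reported error closer to its source. It is an assumption that this narrows things down for this particular crash.

```python
import os

# Must be set before the CUDA context is created, i.e. before importing
# cudf/cuml (or any other library that initializes CUDA) in this process.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# After this point, import cudf / cuml and rerun the failing cell, e.g.:
#   import cuml
#   cuml_result = knn.predict(X_test)
```

In a notebook, this has to run in a fresh kernel before any GPU library is imported; setting it after the crash has no effect on the already-initialized context.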

@rnyak
Contributor Author

rnyak commented Feb 14, 2020

@cjnolet Thanks for the quick response. Will test your example.

The data was originally in .pkl format; it was then converted and saved as a Parquet file. We read the Parquet file into gdf with:

gdf = cudf.read_parquet('mydatafile.parquet')

From gdf, we created a subset, gdf_sub, by selecting only certain columns, and trained a KNeighborsClassifier on it. The problem occurs at the knn.predict() step.

Not sure whether this helps, but I could run xgboost without any issue.

@rnyak
Contributor Author

rnyak commented Feb 17, 2020

@cjnolet the example you provided works fine, although the knn.predict() step takes much longer than the knn.fit() step. I have been using the Docker 0.12 stable release.

However, knn.predict() does not work with my data. I get the error below at the step cuml_result = knn.predict(X_test) with my test set:

---------------------------------------------------------------------------
CudaRuntimeError                          Traceback (most recent call last)
<ipython-input-19-4524604db152> in <module>
      1 #this is where I get memory errors
      2 
----> 3 cuml_result =knn.predict(X_test)

cuml/neighbors/kneighbors_classifier.pyx in cuml.neighbors.kneighbors_classifier.KNeighborsClassifier.predict()

cuml/common/handle.pyx in cuml.common.handle.Handle.sync()

CudaRuntimeError: Error! cudaErrorIllegalAddress reason='an illegal memory access was encountered' extraMsg='Stream sync'

Both X_train.dtypes and X_test.dtypes show:

a    float32
b    float32
c    float32
d    float32
e    float32
f    float32
g    float32
h    float32
k    float32
l    float32
m    float32
n    float32
p    float32
r    float32
dtype: object

Thanks.

@cjnolet
Member

cjnolet commented Feb 17, 2020

@rnyak,

The current KNNClassifier implementation uses only brute force and doesn't create any specialized indices. As a result, the call to fit() is essentially just an assignment of the training data to instance variables in the model class.
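
To illustrate the point about brute force, here is a minimal pure-Python sketch of a brute-force k-NN classifier. It is not cuML's implementation: the class name `BruteForceKNN`, its attributes, and the toy data are all hypothetical. It only shows why fit() is nearly free while predict() carries the whole cost, since every query row is compared against every training row.

```python
import math
from collections import Counter


class BruteForceKNN:
    """Illustrative brute-force k-NN classifier (hypothetical, not cuML code)."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # No index is built: fit() only stores references to the training data.
        self.X_ = X
        self.y_ = y
        return self

    def predict(self, X):
        # All the work happens here: each query row is compared
        # against every training row.
        predictions = []
        for query in X:
            distances = [
                (math.dist(query, train_row), label)
                for train_row, label in zip(self.X_, self.y_)
            ]
            distances.sort(key=lambda pair: pair[0])
            nearest_labels = [label for _, label in distances[: self.n_neighbors]]
            # Majority vote among the k nearest neighbors.
            predictions.append(Counter(nearest_labels).most_common(1)[0][0])
        return predictions


# Toy data: two well-separated clusters with labels 0 and 1.
X_train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
y_train = [0, 0, 0, 1, 1, 1]

knn = BruteForceKNN(n_neighbors=3).fit(X_train, y_train)
result = knn.predict([[0.05, 0.05], [5.0, 5.1]])
print(result)
```

With this structure, the cost of predict() grows with the number of query rows times the number of training rows, which is consistent with predict() taking much longer than fit() in the timings reported above.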

Nothing immediately obvious stands out about your feature dataframes. Is your labels column also float32?

@rnyak
Contributor Author

rnyak commented Feb 17, 2020

@cjnolet Thanks for the quick response and explanation. The label column is int64; it contains only integers from 0 to 50, used as class labels.

@rnyak rnyak closed this as completed Feb 17, 2020
@cjnolet
Member

cjnolet commented Feb 17, 2020

@rnyak, were you able to get this running successfully on your data?

@rnyak
Contributor Author

rnyak commented Feb 17, 2020

@cjnolet Unfortunately not. I could not find the cause, so I closed the issue for now. I will try the 0.13 nightly; if the same error occurs, I can reopen this issue. Thanks.

@pseudotensor

I still get this problem on 21.08. Will file a new issue.
