[BUG] knn.predict() results in CudaRuntimeError: Error! cudaErrorIllegalAddress reason='an illegal memory access was encountered' extraMsg='Stream sync' #1685

rnyak · 2020-02-14T19:20:39Z

Describe the bug
Hello, I am trying to run cuml KNeighborsClassifier. This is my code:

import cudf
import cuml
from cuml.preprocessing.model_selection import train_test_split
from cuml.neighbors import KNeighborsClassifier

#80/20 split
X_train, X_test, y_train, y_test = train_test_split(gdf, 'label', train_size=0.8)

X_train.shape, y_train.shape, X_test.shape, y_test.shape
((1934289, 15), (1934289,), (483573, 15), (483573,))

knn = KNeighborsClassifier(n_neighbors=5)

%time knn.fit(X_train, y_train)

cuml_result =knn.predict(X_test)

Steps/Code to reproduce bug
When I run knn.predict(X_test) and knn.predict_proba(X_test) after fitting the model, it generates the error below:

---------------------------------------------------------------------------
CudaRuntimeError                          Traceback (most recent call last)
<ipython-input-25-282c92244bf8> in <module>
----> 1 cuml_result =knn.predict_proba(X_test)

cuml/neighbors/kneighbors_classifier.pyx in cuml.neighbors.kneighbors_classifier.KNeighborsClassifier.predict_proba()

cuml/common/handle.pyx in cuml.common.handle.Handle.sync()

CudaRuntimeError: Error! cudaErrorIllegalAddress reason='an illegal memory access was encountered' extraMsg='Stream sync'

Expected behavior
knn.predict() should work without any issue.

Environment details (please complete the following information):

Environment location: [Docker]
Linux Distro/Architecture: [Ubuntu 16.04]
GPU Model/Driver: [GV100 and driver 418.39]
CUDA: [10.1]
Method of cuDF & cuML install: Docker 0.12 release

Additional context
I also ran the same example on 8x V100-32 GB using docker. Same issue occurred.


cjnolet · 2020-02-14T22:43:23Z

@rnyak, thank you for filing an issue about this.

Your example does not indicate how the data was created/loaded, but I've used what you've provided to generate data randomly. On the current cuML nightly, I am able to execute the following code end-to-end without error.

%%time
import numpy as np
import pandas as pd
import cudf
import cuml
from cuml.preprocessing.model_selection import train_test_split
from cuml.neighbors import KNeighborsClassifier

n_samples = 4000000
n_features = 15

X_host_train = pd.DataFrame(np.random.uniform(0, 1,
                                              (n_samples, n_features)))
y_host_train = pd.DataFrame(np.random.randint(0, 5, (n_samples, 1)))

X_device_train = cudf.DataFrame.from_pandas(X_host_train)
X_device_train["labels"] = cudf.DataFrame.from_pandas(y_host_train)


X_train, X_test, y_train, y_test = train_test_split(X_device_train, "labels", train_size=0.8)

# X_train.shape, y_train.shape, X_test.shape, y_test.shape
# ((1934289, 15), (1934289,), (483573, 15), (483573,))

knn = KNeighborsClassifier(n_neighbors=5)

knn.fit(X_train, y_train)

cuml_result = knn.predict(X_test)

The KNeighborsClassifier code has not been touched much in this release, at least not in a way that I believe would have fixed a crash like this.

Can you run my modified example and see if the crash is still happening on version 0.12? If so, can you see if the code runs successfully on the nightly? If you are using conda, you can install our nightly with conda install -c rapidsai-nightly cuml

If you aren't able to run it successfully on either, we'll need to dig a little deeper.

rnyak · 2020-02-14T23:28:02Z

@cjnolet Thanks for the quick response. Will test your example.

The data was originally in .pkl format and was then converted to and saved as a parquet file. We read it from parquet into gdf as:

gdf = cudf.read_parquet('mydatafile.parquet')

From gdf, we created a subset, gdf_sub, by selecting only certain columns, and trained a KNeighborsClassifier on it. The problem I am having is at the knn.predict() step.

Not sure if it gives any info, but I could run xgboost without any issue.

rnyak · 2020-02-17T14:06:12Z

@cjnolet The example you provided works fine, although the knn.predict() step takes much longer than the knn.fit() step. I have been using the Docker 0.12 stable release.

Somehow, knn.predict() does not work with my data. I get the error below at this step cuml_result= knn.predict(X_test) with my test set:

---------------------------------------------------------------------------
CudaRuntimeError                          Traceback (most recent call last)
<ipython-input-19-4524604db152> in <module>
      1 #this is where I get memory errors
      2 
----> 3 cuml_result =knn.predict(X_test)

cuml/neighbors/kneighbors_classifier.pyx in cuml.neighbors.kneighbors_classifier.KNeighborsClassifier.predict()

cuml/common/handle.pyx in cuml.common.handle.Handle.sync()

CudaRuntimeError: Error! cudaErrorIllegalAddress reason='an illegal memory access was encountered' extraMsg='Stream sync'

Both X_train.dtypes and X_test.dtypes:

a    float32
b    float32
c    float32
d    float32
e    float32
f    float32
g    float32
h    float32
k    float32
l    float32
m    float32
n    float32
p    float32
r    float32
dtype: object

Thanks.

cjnolet · 2020-02-17T14:41:21Z

@rnyak,

The current KNeighborsClassifier implementation uses only brute force and doesn't create any specialized indices. As a result, the call to fit() is essentially just an assignment of the training data to instance variables in the model class, which is why predict() takes much longer than fit().

Nothing immediately obvious stands out about your feature dataframes. Is your labels column also float32?
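To illustrate the point about fit() and predict(), here is a minimal pure-Python sketch of a brute-force k-nearest-neighbors classifier. It is not cuML's implementation: the class name BruteForceKNN, its attributes, and the toy data are all hypothetical and only show the general pattern in which fit() stores the training data and predict() does all the distance work.

```python
import math
from collections import Counter


class BruteForceKNN:
    """Illustrative brute-force kNN classifier (hypothetical, not cuML's code)."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # No index is built: fit() only keeps references to the training data.
        self.X_ = X
        self.y_ = y
        return self

    def predict(self, X):
        predictions = []
        for query in X:
            # All the work happens here: one distance per training row.
            distances = [
                (math.dist(query, row), label)
                for row, label in zip(self.X_, self.y_)
            ]
            nearest = sorted(distances, key=lambda pair: pair[0])[: self.n_neighbors]
            # Majority vote among the k nearest training labels.
            votes = Counter(label for _, label in nearest)
            predictions.append(votes.most_common(1)[0][0])
        return predictions


# Toy data: two well-separated clusters with labels 0 and 1.
X_train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
y_train = [0, 0, 0, 1, 1, 1]

knn = BruteForceKNN(n_neighbors=3).fit(X_train, y_train)
print(knn.predict([[0.05, 0.05], [1.0, 0.95]]))  # prints [0, 1]
```

In this sketch the cost of fit() is constant, while predict() scales with the number of query rows times the number of training rows, so a slow predict() next to a near-instant fit() is what a brute-force approach would be expected to show.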

rnyak · 2020-02-17T14:48:04Z

@cjnolet Thanks for the quick response and the explanation. The label column is int64; it contains only integers from 0 to 50, used as class labels.
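A description like the one above can be double-checked with a small host-side sanity check before fitting. The sketch below is only a generic check, not a diagnosis of this crash: the helper name check_labels is hypothetical, and it assumes the labels have already been copied into a plain Python list of ints.

```python
import math
from collections import Counter


def check_labels(labels):
    """Return per-class counts; raise ValueError on missing, non-integer, or negative labels."""
    for i, value in enumerate(labels):
        if value is None or (isinstance(value, float) and math.isnan(value)):
            raise ValueError(f"missing label at position {i}")
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"non-integer label {value!r} at position {i}")
        if value < 0:
            raise ValueError(f"negative label {value} at position {i}")
    return Counter(labels)


# Hypothetical labels in the 0..50 range described above.
counts = check_labels([0, 3, 50, 3, 0])
print(min(counts), max(counts), len(counts))  # prints 0 50 3
```

If the check passes and the reported range matches expectations, the labels are unlikely to be the odd one out, and attention can move to other inputs.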

cjnolet · 2020-02-17T15:57:43Z

@rnyak, were you able to get this running successfully on your data?

rnyak · 2020-02-17T16:20:21Z

@cjnolet Unfortunately not. I could not find the cause, so I have closed the issue for now. I will try the 0.13 nightly; if the same error occurs, I can reopen this issue. Thanks.

pseudotensor · 2022-03-10T17:34:20Z

I still get this problem on 21.08. Will file a new issue.

rnyak added the "? - Needs Triage" (Need team to review and classify) and "bug" (Something isn't working) labels Feb 14, 2020

rnyak closed this as completed Feb 17, 2020

MatthiasKohl mentioned this issue Apr 14, 2020

[BUG] KNN is supposed to synchronize its internal streams on user stream #2079

Closed

pseudotensor mentioned this issue Mar 10, 2022

knn predict wrong and varying predictions, cudaErrorIllegalAddress, or core dump #4629

Open
