[BUG] Linear regression on a view can cause segfaults depending on the offsets #4199

Closed
beckernick opened this issue Sep 9, 2021 · 4 comments
Labels
bug Something isn't working CUDA / C++ CUDA issue

Comments

@beckernick
Member

Linear regression on a view can cause segfaults depending on the offsets. This could cause issues when a user is trying to do something like rolling regressions using a sliding window in a loop.

I haven't root-caused this issue yet, but it does not occur with every "window" size (start/end offset) for the view.

import cudf
import cuml
import numpy as np

start = 1
end = start + 10

df = cudf.DataFrame(np.random.normal(size=(15,3)), columns=["x","y","z"])

X = df[['x','y']].iloc[start:end]
y = df['z'].iloc[start:end]

lr = cuml.linear_model.LinearRegression()
lr.fit(X,y)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_38873/3373970818.py in <module>
     12 
     13 lr = cuml.linear_model.LinearRegression()
---> 14 lr.fit(X,y) # segfault

/raid/nicholasb/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner_with_setters(*args, **kwargs)
    407                                 target_val=target_val)
    408 
--> 409                 return func(*args, **kwargs)
    410 
    411         @wraps(func)

cuml/linear_model/linear_regression.pyx in cuml.linear_model.linear_regression.LinearRegression.fit()

RuntimeError: cuSOLVER error encountered at: file=/raid/nicholasb/miniconda3/envs/rapids-21.10/include/raft/linalg/eig.cuh line=60: call='cusolverDnsyevd(cusolverH, CUSOLVER_EIG_MODE_VECTOR, CUBLAS_FILL_MODE_UPPER, n_rows, eig_vectors, n_cols, eig_vals, d_work.data(), lwork, d_dev_info.data(), stream)', Reason=7:CUSOLVER_STATUS_INTERNAL_ERROR
Obtained 64 stack frames
#0 in /raid/nicholasb/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft9exception18collect_call_stackEv+0x3b) [0x7fcea8561dbb]
#1 in /raid/nicholasb/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft14cusolver_errorC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0xbd) [0x7fcea85bcb8d]
#2 in /raid/nicholasb/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft6linalg5eigDCIdEEvRKNS_8handle_tEPKT_iiPS5_S8_P11CUstream_st+0x6ec) [0x7fcea86bbc1c]
#3 in /raid/nicholasb/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft6linalg6svdEigIdEEvRKNS_8handle_tEPT_iiS6_S6_S6_bP11CUstream_st+0xff) [0x7fcea86bc16f]
#4 in /raid/nicholasb/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN8MLCommon6LinAlg5lstsqIdEEvRKN4raft8handle_tEPT_iiS7_S7_iP11CUstream_st+0x642) [0x7fcea86bcb42]
#5 in /raid/nicholasb/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN2ML3GLM6olsFitIdEEvRKN4raft8handle_tEPT_iiS7_S7_S7_bbP11CUstream_sti+0x7df) [0x7fcea86bd48f]
#6 in /raid/nicholasb/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN2ML3GLM6olsFitERKN4raft8handle_tEPdiiS5_S5_S5_bbi+0x24) [0x7fcea8667bb4]
#7 in /raid/nicholasb/miniconda3/envs/rapids-21.10/lib/python3.8/site-packages/cuml/linear_model/linear_regression.cpython-38-x86_64-linux-gnu.so(+0x2a182) [0x7fce8a1c8182]
#8 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(PyObject_Call+0x24d) [0x562440cd7d5d]
#9 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0x21bf) [0x562440d8784f]
#10 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x562440d6c433]
#11 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(+0x1b7f47) [0x562440d6df47]
#12 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0x4d33) [0x562440d8a3c3]
#13 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x562440d6c433]
#14 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(PyEval_EvalCodeEx+0x39) [0x562440d6d499]
#15 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(PyEval_EvalCode+0x1b) [0x562440e08ecb]
#16 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(+0x273c4e) [0x562440e29c4e]
#17 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(+0x12488b) [0x562440cda88b]
#18 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0x947) [0x562440d85fd7]
#19 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(+0x17ffc3) [0x562440d35fc3]
#20 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0x1d9d) [0x562440d8742d]
#21 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(+0x17ffc3) [0x562440d35fc3]
#22 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0x1d9d) [0x562440d8742d]
#23 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(+0x17ffc3) [0x562440d35fc3]
#24 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(+0x190569) [0x562440d46569]
#25 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0xa63) [0x562440d860f3]
#26 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyFunction_Vectorcall+0x1a6) [0x562440d6d646]
#27 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0x947) [0x562440d85fd7]
#28 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyFunction_Vectorcall+0x1a6) [0x562440d6d646]
#29 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0xa63) [0x562440d860f3]
#30 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x562440d6c433]
#31 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyFunction_Vectorcall+0x378) [0x562440d6d818]
#32 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(+0x1b7ed1) [0x562440d6ded1]
#33 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(PyObject_Call+0x5e) [0x562440cd7b6e]
#34 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0x21bf) [0x562440d8784f]
#35 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x562440d6c433]
#36 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(+0x1b7f47) [0x562440d6df47]
#37 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0x1822) [0x562440d86eb2]
#38 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(+0x17ffc3) [0x562440d35fc3]
#39 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0x1d9d) [0x562440d8742d]
#40 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(+0x17ffc3) [0x562440d35fc3]
#41 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0x1d9d) [0x562440d8742d]
#42 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(+0x17ffc3) [0x562440d35fc3]
#43 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0x1d9d) [0x562440d8742d]
#44 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(+0x17ffc3) [0x562440d35fc3]
#45 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0x1d9d) [0x562440d8742d]
#46 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(+0x17ffc3) [0x562440d35fc3]
#47 in /raid/nicholasb/miniconda3/envs/rapids-21.10/lib/python3.8/lib-dynload/_asyncio.cpython-38-x86_64-linux-gnu.so(+0xa8a6) [0x7fd2d31568a6]
#48 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyObject_MakeTpCall+0x31e) [0x562440ce8ebe]
#49 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(+0x21adef) [0x562440dd0def]
#50 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(+0x124b02) [0x562440cdab02]
#51 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(PyVectorcall_Call+0x6e) [0x562440ce581e]
#52 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0x5c4a) [0x562440d8b2da]
#53 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyFunction_Vectorcall+0x1a6) [0x562440d6d646]
#54 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0xa63) [0x562440d860f3]
#55 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyFunction_Vectorcall+0x1a6) [0x562440d6d646]
#56 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0xa63) [0x562440d860f3]
#57 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyFunction_Vectorcall+0x1a6) [0x562440d6d646]
#58 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0xa63) [0x562440d860f3]
#59 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyFunction_Vectorcall+0x1a6) [0x562440d6d646]
#60 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0xa63) [0x562440d860f3]
#61 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyFunction_Vectorcall+0x1a6) [0x562440d6d646]
#62 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalFrameDefault+0xa63) [0x562440d860f3]
#63 in /raid/nicholasb/miniconda3/envs/rapids-21.10/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x562440d6c433]

If we explicitly copy the data before fitting the model, there is no issue:

import cudf
import cuml
import numpy as np

start = 1
end = start + 10

df = cudf.DataFrame(np.random.normal(size=(15,3)), columns=["x","y","z"])

X = df[['x','y']].iloc[start:end]
y = df['z'].iloc[start:end]

lr = cuml.linear_model.LinearRegression()
lr.fit(X.copy(), y.copy())
LinearRegression()

conda list | grep "rapids\|numpy\|arrow"
# packages in environment at /raid/nicholasb/miniconda3/envs/rapids-21.10:
arrow-cpp                 5.0.0           py38h65ca8dc_5_cuda    conda-forge
arrow-cpp-proc            3.0.0                      cuda    conda-forge
cucim                     21.10.00a210909 cuda_11.2_py38_g27da55a_20    rapidsai-nightly
cudf                      21.10.00a210909 cuda_11.2_py38_g473063ffee_278    rapidsai-nightly
cudf_kafka                21.10.00a210909 py38_g473063ffee_278    rapidsai-nightly
cugraph                   21.10.00a210909 cuda11.2_py38_gc3d798f4_70    rapidsai-nightly
cuml                      21.10.00a210909 cuda11.2_py38_g496ddf075_74    rapidsai-nightly
cusignal                  21.10.00a210909 py38_g5ae8e6a_13    rapidsai-nightly
cuspatial                 21.10.00a210909 py38_gdd2156f_14    rapidsai-nightly
custreamz                 21.10.00a210909 py38_g473063ffee_278    rapidsai-nightly
cuxfilter                 21.10.00a210909 py38_gd7204b9_13    rapidsai-nightly
dask-cuda                 21.10.00a210909         py38_37    rapidsai-nightly
dask-cudf                 21.10.00a210909 py38_g473063ffee_278    rapidsai-nightly
libcucim                  21.10.00a210909 cuda11.2_g27da55a_20    rapidsai-nightly
libcudf                   21.10.00a210909 cuda11.2_g473063ffee_278    rapidsai-nightly
libcudf_kafka             21.10.00a210909 g473063ffee_278    rapidsai-nightly
libcugraph                21.10.00a210909 cuda11.2_gc3d798f4_70    rapidsai-nightly
libcuml                   21.10.00a210909 cuda11.2_g496ddf075_74    rapidsai-nightly
libcumlprims              21.10.00a210908 cuda11.2_g8e4d5a6_6    rapidsai-nightly
libcuspatial              21.10.00a210909 cuda11.2_gdd2156f_14    rapidsai-nightly
librmm                    21.10.00a210909 cuda11.2_g8527317_28    rapidsai-nightly
libxgboost                1.4.2dev.rapidsai21.10      cuda11.2_0    rapidsai-nightly
numpy                     1.21.2           py38he2449b9_0    conda-forge
py-xgboost                1.4.2dev.rapidsai21.10  cuda11.2py38_0    rapidsai-nightly
pyarrow                   5.0.0           py38hed47224_5_cuda    conda-forge
rapids                    21.10.00a210909 cuda11.2_py38_g9ebf66e_63    rapidsai-nightly
rapids-xgboost            21.10.00a210909 cuda11.2_py38_g9ebf66e_63    rapidsai-nightly
rmm                       21.10.00a210909 cuda_11.2_py38_g8527317_28    rapidsai-nightly
ucx                       1.9.0+gcd9efd3       cuda11.2_0    rapidsai-nightly
ucx-proc                  1.0.0                       gpu    rapidsai-nightly
ucx-py                    0.22.0a210909   py38_gcd9efd3_19    rapidsai-nightly
xgboost                   1.4.2dev.rapidsai21.10  cuda11.2py38_0    rapidsai-nightly
beckernick added the "bug" and "? - Needs Triage" labels on Sep 9, 2021
@Nanthini10
Contributor

Nanthini10 commented Sep 13, 2021

It looks like only the y labels need to be copied with copy(). The following doesn't raise an error.

import cudf
import cuml
import numpy as np

start = 1
end = start + 10

df = cudf.DataFrame(np.random.normal(size=(15,3)), columns=["x","y","z"])

X = df[['x','y']].iloc[start:end]
y = df['z'].iloc[start:end]
lr = cuml.linear_model.LinearRegression()
lr.fit(X, y.copy())

Also, when y is a DataFrame rather than a Series, it doesn't need to be copied. So the problem is probably specific to Series?

y = df[['z']].iloc[start:end]
lr.fit(X, y) # works when y is dataframe

viclafargue added the "CUDA / C++" label and removed the "? - Needs Triage" label on Sep 15, 2021
@Nanthini10
Contributor

Update: I'm now getting a cudaErrorMisalignedAddress instead of the cuSOLVER error.
Looking into the RMM memory allocator.

Stacktrace:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-7f7ff82085e1> in <module>
      1 lr = cuml.linear_model.LinearRegression()
----> 2 lr.fit(X,y)

/opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner_with_setters(*args, **kwargs)
    407                                 target_val=target_val)
    408 
--> 409                 return func(*args, **kwargs)
    410 
    411         @wraps(func)

cuml/linear_model/linear_regression.pyx in cuml.linear_model.linear_regression.LinearRegression.fit()

RuntimeError: CUDA error encountered at: file=/opt/conda/envs/rapids/include/raft/mr/buffer_base.hpp line=71: call='cudaStreamSynchronize(stream_)', Reason=cudaErrorMisalignedAddress:misaligned address
Obtained 64 stack frames
#0 in /opt/conda/envs/rapids/lib/libcuml++.so(_ZN4raft9exception18collect_call_stackEv+0x4e) [0x7f2d72499f2e]
#1 in /opt/conda/envs/rapids/lib/libcuml++.so(_ZN4raft10cuda_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x6a) [0x7f2d7249a6fa]
#2 in /opt/conda/envs/rapids/lib/libcuml++.so(_ZN4raft2mr11buffer_baseIdNS0_6device9allocatorEEC1ESt10shared_ptrIS3_EP11CUstream_stm+0x260) [0x7f2d724a1690]
#3 in /opt/conda/envs/rapids/lib/libcuml++.so(_ZN4raft6linalg6svdEigIdEEvRKNS_8handle_tEPT_iiS6_S6_S6_bP11CUstream_st+0x10b) [0x7f2d7262f00b]
#4 in /opt/conda/envs/rapids/lib/libcuml++.so(_ZN8MLCommon6LinAlg5lstsqIdEEvRKN4raft8handle_tEPT_iiS7_S7_iP11CUstream_st+0x671) [0x7f2d7262fdd1]
#5 in /opt/conda/envs/rapids/lib/libcuml++.so(_ZN2ML3GLM6olsFitIdEEvRKN4raft8handle_tEPT_iiS7_S7_S7_bbP11CUstream_sti+0x7b2) [0x7f2d726306e2]
#6 in /opt/conda/envs/rapids/lib/libcuml++.so(_ZN2ML3GLM6olsFitERKN4raft8handle_tEPdiiS5_S5_S5_bbi+0x24) [0x7f2d725dc414]
#7 in /opt/conda/envs/rapids/lib/python3.8/site-packages/cuml/linear_model/linear_regression.cpython-38-x86_64-linux-gnu.so(+0x2825c) [0x7f2d5beaa25c]
#8 in /opt/conda/envs/rapids/bin/python(PyObject_Call+0x24d) [0x564f3b04fd5d]
#9 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalFrameDefault+0x21bf) [0x564f3b0ff84f]
#10 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x564f3b0e4433]
#11 in /opt/conda/envs/rapids/bin/python(+0x1b7f47) [0x564f3b0e5f47]
#12 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalFrameDefault+0x4d33) [0x564f3b1023c3]
#13 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x564f3b0e4433]
#14 in /opt/conda/envs/rapids/bin/python(PyEval_EvalCodeEx+0x39) [0x564f3b0e5499]
#15 in /opt/conda/envs/rapids/bin/python(PyEval_EvalCode+0x1b) [0x564f3b180ecb]
#16 in /opt/conda/envs/rapids/bin/python(+0x273c4e) [0x564f3b1a1c4e]
#17 in /opt/conda/envs/rapids/bin/python(+0x12488b) [0x564f3b05288b]
#18 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalFrameDefault+0x947) [0x564f3b0fdfd7]
#19 in /opt/conda/envs/rapids/bin/python(+0x17ffc3) [0x564f3b0adfc3]
#20 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalFrameDefault+0x1d9d) [0x564f3b0ff42d]
#21 in /opt/conda/envs/rapids/bin/python(+0x17ffc3) [0x564f3b0adfc3]
#22 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalFrameDefault+0x1d9d) [0x564f3b0ff42d]
#23 in /opt/conda/envs/rapids/bin/python(+0x17ffc3) [0x564f3b0adfc3]
#24 in /opt/conda/envs/rapids/bin/python(+0x190569) [0x564f3b0be569]
#25 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalFrameDefault+0xa63) [0x564f3b0fe0f3]
#26 in /opt/conda/envs/rapids/bin/python(_PyFunction_Vectorcall+0x1a6) [0x564f3b0e5646]
#27 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalFrameDefault+0x947) [0x564f3b0fdfd7]
#28 in /opt/conda/envs/rapids/bin/python(_PyFunction_Vectorcall+0x1a6) [0x564f3b0e5646]
#29 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalFrameDefault+0xa63) [0x564f3b0fe0f3]
#30 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x564f3b0e4433]
#31 in /opt/conda/envs/rapids/bin/python(_PyFunction_Vectorcall+0x378) [0x564f3b0e5818]
#32 in /opt/conda/envs/rapids/bin/python(+0x1b7ed1) [0x564f3b0e5ed1]
#33 in /opt/conda/envs/rapids/bin/python(PyObject_Call+0x5e) [0x564f3b04fb6e]
#34 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalFrameDefault+0x21bf) [0x564f3b0ff84f]
#35 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x564f3b0e4433]
#36 in /opt/conda/envs/rapids/bin/python(+0x1b7f47) [0x564f3b0e5f47]
#37 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalFrameDefault+0x1822) [0x564f3b0feeb2]
#38 in /opt/conda/envs/rapids/bin/python(+0x18cc9a) [0x564f3b0bac9a]
#39 in /opt/conda/envs/rapids/bin/python(+0x12488b) [0x564f3b05288b]
#40 in /opt/conda/envs/rapids/bin/python(+0x13337a) [0x564f3b06137a]
#41 in /opt/conda/envs/rapids/bin/python(+0x21adef) [0x564f3b148def]
#42 in /opt/conda/envs/rapids/bin/python(+0x124b02) [0x564f3b052b02]
#43 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalFrameDefault+0x947) [0x564f3b0fdfd7]
#44 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x564f3b0e4433]
#45 in /opt/conda/envs/rapids/bin/python(_PyFunction_Vectorcall+0x378) [0x564f3b0e5818]
#46 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalFrameDefault+0xa63) [0x564f3b0fe0f3]
#47 in /opt/conda/envs/rapids/bin/python(+0x18cc9a) [0x564f3b0bac9a]
#48 in /opt/conda/envs/rapids/bin/python(+0x12488b) [0x564f3b05288b]
#49 in /opt/conda/envs/rapids/bin/python(+0x13337a) [0x564f3b06137a]
#50 in /opt/conda/envs/rapids/bin/python(+0x21adef) [0x564f3b148def]
#51 in /opt/conda/envs/rapids/bin/python(+0x124b02) [0x564f3b052b02]
#52 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalFrameDefault+0x947) [0x564f3b0fdfd7]
#53 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x564f3b0e4433]
#54 in /opt/conda/envs/rapids/bin/python(+0x1b7f47) [0x564f3b0e5f47]
#55 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalFrameDefault+0x947) [0x564f3b0fdfd7]
#56 in /opt/conda/envs/rapids/bin/python(+0x18cc9a) [0x564f3b0bac9a]
#57 in /opt/conda/envs/rapids/bin/python(+0x12488b) [0x564f3b05288b]
#58 in /opt/conda/envs/rapids/bin/python(+0x13337a) [0x564f3b06137a]
#59 in /opt/conda/envs/rapids/bin/python(+0x21adef) [0x564f3b148def]
#60 in /opt/conda/envs/rapids/bin/python(+0x124b02) [0x564f3b052b02]
#61 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalFrameDefault+0x947) [0x564f3b0fdfd7]
#62 in /opt/conda/envs/rapids/bin/python(_PyEval_EvalCodeWithName+0x2c3) [0x564f3b0e4433]
#63 in /opt/conda/envs/rapids/bin/python(_PyFunction_Vectorcall+0x378) [0x564f3b0e5818]

@cjnolet
Member

cjnolet commented Sep 20, 2021

@Nanthini10,

I can't immediately think of why this would be happening, other than the underlying pointer that the cudf view returns somehow not being aligned to the proper power of 2. In this case, the memory should be aligned to 8 bytes since we're using a double-precision array.

>>> import cudf
>>> import cuml
>>> import numpy as np
>>> start = 1
>>> end = start + 10
>>> df = cudf.DataFrame(np.random.normal(size=(15,3)), columns=["x","y","z"])
>>> y = df['z'].iloc[start:end]
>>> y
1     0.588237
2     1.994371
3    -1.165460
4     1.157915
5    -1.263956
6     1.094313
7    -0.713109
8    -0.351764
9    -1.126963
10   -0.189863
Name: z, dtype: float64

Taking cuml out of the equation, one thing I notice from the start is that the pointer (the first element of the data entry below) is divisible by 8, i.e. it is 8-byte aligned:

>>> y2 = df['z'].iloc[0:end]
>>> y2.__cuda_array_interface__
{'shape': (15,), 'strides': (8,), 'typestr': '<f8', 'data': (140614112904704, False), 'version': 1}
>>> 140614112904704 % 8
0

And slicing from element 1 does increment that pointer by 8 bytes.

>>> y.__cuda_array_interface__
{'shape': (10,), 'strides': (8,), 'typestr': '<f8', 'data': (140614112904712, False), 'version': 1}

Here's an example of a double precision cupy array, which is doing identical arithmetic:

>>> import cupy as cp
>>> a = cp.array([4.0, 5.0, 6.0], dtype='float64')
>>> a.__cuda_array_interface__
{'shape': (3,), 'typestr': '<f8', 'descr': [('', '<f8')], 'stream': 1, 'version': 3, 'strides': None, 'data': (140614112903168, False)}
>>> a[1:].__cuda_array_interface__
{'shape': (2,), 'typestr': '<f8', 'descr': [('', '<f8')], 'stream': 1, 'version': 3, 'strides': None, 'data': (140614112903176, False)}
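
The pointer arithmetic in the outputs above can be double-checked without a GPU. The following is a minimal sketch in plain Python that uses only the pointer values pasted in this comment; the helper names (`largest_pow2_alignment`, `report`) are made up for illustration and are not part of cudf, cupy, or cuml.

```python
# Pointer values copied from the __cuda_array_interface__ outputs in this thread.
CUDF_BASE = 140614112904704   # df['z'].iloc[0:end]
CUDF_SLICE = 140614112904712  # df['z'].iloc[start:end] with start = 1
CUPY_BASE = 140614112903168   # a
CUPY_SLICE = 140614112903176  # a[1:]

ITEMSIZE = 8  # bytes per float64 element (typestr '<f8')


def largest_pow2_alignment(ptr: int) -> int:
    """Return the largest power of two that divides ptr (its lowest set bit)."""
    return ptr & -ptr


def report(name: str, base: int, sliced: int, start: int) -> dict:
    """Compare a sliced pointer against its base pointer."""
    offset = sliced - base
    return {
        "name": name,
        "offset_bytes": offset,
        # A slice starting at element `start` should move the pointer
        # by exactly start * ITEMSIZE bytes.
        "offset_matches": offset == start * ITEMSIZE,
        "base_alignment": largest_pow2_alignment(base),
        "slice_alignment": largest_pow2_alignment(sliced),
    }


cudf_report = report("cudf", CUDF_BASE, CUDF_SLICE, start=1)
cupy_report = report("cupy", CUPY_BASE, CUPY_SLICE, start=1)

for r in (cudf_report, cupy_report):
    print(r)
```

For both libraries the slice moves the pointer by exactly one element. Both base pointers are multiples of 256, while both sliced pointers are 8-byte aligned but not 16-byte aligned. Whether anything in the `fit()` path expects more than 8-byte alignment is not established in this thread; the sketch only confirms what alignment the pasted pointers have.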

At face value, it seems like cudf is doing what it's supposed to do. Perhaps there's something odd going on with the conversions, memory reuse, or array creation inside cuml before the data makes it to the C++ layer? It would probably help to do a similar analysis of the pointers inside the fit() function for linear regression.
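
One way to run that kind of pointer analysis is a small helper that reads `__cuda_array_interface__` from whatever object is at hand and reports its pointer and alignment. This is only a sketch: `describe_device_array` and `FakeDeviceArray` are hypothetical names, the fake class is a CPU stand-in so the snippet runs without a GPU, and the item-size parsing assumes a simple numeric typestr such as '<f8'.

```python
from typing import Any, Dict, Tuple


def describe_device_array(obj: Any) -> Dict[str, Any]:
    """Summarize pointer and alignment info from __cuda_array_interface__."""
    iface = obj.__cuda_array_interface__
    ptr, _read_only = iface["data"]
    # Assumption: numeric typestr like '<f8', where the trailing digits
    # give the element size in bytes.
    itemsize = int(iface["typestr"][2:])
    return {
        "ptr": ptr,
        "shape": iface["shape"],
        "itemsize": itemsize,
        "aligned_to_itemsize": ptr % itemsize == 0,
        "aligned_to_16": ptr % 16 == 0,
    }


class FakeDeviceArray:
    """Hypothetical stand-in exposing only __cuda_array_interface__."""

    def __init__(self, ptr: int, shape: Tuple[int, ...], typestr: str = "<f8"):
        self.__cuda_array_interface__ = {
            "shape": shape,
            "strides": None,
            "typestr": typestr,
            "data": (ptr, False),
            "version": 1,
        }


# Pointer value of the sliced Series `y` as pasted earlier in this comment.
y_like = FakeDeviceArray(140614112904712, (10,))
print(describe_device_array(y_like))
```

With a real GPU environment, the same helper could be called on `X` and `y` before `fit()` and on the intermediate arrays cuml builds from them, to see at which step the pointer or its alignment changes.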

@Nanthini10
Contributor

Thank you, that's really helpful. Let me dig into what's going on with the data within the fit() call!
