[BUG] cuml.dask.naive_bayes now fails if n_chunks in X and y are >1 #3150

Closed
VibhuJawa opened this issue Nov 17, 2020 · 1 comment · Fixed by #3152
Labels: bug (Something isn't working)

Comments

VibhuJawa (Member) commented Nov 17, 2020

Describe the bug
cuml.dask.naive_bayes fails if X and y have more than one chunk (n_chunks > 1).

Steps/Code to reproduce bug

import cupy as cp
import dask
import numpy as np
import dask.array as da

from dask_cuda import LocalCUDACluster
from dask.distributed import Client

from cupy.sparse import csr_matrix as cp_csr_matrix
from scipy.sparse import csr_matrix as scipy_csr_matrix


from cuml.dask.naive_bayes import MultinomialNB
cluster = LocalCUDACluster()
client = Client(cluster)


# Testing Util Functions
def generate_dask_array(np_array, n_parts):
    """
        Creates a dask array from a numpy 2d array
    """
    n_samples = np_array.shape[0]
    n_samples_per_part = int(n_samples / n_parts)
    chunks = [n_samples_per_part] * n_parts
    samples_last_row = n_samples - ((n_parts - 1) * n_samples_per_part)
    chunks[-1] = samples_last_row
    chunks = tuple(chunks)
    return da.from_array(np_array, chunks=(chunks, -1))


def create_cp_sparse_ar_from_dense_np_ar(ar, dtype=np.float32):
    """
        Creates a gpu array from a dense cpu array
    """
    return cp_csr_matrix(scipy_csr_matrix(ar), dtype=dtype)


def create_cp_sparse_dask_array(np_ar, n_parts):
    """
        Creates a sparse gpu dask array from the given numpy array
    """
    ar = generate_dask_array(np_ar, n_parts)
    meta = dask.array.from_array(cp_csr_matrix(cp.zeros(1, dtype=cp.float32)))
    ar = ar.map_blocks(create_cp_sparse_ar_from_dense_np_ar, meta=meta)
    return ar

# 3 samples split across 3 row chunks; y is chunked to match
X = np.array([[0, 0, 0, 1], [1, 0, 0, 1], [1, 0, 0, 0]])
X = create_cp_sparse_dask_array(X, 3)
y = dask.array.from_array([1, 0, 0], asarray=False,
                          fancy=False, chunks=1).astype(cp.int32)

print(y.chunks)
print(X.chunks)


# Train model
# (Main reproducer)
model = MultinomialNB()
model.fit(X, y)

Log on Worker:

((1, 1, 1),)
((1, 1, 1), (4,))


distributed.worker - WARNING -  Compute Failed
Function:  _merge_counts_to_model
args:      ([MultinomialNB(alpha=1.0, fit_prior=True, output_type='input', handle=None, verbose=4), MultinomialNB(alpha=1.0, fit_prior=True, output_type='input', handle=None, verbose=4)])
kwargs:    {}
Exception: AttributeError()
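
For context: with more than one chunk, the distributed estimator fits a local MultinomialNB per partition and then combines the partial models; the _merge_counts_to_model call shown above is that combine step, and it is where the AttributeError is raised. As a rough illustration only (not cuml's actual internals), such a merge boils down to summing the per-partition count statistics; the attribute names class_count_ and feature_count_ below follow the scikit-learn convention and are assumptions:

import numpy as np
from types import SimpleNamespace

def merge_count_models(models):
    """Fold per-partition count statistics into the first model (illustrative only)."""
    merged = models[0]
    for m in models[1:]:
        merged.class_count_ = merged.class_count_ + m.class_count_
        merged.feature_count_ = merged.feature_count_ + m.feature_count_
    return merged

# Two partial "models", one per partition, standing in for the futures
# passed to _merge_counts_to_model in the log above.
parts = [
    SimpleNamespace(class_count_=np.array([1.0, 1.0]),
                    feature_count_=np.array([[0., 0., 0., 1.], [1., 0., 0., 1.]])),
    SimpleNamespace(class_count_=np.array([0.0, 1.0]),
                    feature_count_=np.array([[0., 0., 0., 0.], [1., 0., 0., 0.]])),
]
combined = merge_count_models(parts)
print(combined.class_count_)    # [1. 2.]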

Log on client:

[I] [13:26:22.014071] [<Future: finished, type: cuml.MultinomialNB, key: _fit-4ce30dac-2704-46c9-ba30-2c45e8528a95>]
[I] [13:26:22.014217] [<Future: finished, type: cuml.MultinomialNB, key: _fit-42163c36-d52c-4f9c-848a-631e0ba2589c>]
[I] [13:26:22.014552] [<Future: finished, type: cuml.MultinomialNB, key: _fit-14dca813-be3f-4878-a114-2aa3873e51fe>]
[I] [13:26:22.016543] [<Future: finished, type: cuml.MultinomialNB, key: _fit-4ce30dac-2704-46c9-ba30-2c45e8528a95>]
[I] [13:26:22.016722] [<Future: finished, type: cuml.MultinomialNB, key: _fit-42163c36-d52c-4f9c-848a-631e0ba2589c>]
[I] [13:26:22.016829] [<Future: finished, type: cuml.MultinomialNB, key: _fit-14dca813-be3f-4878-a114-2aa3873e51fe>]
[I] [13:26:22.018411] [Delayed('_merge_counts_to_model-29c8ddbb-e330-4ccc-887f-bece9cc206e4')]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-20-3c2c439107a9> in <module>
      2 
      3 model = MultinomialNB()
----> 4 model.fit(X, y)

/raid/vjawa/conda/envs/rapids-tpcx-bb-nov-17-cupy-8/lib/python3.7/site-packages/cuml/common/memory_utils.py in cupy_rmm_wrapper(*args, **kwargs)
     91     def cupy_rmm_wrapper(*args, **kwargs):
     92         with cupy_using_allocator(rmm.rmm_cupy_allocator):
---> 93             return func(*args, **kwargs)
     94 
     95     # Mark the function as already wrapped

/raid/vjawa/conda/envs/rapids-tpcx-bb-nov-17-cupy-8/lib/python3.7/site-packages/cuml/dask/naive_bayes/naive_bayes.py in fit(self, X, y, classes)
    194                                     pure=False)
    195 
--> 196         wait_and_raise_from_futures([models])
    197 
    198         self._set_internal_model(models)

/raid/vjawa/conda/envs/rapids-tpcx-bb-nov-17-cupy-8/lib/python3.7/site-packages/cuml/dask/common/utils.py in wait_and_raise_from_futures(futures)
    152     """
    153     wait(futures)
--> 154     raise_exception_from_futures(futures)
    155     return futures
    156 

/raid/vjawa/conda/envs/rapids-tpcx-bb-nov-17-cupy-8/lib/python3.7/site-packages/cuml/dask/common/utils.py in raise_exception_from_futures(futures)
    142     if errs:
    143         raise RuntimeError("%d of %d worker jobs failed: %s" % (
--> 144             len(errs), len(futures), ", ".join(map(str, errs))
    145             ))
    146 

RuntimeError: 1 of 1 worker jobs failed:

Expected behavior
I would expect this to work as it did in the earlier nightlies.

It fails in the environment below:

cuml                      0.17.0a201117   cuda10.2_py37_g59497faa0_119    rapidsai-nightly
libcuml                   0.17.0a201117   cuda10.2_g59497faa0_119    rapidsai-nightly
libcumlprims              0.17.0a201030   cuda10.2_g1fa28a5_8    rapidsai-nightly
cudf                      0.17.0a201117   cuda_10.2_py37_g69203f1e82_275    rapidsai-nightly
dask-cudf                 0.17.0a201117   py37_g69203f1e82_275    rapidsai-nightly
libcudf                   0.17.0a201117   cuda10.2_g69203f1e82_275    rapidsai-nightly

It works in the environment below:

cuml                      0.17.0a201112   cuda10.2_py37_g708ae47a3_115    rapidsai-nightly
libcuml                   0.17.0a201112   cuda10.2_g708ae47a3_115    rapidsai-nightly
libcumlprims              0.17.0a201030   cuda10.2_g1fa28a5_8    rapidsai-nightly
cudf                      0.17.0a201112   cuda_10.2_py37_g3fc8142d62_225    rapidsai-nightly
dask-cudf                 0.17.0a201112   py37_g3fc8142d62_225    rapidsai-nightly
libcudf                   0.17.0a201112   cuda10.2_g3fc8142d62_225    rapidsai-nightly

Additional context
This is likely the cause of rapidsai/gpu-bdb#140.

CC: @beckernick , @dantegd , @cjnolet

VibhuJawa added the "? - Needs Triage" and "bug" labels on Nov 17, 2020
dantegd (Member) commented Nov 17, 2020

@VibhuJawa found the issue; it's a small thing that wasn't updated in the PR for the CumlArray descriptor. I'll have a fix up in a bit. Thanks for the log and reproducer, that made it super easy to find! :)
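
To illustrate the kind of failure a descriptor change can introduce (a purely hypothetical sketch; the class and attribute names below are not cuml's actual internals), an AttributeError like the one in the worker log shows up when merge-side code still reads an attribute name that the new descriptor no longer exposes:

class CountDescriptor:
    """Stores the attribute value under a private backing name."""
    def __set_name__(self, owner, name):
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        setattr(obj, self.private_name, value)


class Estimator:
    feature_count_ = CountDescriptor()


model = Estimator()
model.feature_count_ = [1, 2, 3]
print(model.feature_count_)   # resolved through the descriptor

try:
    model.feature_count       # stale attribute name left over from before the refactor
except AttributeError as exc:
    print("AttributeError:", exc)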
