[REVIEW] Fix access to attributes of individual NB objects in dask NB #3152

Merged 5 commits on Nov 20, 2020
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -56,9 +56,10 @@
  - PR #3086: Reverting FIL Notebook Testing
  - PR #3114: Fixed a typo in SVC's predict_proba AttributeError
  - PR #3117: Fix two crashes in experimental RF backend
- - PR #3119: Fix memset args for benchmark
+ - PR #3119: Fix memset args for benchmark
  - PR #3130: Return Python string from `dump_as_json()` of RF
  - PR #3136: Fix stochastic gradient descent example
+ - PR #3152: Fix access to attributes of individual NB objects in dask NB
  - PR #3156: Force local conda artifact install

# cuML 0.16.0 (Date TBD)
4 changes: 2 additions & 2 deletions python/cuml/dask/naive_bayes/naive_bayes.py
@@ -136,8 +136,8 @@ def _merge_counts_to_model(models):
          modela = first(models)

          for model in models[1:]:
-             modela._feature_count_ += model._feature_count_
-             modela._class_count_ += model._class_count_
+             modela.feature_count_ += model.feature_count_
+             modela.class_count_ += model.class_count_
          return modela

      @staticmethod
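
The merge step in `_merge_counts_to_model` folds each partial model's `feature_count_` and `class_count_` into the first model. A minimal sketch of that pattern follows, using NumPy and a hypothetical `PartialCounts` stand-in class (not a cuML type); the function name `merge_counts` and the sample numbers are likewise assumptions made for illustration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PartialCounts:
    """Hypothetical stand-in for a per-chunk model; not a cuML class."""
    feature_count_: np.ndarray  # shape (n_classes, n_features)
    class_count_: np.ndarray    # shape (n_classes,)


def merge_counts(models):
    """Add every partial model's counts into the first model, in place."""
    merged = models[0]
    for model in models[1:]:
        merged.feature_count_ += model.feature_count_
        merged.class_count_ += model.class_count_
    return merged


# Two partial models, as if each had been fit on a different chunk.
parts = [
    PartialCounts(np.array([[1, 0], [0, 2]]), np.array([1, 2])),
    PartialCounts(np.array([[0, 3], [1, 1]]), np.array([3, 2])),
]
merged = merge_counts(parts)
print(merged.feature_count_)
print(merged.class_count_)
```

Because the first model is updated in place and returned, the merge only works if the count attributes are both readable and writable under the names used in the loop.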
25 changes: 21 additions & 4 deletions python/cuml/test/dask/test_naive_bayes.py
@@ -14,13 +14,13 @@
# limitations under the License.
#


from cuml.test.dask.utils import load_text_corpus

from sklearn.metrics import accuracy_score
import cupy as cp
import dask.array

from cuml.dask.naive_bayes import MultinomialNB
from cuml.naive_bayes.naive_bayes import MultinomialNB as SGNB
from cuml.test.dask.utils import load_text_corpus
from sklearn.metrics import accuracy_score


def test_basic_fit_predict(client):
@@ -74,3 +74,20 @@ def test_score(client):
      y_local = y.compute()

      assert(accuracy_score(y_hat_local.get(), y_local) == score)
+
+
+ def test_model_multiple_chunks(client):
+     # tests naive_bayes with n_chunks being greater than one, related to issue
+     # https://github.com/rapidsai/cuml/issues/3150
+     X = cp.array([[0, 0, 0, 1], [1, 0, 0, 1], [1, 0, 0, 0]])
+
+     X = dask.array.from_array(X, chunks=((1, 1, 1), -1)).astype(cp.int32)
+     y = dask.array.from_array([1, 0, 0], asarray=False,
+                               fancy=False, chunks=(1)).astype(cp.int32)
+
+     model = MultinomialNB()
+     model.fit(X, y)
+
+     # this test is a code coverage test, it is too small to be a numeric test,
+     # but we call score here to exercise the whole model.
+     assert(0 <= model.score(X, y) <= 1)
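
For intuition about what the multi-chunk path has to get right, here is a rough NumPy-only sketch that splits the same `X` and `y` into three single-row chunks, counts each chunk separately, and checks that the summed counts match a single pass over all rows. The helper `count_chunk` is hypothetical and only illustrates the per-class count statistics of multinomial naive Bayes; it is not how cuML computes them.

```python
import numpy as np

# Same toy data as the test, held in NumPy instead of CuPy/Dask.
X = np.array([[0, 0, 0, 1], [1, 0, 0, 1], [1, 0, 0, 0]])
y = np.array([1, 0, 0])
n_classes = 2


def count_chunk(X_chunk, y_chunk):
    """Hypothetical helper: per-class feature sums and class counts."""
    one_hot = np.eye(n_classes, dtype=X_chunk.dtype)[y_chunk]
    feature_count = one_hot.T @ X_chunk   # shape (n_classes, n_features)
    class_count = one_hot.sum(axis=0)     # shape (n_classes,)
    return feature_count, class_count


# One chunk per row, mirroring chunks=((1, 1, 1), -1) in the test.
per_chunk = [count_chunk(X[i:i + 1], y[i:i + 1]) for i in range(len(y))]
feature_total = sum(fc for fc, _ in per_chunk)
class_total = sum(cc for _, cc in per_chunk)

# Counting all rows at once should give the same result.
full_feature, full_class = count_chunk(X, y)
print(np.array_equal(feature_total, full_feature))
print(np.array_equal(class_total, full_class))
```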