Use MultinomialNB for iris classification (Int -> Float) #41

Open

ayush1999 opened this issue Feb 21, 2019 · 6 comments
I'm trying to use MultinomialNB for iris classification, but it looks like fit for MultinomialNB expects X to be a Matrix{Int64}, which is not the case for my dataset (and similarly for the predict function). Wouldn't it be better to make the type Float64, since that would be more generic?
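A minimal sketch of what I'm hitting (constructor signature as in the README; stand-in data, not the real iris set):

```julia
using NaiveBayes

# Stand-in for iris: 4 continuous features × 150 samples, Float64 by nature.
X = rand(4, 150)                                   # Matrix{Float64}
y = rand([:setosa, :versicolor, :virginica], 150)

m = MultinomialNB([:setosa, :versicolor, :virginica], 4)
fit(m, X, y)   # MethodError: fit is only defined for X::Matrix{Int64}
```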

dfdx (Owner) commented Feb 22, 2019

As far as I remember, my initial idea was that multinomial NB uses object counts as features, and counts can't be non-integer. But now I realize that might be too limiting, so I'll try to update the code and see whether the tests pass.

ablaom (Collaborator) commented Feb 25, 2019

Perhaps I'm missing something, but I don't believe it makes sense to apply MultinomialNB to the iris data. As @dfdx implies, the input data for a MultinomialNB is categorical, not continuous.

dfdx (Owner) commented Mar 4, 2019

I tried Float64/Real feature "counts" in the multinomial-nb-with-real-counts branch. Technically it works, but I'm still somewhat suspicious of the concept; at the very least I need to read through the code once more to check what real-valued counts may lead to.
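For reference, the scoring rule itself doesn't require integer counts, which is presumably why it works. A minimal sketch (my notation, not the package's internals):

```julia
# Multinomial NB scores class c for input x via
#   log P(c | x) ∝ log P(c) + Σᵢ xᵢ · log θ_{c,i}
# which is well-defined for any nonnegative real xᵢ (e.g. tf-idf weights),
# not just integer counts.
log_score(x, logprior, logtheta) = logprior + sum(x .* logtheta)

log_score([2.0, 0.5, 1.5], log(0.3), log.([0.2, 0.3, 0.5]))
```

scikit-learn's MultinomialNB documents the same allowance: fractional counts such as tf-idf may also work in practice.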

ablaom (Collaborator) commented Mar 4, 2019

A related issue: it would be useful if the "regularisation" parameter alpha in the multinomial case were allowed to be continuous (Lidstone smoothing). Currently it must be an integer, I think. I understand that 0 < alpha < 1 is a common choice.
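For concreteness, a sketch of the smoothing rule (illustrative only, not your internals):

```julia
# Lidstone-smoothed per-class feature probabilities:
#   θ_{c,i} = (N_{c,i} + α) / (N_c + α·V),  with V = number of features.
# α = 1 recovers Laplace smoothing; 0 < α < 1 is Lidstone smoothing proper,
# so alpha needs to be a Real, not an Int.
function smoothed_theta(counts::AbstractVector{<:Real}, alpha::Real)
    V = length(counts)
    return (counts .+ alpha) ./ (sum(counts) + alpha * V)
end

smoothed_theta([3, 0, 7], 0.5)   # non-integer alpha works fine
```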

dfdx (Owner) commented Mar 5, 2019

After several iterations of updates, I now think it might be easier to rewrite the code for Julia 1.x entirely (the current version is essentially written for Julia 0.4 with compatibility adapters). This raises another question: is keeping a separate repo even reasonable? If I were to re-implement the current algorithms in MLJModels, how would you estimate the effort for someone new to the MLJ infrastructure?

ablaom (Collaborator) commented Mar 6, 2019

Well, that would be awesome.

If I were to re-implement the current algorithms in MLJModels, how would you estimate the effort for someone new to the MLJ infrastructure?

Pretty easy, I expect. We've already written the wrapping code for Multinomial and Gauss here. So, if you kept your API the same, this would be a copy-and-paste operation. There are some simplifications that could be made if you were working from scratch, but I'd be happy to forgo those, and in any case I will be happy to provide guidance.

(There is some dancing around in our wrap because the target is categorical and MLJ is fussy about preserving unseen classes in the pool in the (probabilistic) predictions. That is, the training target "election_winner" may never take the value "Green Party" in the training set, but if "Green Party" is in the target pool, then the predicted distribution must have "Green Party" in its support, with probability zero.)
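For illustration, the pool mechanics are just CategoricalArrays behaviour, nothing MLJ-specific:

```julia
using CategoricalArrays

y = categorical(["Labour", "Tory", "Labour"])
levels!(y, ["Green Party", "Labour", "Tory"])   # pool gains an unseen class

levels(y)   # includes "Green Party", although it never occurs in y
# A probabilistic prediction must then keep "Green Party" in its support,
# assigning it probability zero, rather than dropping it.
```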

This raises another question: is keeping a separate repo even reasonable?

From our point of view, native implementations of the MLJ interface are preferred. One reason is to better isolate testing; another is to distribute responsibility for maintenance and documentation. Native implementation is simple: your NaiveBayes.jl module imports MLJBase and implements the API, and then you raise an issue at MLJRegistry requesting that we make your package accessible to all MLJ users (you don't need to do this to use NB with MLJ, just to make your models findable and loadable by MLJ users who don't know about your package). See the end of the guide for details.
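In outline, the glue code would look something like the sketch below. The struct name is hypothetical, it assumes your current MultinomialNB(classes, n_features; alpha=...) constructor and fit(m, X, y), and the MLJBase specifics may differ from the current guide; the main point is the fit contract of returning (fitresult, cache, report):

```julia
import MLJBase
import NaiveBayes

# Hypothetical MLJ-facing model struct wrapping NaiveBayes.MultinomialNB.
mutable struct MultinomialNBClassifier <: MLJBase.Probabilistic
    alpha::Float64
end
MultinomialNBClassifier(; alpha=1.0) = MultinomialNBClassifier(alpha)

# MLJ's fit contract: return (fitresult, cache, report).
function MLJBase.fit(model::MultinomialNBClassifier, verbosity::Int, X, y)
    Xmatrix = permutedims(MLJBase.matrix(X))   # MLJ table -> features × samples
    # A real wrap would first convert the categorical y to raw labels and
    # record the full class pool for use in predict.
    nb = NaiveBayes.MultinomialNB(unique(y), size(Xmatrix, 1);
                                  alpha=model.alpha)
    NaiveBayes.fit(nb, Xmatrix, y)
    return nb, nothing, NamedTuple()
end

# predict must return distributions over the full target pool (the "dancing
# around" described above); those details are elided here.
```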

Have you given any thought to how to handle sparse data? (For the sake of expediency, our wrap just converts the generic tabular input provided by the MLJ user into a matrix and feeds that to NB.) If I remember correctly, you presently allow dictionary input, but I guess there are better alternatives by now?
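For example, a SparseArrays-backed count matrix would seem a natural fit, since NB training mostly needs per-class column sums:

```julia
using SparseArrays

# features × samples count matrix, mostly zeros (typical for text data)
X = sparse([1, 3, 4], [1, 1, 2], [2.0, 1.0, 5.0], 4, 2)

sum(X, dims=2)   # per-feature totals, computed without a dense copy
```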
