
Scikit learn #4

Merged: 9 commits merged into master from the ScikitLearn branch, Feb 26, 2019

Conversation

@ysimillides (Collaborator)

The build is working (with the correct Python path, after the change in Travis). Six new models are implemented, but their traits still need to be defined correctly.

@ablaom (Member) commented Feb 18, 2019

Great. I have added some tests, which fail when we replace your Xr and yr with real data (the Boston dataset). You probably need to apply MLJBase.matrix to your data in fit and predict. Can you have a look, please?
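Roughly, the pattern I have in mind is something like this (a sketch only, with illustrative names; the struct, its C field, and the keyword defaults are not code from this PR):

    # Sketch: a toy regressor showing where MLJBase.matrix converts tabular
    # data into the plain matrix that scikit-learn expects. Everything named
    # "Dummy" here is illustrative only.
    import MLJBase
    using ScikitLearn
    @sk_import svm: SVR

    mutable struct DummySVMRegressor <: MLJBase.Deterministic{Any}
        C::Float64
    end
    DummySVMRegressor(; C=1.0) = DummySVMRegressor(C)

    function MLJBase.fit(model::DummySVMRegressor, verbosity::Int, X, y)
        Xmatrix   = MLJBase.matrix(X)                 # table -> Matrix
        fitresult = ScikitLearn.fit!(SVR(C=model.C), Xmatrix, y)
        return fitresult, nothing, nothing            # (fitresult, cache, report)
    end

    MLJBase.predict(model::DummySVMRegressor, fitresult, Xnew) =
        ScikitLearn.predict(fitresult, MLJBase.matrix(Xnew))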

Otherwise, looking good. Still to do:

  • add trait functions ✔
  • wrap in an SVM submodule (the load_paths are then "MLJModels.ScikitLearn_.SVM.SVMClassifier" and so forth)
  • add minimal doc strings (these can just link to the sklearn documentation)
  • add clean! methods to check that string hyperparameters make sense (e.g., "auto" not "atou"); see the sketch after this list
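For the last item, something along these lines is what I have in mind. It assumes the usual convention that clean! mutates the model to repair bad values and returns a warning string (empty if nothing needed fixing); the gamma check shown is illustrative only:

    # Sketch of a clean! method: repair nonsensical string hyperparameters
    # and return a warning message. The gamma field and its allowed values
    # are illustrative, not necessarily those of SVMClassifier in this PR.
    function MLJBase.clean!(model::SVMClassifier)
        warning = ""
        if model.gamma isa AbstractString && model.gamma != "auto"
            warning *= "gamma=\"$(model.gamma)\" is not recognised. Resetting to \"auto\". "
            model.gamma = "auto"
        end
        return warning
    end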

@ysimillides (Collaborator, Author)

I'll have a closer look at this tomorrow so I can check the type of each task and make sure it is dispatched correctly. Just a quick question: we seem to be mixing the Iris and Boston datasets; maybe this is introducing the error?

@ablaom (Member) commented Feb 19, 2019

Yiannis,

I've added more rigorous testing (tests on artificial linear data) and resolved your bugs. The main issues were:

  • failure to test regression fit methods on tabular data (you were testing on matrix data)
  • confusion about what the regressor/classifier distinction means

For the second, here is a quote from the guide:

The form of the target data y passed to fit depends on the kind of supervised model. For "regressors", y is a vector; for "classifiers" it must be a categorical vector. More precisely, the form is determined by the value returned by the trait output_kind that each model type should define (see below):

output_kind return value     type of y
:continuous                  Vector{<:AbstractFloat}
:binary                      CategoricalVector
:multiclass                  CategoricalVector
:ordered_factor_finite       CategoricalVector
:ordered_factor_infinite     Vector{<:Integer}

So y for a regressor is a Vector{<:AbstractFloat} (and there is none of that decoding jazz), while for a classifier y is a CategoricalVector and we need to worry about preservation of levels as described later in the guide.
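Concretely, the trait declarations look something like this (the toy structs are placeholders, not the models in this PR):

    # Trait declarations implied by the table above. The empty toy structs
    # stand in for real models, which carry hyperparameter fields.
    import MLJBase

    mutable struct ToyRegressor  <: MLJBase.Deterministic{Any} end
    mutable struct ToyClassifier <: MLJBase.Deterministic{Any} end

    # regressor: y arrives as a Vector{<:AbstractFloat}; no decoding needed
    MLJBase.output_kind(::Type{<:ToyRegressor}) = :continuous

    # classifier: y arrives as a CategoricalVector; levels must be preserved
    MLJBase.output_kind(::Type{<:ToyClassifier}) = :multiclass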

Small note: there is no need to annotate the types of X, y, and Xnew in your methods; you will always know what they are. And there is no point writing methods to deal with alternative forms of data, because they will be invisible to the higher-level API.

You probably want to go over the changes I have made to src/ScikitLearn.jl.

See earlier comments for what is still left for you to do.

Thanks

@ablaom (Member) commented Feb 20, 2019

I'm sorry, but I am rethinking the "wrap in a submodule" plan. We may eventually need to rename the src file to ScikitLearnSVM.jl, but leave it for now, thanks.

@ysimillides (Collaborator, Author)

That's fine about the module. I'll fix the other things and add the DT in due course. I'll also go over the XGBoost code and make similar changes, so that classifiers work on categorical data and regressors on plain vectors there as well.

@ablaom (Member) commented Feb 20, 2019

Okay. I'm ready to merge as soon as the doc string and clean! methods are done.

Review comment on the diff:

    @sk_import svm: NuSVR
    @sk_import svm: LinearSVR

    mutable struct SVMClassifier{Any} <: MLJBase.Deterministic{Any}
Collaborator:

why {Any} ?

Member:

The only strict requirement on the parameter R in Deterministic{R} or Probabilistic{R} is that the fitresult returned by fit is always of type R. (In tests, one should do @test fitresult isa fitresult_type(model).) So Any always works.

To get optimal performance in ensembling, you want R to be concrete (the "ensemble" is a Vector{R}), although it sometimes has to be a "small" union type (see, e.g., DecisionTreeRegressor). If you like, you could take a look at ensembling and see whether the type can be inferred. This code (refactored from Koala) is quite old, and finding out R is a bit of a pain point. To make R concrete in the tree case, you have to introduce target_type as an explicit type parameter of the model struct; see the sketch below.
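To illustrate the target_type idea (a sketch with made-up names, not the actual DecisionTree code):

    # Sketch of making R concrete: carry the target type as an explicit type
    # parameter of the model struct, so the fitresult type, and hence the
    # element type of an ensemble's Vector{R}, is known. All names are made up.
    import MLJBase

    struct ToyFitresult{T<:AbstractFloat}   # placeholder for a real fitresult type
        coefs::Vector{T}
    end

    mutable struct ToyTreeRegressor{T<:AbstractFloat} <:
                   MLJBase.Deterministic{ToyFitresult{T}}
        target_type::Type{T}                # explicit target_type hyperparameter
        max_depth::Int
    end

    ToyTreeRegressor(; target_type=Float64, max_depth=5) =
        ToyTreeRegressor{target_type}(target_type, max_depth)

    # fit for this model must always return a ToyFitresult{T}, so an ensemble
    # can be stored as the concretely typed Vector{ToyFitresult{T}}.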

Collaborator:

ok thanks for the explanation, it makes sense

@ablaom (Member) commented Feb 24, 2019

Please move the additions to DecisionTree.jl into a separate PR; they have nothing to do with the current ScikitLearn PR.

@ablaom (Member) commented Feb 25, 2019

Thanks.

ablaom merged commit 02497bb into master on Feb 26, 2019
ablaom deleted the ScikitLearn branch on June 18, 2019