-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scikit learn #4
Scikit learn #4
Conversation
Great. I have added some tests, which fail when we replace your Xr and yr with real data (the Boston dataset). I guess you probably need to apply MLJBase.matrix to your data in fit and predict. Can you have a look please? Otherwise, looking good. Still to do:
|
I'll have a closer look at this tomorrow so I can see the types of each task to see it's dispatched correct. Just a quick question, we seem to be mixing the Iris and Boston dataset? maybe this is introducing the error? |
Yiannis, I've added more rigorous testing (tests on artificial linear data) and resolved your bugs. The main issues were:
For the second, here is a quote from the guide: The form of the target data
So Small note: there is no need to annotate the types of X, y and Xnew in your methods. You will always know what they are. And there is no point writing methods to deal with alternative form of data because they will be invisible to the higher level API. You probably want to go over the changes I have made to src/ScikitLearn.jl. See earlier comments for what is still left for you to do. Thanks |
I'm sorry, but I am rethinking the "wrap in submodule" plan. If need be, we might need to rename the src file |
That's fine about the module. I'll fix the other things and add the DT in due course. I'll also go over the XGBoost code and make similar changes, so classifiers work on Categorical data and regressors on plain vectors their also. |
Okay. I'm ready to merge as soon as the doc string and clean! methods are done. |
@sk_import svm: NuSVR | ||
@sk_import svm: LinearSVR | ||
|
||
mutable struct SVMClassifier{Any} <: MLJBase.Deterministic{Any} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why {Any}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only stict requirement of the parameter R
in Deterministic{R}
or Probabilistic{R}
is that the fitresult
returned by fit is alway of type R
. (In tests, one should do @test fitresult isa fitresult_type(model)
. So Any
always works.
To get optimal performance in ensembling, you want R
to be concrete (the "ensemble" is a Vector{R}
), although it sometimes has to be a "small" union type (see, eg, DecisionTreeRegressor). If you like, you could take a look at ensembling and see if the type can be inferred. This code (refactored from Koala) is quite old. Finding out R
is a bit of a pain point. To make R
concrete in the tree case you have to introduce target_type
as explicit type parameter for the model struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok thanks for the explanation, it makes sense
Please separate move the additions to DecisionTree.jl to a separate PR - they have nothing to do with the current Scikitlearn PR. |
1b2fd07
to
93a617e
Compare
Thanks. |
Build is working (alongside the correct Python path with the change in Travis). 6 new models are implemented, but their traits still need correct defining.