-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scikit-learn style krige parameter optimisation #24
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your Pull Request! Adding a scikit-learn compatible API could definitely be interesting. A few comments on the code,
- First of all, we will probably not be able to add scikit-learn as a hard dependency ( because it's a massive library and in this PR you are using just a few classes from it). So this PR should work with or without scikit-learn installed, this means that,
- For defining the scikit-learn compatible estimator (where you need
BaseEstimator
andRegressorMixin
classes from scikit-learn) the solution could be something similar to what is done in xgboost or here. Alternatively, this could be addressed in a subsequent PR, and just raising an ImportError when this module is imported without scikit learn could be fine (and raising aSkipTest
in the unit tests).
2. Everything elsePipleline
,GridSearchCV
etc, should not be imported in PyKrige, but rather illustrated in a separate example.
- For defining the scikit-learn compatible estimator (where you need
- IMO PyKrige can just expose a scikit-learn compatible Kriging class, everything else (pipelining, Cross-Validation, any other form of pre-post processing) should be up to the user. In particular, this means that we could maybe just move
pykrige/optimise/pipeline.py
to e.g.examples/krige_cv.py
- Why do you need to wrap the Kriging class in a pipeline? GridSearchCV should work directly on the Kriging class, shouldn't it?
- Regarding filenames, it might be best to,
-
move
pykrige/optimise/krige.py
topykrige/sklearn.py
(orsklearn_compat.py
)? -
move
pykrige/optimise/pipeline.py
toexamples/krige_cv.py
(or any other appropriate name )- and remove there anything related to
ConfigParser
, saving to CSV (and if possible pipeline), as that a bit too specific. Just printing the output should be fine.
- and remove there anything related to
-
remove
pykrige/optimise/README.md
alltogether. I think it would be better, to a) add a section at the end of the readme on how to run this example b) in the example link to http://scikit-learn.org/stable/modules/cross_validation.html
-
What do you think?
- python: "3.4" | ||
env: DEPS="numpy=1.10.4 scipy=0.17 cython nose matplotlib" | ||
env: DEPS="numpy=1.11.2 scipy=0.17 cython nose matplotlib scikit-learn=0.18.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The numpy==1.10.4
was actually intentional here, to test that PyKrige works with multiple numpy versions not just the latest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would you do that? I always use a virtualenv
for python, even on a supercomputer. Any particular reason why you would not upgrade from numpy 1.10.4?
The reason for the change in numpy version is that scikit-learn=0.18.1
requires numpy 1.11.+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rth Some good points there. Thanks.
Point 1 is very sensible.
Point 2: This is what that Krige
class is?
Point 3. the wrapper Krige
class makes the pykrige
classes scikit-learn
compatible.
Point 4, you are spot on. That pipeline.py
is just an example of how to use the Krige
class. We can rename it something like you suggest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason to support (and test) multiple version of dependencies is to reduce the chance of a dependency conflict (e.g. Package A depends on package C-v1
, package B depends on C-v2
, and you need A and B). I also use the latest numpy version, but in general we cannot assume that (e.g. in large legacy systems with a significant cost of upgrading). For instance, scikit-learn will install numpy-1.11
if it's not present but it supports any numpy versions starting from 1.6.1
(and also test several version in Travis CI. Here we just test the 2 latest numpy version 0.10 for PY<3.5 and 0.11 for PY 3.5.
Point 2: I was referring to the new Krige class you created in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rth There is no dependency conflict as all the tests pass with the latest numpy version. Scikit-learn 0.18.+ has many improvements and requires numpy 1.11.+.
Even on a legacy system you can use a virtualenv
. Has there been any problem with creating the pykrige virtualenv
on a legacy system?
The Krige
class is the convenience class that makes the pykrige OrdinaryKriging
and UniversalKriging
classes scikit-learn
compatible.
PCKG_DAT = {'pykrige': ['README.md', 'CHANGELOG.md', 'LICENSE.txt', 'MANIFEST.in', | ||
join('test_data', '*.txt'), join('test_data', '*.asc')]} | ||
REQ = ['numpy', 'scipy', 'matplotlib'] | ||
REQ = ['numpy', 'scipy', 'matplotlib', 'sklearn'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sklearn
shouldn't be added to mandatory requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, I can look into that.
I will use a new PR once I have managed to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! As this is actually a lot of code, feel free to split this into several smaller PRs if you prefer. Thanks again for contributing :)
@@ -21,19 +21,20 @@ | |||
DESC = 'Kriging Toolkit for Python' | |||
LDESC = 'PyKrige is a kriging toolkit for Python that supports two- and ' \ | |||
'three-dimensional ordinary and universal kriging.' | |||
PACKAGES = ['pykrige'] | |||
PACKAGES = ['pykrige', 'pykrige.optimise'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be just one package pykrige
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would you put such a restriction? It just seems natural to add something like this in a subpackage, as not too many people will use this and is not part of core functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I was just wondering why we need this, since they are in the same setup.py
both will be installed at the same time anyway (and a single package PyKrige
is installed when you run this version of setup.py
). So, if you don't add this, the result would be the same, wouldn't it? Only users who need pykrige.optimize
(or rather pykrige.sklearn
) would import this, but it can still be installed by default?
P.S: @basaks BTW, have you checked if this new Kriging estimator passes |
@rth No, I have not. However, the fact that GridSearchCV works with this class is a proof in in itself of that. |
I actually have one such project (depending on numpy 0.10¹) and using PyKrige, so I'm -1 on this, though would be happy to hear other opinions.
Could you provide a url link confirming that? ) ¹Even if there might be almost no backward incompatible changes between 0.10 and 0.11.. |
Yes, it;s not necessary. I checked requirements for scikit-learn. It's just that I will put together another PR soon :) edit: it's not |
I can build a
|
Yes, it looks like conda only builds scikit-learn-0.18.1 with numpy 1.11 (probably because they already have to build
Maybe what we could do for now in travis is to add scikit-learn 0.18.1 for the Python 3.5 line (that has numpy 1.11) and just not install scikit-learn for other python versions / numpy versions. Then skip the tests that need sklearn by raising a |
@rth I tried to get Anyway all your orginal deps are working in both python 3 and python 2 environments. I have addressed all your concerns with the original PR. |
@rth Let me know if I have missed anything. I did not use another PR as the scope of the PR is still the same, i.e., I did not break it up into many PRs and hopefully managed to address all your concerns. |
@basaks Sorry for the late response. Yes, conda can be frustrating sometimes. Just a few last comments,
Thanks again for this PR. What do you think? @bsmurphy Would you have time to have a look at this PR, to know if you are OK with it? Thanks! |
@rth Excellent suggestions. I agree to all of them. I have made all changes your suggest except that I could not rename the |
Thanks, @basaks ! This looks good to me. Will just wait a few more days before merging in case bsmurphy (or anybody else interested) wants to have a look at this PR. |
I am using this parameter optimisation in a project. May be someone will benefit from this.