
scikit-learn style krige parameter optimisation #24

Merged: 24 commits merged into GeoStat-Framework:master on Dec 15, 2016
Conversation

@basaks (Collaborator) commented Dec 4, 2016

I am using this parameter optimisation in a project. Maybe someone else will benefit from it.

@rth (Contributor) left a review comment

Thanks a lot for your Pull Request! Adding a scikit-learn compatible API could definitely be interesting. A few comments on the code,

  1. First of all, we will probably not be able to add scikit-learn as a hard dependency (it's a massive library and this PR uses just a few classes from it), so this PR should work with or without scikit-learn installed. This means that:
    a. For defining the scikit-learn compatible estimator (where you need the BaseEstimator and RegressorMixin classes from scikit-learn), the solution could be something similar to what is done in xgboost or here; see the sketch at the end of this comment. Alternatively, this could be addressed in a subsequent PR, and just raising an ImportError when this module is imported without scikit-learn could be fine (together with raising a SkipTest in the unit tests).
    b. Everything else (Pipeline, GridSearchCV, etc.) should not be imported in PyKrige, but rather illustrated in a separate example.
  2. IMO PyKrige can just expose a scikit-learn compatible Kriging class; everything else (pipelining, cross-validation, any other form of pre-/post-processing) should be up to the user. In particular, this means that we could maybe just move pykrige/optimise/pipeline.py to e.g. examples/krige_cv.py.
  3. Why do you need to wrap the Kriging class in a pipeline? GridSearchCV should work directly on the Kriging class, shouldn't it?
  4. Regarding filenames, it might be best to:
    • move pykrige/optimise/krige.py to pykrige/sklearn.py (or sklearn_compat.py)

    • move pykrige/optimise/pipeline.py to examples/krige_cv.py (or any other appropriate name), and remove from it anything related to ConfigParser, saving to CSV (and, if possible, the pipeline), as that is a bit too specific; just printing the output should be fine

    • remove pykrige/optimise/README.md altogether. I think it would be better to a) add a section at the end of the README on how to run this example and b) link to http://scikit-learn.org/stable/modules/cross_validation.html from the example.

What do you think?
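
As a rough sketch of the optional-import idea in point 1a (the module name, flag name, and error message below are illustrative, not necessarily what this PR will end up with):

    # Hypothetical pykrige/compat.py
    try:
        from sklearn.base import BaseEstimator, RegressorMixin
        HAS_SKLEARN = True
    except ImportError:
        HAS_SKLEARN = False

        # Dummy base classes so that the wrapper module can still be
        # imported; using the wrapper should then fail with a clear error.
        class BaseEstimator(object):
            pass

        class RegressorMixin(object):
            pass


    def validate_sklearn():
        """Raise a helpful error if scikit-learn is not installed."""
        if not HAS_SKLEARN:
            raise ImportError(
                "scikit-learn is required for the sklearn-compatible API; "
                "install it or use the plain PyKrige classes instead."
            )

The wrapper's constructor (or the module defining it) would call validate_sklearn(), and the corresponding unit tests would raise unittest.SkipTest when HAS_SKLEARN is False.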

- python: "3.4"
env: DEPS="numpy=1.10.4 scipy=0.17 cython nose matplotlib"
env: DEPS="numpy=1.11.2 scipy=0.17 cython nose matplotlib scikit-learn=0.18.1"
@rth (Contributor)

The numpy==1.10.4 was actually intentional here, to test that PyKrige works with multiple numpy versions, not just the latest.

@basaks (Collaborator, Author) commented Dec 5, 2016

Why would you do that? I always use a virtualenv for Python, even on a supercomputer. Any particular reason why you would not upgrade from numpy 1.10.4?

The reason for the change in the numpy version is that scikit-learn=0.18.1 requires numpy 1.11+.

@basaks (Collaborator, Author) commented Dec 5, 2016

@rth Some good points there. Thanks.
Point 1 is very sensible.
Point 2: that is exactly what the Krige class is, isn't it?
Point 3: the wrapper Krige class is what makes the pykrige classes scikit-learn compatible.
Point 4: you are spot on. That pipeline.py is just an example of how to use the Krige class. We can rename it to something like you suggest.

@rth (Contributor)

The main reason to support (and test) multiple versions of dependencies is to reduce the chance of a dependency conflict (e.g. package A depends on package C-v1, package B depends on C-v2, and you need both A and B). I also use the latest numpy version, but in general we cannot assume that (e.g. in large legacy systems with a significant cost of upgrading). For instance, scikit-learn will install numpy 1.11 if it's not present, but it supports any numpy version starting from 1.6.1 (and also tests several versions in Travis CI). Here we just test the two latest numpy versions: 1.10 for Python < 3.5 and 1.11 for Python 3.5.

Point 2: I was referring to the new Krige class you created in this PR.

@basaks (Collaborator, Author) commented Dec 5, 2016

@rth There is no dependency conflict, as all the tests pass with the latest numpy version. Scikit-learn 0.18+ has many improvements and requires numpy 1.11+.

Even on a legacy system you can use a virtualenv. Has there been any problem with creating the pykrige virtualenv on a legacy system?

The Krige class is the convenience class that makes the pykrige OrdinaryKriging and UniversalKriging classes scikit-learn compatible.
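
A minimal sketch of such a wrapper, restricted to 2D ordinary kriging for brevity (the constructor parameters shown are illustrative, not necessarily the ones exposed by the Krige class in this PR):

    import numpy as np
    from sklearn.base import BaseEstimator, RegressorMixin

    from pykrige.ok import OrdinaryKriging


    class Krige(BaseEstimator, RegressorMixin):
        """scikit-learn compatible wrapper around OrdinaryKriging (2D)."""

        def __init__(self, variogram_model='linear', nlags=6):
            self.variogram_model = variogram_model
            self.nlags = nlags

        def fit(self, X, y):
            # X: (n_samples, 2) array of (x, y) coordinates; y: observed values
            self.model_ = OrdinaryKriging(
                X[:, 0], X[:, 1], y,
                variogram_model=self.variogram_model,
                nlags=self.nlags,
            )
            return self

        def predict(self, X):
            prediction, variance = self.model_.execute(
                'points', X[:, 0], X[:, 1])
            return np.asarray(prediction)

Because __init__ stores its parameters verbatim and the fitted state lives in an attribute with a trailing underscore, GridSearchCV and cross_val_score can clone and refit this estimator like any other scikit-learn regressor.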

PCKG_DAT = {'pykrige': ['README.md', 'CHANGELOG.md', 'LICENSE.txt', 'MANIFEST.in',
join('test_data', '*.txt'), join('test_data', '*.asc')]}
REQ = ['numpy', 'scipy', 'matplotlib']
REQ = ['numpy', 'scipy', 'matplotlib', 'sklearn']
@rth (Contributor)

sklearn shouldn't be added to mandatory requirements.

@basaks (Collaborator, Author) commented Dec 5, 2016

Fair enough, I can look into that.
I will open a new PR once I have managed to do this.

@rth (Contributor)

Thanks! As this is actually a lot of code, feel free to split this into several smaller PRs if you prefer. Thanks again for contributing :)

@@ -21,19 +21,20 @@
DESC = 'Kriging Toolkit for Python'
LDESC = 'PyKrige is a kriging toolkit for Python that supports two- and ' \
'three-dimensional ordinary and universal kriging.'
PACKAGES = ['pykrige']
PACKAGES = ['pykrige', 'pykrige.optimise']
@rth (Contributor)

It should be just one package pykrige.

@basaks (Collaborator, Author) commented Dec 5, 2016

Why would you put such a restriction on it? It just seems natural to add something like this in a subpackage, as not too many people will use it and it is not part of the core functionality.

@rth (Contributor)

Well, I was just wondering why we need this: since they are in the same setup.py, both will be installed at the same time anyway (and a single package, PyKrige, is installed when you run this version of setup.py). So if you don't add this, the result would be the same, wouldn't it? Only users who need pykrige.optimize (or rather pykrige.sklearn) would import it, but it can still be installed by default?

@rth (Contributor) commented Dec 5, 2016

P.S. @basaks, BTW, have you checked whether this new Kriging estimator passes check_estimator from sklearn.utils.estimator_checks (cf. the "Rolling your own estimator" docs)? Even if it doesn't, it's probably OK, as that test is not completely general (scikit-learn/scikit-learn#6715), but it could be useful for detecting API inconsistencies...
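
For reference, running that check is a one-liner; the import path below is hypothetical, since the final module name for the wrapper is still being discussed in this thread:

    from sklearn.utils.estimator_checks import check_estimator

    from pykrige.sklearn_compat import Krige  # hypothetical module path

    # Runs the scikit-learn API conformance checks and raises an
    # AssertionError describing the first check that fails.
    check_estimator(Krige)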

@basaks (Collaborator, Author) commented Dec 5, 2016

@rth No, I have not. However, the fact that GridSearchCV works with this class is some proof of that in itself.
I will run it through those further checks and address any issues in another PR. Thank you for pointing this out.

@rth (Contributor) commented Dec 6, 2016

> There is no dependency conflict as all the tests pass with the latest numpy version.
> Even on a legacy system you can use a virtualenv. Has there been any problem with creating the pykrige virtualenv on a legacy system?

virtualenv doesn't solve dependency issues. To give a more practical example, say I have a work project developed a year ago with scikit-learn 0.17.1 (at the time) and numpy 1.10 (in a virtualenv). It works fine, and now I want to add kriging as a new feature. But if we made PyKrige depend on scikit-learn 0.18.1 and numpy 1.11, I would be stuck: I would have to spend time upgrading my whole project to these versions, or testing by myself that PyKrige works with the previous versions even though they are not supported, or updating PyKrige to work with scikit-learn 0.17.1 and numpy 1.10. All of these are probably useful, but not what I wanted / was funded to do. This is the reason to reduce the dependencies to a strict minimum and to support multiple versions of them. It's the same reason why multiple Python versions are typically supported.

I actually have one such project (depending on numpy 1.10¹) and using PyKrige, so I'm -1 on this, though I would be happy to hear other opinions.

> Scikit-learn 0.18.+ [..] requires numpy 1.11.+.

Could you provide a URL confirming that?

¹Even if there might be almost no backward-incompatible changes between 1.10 and 1.11.

@basaks (Collaborator, Author) commented Dec 6, 2016

> Could you provide a URL confirming that?

Yes, it's not necessary. I checked the requirements for scikit-learn. It's just that pip pulls in the latest numpy by default when you install scikit-learn.

I will put together another PR soon :)

edit: it's not pip, it's the conda package manager that requires numpy=1.11+ with scikit-learn=0.18+. See details in the comment below.

@basaks (Collaborator, Author) commented Dec 6, 2016

I can build a virtualenv with scikit-learn 0.18.1 and numpy 1.10.4 on my PC, but it does not work in the Travis build due to the anaconda packaging:

$ python --version
Python 3.4.2
$ pip --version
pip 6.0.7 from /home/travis/virtualenv/python3.4.2/lib/python3.4/site-packages (python 3.4)
before_install.1
0.34s$ wget http://repo.continuum.io/miniconda/Miniconda${TRAVIS_PYTHON_VERSION:0:1}-latest-Linux-x86_64.sh -O miniconda.sh
--2016-12-06 11:22:13--  http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
Resolving repo.continuum.io (repo.continuum.io)... 104.16.19.10, 104.16.18.10, 2400:cb00:2048:1::6810:120a, ...
Connecting to repo.continuum.io (repo.continuum.io)|104.16.19.10|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh [following]
--2016-12-06 11:22:13--  https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
Connecting to repo.continuum.io (repo.continuum.io)|104.16.19.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 33905474 (32M) [application/octet-stream]
Saving to: `miniconda.sh'
100%[======================================>] 33,905,474   143M/s   in 0.2s    
2016-12-06 11:22:13 (143 MB/s) - `miniconda.sh' saved [33905474/33905474]
before_install.2
0.01s$ chmod +x miniconda.sh
before_install.3
6.74s$ ./miniconda.sh -b
PREFIX=/home/travis/miniconda3
installing: python-3.5.2-0 ...
installing: conda-env-2.6.0-0 ...
installing: openssl-1.0.2j-0 ...
installing: pycosat-0.6.1-py35_1 ...
installing: readline-6.2-2 ...
installing: requests-2.11.1-py35_0 ...
installing: ruamel_yaml-0.11.14-py35_0 ...
installing: sqlite-3.13.0-0 ...
installing: tk-8.5.18-0 ...
installing: xz-5.2.2-0 ...
installing: yaml-0.1.6-0 ...
installing: zlib-1.2.8-3 ...
installing: conda-4.2.12-py35_0 ...
installing: pycrypto-2.6.1-py35_4 ...
installing: pip-8.1.2-py35_0 ...
installing: wheel-0.29.0-py35_0 ...
installing: setuptools-27.2.0-py35_0 ...
Python 3.5.2 :: Continuum Analytics, Inc.
creating default environment...
installation finished.
before_install.4
0.00s$ export PATH=/home/travis/miniconda${TRAVIS_PYTHON_VERSION:0:1}/bin:$PATH
before_install.5
2.47s$ conda update --yes conda
Fetching package metadata .......
Solving package specifications: ..........
Package plan for installation in environment /home/travis/miniconda3:
The following packages will be downloaded:
    package                    |            build
    ---------------------------|-----------------
    conda-4.2.13               |           py35_0         402 KB
The following packages will be UPDATED:
    conda: 4.2.12-py35_0 --> 4.2.13-py35_0
Fetching packages ...
conda-4.2.13-p 100% || Time: 0:00:00  25.61 MB/s
Extracting packages ...
[      COMPLETE      ]|| 100%
Unlinking packages ...
[      COMPLETE      ]|| 100%
Linking packages ...
[      COMPLETE      ]|| 100%
before_install.6
0.56s$ conda info -a
Current conda install:
               platform : linux-64
          conda version : 4.2.13
       conda is private : False
      conda-env version : 4.2.13
    conda-build version : not installed
         python version : 3.5.2.final.0
       requests version : 2.11.1
       root environment : /home/travis/miniconda3  (writable)
    default environment : /home/travis/miniconda3
       envs directories : /home/travis/miniconda3/envs
          package cache : /home/travis/miniconda3/pkgs
           channel URLs : https://repo.continuum.io/pkgs/free/linux-64
                          https://repo.continuum.io/pkgs/free/noarch
                          https://repo.continuum.io/pkgs/pro/linux-64
                          https://repo.continuum.io/pkgs/pro/noarch
            config file : None
           offline mode : False
# conda environments:
#
root                  *  /home/travis/miniconda3
sys.version: 3.5.2 |Continuum Analytics, Inc.| (defau...
sys.prefix: /home/travis/miniconda3
sys.executable: /home/travis/miniconda3/bin/python
conda location: /home/travis/miniconda3/lib/python3.5/site-packages/conda
conda-build: None
conda-env: /home/travis/miniconda3/bin/conda-env
user site dirs: 
CIO_TEST: <not set>
CONDA_DEFAULT_ENV: <not set>
CONDA_ENVS_PATH: <not set>
LD_LIBRARY_PATH: <not set>
PATH: /home/travis/miniconda3/bin:/home/travis/virtualenv/python3.4.2/bin:/home/travis/bin:/home/travis/.local/bin:/home/travis/.rvm/gems/ruby-1.9.3-p551/bin:/home/travis/.rvm/gems/ruby-1.9.3-p551@global/bin:/home/travis/.rvm/rubies/ruby-1.9.3-p551/bin:/opt/python/2.7.9/bin:/opt/python/2.6.9/bin:/opt/python/3.4.2/bin:/opt/python/3.3.5/bin:/opt/python/3.2.5/bin:/opt/python/pypy-2.5.0/bin:/opt/python/pypy3-2.4.0/bin:/usr/local/phantomjs/bin:/home/travis/.nvm/v0.10.36/bin:./node_modules/.bin:/usr/local/maven-3.2.5/bin:/usr/local/clang-3.4/bin:/home/travis/.gimme/versions/go1.4.1.linux.amd64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/travis/.rvm/bin
PYTHONHOME: <not set>
PYTHONPATH: <not set>
WARNING: could not import _license.show_info
# try:
# $ conda install -n root _license
2.19s$ conda install --yes $DEPS pip
Fetching package metadata .......
Solving package specifications: ....
UnsatisfiableError: The following specifications were found to be in conflict:
  - numpy 1.10.4*
  - scikit-learn 0.18.1* -> numpy 1.11*
Use "conda info <package>" to see the dependencies for each package.

@rth (Contributor) commented Dec 6, 2016

Yes, it looks like conda only builds scikit-learn 0.18.1 against numpy 1.11 (probably because they already have to build [3 Python versions] x [MKL/no-MKL builds] and don't want to add another x [number of numpy versions]).

% conda search scikit-learn                                                                                      
Fetching package metadata: ....
scikit-learn
                            [...]
                             0.18                np111py27_0  defaults        
                             0.18            np111py27_nomkl_0  defaults        [nomkl]
                             0.18                np111py34_0  defaults        
                             0.18            np111py34_nomkl_0  defaults        [nomkl]
                             0.18                np111py35_0  defaults        
                             0.18            np111py35_nomkl_0  defaults        [nomkl]
                             0.18.1              np111py27_0  defaults        
                             0.18.1          np111py27_nomkl_0  defaults        [nomkl]
                             0.18.1              np111py34_0  defaults        
                             0.18.1          np111py34_nomkl_0  defaults        [nomkl]
                          .  0.18.1              np111py35_0  defaults        
                             0.18.1          np111py35_nomkl_0  defaults        [nomkl]

Maybe what we could do for now in Travis is to add scikit-learn 0.18.1 to the Python 3.5 line (which has numpy 1.11) and just not install scikit-learn for the other Python/numpy versions, then skip the tests that need sklearn by raising a unittest.SkipTest (see the sketch below)?
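
A sketch of what such a guard could look like in the test module (the test class name is illustrative); unittest treats a SkipTest raised in setUp as a skipped test, and nose should behave the same way:

    import unittest

    try:
        from sklearn.model_selection import GridSearchCV  # noqa: F401
        SKLEARN_INSTALLED = True
    except ImportError:
        SKLEARN_INSTALLED = False


    class TestKrigeCV(unittest.TestCase):

        def setUp(self):
            if not SKLEARN_INSTALLED:
                raise unittest.SkipTest('scikit-learn not installed')

        def test_grid_search_runs(self):
            # The actual GridSearchCV-based assertions would go here.
            pass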

@basaks (Collaborator, Author) commented Dec 6, 2016

@rth I tried to get numpy=1.9.2 working as well with Python 2.7, but there does not seem to be any stable conda environment that works with both numpy=1.9.2 and scipy=0.17.

Anyway, all your original deps are working in both the Python 3 and Python 2 environments.

I have addressed all your concerns with the original PR.

@basaks (Collaborator, Author) commented Dec 9, 2016

@rth Let me know if I have missed anything. I did not open another PR, as the scope of this PR is still the same; i.e., I did not break it up into several PRs, and I have hopefully managed to address all your concerns.

@rth (Contributor) commented Dec 10, 2016

@basaks Sorry for the late response. Yes, conda can be frustrating sometimes. Just a few last comments,

  • Could you please move the /examples folder one level up (so it's in the top-level directory, as for instance in https://github.com/scikit-learn/scikit-learn)?
  • I tried to simplify the example as much as possible (by removing configparser and the pipeline, using the default parameters for GridSearchCV, and reducing how many lines it prints) here: http://pastebin.com/jWvHKi1q Would you be OK with that?
  • Would you mind renaming optimise.py to sklearn.py (or something similar)? The functions this module includes do not perform any optimization; they just expose a scikit-learn compatible API.
  • Also, the ConfigException, TagsMixin and KrigePredictProbaMixin classes are not actually used in the example (or in the unit tests), so if the Krige class is defined as class Krige(RegressorMixin, BaseEstimator) everything still works. I understand that you are using them in your own code; maybe it would be best to include them in the next PR (not this one). Both would require more discussion, as they are not standard with respect to the scikit-learn API (predict_proba typically returns a single array, not 4), and estimator tags are still being developed for scikit-learn 0.19 (cf. issue 6599 at https://github.com/scikit-learn/scikit-learn/issues).
  • Regarding the new section in the README:
    • optimise -> optimize everywhere: sorry, I know you are in Australia, but the scientific Python community (and this package) uses American English (e.g. scipy.optimize).

    • Could we merge the two current sections "Kriging Optimiser" and "How to use the optimise module" into just one called "Kriging parameters tuning" (or something similar)? In general, I would remove all references to the "optimization module" (it's debatable whether finding the best parameters by cross-validation can be called optimization, and in any case this PR doesn't add any optimization capability, just a new API) and say something along the lines of the following (see also the sketch after this list):

      PyKrige also exposes a scikit-learn compatible API, which can be used to perform parameter tuning with sklearn.model_selection.GridSearchCV (with a link to the scikit-learn docs). [Maybe some explanation of the parameters that can be tuned.] You can run the corresponding example with
      python examples/krige_cv.py

      Maybe also remove the table at the end of the README (among other things, the mean_test_score is the R² score by default, so when it's negative it means that the predictions are pretty bad, which is OK since we use random data, but probably not something you would want to have in a README).
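
A rough sketch of what the trimmed-down examples/krige_cv.py could reduce to, under the assumptions above (the pykrige.sklearn_compat import path and the exact parameter grid are illustrative):

    import numpy as np
    from sklearn.model_selection import GridSearchCV

    from pykrige.sklearn_compat import Krige  # hypothetical module path

    # Random demo data: 2D coordinates and a target variable.
    rng = np.random.RandomState(42)
    X = rng.uniform(0.0, 100.0, size=(100, 2))
    y = rng.normal(size=100)

    param_grid = {
        'variogram_model': ['linear', 'power', 'gaussian', 'spherical'],
        'nlags': [4, 6, 8],
    }

    # Default 3-fold cross-validation, scored with the estimator's R^2.
    search = GridSearchCV(Krige(), param_grid)
    search.fit(X, y)

    print('Best parameters:', search.best_params_)
    print('Best CV score (R^2):', search.best_score_)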

Thanks again for this PR. What do you think?

@bsmurphy Would you have time to have a look at this PR, to know if you are OK with it? Thanks!

@basaks (Collaborator, Author) commented Dec 10, 2016

@rth Excellent suggestions. I agree with all of them.
My judgement on this PR is a bit biased, as I am using these classes in a specialized pipeline.
Thank you for your valuable input.

I have made all the changes you suggest, except that I could not rename optimiser.py to sklearn.py: with that name, import sklearn may end up importing this file instead of scikit-learn, depending on the Python path.

@rth (Contributor) commented Dec 10, 2016

Thanks, @basaks! This looks good to me.

I will just wait a few more days before merging, in case bsmurphy (or anybody else interested) wants to have a look at this PR.

@rth merged commit 07af4a5 into GeoStat-Framework:master on Dec 15, 2016.
@rth mentioned this pull request on Dec 20, 2016.