saving conformal predictor #8

Open
wgmueller1 opened this issue Jul 13, 2017 · 3 comments

Comments

@wgmueller1

Thank you for your library!

Is there an easy way to save fitted and calibrated conformal predictors for re-use? I'd like to make conformal predictions in an online setting.

I tried just pickling the icp object, but that failed.

@donlnz
Owner

donlnz commented Jul 13, 2017

Pickling IcpClassifier or IcpRegressor fails because they contain lambda expressions, which are not picklable by default. The easiest fix is to import the dill package, which automatically makes lambda expressions picklable:

import dill, joblib

# ...

joblib.dump(icp, 'my_filename') # store model
icp = joblib.load('my_filename') # load stored model

Full running example: https://gist.github.com/donlnz/c00791aba32330facf315396f9935c9a
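The failure mode described above can be reproduced with the standard library alone. This is only a minimal sketch: `TinyModel` and its `condition` attribute are hypothetical stand-ins for an object that stores a lambda, not the actual nonconformist classes, and `can_pickle` is a helper made up for illustration.

```python
import pickle


class TinyModel:
    """Hypothetical stand-in for an object that stores a lambda attribute."""

    def __init__(self):
        # A lambda created inside __init__ has no importable name,
        # so the standard pickle module cannot look it up again.
        self.condition = lambda x: 0


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type differs between Python versions.
        return False


print(can_pickle(TinyModel()))  # False: the lambda attribute blocks pickling
print(can_pickle({"a": 1}))     # True: plain data pickles fine
```

Importing dill, as in the snippet above, is meant to work around exactly this restriction without changing the class itself.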

NB: Because of some early design decisions in nonconformist, the underlying model caches its predictions. Initially, predictions could only be output for one significance level at a time, so evaluating the same test set at several significance levels meant applying the underlying model to the same data repeatedly. (In test settings, the same conformal predictor is commonly applied to the same test set at each significance level 0.01, 0.02, ..., 0.98, 0.99.) Long story short: after calling IcpClassifier.calibrate or IcpClassifier.predict (the same goes for IcpRegressor), the last seen calibration set or test set is stored in BaseModelAdapter. Saving the model to disk can therefore produce very large files, which may also contain sensitive data. This behaviour will most likely be removed in the future, or at least made optional. In the meantime, I suggest clearing the cache before storing models to disk.

The cache can be cleared as follows:

icp.nc_function.model.last_x = None
icp.nc_function.model.last_y = None

joblib.dump(icp, 'my_filename')
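The two assignments above could be wrapped in a small helper so the cache is always cleared before saving. This is a rough sketch, not part of nonconformist: `clear_prediction_cache` is a made-up name, and `fake_icp` is a hypothetical stand-in built from `SimpleNamespace` that only mimics the `nc_function.model.last_x` / `last_y` attribute path, so the effect on the serialized size can be seen without a trained model.

```python
import pickle
from types import SimpleNamespace


def clear_prediction_cache(icp):
    """Drop the cached data held at icp.nc_function.model.last_x / last_y."""
    model = icp.nc_function.model
    model.last_x = None
    model.last_y = None
    return icp


# Hypothetical stand-in with the same attribute path; not a real IcpClassifier.
fake_icp = SimpleNamespace(
    nc_function=SimpleNamespace(
        model=SimpleNamespace(
            last_x=list(range(10000)),  # pretend cached calibration/test inputs
            last_y=list(range(10000)),  # pretend cached targets
        )
    )
)

size_before = len(pickle.dumps(fake_icp))
clear_prediction_cache(fake_icp)
size_after = len(pickle.dumps(fake_icp))

print(size_before, size_after)  # the second number should be much smaller
```

With a real predictor, the call would presumably be `clear_prediction_cache(icp)` followed by `joblib.dump(icp, 'my_filename')`.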

@wgmueller1
Author

wgmueller1 commented Jul 25, 2017

Thank you for the response. What is your environment?

When I run your gist using Python 3.5.2 :: Anaconda 4.3.1 (x86_64) and the following library versions

dill==0.2.5
joblib==0.11
scikit-learn==0.18.1

I get the following error:

PicklingError: Can't pickle <function BaseIcp.__init__.<locals>.<lambda> at 0x1174c70d0>: it's not found as nonconformist.icp.BaseIcp.__init__.<locals>.<lambda>

@donlnz
Owner

donlnz commented Sep 3, 2017

I'm able to run my code example on two separate setups:

Setup 1
WinPython x64 2.7.6 (Windows)
dill==0.2.7
joblib==0.11
scikit-learn==0.15.2

Setup 2
Python 3.5.2 x64 (Linux)
dill==0.2.7
joblib==0.11
scikit-learn==0.18.1

Does running python -v nonconformist_save_load.py yield any further insights?
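To compare environments like the ones listed above, something along these lines might help. It is only a sketch: `installed_version` is a helper name chosen for illustration, and it assumes Python 3.8 or newer, where `importlib.metadata` is available.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


print("python", sys.version.split()[0])
for name in ("dill", "joblib", "scikit-learn"):
    # Prints one line per package, in the same style as a requirements file.
    print("{}=={}".format(name, installed_version(name)))
```

Posting the output from both machines side by side should make version differences easy to spot.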
