
Test GLM with scikit-learn check_estimator (setting attributes during init ) #363

Closed
titipata opened this issue Feb 18, 2020 · 6 comments · Fixed by #364

@titipata
Collaborator

titipata commented Feb 18, 2020

According to openjournals/joss-reviews#1959 (comment), we cannot use scikit-learn's check_estimator with the GLM class. We may have to set the initial values (here and here) somewhere outside of __init__.

from pyglmnet import GLM
from sklearn.utils.estimator_checks import check_estimator

check_estimator(GLM)

Output:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-10-96431abb45f5> in <module>
----> 1 check_estimator(GLM)

~/anaconda3/lib/python3.7/site-packages/sklearn/utils/estimator_checks.py in check_estimator(Estimator)
    292         estimator = Estimator()
    293         check_parameters_default_constructible(name, Estimator)
--> 294         check_no_attributes_set_in_init(name, estimator)
    295     else:
    296         # got an instance

~/anaconda3/lib/python3.7/site-packages/sklearn/utils/estimator_checks.py in check_no_attributes_set_in_init(name, estimator)
   2067             "Estimator %s should not set any attribute apart"
   2068             " from parameters during init. Found attributes %s."
-> 2069             % (name, sorted(invalid_attr)))
   2070     # Ensure that each parameter is set in init
   2071     invalid_attr = set(init_params) - set(vars(estimator)) - {"self"}

AssertionError: Estimator GLM should not set any attribute apart from parameters during init. Found attributes ['beta0_', 'beta_', 'n_iter_', 'rng', 'ynull_'].
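For context, this is the convention that check_no_attributes_set_in_init enforces: __init__ stores the constructor parameters verbatim, and all estimated attributes (the trailing-underscore names, the RNG) are created inside fit(). A minimal sketch with a hypothetical estimator (not the actual GLM code):

```python
import numpy as np
from sklearn.base import BaseEstimator


class SketchGLM(BaseEstimator):
    """Hypothetical estimator illustrating the scikit-learn convention."""

    def __init__(self, alpha=0.5, random_state=None):
        # store parameters as-is: no validation, no derived attributes
        self.alpha = alpha
        self.random_state = random_state

    def fit(self, X, y):
        # estimated attributes are created here, not in __init__
        X = np.asarray(X, dtype=float)
        self.rng_ = np.random.RandomState(self.random_state)
        self.beta0_ = 0.0
        self.beta_ = np.zeros(X.shape[1])
        self.n_iter_ = 0
        return self
```

With this layout, vars(estimator) right after construction contains only the __init__ parameters, which is exactly what the check asserts.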
@titipata titipata changed the title Problem with check_estimator, setting attribute during init Test GLM with scikit-learn check_estimator (setting attributes during init ) Feb 18, 2020
@jasmainak
Member

I fear this will go down a rabbit hole, but maybe good for the long term. I'll give it a try later today and see how far I can go ...

@titipata
Collaborator Author

titipata commented Feb 18, 2020

We could also fix the wording in paper.md first, specifying that the GLM class works with some crucial scikit-learn functionality but not all of it.

@titipata
Collaborator Author

titipata commented Feb 18, 2020

@jasmainak I did try moving

self.beta0_ = None
self.beta_ = None
self.ynull_ = None
self.n_iter_ = 0
self.rng = np.random.RandomState(self.random_state)

into fit(...). However, I'm now getting the following error from check_estimator instead:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-96431abb45f5> in <module>
----> 1 check_estimator(GLM)

~/anaconda3/lib/python3.7/site-packages/sklearn/utils/estimator_checks.py in check_estimator(Estimator)
    300     for check in _yield_all_checks(name, estimator):
    301         try:
--> 302             check(name, estimator)
    303         except SkipTest as exception:
    304             # the only SkipTest thrown currently results from not

~/anaconda3/lib/python3.7/site-packages/sklearn/utils/testing.py in wrapper(*args, **kwargs)
    353             with warnings.catch_warnings():
    354                 warnings.simplefilter("ignore", self.category)
--> 355                 return fn(*args, **kwargs)
    356 
    357         return wrapper

~/anaconda3/lib/python3.7/site-packages/sklearn/utils/estimator_checks.py in check_fit_score_takes_y(name, estimator_orig)
   1101         func = getattr(estimator, func_name, None)
   1102         if func is not None:
-> 1103             func(X, y)
   1104             args = [p.name for p in signature(func).parameters.values()]
   1105             if args[0] == "self":

~/Documents/github/pyglmnet/pyglmnet/pyglmnet.py in fit_predict(self, X, y)
    957             The predicted targets of shape (n_samples,).
    958         """
--> 959         return self.fit(X, y).predict(X)
    960 
    961     def score(self, X, y):

~/Documents/github/pyglmnet/pyglmnet/pyglmnet.py in fit(self, X, y)
    786 
    787         if hasattr(self, '_allow_refit') and self._allow_refit is False:
--> 788             raise ValueError("This glm object has already been fit before."
    789                              "A refit is not allowed")
    790 

ValueError: This glm object has already been fit before.A refit is not allowed

Note: the error comes from fit_predict().
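One way around this guard (a sketch of the idea, not the actual pyglmnet code) is to flush the estimated attributes at the start of fit() so a second call starts clean, instead of raising on refit:

```python
import numpy as np


class RefittableSketch:
    """Hypothetical estimator: allow refitting by resetting fitted state
    at the top of fit() rather than forbidding a second call."""

    def fit(self, X, y):
        # remove any estimated attributes left over from a previous fit
        for attr in ("beta0_", "beta_", "ynull_", "n_iter_"):
            if hasattr(self, attr):
                delattr(self, attr)
        X = np.asarray(X, dtype=float)
        self.beta0_ = 0.0
        self.beta_ = np.zeros(X.shape[1])
        self.n_iter_ = 1
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return X @ self.beta_ + self.beta0_
```

Calling fit() twice on the same object then works, which is what check_fit_score_takes_y exercises via fit_predict.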

@jasmainak
Member

I think this is because of #330. The reasoning at the time, as far as I can remember, was that refitting is dangerous because some estimated parameters (the attributes ending with an underscore) already exist and we don't flush them out before a second fit. I think scikit-learn handles this better by using clone etc., but it seemed like an easy fix at the time. Can you try to undo the changes in that PR and see how far you can go? Feel free to open a WIP pull request.

@titipata
Collaborator Author

@jasmainak so the last point is about scipy's expit. After I commented out fit_predict, I got the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-96431abb45f5> in <module>
----> 1 check_estimator(GLM)

~/anaconda3/lib/python3.7/site-packages/sklearn/utils/estimator_checks.py in check_estimator(Estimator)
    300     for check in _yield_all_checks(name, estimator):
    301         try:
--> 302             check(name, estimator)
    303         except SkipTest as exception:
    304             # the only SkipTest thrown currently results from not

~/anaconda3/lib/python3.7/site-packages/sklearn/utils/estimator_checks.py in check_complex_data(name, estimator_orig)
    671     estimator = clone(estimator_orig)
    672     assert_raises_regex(ValueError, "Complex data not supported",
--> 673                         estimator.fit, X, y)
    674 
    675 

~/anaconda3/lib/python3.7/site-packages/sklearn/utils/_unittest_backport.py in assertRaisesRegex(self, expected_exception, expected_regex, *args, **kwargs)
    222         context = _AssertRaisesContext(expected_exception,
    223                                        self, expected_regex)
--> 224         return context.handle('assertRaisesRegex', args, kwargs)

~/anaconda3/lib/python3.7/site-packages/sklearn/utils/_unittest_backport.py in handle(self, name, args, kwargs)
    111                 self.obj_name = str(callable_obj)
    112             with self:
--> 113                 callable_obj(*args, **kwargs)
    114         finally:
    115             # bpo-23890: manually break a reference cycle

~/Documents/github/pyglmnet/pyglmnet/pyglmnet.py in fit(self, X, y)
    832                                     alpha, self.Tau,
    833                                     reg_lambda, X, y, self.eta,
--> 834                                     beta, self.fit_intercept)
    835                 # Update
    836                 beta = beta - self.learning_rate * grad

~/Documents/github/pyglmnet/pyglmnet/pyglmnet.py in _grad_L2loss(distr, alpha, Tau, reg_lambda, X, y, eta, beta, fit_intercept)
    239 
    240     z = _z(beta[0], beta[1:], X, fit_intercept)
--> 241     mu = _mu(distr, z, eta, fit_intercept)
    242     grad_mu = _grad_mu(distr, z, eta)
    243 

~/Documents/github/pyglmnet/pyglmnet/pyglmnet.py in _mu(distr, z, eta, fit_intercept)
    100         mu = z
    101     elif distr == 'binomial':
--> 102         mu = expit(z)
    103     elif distr == 'probit':
    104         mu = norm.cdf(z)

TypeError: ufunc 'expit' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Not sure how to fix this properly.
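For reference (a sketch of the approach, not the actual change in #364): scikit-learn's own input validation raises exactly the message check_complex_data expects, so calling check_X_y at the top of fit() would reject complex input before it ever reaches expit:

```python
import numpy as np
from sklearn.utils import check_X_y


def validate_inputs(X, y):
    """Sketch: validate inputs the way check_estimator expects.
    check_X_y raises ValueError('Complex data not supported ...')
    for complex-valued X, satisfying check_complex_data."""
    return check_X_y(X, y)


# complex input is rejected with the message the check looks for
try:
    validate_inputs(np.array([[1 + 1j, 2.0]]), np.array([0.0]))
except ValueError as err:
    print(err)  # message starts with "Complex data not supported"
```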

@titipata
Collaborator Author

As @jasmainak suggested, using scikit-learn's utilities makes GLM work with check_estimator (see #364). We can disregard the errors above.
