
GSOC2017 Generalized Negative Binomial (NB-P) model #3832

Closed
wants to merge 8 commits into from

Conversation

@evgenyzhurko (Contributor) commented Jul 24, 2017

This PR introduces an implementation of the Generalized Negative Binomial (NB-P) model.
The model includes:

  • Log-likelihood function
  • Score and Hessian
  • Tests

Status - merged #3874
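The log-likelihood listed above can be sketched concretely. This is a hedged illustration, not the PR's actual implementation: it assumes the NB-P variance function Var(Y) = mu + alpha * mu**p, under which the implied negative-binomial size parameter is mu**(2 - p) / alpha; the function name `nbp_loglike_obs` is hypothetical.

```python
import numpy as np
from scipy.special import gammaln

def nbp_loglike_obs(y, mu, alpha, p):
    """Per-observation NB-P log-likelihood (illustrative sketch).

    Assumes Var(Y) = mu + alpha * mu**p, so the implied
    negative-binomial size parameter is mu**(2 - p) / alpha.
    """
    size = mu ** (2.0 - p) / alpha
    prob = size / (size + mu)
    # standard negative-binomial log-pmf in (size, prob) form
    return (gammaln(y + size) - gammaln(y + 1.0) - gammaln(size)
            + size * np.log(prob) + y * np.log(1.0 - prob))
```

For p = 2 this reduces to the usual NB2 model and for p = 1 to NB1.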

@coveralls commented Jul 24, 2017

Coverage decreased (-0.07%) to 90.806% when pulling f0868a5 on evgenyzhurko:nb-p into 5ec2856 on statsmodels:master.

@coveralls commented Jul 26, 2017

Coverage decreased (-0.2%) to 90.727% when pulling 33a15cf on evgenyzhurko:nb-p into 5ec2856 on statsmodels:master.

@coveralls commented Jul 27, 2017

Coverage decreased (-0.02%) to 90.86% when pulling 033901f on evgenyzhurko:nb-p into 5ec2856 on statsmodels:master.

@coveralls commented Jul 27, 2017

Coverage decreased (-0.005%) to 90.873% when pulling 96699c3 on evgenyzhurko:nb-p into 5ec2856 on statsmodels:master.

@coveralls commented Jul 27, 2017

Coverage increased (+0.02%) to 90.894% when pulling 73bc1da on evgenyzhurko:nb-p into 5ec2856 on statsmodels:master.

counts = np.atleast_2d(np.arange(0, np.max(self.endog) + 1))
mu = self.predict(params, exog=exog, exposure=exposure,
                  offset=offset)[:, None]
return nbinom.pmf(counts, mu, params[-1], self.parametrization)
Member:

I'm pretty sure this is wrong: nbinom uses a different parameterization
for the standard negative binomial; see
https://gist.github.com/josef-pkt/c4f5d0f315c0ce4e6ecc65f0512e8296 In [22]
and #106 (comment).

We need an extra method to convert the parameterization, e.g. convert_params.
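A minimal sketch of such a conversion helper, assuming the NB-P variance form Var(Y) = mu + alpha * mu**p and scipy's (n, prob) parameterization of nbinom. The name `convert_params` follows the review suggestion; the exact signature is hypothetical:

```python
import numpy as np
from scipy.stats import nbinom

def convert_params(mu, alpha, p):
    """Convert NB-P parameters (mu, alpha, p) to scipy's nbinom (n, prob).

    Assumes Var(Y) = mu + alpha * mu**p, so the nbinom size is
    mu**(2 - p) / alpha and prob = size / (size + mu).
    """
    size = 1.0 / alpha * mu ** (2.0 - p)
    prob = size / (size + mu)
    return size, prob

# usage: evaluate the pmf on a grid of counts for one observation
size, prob = convert_params(mu=2.0, alpha=0.5, p=2.0)
counts = np.arange(5)
pmf = nbinom.pmf(counts, size, prob)
```

A quick sanity check of the conversion is that nbinom.mean recovers mu and nbinom.var recovers mu + alpha * mu**p.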

Log(exposure) is added to the linear prediction with coefficient
equal to 1.
""" + base._missing_param_doc}
def __init__(self, endog, exog, p=1, offset=None,
Member:

empty line before `def`

@josef-pkt (Member):

Looks good based on a quick read (excluding unit tests).

predict "prob" will need unit tests.
(Aside: when these parts are finished, we should see if the current Poisson and NegativeBinomial models can get some of the same enhancements, e.g. in predict.)

@josef-pkt (Member):

This pull request introduces 3 alerts - view on lgtm.com

new alerts:

  • 2 for Module-level cyclic import
  • 1 for First argument to super() is not enclosing class

Comment posted by lgtm.com
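For readers unfamiliar with the second alert: it fires on Python 2-style super() calls whose first argument names a class other than the one the call appears in. A minimal illustration with hypothetical class names (not the PR's code):

```python
# lgtm's "First argument to super() is not enclosing class" flags calls
# like super(SomeOtherClass, self).__init__(...) inside a different class,
# which can silently skip the intended parent initializer.

class CountModel(object):
    def __init__(self, endog):
        self.endog = endog

class NegativeBinomialP(CountModel):
    def __init__(self, endog):
        # correct: the first argument is the enclosing class itself
        super(NegativeBinomialP, self).__init__(endog)
```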

@evgenyzhurko (Contributor, Author):

@josef-pkt
I implemented tests for predict 'prob'.
All tests passed successfully locally.
Travis failed and the error message looks strange.

@@ -1772,6 +1773,244 @@ def test_predict_prob(self):
        assert_allclose(chi2[:], (0.64628806058715882, 0.98578597726324468),
                        rtol=0.01)

class TestNegativeBinomial_pNB2Newton(CheckModelResults):
    @classmethod
    def setupClass(cls):
Member:

master now uses pytest instead of nose.
This needs to be setup_class now, instead of camel case.
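A minimal illustration of why this matters (hypothetical test class, not from the PR): pytest only invokes the snake_case setup_class hook, while the nose-era camel-case setupClass is silently ignored, so attributes set there are never created and every test in the class errors.

```python
# pytest-style class-level fixture: the snake_case name is required.

class TestExample(object):
    @classmethod
    def setup_class(cls):  # pytest spelling (nose accepted setupClass)
        cls.res1 = 42

    def test_value(self):
        assert self.res1 == 42
```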


class TestNegativeBinomial_pNB1Newton(CheckModelResults):
    @classmethod
    def setupClass(cls):
@josef-pkt (Member) commented Aug 2, 2017:

same here, and other places below

@josef-pkt (Member) commented Aug 2, 2017

I made inline comments.
The test failure is most likely just setup_class (the pytest spelling).

(I will be offline for a few hours, but then I can check if there are other problems)

@coveralls

Coverage increased (+0.06%) to 81.307% when pulling 9abce87 on evgenyzhurko:nb-p into da27171 on statsmodels:master.


@evgenyzhurko (Contributor) commented Aug 14, 2017

@josef-pkt Can you review this PR first? I want to finish it before implementing ZINB, Truncated NB, and some Hurdle models based on NB-P.

Personal note:
Have you been receiving my emails during the last 2 weeks?

def loglike(self, params):
    """
    Loglikelihood of Negative Binomial model
    Parameters
Member:

empty line before Parameters, also in other docstrings

    return llf

def score_obs(self, params):
    if self._transparams:
Member:

missing docstring

    return np.concatenate((dparams, np.atleast_2d(dalpha).T),
                          axis=1)

def score(self, params):
Member:

docstrings

@josef-pkt (Member):

Some docstrings are missing.

To be more PEP 8 compatible, it is better to rename the class:
NegativeBinomial_p -> NegativeBinomialP

Otherwise, I think we should merge this so you can rebase on it for the other models.

@josef-pkt (Member):

The implementation is similar to NegativeBinomial. However, there might be problems with NegativeBinomial and then similarly here. E.g. I think that the transparams might not be handled correctly and that there are problems with small alpha #3863
Those would affect both negbin classes in a similar way and might require a common refactoring.

@coveralls commented Aug 15, 2017

Coverage increased (+0.06%) to 81.307% when pulling 7882691 on evgenyzhurko:nb-p into da27171 on statsmodels:master.

@josef-pkt (Member):

This pull request introduces the same 3 lgtm.com alerts as above.

@evgenyzhurko (Contributor, Author):

@josef-pkt
What do you think about merging this branch into the zero-inflated branch #3755?
It's needed to implement the ZINB model.

@josef-pkt (Member):

What is your opinion if I'll merge this branch into zero-inflated branch

I would rather merge this NB-P PR into master; then you can rebase your other branches on master.
I'll go over it again a bit later today, but I think it should be ready to merge (after a rebase).

@evgenyzhurko (Contributor, Author):

@josef-pkt NB-P has the same problem with alpha=0 as the current NB. I didn't try to fix it.

@josef-pkt (Member):

problem with alpha=0

I don't know yet whether we can fix that, or how; for sure it will not be easy.
For now this means that neither negbin version is reliable for tiny alpha, but the main use case compared to Poisson is when there is large overdispersion. (Plus we now also have GP-P as an alternative model.)

@josef-pkt (Member) left a comment:

I added a few more comments, mainly for changes in docstrings.
I only spot-checked the code; test coverage seems to be good.

Then you can rebase and I will merge it.

@@ -2704,6 +2706,345 @@ def fit_regularized(self, start_params=None, method='l1',

        return L1NegativeBinomialResultsWrapper(discretefit)

class NegativeBinomialP(CountModel):
    __doc__ = """
    Negative Binomial model for count data
Member:

better to distinguish from NegativeBinomial:
"Generalized Negative Binomial (NB-P) model for count data"

    endog : array
        A reference to the endogenous response variable
    exog : array
        A reference to the exogenous design.
Member:

"p" parameter is missing AFAICS


def loglikeobs(self, params):
    """
    Loglikelihood for observations of Negative Binomial model
Member:

I think we should add "NB-P" in all docstrings (first line), e.g.
"Loglikelihood for observations of Negative Binomial NB-P model"

    ----------
    params : array-like
        The parameters of the model.
    Returns
Member:

add empty line before section headers

self._transparams = True
else:
if use_transparams:
warnings.warn("Paramter \"use_transparams\" is ignored",
Member:

You can use single quotes to avoid the backslashes, e.g.
warnings.warn('Parameter "use_transparams" is ignored',

Also note that "Paramter" is misspelled in the code (missing an e).

        discretefit = L1NegativeBinomialResults(self, cntfit)
    else:
        raise TypeError(
            "argument method == %s, which is not handled" % method)
Member:

I didn't check whether this is the general pattern in fit_regularized.
If we raise an exception based on the arguments, then it should be raised immediately, before computations are done. Move

if method not in ...:
    raise TypeError

to the top of the method.
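A minimal sketch of the suggested pattern, with a hypothetical standalone function standing in for the method: the argument check runs before any expensive computation, so a bad `method` fails fast.

```python
def fit_regularized(method='l1', **kwargs):
    """Hypothetical stand-in: validate `method` up front."""
    if method not in ('l1', 'l1_cvxpy_cp'):
        raise TypeError(
            "argument method == %s, which is not handled" % method)
    # ... the expensive fitting work would only start here ...
    return "fitted with %s" % method
```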

which='mean'):
"""
Predict response variable of a count model given exogenous variables.
Notes
Member:

Parameters and Returns sections are missing.
Empty line before section headers.


    #NOTE: The bse is much closer precitions to stata
    def test_bse(self):
        assert_almost_equal(self.res1.bse, self.res2.bse, DECIMAL_3)
Member:

in general: use assert_allclose with appropriate choice of atol and/or rtol.
assert_almost_equal is much less flexible and comes from code that was written before numpy had assert_allclose
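A small illustration of the difference, using hypothetical standard-error values: assert_allclose takes explicit rtol/atol per call, whereas assert_almost_equal only accepts a single decimal-places count, which applies one absolute scale to all entries regardless of their magnitude.

```python
import numpy as np
from numpy.testing import assert_allclose

# hypothetical fitted vs. reference standard errors
bse1 = np.array([0.1053, 2.3142])
bse2 = np.array([0.1054, 2.3139])

# a relative tolerance scales with each entry, so small and large
# coefficients are compared with appropriate strictness
assert_allclose(bse1, bse2, rtol=1e-3)
```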

@coveralls commented Aug 16, 2017

Coverage increased (+0.06%) to 81.309% when pulling 4046dca on evgenyzhurko:nb-p into da27171 on statsmodels:master.

@coveralls commented Aug 16, 2017

Coverage increased (+0.06%) to 81.309% when pulling f4443bc on evgenyzhurko:nb-p into 143cc11 on statsmodels:master.

@josef-pkt (Member):

This pull request introduces the same 3 lgtm.com alerts as above.

@coveralls commented Aug 21, 2017

Coverage increased (+0.08%) to 81.323% when pulling e5fde3f on evgenyzhurko:nb-p into 143cc11 on statsmodels:master.

@evgenyzhurko (Contributor, Author):

@josef-pkt
I fixed all the problems with docs and loops from #3874; you should update your branch.

@josef-pkt (Member):

This pull request introduces the same 3 lgtm.com alerts as above.

@evgenyzhurko evgenyzhurko changed the title from "Negative binomial model with p-parameter" to "GSOC2017 Generalized Negative Binomial (NB-P) model" on Aug 29, 2017
@josef-pkt josef-pkt mentioned this pull request Apr 15, 2018