
GSOC2017 Generalized Negative Binomial (NB-P) model #3832

Closed
wants to merge 8 commits into from

Conversation

@evgenyzhurko (Contributor) commented Jul 24, 2017

This PR introduces an implementation of the Generalized Negative Binomial (NB-P) model.
The model includes:

  • Log-likelihood function
  • Score and Hessian
  • Tests

Status - merged #3874
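The log-likelihood listed above can be sketched concretely. This is a hedged illustration, not the PR's actual implementation: it assumes the NB-P variance function Var(Y) = mu + alpha * mu**p, under which the implied negative-binomial size parameter is mu**(2 - p) / alpha; the function name `nbp_loglike_obs` is hypothetical.

```python
import numpy as np
from scipy.special import gammaln

def nbp_loglike_obs(y, mu, alpha, p):
    """Per-observation NB-P log-likelihood (illustrative sketch).

    Assumes Var(Y) = mu + alpha * mu**p, so the implied
    negative-binomial size parameter is mu**(2 - p) / alpha.
    """
    size = mu ** (2.0 - p) / alpha
    prob = size / (size + mu)
    # standard negative-binomial log-pmf in (size, prob) form
    return (gammaln(y + size) - gammaln(y + 1.0) - gammaln(size)
            + size * np.log(prob) + y * np.log(1.0 - prob))
```

For p = 2 this reduces to the usual NB2 model and for p = 1 to NB1.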

@coveralls commented Jul 24, 2017

Coverage decreased (-0.07%) to 90.806% when pulling f0868a5 on evgenyzhurko:nb-p into 5ec2856 on statsmodels:master.

@coveralls commented Jul 26, 2017

Coverage decreased (-0.2%) to 90.727% when pulling 33a15cf on evgenyzhurko:nb-p into 5ec2856 on statsmodels:master.

@coveralls commented Jul 27, 2017

Coverage decreased (-0.02%) to 90.86% when pulling 033901f on evgenyzhurko:nb-p into 5ec2856 on statsmodels:master.

@coveralls commented Jul 27, 2017

Coverage decreased (-0.005%) to 90.873% when pulling 96699c3 on evgenyzhurko:nb-p into 5ec2856 on statsmodels:master.

@coveralls commented Jul 27, 2017

Coverage increased (+0.02%) to 90.894% when pulling 73bc1da on evgenyzhurko:nb-p into 5ec2856 on statsmodels:master.

counts = np.atleast_2d(np.arange(0, np.max(self.endog) + 1))
mu = self.predict(params, exog=exog, exposure=exposure,
                  offset=offset)[:, None]
return nbinom.pmf(counts, mu, params[-1], self.parametrization)
Member:

I'm pretty sure this is wrong: nbinom uses a different parameterization
for the standard negative binomial; see
https://gist.github.com/josef-pkt/c4f5d0f315c0ce4e6ecc65f0512e8296 In [22]
and #106 (comment).

We need an extra method to convert the parameterization, e.g. convert_params.
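A minimal sketch of such a conversion helper, assuming the NB-P variance form Var(Y) = mu + alpha * mu**p and scipy's (n, prob) parameterization of nbinom. The name `convert_params` follows the review suggestion; the exact signature is hypothetical:

```python
import numpy as np
from scipy.stats import nbinom

def convert_params(mu, alpha, p):
    """Convert NB-P parameters (mu, alpha, p) to scipy's nbinom (n, prob).

    Assumes Var(Y) = mu + alpha * mu**p, so the nbinom size is
    mu**(2 - p) / alpha and prob = size / (size + mu).
    """
    size = 1.0 / alpha * mu ** (2.0 - p)
    prob = size / (size + mu)
    return size, prob

# usage: evaluate the pmf on a grid of counts for one observation
size, prob = convert_params(mu=2.0, alpha=0.5, p=2.0)
counts = np.arange(5)
pmf = nbinom.pmf(counts, size, prob)
```

A quick sanity check of the conversion is that nbinom.mean recovers mu and nbinom.var recovers mu + alpha * mu**p.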

Log(exposure) is added to the linear prediction with coefficient
equal to 1.
""" + base._missing_param_doc}
def __init__(self, endog, exog, p=1, offset=None,
Member:

empty line before `def`

@josef-pkt (Member):

Looks good based on a quick read (excluding unit tests).

predict "prob" will need unit tests.
(Aside: when these parts are finished, we should see if the current Poisson and NegativeBinomial models can get some of the same enhancements, e.g. in predict.)

@josef-pkt (Member):

This pull request introduces 3 alerts - view on lgtm.com

new alerts:

  • 2 for Module-level cyclic import
  • 1 for First argument to super() is not enclosing class

Comment posted by lgtm.com
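For readers unfamiliar with the second alert: it fires on Python 2-style super() calls whose first argument names a class other than the one the call appears in. A minimal illustration with hypothetical class names (not the PR's code):

```python
# lgtm's "First argument to super() is not enclosing class" flags calls
# like super(SomeOtherClass, self).__init__(...) inside a different class,
# which can silently skip the intended parent initializer.

class CountModel(object):
    def __init__(self, endog):
        self.endog = endog

class NegativeBinomialP(CountModel):
    def __init__(self, endog):
        # correct: the first argument is the enclosing class itself
        super(NegativeBinomialP, self).__init__(endog)
```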

@evgenyzhurko (Contributor, Author):

@josef-pkt
I implemented tests for predict 'prob'.
All tests passed successfully locally.
Travis failed and the error message looks strange.

@@ -1772,6 +1773,244 @@ def test_predict_prob(self):
        assert_allclose(chi2[:], (0.64628806058715882, 0.98578597726324468),
                        rtol=0.01)

class TestNegativeBinomial_pNB2Newton(CheckModelResults):
    @classmethod
    def setupClass(cls):
Member:

master now uses pytest instead of nose.
This needs to be setup_class now, instead of camel case.
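A minimal illustration of why this matters (hypothetical test class, not from the PR): pytest only invokes the snake_case setup_class hook, while the nose-era camel-case setupClass is silently ignored, so attributes set there are never created and every test in the class errors.

```python
# pytest-style class-level fixture: the snake_case name is required.

class TestExample(object):
    @classmethod
    def setup_class(cls):  # pytest spelling (nose accepted setupClass)
        cls.res1 = 42

    def test_value(self):
        assert self.res1 == 42
```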


class TestNegativeBinomial_pNB1Newton(CheckModelResults):
    @classmethod
    def setupClass(cls):
@josef-pkt (Member) commented Aug 2, 2017:

same here, and other places below

@josef-pkt (Member) commented Aug 2, 2017

I made inline comments.
The test failure is most likely just setup_class (the pytest spelling).

(I will be offline for a few hours, but then I can check if there are other problems)

@coveralls

Coverage increased (+0.06%) to 81.307% when pulling 9abce87 on evgenyzhurko:nb-p into da27171 on statsmodels:master.


@evgenyzhurko (Contributor) commented Aug 14, 2017

@josef-pkt Can you review this PR first? I want to finish it before implementing ZINB, Truncated NB, and some Hurdle models based on NB-P.

Personal note:
Have you been receiving my emails during the last 2 weeks?

def loglike(self, params):
    """
    Loglikelihood of Negative Binomial model
    Parameters
Member:

empty line before Parameters, also in other docstrings

    return llf

def score_obs(self, params):
    if self._transparams:
Member:

missing docstring

    return np.concatenate((dparams, np.atleast_2d(dalpha).T),
                          axis=1)

def score(self, params):
Member:

docstrings

@josef-pkt (Member):

Some docstrings are missing.

To be more PEP 8 compatible, it is better to rename the class:
NegativeBinomial_p -> NegativeBinomialP

Otherwise, I think we should merge this so you can rebase on it for the other models.

@josef-pkt (Member):

The implementation is similar to NegativeBinomial. However, there might be problems with NegativeBinomial and then similarly here. E.g. I think that the transparams might not be handled correctly and that there are problems with small alpha #3863
Those would affect both negbin classes in a similar way and might require a common refactoring.

@coveralls commented Aug 15, 2017

Coverage increased (+0.06%) to 81.307% when pulling 7882691 on evgenyzhurko:nb-p into da27171 on statsmodels:master.

@josef-pkt (Member):

This pull request introduces the same 3 lgtm.com alerts as above.

@evgenyzhurko (Contributor, Author):

@josef-pkt
What do you think about merging this branch into the zero-inflated branch #3755?
It's needed to implement the ZINB model.

@josef-pkt (Member):

What is your opinion if I'll merge this branch into zero-inflated branch

I would rather merge this NB-P PR into master; then you can rebase your other branches on master.
I'll go over it again a bit later today, but I think it should be ready to merge (after a rebase).

@evgenyzhurko (Contributor, Author):

@josef-pkt NB-P has the same problem with alpha=0 as the current NB. I didn't try to fix it.

@josef-pkt (Member):

problem with alpha=0

I don't know yet whether we can fix that, or how; for sure it will not be easy.
For now this means that neither negbin version is reliable for tiny alpha, but the main use case compared to Poisson is when there is large overdispersion. (Plus we now also have GP-P as an alternative model.)

@josef-pkt (Member) left a comment:

I added a few more comments, mainly for changes in docstrings.
I only spot-checked the code; test coverage seems to be good.

Then you can rebase and I will merge it.

@@ -2704,6 +2706,345 @@ def fit_regularized(self, start_params=None, method='l1',

        return L1NegativeBinomialResultsWrapper(discretefit)

class NegativeBinomialP(CountModel):
    __doc__ = """
    Negative Binomial model for count data
Member:

better to distinguish from NegativeBinomial:
"Generalized Negative Binomial (NB-P) model for count data"

    endog : array
        A reference to the endogenous response variable
    exog : array
        A reference to the exogenous design.
Member:

"p" parameter is missing AFAICS


def loglikeobs(self, params):
    """
    Loglikelihood for observations of Negative Binomial model
Member:

I think we should add "NB-P" in all docstrings (first line), e.g.
"Loglikelihood for observations of Negative Binomial NB-P model"

    ----------
    params : array-like
        The parameters of the model.
    Returns
Member:

add empty line before section headers

self._transparams = True
else:
if use_transparams:
warnings.warn("Paramter \"use_transparams\" is ignored",
Member:

You can use single quotes to avoid the backslashes, e.g.
warnings.warn('Parameter "use_transparams" is ignored',

Also note that "Paramter" is misspelled in the code (missing an e).

        discretefit = L1NegativeBinomialResults(self, cntfit)
    else:
        raise TypeError(
            "argument method == %s, which is not handled" % method)
Member:

I didn't check whether this is the general pattern in fit_regularized.
If we raise an exception based on the arguments, then it should be raised immediately, before computations are done. Move

if method not in ...:
    raise TypeError

to the top of the method.
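A minimal sketch of the suggested pattern, with a hypothetical standalone function standing in for the method: the argument check runs before any expensive computation, so a bad `method` fails fast.

```python
def fit_regularized(method='l1', **kwargs):
    """Hypothetical stand-in: validate `method` up front."""
    if method not in ('l1', 'l1_cvxpy_cp'):
        raise TypeError(
            "argument method == %s, which is not handled" % method)
    # ... the expensive fitting work would only start here ...
    return "fitted with %s" % method
```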

which='mean'):
"""
Predict response variable of a count model given exogenous variables.
Notes
Member:

Parameters and Returns sections are missing.
Empty line before section headers.


    #NOTE: The bse is much closer precitions to stata
    def test_bse(self):
        assert_almost_equal(self.res1.bse, self.res2.bse, DECIMAL_3)
Member:

in general: use assert_allclose with appropriate choice of atol and/or rtol.
assert_almost_equal is much less flexible and comes from code that was written before numpy had assert_allclose
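A small illustration of the difference, using hypothetical standard-error values: assert_allclose takes explicit rtol/atol per call, whereas assert_almost_equal only accepts a single decimal-places count, which applies one absolute scale to all entries regardless of their magnitude.

```python
import numpy as np
from numpy.testing import assert_allclose

# hypothetical fitted vs. reference standard errors
bse1 = np.array([0.1053, 2.3142])
bse2 = np.array([0.1054, 2.3139])

# a relative tolerance scales with each entry, so small and large
# coefficients are compared with appropriate strictness
assert_allclose(bse1, bse2, rtol=1e-3)
```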

@coveralls commented Aug 16, 2017

Coverage increased (+0.06%) to 81.309% when pulling 4046dca on evgenyzhurko:nb-p into da27171 on statsmodels:master.

@coveralls commented Aug 16, 2017

Coverage increased (+0.06%) to 81.309% when pulling f4443bc on evgenyzhurko:nb-p into 143cc11 on statsmodels:master.

@josef-pkt (Member):

This pull request introduces the same 3 lgtm.com alerts as above.

@coveralls commented Aug 21, 2017

Coverage increased (+0.08%) to 81.323% when pulling e5fde3f on evgenyzhurko:nb-p into 143cc11 on statsmodels:master.

@evgenyzhurko (Contributor, Author):

@josef-pkt
I fixed all the problems with docs and loops from #3874; you should update your branch.

@josef-pkt (Member):

This pull request introduces the same 3 lgtm.com alerts as above.

@evgenyzhurko evgenyzhurko changed the title from "Negative binomial model with p-parameter" to "GSOC2017 Generalized Negative Binomial (NB-P) model" on Aug 29, 2017
@josef-pkt josef-pkt mentioned this pull request Apr 15, 2018