BUG in ChainLadder grain OYDQ #480

Open
lorentzenchr opened this issue Dec 7, 2023 · 4 comments

lorentzenchr commented Dec 7, 2023

import chainladder as cl
import pandas as pd

df = pd.DataFrame({
    "claim_year": 2000 + pd.Series([0] * 8 + [1] * 4),
    "claim_month": [1, 4, 7, 10] * 3,
    "dev_year": 2000 + pd.Series([0] * 4 + [1] * 8),
    "dev_month": [1, 4, 7, 10] * 3,
    "payment": [1] * 12,
})

tr = cl.Triangle(
    df,
    origin=["claim_year", "claim_month"],
    development=["dev_year", "dev_month"],
    columns="payment",
    cumulative=False,
).grain("OYDQ")

cl_est = cl.Chainladder().fit(cl.Development(average="volume").fit_transform(tr))
cl_est.ultimate_

cl_est.full_triangle_

results in

3 6 9 12 15 18 21 24 27 30 33 36 9999
2000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
2001 1.0000 1.0000 1.0000 1.0000 7.0000 7.0000 7.0000 7.0000

Observe the predicted 7 in development periods 15, 18, 21 and 24 for origin year 2001. This is pretty odd!!!

The triangle without estimation reads

3 6 9 12 15 18 21 24 27 30 33 36 9999
2000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
2001 1.0000 1.0000 1.0000 1.0000

When the grain is changed so that origin and development use the same step, it seems to work fine. With different origin and development steps, as in OYDQ, it does not.
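For reference, the expected values can be worked out by hand. The sketch below is plain Python and does not use chainladder; it assumes the textbook volume-weighted chain ladder applied to cumulative amounts, and the helper name `volume_ldfs` is made up for illustration.

```python
from itertools import accumulate

# Incremental payments per origin year at development ages 3, 6, ..., 24 months,
# as in the triangle printed above.
incremental = {
    2000: [1.0] * 8,  # observed through age 24
    2001: [1.0] * 4,  # observed through age 12 only
}

# Cumulate each origin row.
cumulative = {oy: list(accumulate(row)) for oy, row in incremental.items()}


def volume_ldfs(cum):
    """Volume-weighted age-to-age factors (hypothetical helper).

    For each pair of adjacent ages, divide the sum of the later column by the
    sum of the earlier column, using only origins observed at both ages.
    """
    n_ages = max(len(row) for row in cum.values())
    factors = []
    for j in range(n_ages - 1):
        rows = [row for row in cum.values() if len(row) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return factors


ldfs = volume_ldfs(cumulative)

# Roll origin year 2001 forward from its last observed age.
projected = list(cumulative[2001])
for factor in ldfs[len(projected) - 1:]:
    projected.append(projected[-1] * factor)

print([round(x, 4) for x in projected])
```

Under these assumptions, origin year 2001 develops cumulatively to 5, 6, 7 and 8 at ages 15 to 24, so a constant 7 across those four periods is not what the method should produce.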

BTW, there is an additional warning

RuntimeWarning: Mean of empty slice
  xp.nansum(w * x * y, axis) - xp.nansum(x * w, axis) * xp.nanmean(y, axis)
@jbogaardt
Collaborator

Thanks for finding this, it is definitely an issue.

@lorentzenchr
Author

@jbogaardt Do you have a rough idea where this bug originates?
If it helps, I could contribute by adding the above snippet as a test.

@johalnes
Contributor

johalnes commented Dec 15, 2023

I've been trying to use this to get a better understanding of how the chainladder package works.

From what I can see, the issue seems to be that the call latest_diagonal.val_to_dev() doesn't return the same dimensions as the original triangle, but starts from period 12.

cl.Chainladder().fit(cl.Development(average="volume").fit_transform(tr)).latest_diagonal.val_to_dev()

12 15 18 21 24
2000-01-01 nan nan nan nan 8
2001-01-01 4 nan nan nan nan

So when _align_cdf is called, it uses this shape and picks the fifth CDF element for 2000, which is 1.6, and the first element for 2001, which is 8.
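The 1.6 and 8 can be reproduced with a small stand-alone sketch. It is a simplified model of the misalignment, not chainladder's actual _align_cdf: it assumes the volume-weighted factors implied by the data in this issue (cumulative 1 to 8 for 2000, 1 to 4 for 2001) and compares a lookup by development age with a lookup by column position in the val_to_dev() output.

```python
from math import prod

# Development ages of the full triangle, in months.
ages = [3, 6, 9, 12, 15, 18, 21, 24]

# Volume-weighted age-to-age factors implied by the cumulative data in this issue.
ldfs = [2 / 1, 3 / 2, 4 / 3, 5 / 4, 6 / 5, 7 / 6, 8 / 7]

# CDF to ultimate at each age: product of all remaining factors (1 at the last age).
cdf = {age: prod(ldfs[i:]) for i, age in enumerate(ages)}

# Latest diagonal: (development age, cumulative amount) per origin year.
latest = {2000: (24, 8.0), 2001: (12, 4.0)}

# Column labels of the val_to_dev() output, which starts at 12 instead of 3.
diag_cols = [12, 15, 18, 21, 24]

# Correct lookup: match each origin's CDF by its development age.
by_age = {oy: cdf[age] for oy, (age, _) in latest.items()}

# Misaligned lookup: use the column position in the shorter frame as if it
# were a position in the full list of ages.
by_position = {
    oy: cdf[ages[diag_cols.index(age)]] for oy, (age, _) in latest.items()
}

print({oy: round(v, 4) for oy, v in by_age.items()})
print({oy: round(v, 4) for oy, v in by_position.items()})
```

Matching by age gives CDFs of 1 for 2000 and 2 for 2001, whereas matching by position gives 1.6 and 8, the values quoted above.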

The easy fix seems to be using incr_to_cum() instead of latest_diagonal when the triangle is incremental. That is, change _get_ultimate to:

    def _get_ultimate(self, X, sample_weight=None):
        """ Private method that uses CDFs to obtain an ultimate vector """
        if X.is_cumulative == False:
            ld = X.incr_to_cum().latest_diagonal  # ld = X.sum('development')
            ultimate = X.incr_to_cum().copy()     # ultimate = ld.val_to_dev()
        else:
            ld = X.latest_diagonal
            ultimate = X.copy()
        cdf = self._align_cdf(ultimate, sample_weight)
        ultimate = ld * cdf
        return self._set_ult_attr(ultimate)

This gives the same output as creating the cumulative triangle first, and the tests are passing. But I'm not sure about the side effects of this change. And if the issue actually is in val_to_dev, then this may just be hiding something that should have been taken care of.

I've tried to take a deeper look at val_to_dev, but I'm not skilled enough yet. @jbogaardt - Any idea how to fix this in an efficient and simple way? 😄

@johalnes
Contributor

FYI - the same error occurs when using Benktander:

dev = cl.Development(average="volume").fit_transform(tr)
cl.Benktander(apriori=1, n_iters=10000).fit(dev, sample_weight=dev.latest_diagonal).full_triangle_

It seems that changing _align_cdf to use something with the original shape passes all tests, for instance cdf = X.cdf_.iloc[..., : self.X_.shape[-1]]. But again, I'm not sure of the consequences.

Just another thought - might it be better to convert everything to cumulative at initialization, instead of testing for it in the different models? Then one would probably only have to convert back in IO methods like to_frame/to_json/to_pickle.
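The idea rests on the incremental/cumulative conversion being lossless, so data could be stored cumulatively and converted back only on output. Here is a minimal sketch in plain Python; the two functions are stand-alone helpers named after the Triangle methods incr_to_cum() and cum_to_incr(), not the library's implementation, and they work on a single origin row.

```python
from itertools import accumulate


def incr_to_cum(row):
    """Running total of one origin row of incremental amounts."""
    return list(accumulate(row))


def cum_to_incr(row):
    """Inverse of incr_to_cum: first value, then successive differences."""
    if not row:
        return []
    return [row[0]] + [later - earlier for earlier, later in zip(row, row[1:])]


# Origin year 2000 from this issue: eight quarterly payments of 1.
incremental_2000 = [1.0] * 8
cumulative_2000 = incr_to_cum(incremental_2000)

# Converting back recovers the original incremental row.
round_trip = cum_to_incr(cumulative_2000)
print(cumulative_2000)
print(round_trip == incremental_2000)
```

Whether this would be cheap enough for large triangles, or interact badly with sparse backends, is a separate question that this sketch does not address.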
