Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ParetoNBD with covariates #545

Merged
merged 2 commits into from
Mar 15, 2024

Conversation

ricardoV94
Copy link
Contributor

@ricardoV94 ricardoV94 commented Feb 22, 2024

Description

Related Issue

Checklist

Modules affected

  • MMM
  • CLV

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc-marketing--545.org.readthedocs.build/en/545/

@ColtAllen
Copy link
Collaborator

I think an updated UML diagram would be a helpful addition to CONTRIBUTING.MD to better understand all these new class inheritances:

image

Here are some instructions on how to create one:

#178

Copy link

codecov bot commented Feb 22, 2024

Codecov Report

Attention: Patch coverage is 19.73684% with 122 lines in your changes are missing coverage. Please review.

Project coverage is 35.61%. Comparing base (6f7a3d4) to head (db06d52).

Files Patch % Lines
pymc_marketing/clv/models/pareto_nbd.py 20.68% 115 Missing ⚠️
pymc_marketing/clv/models/beta_geo.py 0.00% 5 Missing ⚠️
pymc_marketing/clv/distributions.py 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #545       +/-   ##
===========================================
- Coverage   91.56%   35.61%   -55.96%     
===========================================
  Files          21       21               
  Lines        2052     2137       +85     
===========================================
- Hits         1879      761     -1118     
- Misses        173     1376     +1203     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pymc-labs pymc-labs deleted a comment from BBDS-Colt Feb 27, 2024
@ColtAllen
Copy link
Collaborator

Code is certainly cleaner than the IF statement jungle of the PR I posted haha. Best way I can contribute is to create a child branch and work in parallel in the dev notebook for user-oriented feedback.

@ricardoV94
Copy link
Contributor Author

Code is certainly cleaner than the IF statement jungle of the PR I posted haha. Best way I can contribute is to create a child branch and work in parallel in the dev notebook for user-oriented feedback.

It's always easier to refactor than write the first pass. The notebook would help a lot!

I'll be testing the covariate model today, and fix stuff I've certainly broken

@ricardoV94 ricardoV94 force-pushed the pareto_covar_ricardo branch 3 times, most recently from bc5a7f3 to 6ba6ee1 Compare February 28, 2024 19:24
@ricardoV94 ricardoV94 force-pushed the pareto_covar_ricardo branch 3 times, most recently from 5e49dac to 839aaf0 Compare March 1, 2024 18:28
@ricardoV94
Copy link
Contributor Author

@ColtAllen finally pushed my last changes. I broke the pre-existing methods API very much, but we can:

  1. Go back and make it like it looked before, plus optional argument for covariates in all methods
  2. Use a new API but support old API with deprecation warnings
  3. Something else. I don't like asking for a whole dataframe in cases where the user is only changing one variable, but for covariates it makes sense to request a dataframe anyway, so it makes sense that other variable also comes in the same dataframe

I'm in favor of 2.

Finally, the slow TestParetoNBDModelWithCovariates::test_inference is failing badly. I used synthetic data from fixed true parameters (so I could control the covariate coefficients), but it is not recovering those at all. Should I use different true parameter values?

@ricardoV94 ricardoV94 force-pushed the pareto_covar_ricardo branch 2 times, most recently from cfd6d75 to 133fd36 Compare March 1, 2024 18:49
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@ColtAllen
Copy link
Collaborator

ColtAllen commented Mar 1, 2024

@ColtAllen finally pushed my last changes. I broke the pre-existing methods API very much, but we can:

  1. Go back and make it like it looked before, plus optional argument for covariates in all methods
  2. Use a new API but support old API with deprecation warnings
  3. Something else. I don't like asking for a whole dataframe in cases where the user is only changing one variable, but for covariates it makes sense to request a dataframe anyway, so it makes sense that other variable also comes in the same dataframe

I'm in favor of 2.

Yes, let's do option 2.

Finally, the slow TestParetoNBDModelWithCovariates::test_inference is failing badly. I used synthetic data from fixed true parameters (so I could control the covariate coefficients), but it is not recovering those at all. Should I use different true parameter values?

Yes - the true parameters you're using are for a model fit without covariates. When I fit a covariate model in the dev notebook of my previous PR, all parameter values changed.

I also noticed the covariates were synthesized and tacked onto an existing dataset. If we're to test via parameter recovery, we'll need to generate a full dataset from the parameters used for testing.

@ricardoV94
Copy link
Contributor Author

I also noticed the covariates were synthesized and tacked onto an existing dataset. If we're to test via parameter recovery, we'll need to generate a full dataset from the parameters used for testing.

No, I'm generating new observations (recency and frequency) in that test as well

@ricardoV94 ricardoV94 force-pushed the pareto_covar_ricardo branch 2 times, most recently from 48770dd to 024e76f Compare March 4, 2024 12:13
@ColtAllen
Copy link
Collaborator

No, I'm generating new observations (recency and frequency) in that test as well

Buy why is only a single prior predictive sample being taken?

@ricardoV94
Copy link
Contributor Author

No, I'm generating new observations (recency and frequency) in that test as well

Buy why is only a single prior predictive sample being taken?

One draw is enough, it contains an observation for each synthetic customer

@ColtAllen
Copy link
Collaborator

No, I'm generating new observations (recency and frequency) in that test as well
One draw is enough, it contains an observation for each synthetic customer

I see. The new frequency column name is misspelled though, so it's testing on the original column rather than what was generated for the test.

@ColtAllen
Copy link
Collaborator

Is the user API still a work in progress? All predictive methods require arguments to be converted into dataframe columns beforehand, which seems rather cumbersome.

@ricardoV94 ricardoV94 force-pushed the pareto_covar_ricardo branch 4 times, most recently from 4de7aba to 2ae0b94 Compare March 14, 2024 19:47
@ricardoV94 ricardoV94 marked this pull request as ready for review March 14, 2024 19:49
@ricardoV94 ricardoV94 added enhancement New feature or request CLV labels Mar 14, 2024
@ricardoV94 ricardoV94 force-pushed the pareto_covar_ricardo branch 3 times, most recently from db45c51 to 652b0c9 Compare March 15, 2024 14:32
pymc_marketing/clv/models/pareto_nbd.py Outdated Show resolved Hide resolved
pymc_marketing/clv/models/pareto_nbd.py Outdated Show resolved Hide resolved
pymc_marketing/clv/models/pareto_nbd.py Outdated Show resolved Hide resolved
tests/clv/models/test_pareto_nbd.py Outdated Show resolved Hide resolved
pymc_marketing/clv/models/pareto_nbd.py Outdated Show resolved Hide resolved
pymc_marketing/clv/models/pareto_nbd.py Show resolved Hide resolved
pymc_marketing/clv/models/pareto_nbd.py Show resolved Hide resolved
@ricardoV94 ricardoV94 force-pushed the pareto_covar_ricardo branch from 652b0c9 to d64a4df Compare March 15, 2024 15:02
@ricardoV94 ricardoV94 force-pushed the pareto_covar_ricardo branch from d64a4df to db06d52 Compare March 15, 2024 15:08
@ricardoV94 ricardoV94 merged commit 5aba73e into pymc-labs:main Mar 15, 2024
9 of 10 checks passed
@ricardoV94 ricardoV94 deleted the pareto_covar_ricardo branch March 15, 2024 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLV enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants