Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Stan backend #347

Open
wants to merge 51 commits into
base: master
Choose a base branch
from
Open

[WIP] Stan backend #347

wants to merge 51 commits into from

Conversation

hyunjimoon
Copy link

@hyunjimoon hyunjimoon commented Aug 1, 2022

Description

1. What is this PR doing and how?

The main contribution of this PR is inference; its benefits are estimating parameter values and calibration. Currently pysd is doing a good job in data generation conditional on parameter values but lacks modules that retrieves parameter from the generated data. This pr aims to fill the gap; Stan is a computational Bayesian statistics package, with a large eco-system, providing state of the art inference algorithms including MCMC (HMC-NUTS) and variational inference. Given the following input from users, our output is posterior samples of estimated parameters.

  1. dynamic model in .mdl or .xmile or .stmx format

  2. user's classification of

    a. assumed parameter
    a-1. time series of parameters (e.g. temperature or demand)
    a-2. fixed value of parameter (e.g. size of one franchise branch)

    b. estimated parameter (e.g. delay time of shipment)
    b-1. prior distribution
    b-2. prior parameter

    c. observed state (stock variables with observed data)

Under the hood, files under pysd/builders/stan folder, we parse xmile, mdl, stmx files into one Stan code file .stan and pass it to Stan's state of the art MCMC infefinference engine, which serves inverse process of pysd's data generation. Stan has a large ecosystem that supports both calibration and model expansion (e.g. hierarchical regression, generalized linear).

These are tested in test_scripts folder, and main parsing logics are in here and here. (SQ3. could you add some detailes on each file in the last sentence?)

2. Future plans

This is part of SilkRoad project documented here.

SilkRoad, where User and Programs

Step Program's work (P-rows have .function(input)) User's work added info (for U-row) or infrastructure (for P-row) - out format
U1 Vensim assists U1.a() a. Translate mental model to SD model Relation btw Variables .mdl
U2 PySD assists U2.a() a. Classify parameters est_param, ass_param, b. Select obs_state among stocks Want (est_param) and Have (assume: ass_param, observe: obs_state) .json
P1 PySD, .read_vensim(U1.a): Switch lang. from FP to OOP None .py
U3 a. Supply value or series of assmed_param, b. Choose family(:= dist. of msr_err_scale) DataFrame object, .json
U4 a. Choose prior_family(est_param's prior dist. type) , b. Set prior_param (est_param's prior param) .json
P2 PySD, .run(U2.ab, U3.ab): Generate synthetic data DataFrame object
P3 PySD,.create_stan_program(U2.ab, U3.ab): Switch lang. from OOP to PPL .stan
U5 a. Set precision with iter_sampling (:= # of samples) int
P4 Stan, .sample(P2, P3, U5.a)): Sample posterior draws (optimize measure with specified precision) DataFrame object

Our work is casting dynamic model into statistical model structure as part of Bayesian workflow project which provides principled guidelines for model building.
Two works are planned for the near future and it would be tremendously helpful some support could be given in the second.

  • Translating vensim and/or stella function using python here
    Currently integration, random generation, lookup are completed (but need testing) and smooth is in progress. Some parts we are missing: if then else, step (using if then else), pulse(point mass + divide by dt), delay (hard..), material memory (not possible in stan as it does not allow).

PR verification (to be filled by reviewers)

  • The code follows the PEP 8 style
  • The new code has been tested properly for all the possible cases
  • The overall coverage has not dropped and other features have not been broken.
  • If new features have been added, they have been properly documented
  • docs/whats_new.rst has been updated
  • PySD version has been updated

@pep8speaks
Copy link

pep8speaks commented Aug 1, 2022

Hello @hyunjimoon! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 51:80: E501 line too long (86 > 79 characters)
Line 91:80: E501 line too long (80 > 79 characters)
Line 96:80: E501 line too long (168 > 79 characters)
Line 97:80: E501 line too long (98 > 79 characters)
Line 259:80: E501 line too long (121 > 79 characters)
Line 264:80: E501 line too long (121 > 79 characters)
Line 298:80: E501 line too long (101 > 79 characters)

Line 25:80: E501 line too long (151 > 79 characters)
Line 29:80: E501 line too long (133 > 79 characters)
Line 36:5: E303 too many blank lines (2)
Line 60:80: E501 line too long (102 > 79 characters)
Line 75:80: E501 line too long (164 > 79 characters)
Line 79:80: E501 line too long (109 > 79 characters)
Line 83:80: E501 line too long (114 > 79 characters)
Line 89:80: E501 line too long (154 > 79 characters)
Line 96:80: E501 line too long (85 > 79 characters)
Line 100:80: E501 line too long (107 > 79 characters)
Line 120:80: E501 line too long (86 > 79 characters)
Line 121:80: E501 line too long (103 > 79 characters)
Line 162:80: E501 line too long (80 > 79 characters)
Line 163:80: E501 line too long (122 > 79 characters)
Line 180:80: E501 line too long (146 > 79 characters)
Line 186:1: W293 blank line contains whitespace
Line 196:80: E501 line too long (147 > 79 characters)
Line 204:80: E501 line too long (127 > 79 characters)
Line 210:5: E303 too many blank lines (2)
Line 228:80: E501 line too long (83 > 79 characters)
Line 230:80: E501 line too long (152 > 79 characters)
Line 232:80: E501 line too long (141 > 79 characters)
Line 236:5: E303 too many blank lines (3)
Line 237:80: E501 line too long (132 > 79 characters)
Line 240:80: E501 line too long (99 > 79 characters)
Line 257:80: E501 line too long (134 > 79 characters)
Line 266:80: E501 line too long (80 > 79 characters)
Line 269:80: E501 line too long (95 > 79 characters)
Line 287:80: E501 line too long (83 > 79 characters)
Line 340:80: E501 line too long (82 > 79 characters)
Line 404:80: E501 line too long (107 > 79 characters)
Line 427:80: E501 line too long (93 > 79 characters)
Line 452:80: E501 line too long (103 > 79 characters)
Line 453:1: W391 blank line at end of file

Line 3:11: E401 multiple imports on one line
Line 9:1: E302 expected 2 blank lines, found 1
Line 20:80: E501 line too long (131 > 79 characters)
Line 22:80: E501 line too long (105 > 79 characters)
Line 23:80: E501 line too long (119 > 79 characters)
Line 28:80: E501 line too long (150 > 79 characters)
Line 35:80: E501 line too long (107 > 79 characters)
Line 47:1: E302 expected 2 blank lines, found 1
Line 59:13: E722 do not use bare 'except'
Line 80:80: E501 line too long (82 > 79 characters)
Line 82:80: E501 line too long (99 > 79 characters)
Line 92:80: E501 line too long (98 > 79 characters)
Line 95:80: E501 line too long (151 > 79 characters)
Line 101:80: E501 line too long (90 > 79 characters)
Line 114:80: E501 line too long (96 > 79 characters)
Line 117:80: E501 line too long (106 > 79 characters)
Line 121:80: E501 line too long (90 > 79 characters)
Line 125:80: E501 line too long (128 > 79 characters)
Line 130:80: E501 line too long (83 > 79 characters)
Line 135:80: E501 line too long (85 > 79 characters)
Line 142:80: E501 line too long (109 > 79 characters)
Line 143:80: E501 line too long (82 > 79 characters)
Line 144:80: E501 line too long (86 > 79 characters)
Line 152:80: E501 line too long (134 > 79 characters)
Line 155:80: E501 line too long (83 > 79 characters)
Line 156:80: E501 line too long (90 > 79 characters)
Line 157:80: E501 line too long (179 > 79 characters)
Line 161:80: E501 line too long (98 > 79 characters)
Line 162:80: E501 line too long (84 > 79 characters)
Line 163:80: E501 line too long (126 > 79 characters)
Line 165:80: E501 line too long (139 > 79 characters)
Line 166:80: E501 line too long (98 > 79 characters)
Line 168:80: E501 line too long (145 > 79 characters)
Line 170:80: E501 line too long (169 > 79 characters)
Line 173:5: E303 too many blank lines (2)
Line 175:80: E501 line too long (115 > 79 characters)
Line 176:80: E501 line too long (121 > 79 characters)
Line 177:80: E501 line too long (115 > 79 characters)
Line 183:80: E501 line too long (93 > 79 characters)
Line 184:80: E501 line too long (140 > 79 characters)
Line 187:80: E501 line too long (100 > 79 characters)
Line 189:80: E501 line too long (150 > 79 characters)
Line 192:80: E501 line too long (97 > 79 characters)
Line 202:80: E501 line too long (86 > 79 characters)
Line 205:80: E501 line too long (108 > 79 characters)
Line 208:80: E501 line too long (125 > 79 characters)
Line 213:80: E501 line too long (94 > 79 characters)
Line 215:80: E501 line too long (102 > 79 characters)
Line 216:80: E501 line too long (106 > 79 characters)
Line 217:80: E501 line too long (110 > 79 characters)
Line 218:80: E501 line too long (99 > 79 characters)
Line 229:80: E501 line too long (97 > 79 characters)
Line 238:80: E501 line too long (112 > 79 characters)
Line 240:80: E501 line too long (86 > 79 characters)
Line 249:17: E265 block comment should start with '# '
Line 249:80: E501 line too long (174 > 79 characters)
Line 251:80: E501 line too long (125 > 79 characters)
Line 253:80: E501 line too long (94 > 79 characters)
Line 254:56: E127 continuation line over-indented for visual indent
Line 254:80: E501 line too long (102 > 79 characters)
Line 255:56: E127 continuation line over-indented for visual indent
Line 255:80: E501 line too long (106 > 79 characters)
Line 256:56: E127 continuation line over-indented for visual indent
Line 256:80: E501 line too long (95 > 79 characters)
Line 257:56: E127 continuation line over-indented for visual indent
Line 260:80: E501 line too long (95 > 79 characters)
Line 261:80: E501 line too long (163 > 79 characters)
Line 268:1: W391 blank line at end of file

Line 3:1: E302 expected 2 blank lines, found 1
Line 40:5: E303 too many blank lines (2)

Line 5:1: E265 block comment should start with '# '
Line 6:1: E265 block comment should start with '# '
Line 19:33: W292 no newline at end of file

Line 23:80: E501 line too long (106 > 79 characters)
Line 23:88: E251 unexpected spaces around keyword / parameter equals
Line 23:90: E251 unexpected spaces around keyword / parameter equals
Line 37:1: E265 block comment should start with '# '
Line 40:1: E265 block comment should start with '# '
Line 41:1: E265 block comment should start with '# '
Line 42:1: E265 block comment should start with '# '
Line 42:18: W292 no newline at end of file
Line 42:18: W292 no newline at end of file
Line 42:18: W292 no newline at end of file

Comment last updated at 2022-09-11 12:48:26 UTC

@hyunjimoon
Copy link
Author

hyunjimoon commented Aug 4, 2022

Tom Fiddaman (from Vensim) shared his recent talk on calibration which is relevant to our goal.
Fiddaman22_VensimCalib.pdf

Based on PySD team's warm greeting and Vensim team's recent interest, may I go one step further from inference to calibration on this PR? I carefully drafted my opinion of future plans, detailed list of works, and some questions.

Work for Vensim-PySD-Stan Calibration Collaboration

Table for the whole workflow is one the first pr documentation here.

Vensim-PySD-Stan

  • Revise the above table until we have consensus (feedback welcomed!)

    • Time-orderly step from U1 to U5 and P1 to P4 describes interaction interface between programs and users as they evolve. (branched out from Bayesian workflow)
    • P.a(), U.a() are function, P.a, U.a are output
    • In U2: est_param : estimated parameter, ass_param: assumed parameter, obs_state: observed state
  • Adding hierarchy by connecting vensim's subscript to PySD object to Stan model block

PySD-Stan

  • Unify input for P3 and P4 by defining types for U2ab, U3ab (ref: Arviz.InferenceData type)
  • Define function with the following input (ref: brms brm() ) would enhance process' modularity
    • family
    • prior
    • algorithm
    • stan_model_args
  • Define prior class (ref: brms set_prior, comparison with pymc's prior system is needed)
  • Connect algorithm to Stan's different algorithm (optimize() for optimization,  variational() for variational inference, look for Kalman filtering if needed)
  • Decision-based calibration using generate_quantities()
  • Add the following to notebook:
    • a. prior predictive check: how users do U3 and U4 (especially U4 requires both stat + domain knowledge)
    • b. calibration:
        1. revisit goals and resources in U2, assumptions in U3, prior knowledge in U4, system tolerance in U5 based until model passes SBC test
        1. sensitivity check on prior distribution and prior parameter
        1. compare posterior with prior
    • c. posterior predictive: compare with the real observed state data

Question for discussion

  • U2b: which is better between "b. Select obs_state among stocks" and b. Classify obs_state and nonobs_state? It depends on the average ratio of obs_state among stock variable which was 1 for prey-predator, and .75 for demand-supply model.
  • Is it ok to view vensim as functional programming (FP) language? Compared to OOP, FP's traits fewer data types and many functions seem related (perhaps why Julia has great ODE lib) e.g. of object vs function orient: array.sort() vs sort(array)
  • do you agree
    • FP is better for U1 elicit mental model? May I express this as translate prior knowledge to prior distribution (or archetype, policy, parameter)?
    • OOP is better for P2 data generation?

@enekomartinmartinez
Copy link
Collaborator

HI @hyunjimoon @Dashadower
I just wanted to let you know that we are planning to move PySD to SDXorg in September.
You should not notice any change as the links will be automatically redirected, however, it is recommendable once the migration is done to update the link location to the new one. The issues and PR will be also migrated so everything should keep as before with the only change of the repository location.
You can follow the migration process in the following issue #348

@enekomartinmartinez
Copy link
Collaborator

@hyunjimoon @Dashadower also mention that currently the tests are "waiting for approval". I will not approve them until you have included the test for the new builder so we save sending jobs to GitHub Actions.
Please, let me know when you want the CI test to be running for this PR.

@hyunjimoon
Copy link
Author

hyunjimoon commented Sep 8, 2022

@Dashadower and I am planning for a test:

  1. prey-predator.mdl's python translation

  2. prey-predator.mdl's stan translation

  3. compare ytilde for 1 and 2 (with alpha = .55, gamma = .8, beta = .028, delta = .024)

I would need @Dashadower's help on completing stan_builder's draws2data(). Minimum Viable Product in that estimated parameter has constant value.

@JamesPHoughton Could you confirm the above would be enough to build trust on stan-interface modeling on which our future collaboration can be based?

@enekomartinmartinez
Copy link
Collaborator

@hyunjimoon @Dashadower I would recommend you to merge the master branch from PySD into your branch. So you do not work with an outdated version of PySD.

We fixed some bugs related to the latest parsimonious and pandas releases. And there are several other bug fixes and new features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants