Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why are notebooks in the taxdata repository? #206

Closed
martinholmer opened this issue May 28, 2018 · 4 comments
Closed

Why are notebooks in the taxdata repository? #206

martinholmer opened this issue May 28, 2018 · 4 comments
Labels

Comments

@martinholmer
Copy link
Contributor

One of the prime benefits of using a version control system (such as git) is that it provides a history of changes in a project's source code. But putting source code in notebooks makes it difficult to see the history of changes in source code, undercutting one of the major benefits of using version control.

On Friday, May 18, 2018, @MattHJensen discussed the issues related to including notebooks in GitHub project repositories, and concluded that they should not be included in project repositories. Notebooks are a convenient way to share work with other people, and so he was open to creating a notebook-sharing repository if there was demand for such a thing among those working with taxdata, Tax-Calculator, and projects that use Tax-Calculator.

Given that decision, Tax-Calculator pull request 203 removed the only notebook in the Tax-Calculator repository (read-the-docs/notebooks/10_Minutes_to_Tax-Calculator.ipynb).

Given all this, my question is why are notebooks included in the taxdata repository?

On the tip of the master branch, the following two notebooks exist:

  1. taxdata/stage1/notebook/Stage I.ipynb, which was last modified by @andersonfrailey more than a year ago on Feb 13, 2017, and is not documented (as far as I can see). What is its relationship to taxdata/stage1/stage1.py? Why is it under version control?

  2. taxdata/puf_stage2/notebook/CLP_solver_16Years.ipynb, which was last modified by @andersonfrailey about a month ago, and is not documented (as far as I can see). What is its relationship to taxdata/puf_stage2/stage2.py? Why is it under version control?

@MattHJensen @Abraham-Leventhal

@martinholmer martinholmer changed the title Why are notebooks in taxdata repository? Why are notebooks in the taxdata repository? May 28, 2018
@andersonfrailey
Copy link
Collaborator

@martinholmer,

I believe the notebooks were originally used to develop the python scripts associated with whichever directory they're in. I'd say the stage 1 notebook is no longer needed, but the stage 2 notebook is still useful.

The CyLP model used for PUF stage 2 and the PuLP model I used for the CPS stage 2 will do a "pre-solve" to see if the LP problem they're solving is feasible with the given tolerance. Unfortunately even if the problem is determined to be infeasible the models will still try and solve it with the only notification of it's infeasibility being a print statement in the terminal window. Unless you actively stop the program when you see the infeasibility notice it will keep running indefinitely.

The notebook makes it easier for us to check that the problem will be solvable each year, update the .py files with the tolerances we know will result in a feasible problem, and then run the .py file without having to worry about whether or not the problem is feasible. Otherwise whoever is running the stage 2 scripts will have to keep checking the terminal and restart the entire thing every time a tolerance needs to be updated.

@martinholmer
Copy link
Contributor Author

@andersonfrailey said in #206:

I believe the notebooks were originally used to develop the python scripts associated with whichever directory they're in. I'd say the stage 1 notebook is no longer needed ...

OK. I'll prepare a PR that removes it.

...but the stage 2 notebook is still useful.

The CyLP model used for PUF stage 2 and the PuLP model I used for the CPS stage 2 will do a "pre-solve" to see if the LP problem they're solving is feasible with the given tolerance. Unfortunately even if the problem is determined to be infeasible the models will still try and solve it with the only notification of it's infeasibility being a print statement in the terminal window. Unless you actively stop the program when you see the infeasibility notice it will keep running indefinitely.

The notebook makes it easier for us to check that the problem will be solvable each year, update the .py files with the tolerances we know will result in a feasible problem, and then run the .py file without having to worry about whether or not the problem is feasible. Otherwise whoever is running the stage 2 scripts will have to keep checking the terminal and restart the entire thing every time a tolerance needs to be updated.

Thanks for the explanation. I see the problem, but I think there is a better solution to this problem.
What exactly does the "infeasibility notice" look like?
Do the solvers print the notice to stdout or stderr?

I'll post a comment here that describes a solution that does not require an error-prone duplication of code (in the Python script and in the notebook) and does stop execution of the Python script once an "infeasibility notice" is written.

@andersonfrailey
Copy link
Collaborator

@martinholmer it prints to stdout. Here is the message you see in terminal (on a mac) when using CyLP:
screen shot 2018-05-29 at 10 57 54 am
You get a similar message with PuLP.

@martinholmer
Copy link
Contributor Author

Closing issue #206 following decision to not include any notebooks in the taxdata repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants