
Add logic to allow dropping units with high MTR while behavioral response calculation #1856

Closed
wants to merge 5 commits

Conversation

GoFroggyRun
Contributor

GoFroggyRun commented Feb 5, 2018

This PR implements @feenberg's suggestion in #1852 by adding optional logic to ignore units with high MTRs (above mtr_cap = 0.7, as discussed in #1852) while calculating behavioral responses.
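Roughly, the added logic amounts to masking out those units when forming the percentage change in the net-of-tax rate. Here is a minimal sketch (illustrative arrays and simplified variable names, not the literal diff):

```python
import numpy as np

# Illustrative baseline and reform wage MTRs (not PUF data).
wage_mtr1 = np.array([0.25, 0.45, 0.80, 1.20])
wage_mtr2 = np.array([0.22, 0.40, 0.75, 1.10])

# percentage change in the net-of-tax rate, with no nearone clipping
pch = (1.0 - wage_mtr2) / (1.0 - wage_mtr1) - 1.0

# ignore (zero out) the response of units whose baseline MTR exceeds the cap
mtr_cap = 0.7  # MTRs above this are treated as unreasonably high
pch = np.where(wage_mtr1 > mtr_cap, 0.0, pch)
```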

The added logic replaces the nearone logic, which can amplify notches when a unit's MTR exceeds nearone = 0.999999 (more details can be found here). A test, test_drop_hi_mtr, is added to test_behavior.py to make sure that the error introduced by ignoring those high-MTR units is reasonably small. The test passes on my local machine:

(taxcalc-dev) Sean-Wangs-MacBook-Pro:taxcalc seanwang$ py.test -m test_drop_hi_mtr
===================================================== test session starts =====================================================
platform darwin -- Python 2.7.14, pytest-3.3.0, py-1.5.2, pluggy-0.6.0
rootdir: /Users/seanwang/Documents/GIT/tax-calculator, inifile: setup.cfg
plugins: xdist-1.20.1, forked-0.2, hypothesis-3.38.5, pep8-1.0.6
collected 439 items                                                                                                           

tests/test_behavior.py .                                                                                                [100%]

==================================================== 438 tests deselected =====================================================
========================================== 1 passed, 438 deselected in 22.47 seconds ==========================================

However, I have encountered one test failure: test_behavioral_response in test_tbi.py fails. The error message reads:

(taxcalc-dev) Sean-Wangs-MacBook-Pro:taxcalc seanwang$ py.test -n4
===================================================== test session starts =====================================================
platform darwin -- Python 2.7.14, pytest-3.3.0, py-1.5.2, pluggy-0.6.0
rootdir: /Users/seanwang/Documents/GIT/tax-calculator, inifile: setup.cfg
plugins: xdist-1.20.1, forked-0.2, hypothesis-3.38.5, pep8-1.0.6
gw0 [438] / gw1 [438] / gw2 [438] / gw3 [438]
scheduling tests via LoadScheduling
....................................................................................................................... [ 27%]
....................................................................................................................... [ 54%]
....................................................................................................................... [ 81%]
..............................................................F..................                                       [100%]
========================================================== FAILURES ===========================================================
...
...
...
**** DIFF for year 2026 (year_n=7):
TBI RESULTS:
ind_tax          -204,496,151
payroll_tax    -2,256,177,524
combined_tax   -2,460,673,675
Name: 0_7, dtype: float64
STD RESULTS:
ind_tax          -324,621,038
payroll_tax    -2,258,628,355
combined_tax   -2,583,249,393
Name: 0_7, dtype: float64
**** DIFF for year 2027 (year_n=8):
TBI RESULTS:
ind_tax         2,694,995,128
payroll_tax    -2,179,762,550
combined_tax      515,232,577
Name: 0_8, dtype: float64
STD RESULTS:
ind_tax         2,688,980,813
payroll_tax    -2,179,917,092
combined_tax      509,063,721
Name: 0_8, dtype: float64
...

I'm not sure why this particular test won't pass. Is a 2e-3 percent tolerance too small to account for the effect of the dropQ logic?
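For reference, taking the 2026 combined_tax numbers from the output above and computing a simple relative difference (my guess at the kind of check the test makes, not necessarily how it is actually written) gives a gap far larger than 2e-3 percent:

```python
# 2026 combined_tax diffs from the failing-test output above
tbi_diff = -2460673675.0
std_diff = -2583249393.0
reldiff = abs(tbi_diff - std_diff) / abs(std_diff)
print(reldiff)  # about 0.047, i.e. roughly 4.7 percent
```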

cc @martinholmer @hdoupe

@codecov-io

codecov-io commented Feb 5, 2018

Codecov Report

Merging #1856 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #1856   +/-   ##
======================================
  Coverage     100%    100%           
======================================
  Files          37      37           
  Lines        3168    3171    +3     
======================================
+ Hits         3168    3171    +3
Impacted Files Coverage Δ
taxcalc/behavior.py 100% <100%> (ø) ⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@feenberg
Contributor

feenberg commented Feb 5, 2018 via email

@GoFroggyRun
Contributor Author

@feenberg I'm comparing the logic that ignores high MTRs against the standard logic that doesn't involve nearone, and this particular test passes without any problem. The one that fails has something to do with the dropQ logic.

@GoFroggyRun
Contributor Author

GoFroggyRun commented Feb 5, 2018

The test failure for tbi_vs_std_behavior is strange. I did the following:

I used the drop_hi_mtr logic for both TBI and STD by adding drop_hi_mtr=False and drop_hi_mtr=True, respectively, here and here:

**Drop High MTRs in Both Cases**

**** DIFF for year 2026 (year_n=7):
TBI RESULTS:
ind_tax          -204,496,151
payroll_tax    -2,256,177,524
combined_tax   -2,460,673,675
Name: 0_7, dtype: float64
STD RESULTS:
ind_tax          -324,621,038
payroll_tax    -2,258,628,355
combined_tax   -2,583,249,393
Name: 0_7, dtype: float64
**** DIFF for year 2027 (year_n=8):
TBI RESULTS:
ind_tax         2,694,995,128
payroll_tax    -2,179,762,550
combined_tax      515,232,577
Name: 0_8, dtype: float64
STD RESULTS:
ind_tax         2,688,980,813
payroll_tax    -2,179,917,092
combined_tax      509,063,721
Name: 0_8, dtype: float64

**Do NOT Drop at All**

**** DIFF for year 2026 (year_n=7):
TBI RESULTS:
ind_tax          -204,496,151
payroll_tax    -2,256,177,524
combined_tax   -2,460,673,675
Name: 0_7, dtype: float64
STD RESULTS:
ind_tax          -324,621,038
payroll_tax    -2,258,628,355
combined_tax   -2,583,249,393
Name: 0_7, dtype: float64
**** DIFF for year 2027 (year_n=8):
TBI RESULTS:
ind_tax         2,694,995,128
payroll_tax    -2,179,762,550
combined_tax      515,232,577
Name: 0_8, dtype: float64
STD RESULTS:
ind_tax         2,688,980,813
payroll_tax    -2,179,917,092
combined_tax      509,063,721
Name: 0_8, dtype: float64

The numbers in the two cases are identical, suggesting that the drop_hi_mtr logic does not affect the aggregate results at all. I'm confused about why the test would fail.

@GoFroggyRun
Contributor Author

GoFroggyRun commented Feb 5, 2018

After my most recent commit, the tbi_vs_std_behavior error message looks like the following:

**Drop High MTRs in Both Cases**

**** DIFF for year 2026 (year_n=7):
TBI RESULTS:
ind_tax          -204,496,151
payroll_tax    -2,256,177,524
combined_tax   -2,460,673,675
Name: 0_7, dtype: float64
STD RESULTS:
ind_tax          -324,621,038
payroll_tax    -2,258,628,355
combined_tax   -2,583,249,393
Name: 0_7, dtype: float64
**** DIFF for year 2027 (year_n=8):
TBI RESULTS:
ind_tax         2,694,995,128
payroll_tax    -2,179,762,550
combined_tax      515,232,577
Name: 0_8, dtype: float64
STD RESULTS:
ind_tax         2,688,980,813
payroll_tax    -2,179,917,092
combined_tax      509,063,721
Name: 0_8, dtype: float64

**Do NOT Drop at All**

**** DIFF for year 2026 (year_n=7):
TBI RESULTS:
ind_tax           -38,037,872
payroll_tax    -2,249,706,987
combined_tax   -2,287,744,859
Name: 0_7, dtype: float64
STD RESULTS:
ind_tax          -158,162,759
payroll_tax    -2,252,157,818
combined_tax   -2,410,320,577
Name: 0_7, dtype: float64
**** DIFF for year 2027 (year_n=8):
TBI RESULTS:
ind_tax         2,845,938,145
payroll_tax    -2,169,900,603
combined_tax      676,037,542
Name: 0_8, dtype: float64
STD RESULTS:
ind_tax         2,839,923,831
payroll_tax    -2,170,055,145
combined_tax      669,868,686
Name: 0_8, dtype: float64

These results make more sense than those in my previous comment. But I still can't understand why tbi_vs_std_behavior would fail even when drop_hi_mtr=False.

@martinholmer
Collaborator

@GoFroggyRun, as I said ten days ago in this comment and several times before that:

We have a standard procedure to deal with bugs. You should prepare a two-part pull request that does this:

1. Identify bug by creating a new test that fails because of the buggy logic

2. Correct bug by fixing the buggy logic so that the new test now passes and all old tests continue to pass

You need to print this out in a large font, put it by your computer, and read it several times a day until you have internalized this standard procedure.

If the test you have added is supposed to be the test in Step 1 above, then you need to add it first and show it fails because of buggy logic in Tax-Calculator.

Then when you fix the code in Step 2 you need to show the new test passes and all the old tests (including the requires_pufcsv tests) continue to pass. Your "fix" has caused older tests to fail, so it would seem your "fix" is introducing new bugs.

@GoFroggyRun
Contributor Author

GoFroggyRun commented Feb 6, 2018

OK, I think I got it. When I switched to current master and removed the nearone logic (i.e., removed the lines here and defined pch = ((1. - wage_mtr2) / (1. - wage_mtr1)) - 1.), the tbi_vs_std_behavior test fails with the following error message:

**** DIFF for year 2026 (year_n=7):
TBI RESULTS:
ind_tax           -38,037,872
payroll_tax    -2,249,706,987
combined_tax   -2,287,744,859
Name: 0_7, dtype: float64
STD RESULTS:
ind_tax          -158,162,759
payroll_tax    -2,252,157,818
combined_tax   -2,410,320,577
Name: 0_7, dtype: float64
**** DIFF for year 2027 (year_n=8):
TBI RESULTS:
ind_tax         2,845,938,145
payroll_tax    -2,169,900,603
combined_tax      676,037,542
Name: 0_8, dtype: float64
STD RESULTS:
ind_tax         2,839,923,831
payroll_tax    -2,170,055,145
combined_tax      669,868,686
Name: 0_8, dtype: float64

Note that the result here matches the **Do NOT Drop at All** section in my previous comment, meaning that there's no problem with the drop_hi_mtr=False logic. The reason the tbi_vs_std_behavior test didn't fail when it was first introduced is the existence of the nearone logic, which blows up the behavioral response and thus narrows the percentage differences between TBI and Tax-Calculator. That said, I'm not surprised that the tbi_vs_std_behavior test fails in this PR, since it removes the nearone logic.
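To make the blow-up concrete, here is a small numeric sketch (illustrative values, and my reading of the nearone logic as clipping the baseline MTR at 0.999999 before forming the ratio):

```python
nearone = 0.999999
wage_mtr1 = 1.2  # illustrative notch unit with a baseline MTR above one
wage_mtr2 = 0.6  # illustrative reform MTR

# bare formula, as in this PR after the nearone logic is removed
pch_bare = ((1. - wage_mtr2) / (1. - wage_mtr1)) - 1.
print(pch_bare)      # -3.0

# with the baseline MTR clipped at nearone, as on current master
pch_clipped = ((1. - wage_mtr2) / (1. - min(wage_mtr1, nearone))) - 1.
print(pch_clipped)   # roughly 4e5 -- the blown-up response
```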

@martinholmer
Collaborator

@GoFroggyRun proposed adding this code to the Behavior.response method:

if drop_hi_mtr:
    # marginal tax rate > 0.7 is considered to be unreasonably high.
    mtr_cap = 0.7

And then he proceeds to cap all MTRs at 0.70.

Whatever the problems in the Behavior.response method (and, in my view, there are more than what we've been discussing since issue #1668 was raised in November), I don't see this as a sensible solution. We allow Tax-Calculator users to set tax rates as high as 1.0 in each taxable-income bracket, so it is quite likely that a user who, say, wanted to study the behavioral effects of a move from Eisenhower-era income tax rates to present-day tax rates would have a baseline policy in which a relatively large number of filing units had an MTR in excess of 0.70. And it would be among those filing units that the user would expect to see the largest behavioral responses. So this approach is going to generate results that make no sense to that user, and that user would likely file a Tax-Calculator bug report. And that user would be right, in my view.
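To put rough numbers on the concern (approximate statutory top rates, purely illustrative): for a top-bracket filing unit moving from an Eisenhower-era rate to a present-day rate, capping MTRs at 0.70 drastically understates the change in the net-of-tax rate, and hence the behavioral response:

```python
mtr_base = 0.91    # roughly the Eisenhower-era top marginal rate
mtr_reform = 0.37  # roughly the present-day top marginal rate
mtr_cap = 0.7

# uncapped percentage change in the net-of-tax rate for this unit
pch = ((1. - mtr_reform) / (1. - mtr_base)) - 1.                       # ~6.0
# same calculation with the MTRs capped at 0.70
pch_capped = ((1. - mtr_reform) / (1. - min(mtr_base, mtr_cap))) - 1.  # ~1.1
```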

@martinholmer
Collaborator

@GoFroggyRun, it's nice that you fixed, in commit 6543f75, the notebook bug you diagnosed over two weeks ago in issue #1829, but that fix has nothing to do with this pull request, which you've titled: Add logic to allow dropping units with high MTR while behavioral response calculation.

Can you put commit 6543f75 in a separate pull request that resolves #1829?

@martinholmer
Collaborator

Pull request #1856 has been closed (in favor of #1858) for reasons cited in the discussion of #1856 and #1858.
