
Add logic to allow dropping units with high MTR while behavioral response calculation #1856

Closed
wants to merge 5 commits

Conversation

GoFroggyRun
Contributor

GoFroggyRun commented Feb 5, 2018

This PR implements @feenberg's suggestion in #1852 by adding optional logic to ignore units with high MTRs (above mtr_cap = 0.7, as discussed in #1852) while calculating behavioral responses.
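Roughly, the added logic amounts to masking out those units when forming the percentage change in the net-of-tax rate. Here is a minimal sketch (illustrative arrays and simplified variable names, not the literal diff):

```python
import numpy as np

# Illustrative baseline and reform wage MTRs (not PUF data).
wage_mtr1 = np.array([0.25, 0.45, 0.80, 1.20])
wage_mtr2 = np.array([0.22, 0.40, 0.75, 1.10])

# percentage change in the net-of-tax rate, with no nearone clipping
pch = (1.0 - wage_mtr2) / (1.0 - wage_mtr1) - 1.0

# ignore (zero out) the response of units whose baseline MTR exceeds the cap
mtr_cap = 0.7  # MTRs above this are treated as unreasonably high
pch = np.where(wage_mtr1 > mtr_cap, 0.0, pch)
```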

The added logic replaces the nearone logic, which can amplify notches when a unit's MTR exceeds nearone = 0.999999 (more details can be found here). A test, test_drop_hi_mtr, is added to test_behavior.py to make sure that the error introduced by ignoring those high-MTR units is reasonably small. The test passes on my local machine:

(taxcalc-dev) Sean-Wangs-MacBook-Pro:taxcalc seanwang$ py.test -m test_drop_hi_mtr
===================================================== test session starts =====================================================
platform darwin -- Python 2.7.14, pytest-3.3.0, py-1.5.2, pluggy-0.6.0
rootdir: /Users/seanwang/Documents/GIT/tax-calculator, inifile: setup.cfg
plugins: xdist-1.20.1, forked-0.2, hypothesis-3.38.5, pep8-1.0.6
collected 439 items                                                                                                           

tests/test_behavior.py .                                                                                                [100%]

==================================================== 438 tests deselected =====================================================
========================================== 1 passed, 438 deselected in 22.47 seconds ==========================================

However, I have encountered one test failure: test_behavioral_response in test_tbi.py fails. The error message reads:

(taxcalc-dev) Sean-Wangs-MacBook-Pro:taxcalc seanwang$ py.test -n4
===================================================== test session starts =====================================================
platform darwin -- Python 2.7.14, pytest-3.3.0, py-1.5.2, pluggy-0.6.0
rootdir: /Users/seanwang/Documents/GIT/tax-calculator, inifile: setup.cfg
plugins: xdist-1.20.1, forked-0.2, hypothesis-3.38.5, pep8-1.0.6
gw0 [438] / gw1 [438] / gw2 [438] / gw3 [438]
scheduling tests via LoadScheduling
....................................................................................................................... [ 27%]
....................................................................................................................... [ 54%]
....................................................................................................................... [ 81%]
..............................................................F..................                                       [100%]
========================================================== FAILURES ===========================================================
...
...
...
**** DIFF for year 2026 (year_n=7):
TBI RESULTS:
ind_tax          -204,496,151
payroll_tax    -2,256,177,524
combined_tax   -2,460,673,675
Name: 0_7, dtype: float64
STD RESULTS:
ind_tax          -324,621,038
payroll_tax    -2,258,628,355
combined_tax   -2,583,249,393
Name: 0_7, dtype: float64
**** DIFF for year 2027 (year_n=8):
TBI RESULTS:
ind_tax         2,694,995,128
payroll_tax    -2,179,762,550
combined_tax      515,232,577
Name: 0_8, dtype: float64
STD RESULTS:
ind_tax         2,688,980,813
payroll_tax    -2,179,917,092
combined_tax      509,063,721
Name: 0_8, dtype: float64
...

I'm not sure why this particular test won't pass. Is a 2e-3 percent tolerance too small to account for the effect of the dropQ logic?
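For reference, taking the 2026 combined_tax numbers from the output above and computing a simple relative difference (my guess at the kind of check the test makes, not necessarily how it is actually written) gives a gap far larger than 2e-3 percent:

```python
# 2026 combined_tax diffs from the failing-test output above
tbi_diff = -2460673675.0
std_diff = -2583249393.0
reldiff = abs(tbi_diff - std_diff) / abs(std_diff)
print(reldiff)  # about 0.047, i.e. roughly 4.7 percent
```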

cc @martinholmer @hdoupe

@codecov-io

codecov-io commented Feb 5, 2018

Codecov Report

Merging #1856 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #1856   +/-   ##
======================================
  Coverage     100%    100%           
======================================
  Files          37      37           
  Lines        3168    3171    +3     
======================================
+ Hits         3168    3171    +3
Impacted Files Coverage Δ
taxcalc/behavior.py 100% <100%> (ø) ⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@feenberg
Contributor

feenberg commented Feb 5, 2018 via email

@GoFroggyRun
Contributor Author

@feenberg I'm comparing the logic that ignores high MTRs against the standard logic that doesn't involve nearone, and this particular test passes without any problem. The one that fails has something to do with the dropQ logic.

@GoFroggyRun
Contributor Author

GoFroggyRun commented Feb 5, 2018

The test failure for tbi_vs_std_behavior is strange. I did the following:

I used the drop_hi_mtr logic for both TBI and STD by adding drop_hi_mtr=False and drop_hi_mtr=True, respectively, here and here:

**Drop High MTRs in Both Cases**

**** DIFF for year 2026 (year_n=7):
TBI RESULTS:
ind_tax          -204,496,151
payroll_tax    -2,256,177,524
combined_tax   -2,460,673,675
Name: 0_7, dtype: float64
STD RESULTS:
ind_tax          -324,621,038
payroll_tax    -2,258,628,355
combined_tax   -2,583,249,393
Name: 0_7, dtype: float64
**** DIFF for year 2027 (year_n=8):
TBI RESULTS:
ind_tax         2,694,995,128
payroll_tax    -2,179,762,550
combined_tax      515,232,577
Name: 0_8, dtype: float64
STD RESULTS:
ind_tax         2,688,980,813
payroll_tax    -2,179,917,092
combined_tax      509,063,721
Name: 0_8, dtype: float64

**Do NOT Drop at All**

**** DIFF for year 2026 (year_n=7):
TBI RESULTS:
ind_tax          -204,496,151
payroll_tax    -2,256,177,524
combined_tax   -2,460,673,675
Name: 0_7, dtype: float64
STD RESULTS:
ind_tax          -324,621,038
payroll_tax    -2,258,628,355
combined_tax   -2,583,249,393
Name: 0_7, dtype: float64
**** DIFF for year 2027 (year_n=8):
TBI RESULTS:
ind_tax         2,694,995,128
payroll_tax    -2,179,762,550
combined_tax      515,232,577
Name: 0_8, dtype: float64
STD RESULTS:
ind_tax         2,688,980,813
payroll_tax    -2,179,917,092
combined_tax      509,063,721
Name: 0_8, dtype: float64

The numbers in the two cases are identical, suggesting that the drop_hi_mtr logic does not affect the aggregate results at all. I'm confused about why the test would fail.

@GoFroggyRun
Contributor Author

GoFroggyRun commented Feb 5, 2018

After my most recent commit, the tbi_vs_std_behavior error message looks like the following:

**Drop High MTRs in Both Cases**

**** DIFF for year 2026 (year_n=7):
TBI RESULTS:
ind_tax          -204,496,151
payroll_tax    -2,256,177,524
combined_tax   -2,460,673,675
Name: 0_7, dtype: float64
STD RESULTS:
ind_tax          -324,621,038
payroll_tax    -2,258,628,355
combined_tax   -2,583,249,393
Name: 0_7, dtype: float64
**** DIFF for year 2027 (year_n=8):
TBI RESULTS:
ind_tax         2,694,995,128
payroll_tax    -2,179,762,550
combined_tax      515,232,577
Name: 0_8, dtype: float64
STD RESULTS:
ind_tax         2,688,980,813
payroll_tax    -2,179,917,092
combined_tax      509,063,721
Name: 0_8, dtype: float64

**Do NOT Drop at All**

**** DIFF for year 2026 (year_n=7):
TBI RESULTS:
ind_tax           -38,037,872
payroll_tax    -2,249,706,987
combined_tax   -2,287,744,859
Name: 0_7, dtype: float64
STD RESULTS:
ind_tax          -158,162,759
payroll_tax    -2,252,157,818
combined_tax   -2,410,320,577
Name: 0_7, dtype: float64
**** DIFF for year 2027 (year_n=8):
TBI RESULTS:
ind_tax         2,845,938,145
payroll_tax    -2,169,900,603
combined_tax      676,037,542
Name: 0_8, dtype: float64
STD RESULTS:
ind_tax         2,839,923,831
payroll_tax    -2,170,055,145
combined_tax      669,868,686
Name: 0_8, dtype: float64

These results make more sense than those in my previous comment. But I still can't understand why tbi_vs_std_behavior would fail even when drop_hi_mtr=False.

@martinholmer
Collaborator

@GoFroggyRun, as I said ten days ago in this comment and several times before that:

We have a standard procedure to deal with bugs. You should prepare a two-part pull request that does this:

1. Identify bug by creating a new test that fails because of the buggy logic

2. Correct bug by fixing the buggy logic so that the new test now passes and all old tests continue to pass

You need to print this out in a large font, put it by your computer, and read it several times a day until you have internalized this standard procedure.

If the test you have added is supposed to be the test in Step 1 above, then you need to add it first and show it fails because of buggy logic in Tax-Calculator.

Then when you fix the code in Step 2 you need to show the new test passes and all the old tests (including the requires_pufcsv tests) continue to pass. Your "fix" has caused older tests to fail, so it would seem your "fix" is introducing new bugs.

@GoFroggyRun
Contributor Author

GoFroggyRun commented Feb 6, 2018

OK, I think I got it. When I switched to current master and removed the nearone logic (i.e., removed the lines here and defined pch = ((1. - wage_mtr2) / (1. - wage_mtr1)) - 1.), the tbi_vs_std_behavior test fails with the following error message:

**** DIFF for year 2026 (year_n=7):
TBI RESULTS:
ind_tax           -38,037,872
payroll_tax    -2,249,706,987
combined_tax   -2,287,744,859
Name: 0_7, dtype: float64
STD RESULTS:
ind_tax          -158,162,759
payroll_tax    -2,252,157,818
combined_tax   -2,410,320,577
Name: 0_7, dtype: float64
**** DIFF for year 2027 (year_n=8):
TBI RESULTS:
ind_tax         2,845,938,145
payroll_tax    -2,169,900,603
combined_tax      676,037,542
Name: 0_8, dtype: float64
STD RESULTS:
ind_tax         2,839,923,831
payroll_tax    -2,170,055,145
combined_tax      669,868,686
Name: 0_8, dtype: float64

Note that the result here matches the **Do NOT Drop at All** section in my previous comment, meaning that there's no problem with the drop_hi_mtr=False logic. The reason the tbi_vs_std_behavior test didn't fail when it was first introduced is the existence of the nearone logic, which blows up the behavioral response and thus narrows the percentage differences between TBI and Tax-Calculator. That said, I'm not surprised that the tbi_vs_std_behavior test fails in this PR, since it removes the nearone logic.
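To make the blow-up concrete, here is a small numeric sketch (illustrative values, and my reading of the nearone logic as clipping the baseline MTR at 0.999999 before forming the ratio):

```python
nearone = 0.999999
wage_mtr1 = 1.2  # illustrative notch unit with a baseline MTR above one
wage_mtr2 = 0.6  # illustrative reform MTR

# bare formula, as in this PR after the nearone logic is removed
pch_bare = ((1. - wage_mtr2) / (1. - wage_mtr1)) - 1.
print(pch_bare)      # -3.0

# with the baseline MTR clipped at nearone, as on current master
pch_clipped = ((1. - wage_mtr2) / (1. - min(wage_mtr1, nearone))) - 1.
print(pch_clipped)   # roughly 4e5 -- the blown-up response
```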

@martinholmer
Collaborator

@GoFroggyRun proposed adding this code to the Behavior.response method:

if drop_hi_mtr:
    # marginal tax rate > 0.7 is considered to be unreasonably high.
    mtr_cap = 0.7

And then he proceeds to cap all MTRs at 0.70.

Whatever the problems in the Behavior.response method (and, in my view, there are more than what we've been discussing since issue #1668 was raised in November), I don't see this as a sensible solution. We allow Tax-Calculator users to set tax rates as high as 1.0 in each taxable-income bracket, so it is quite likely that a user who, say, wanted to study the behavioral effects of a move from Eisenhower-era income tax rates to present-day tax rates would have a baseline policy in which a relatively large number of filing units had an MTR in excess of 0.70. And it would be among those filing units that the user would expect to see the largest behavioral responses. So this approach is going to generate results that make no sense to that user, and that user would likely file a Tax-Calculator bug report. And that user would be right, in my view.
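To put rough numbers on the concern (approximate statutory top rates, purely illustrative): for a top-bracket filing unit moving from an Eisenhower-era rate to a present-day rate, capping MTRs at 0.70 drastically understates the change in the net-of-tax rate, and hence the behavioral response:

```python
mtr_base = 0.91    # roughly the Eisenhower-era top marginal rate
mtr_reform = 0.37  # roughly the present-day top marginal rate
mtr_cap = 0.7

# uncapped percentage change in the net-of-tax rate for this unit
pch = ((1. - mtr_reform) / (1. - mtr_base)) - 1.                       # ~6.0
# same calculation with the MTRs capped at 0.70
pch_capped = ((1. - mtr_reform) / (1. - min(mtr_base, mtr_cap))) - 1.  # ~1.1
```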

@martinholmer
Collaborator

@GoFroggyRun, it's nice that you fixed, in commit 6543f75, the notebook bug you diagnosed over two weeks ago in issue #1829, but that fix has nothing to do with this pull request, which you've titled: Add logic to allow dropping units with high MTR while behavioral response calculation.

Can you put commit 6543f75 in a separate pull request that resolves #1829?

@martinholmer
Collaborator

Pull request #1856 has been closed (in favor of #1858) for reasons cited in the discussion of #1856 and #1858.
