Change logic in taxcalc/validation/taxsim/ so it works with new TAXSIM-27 #2140

Merged
merged 22 commits into from
Dec 4, 2018
Commits
b64d3cd
Initial version of taxsim/process_tc_output.py
martinholmer Nov 28, 2018
4f26c95
Initial step in validation refactoring
martinholmer Nov 28, 2018
c0e5866
Add taxsim/prepare_tc_input.py
martinholmer Nov 28, 2018
4926b20
Reinstate master-branch version of taxsim/d15.taxdiffs
martinholmer Nov 28, 2018
4e53845
Rename files ?15.taxdiffs as ?15.taxdiffs-expect
martinholmer Nov 28, 2018
aca3694
Update validation/taxsim/test.sh
martinholmer Nov 28, 2018
97dc765
Remove comparison of MTRs in validation/taxsim/taxdiffs.tcl script
martinholmer Nov 28, 2018
bc9ead2
Revise README files to describe shift to using TAXSIM-27
martinholmer Nov 28, 2018
91c0ab4
Replace taxsim_in.tcl with taxsim_input.py
martinholmer Nov 29, 2018
b3029c2
Rename two taxcalc/validation/taxsim/ scripts
martinholmer Nov 29, 2018
568a83a
taxsim_input.py clarifications
martinholmer Nov 29, 2018
9152379
Update validation/taxsim/taxcalc.sh
martinholmer Nov 29, 2018
a775804
Revert pdb settings mistakenly changed in prior commit
martinholmer Nov 29, 2018
17d2d99
Merge branch 'master' into update-validation
martinholmer Nov 30, 2018
0389d93
Remove stray debugging statements
martinholmer Nov 30, 2018
33ced02
Update process_taxcalc_output.py script
martinholmer Dec 1, 2018
542b09b
Add save logic to taxcalc/validation/taxsim/taxcalc.sh
martinholmer Dec 1, 2018
a2ec437
Revise taxsim_input.py logic
martinholmer Dec 1, 2018
f922d45
Complete taxsim_input.py sample generation logic
martinholmer Dec 2, 2018
f7b2895
Add intinc to b samples in taxsim_input.py
martinholmer Dec 3, 2018
61a24bd
Remove zip file containing old TAXSIM 9 output
martinholmer Dec 3, 2018
db481e8
Require at least pandas version 0.23
martinholmer Dec 3, 2018
4 changes: 2 additions & 2 deletions conda.recipe/meta.yaml
@@ -10,15 +10,15 @@ requirements:
  build:
  - python=3.6
  - "numpy>=1.13"
- - "pandas>=0.22"
+ - "pandas>=0.23"
  - "bokeh>=0.13"
  - numba
  - toolz

  run:
  - python=3.6
  - "numpy>=1.13"
- - "pandas>=0.22"
+ - "pandas>=0.23"
  - "bokeh>=0.13"
  - numba
  - toolz
3 changes: 1 addition & 2 deletions environment.yml
@@ -2,14 +2,13 @@ name: taxcalc-dev
  dependencies:
  - python=3.6
  - "numpy>=1.13"
- - "pandas>=0.22"
+ - "pandas>=0.23"
  - "bokeh>=0.13"
  - numba
  - toolz
  - pytest
  - pytest-pep8
  - pytest-xdist
- - mock
  - pycodestyle
  - pylint
  - coverage
1 change: 0 additions & 1 deletion taxcalc/tests/test_4package.py
@@ -34,7 +34,6 @@ def test_for_consistency(tests_path):
  'pytest',
  'pytest-pep8',
  'pytest-xdist',
- 'mock',
  'pycodestyle',
  'pylint',
  'coverage'
41 changes: 11 additions & 30 deletions taxcalc/validation/README.md
@@ -1,7 +1,7 @@
  Validation of Tax-Calculator Logic
  ==================================

- The Tax-Calculator computes federal income and payroll taxes for a
+ Tax-Calculator computes USA federal income and payroll taxes for a
  sample of tax filing units in years beginning with 2013. The Python
  code that performs the tax calculations has been validated in several
  ways. During the course of development Tax-Calculator results for a
@@ -49,39 +49,20 @@ four-step process are provided in a different sub-directory for each
  other model. Here are links to the cross-model validation results
  that are currently available:

- [Internet-TAXSIM](https://github.com/open-source-economics/Tax-Calculator/blob/master/taxcalc/validation/taxsim/README.md#validation-of-tax-calculator-against-internet-taxsim)
+ [Internet TAXSIM version 27](https://github.com/open-source-economics/Tax-Calculator/blob/master/taxcalc/validation/taxsim/README.md#validation-of-tax-calculator-against-internet-taxsim-version-27)

  [...]()


  Details on Using the Validation Tools
  -------------------------------------

- The current version of the validation tools in this directory should
- work on Linux or Mac OS X without any changes and without adding any
- extra software. Those who want to use these validation tools on Windows
- will have to do three things: (a) install an AWK interpreter,
- (b) install a Tcl interpreter, and (c) translate each `tests.sh` bash script
- into a Windows batch file (tests.bat). The Free Software Foundation
- provides a free AWK interpreter for Windows (gawk.exe) and ActiveState
- provides a free Tcl interpreter for Windows (tclsh.exe).
-
- The `taxsim_in.tcl` and `csv_in.py` scripts are used to randomly
- generate INPUT files, which have increasingly longer sets of filing
- unit attributes and contain as many as 100,000 filing units. Read the
- source code of the scripts for additional details on how to use them.
-
- The `taxdiffs.tcl` script calls the `taxdiff.awk` script to compute
- the number of large and small tax differences between two OUTPUT files
- that are formatted like Internet-TAXSIM 28-variable output files. See
- [this link](https://users.nber.org/~taxsim/taxsim-calc9/index.html) for
- details on the space-delimited Internet-TAXSIM output file format.
- All dollar amount differences of one cent or more are reported but
- those differences are divided into small and large differences, where
- small is defined as being ten dollars or less and large being greater
- than ten dollars in absolute value. This small/large borderline is
- arbitrary and has been specified in an attempt to separate out
- differences that arise from repeatedly applying IRS-approved
- rounding-to-the-nearest dollar rules (which Tax-Calculator does not
- implement). Read the source code of the `taxdiffs.tcl` script for
- additional details on how to use it.
+ The current version of the validation tools in this directory tree
+ should work on Linux or Mac OS X without any changes and without
+ adding any extra software. Those who want to use these validation
+ tools on Windows will have to do three things: (a) install an AWK
+ interpreter, (b) install a Tcl interpreter, and (c) translate each
+ `tests.sh` and `test.sh` bash script into a Windows batch file. The
+ Free Software Foundation provides a free AWK interpreter for Windows
+ (`gawk.exe`). ActiveState provides a free Tcl interpreter for Windows
+ (`tclsh.exe`).
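
The older README text in this hunk describes how dollar differences are bucketed: differences of one cent or more are reported, those of ten dollars or less in absolute value count as small, and anything greater counts as large. A minimal Python sketch of that rule follows. The function name `classify_diff` and the rounding to whole cents are assumptions made for illustration; the actual comparison is performed by `taxdiffs.tcl` and `taxdiff.awk`, whose source is not shown in this pull request.

```python
def classify_diff(amount_a, amount_b):
    """
    Classify the dollar difference between two amounts.
    Returns 'none', 'small', or 'large' (hypothetical labels).
    """
    # round to whole cents so floating-point noise is not reported
    abs_diff = round(abs(amount_a - amount_b), 2)
    if abs_diff < 0.01:
        return 'none'   # under one cent: not reported
    if abs_diff <= 10.0:
        return 'small'  # ten dollars or less
    return 'large'      # more than ten dollars


if __name__ == '__main__':
    for pair in [(100.0, 100.0), (100.0, 100.01), (100.0, 110.5)]:
        print(pair, classify_diff(*pair))
```

The ten-dollar borderline is the arbitrary threshold the README mentions, chosen there to separate out differences caused by rounding-to-the-nearest-dollar rules.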
66 changes: 18 additions & 48 deletions taxcalc/validation/taxsim/README.md
@@ -1,54 +1,24 @@
- Validation of Tax-Calculator against Internet-TAXSIM
- ====================================================
+ Validation of Tax-Calculator against Internet TAXSIM version 27
+ ===============================================================

- The cross-model validation process described
+ The general cross-model validation process described
  [here](https://github.com/open-source-economics/Tax-Calculator/blob/master/taxcalc/validation/README.md#validation-of-tax-calculator-logic)
- has used
- [Internet-TAXSIM](https://users.nber.org/~taxsim/taxsim-calc9/index.html)
- to generate step-three results.
-
- We are in the process of comparing Tax-Calculator and Internet-TAXSIM
- results generated from the `a` and `d` assumption sets in the
- `taxsim_in.tcl` script for the year 2015. Each INPUT file is used to
- generate a Tax-Calculator OUTPUT file using the `simtax.py` interface
- to the Tax-Calculator with the `--taxsim2441` option. And each INPUT
- file is used to generate an Internet-TAXSIM OUTPUT file by uploading
- it to the Internet-TAXSIM website using the `56 1` option (in order to
- do the EITC property-income eligibility test exactly without any
- smoothing of property income) and requesting detailed intermediate
- calculations. These two OUTPUT files are compared using the
- `taxdiffs.tcl` script. See the `tests.sh` and `test.sh` scripts in this
- directory for more details.
+ is being executed in this directory using
+ [TAXSIM-27](https://users.nber.org/~taxsim/taxsim27/).
+
+ We are in the process of comparing Tax-Calculator and TAXSIM-27
+ results generated from several assumption sets in the `taxsim_in.py`
+ script for years beginning with 2015. Each INPUT file is used to
+ generate a TAXSIM-27 OUTPUT file by uploading it to the TAXSIM-27
+ website and requesting detailed intermediate calculations. And each
+ INPUT file is translated into a CSV-formatted input file that is read
+ by the Tax-Calculator `tc.py` tool to generate output that is then
+ transformed into an OUTPUT file having the TAXSIM-27 format. Finally,
+ these two OUTPUT files are compared using the `taxdiffs.tcl` script.
+ See the `tests.sh` and `test.sh` scripts in this directory for more
+ details.

  Validation Results
  ------------------

- Here is a summary of the cross-model validation results.
-
- ### 2015 `a` Sample ###
-
- As of 07-Sep-2016, we have compared OUTPUT files for a 2015 `a` sample
- of 100,000 randomly-generated filing units. The payroll tax
- liabilities and marginal payroll tax rates are exactly the same
- (except for nine marginal payroll tax rates for filing units exactly
- at the threshold of paying the Net Investment Income Tax, where the
- marginal rate is not well defined). The intermediate income tax
- results are in close agreement with the largest difference being
- slightly more that a dollar (except for three large differences in AMT
- taxable income, which do not translate into differences in AMT
- liability). The largest difference in total income tax liability is
- one cent in absolute value. And there are no meaningful differences
- in marginal income tax rates. See the `a15.taxdiffs` file for details
- on the differences.
-
- ### 2015 `d` Sample ###
-
- As of 07-Sep-2016, we have compared OUTPUT files for a 2015 `d` sample
- of 100,000 randomly-generated filing units. Each filing unit in the
- `d` sample has additional tax attributes beyond those present in the
- `a` sample, including three kinds of itemized-deduction expenses,
- child care expenses, and other property income. The marginal tax
- rates are essentially the same in the two OUTPUT files, the payroll
- tax liabilities are exactly the same (down to the penny), and the
- federal income tax liabilities are differnt by no more than one cent.
- See the `d15.taxdiffs` file for details on the differences.
+ ...
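
To make the INPUT side of this workflow concrete: the `prepare_taxcalc_input.py` script added in this pull request reads the INPUT file as whitespace-delimited text with 27 numbered variables per filing unit. The sketch below restates that layout for a single line in plain Python. The helper name `parse_taxsim27_line` is hypothetical and is not part of the pull request, which uses `pandas.read_csv` instead.

```python
def parse_taxsim27_line(line):
    """
    Split one whitespace-delimited INPUT line into a dictionary
    keyed by variable number (1 through 27).
    """
    fields = line.split()
    if len(fields) != 27:
        raise ValueError(
            'expected 27 variables, found {}'.format(len(fields))
        )
    # number the variables from 1, as the script's column names do
    return {num: float(val) for num, val in enumerate(fields, start=1)}


if __name__ == '__main__':
    # made-up values purely to exercise the parser
    sample = ' '.join(str(num) for num in range(1, 28))
    record = parse_taxsim27_line(sample)
    print(record[1], record[27])
```

Keying the result by variable number mirrors the `names=range(1, 28)` column labels used in the script, so a field can be looked up by the same number the script uses.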
Binary file removed taxcalc/validation/taxsim/output-taxsim.zip
Binary file not shown.
114 changes: 114 additions & 0 deletions taxcalc/validation/taxsim/prepare_taxcalc_input.py
@@ -0,0 +1,114 @@
"""
Translates TAXSIM-27 input file to Tax-Calculator tc input file.
"""
# CODING-STYLE CHECKS:
# pycodestyle prepare_tc_input.py
# pylint --disable=locally-disabled prepare_tc_input.py

import argparse
import os
import sys
import numpy as np
import pandas as pd


def main():
    """
    High-level logic.
    """
    # parse command-line arguments:
    usage_str = 'python prepare_tc_input.py INPUT OUTPUT [--help]'
    parser = argparse.ArgumentParser(
        prog='',
        usage=usage_str,
        description=('Translates TAXSIM-27 input file into a Tax-Calculator '
                     'CSV-formatted tc input file. '
                     'Any pre-existing OUTPUT file contents are overwritten. '
                     'For details on Internet TAXSIM version 27 INPUT '
                     'format, go to '
                     'https://users.nber.org/~taxsim/taxsim27/'))
    parser.add_argument('INPUT', nargs='?', default='',
                        help=('INPUT is name of file that contains '
                              'TAXSIM-27 input.'))
    parser.add_argument('OUTPUT', nargs='?', default='',
                        help=('OUTPUT is name of file that will contain '
                              'CSV-formatted Tax-Calculator tc input.'))
    args = parser.parse_args()
    # check INPUT filename
    if args.INPUT == '':
        sys.stderr.write('ERROR: must specify INPUT file name\n')
        sys.stderr.write('USAGE: {}\n'.format(usage_str))
        return 1
    if not os.path.isfile(args.INPUT):
        emsg = 'INPUT file named {} does not exist'.format(args.INPUT)
        sys.stderr.write('ERROR: {}\n'.format(emsg))
        return 1
    # check OUTPUT filename
    if args.OUTPUT == '':
        sys.stderr.write('ERROR: must specify OUTPUT file name\n')
        sys.stderr.write('USAGE: {}\n'.format(usage_str))
        return 1
    if os.path.isfile(args.OUTPUT):
        os.remove(args.OUTPUT)
    # read TAXSIM-27 INPUT file into a pandas DataFrame
    ivar = pd.read_csv(args.INPUT, delim_whitespace=True,
                       header=None, names=range(1, 28))
    # translate INPUT variables into OUTPUT variables
    invar = translate(ivar)
    # write OUTPUT file containing Tax-Calculator input variables
    invar.to_csv(args.OUTPUT, index=False)
    # return no-error exit code
    return 0
# end of main function code


def translate(ivar):
    """
    Translate TAXSIM-27 input variables into Tax-Calculator input variables.
    Both ivar and returned invar are pandas DataFrame objects.
    """
    assert isinstance(ivar, pd.DataFrame)
    invar = pd.DataFrame()
    invar['RECID'] = ivar.loc[:, 1]
    invar['FLPDYR'] = ivar.loc[:, 2]
    # no Tax-Calculator use of TAXSIM variable 3, state code
    mstat = ivar.loc[:, 4]
    assert np.all(np.logical_or(mstat == 1, mstat == 2))
    invar['age_head'] = ivar.loc[:, 5]
    invar['age_spouse'] = ivar.loc[:, 6]
    num_deps = ivar.loc[:, 7]
    mars = np.where(mstat == 1, np.where(num_deps > 0, 4, 1), 2)
    assert np.all(np.logical_or(mars == 1,
                                np.logical_or(mars == 2, mars == 4)))
    invar['MARS'] = mars
    invar['f2441'] = ivar.loc[:, 8]
    invar['n24'] = ivar.loc[:, 9]
    num_eitc_qualified_kids = ivar.loc[:, 10]
    invar['EIC'] = np.minimum(num_eitc_qualified_kids, 3)
    num_taxpayers = np.where(mars == 2, 2, 1)
    invar['XTOT'] = num_taxpayers + num_deps
    invar['e00200p'] = ivar.loc[:, 11]
    invar['e00200s'] = ivar.loc[:, 12]
    invar['e00200'] = invar['e00200p'] + invar['e00200s']
    invar['e00650'] = ivar.loc[:, 13]
    invar['e00300'] = ivar.loc[:, 14]
    invar['p22250'] = ivar.loc[:, 15]
    invar['p23250'] = ivar.loc[:, 16]
    nonqualified_dividends = ivar.loc[:, 17]
    invar['e00600'] = invar['e00650'] + nonqualified_dividends
    invar['e00800'] = ivar.loc[:, 18]
    invar['e01700'] = ivar.loc[:, 19]
    invar['e01500'] = invar['e01700']
    invar['e02400'] = ivar.loc[:, 20]
    invar['e02300'] = ivar.loc[:, 21]
    # no Tax-Calculator use of TAXSIM variable 22, non-taxable transfers
    # no Tax-Calculator use of TAXSIM variable 23, rent paid
    invar['e18500'] = ivar.loc[:, 24]
    invar['e18400'] = ivar.loc[:, 25]
    invar['e32800'] = ivar.loc[:, 26]
    invar['e19200'] = ivar.loc[:, 27]
    return invar


if __name__ == '__main__':
    sys.exit(main())
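
The filing-status part of `translate` is the least obvious step, so here is a scalar restatement of it as a minimal sketch. It follows the nested `np.where` logic shown above for `MARS`, `EIC`, and `XTOT`, but works on one filing unit at a time in plain Python. The function name `derive_filing_fields` is hypothetical and does not appear in the pull request.

```python
def derive_filing_fields(mstat, num_deps, num_eitc_kids):
    """
    Derive MARS, EIC, and XTOT for one filing unit from
    TAXSIM-27 variables 4 (mstat), 7 (num_deps), and 10 (num_eitc_kids).
    """
    if mstat not in (1, 2):
        raise ValueError('mstat must be 1 or 2')
    if mstat == 2:
        mars = 2        # mstat of 2 maps to MARS of 2
    elif num_deps > 0:
        mars = 4        # mstat of 1 with dependents maps to MARS of 4
    else:
        mars = 1        # mstat of 1 without dependents maps to MARS of 1
    num_taxpayers = 2 if mars == 2 else 1
    return {
        'MARS': mars,
        'EIC': min(num_eitc_kids, 3),  # capped at three, as in translate
        'XTOT': num_taxpayers + num_deps,
    }


if __name__ == '__main__':
    print(derive_filing_fields(mstat=1, num_deps=2, num_eitc_kids=4))
```

The vectorized version in `translate` applies the same mapping to every row at once, which matters when the INPUT file holds many filing units.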