The Tax-Calculator computes federal income and payroll taxes for a sample of tax filing units in years beginning with 2013. The Python code that performs the tax calculations has been validated in several ways. During the course of development Tax-Calculator results for a number of filing units have been compared to hand calculations performed using IRS tax forms and instructions. In addition, a more systematic program of cross-model validation is part of the ongoing development effort.
The premise behind cross-model validation work is that independently developed tax-simulation models or tax-preparation software are unlikely to contain the same bug, which means looking for differences between the output from two models using the same input is an effective way to locate bugs in tax-calculation logic.
The tools included in this directory support the following validation workflow:
- Generate a random sample of tax filing units (INPUT).
- Generate OUTPUT from INPUT using Tax-Calculator.
- Obtain OUTPUT from INPUT generated by another tax program.
- Generate tax differences by comparing the two OUTPUT files.
The working assumption in our cross-model validation work is that tax differences are more likely than not to be caused by bugs in Tax-Calculator. If exploration of specific differences does confirm a bug, it is corrected and the four-step validation process is repeated until there are no meaningful differences between the two OUTPUT files.
This four-step validation process can be repeated for different-sized INPUT files that vary in the number of input variables used to specify each filing unit's attributes and in the number of filing units included in the INPUT file. A more extensive list of input variables and a larger number of filing units increase the likelihood of finding cross-model differences. In our work, each INPUT file is generated randomly to ensure a wide range of filing unit attributes.
Our goal is to repeat the four-step cross-model validation process described above with more than one other tax program against which to compare Tax-Calculator results. The details and results of the four-step process are provided in a different sub-directory for each other model. Here are links to the cross-model validation results that are currently available:
The current version of the validation tools in this directory should
work on Linux or Mac OS X without any changes and without adding any
extra software. Those who want to use these validation tools on Windows
will have to do three things: (a) install an AWK interpreter,
(b) install a Tcl interpreter, and (c) translate each tests.sh bash
script into a Windows batch file (tests.bat). The Free Software Foundation
provides a free AWK interpreter for Windows (gawk.exe) and ActiveState
provides a free Tcl interpreter for Windows (tclsh.exe).
The taxsim_in.tcl and csv_in.py scripts are used to randomly
generate INPUT files, which have increasingly long sets of filing
unit attributes and contain as many as 100,000 filing units. Read the
source code of the scripts for additional details on how to use them.
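
As a rough illustration of random INPUT generation, the following Python sketch draws a small sample of filing units from a seeded random number generator and serializes it as CSV. It is not the logic of taxsim_in.tcl or csv_in.py: the attribute names, value ranges, function names, and seed are all hypothetical and were chosen only to show the idea.

```python
import csv
import io
import random

# Hypothetical attribute names and ranges, chosen only for illustration;
# they are not the variables used by taxsim_in.tcl or csv_in.py.
ATTRIBUTE_RANGES = {
    "filing_status": (1, 3),
    "num_dependents": (0, 4),
    "wage_income": (0, 300000),
}


def generate_units(num_units, seed):
    """Return a list of dicts, one per randomly generated filing unit."""
    rng = random.Random(seed)  # seeded so the same INPUT can be regenerated
    units = []
    for unit_id in range(1, num_units + 1):
        unit = {"id": unit_id}
        for name, (low, high) in ATTRIBUTE_RANGES.items():
            unit[name] = rng.randint(low, high)  # inclusive on both ends
        units.append(unit)
    return units


def to_csv(units):
    """Serialize a non-empty list of filing units as CSV text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(units[0].keys()))
    writer.writeheader()
    writer.writerows(units)
    return buffer.getvalue()


sample = generate_units(num_units=5, seed=12345)
csv_text = to_csv(sample)
```

Seeding the generator means a given INPUT file can be reproduced exactly, which matters when the same sample has to be run through two different tax programs.
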
The taxdiffs.tcl script calls the taxdiff.awk script to compute
the number of large and small tax differences between two OUTPUT files
that are formatted like Internet-TAXSIM 28-variable output files. See
this link for details on the space-delimited Internet-TAXSIM output
file format.
All dollar amount differences of one cent or more are reported, but
those differences are divided into small and large differences, where
small is defined as ten dollars or less and large as greater than ten
dollars in absolute value. This small/large borderline is arbitrary
and has been specified in an attempt to separate out differences that
arise from repeatedly applying IRS-approved
rounding-to-the-nearest-dollar rules (which Tax-Calculator does not
implement). Read the source code of the taxdiffs.tcl script for
additional details on how to use it.
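
The small/large rule described above can be sketched in Python as follows. This is not a port of taxdiffs.tcl or taxdiff.awk. The function names are hypothetical, and the sketch makes two simplifying assumptions: every space-delimited field is compared as if it were a dollar amount, and both files list the same filing units in the same order. The toy lines in the usage example are made-up values, not real Internet-TAXSIM output.

```python
from decimal import Decimal

REPORT_THRESHOLD = Decimal("0.01")  # differences under one cent are ignored
SMALL_LIMIT = Decimal("10")         # small/large borderline from the text


def classify(amount_a, amount_b):
    """Return 'none', 'small', or 'large' for one pair of amounts."""
    diff = abs(Decimal(amount_a) - Decimal(amount_b))
    if diff < REPORT_THRESHOLD:
        return "none"
    return "small" if diff <= SMALL_LIMIT else "large"


def count_differences(lines_a, lines_b):
    """Count small and large differences across two space-delimited outputs.

    Assumption: every field is treated as a dollar amount and the two
    inputs are aligned line by line and field by field.
    """
    counts = {"small": 0, "large": 0}
    for line_a, line_b in zip(lines_a, lines_b):
        for field_a, field_b in zip(line_a.split(), line_b.split()):
            kind = classify(field_a, field_b)
            if kind != "none":
                counts[kind] += 1
    return counts


# Made-up example: identical leading fields, one small and one large difference.
result = count_differences(["1 2013 500.00 20.00"], ["1 2013 505.00 45.00"])
```

Parsing the amounts with `Decimal` rather than `float` keeps the one-cent and ten-dollar comparisons exact, so a difference of exactly ten dollars is reliably counted as small.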