Incorrect income percentiles #45
Comments
@codykallen computed some AGI values at different points of the AGI distribution and was troubled by his results. I've written a short Python script that tabulates the mean AGI in each AGI percentile.
Here is what this script produces.
These results are close to what @codykallen generated in his notebook, so it would seem that the AGI values are very low. What about Expanded Income?
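The script and its output are not reproduced in this thread. As a rough illustration only, a weighted mean-by-percentile tabulation could look like the sketch below. It is not the script referred to above: the function name, the toy data, and the rule for assigning records to bins (by the midpoint of each record's cumulative weight span) are all assumptions made for the example.

```python
def weighted_percentile_means(values, weights, n_bins=100):
    """Weighted mean of `values` within each weighted quantile bin.

    Hypothetical helper for illustration. A record is assigned to the bin
    that contains the midpoint of its cumulative-weight span.
    """
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    sums = [0.0] * n_bins   # weighted sum of values per bin
    wts = [0.0] * n_bins    # total weight per bin
    cum = 0.0
    for v, w in pairs:
        mid = cum + w / 2.0
        b = min(int(mid / total * n_bins), n_bins - 1)
        sums[b] += v * w
        wts[b] += w
        cum += w
    # Bins that receive no records are reported as None
    return [s / w if w > 0 else None for s, w in zip(sums, wts)]


# Toy data (made up): four tax units with equal weights, split into two bins
agi = [10.0, 20.0, 30.0, 40.0]
wt = [1.0, 1.0, 1.0, 1.0]
halves = weighted_percentile_means(agi, wt, n_bins=2)
print(halves)
```

With real data, `values` would be the AGI column and `weights` the sampling weights, and `n_bins=100` gives one mean per percentile.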
Using both of y'all's code with expanded income, the TC numbers are still low. Here is the income distribution:
And mean expanded income:
Based on some digging into the distribution of the different sources of income and conversations with @codykallen and others, I believe part of the problem might be in stage II of our extrapolation process, where the weights are adjusted to ensure that aggregate totals for things like interest income, dividends, and wages are hit. We currently target the distribution only for wages and salaries. For everything else we target just the aggregate total. Adding more targets for other sources of income might help solve some of the problems. After I finish working on the CPS file I can try adding the additional targets and report back on how they affect the income distributions we're seeing.
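To make the point about aggregate-only targets concrete, here is a toy sketch with made-up numbers. It is not the stage II code, and the variable names are hypothetical. Two weight vectors cover the same number of tax units and hit the same aggregate interest total, yet they put different shares of that income at the top.

```python
# Toy data (made up): three records with different interest income amounts
interest = [100.0, 1_000.0, 10_000.0]

# Two hypothetical weight vectors; both sum to 100 tax units
weights_a = [50.0, 40.0, 10.0]
weights_b = [70.0, 18.0, 12.0]


def aggregate(values, weights):
    """Weighted aggregate total."""
    return sum(v * w for v, w in zip(values, weights))


def top_record_share(values, weights):
    """Share of the aggregate held by the highest-value record."""
    top = max(range(len(values)), key=lambda i: values[i])
    return values[top] * weights[top] / aggregate(values, weights)


total_a = aggregate(interest, weights_a)
total_b = aggregate(interest, weights_b)
share_a = top_record_share(interest, weights_a)
share_b = top_record_share(interest, weights_b)

print(total_a, total_b)   # both weightings hit the same aggregate target
print(share_a, share_b)   # but the distribution of that income differs
```

An aggregate target alone cannot distinguish between these two weightings, which is why distributional targets by income source could matter.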
@andersonfrailey said:
I'm in full agreement that adding the additional targets will be useful and might improve this situation. But I'm not sure yet that our current distributional accuracy is as bad as the above analysis might make it appear.

First, note that both of @codykallen's comps appear to be Census Bureau data, since the Census Bureau is the source of the TPC tables. If we compare TD's AGI in 2014 against SOI AGI data for 2014 (the latest year for which comparable administrative data is available), the numbers look a little more reassuring: eyeballing the "accumulated" section of SOI Table 1.1, median AGI is around $35k. That is still higher than the $28k that @codykallen generates from TD, but keep in mind that SOI's $35k includes only tax filers, while @codykallen's script appears to also include non-filers, so we should expect his number to be lower.

I think fruitful next steps would be to (1) look at TC/TD's median without non-filers and compare it to the $35k, and (2) dig into the differences between Census's income measure and AGI.

As for (1), any remaining discrepancy between SOI AGI and TC/TD AGI without non-filers would hopefully be narrowed by @andersonfrailey's upcoming work on additional stage 2 targets. Perhaps the primary value of that stage 2 work is that our distribution of income items like capital gains, dividends, and interest will be more accurate, so we could more accurately estimate revenue for reforms that target those income items. Those improvements won't necessarily be apparent in these aggregate AGI statistics, though.

As for (2), it could be that Census includes income that is excluded from AGI, like transfer/welfare payments and employer-provided health benefits, as well as the other items that are included in our expanded income measure but not in AGI. If that turns out to be the case, then the resulting differences are food for thought regarding our tab variable for distributional tables (as @codykallen notes) rather than relevant to our ability to do revenue analysis. They won't be improved through stage 2 extrapolation, but rather by imputing major excluded income items. For background on imputations that would be helpful, see #35 for a list of open projects for expanding TD/TC's expanded income measure.
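Next step (1) amounts to computing a weighted median twice, once over all tax units and once over filers only. A minimal sketch with made-up records follows. The field names `agi`, `weight`, and `filer` are hypothetical and chosen for the example, and the quantile rule used (the first value whose cumulative weight reaches the target) is just one of several possible conventions.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches q times total weight."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= q * total:
            return v
    return pairs[-1][0]


# Toy records (made up): two non-filers with low AGI, three filers
records = [
    {"agi": 0.0, "weight": 2.0, "filer": False},
    {"agi": 5_000.0, "weight": 2.0, "filer": False},
    {"agi": 30_000.0, "weight": 2.0, "filer": True},
    {"agi": 40_000.0, "weight": 2.0, "filer": True},
    {"agi": 90_000.0, "weight": 2.0, "filer": True},
]

median_all = weighted_quantile(
    [r["agi"] for r in records], [r["weight"] for r in records], 0.5
)
filers = [r for r in records if r["filer"]]
median_filers = weighted_quantile(
    [r["agi"] for r in filers], [r["weight"] for r in filers], 0.5
)

print(median_all, median_filers)
```

In this toy data, dropping the low-income non-filers raises the median, which is the direction of the effect described above. How large the effect is in TC/TD can only be settled by running it on the actual file.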
This comment on taxdata issue #45 follows up some suggestions made by @MattHJensen. Here we tabulate the
Now the script.
@codykallen, is taxdata issue #45, which you raised in Nov 2016, still unresolved from your point of view?
@martinholmer, I believe this can be considered complete. Closing now.
Thanks @codykallen
The distribution of income quintiles in the PUF appears incorrect.
Using the Tax-Calculator, I found the following values for median tax unit income. The numbers in parentheses are the nominal median household incomes from the Census Bureau.
2013: $27,250 ($52,250)
2014: $28,022 ($53,657)
2015: $28,877 ($55,775)
The numbers calculated with TC are more consistent with median individual incomes, but AGI includes income from the primary and secondary earner. I think the only major discrepancy should come from married couples filing separately.
The Tax Policy Center also has some percentile distributions. Their numbers are in parentheses next to those estimated using TC (for 2013).
20th: $5,894 ($21,000)
40th: $18,537 ($41,035)
60th: $38,659 ($67,200)
80th: $77,975 ($110,232)
If our percentile distributions are too far off, we can't do a distributional analysis of a tax plan.
Link to my workbook
cc @martinholmer @MattHJensen @Amy-Xu @andersonfrailey