How should distribution/difference tables handle those with negative income? #1888
Originally in taxdata issue 143 @codykallen said this: @MaxGhenis mentioned:
Tax Policy Center omits those with negative income from distributional analyses but includes them in totals. From the footnotes to their distributional tables:
JCT excludes such taxpayers as well. From the footnotes to their distributional tables:
Tax Foundation's approach is not so clear in their publications, but a footnote from a 2009 TF working paper says,
CBO does this too. According to their report, "The Distribution of Household Income and Federal Taxes, 2013" (published August 2016):
|
@codykallen, Thank you for correcting my misunderstanding of how other tax analysis groups handle filing units with negative income in their distributional tables. But the quotes you provide from those groups' publications raise, in my mind, more questions than they answer. What exactly does it mean to "drop" filing units with negative income from the distributional table? It makes no sense (to me) to drop them before constructing the quintiles or deciles. If you do that, you are arbitrarily shifting every filing unit's location in the quintile/decile distribution. The units with negative income are a fact of life and, it seems to me, should be placed in the lowest income groups. Do the other groups actually drop those with negative income before constructing the quintiles/deciles? If so, what's the rationale for doing that? Or perhaps the quotes mean that negative-income units are dropped in the calculation of the quintile/decile statistic (for example, the percentage change in after-tax expanded income statistic that started this whole discussion in @MaxGhenis' issue #1806). Is that what the other tax analysis groups do? If that is what they do, what rationale do they provide for doing that? And the biggest question in my mind is about the practice of somehow "dropping" negative-income filing units from the quintile/decile statistics but yet including them in the whole-sample statistics. If including negative-income units in quintile/decile statistics is somehow undesirable, why are those with negative incomes included in the whole-sample statistics? This strikes me as inconsistent logic. |
@martinholmer asked several questions:
From what they've disclosed, this isn't entirely clear. It appears that they may include these people when determining the quintiles or deciles but exclude them when calculating changes in their tax liabilities or after-tax incomes.
It appears that they probably keep them when determining the cutoffs for each quintile or decile, but drop them when calculating totals or averages within each quintile or decile. However, if they do drop these individuals entirely before running the calculations, it is presumably because these individuals' incomes are mismeasured. As an example (sorry for the politics), Donald Trump wrote off a $916 million loss in 1995, which he could carry forward for the next 18 years to offset any positive income (thus having zero or nearly zero expanded income over the next 18 years). If he were included in our sample, he would be counted in the bottom decile. In other words, people who write off large losses to have zero or negative expanded income are not actually poor or low-income and do not belong in the bottom decile. This is also why they could be included in the aggregate totals but not in the distributional analysis: they still count as tax filers, but they are not low-income and should not be included in the bottom decile. As you've noted, the statements I quoted are less than clear. But none of their models are open source, and few disclose any details beyond occasional footnotes. |
@codykallen, Thanks for your thoughts on the vexing problem of handling filing units with negative income. While I'm sympathetic to your characterization of this problem as one of mismeasured income, I don't think we will ever be able to derive, from the data we are using in Tax-Calculator, a credible present-value of lifetime (past and future) income statistic that we could use to assign filing units to lifetime income quintiles or deciles or dollar bins. That would be conceptually sensible, but I don't see it as a practical possibility. And remember that this mismeasurement problem is widespread. Consider the elderly couple whose only income is their social security benefits but who have ten million dollars in an IRA invested in tax-free bonds. In our data, they are going to be placed in an income percentile that is way below their "true" lifetime income. Or consider someone who experienced a long spell of unemployment; that person's annual income in our data is also well below the person's "true" lifetime income. I don't see how we will ever be able to sort filing units by their "true" lifetime income. So, then the question becomes what to do with the annual tax-related income data we do have. One way to think about this problem is that we don't want those with negative income distorting subgroup statistics like the percentage change in after-tax expanded income. But if a tax analyst wanted to look at the distribution of the dollar change in after-tax expanded income across subgroups, there would be no problem in showing that statistic for each filing unit (because there's no dividing by a non-positive income to get the percentage change). This is why I'm reluctant to drop filing units with negative income. The negative incomes don't cause a problem when you don't use them as a divisor. So, what do you think about the following approach to calculating the percentage change in after-tax expanded-income statistic by income subgroups?
Instead of deciles or quintiles, we could compute this statistic for each baseline expanded-income percentile (that is, 100 equal-sized subgroups), but show in the table or graph only the percentiles that contain no filing units with negative expanded income in the baseline. This is exactly what TaxBrain does with the dollar income bins. Tax-Calculator computes statistics for all the bins (including the lowest bin containing those with negative expanded income), but TaxBrain does not show that bottom bin. (We are in the process of fixing the labeling of these TaxBrain tables as discussed in #1889.) Also, this is the approach taken by the average tax rate (ATR) graph generated by Tax-Calculator. Compare the MTR graph, which plots all of the percentiles, with the ATR graph, which does not plot a few of the low percentiles. [Graphs not preserved.] |
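A minimal sketch of the percentile idea above, assuming NumPy arrays of baseline income, baseline and reform after-tax income, and sampling weights. The names `weighted_groups` and `pct_change_by_percentile` are hypothetical, not Tax-Calculator functions. Groups are formed from all units; any group containing a unit with nonpositive baseline income is left as NaN so it would simply not be plotted (zeros are treated like negatives here, following the later "[or zeros]" remark in the thread).

```python
import numpy as np


def weighted_groups(income, weights, n_groups):
    """Assign each unit to one of n_groups equal-weight groups by income rank (0 = lowest)."""
    order = np.argsort(income, kind="stable")
    w_sorted = weights[order]
    # Cumulative weight share at each unit's midpoint, so boundary units are not pushed up a group.
    mid_share = (np.cumsum(w_sorted) - 0.5 * w_sorted) / w_sorted.sum()
    groups = np.empty(len(income), dtype=int)
    groups[order] = np.minimum(np.floor(mid_share * n_groups).astype(int), n_groups - 1)
    return groups


def pct_change_by_percentile(base_income, base_aftertax, reform_aftertax, weights, n_groups=100):
    """Percent change in weighted after-tax income per group; NaN for any group
    that contains a unit with nonpositive baseline income."""
    groups = weighted_groups(base_income, weights, n_groups)
    result = np.full(n_groups, np.nan)
    for g in range(n_groups):
        in_g = groups == g
        if not in_g.any() or (base_income[in_g] <= 0).any():
            continue  # leave NaN so this group is not shown
        base = np.sum(weights[in_g] * base_aftertax[in_g])
        reform = np.sum(weights[in_g] * reform_aftertax[in_g])
        result[g] = 100.0 * (reform - base) / base
    return result


# Toy data: 8 equally weighted units split into 4 groups of 2.
income = np.array([-100.0, 5, 10, 20, 30, 40, 50, 60])
pct = pct_change_by_percentile(income, income, income + 10, np.ones(8), n_groups=4)
print(pct)  # first entry is NaN because the bottom group holds the -100 unit
```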
@MaxGhenis, we are interested in your thoughts on the approach described in this comment. |
The lifetime income question is an interesting one, and I agree that percentile plots can address some of the problem and are more useful in some circumstances, negative income aside. For example, UBI reforms have a more significant relative effect for, say, the bottom 5%. But to be clear, @martinholmer, are you suggesting removing all equal-frequency binning aside from percentiles? For better or worse, deciles and quintiles are commonly reported by other tax analysis groups and in the media, so I don't think this is a tenable position. They give a single number or small set of numbers that can be easily absorbed. Given percentiles only, one could report the average change over the percentiles in the bottom decile, but then we're basically back where we started. The approach that makes most sense to me is including all tax units when defining quantiles, and then dropping negatives from calculations involving such quantiles. @martinholmer, minor question on … We may also want to think about zeros, so as to avoid infinite percentage changes. This isn't an issue now, with only 0.75% having zero income given benefits, but one could imagine comparisons of chained reforms that introduce this problem. |
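The "include all tax units when defining quantiles, then drop negatives from the calculations" approach can be sketched as follows. This is an illustration under assumed inputs (NumPy arrays and weights); `quantile_pct_change_dropping_negatives` is a hypothetical name, not part of Tax-Calculator. Zero-income units are kept in the calculation here.

```python
import numpy as np


def quantile_pct_change_dropping_negatives(base_income, base_aftertax, reform_aftertax,
                                           weights, n_groups=10):
    # Step 1: rank ALL units, negatives included, to set quantile membership.
    order = np.argsort(base_income, kind="stable")
    w_sorted = weights[order]
    mid_share = (np.cumsum(w_sorted) - 0.5 * w_sorted) / w_sorted.sum()
    groups = np.empty(len(base_income), dtype=int)
    groups[order] = np.minimum(np.floor(mid_share * n_groups).astype(int), n_groups - 1)
    # Step 2: drop negative-income units only from the per-quantile statistic.
    keep = base_income >= 0
    pct = np.full(n_groups, np.nan)
    for g in range(n_groups):
        sel = (groups == g) & keep
        base = np.sum(weights[sel] * base_aftertax[sel])
        if base > 0:
            reform = np.sum(weights[sel] * reform_aftertax[sel])
            pct[g] = 100.0 * (reform - base) / base
    return groups, pct


income = np.array([-100.0, 5, 10, 20, 30, 40, 50, 60])
groups, pct = quantile_pct_change_dropping_negatives(income, income, income + 10,
                                                     np.ones(8), n_groups=4)
print(groups)  # [0 0 1 1 2 2 3 3]: the negative unit still occupies a bottom-quartile slot
print(pct[0])  # 200.0: computed from the $5 unit alone
```

The bottom quantile keeps its "true quantile" boundaries, at the cost of its statistic resting on fewer units than the other quantiles.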
@MaxGhenis asked:
No, I don't think I ever suggested that. In fact, the quintile and decile graphs of percentage change in income are still part of the Tax-Calculator library, and so are the decile distribution and difference tables. The documentation for those graphs and tables has been revised to say that they include filing units with negative and zero income. |
@MaxGhenis said:
Fine. Because Tax-Calculator is an open-source project, you have the complete freedom to do that. |
@MaxGhenis asked:
A percentile with "any negatives [or zeros]" is not shown in the graph. |
OK, could you clarify the approach you're suggesting or asking for feedback on? If it's only the offering of the new
Re:
The logical extension of this would be not showing the bottom decile and quintile, since they include any negatives. Would it be confusing to have different logic for reporting percentiles vs. quintiles and deciles? If a goal is consistency across reporting of quantiles, I think there are four options:
|
On Wed, 21 Feb 2018, Martin Holmer wrote:
@MaxGhenis said:
The approach that makes most sense to me is including all tax
units when defining quantiles, and then dropping negatives from
calculations involving such quintiles.
Fine. Because Tax-Calculator is an open-source project, you have the
complete freedom to do that.
But it does seem like the sort of improvement that should be welcomed by the
maintainers, otherwise it could stimulate a fork. Mixing negative income
taxpayers (with loss carryforwards) with other low income taxpayers
doesn't produce an informative distribution table.
dan feenberg
|
@MaxGhenis and @feenberg, Thanks for your comments in issue #1888. But as the discussion leading up to @codykallen's comment suggests, there is no "consistency" or clarity in what other tax analysis groups do with filing units with negative income. And, Dan, can you explain where we should place "taxpayers with loss carryforwards" in the income distribution? And are you sure that "taxpayers with loss carryforwards" are the only filing units with negative income? |
@feenberg said in issue #1888:
Dan, you're focusing on only one group of filing units with negative income. What about a single individual who is running a struggling small business and reports a negative Schedule C net income? Isn't it plausible to think this person is part of the group you call "other low income taxpayers"? Why should we ignore this individual? But more broadly, I don't understand why we are in this situation many years into the project. If you feel so strongly about this issue, why didn't you raise this issue at the very beginning of the project when the distribution and difference tables were first introduced into the Tax-Calculator library? |
Very good discussion of the substance in this issue. It is welcome to be able to pay more attention to implementation details like these as the project matures and contributors and users have the leisure to zoom in. |
On Wed, 21 Feb 2018, Martin Holmer wrote:
@MaxGhenis and @feenberg, Thanks for your comments in issue
But as the discussion leading up to @codykallen's comment suggests, there is no "consistency" or
clarity in what other tax analysis groups do with filing units with negative income.
And, Dan, can you explain where we should place "taxpayers with loss carryforwards" in the income
distribution? And are you sure that "taxpayers with loss carryforwards" are the only filing units
with negative income?
I would exclude taxpayers with loss carryforwards or exclude losses from
income. One or the other is highly desirable. Do you want to put Trump's
800 million dollar loss in the table? Doesn't that distort the resources
available to the bottom quintile?
I would be interested in the source of losses in our data. Do we cap
capital losses at $3,000?
dan
|
@feenberg asked:
Pre-TCJA, the only losses capped were investment losses, if the sum of short-term capital gains ( |
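As background to @feenberg's question about the $3,000 cap, here is a minimal sketch of the statutory limit on deducting a net capital loss against other income ($3,000, or $1,500 for married filing separately), with the excess carried forward. It is an illustration only and makes no claim about how taxdata or Tax-Calculator actually implement the limit; `allowed_capital_loss` is a hypothetical name, and the short-term/long-term character of the carryforward is ignored.

```python
def allowed_capital_loss(net_capital_gain, married_filing_separately=False):
    """Split a net capital result into (amount entering income, loss carried forward).

    Sketch of the limit on deducting net capital losses against other income:
    $3,000, or $1,500 for married filing separately.
    """
    limit = 1_500.0 if married_filing_separately else 3_000.0
    if net_capital_gain >= 0:
        return float(net_capital_gain), 0.0
    allowed = max(net_capital_gain, -limit)
    # The portion of the loss not allowed this year is carried forward.
    return float(allowed), float(allowed - net_capital_gain)


print(allowed_capital_loss(-10_000))  # (-3000.0, 7000.0)
```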
@MaxGhenis said:
Pull request #1890 now implements your option 4. The new default behavior of the |
Will this also be the default view in TaxBrain? This seems severe to me. To capture the several strands of this discussion, I put the latest info in this document, to which all have edit access. This includes info on what other analysis groups do, what leads to negatives, their impact on CPS data, and options for Tax-Calculator. I also added two options in addition to my initial four:
Leading to this overall table: I also realized that the table with shares of tax units above is mistakenly labeled as after-tax income, instead of expanded income. Since after-tax income is also relevant, I added that info to the table (not much different). This also corrects the number with zero income, and makes the top two rows shares of the total instead of the bottom decile, since we're now concerned with various quantiles. Feel free to edit, comment, or suggest in the doc. I'll also keep it updated for reference to reflect discussion in this issue. |
@MaxGhenis asked in issue #1888:
I have no idea; as of now TaxBrain shows no graphs generated by Tax-Calculator. If it seems "severe", why did you list it as a sensible option about how to handle this issue? |
I listed all potential options to be comprehensive, and included this because you had already enacted it for percentiles. I've made clear my preference for aligning with TPC, CBO, and TF on option 1, and thought it self-evident that discarding 20% of tax units, because 0.1% are problematic, would be basically a non-starter. At this point the maintainers need to make a decision, a process I'm not familiar with when the solution is not obvious. FWIW I've found collaborative documents and meetings more productive than long back-and-forth GitHub issue and pull-request threads when dealing with complex design challenges that require consensus. I'd be curious what this process typically looks like for Tax-Calculator, though of course this is outside my realm and I don't want to step on any toes. |
Is there a goal of consistency between TaxBrain views and default taxcalc views? Is TaxBrain behavior in scope of this issue? |
@MaxGhenis asked:
Probably not because the topic is highly conjectural, seeing that right now TaxBrain does not display any graphs generated by Tax-Calculator. |
What about TaxBrain's difference tables? Shouldn't the decision for the graphs also apply to the tables? |
@MaxGhenis said in issue #1888:
and then Max described option 1 this way:
Why don't you prepare a pull request that does this so that we can better assess the pros and cons of implementing this approach? With this issue everything is in the implementation details. As I've said before, the TPC and CBO approach of dropping negative income units from the bins/quantiles but including them in the totals is logically inconsistent and is almost certainly going to lead to confusion among users for that reason. And another concern I have about this approach is that it introduces distortions (that most users would consider bugs) in other statistics in the distribution/difference tables (other than the percentage change in after-tax expanded income). So, for example, consider a UBI reform that gives every person in a filing unit $10,000 per annum tax-free. If we implement your option 1, users are going to start complaining, with good reason, about the tables not making sense. Users will say: "When I sum the product of |
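The quoted user complaint is cut off, but the inconsistency it points to can be sketched with hypothetical data: under a $10,000-per-person UBI, if negative-income units are dropped from the quintile rows but kept in the totals row, the quintile rows no longer sum to the total. The sample below is invented for illustration; giving the negative-income units their own row restores the identity.

```python
import numpy as np

# Hypothetical sample of ten equally weighted filing units, sorted by baseline
# income, so quintile membership (computed with ALL units) is just position // 2.
income = np.array([-50_000, 5_000, 12_000, 20_000, 31_000,
                   45_000, 60_000, 80_000, 120_000, 300_000], dtype=float)
persons = np.array([1, 2, 1, 3, 2, 4, 2, 1, 2, 3])
ubi = 10_000.0 * persons                  # tax-free UBI of $10,000 per person
quintile = np.arange(len(income)) // 2    # 0 = bottom quintile

negative = income < 0
total = ubi.sum()                          # totals row: every unit
by_quintile = np.array([ubi[(quintile == q) & ~negative].sum() for q in range(5)])
unallocated = ubi[negative].sum()

print(total, by_quintile.sum())            # 210000.0 200000.0: the parts miss the total
print(by_quintile.sum() + unallocated)     # 210000.0 once negatives get their own row
```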
@MaxGhenis asked:
Yes, that would seem reasonable. |
@martinholmer said:
No, it isn't. Those who write off large business losses and end up with negative income are not actually low-income. Although they file taxes (and thus belong in the totals), one cannot reasonably ascertain where they should realistically fall within the income distribution. Very few of them, if any, belong in the lowest quintile (or decile, or percentile), but they cannot accurately be placed into any other income bin. |
@codykallen said:
I can see your line of argument, and while everything you say makes sense, it is still true that the parts do not add up to the total. At some level that is a logical inconsistency. |
@martinholmer said:
What if we add a bin for "unallocated tax units" or "undistributed tax units"? And add an asterisk or footnote saying "These tax units have negative income due to large business losses. They are included in the totals but excluded from the distributional analysis."? |
@codykallen said in issue #1888:
This is a constructive suggestion, thanks! I've got a few questions.
|
I've thought of option 1 as essentially having this extra row as @codykallen describes, and then hiding it because it conveys little useful information. Do 0.1% of tax units deserve the same real estate as an entire quintile in these charts? |
@martinholmer asked:
I think we should show totals and differences, but leave blank the entries for anything percentage-related (such as average tax rates). Or put
"These tax units wrote off losses that exceed their other sources of income, resulting in negative income. As a result, these filers cannot be accurately placed within the income distribution. They are included in the totals but excluded from the income bins." |
@martinholmer haven't you already operationalized it in |
@codykallen said in issue #1888:
That seems like a reasonable approach on first thought. And @codykallen also said:
OK, but how do we operationalize "wrote off losses that exceed their other sources of income"? |
I would suggest that this filing unit be included in the "unallocated" bin. I wonder, though, how many filers this applies to. |
@codykallen, Here's my (perhaps incorrect) understanding of the proposal you're making:
Is that a correct understanding of your proposal? |
@martinholmer, that is a correct interpretation of my proposal. |
This maps to option 6 in my list, not option 1, which is the option I believe CBO and TPC have taken. In CBO/TPC's option 1, deciles are true deciles and negatives are removed from the bottom decile. Seems fine (probably doesn't change results much) but just confirming you want to deviate? Also curious what you think about using baseline after-tax income instead of baseline expanded income? Using expanded income still introduces a (small) potential for sign inversion when calculating change to after-tax income. |
@MaxGhenis said:
Suppose instead that we kept the tax units when determining the cutoffs for each bin, but dropped them from the actual bins. Then the lowest bin would be smaller than the others, so a distribution by decile would have one bin with 8% of filing units and 9 bins with 10% of units each. @MaxGhenis also asked:
Yes, I prefer using baseline expanded income as the income measure for sorting into bins. Although if any column involves dividing by income, the cell for the "unallocated" group should be left empty. |
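A sketch of the "unallocated" row as described in this exchange: units with negative baseline expanded income are set aside before the bins are formed, the regular bins split the remaining units into equal-weight groups, the unallocated row appears in the table and in the totals, and its percentage cell is left blank. Inputs are assumed NumPy arrays; `table_with_unallocated` and the row dictionary layout are hypothetical, not Tax-Calculator's table format.

```python
import numpy as np


def table_with_unallocated(base_income, base_aftertax, reform_aftertax, weights, n_bins=10):
    unallocated = base_income < 0
    allocated_idx = np.flatnonzero(~unallocated)
    # Rank only the allocated units, so every regular bin holds an equal share of them.
    order = allocated_idx[np.argsort(base_income[allocated_idx], kind="stable")]
    w_sorted = weights[order]
    mid_share = (np.cumsum(w_sorted) - 0.5 * w_sorted) / w_sorted.sum()
    bins = np.full(len(base_income), -1)
    bins[order] = np.minimum(np.floor(mid_share * n_bins).astype(int), n_bins - 1)

    def row(label, sel, show_pct):
        base = np.sum(weights[sel] * base_aftertax[sel])
        change = np.sum(weights[sel] * (reform_aftertax[sel] - base_aftertax[sel]))
        pct = 100.0 * change / base if show_pct and base > 0 else None
        return {"bin": label, "units": float(weights[sel].sum()),
                "total_change": float(change), "pct_change": pct}

    rows = [row(f"bin {b + 1}", bins == b, True) for b in range(n_bins)]
    rows.append(row("unallocated", unallocated, False))  # percentage cell left blank
    rows.append(row("ALL", np.ones(len(base_income), dtype=bool), True))
    return rows


income = np.array([-100.0, 10, 20, 30, 40])
rows = table_with_unallocated(income, income, income + 10, np.ones(5), n_bins=2)
for r in rows:
    print(r)
```

Because the unallocated row carries its own dollar totals, the bin rows plus that row sum to the "ALL" row.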
Yes, this is a trade-off between a slightly smaller bottom bin and quantiles that aren't true quantiles. How important is it that when we describe the upper decile it's actually the upper decile? Neither seems inherently better or worse to me, but it does seem noteworthy that three of the four groups you investigated appear to use the same methodology.
FYI @martinholmer @feenberg and I are discussing this over at #1893.
Right, my concern is that in using expanded income, %chg to after-tax income still divides by something that could be negative, in non-unallocated bins. Consider a tax unit with $1k expanded income and -$1k after-tax income (warrants investigation to determine prevalence), and $0 after-tax income under the reform. This tax unit would not be unallocated, since they have positive expanded income, and their after-tax income increases by $1k. But their % change to after-tax income is $1k/-$1k = -100%. This is the type of problem that initially motivated this issue. |
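The arithmetic in the example above can be checked directly. The sketch below reproduces the naive -100% figure and shows one possible guard (a hypothetical `pct_change` helper) that reports no percentage when the baseline is not positive.

```python
def pct_change(base, reform):
    """Percent change in after-tax income, or None when the base is not positive."""
    if base <= 0:
        return None
    return 100.0 * (reform - base) / base


# The example from the comment: $1k expanded income, -$1k baseline after-tax
# income, $0 after-tax income under the reform.
base_aftertax, reform_aftertax = -1_000.0, 0.0
naive = 100.0 * (reform_aftertax - base_aftertax) / base_aftertax
print(naive)                                       # -100.0 although income rose by $1k
print(pct_change(base_aftertax, reform_aftertax))  # None: no meaningful percentage
```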
@MaxGhenis said:
This is indeed an issue. Personally, when I do a distributional analysis, I drop any tax units with negative expanded income and with tax liability in excess of expanded income (i.e. negative after-tax income). Since this is a step further than anything others have done, I did not specifically recommend it, but you could consider it when dealing with your special case concerns. |
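A minimal sketch of this exclusion rule, written as a boolean mask over assumed NumPy arrays: a unit is dropped if it has negative expanded income or negative after-tax income (the union of the two conditions, as clarified later in the thread). `exclusion_mask` is a hypothetical helper name.

```python
import numpy as np


def exclusion_mask(expanded_income, aftertax_income):
    """True for units dropped from the distributional analysis: negative expanded
    income OR negative after-tax income (the union of the two conditions)."""
    return (expanded_income < 0) | (aftertax_income < 0)


expanded = np.array([-5_000.0, 1_000.0, 20_000.0, 30_000.0])
aftertax = np.array([-7_000.0, -1_000.0, 15_000.0, 25_000.0])
weights = np.array([1.0, 1.0, 1.0, 1.0])
mask = exclusion_mask(expanded, aftertax)
print(mask)                                 # [ True  True False False]
print(weights[mask].sum() / weights.sum())  # 0.5: weighted share excluded
```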
I imagine you mean "or" where you say "and"? Can we formalize this as dropping one of three sets of tax units, i.e., from this comment? a. Tax units with negative baseline expanded income. Are there other possibilities? Maybe add AGI too, to be comprehensive? (c) seems safest to me, and although it differs from the standard expanded-income-only binning, this group is already weird, so that doesn't seem like a big drawback to me. |
@MaxGhenis, you're correct; I meant to say "or" (excluding the union of (a) and (b)). And this group is definitely weird. Personally, I wouldn't recommend including AGI, since that is complicated by above-the-line deductions and the TCJA's loss limitation. For example, suppose a single filer has $300,000 of wages and $500,000 of business losses. This gives an expanded income of -$200,000, regardless of before or after the TCJA, so the filer would go in the "unallocated" bin. This filer's AGI under pre-TCJA law is -$200,000, but his AGI under TCJA law is $50,000. Expanded income is far more consistent and less susceptible to tax policy than AGI. |
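The single-filer example can be reproduced with a deliberately simplified model: "AGI" here is just wages plus business income, with the business loss optionally capped at $250,000 as a stand-in for the TCJA excess business loss limitation for a single filer in 2018 (the disallowed excess becomes a loss carryforward). `simplified_agi` is a hypothetical helper, and treating expanded income as wages plus business income is an assumption for this example only.

```python
def simplified_agi(wages, business_income, excess_loss_limit=None):
    """Wages plus business income, with the deductible business loss optionally
    capped (a stand-in for the TCJA excess business loss limitation)."""
    if excess_loss_limit is not None:
        business_income = max(business_income, -excess_loss_limit)
    return wages + business_income


wages, business = 300_000, -500_000
expanded_income = wages + business                                      # -200,000 under either law
agi_pre_tcja = simplified_agi(wages, business)                          # -200,000
agi_tcja = simplified_agi(wages, business, excess_loss_limit=250_000)   # 50,000
print(expanded_income, agi_pre_tcja, agi_tcja)
```

The expanded-income figure does not move with the law change, which is the stability argument made above.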
Thanks for the context comparing AGI and expanded income, @codykallen. The important parts to address are metrics that can serve as denominators, so leaving AGI out SGTM. I added a tabulation of expanded income sign by after-tax income sign to this notebook (2018 CPS), which shows that 0.008% of tax units have positive expanded income and negative after-tax income. Including them would then expand the excluded set from 0.110% to 0.118%, a relative increase of about 7%. Although small, this is sizable enough relative to the scope of the problem to be worth doing IMHO. It sounds like it would save Cody some time in future analyses too. I added this info to this section of the doc. |
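A weighted tabulation of expanded-income sign by after-tax-income sign can be sketched with NumPy alone; `sign_crosstab` is a hypothetical helper, and the toy inputs below are not the CPS figures from the notebook.

```python
import numpy as np


def sign_crosstab(expanded_income, aftertax_income, weights):
    """Weighted percent of units in each (expanded-income sign, after-tax-income sign)
    cell; rows and columns are ordered negative, zero, positive."""
    row = np.sign(expanded_income).astype(int) + 1   # 0 = negative, 1 = zero, 2 = positive
    col = np.sign(aftertax_income).astype(int) + 1
    table = np.zeros((3, 3))
    np.add.at(table, (row, col), weights)            # accumulates weights for repeated cells
    return 100.0 * table / weights.sum()


table = sign_crosstab(np.array([-1.0, 5.0, 5.0, 0.0]),
                      np.array([-2.0, -1.0, 3.0, 0.0]),
                      np.array([1.0, 1.0, 2.0, 0.0]))
print(table[2, 0])   # 25.0: positive expanded income, negative after-tax income
```

With the notebook's figures, adding the positive-expanded/negative-after-tax cell (0.008%) to the 0.110% already excluded gives 0.118%, and 0.008 / 0.110 is about 0.073, i.e. roughly a 7% relative increase.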
For those not following #1902, it implements a flag to hide nonpositive incomes, including zeros. I disagree with this design decision. I don't see evidence that those with zero incomes don't truly belong to the bottom decile (as we believe those with negative incomes are actually rich); the share of those with zero income is of a similar order of magnitude as other surveys; and this appears to deviate from other tax analysis groups. The advantage is not having to null out the bottom one or two percentiles that may have undefined % growth, but I don't see this necessitating discarding of these tax units, which are relevant to changes affecting the bottom decile. That said, the issue has been discussed at length here and in #1902, so I won't be discussing it further. If the maintainers want to consider future changes I'll look out for them, but otherwise I'll be using my own functions to bucket tax units discarding those with negative incomes and keeping those with zero incomes. |
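To illustrate the point about zeros with invented numbers: a group-level percentage change divides by the group's summed baseline income, so it stays defined when zero-income units are kept, while discarding them changes the measured effect on the bottom group. `group_pct_change` is a hypothetical helper; this does not describe how the #1902 flag is implemented.

```python
import numpy as np


def group_pct_change(base, reform, weights, keep):
    """Percent change in weighted income over the kept units (NaN if the base is not positive)."""
    b = np.sum(weights[keep] * base[keep])
    if b <= 0:
        return float("nan")
    return 100.0 * np.sum(weights[keep] * (reform[keep] - base[keep])) / b


# Hypothetical bottom decile: one negative, two zeros, two positives; reform adds $1,000 each.
base = np.array([-500.0, 0.0, 0.0, 2_000.0, 3_000.0])
reform = base + 1_000.0
w = np.ones_like(base)

drop_negative = group_pct_change(base, reform, w, base >= 0)     # 80.0: zeros stay in
drop_nonpositive = group_pct_change(base, reform, w, base > 0)   # 40.0: zeros discarded
print(drop_negative, drop_nonpositive)
```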
@MaxGhenis said in #1888:
When there is no consensus, Tax-Calculator tries to be agnostic and provide users with the ability to make the decision. In the case of handling those with negative |
Thanks @martinholmer for supporting more flexibility! How will the TaxBrain decile tables display zeros and negatives? |
@MaxGhenis said:
I honestly don't know. The TaxBrain developers have the same choices as any other user of Tax-Calculator. |
@MaxGhenis asked:
You might want to follow PolicyBrain pull request 846. |
@MaxGhenis asked:
Yes, please follow ospc-org/ospc.org#846. |
Originally in taxdata issue 143 @MaxGhenis said this:
@martinholmer thanks for the quintile graph function! I updated the notebook to use taxcalc 0.16.0 with benefits (which I'm extremely excited about!!) and look into the bottom quintile. Here's how the results shifted (I didn't calculate quintiles in the original analysis):
Given these negatives still affect the bottom decile by 28%, it seems like this may at least warrant a caveat in TaxBrain or something. Is there a source or justification for the decision of other tax analysis groups to include them? I couldn't find anything from TPC, TF, or CBO with a quick search.
Certain use cases will also justify omitting or zeroing out negatives, for example calculating the Gini coefficient. Users could spin up their own code, but it might be worth some additional taxcalc code to standardize this at some point.
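As an example of a use case where negatives need explicit handling, here is a sketch of a weighted Gini coefficient (trapezoid area under the Lorenz curve) with an option to drop, zero out, or keep negative incomes. `weighted_gini` and its `negatives` parameter are hypothetical, not existing taxcalc code. Keeping negatives can push the coefficient above 1.

```python
import numpy as np


def weighted_gini(income, weights, negatives="drop"):
    income = np.asarray(income, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if negatives == "drop":
        keep = income >= 0
        income, weights = income[keep], weights[keep]
    elif negatives == "zero":
        income = np.maximum(income, 0.0)
    elif negatives != "keep":
        raise ValueError("negatives must be 'drop', 'zero', or 'keep'")
    order = np.argsort(income, kind="stable")
    x, w = income[order], weights[order]
    # Cumulative population share and cumulative income share (Lorenz curve), starting at 0.
    p = np.concatenate(([0.0], np.cumsum(w) / w.sum()))
    lorenz = np.concatenate(([0.0], np.cumsum(w * x) / np.sum(w * x)))
    # Gini = 1 - 2 * (area under the Lorenz curve), using the trapezoid rule.
    return 1.0 - np.sum(np.diff(p) * (lorenz[1:] + lorenz[:-1]))


sample = [-3, 0, 0, 0, 4]
print(weighted_gini(sample, [1] * 5, "drop"))   # 0.75
print(weighted_gini(sample, [1] * 5, "zero"))   # about 0.8
print(weighted_gini(sample, [1] * 5, "keep"))   # above 1
```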