When Census data are collected, incomes in excess of an upper limit (approximately 1 million dollars) are top-coded and reported as $999,999 rather than as the full amount. There's a bit more to it than this, but what's important is that there are reporting limits.
Recently, the effect that top-coding of census data has on income inequality has been under discussion. The issue arose when Alan Reynolds claimed that various statistical issues have created a false impression of rising inequality, a claim that has been thoroughly rebutted here and elsewhere (see below). In response, Paul Krugman emails "a finger exercise on earnings inequality and reporting limits":
Do Reporting Limits Really Affect Measured Income Inequality, by Paul Krugman 1/8/07: For my own edification, I thought I’d make a rough estimate of how much the Census reporting limits affect one dimension of inequality, inequality in earnings. What we learned amid all the nonsense from Alan Reynolds is that the Census data don’t count earned income in excess of approximately $1 million, and other forms of income are subject to even tighter reporting limits. But let’s focus just on earnings.
Now, only a tiny minority of Americans make enough for the reporting limits to matter. According to the Social Security Administration data, http://www.ssa.gov/OACT/COLA/awidevelop.html, in 2005 less than 0.06% of workers had wages and salaries exceeding $1 million – and only the part of their income over $1 million is censored. Can that piece really be big enough to significantly affect overall measures of the level and trend in inequality?
According to an estimate I’ve just done using the SSA data, the answer is yes. This kind of calculation is new to me – I was just having what passes for fun in the sick mind of an economist – but I’d like to hear any comments.
The SSA data give the number of people with wage and salary income in ranges – e.g., $1 million to $1.5 million, $1.5 million to $2 million, and so on. But it’s pretty easy to use those data to estimate the total income excluded by a $1 million reporting limit.
The key is knowing that top incomes tend to follow a Pareto distribution. That is, the number of people with any given (high) income declines as a power of that income:
n = Ky^(-alpha)
We can integrate this to get the number of people with incomes exceeding some level Y:
N = KY^(1-alpha)/(alpha-1)
Or, in logs,
ln(N) = ln(K/(alpha-1)) + (1-alpha) ln(Y)
Does this work? You bet! Figure 1 shows the SSA data for 2005, with income measured in millions and actual numbers of people. The Pareto distribution works very well indeed. And the fitted line lets us estimate both K and alpha: K = 154318; alpha = 2.6867.
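The log-linear fit described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculation behind Figure 1: the thresholds are hypothetical bracket floors, and the counts are synthetic, generated from the fitted K and alpha quoted above rather than taken from the SSA table. The point is only to show how the slope and intercept of the fitted line map back to alpha and K.

```python
import math

# Fitted values quoted in the text; used here only to generate synthetic counts.
K_QUOTED = 154318.0
ALPHA_QUOTED = 2.6867


def count_above(Y, K, alpha):
    """Number of people with income above Y (in $ millions): K*Y^(1-alpha)/(alpha-1)."""
    return K * Y ** (1.0 - alpha) / (alpha - 1.0)


def fit_pareto(thresholds, counts):
    """Least-squares fit of ln(N) = intercept + slope*ln(Y); returns (K, alpha)."""
    xs = [math.log(y) for y in thresholds]
    zs = [math.log(n) for n in counts]
    n = len(xs)
    x_mean = sum(xs) / n
    z_mean = sum(zs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxz = sum((x - x_mean) * (z - z_mean) for x, z in zip(xs, zs))
    slope = sxz / sxx
    intercept = z_mean - slope * x_mean
    alpha = 1.0 - slope                      # slope = 1 - alpha
    K = math.exp(intercept) * (alpha - 1.0)  # intercept = ln(K/(alpha-1))
    return K, alpha


# Hypothetical bracket floors in $ millions (an assumption, not the SSA brackets).
thresholds = [1.0, 1.5, 2.0, 2.5, 3.0, 5.0, 10.0, 20.0]
# Synthetic, noise-free counts (an assumption, not real data).
counts = [count_above(y, K_QUOTED, ALPHA_QUOTED) for y in thresholds]

K_hat, alpha_hat = fit_pareto(thresholds, counts)
print(f"K = {K_hat:.0f}, alpha = {alpha_hat:.4f}")
```

With real bracket counts in place of the synthetic ones, the same function would return the estimates implied by the data.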
A bit more math lets us derive the total income of people with earnings above a given Y: it’s
Y total = KY^(2-alpha)/(alpha-2)
Since we’re measuring income in millions, and looking for the income of people in the million-plus category, this becomes simply K/(alpha-2) = about $225 billion.
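For readers who want the intermediate steps, the two formulas above follow from integrating the Pareto density. The sketch below uses N(Y) and T(Y) as labels for the head count and total income above Y; those labels are introduced here for illustration and are not in the original note. The integrals converge only if alpha exceeds 1 and 2 respectively, which the fitted value satisfies.

```latex
\begin{align*}
N(Y) &= \int_Y^{\infty} K y^{-\alpha}\,dy
      = \frac{K\,Y^{1-\alpha}}{\alpha-1}, \qquad \alpha > 1,\\[4pt]
T(Y) &= \int_Y^{\infty} y \cdot K y^{-\alpha}\,dy
      = \frac{K\,Y^{2-\alpha}}{\alpha-2}, \qquad \alpha > 2,\\[4pt]
T(1) &= \frac{K}{\alpha-2}
      = \frac{154318}{0.6867}
      \approx 224{,}724 \text{ (millions of dollars)}
      \approx \$225 \text{ billion}.
\end{align*}
```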
The SSA also gives us total wage and salary income: $5.374 trillion in 2005. So the 82,000 workers in the million-plus club, less than 0.06 percent of the work force, accounted for 4.2% of wages and salaries.
Some of that total – the part under $1 million – was counted. How much? $82 billion - $1 million per worker. What’s left, the part that was above the reporting limit, I estimate at 2.7% of total wage and salary income. If I understand correctly, that explains about a third of the difference between the Census and Piketty-Saez estimates of the top 5% share. Bear in mind that the reporting limits on other forms of income also matter, and that there are other sources of bias in the Census numbers, such as a tendency of high-income respondents to understate their incomes.
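The arithmetic in the last few paragraphs can be reproduced as follows. This is a sketch that uses only the figures quoted in the text (K, alpha, the 82,000 workers, and the $5.374 trillion wage total); the variable and function names are made up for illustration.

```python
# Figures quoted in the text.
K = 154318.0              # Pareto scale from the fitted line
ALPHA = 2.6867            # Pareto exponent from the fitted line
TOTAL_WAGES_BN = 5374.0   # total wage and salary income, $ billions (2005)
WORKERS_ABOVE = 82_000    # workers earning more than $1 million
LIMIT_MILLIONS = 1.0      # reporting limit, $ millions


def income_above(Y, K, alpha):
    """Total income ($ millions) of people earning above Y ($ millions)."""
    return K * Y ** (2.0 - alpha) / (alpha - 2.0)


# Total earnings of the million-plus group, converted to $ billions.
top_income_bn = income_above(LIMIT_MILLIONS, K, ALPHA) / 1000.0
# The part below the limit is still counted: $1 million per worker.
counted_bn = WORKERS_ABOVE * LIMIT_MILLIONS / 1000.0
# The remainder is what the reporting limit hides.
censored_bn = top_income_bn - counted_bn

top_share = top_income_bn / TOTAL_WAGES_BN
censored_share = censored_bn / TOTAL_WAGES_BN

print(f"Income of million-plus earners: ${top_income_bn:.1f} billion")
print(f"Share of wages and salaries:    {100 * top_share:.1f}%")
print(f"Income above the limit:         ${censored_bn:.1f} billion")
print(f"Censored share:                 {100 * censored_share:.1f}%")
```

The censored share is the top group's total minus the $82 billion that stays below the limit, divided by total wages and salaries.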
More important is how the reporting limits affect trends. Here’s what I did: I went back to 1994, just after the Census changed the reporting limits, and did exactly the same exercise. Figure 2 shows the Pareto plot for 1994: notice that the line is steeper, which says that income among the million-plus club wasn’t quite as unequal in 1994 as it was in 2005.
When you run through the whole exercise, what you find is that the earnings that would be missed because of reporting limits are much smaller, only 0.7% of total wage and salary income. This isn’t surprising: the reporting limit hasn’t changed, while the structure of wages has shifted right both because of inflation and because of rising average real earnings. Moreover, top incomes have become more unequal, with more income in the far right tail. So there’s a lot more income above the reporting limit, all of which will be captured by income-tax-based estimates of income inequality, but won’t be captured by Census data.
Now, the Census data say that the income share of the top 5% rose only slightly, from 21.2% to 22.2%, between 1994 and 2005. The Piketty-Saez data, which only go up to 2004, show a rise of 3.7 percentage points. Our little exercise with earnings data suggests that the missed income due to reporting limits rose by about 2 percentage points over the same period, so that more than half the difference between the Census and Piketty-Saez trends could be the result of reporting limits that caused the Census data to miss a large and growing amount of income at the very top.
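The "more than half" claim is a short piece of arithmetic on the figures quoted above. A rough sketch, treating every rise as a change in percentage points and using made-up variable names:

```python
# Figures quoted in the text, all in percentage points.
census_1994, census_2005 = 21.2, 22.2   # Census top-5% income share
piketty_saez_rise = 3.7                 # rise in the Piketty-Saez top-5% share
missed_1994, missed_2005 = 0.7, 2.7     # earnings above the reporting limit

census_rise = census_2005 - census_1994
trend_gap = piketty_saez_rise - census_rise   # difference between the two trends
missed_rise = missed_2005 - missed_1994       # growth in censored earnings
explained = missed_rise / trend_gap           # fraction of the gap accounted for

print(f"Gap between the two trends: {trend_gap:.1f} points")
print(f"Rise in missed earnings:    {missed_rise:.1f} points")
print(f"Fraction of gap explained:  {explained:.2f}")
```

This compares an earnings-only measure with total-income shares, so it is a rough guide rather than an exact decomposition.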
The bottom line: top-coding really, truly does matter – and yes, Virginia, income inequality is still rising.
Real vs. abstract. Reality is what exists, not the abstract. The fact that most are no better off and many are worse off is the reality. All the data in the world is abstract. Sometimes people confuse the two, thinking that reality for the people is abstract and the data are real.
Posted by: ken melvin | Jan 8, 2007 12:54:41 PM
Mr. Reynolds is not confused. He knows perfectly well what he knows.
Mr. Krugman is also not confused. He knows what he knows and can prove it, abstractly and in reality.
Mr. Reynolds can't prove what he knows, either abstractly or in reality.
Some people might call Mr. Reynolds' attitude "faith-based", but actually faith is a little bit more empirical than Mr. Reynolds' attitude.
I await Mr. Reynolds' refutation of Mr. Krugman with unbated breath.
Posted by: evagrius | Jan 8, 2007 1:52:36 PM
Based upon some of the arguments Reynolds has been putting up at some of the blogs reporting on this issue, I would say that he is firmly in the "obfuscation based" community.
Posted by: Marcus Aurelius | Jan 8, 2007 1:54:36 PM
With all due respect, how can a supposed serious discussion about the full scope of Census Bureau top-coding (not just the small portion mentioned by Paul Krugman and Mark Thoma) not include the following top-coding references in any blog main posts? Or not include any Paul Krugman or EV main blog post reference to such documents and facts?
Appendix B in the 2001 issue of the Census Bureau SIPP Users' Guide outlines the specific top-codes used in the 1996 survey. To suggest that over 60 Census Bureau top-coding caps apply only to the top 1% of income earners or a tiny portion of such individuals in the top 1% is simply a factual distortion if Appendix B is read and understood.
Posted by: Movie Guy | Jan 8, 2007 5:09:17 PM
I don't think that Krugman was arguing that top-coding only applied to the top 1%, only that top-coding is a standard practice for the Census Bureau and therefore results in incomplete information.
It's the same when looking at web sites that map cities and counties for income using Census data. There's no further distinction in areas with incomes more than $200K a year.
Posted by: evagrius | Jan 8, 2007 5:42:08 PM
It should be axiomatic that if you understate the incomes of units receiving a million or more by capping them at a million you are going to get a lower Gini than the real one. In short you are going to understate the degree of inequality in incomes.
Posted by: maria | Jan 8, 2007 7:33:19 PM
"How much? $82 billion - $1 million per worker. What’s left, the part that was above the reporting limit, I estimate at 2.7% of total wage and salary income."
The CPS's problem is overstating earnings, not underreporting them. Since the '96 CPS revision that replaced hard topcodes with the mean topcoding amount currently used, the CPS has tended to overshoot on earnings and has exceeded NIPA earnings amounts.
For the last several years, I believe that the CPS has earnings well in excess of SSA data and NIPA, according to Schwabish in '06.
I have trouble reconciling the claims that a) the CPS overstates earnings income and b) the CPS massively underreports earnings for the top 5%. This would mean that the CPS is attributing several hundred billion dollars of wages to middle- and lower-income individuals. I don't think the CPS is that worthless.
What am I misunderstanding?
Posted by: hederman | Jan 9, 2007 6:46:06 AM