Naked Statistics: Stripping The Dread From The Data

October 31, 2018

Naked Statistics: Stripping The Dread From The Data

By Charles Wheelan (author of Naked Economics)

Statistics is like a high-caliber weapon: helpful when used correctly and potentially disastrous in the wrong hands. The problem is that if the data are poor, or if the statistical techniques are used improperly, the conclusions can be wildly misleading and even potentially dangerous.

Statistics rarely offers a single "right" way of doing anything. Does it provide meaningful information in an easily accessible way? Absolutely. It's a nice tool for making a quick comparison between the performances of two quarterbacks on a given day.

Chapter 1. What's the point?

The Gini index measures how evenly wealth (or income) is shared within a country on a scale from zero to one. The statistic can be calculated for wealth or for annual income, and it can be calculated at the individual level or at the household level. (All of these statistics will be highly correlated but not identical.) The Gini index, like the passer rating, has no intrinsic meaning; it's a tool for comparison. A country in which every household had identical wealth would have a Gini index of zero. By contrast, a country in which a single household held the country's entire wealth would have a Gini index of one. As you can probably surmise, the closer a country is to one, the more unequal its distribution of wealth. The United States has a Gini index of .45, according to CIA. So what?
Once that number is put into context, it can tell us a lot.
For example, Sweden has a Gini index of .23. Canada is .32. China's is .42. Brazil's is .54. South Africa's is .65.
As we look across those numbers, we get a sense of where the U.S. falls relative to the rest of the world when it comes to income inequality. We can also compare different points in time. The Gini index for the U.S. was .41 in 1997 and grew to .45 over the next decade. (The most recent CIA data are for 2007.) This tells us in a objective way that while the U.S. grew richer over that period of time, the distribution of wealth grew more unequal.
Again, we can compare the changes in the Gini index across countries over roughly the same time period. Inequality in Canada was basically unchanged over the same stretch. Sweden has had significant economic growth over the past two decades, but the Gini index in Sweden actually fell from .25 in 1992 to .23 in 2005, meaning that Sweden grew richer and more equal over that period.

Is the Gini index the perfect measure of inequality? Absolutely not -- just as the passer rating is not a perfect measure of quarterback performance. But it certainly gives us some valuable information on a socially significant phenomenon in a convenient format.

What's the point? The point is that statistics helps us process data, which is really just a fancy name for information. Sometimes the data are trivial in the grand scheme of things, as with sports statistics. Sometimes they offer insight into the nature of human existence, as with the Gini index.

The world is producing more and more data, ever faster and faster. Yet, as the New York Times has noted, "Data is merely the raw material of knowledge." Statistics is the most powerful tool we have for using information to some meaningful end, whether that is identifying underrated baseball players or paying teachers more fairly. Here is a quick tour of how statistics can bring meaning to raw data.

Description and Comparison

A bowling score is a descriptive statistics. So is batting average. We use numbers, in sports and everywhere else in life to summarize information. How good a baseball player was? To a baseball fan, that is a meaningful statement, because it encapsulates an eighteen-season career (something mildly depressing about having one's lifework collapsed into a single number). Of course, baseball fans have also come to recognize that descriptive statistics other than batting average may better encapsulate a player's value on the field.

We evaluate the academic performance of high school and college students by means of a grade point average, or GPA. Someone has a 3.7 GPA is clearly a stronger student than someone at the same school with a 2.5 GPA. That makes it a nice descriptive statistic. It's easy to calculate, to understand, and to compare across students. But it's not perfect. The GPA does not reflect the difficulty of the courses that different students may have taken.

Overreliance on any descriptive statistic can lead to misleading conclusions, or cause undesirable behavior. "Oversimplified descriptive statistic", but the word "oversimplified" is redundant. Descriptive statistics eixt to simplify, which always implies some loss of nuance or detail. Anyone working with numbers needs to recognize as much.

Inference

How many homeless people live on the streets of Chicago?
It is expensive and logistically difficult to count the homeless population in a large metropolitan area. Yet it is important to have a numerical estimate of this population for purposes of providing social services, earning eligibility for state and federal revenues, and gaining congressional representation. One important statistical practice is sampling, which is the process of gathering data for a small area, say, a handful of census tracts, and then using those data to make an informed judgment, or inference, about the homeless population for the city as a whole. Sampling requires far less resources that trying to count an entire population; done properly, it can be every bit as accurate.

A political poll is one form of sampling. A research organization will attempt to contact a sample of households that are broadly representative of the larger population and ask them their views about a particular issue or candidate. This is obviously much cheaper and faster than trying to contact every household in an entire state or country. The polling and research firm Gallup reckons that a methodologically sound poll of 1,000 households will produce roughly the same results as a poll that attempted to contact every household in America.
That's how we figured out how often Americans are having sex, with whom, and what kind. In the mid-1990s, the National Opinion Research Center at the University of Chicago carried out a remarkably ambitious study of American sexual behavior. The results were based on detailed surveys conducted in person with a large, representative sample of American adults.

Assessing Risk and Other Probability-Related Events

Casinos make money in the long run -- always. That does not mean that they are making money at any given moment. The whole gambling industry is built on games of chance, meaning that the outcome of any particular roll of the dice or turn of the card is uncertain. At the same time, the underlying probabilities for the relevant events are known. When the underlying probabilities favor the casinos (as they always do), we can be increasingly certain that the "house" is going to come out ahead as the number of bets wagered gets larger and larger, even as those bells and whistles keep going off.

This turns out to be a powerful phenomenon in areas of life far beyond casinos. Many businesses must assess the risks associated with assorted adverse outcomes. They cannot make those risks go away entirely, just as a casino cannot guarantee that you won't win every hand of blackjack that you play. However, any business facing uncertainty can manage these risks by engineering processes so that the probability of an adverse outcome, anything from an environmental catastrophe to a defective product, becomes acceptably low. Wall Street firms will often evaluate the risks posed to their portfolios under different scenarios, with each of those scenarios weighted based on its probability. The financial crisis 2008 was precipitated in part by a series of market events that had been deemed extremely unlikely, as if every player in a casino drew blackjack all night. I will argue that these Wall Street models were flawed and that the data they used to assess the underlying risks were too limited, but the point here is that any model to deal with risk must have probability as its foundation.

When individuals and firms cannot make unacceptable risks go away, they seek protection in other ways. The entire insurance industry is built upon charging customers to protect them against some adverse outcome, such as a car crash or a house fire. The insurance industry does not make money by eliminating there events; car crash and houses burn every day. Instead, the insurance industry makes money by charging premiums that are more than sufficient to pay for the expected payouts from car crashes and house fires.

Probability can even be used to catch cheats in some situations. One company will flag exams at a school or test site on which the number of identical wrong answers is highly unlikely, usually a pattern that would happen by chance less than one time in a million. The mathematical logic stems from the fact that we cannot learn much when a large group of students all answer a question correctly. That's what they are supposed to do; they could be cheating, or they could be smart. But when those same test takers get an answer wrong, they should not all consistently have the same wrong answer. If they do, it suggests that they are copying from one another. The company also looks for exams in which a test taker does significantly better on hard questions than one easy questions, suggesting that he or she had answers in advance, and for exams on which the number of "wrong to right" erasures is significantly higher than the number of "right to wrong erasures, suggesting that a teacher or administrator changed the answer sheets after the test.

Of course, you can see the limitations of using probability. A large group of test takers might have the same wrong answers by coincidence; in fact, the more schools we evaluate, the more likely it is that we will observe such patterns just as a matter of chance. A statistical anomaly does not prove wrong doing.

Probability is one weapon in an arsenal that requires good judgment.

Identifying Important Relationships (Statistical Detective Work)

Does smoking cigarettes cause cancer? The scientific method dictates that if we are testing a scientific hypothesis, we should conduct a controlled experiment in which the variable of interest (e.g. smoking) is the only thing that differs between the experimental group and the control group. If we observe a marked difference in some outcome between the two groups (e.g. lung cancer), we can safely infer that the variable of interest is what caused that outcome. We cannot do that kind of experiment on humans. If our working hypothesis is that smoking causes cancer, it would be unethical to assign recent college graduates to two groups, smokers and nonsmokers and then see who has cancer at the twentieth reunion. (We can conduct controlled experiments on humans when our hypothesis is that a new drug or treatment may improve their health; we cannot knowingly expose human subjects when we expect an adverse outcome.)
You might point out that we do not need to conduct an ethically dubious experiment to observe the effects of smoking. Couldn't we just skip the whole fancy methodology and compare cancer rates at the twentieth reunion between those who have smoked since graduation and those who have not?
No. Smokers and nonsmokers are likely to be different in ways other than their smoking behavior. For example, smokers may be more likely to have other habits, such as drinking heavily or eating badly, that cause adverse health outcomes. If the smokers are particularly unhealthy at the twentieth reunion, we would not know whether to attribute this outcome to smoking or to other unhealthy things that many smokers happen to do. We would also have a serious problem with the data on which are basing our analysis. Smokers who have become seriously ill with cancer are less likely to attend the twentieth reunion. As a result, any analysis of the health of the attendees at the twentieth reunion will be seriously flawed by the fact that the healthiest members of the class are the most likely to show up. The further the class gets from graduation, say, a fortieth or a fiftieth reunion, the more serious this bias will be.

We cannot treat humans like laboratory rats. As a result, statistics is a lot like good detective work. The data yield clues and patterns that can ultimately lead to meaningful conclusions. You have probably watched one of those impressive police procedural shows like CSI: New York in which very attractive detectives and forensic experts pore over minute clues -- DNA from a cigarette butt, teeth marks on an apple, a single-fiber from a car floor mat -- and then use the evidence to catch a violent criminal. The appeal of the show is that these experts do not have the conventional evidence used to find the bad guy, such as an eyewitness or a surveillance videotape. So they turn to scientific inference instead. Statistics does basically the same thing. The data present unorganized clues -- the crime scene. Statistical analysis is the detective work that crafts the raw data into some meaningful conclusion.
Regression analysis is the tool that enables researchers to isolate a relationship between two variables such as smoking and cancer, while holding constant (or "controlling for") the effects of other important variables, such as diet, exercise, weight, and so on. When you read in the newspaper that eating a bran muffin every day will reduce your chances of getting colon cancer, you need not fear that some unfortunate group of human experimental subjects has been force-fed bran muffins in the basement of a federal laboratory somewhere while the control group in the next building gets bacon and eggs. Instead, researchers will gather detailed information of thousands of people, including how frequently they eat bran muffins, and then use regression analysis to do two crucial things: (1) quantify the association observed between eating bran muffins and contracting colon cancer (e.g. a hypothetical finding that people who eat bran muffins have a 9 percent lower incidence of colon cancer, controlling for other factors that may affect the incidence of the disease); and (2) quantify the likelihood that the association between bran muffins and a lower rate of colon cancer observed in this study is merely a coincidence -- a quirk in the data for this sample of people -- rather than a meaningful insight about the relationship between diet and health.

There is an academic literature on terrorists and suicide bombers -- a subject that would be difficult to study by means of human subjects (or lab fats for that matter). One such book, What Makes a Terrorist, was written by a graduate school statistics professors. The book draws its conclusions from data gathered on terrorist attacks around the world. A sample finding: Terrorists are not desperately poor, or poorly educated. The author, Alan Krueger, concludes, "Terrorists tend to be drawn from well-educated, middle-class or high-income families."
Why? Well, that exposes one of the limitation of regression analysis. We can isolate a strong association between two variables by using statistical analysis, but we cannot necessarily explain why that relationship in causal, meaning that a change in one variable is really causing a change in the other. In the case of terrorism, Prof. Krueger hypothesizes that since terrorists are motivated by political goals, those who are most educated and affluent have strongest incentive to change society. These individuals may also particularly rankled by suppression of freedom, another factor associated with terrorism. In Krueger's study, countries with high levels of political repression have more terrorist activity (holding other factors constant).

What's the point? The point is not to do math to dazzle others with advanced statistical techniques. The point is to learn things that inform our lives.

Lies, Damned Lies, and Statistics
Even in the best of circumstances, statistical analysis rarely unveils "the truth." We are usually building a circumstantial case based on imperfect data. As a result, there are numerous reasons that intellectually honest individuals may disagree about statistical results or their implications. At the most basic level, we may disagree on the question that is being answered. Fancy descriptive statistics can inform this question, but they will never answer it definitively.
There are limits on the data we can gather and the kinds of experiments we can perform. Alan Krueger's study of terrorists did not follow thousands of youth over multiple decades to observe which of them evolved into terrorists. It's just not possible. Nor can we create two identical nations -- except that one is highly repressive and the other is not -- and then compare the nuber of suicide bombers that emerge in each. Even when we can conduct large, controlled experiments on human beings, they are neither easy nor cheap. Researchers did a large-scale study on whether or not prayer reduces postsurgical complications, which was one of the question raised.
Secretary of Defense Donald Rumsfeld famously said, " You go to war with the army you have -- note the army you might want or wish to have at a later time." Whatever you may thing of Rumsfeld (and the Iraq war that he was explaining), that aphorism applies to research, too. We conduct statistical analysis using the best data and methodologies and resources available. The approach is not like addition or long division, in which the correct technique yields the "right" answer and a computer is always more precise and less fallible than a human. Statistical analysis is more like good detective work. Smart and honest people will often disagree about what the data are trying to tell us. But who says that everyone using statistics is smart or honest? The reality is that you can lie with statistics. Or you can make inadvertent errors. In either case, the mathematical rpecision attached to statistical analysis can dress up some serious nonsense.
So, what's the point of learning statistics?
--> To summarize huge quantities of data
--> To make better decision
--> To answer important social questions
--> To recognize patterns that can refine how we do everything from selling diapers to catching criminals
--> To catch cheaters and prosecute criminals
--> To evaluate the effectiveness of policies, programs, drugs, medical procedures, and other innovations
--> To spot the scoundrels who use these very same powerful tools for nefarious ends

Chapter 2. Descriptive Statistics

Descriptive statistics are often used to compare two figures or quantities. I'm one inch taller than my brother; today's temperature is nine degrees above the historical average for this date; and so on. Those comparisons make sense because most of us recognize the scale of the units involved. But suppose I told you that Granola Cereal A contains 31 milligrams more sodium than Granola Cereal B. Unless you know an awful lot about sodium (and the serving sizes for granola cereal), that statement is not going to be particularly informative. Or what if I gold you that my cousin Al earned $53,000 less this year than last year? Should we be worried about Al? Or is he a hedge fund manager for whom $53,000 is a rounding error in his annual compensation?
In both the sodium and the income examples, we're missing context. The easiest way to give meaning to these relative comparisons is by using percentages. It would mean something if I told you that Granola Bar A has 50% more sodium than Granola Bar B, or that Uncle Al's income fell 47% last year. Measuring change as a percentage gives us some sense of scale.

Chapter 3. Deceptive Description

To anyone who has ever contemplated dating, the phrase "he's got a great personality" usually sets off alarm bells, not because the description is necessarily wrong, but for what it may not reveal, such as the fact that the guy has a prison record or that his divorce is "not entirely final". We don't doubt that this guy has a great personality; we are wary that a true statement, the great personality, is being used to mask or obscure other information in a way that is seriously misleading (assuming that most of us would prefer not to date ex-felons who are still married). The statement is not a lie per se, meaning that is wouldn't get you convicted of perjury, but it still could be so inaccurate as to be untruthful.
And so it is with statistics. Although the field of statistics is rooted in mathematics, and mathematics is exact, the use of statistics to describe complex phenomena is not exact. That leaves plenty of room for shading the truth. Mark Twain famously remarked that there are three kinds of lies: lies, damned lies, and statistics. Most phenomena that we care about can be described in multiple ways. The description statistics that we choose to use (or not to use) will have a profound impact on the impression that we leave. Someone with nefarious motives can use perfectly good facts and figures to support entirely disputable or illegitimate conclusions.
We ought to begin with the crucial distinction between "precision" and "accuracy". These words are not interchangeable. Precision reflects the exactitude with which we can express something. In a description of the length of your commute." 41.6 miles' is more precise than "about 40 miles," which is more precise than "a long f---ing way." If you ask me how far it is to the nearest gas station, and I tell you that it's 1.265 miles to the east, that's a precise answer. Here is the problem: that answer may be entirely inaccurate if the gas station happens to be in the other direction. On the other hand, if I tell you, "Drive ten minutes or so until you see a hot dog stand. The gas station will be a couple hundred yards after that on the right. If you pass the Hooters, you've gone too far," my answer is less precise than "1.265 miles to the east" but significantly better because I am sending you in the direction of the gas station. Accuracy is a measure of whether a figure is broadly consistent with the truth --hence the danger of confusing precision with accuracy. If an answer is accurate, then more precision is usually better. But no amount of precision can make up for inaccuracy.
In fact, precision can mask inaccuracy by giving us a false sense of certainty, either inadvertently or quite deliberately.
I learned the important distinction between precision and accuracy.
Many of the Wall Street risk management models prior to the 2008 financial crisis were quite precise. The concept of "value at risk" allowed firms to quantify with precision the amount of the firm's capital that could be lost under different scenarios. The math was complex and arcane. The answers it produced were reassuringly precise. But the assumptions about what might happen to global markets that were embedded in the models were just plain wrong, making the conclusions wholly inaccurate in ways that destabilized not only Wall Street but the entire global economy.
Even the most precise and accurate descriptive statistics can suffer from a more fundamental problem: a lack of clarity over what exactly we are trying to define, describe or explain. Statistical arguments have much in common with bad marriages; the disputants often talk past one another.

Search This Blog

LookUp

Naked Statistics: Stripping The Dread From The Data

Comments

Post a Comment

Popular Posts

Why Nations Fail: The Virtuous Circle vs. The Vicious Circle

Factfulness: Ten Reasons We're Wrong About The World - And Why Things Are Better Than You Think