How to Lie and Cheat with Statistics

Ok, this is what you have been waiting for. How can you lie with statistics? Actually, the purpose of this page is NOT to teach you how to lie and cheat with statistics. Rather, I hope you will learn how it is possible to be misled and how to spot "statistical abuse." You can find poor use of statistics everywhere: magazines, newspapers, polls, TV, even research papers. I do not want to hear of any of you readers using these poor methods.

## The Average Switcheroo

Which average (mean, median, or mode) should be used to report the results of an experiment or survey? All three types of averages describe the data truthfully. However, depending on the data, the mean, median and mode can be very different from one another.

Here is an example: suppose you asked 7 people how much money they brought to school. Here are the answers:

| Person | Money (\$) |
|--------|-----------|
| John   | 2         |
| Ann    | 3         |
| Bob    | 1         |
| Mary   | 10        |
| Sue    | 5         |
| Carol  | 2         |
| Ken    | 999       |

What are the mean, median and mode of the amount of money brought to school?

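Mean = \$146 (the total, \$1,022, divided by 7 people)

Median = \$3 (the middle value when the amounts are sorted: 1, 2, 2, 3, 5, 10, 999)

Mode = \$2 (the most common amount, brought by both John and Carol)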

If you use the mean as the average, then it will look like people bring a lot of money to school. However, if you use the median or the mode, it will look like people do not bring much money to school. Each way to describe the numbers is correct. However, because "Ken" brought \$999 to school, the mean is very different from the median and mode. Therefore, when you hear someone say, "The average...", make sure you know which type of average (mean? median? mode?) they are talking about.
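If you want to check these averages yourself, here is a minimal sketch using Python's standard `statistics` module. The variable names are my own choices, and the "without Ken" comparison is an extra illustration that is not part of the survey above.

```python
from statistics import mean, median, mode

# Amounts of money (in dollars) from the table above
money = {"John": 2, "Ann": 3, "Bob": 1, "Mary": 10,
         "Sue": 5, "Carol": 2, "Ken": 999}
amounts = list(money.values())

average_mean = mean(amounts)      # total divided by number of people
average_median = median(amounts)  # middle value after sorting
average_mode = mode(amounts)      # most frequently reported amount

print("mean:", average_mean)
print("median:", average_median)
print("mode:", average_mode)

# Extra illustration: leave out Ken to see how one extreme value
# pulls the mean while barely touching the median and mode.
without_ken = [amount for name, amount in money.items() if name != "Ken"]
mean_without_ken = mean(without_ken)
print("mean without Ken:", round(mean_without_ken, 2))
```

One extreme value is enough to move the mean a long way, which is why it matters which "average" is being reported.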

## The Meaningless Mean

QUESTION: When is a mean meaningless?

ANSWER: When a mean is created from ordinal data.

You will find this trick in many places. Unfortunately, this error can sometimes be found even in research papers.

If you remember from the page on scales, ordinal data can be ranked, but nothing can be said about differences between numbers. Let's use the hot pepper example again. A hot pepper is scored as a "1", a hotter pepper is scored as a "2" and the hottest pepper is scored as a "3." Let's say you wanted to test the hotness of these 3 peppers and gave them to people to taste. Here are the results:

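| Person | Pepper A | Pepper B | Pepper C |
|--------|----------|----------|----------|
| John   | 2        | 1        | 3        |
| Mary   | 1        | 2        | 3        |
| Rob    | 2        | 1        | 3        |
| Sue    | 1        | 2        | 3        |
| Ann    | 2        | 1        | 3        |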

What can be said and what cannot be said about the taste of these peppers? First, all of the people thought that Pepper C was the hottest. It also looks like Peppers A and B tasted about the same. If you took the means of these numbers you would get:

Mean hotness of Pepper A = 1.6
Mean hotness of Pepper B = 1.4
Mean hotness of Pepper C = 3.0

But is this fair? Can you say that Pepper C was about twice as hot as Peppers A and B? Probably not. Here's why. What if Pepper A and Pepper B were not very hot at all, but Pepper C was so hot that you had to drink many glasses of water to cool the taste? The numerical differences between the tastes of these peppers have no meaning. The ranking of the peppers is fine: Pepper C IS hotter than Peppers A and B, but these data give NO indication of HOW MUCH hotter. Therefore, be careful when you read about differences between numbers that come from rankings.

Actually, it is not even correct to create a mean from these data. The mean hotness of one hot pepper (score = 1) and one of the hottest peppers (score = 3) does NOT necessarily give you the score of a hotter pepper (score = 2). It may be that the hottest pepper is 100 times as hot as the least hot pepper. These data just do NOT give you this information. They only give you rankings.
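Here is a small sketch of why means of rankings are slippery. It relabels the hottest score as 100 instead of 3. That relabeling is purely hypothetical (I made it up for illustration), but it keeps every ranking exactly the same. If the means carried real information about "how much hotter," they should not depend on which labels were picked.

```python
from statistics import mean

# Hotness rankings from the table above (1 = hot, 3 = hottest)
ratings = {
    "Pepper A": [2, 1, 2, 1, 2],
    "Pepper B": [1, 2, 1, 2, 1],
    "Pepper C": [3, 3, 3, 3, 3],
}

# Hypothetical relabeling that preserves the order 1 < 2 < 3.
# The value 100 is an arbitrary assumption, not a measurement.
stretched_labels = {1: 1, 2: 2, 3: 100}

original_means = {name: mean(scores) for name, scores in ratings.items()}
stretched_means = {
    name: mean(stretched_labels[s] for s in scores)
    for name, scores in ratings.items()
}

# Order of the peppers by mean score, mildest first
original_order = sorted(ratings, key=original_means.get)
stretched_order = sorted(ratings, key=stretched_means.get)

ratio_original = original_means["Pepper C"] / original_means["Pepper A"]
ratio_stretched = stretched_means["Pepper C"] / stretched_means["Pepper A"]

print("order (original labels): ", original_order)
print("order (stretched labels):", stretched_order)
print("C-to-A ratio, original labels: ", ratio_original)
print("C-to-A ratio, stretched labels:", ratio_stretched)
```

The order of the peppers survives the relabeling, but the "C is about twice as hot as A" ratio does not. Only the order is real information in ordinal data.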

## The Sampling Trick

It is essential that data come from a random sample of the population. If they do not, then the results of the experiment or survey may not be an accurate reflection of the population. This happened in 1936, during one of the U.S. presidential campaigns of Franklin D. Roosevelt (FDR). A widely publicized poll drew much of its sample from lists such as telephone directories. The pollsters predicted that FDR's opponent would win, but FDR actually won the real election. The poll did NOT accurately reflect all of the voters because the opinions of mainly one part of the population (wealthier people, such as those with telephones) were taken into account.

The size of the sample is another important consideration. If you flipped a coin 5 times and it came up "heads" 4 times, would you be correct to say that the coin will land on heads 80% of the time? It did for your sample of 5 flips. But what would happen if you flipped the coin 100 times or 1000 times? Would heads still come up 80% of the time? The number of people or number of trials in an experiment needed to give you an accurate estimate of the population depends on several variables. One important consideration is how much variability there is in the response. If a response has a high degree of variability, then a larger sample will be needed. In general, the larger the sample size, the better the estimation.
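You can try the coin experiment without wearing out your thumb. The sketch below simulates a fair coin with Python's `random` module. The function name `heads_fraction`, the seed value, and the sample sizes are my own assumptions, chosen only for illustration.

```python
import random

def heads_fraction(num_flips, rng):
    """Simulate num_flips tosses of a fair coin; return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

# A fixed seed makes the run repeatable; the value 42 is arbitrary.
rng = random.Random(42)

results = {n: heads_fraction(n, rng) for n in (5, 100, 1000, 100_000)}
for n, fraction in results.items():
    print(f"{n:>7} flips: {fraction:.3f} heads")
```

With only 5 flips, results such as 80% heads are common. As the number of flips grows, the fraction of heads tends to settle near 50%.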

## Games with Graphics

Misusing and abusing graphics are easy ways to mislead people. People like to see graphs for a quick way to evaluate a set of numbers. But BEWARE! Make sure you are not fooled. Let's use pumpkins grown in the gardens of Mary, Joe and Ann. Here is the first graph:

 This graph does not say very much because there is no scale on the y-axis. Does the graph represent the weight, volume, width or height of the pumpkins? It does not say.

Here is a graph that is much better:

 Now we know that the graph refers to the weight of the pumpkins and we know how much each pumpkin weighs because the numbers are given. This is a fair graph.

What if you wanted to convince people that Ann's pumpkin was bigger than Mary's and Joe's pumpkins? Look at this graph:

 The numbers are the same, but the y-axis has been changed. Now it appears that Ann's pumpkin is much bigger than the other two.

What if you wanted to convince people that all the pumpkins were about the same size? Look at this graph:

The numbers are the same, but the y-axis has been changed again. This time the y-axis has been expanded. Now it appears that there is only a small difference in weight.
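To see how much the y-axis alone can change the picture, here is a rough sketch that computes how tall each bar would be drawn for different axis limits. The weights (10, 20 and 40 lbs) are the ones used on this page. The axis limits and the function name `drawn_heights` are my own assumptions, not the limits used in the graphs above.

```python
# Pumpkin weights in pounds
weights = {"Mary": 10, "Joe": 20, "Ann": 40}

def drawn_heights(values, y_min, y_max):
    """Return the fraction of the plot height each bar fills
    when the y-axis runs from y_min to y_max."""
    span = y_max - y_min
    return {name: (value - y_min) / span for name, value in values.items()}

fair = drawn_heights(weights, 0, 40)        # axis starts at zero
truncated = drawn_heights(weights, 9, 40)   # hypothetical: axis starts at 9
stretched = drawn_heights(weights, 0, 400)  # hypothetical: axis runs to 400

for label, heights in (("fair", fair), ("truncated", truncated),
                       ("stretched", stretched)):
    ratio = heights["Ann"] / heights["Mary"]
    print(f"{label:>9}: Ann's bar is {ratio:.1f} times as tall as Mary's; "
          f"tallest bar fills {heights['Ann']:.0%} of the plot")
```

Starting the axis just below the smallest value makes Ann's bar tower over Mary's, while stretching the axis far above the largest value squashes all three bars into the bottom of the plot. The data never change.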

Often a picture is used to represent data. Here is a fair way to show the difference in the weights of the pumpkins using a picture:

 In this graphic, each pumpkin represents 10 lbs. It clearly and fairly shows the difference in the weights of pumpkins from the different gardens.

However, let's try to show that Ann's pumpkin is much bigger than the rest:

This graphic distorts the data. To show the differences in the weights, this picture changes the height of each pumpkin to represent pumpkin weight. Joe's pumpkin (20 lbs.) is twice as high as Mary's (10 lbs.); Ann's (40 lbs.) is twice as high as Joe's (20 lbs.) and four times as high as Mary's (10 lbs.). Is this fair? I think not! The reason is that as the height of each pumpkin was increased, its WIDTH was increased too. Therefore, while the heights are in proportion, the AREAS of the pumpkins are not. Remember the formulas to determine area:

Area of a rectangle = Height × Width
Area of a circle = $\pi r^2$

Doubling both the height and the width multiplies the area by four, so Ann's pumpkin covers 16 times the area of Mary's even though it weighs only 4 times as much. So this picture makes it look like Ann's pumpkin is much larger than Mary's and Joe's. I also used different sized letters for the different pumpkin weights to give the impression that Ann's pumpkin was larger.
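The area problem can be worked out with a few lines of arithmetic. This is a minimal sketch that assumes each picture is scaled by the same factor in height and width. The "fair" scale factor at the end (the square root of the weight ratio) is one possible fix that I am suggesting, not something taken from the graphic above.

```python
import math

# Pumpkin weights in pounds
weights = {"Mary": 10, "Joe": 20, "Ann": 40}
baseline = weights["Mary"]

# How many times heavier each pumpkin is than Mary's
weight_ratio = {name: w / baseline for name, w in weights.items()}

# If height AND width are both scaled by the weight ratio,
# the area on the page grows with the square of that ratio.
area_ratio = {name: ratio ** 2 for name, ratio in weight_ratio.items()}

# Suggested fix (an assumption): scale each side by the square root
# of the weight ratio so that AREA, not height, tracks the weight.
fair_side_scale = {name: math.sqrt(ratio) for name, ratio in weight_ratio.items()}

for name in weights:
    print(f"{name}: {weight_ratio[name]:.0f}x the weight, "
          f"{area_ratio[name]:.0f}x the area, "
          f"fair side scale {fair_side_scale[name]:.2f}")
```

The weight only goes up by a factor of 4 from Mary's pumpkin to Ann's, but the pictured area goes up by a factor of 16.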

## Meaningless Graphics

Newspapers and magazines like to use colorful pictures to represent public opinion and survey responses. However, the pictures are often too simple to give meaningful information. Take this example:

This map shows how much people in different states of the U.S. like pizza. (I just made up these data.) The code for the state colors is:

Red States = People Love Pizza
Yellow States = People Like Pizza
Purple States = People Hate Pizza

That's all the information we have. The map really doesn't say very much. We don't know how it was determined that people like pizza...were people asked if they liked pizza? Were people asked how much pizza they ate in a week? a month? a year? Was the number of pizzas purchased at stores in different states counted? Was the number of pizza restaurants in different states counted?

We also do not know if there are any real differences between how much people like pizza in the different states. How much do people love pizza in California? What is the difference between how much people love pizza in Utah compared to how much they like pizza in Nevada? There are no scales or measurements to indicate any of this information. Although this type of graphic gives almost no information, it is used frequently in many popular magazines.

For more ways to misuse statistics, there are two interesting books:

Darrell Huff, How to Lie with Statistics, W.W. Norton & Co., New York, 1954 (reissued in 1982 and 1993).
Cooper B. Holmes, The Honest Truth About Lying With Statistics, Charles C. Thomas, Springfield, 1990.