Accuracy

The median is more resistant to extreme, misleading data values, so it would seem to be the clear choice. However, we also need to consider accuracy. Is the median or the mean more likely to be close to the true value?

To evaluate the relative accuracy of the median and the mean, let's consider how they perform when we know the true center of the data. Suppose that the only possible scores are the whole numbers from 0 to 100, inclusive.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100

The center of these 101 numbers, whether we use the median or the mean, is 50. What if we were to select five numbers randomly from this set of 101 and calculate the median and mean of those five numbers? Would the median or the mean be closer to what we know is the true value of 50? Suppose the five scores we selected randomly were

38 40 50 67 88

In this case, the median is 50, right on the true center, and the mean is 56.6, above the true center. So in this instance, the median would be the more accurate estimate of the true center. But we can't be sure whether this one case is itself typical or a quirk. The graph below will allow you to take many different random sets of five scores and determine whether the median or the mean is more accurate.
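The arithmetic for this one sample can be checked with a short Python sketch. It uses only the standard `statistics` module; the variable names are illustrative and not part of the original text.

```python
import statistics

# The five randomly selected scores from the example above.
sample = [38, 40, 50, 67, 88]
true_center = 50

sample_median = statistics.median(sample)  # middle value of the sorted scores
sample_mean = statistics.mean(sample)      # sum of the scores divided by 5

# Distance of each estimate from the known true center.
median_error = abs(sample_median - true_center)
mean_error = abs(sample_mean - true_center)

print("median:", sample_median, "error:", median_error)
print("mean:", round(sample_mean, 1), "error:", round(mean_error, 1))
```

For this particular sample the median has the smaller error, which matches the comparison in the text.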

 
In the graph, the height of each bar indicates the proportion of samples in which the median or the mean, respectively, was the more accurate estimate (i.e., closer to 50).
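The repeated-sampling experiment can also be approximated in code. The following is a minimal sketch, not the program behind the graph: it assumes the five scores are drawn without replacement, counts ties separately, and uses a hypothetical function name `compare_estimates` with an arbitrary number of trials and a fixed seed for repeatability.

```python
import random
import statistics

def compare_estimates(trials=10_000, sample_size=5, seed=0):
    """Return the proportion of samples in which the median, the mean,
    or neither (a tie) was closer to the true center of 50."""
    rng = random.Random(seed)      # fixed seed so runs are repeatable
    population = range(101)        # the whole numbers 0 through 100
    true_center = 50
    counts = {"median": 0, "mean": 0, "tie": 0}

    for _ in range(trials):
        # Assumption: scores are drawn without replacement.
        sample = rng.sample(population, sample_size)
        median_error = abs(statistics.median(sample) - true_center)
        mean_error = abs(statistics.mean(sample) - true_center)
        if median_error < mean_error:
            counts["median"] += 1
        elif mean_error < median_error:
            counts["mean"] += 1
        else:
            counts["tie"] += 1

    # Convert counts to proportions, like the bar heights in the graph.
    return {name: count / trials for name, count in counts.items()}

if __name__ == "__main__":
    for name, proportion in compare_estimates().items():
        print(f"{name}: {proportion:.3f}")
```

The exact proportions vary with the seed and the number of trials, so they are best read as rough estimates rather than fixed values.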
 
It is not that surprising that the mean is more often closer to the true center than the median. The median can sometimes be inaccurate because it does not use very much of the information available in the data. While the mean is the average of all the data values, the median is the "average" of just the one or two observations in the very middle of the data. Thus, in situations where there are no extreme observations, the median is often less accurate than the mean.
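A small sketch may help show how little of the data the median uses. The second and third lists below are made-up illustrations, not data from the text: the second shares only its middle value with the earlier sample, and the third drops one score to give an even count.

```python
import statistics

original = [38, 40, 50, 67, 88]     # the sample used earlier
same_middle = [48, 49, 50, 51, 52]  # hypothetical: only the middle value is shared
even_count = [38, 40, 50, 67]       # hypothetical: four scores, so two middle values

# With an odd number of scores, the median is the single middle value,
# so both five-score lists give the same median despite different means.
median_original = statistics.median(original)
median_same_middle = statistics.median(same_middle)
mean_original = statistics.mean(original)
mean_same_middle = statistics.mean(same_middle)

# With an even number of scores, the median averages the two middle values.
median_even = statistics.median(even_count)

print(median_original, median_same_middle, median_even)
print(round(mean_original, 1), mean_same_middle)
```

The two five-score lists have the same median because only the middle observation matters, while their means differ because the mean responds to every value.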

© 1999, Duxbury Press.