[This article originally appeared in the May 2001 issue of Northwest Runner magazine.]
The premise of this column is that one's training and eating habits should be guided in part by a knowledge of exercise science research. Sounds pretty straightforward, right? Yet finding out "what the research says" is often anything but simple. For every well-designed study, there's at least one stinker . . . and for every well-informed coach, there are several whose grasp of the scientific literature is rather tenuous. Separating the good information from the not-so-good can thus be a daunting task.
Daunting, yes, but not impossible. When someone claims that a particular idea is "supported by research," you don't have to simply accept the claim at face value. Instead, you can evaluate the validity of the claim by asking pointed questions such as those listed below. These questions are accompanied by examples of how scientific misunderstandings can and do occur. The point of the examples is not to single anyone out as being a bad scientist or a bad coach, but rather to illustrate how a healthy dose of skepticism can be useful in sorting through training-related information.
1. How many people were studied? Small studies are problematic for two reasons. First, it can be hard to spot trends when you have data on only a few individuals. Second, even when there does appear to be a clear pattern, you can't be sure that it will hold up in a larger group of people. The most extreme cases of the small-sample problem are investigations in which only one person was tested. While reports such as "Running with Jim Ryun: a five-year study" (Daniels, Physician and Sportsmedicine 2(9): 63-7, 1974) and "Following Steve Scott: physiological changes accompanying training" (Conley et al., Physician and Sportsmedicine 12(1): 103-7, 1984) are interesting case studies, their focus on isolated individuals keeps them from being particularly useful to coaches.
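To get a feel for just how shaky small samples are, here is a rough sketch using made-up numbers: a hypothetical population of runners whose 5K times improve by 10 seconds on average after some training program, give or take 20 seconds from runner to runner.

```python
# Rough sketch with invented numbers: how much does the estimated average
# improvement bounce around when we sample only a few runners at a time?
import random

random.seed(1)

def average_improvement(n_runners):
    """Mean improvement (seconds) in a random sample of n runners, drawn from a
    population whose true average improvement is 10 s with a 20 s spread."""
    return sum(random.gauss(10, 20) for _ in range(n_runners)) / n_runners

# Five repeated "studies" of 3 runners each, then five studies of 50 runners each.
print([round(average_improvement(3), 1) for _ in range(5)])   # wildly variable
print([round(average_improvement(50), 1) for _ in range(5)])  # close to the true 10 s
```

With only three runners per study, the estimated average swings all over the place, sometimes even suggesting the training made people slower; with fifty runners it settles down near the true value.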
2. Were the people studied similar to the people you care about? The results of a study may vary according to the training status, dietary habits, age, gender, etc. of the people being studied. Therefore, there are limits to what you can learn from a study of people who are quite different from the athletes you advise. An example of this limitation is provided by the not-so-classic paper "Effects of sexual intercourse on maximal aerobic power, oxygen pulse, and double product in male sedentary subjects" (Boone & Gilmore, Journal of Sports Medicine and Physical Fitness 35: 214-7, 1995). This study showed that having sex the night before a treadmill test is not detrimental to the running performance of couch potatoes. Here's the problem: nobody cares whether sex affects exercise performance in people who don't normally exercise. What we care about is whether sex affects the performance of trained athletes, i.e., people who actually run races (and who thus need to know whether pre-race love-making is a bad idea). Boone & Gilmore made an earnest attempt to address an interesting question; they just studied the wrong people.
3. Were the people studied divided into well-matched groups? Consider the following: a cross-country running team is divided into three groups. Endurance training is assigned to one group, interval training to the second group, and a combination of endurance work and intervals to the third group. The "combination" runners are slower than the other runners, but they show the greatest improvement during the 12-week training period. Does this mean that combination training is best? Or does it simply mean that runners who are initially out of shape tend to improve more than runners who are already fairly fit? We have no way of knowing, and consequently this study -- an obscure University of Washington master's thesis from the '60s -- deserves the obscurity that it has achieved. In interventional studies such as this one, it is usually important to ensure that the groups being compared are as similar as possible at the outset of the investigation.
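To make the problem concrete, here is a deliberately oversimplified model (my own invention, not anything taken from the thesis) in which every runner responds identically to training, closing 30% of the gap between his or her current 5K time and a "fully fit" 5K time.

```python
# Made-up model: every runner, regardless of training method, closes 30% of the
# gap between their current 5K time and their fully fit 5K time.

def improvement_s(current_time_s, fit_time_s, fraction_closed=0.30):
    """Seconds of improvement predicted by this toy model."""
    return fraction_closed * (current_time_s - fit_time_s)

# The "combination" group happens to start out less fit than the interval group.
combo_group    = improvement_s(current_time_s=1200, fit_time_s=1050)  # 45 s
interval_group = improvement_s(current_time_s=1100, fit_time_s=1050)  # 15 s

print(combo_group, interval_group)
```

Even though this model gives every runner exactly the same response to training, the group that starts out less fit improves three times as much -- precisely the kind of mirage that well-matched groups are meant to prevent.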
4. Did groups that were intended to be different actually differ from each other? The importance of dividing subjects into roughly equivalent groups was just noted. Once these groups have been established, however, they are usually treated differently; for example, one group might receive an experimental drug while another group gets a placebo. But what if both groups received the drug, or both received the placebo? Believe it or not, this sort of thing does happen. In a study of stretching, warming up, cooling down, and injury prevention (van Mechelen et al., American Journal of Sports Medicine 21: 711-9, 1993), an "intervention" group was given information on proper stretching/warm-up/cool-down techniques, whereas the control group was not. Nevertheless, members of the control group wound up stretching, warming up, and cooling down about as often as the members of the intervention group. Not surprisingly, injury rates in the two groups were about the same. While this study wasn't as bad as I've made it out to be, it obviously sheds little light on the issue of whether stretching, warming up, and cooling down can help prevent injuries.
5. Did the experimenters make the measurements needed to test their hypothesis? A paper by St. Clair Gibson et al. (American Journal of Physiology 281: R187-96, 2001) hypothesizes that, in long-distance cycling events, cyclists use less than 20% of their quadriceps muscle fibers at any one time. The basis for this hypothesis is their electromyography (EMG) measurements, which show that the electrical activity in the quadriceps during a 100-kilometer time trial was only about 10-20% of the activity generated by the quads during a brief all-out knee extension test. Interesting. But what do these whole-muscle measurements tell us about the individual muscle fibers? Not a whole heck of a lot. Perhaps the time-trialing cyclists were only using a small subset of their fibers, as St. Clair Gibson et al. contended. Or perhaps nearly all of the fibers were active, with each fiber simply receiving a submaximal level of nerve input (for example, 10 impulses per second as compared to, say, 60 per second during the knee extension test). Unfortunately, the measurements that were made do not allow us to distinguish between these possibilities.
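To see why the whole-muscle signal is ambiguous, consider a toy calculation (my own simplification, not the authors' model) in which EMG amplitude is assumed to be roughly proportional to the fraction of fibers that are active multiplied by their relative firing rate.

```python
# Toy illustration (not the authors' model): two very different recruitment
# patterns can produce the same whole-muscle EMG reading.

def relative_emg(fraction_of_fibers_active, firing_rate_hz, max_rate_hz=60.0):
    """Crude stand-in for EMG amplitude, assumed proportional to the fraction of
    fibers active times their firing rate relative to the maximal rate."""
    return fraction_of_fibers_active * (firing_rate_hz / max_rate_hz)

few_fibers_fast = relative_emg(0.20, 60.0)  # 20% of fibers firing flat-out
all_fibers_slow = relative_emg(1.00, 12.0)  # all fibers firing at one-fifth the max rate

print(few_fibers_fast, all_fibers_slow)     # both 0.2 -- indistinguishable from EMG alone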
The work of Chapman et al. (Journal of Applied Physiology 85: 1448-56, 1998), on the other hand, illustrates how a complete set of careful measurements can lead to important new insights. This study examined the changes in runners' blood that occur when they live at high altitude. Living at altitude generally causes an increase in one's red blood cell count and thus one's hematocrit -- the fraction of the blood that is actually red blood cells, as opposed to plasma. An increased hematocrit is therefore often assumed to be an indication of an increased red-blood-cell count. As expected, the runners studied by Chapman et al. responded to a high-altitude sojourn by increasing their hematocrit. However, only some of the runners got faster as a result of their excursion; i.e., the changes in hematocrit per se did not account for the changes in performance. Fortunately, Chapman et al. had very detailed data on each athlete's blood and were able to make some sense of this person-to-person variability. They found that, among the runners whose times improved after going to altitude, the increase in hematocrit was due in part to an increased number of red blood cells. However, among the runners who didn't improve, the increase in hematocrit was generally due to dehydration. (In other words, these runners' blood was thicker not because of an increased number of red blood cells but rather because of a decreased plasma volume -- an undesirable condition that would not be expected to aid performance.) Thus Chapman et al. were able to account for differences in athlete performances because they had painstakingly collected a huge set of relevant data. This would not have been possible if they had only measured hematocrit.
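A little arithmetic (with hypothetical numbers, not Chapman et al.'s actual data) shows how two very different changes in the blood can produce essentially the same rise in hematocrit.

```python
# Hypothetical numbers: two different routes to the same post-altitude hematocrit.

def hematocrit(red_cell_volume_l, plasma_volume_l):
    """Hematocrit = red cell volume as a fraction of total blood volume."""
    return red_cell_volume_l / (red_cell_volume_l + plasma_volume_l)

baseline       = hematocrit(2.0, 3.0)    # 40.0% before going to altitude
more_red_cells = hematocrit(2.25, 3.0)   # ~42.9% -- genuine gain in red blood cells
dehydrated     = hematocrit(2.0, 2.67)   # ~42.8% -- plasma loss only

print(f"{baseline:.1%}  {more_red_cells:.1%}  {dehydrated:.1%}")
```

Both runners' hematocrits climb from 40% to roughly 43%, but only the first has actually gained red blood cells; measuring hematocrit alone would lump the two together.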