As scientists, we are supposed to be objective and unbiased. We are trained to use sound scientific methods and experimental design to let the data speak for itself. By doing this, we remove our preconceptions and biases from the equation, because as human beings, we are both subjective and biased. Early this summer, there was quite a buzz about an article published in PLoS Biology (1) that debunked a paper (and prize-winning book) by Stephen Jay Gould (1941–2002) that many of us probably read at some point in our academic training as scientists.
Stephen Gould was a well-known evolutionary biologist, paleontologist and science historian who (along with Niles Eldredge) is known for the theory of punctuated equilibrium. He was a prolific writer, and his popular science essays and best-selling books have been credited with increasing both public interest in and understanding of science. In his paper, “Morton’s ranking of races by cranial capacity: unconscious manipulation of data may be a scientific norm” (2), and the book that followed, The Mismeasure of Man (3), Gould used Samuel Morton (1799–1851) as a case study to support his argument that “unconscious manipulation of data may be a scientific norm” because “scientists are human beings rooted in cultural contexts, not automatons directed toward external truth.”
Morton was a 19th-century physician and physical anthropologist who was famous for his detailed measurements of nearly 1,000 human skulls from all over the world. At the time he took his measurements, Morton focused on cranial capacity, the skeletal equivalent of brain size, in hopes of determining whether the different human populations were one species resulting from one (monogenesis) event or separate species arising from several (polygenesis) events. Although this question seems archaic and fraught with bigotry now, it was a major debate during the pre-Darwinian era of science in which Morton lived. In fact, Morton’s approach of objectively gathering data by measuring large numbers of specimens was groundbreaking in his day. Morton’s results ranked the populations (by cranial capacity) in the order of Caucasians/“Malays”/blacks/“Mongolians”/Native Americans.
Gould took issue with what he inferred to be Morton’s equation of cranial capacity with intelligence, and he used his case study of Morton’s work to support his hypothesis that unconscious “finagling” or doctoring of data is common and unavoidable in science, a “profession that awards status and power for clean and unambiguous discovery”. In both his Science paper (2) and his book (3), Gould contended that Morton held an a priori bias toward elevating Caucasians above the other populations. To this end, Gould charged, Morton had selectively reported data, manipulated sample composition, mismeasured skulls, and made and ignored analytical errors, all so that the results would support his (Morton’s) views on intelligence (i.e., cranial capacity) and differences in human populations. In fact, according to Gould’s analysis, there were only trivial differences between the populations Morton had measured. Virtually overnight, Samuel Morton became the poster child of scientific misconduct and an often-cited example of how scientists are vulnerable to their own biases.
And so the story went for 30 years.
The story would have continued thus were it not for a group of anthropologists who set about reassessing Morton’s results and Gould’s analysis of them. The team located and remeasured almost half of the skulls Morton originally measured. This is something Gould never did. His arguments were based solely on reanalyzing Morton’s measurements, which this team did as well. Then they turned their attention to Gould’s analysis and subsequent arguments, and that is where things get interesting.
Means and Bias
Gould claimed in his Science paper that Morton had selectively reported his data. “It is intriguing that Morton often reported Caucasian means by subsamples, which permitted him to assert the superiority of Teutons and Anglo-Saxons. But he never broke down the Indian mean.… Thus, the fact that some Indian subsamples (Iroquois at 91.5 in³, N = 4) exceeded the mean for Americans of Anglo-Saxon stock remained hidden in his raw data…” (2). Unfortunately, this often-quoted claim is false. Morton did report “Indian” subsample means; he did so at least 12 times in Crania Americana (4), the publication Gould was referring to. These subsample means did include the Iroquois numbers. Gould also claimed that Morton’s Native American average capacity was artificially decreased by using a straight mean (the average of every specimen in the entire sample) rather than a grouped mean (calculating the average of each subpopulation and then taking the mean of those means). A straight mean, Gould contended, would allow the numbers to be skewed by the differences in sample sizes of “large headed” versus “small headed” populations. However, if Morton had done his calculations the way Gould contended he should have, it would have resulted in a slight decrease in the Native American average (79.9 in³ vs. 80.2 in³).
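To make the difference between the two averaging methods concrete, here is a minimal sketch in Python. The numbers are invented for illustration and are not Morton's measurements; the group names and the function names `straight_mean` and `grouped_mean` are likewise my own labels, not anything from the papers discussed.

```python
from statistics import mean

# Hypothetical cranial capacities in cubic inches for two invented
# subpopulations. These are NOT Morton's data; they only show the arithmetic.
subsamples = {
    "group_a": [76.0, 78.0, 77.0, 79.0, 75.0, 77.0],  # many skulls, smaller values
    "group_b": [90.0, 92.0],                          # few skulls, larger values
}


def straight_mean(groups):
    """Average every individual specimen, ignoring which group it came from."""
    return mean(value for values in groups.values() for value in values)


def grouped_mean(groups):
    """Average each subpopulation first, then take the mean of those means."""
    return mean(mean(values) for values in groups.values())


print(straight_mean(subsamples))  # each skull counts once, so large groups dominate
print(grouped_mean(subsamples))   # each group counts once, whatever its size
```

Which way the two results differ depends entirely on how the sample sizes line up with the values. In this invented sample the grouped mean comes out higher because the small group happens to have the larger skulls; for Morton's Native American crania, the grouped mean would reportedly have been slightly lower.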
Leaving aside which calculation method would be best, Morton clearly did not select his method to skew results toward his supposed bias. Yet when Gould reanalyzed Morton’s numbers, he calculated a higher average for the Native American skulls (83.8 in³ vs. 79.9 in³). How did he get this number? Well, Gould only used population samples with an n greater than 4, and then erroneously excluded 6 crania, all with small cranial capacities. Further, Gould only included skulls that Morton had measured both with mustard seed (his early measuring method) and with lead shot (his later method, which he adopted to eliminate the variation that using seed might introduce). Interestingly, the authors point out, Gould did not use these same criteria when reanalyzing other populations.
Seeds and Shot
In his book, Gould speculated how Morton may have biased his seed measurements by loosely or tightly packing the seed into the skulls (3). Gould based his claims that Morton had mismeasured on his comparisons of Morton’s seed-based and lead shot-based measurements. In his reconstruction, Gould claimed that the average capacity for different groups had different increases when going from seed- to shot-based measurements, with the Caucasian skulls seeing the smallest increase. This led him to suspect a problem with the original seed-based measurements, and was his evidence for his famous “plausible scenario”.
The problem with Gould’s approach was that Morton reported individual seed-based measurements only in his volume Crania Americana (4), and these were only for Native American crania. Gould reported an average increase for these crania of 2.2 in³. When the authors looked at the individual numbers and not just the average, they found that there were both increases and decreases, and these changes did not appear to be patterned by group; one skull in a subpopulation increased by 12 in³ and another in the same subpopulation decreased by 5.5 in³. This casts doubt on the idea that the mismeasurements were a result of bias. Since the only individual seed-based measurements Morton reported were for the Native American subpopulations, how did Gould arrive at his claims about the changes in other populations? Well, these authors contend, he must have done so by “guessing” which skulls had been included.
Morton himself acknowledged the likelihood of errors in the seed-based measurements. Some of the measurements in Crania Americana were done by an assistant, and Morton later found that this person had made errors. He stated as much in his publication Catalogue of skulls of man and the inferior animals, Third Edition (5).
Typo or Bias?
In the final table of Morton’s Crania Americana, the Native American mean cranial capacity was erroneously reported as 82.4 in³ rather than 80.2 in³. In this error, Gould saw Morton’s deliberate attempt to maintain his scale of Caucasian/Native American/Blacks. However, the correct value is given in the text, so a typographical error in the table seems likely. In addition, the authors found reports of copies of Crania Americana inscribed by Morton with the number corrected, and later reproductions of the table also contain the corrected value. This suggests that the error was recognized and corrected. Finally, the overall order of mean cranial capacity didn’t change using either number, effectively removing Morton’s supposed motivation for allowing the error to go uncorrected.
Of all the accusations Gould leveled against Morton, the authors of this study found only two to be substantiated. First, there were several errors in the summary table of Morton’s final catalog published in 1849. However, counter to Gould’s arguments, the authors found that had Morton not made these errors, the numbers would have actually supported his presumed bias better than the published numbers did.
Second, Morton undoubtedly believed in the idea of different races. This belief is clear in the opening pages of his Crania Americana, and Morton made no effort to hide it. Yet despite his bias, the authors found that Morton’s measurements are reliable and fully reported.
Ironically, it seems that it was Gould’s analysis that was flawed and influenced by his biases. While the results reported in this study falsify Gould’s hypothesis that Morton manipulated his data, they also lend support to his broader hypothesis that “Unconscious or dimly perceived finagling is probably endemic in science”, as his own analysis of Morton is a strong example of bias influencing results.
The Warning and the Hope
When I read this paper, my first reaction was to shake my head and chuckle at the irony. My second reaction was sadness. For thirty years science has held up Gould’s analysis of Morton as a warning to new scientists. As this study’s authors point out, we now know that most variation in human populations is within rather than between subpopulations (6,7) and that cranial capacity variation is mostly a function of climate (8). We found Morton’s views on race repugnant. We wanted Gould to be right, and so we didn’t bother to evaluate his arguments critically. As scientists we failed; we liked the results, so we didn’t bother to question them.
Some scientists have railed against Gould for “letting us down”, but were we not all capable of looking at the numbers and checking the facts? As an undergraduate assigned to read Gould’s Science paper as preparation for a discussion on ethics and self-awareness in science, I couldn’t have remeasured Morton’s skulls, but I could have checked Gould’s calculations. I could have found Morton’s original results and checked Gould’s claims. I didn’t have to accept the results of the paper just because my professor assigned it. I was being trained to think critically, and I didn’t.
My third reaction was relief. Despite the occurrences of scientific misconduct and fraud that we hear about, the final message of this paper is that the scientific method, when properly applied, is sound. Morton, despite his biases, used methods that kept his biases from influencing his data. As scientists we cannot be “automatons”, as Gould pointed out, but we can teach and use sound scientific methods that will shield the outcome of our work from our inevitable biases. Finally, we owe it to ourselves and our colleagues to turn an equally critical eye to results we like as to those we don’t.
1. Lewis, J.E., DeGusta, D., Meyer, M.R., Monge, J.M., Mann, A.E., Holloway, R.L. (2011) The mismeasure of science: Stephen Jay Gould versus Samuel George Morton on skulls and bias. PLoS Biol 9(6). PMID: 21666803.
2. Gould, S.J. (1978) Morton’s ranking of races by cranial capacity: unconscious manipulation of data may be a scientific norm. Science 200, 503–509.
3. Gould, S.J. (1981) The mismeasure of man. New York: W. W. Norton and Company.
4. Morton, S.G. (1839) Crania Americana; or, a comparative view of the skulls of various aboriginal nations of North and South America: to which is prefixed an essay on the varieties of the human species. Philadelphia: J. Dobson.
5. Morton, S.G. (1849) Catalogue of skulls of man and the inferior animals, Third Edition. Philadelphia: Merrihew and Thomson Printers.
6. Brace, C.L. (2005) “Race” is a four-letter word: the genesis of the concept. New York: Oxford University Press.
7. Cartmill, M. (1998) The status of the race concept in physical anthropology. Am Anthropol 100, 651–660.
8. Beals, K.L., Smith, C.L., Dodd, S.M. (1984) Brain size, cranial morphology, climate, and time machines. Curr Anthropol 25, 301–330.