*Time for another piece from my in-the-making dissertation, this time on the blackboxing of statistical measurements. It took more time than I expected, mostly because I had to struggle with statistics :D. Comments by statisticians are most appreciated (in Swedish or English). Here we go!*

**The black boxes of quantification and statistics**

So far we have looked very directly at how blackboxing takes place, how black boxes communicate via interfaces, and how black boxes and interfaces are combined in epistemic assemblages. There is, however, another riddle that has to be solved, namely that of quantification and statistics. Once more I shall give an example from the Sociology of Scientific Knowledge, more precisely from a work by Donald MacKenzie. I will also consider Latour’s notion of centers of calculation, the history of statistics and its epistemic status, and then move on to some more recent research in the sociology of quantification.

**Pearson and Yule – A statistical controversy**

In his article *Statistical Theory and Social Interests* (1978), Donald MacKenzie analyzes a controversy, and an emerging breakthrough in statistical methods, that took place during the first one and a half decades of the 20th century between Karl Pearson and Udny Yule, both regarded today as pioneering statisticians.

The controversy between Pearson and Yule concerned how to measure association between variables on a nominal scale. Pearson had in 1905 suggested the tetrachoric coefficient as a way to quantify association on nominal scales, something which Yule criticized openly over several years (1). MacKenzie interprets this controversy through an analysis of the two men’s differing social interests:

/…/ Pearson’s commitment to eugenics played a vital part in motivating his work in statistical theory. Pearson’s eugenically-oriented research programme was one in which the theories of regression, correlation and association played an important part /…/ Regression was originally a means of summing up how the expected characteristics of an offspring depended on those of its parents; the bivariate normal distribution was first constructed by [Francis] Galton in an investigation of the joint distribution of parental and offspring characteristics. (MacKenzie 1978: 53)

MacKenzie’s point is that advances in statistics, even though regarded as esoteric and mathematically ‘disembodied’, are guided and influenced by a set of social and cognitive interests that orient the goals and directions of what to develop and what to disregard. Early 20th century statistics in Britain was thus, at least partially, influenced by needs within eugenics and population control. In Britain, at the time, eugenics and ‘national efficiency’ were regarded as legitimate political options, and were even discussed in government departments. Yule, on the contrary, had no affection for eugenics, and instead argued that heredity was a largely unimportant factor in comparison with environmental ones (MacKenzie 1978: 58-59).

What we have is thus a classical social explanation of how statistics develops in line with the needs defined by group interests (such as those of the eugenics movement) and larger social interests (for example state governance). What MacKenzie pays less attention to is what happens next:

Contemporary statistical opinion takes a pluralistic view of the measurement of association, denying that any one coefficient has unique validity /…/ Yule’s Q remains a popular coefficient, especially amongst sociologists. Pearson’s tetrachoric coefficient, on the other hand, has almost disappeared from use except in psychometric work. (MacKenzie 1978: 65)

I am not in a position to evaluate whether this is valid or not for statistics in general. What I do find necessary, on the other hand, is to think about the dispersion, usage and effects of statistical methods in terms of blackboxing. Back in the early 20th century, many of the core statistical measurements used today in the social sciences were developed, for example the chi-square test, Pearson’s *r*, advances in correlation and regression by Galton, etc. (see Hacking 1990: 180-188).
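To make the object of the controversy a bit more concrete, here is a minimal sketch of Yule’s Q for a 2×2 table, using the standard formula Q = (ad − bc) / (ad + bc). The function name `yules_q` and the cell counts are my own invention for illustration, not data from MacKenzie or Yule. Pearson’s tetrachoric coefficient is left out on purpose: it assumes an underlying bivariate normal distribution and requires numerical estimation, which is part of why the two coefficients can disagree on the same table.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table of counts laid out as [[a, b], [c, d]].

    Q = (a*d - b*c) / (a*d + b*c), ranging from -1 to +1.
    """
    concordant = a * d   # product of the diagonal cells
    discordant = b * c   # product of the off-diagonal cells
    if concordant + discordant == 0:
        raise ValueError("Q is undefined when a*d + b*c == 0")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical counts, invented purely for illustration.
print(yules_q(30, 10, 10, 30))  # 0.8
```

Note that Q needs nothing more than the four cell counts, which is one way of seeing how easily such a coefficient can travel away from the setting in which it was first proposed.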

Just as in the case of the Michelson-Morley experiments, statistical methods that are now deprecated may very well come to reinforce or weaken decisions about which black boxes to open and which ones to leave closed. A statistical method may be blackboxed, taken out of its context of discovery, and applied widely. Or it may be broken, considered obsolete, or just veiled in historical darkness for other reasons, perhaps only to emerge in the detailed archives of the historian of science.

An example of a ‘successful’ blackboxing is the Pearson *r*. In a textbook on social scientific methods, which is written by Gothenburg researchers close to the SOM-institute, and taught in many social science classes locally, an interesting passage appears:

The calculation of Pearson’s *r* is complicated to say the least /…/ Even though it can be useful to on some occasion make the calculations yourself /…/ – not long ago the researchers had to employ assistants to be able to do these calculations at all /…/ it is of course [today] the computers that calculate Pearson’s r for us.

(Esaiasson et al. 2002: 392)

The Pearson product-moment correlation coefficient (*r*) demands time-consuming work and considerable mathematical skill to calculate manually. The first moment of delegation meant involving more humans (assistants) to do this work. And finally, we today have computers doing the same work in milliseconds. A statistician, or social scientist for that matter, must of course be able to master the usage and interpretation of the output of the computer, but in routine work, he or she is able to forget the assistants and hard work it once took to use this statistical tool.
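To show what has been delegated, here is a minimal sketch of the calculation that assistants once did by hand, written out step by step from the textbook definition of *r* (the sum of products of deviations, divided by the square root of the product of the summed squared deviations). The function name `pearson_r` and the numbers are made up for illustration. In everyday work one would call a statistics package instead, which is exactly the blackboxing discussed here.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation, computed from its definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Invented example values, not taken from any survey.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 4))  # 0.7746
```

With five observations this is manageable on paper; with a survey of several thousand respondents and dozens of variables, it is easy to see why assistants, and later computers, were brought in.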

Thus it is possible to conclude that statistical measurements, developed in the context of the British eugenics movement, can be dislodged from their context of discovery through blackboxing, and find their way into the software packages that are used today in statistical calculation, as standardized measurements and tests to evaluate the quality of survey data. Now, this de-contextualization not only means that it is possible to forget the tedious work that had to be done before computers. It also means that it would be absurd to accuse someone calculating Pearson’s *r* of being a follower of eugenics, just as it is absurd to accuse someone of militarism for using the internet, just because the internet was originally constructed as a military computer network. For statistics, Latour’s actualist principle still applies: the fate of facts and machines is in later users’ hands, and their qualities are a consequence of collective action (Latour 1987: 29, see also the above sections on Latour).

But not only are statistics blackboxed as they are assembled as research practices. They also function as interfaces that are able to translate research results into comprehensible facts. The time is ripe to go further along this line, to investigate how the modern state in particular has requested such scientific information.

*Footnotes:*

1. The controversy is much more elaborate than this. To save space, I refer the reader to MacKenzie’s article, which should be read in its entirety.
