How a Mathematician Can Help a Sociologist
Samuel A. Stouffer
University of Chicago
The question asked in the title of this paper would be answered variously by sociologists. Possibly, some sociologists would answer "None." Others, like Professor Lundberg, who are dreaming dreams of the day when the laws of sociology will resemble the laws of mechanics, probably would be far out on the right-hand side of a continuum. I don't know where the opinion expressed in the present paper would fall. Probably, like Dr. Lundberg's, too far out from the average to be representative of the opinion of sociologists. If so, we may trust that other sociologists will supply a correction.
We can best make the discussion concrete by saying a few preliminary words about what sociologists are doing.
First, a few of them are constructing theories of society on the grand style. They do not verify these theories. Rather, they illustrate them, sometimes copiously. Sometimes, they become so enthusiastic about the plausibility of their theories that they think illustration constitutes proof.
Second, most sociologists do a good deal of quasi-historical research, usually on contemporary problems. This work, often unrelated to large frames of reference, is pooh-poohed by some as mere empiricism. This is quite unfair, of course, as the dullest fact-finder usually has some advance idea about what he is looking for. This work may or may not use statistics. Examples of the kind of questions we explore are the following:
Are the class differences in the birth rate tending to narrow over time?
Is the relationship between the sex ratio and the percentage of women married higher in the cities or in the country?
Is the crime rate, among foreign born, holding age constant, higher or lower than the crime rate among the native born?
What is the rate of assimilation of various foreign born groups in the United States?
What is the relative importance of push and pull in migration?
What are the time lags between the inception of various types of inventions and their quantity production on a commercial basis?
Is the only child a better or a worse risk in marriage than the middle child?
Statistical work on such subjects involves several steps:
(a) Framing the question in such a way that the data, if found, will answer it.
(b) Collecting the data. (This may involve, also, designing an instrument of measurement, such as a test.)
(c) Evaluating the reliability of the data and making adjustments for bad data.
(d) Calculating summary measures.
(e) Making tests of significance of the discrepancies between expectation and observation.
The last step (e) is probably the least time consuming of any. It may involve only five minutes out of a two years' study. And, often indeed, it is the only point at which the contribution of the mathematical statistician, or mathematician, obviously appears to enter. Why then, some sociology students ask, need we learn any mathematics? We don't have to take courses in mechanical engineering in order to use a calculating machine; why study mathematics when we have a table of Chi-square and P?
The answer is, I think, simple. It is true that Step (e) in the operations may take only a few minutes in a two years' study. But it is also true that the value of the two years' work depends, in the first place, on whether the study was set up in such a way that an appropriate test of significance would be really testing what the researcher thought he was testing. Knowing where to find Chi-square and P will not help much at the crucial planning stage. What he needs to know, and needs to know in advance, is how to design his problem and control disturbing factors so that the particular test he will use at the end is appropriate and decisive. The more he knows of the tricks of statistical procedure, the more he knows about pitfalls among the mathematical assumptions on which various textbook formulas rest, the better he can exercise this foresight. All of which is to say that, in the shadows behind a successful study, at the beginning as well as at the end, stands the mathematical statistician.
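To make concrete how little labor Step (e) itself requires, here is a minimal sketch of a Chi-square test on a two-by-two table. The counts are hypothetical, invented only for illustration, and the function name `chi_square_2x2` is an assumption of this sketch. It uses the uncorrected Pearson statistic and the fact that, with one degree of freedom, P can be obtained from the complementary error function rather than from a printed table.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and P for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts (not from any study):
# rows = only child, middle child; columns = marriage intact, marriage broken.
counts = [[30, 20], [45, 15]]
chi2, p = chi_square_2x2(counts)
print(f"chi-square = {chi2:.3f}, P = {p:.3f}")
```

The arithmetic is the five minutes. Whether the two rows are comparable in age, class, and duration of marriage, so that the test answers the question actually asked, is the two years.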
In this prologue, we have implied that sociology consists, first, of unproved and, probably as now stated, unprovable theories in the grand style; and second, of a mass of more or less unrelated research, frequently statistical, which has little direct connection with the larger theories. Thus far, the influence of the mathematician has been not on the grand theorists, but on the so-called empiricists. There is a hope--though as yet with rather a slender basis for realization--that some day the more ambitious theory can be better stated in a mathematical language, using parameters permitting measurement and data permitting verification. Of this I shall speak later.
Now, more specifically, how can the mathematician and the mathematical statistician help us? In two ways, I think. First, as a teacher. Second, as an inventor.
AS A TEACHER
With respect to teaching, I have in mind, first, classroom teaching in mathematics and mathematical statistics which will be useful to the student of sociology, and second, the writing of lucid expositions of mathematical statistics.
What mathematics a young sociologist who wants to make his name in research should know is debatable. It depends so much on his interests and enthusiasms. He cannot be hurt by too much mathematics (unless it is woodenly taught), but he may rightly feel, if he is interested in non-quantitative work, that he would rather spend time on courses like history, philosophy, or psychiatry. However, even if he expects to work little with quantitative data, he might be wise to inoculate himself now against a feeling of inferiority a decade from now when he has to confess he cannot read what his colleagues write. One or two college courses in mathematics should be a minimum, and preferably more, together with some systematic training in statistics. If, however, our sociology student wants to use our statistical tools, and likes them, he should acquire, sometime before he gets his Ph.D., almost the equivalent of an undergraduate major sequence in mathematics. Algebra, including matrix theory; solid analytical geometry, with some experience in n-dimensions; and calculus, with special attention to some of the integrals frequently recurring in statistics, should be mastered. Such a program, modest as it is, sounds today almost like a counsel of despair to a student, and calls for two or more years, beyond the average, in preparation for the Ph.D. (Parenthetically, I should like to indicate that there are opportunities for a career in sociology for the man who has a Ph.D. in mathematics, provided he also is willing to spend several years of study getting acquainted with the problems, theories, and above all, the tricky character of the basic data.) There is little point in being more specific about the particular courses in mathematics which a good research worker in sociology needs. I am not sure, in fact, that it makes too much difference what courses in mathematics he takes. The most important thing he gets is drill, drill, and more drill, in a variety of mathematical processes. I share the prejudice against "capsule courses" in "Calculus for Statisticians," because I believe that the big aim is ease and confidence in manipulating symbols and not the general knowledge of what, say, the definite integral is all about.
As for the teaching of statistics, there is something to say in favor of the observation that the worst person to introduce statistics to the young sociologist is the mathematician. The teacher of sociology, who knows statistics in a sociological setting, can show, by multitudes of examples, the needs for various techniques, and the triumphs and failures in using them, and thereby in the first statistics course supply a momentum which will make the student eager to pay the necessary price to acquire the kind of mastery which only a background of mathematics and systematic courses in mathematical statistics can give. Thereupon, the courses in mathematical statistics will mean something to him, and he won't be like the student who could give a perfect derivation of Stirling's theorem for an approximation to large factorials, but did not know what a frequency distribution is. (Such a student is not a myth. He actually existed--so swears my colleague, Professor Yntema.) Parallel with courses taught by mathematicians in mathematical statistics, the sociology student should take courses in sociology, economics and psychology which are applying the tools to concrete problems in a variety of situations and which emphasize the labors of getting decent data.
The other aspect of the teaching function of the mathematician, or perhaps, I should say the mathematical statistician, is to write clear and understandable expositions of processes developed by himself and others. Excellent examples are the paper on the theory of errors by Deming and Birge, in the Reviews of Modern Physics, and Snedecor's interpretations of the analysis of variance. A model in many respects is Fry's Probability and Its Engineering Uses. Somebody ought to do for sociologists what Shewhart has done for the manufacturer. Such writing requires much skill and devotion and the rewards by way of prestige may be small as compared with those which the same authors can get with their wholly original work. Some of us have followed with interest the likelihood theories of Neyman and of Egon Pearson and would welcome a systematic effort to bring together, in one place, in language as simple as possible, without loss of rigor, the various threads of their work. The problem of representative sampling, involving a finite universe, to which Carver, Neyman, and others have made such valuable contributions, needs to be reviewed as a whole. Although Yule probably has presented the most useful exposition of the treatment of qualitative data--vital in sociological research--a further synthesis of work already done is needed.
The aim of the teaching function is to acquaint the student with existing tools and teach him how to use them. If he finds that a particular problem requires new tools, he may then be able to make them for himself, or, at least, to ask those who are mathematical statisticians intelligent questions in language they can understand. If he is vague about what he wants, the mathematician may, unless he takes much time to analyze the problem, give him very bad advice, indeed. I know, because my own vagueness in asking a question once led the man I consulted to jump to a wrong conclusion, with the result that three months of work were eventually wasted.
And this leads to the second way in which the mathematician--and here I am referring mainly to the mathematical statistician--can help the sociologist, namely, as an inventor.
AS AN INVENTOR
Let us consider invention, first, with respect to statistical theory which may be useful in sociology as elsewhere, and second, with respect to the development of quantitative rational sociological theory.
With respect to statistics, I think that our major need in sociology is not so much for new and complicated statistical tools, as for greater simplicity in setting up our problems. Instead of dealing with a lot of variables at once, we need to eliminate most of them experimentally, rather than mathematically. Partial correlation can be the product of laziness. The possibilities of carrying out a truly experimental sociology have hardly been explored--yet the public schools, for example, provide a laboratory where it should be possible to examine the effects of some rather basic sociological hypotheses with quite rigorous controls. And even when we cannot actually set up an experiment, we can make much more than we have out of the method of multiple classification into sub-groups. Instead of using census data in the raw, we can get from the census special tabulations broken down a sufficient number of times to give us at least a few ultimate small groups in which a relationship between X and Y can be studied, uncontaminated by other factors. Likewise, collections like the Gallup and Fortune polls provide a promising source of this kind of analysis. Once the data are sufficiently broken down, into a multitude of small groups, conventional statistical procedures are usually applicable, generally small sample theory. But more study is perhaps needed of methods of generalizing the information from combinations and patterns of such small samples. The modifications, introduced by Yates, Snedecor, and others, in the analysis of variance to deal with situations where the frequencies in sub-groups vary are important and may require further refinement by mathematical statisticians.
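The method of multiple classification can be illustrated with a small sketch. The data below are wholly hypothetical and were constructed so that X and Y are unrelated within each sub-group, yet appear related in the raw, pooled tabulation because the sub-groups differ both in X and in Y. The group labels and function names are assumptions made for this illustration only.

```python
from collections import defaultdict

def rate_difference(records):
    """Proportion with y == 1 among x == 1 records minus the same among x == 0 records."""
    totals = {0: 0, 1: 0}
    hits = {0: 0, 1: 0}
    for _, x, y in records:
        totals[x] += 1
        hits[x] += y
    return hits[1] / totals[1] - hits[0] / totals[0]

def by_subgroup(records):
    """Compute the X-Y rate difference separately inside each sub-group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec)
    return {name: rate_difference(recs) for name, recs in groups.items()}

def expand(group, x, n_yes, n_no):
    """Build individual (group, x, y) records from a cell of a tabulation."""
    return [(group, x, 1)] * n_yes + [(group, x, 0)] * n_no

# Hypothetical tabulation: Y occurs at 50% in the urban group and 20% in the
# rural group regardless of X, but X is far more common in the urban group.
records = (
    expand("urban", 1, 40, 40) + expand("urban", 0, 10, 10)
    + expand("rural", 1, 4, 16) + expand("rural", 0, 16, 64)
)

pooled = rate_difference(records)   # relationship in the raw data
within = by_subgroup(records)       # relationship inside each sub-group
print("pooled difference:", round(pooled, 3))
print("within sub-groups:", {k: round(v, 3) for k, v in within.items()})
```

In a real tabulation the sub-groups would be small and the within-group differences would each need a small-sample test; the sketch shows only why the breakdown is worth obtaining.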
When the variables become much too numerous to handle by sub-classification, other methods of combination are needed. A most promising method of reducing the number of variables is factor analysis, which is essentially an empirical method of typological classification. Data can be reduced to types, better measurements of each type can be devised, and eventually the relationships between these types and some dependent variable economically studied. There is still much work to be done by the mathematical statistician in this controversial field of factor analysis. In the case of Thurstone's approach, for example, the standard error of the factor loadings still is not known. Whether or not he can say that a given set of factors is related to some antecedent general factor depends on an interpretation of the significance of the factor loadings.
A special complication enters into some sociological and economic data which is not likely to be present in biology or psychology. That is the fact that the data are sometimes contiguous in space and time. We observe that in periods of business depression the marriage rate goes down and in periods of prosperity it goes up. Suppose we have annual data for 50 years. How many independent observations do we have? Now it won't do to dismiss the problem by saying that one should run a polynomial through the data and correlate the deviations, or that one should correlate the n'th differences. That changes the problem. Do we only have as many observations as we have cycles? Or do we have something better than that?
Analogous problems arise when data are geographically contiguous in space. Here is an opportunity for further exploration of least square and sampling theory. Possibly, the mathematics may become too complicated to be of much use. Nevertheless, it will be valuable to get a clearer idea of the direction in which the modifications lead.
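The question of how many independent observations 50 annual figures contain can be given a rough numerical form. The sketch below is not an answer proposed in this paper; it applies a widely used approximation for correlating two serially correlated series, in which the nominal number of observations n is shrunk to n(1 - r1 r2)/(1 + r1 r2), with r1 and r2 the lag-one autocorrelations of the two series. That approximation assumes each series behaves like a first-order autoregressive process, which real data may not. The two series are artificial ten-year cycles, invented for illustration.

```python
import math

def lag1_autocorrelation(series):
    """Lag-one autocorrelation of a series about its own mean."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    num = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den

def effective_n(x, y):
    """Approximate number of independent pairs when correlating x with y.

    Assumption: both series are roughly first-order autoregressive.
    """
    r1 = lag1_autocorrelation(x)
    r2 = lag1_autocorrelation(y)
    return len(x) * (1 - r1 * r2) / (1 + r1 * r2)

# Artificial 50-year series: a ten-year "business cycle" and a
# "marriage rate" that moves with it (hypothetical, not real data).
years = range(50)
business = [math.sin(2 * math.pi * t / 10) for t in years]
marriage = [0.5 * math.sin(2 * math.pi * t / 10) for t in years]

n_eff = effective_n(business, marriage)
print("nominal observations:", len(business))
print("approximate independent observations:", round(n_eff, 1))
```

For these artificial cycles the figure falls well below 50 and above the five complete cycles, which is at least consistent with the suspicion that we have something better than one observation per cycle but much less than one per year.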
There doubtless are many new problems in representative sampling in which the mathematical statistician will be able to help. Particularly promising is such work as the introduction of cost as an explicit variable for determining the optimum relative size of n' and n, where n is the size of a sample chosen for superficial study and n' a sub-sample chosen for more intensive study.
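A sketch may show what it means to treat cost as an explicit variable. It rests on assumptions not stated in this paper: that the intensive measurement is estimated by regression on the superficial one, that the variance of the resulting mean is approximately S^2(1 - rho^2)/n_sub + S^2 rho^2/n_large (a standard textbook approximation for two-phase sampling), and that total cost is linear in the two sample sizes. Here `n_large` is the sample given superficial study and `n_sub` the sub-sample given intensive study; the budget, unit costs, and correlation rho are hypothetical.

```python
import math

def variance(n_large, n_sub, rho, s2=1.0):
    """Approximate variance of the estimated mean under the assumed two-phase model."""
    return s2 * (1 - rho ** 2) / n_sub + s2 * rho ** 2 / n_large

def optimal_allocation(budget, cost_large, cost_sub, rho):
    """Sample sizes minimising the variance above for a fixed total budget.

    Minimising a/n_sub + b/n_large subject to
    cost_sub * n_sub + cost_large * n_large = budget
    gives n_sub / n_large = sqrt(a * cost_large / (b * cost_sub)).
    """
    a = 1 - rho ** 2
    b = rho ** 2
    ratio = math.sqrt(a * cost_large / (b * cost_sub))  # n_sub / n_large
    n_large = budget / (cost_large + cost_sub * ratio)
    n_sub = ratio * n_large
    return n_large, n_sub

# Hypothetical figures: a superficial schedule costs 1 unit, an intensive
# interview costs 10 units, the two measures correlate 0.8, budget 1000 units.
budget, cost_large, cost_sub, rho = 1000.0, 1.0, 10.0, 0.8
n_large, n_sub = optimal_allocation(budget, cost_large, cost_sub, rho)
print("superficial sample:", round(n_large), "intensive sub-sample:", round(n_sub))
print("variance at optimum:", variance(n_large, n_sub, rho))
```

The design point is that the optimum depends only on the cost ratio and on how well the cheap measurement predicts the expensive one; a poor predictor or a small cost difference pushes the two sample sizes together.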
These are some illustrations of the way in which the mathematical statistician as an inventor can help the sociologist. A final brief word will be said about the possible contributions to quantitative rational theory in sociology, as distinguished from statistics.
Although, as was said earlier, quantitative sociological theory is still more of a hope than a realization, there already exists, mostly on the periphery of sociology, quite a respectable and interesting body of examples. The work in mathematical economics has a close relationship to sociology--particularly, in its contribution to the study of standards of living. In demography, there has been a good deal of sophisticated mathematical reasoning, especially by actuaries, which is important to sociology. The mathematical studies of men like Lotka and Reed have made direct and significant contributions to sociology. Attempts have been made by Pearson, Hotelling, and others to introduce quantitative rational theory into the study of migration. In the field of psychology, the rational theory of learning developed by Thurstone, Gulliksen, Wolfle, and others has direct implications for social psychology and perhaps also for analyses of social groups and social movements. Rashevsky recently has published three brilliant and suggestive papers outlining a mathematical theory for the formation of social classes and for an explanation of rural-urban migration. (Characteristically, none of these papers appeared in sociological journals, for very few sociologists could read them.) The forthcoming volume by Stuart Dodd seeks to substitute a mathematical frame of reference for conventional frames of reference, encompassing the whole field of sociology.
These tendencies are very exciting to some of us in sociology. If we keep our feet on the ground and put in the countless man-hours of work to develop measurements of some of the parameters postulated in such equations, we may be able to hand on to our grandchildren a legacy of great riches. But caution is needed lest we become too enthusiastic. Grandiose theories of disease through the centuries have fallen by the wayside and among the casualties have been efforts to approach problems of medicine too easily by drawing crude analogies, sometimes mathematical, from mechanics. A vast amount of non-quantitative, classificatory, and descriptive work is necessary--in sociology, as in pathology--and when we get an intriguing idea which can be best phrased mathematically we may have to spend years in designing experiments, perfecting instruments of measurement, and checking and re-checking data. Remember, there are thousands of published researches on the pituitary gland every year. If a mathematician thinks he can sit in his study and write out a sociological theory while he watches the blue smoke from his pipe curl heavenward, he should have all encouragement. I frankly doubt whether the mathematician, by himself, will contribute very much. The contribution will come via sociologists, who, like the agricultural experimenter when the rabbits came in and ate up nine degrees of freedom, had to do something about it. But the sociologists, if trained to ask intelligent questions, can go to the mathematician for indispensable help--and get it.
Safely, I think we can forecast that many a sociologist of the future will relive in his own experience something of the homage and joy felt by Sir Francis Galton when for him a mathematician worked out the equation of the normal correlation surface.
"The problem," said Galton, "may not be difficult to an accomplished mathematician, but I certainly never felt such a glow of loyalty and respect towards the sovereignty and wide sway of mathematical analysis as when his answer arrived, confirming by purely mathematical reasoning my various and laborious statistical conclusions better than I had dared to hope, because the data ran somewhat roughly and I had to smooth them with tender caution."