A Mead Project source page

Originally published as:

Samuel A. Stouffer. "Other Checks on the Degree of Agreement Shown Between the Test Scores and Case History Ratings." Part IV of An Experimental Comparison of Statistical and Case History Methods of Attitude Research. Doctoral Dissertation, University of Chicago. (1930): 38 - 48

Editors' notes

This is Samuel Andrew Stouffer's doctoral dissertation. It was the first examination of the comparability of Thurstonian scaling methods and the case history methods commonly employed at Chicago, the type of work that we now call qualitative methods. Some have interpreted the results as a "convergent validity" study, an interpretation that Thurstone disapproved of.

Robert E. Faris, in his history of Chicago sociology, saw it as the nail in the coffin of the case history approach. Clearly it wasn't, but it seems to have been received in that vein by some at Chicago. The study demonstrated the utility of both approaches. Although a comment in an extract from the study presented at the annual meeting of the American Sociological Society in December of 1930 indicated that Stouffer expected the dissertation to appear in the American Journal of Sociology, as far as we can determine the study has never been published. We suspect that its publication was suppressed.

Reference to the study was made in the discussion of Blumer's critique of The Polish Peasant in Europe and America, the first publication in the Social Science Research Council's project evaluating key post-war contributions to its member disciplines. The study is an important contribution to our understanding of attitude, especially how it is studied today.

We are exceedingly proud of bringing this document into print. This version does not preserve the appendices, which included many of the original case histories. A footnote suggests that those histories are still preserved in the University of Chicago Department of Sociology's archives.

An Experimental Comparison of Statistical and Case History Methods of Attitude Research

IV: Other Checks on the Degree of Agreement Shown Between the Test Scores and Case History Ratings

Samuel Stouffer

Three other checks on the degree of agreement between the test scores and case history ratings may be briefly summarized.

1. Comparison of test scores and case history ratings with students' own ratings of their attitudes toward prohibition laws.

As mentioned before, each student rated his own attitude on the day on which he took the Smith test. This was done by placing a check mark on a graphic rating scale similar to that later used by the judges in interpreting the case histories. The score was determined by measuring the distance from the left of the scale to the check mark in tenths of inches. A few days later, each student was asked to rate his attitude again, on an identical scale. The total score of the two ratings was then taken as the index of his self rating. Preliminary examination of the self ratings showed that the students apparently had had considerable difficulty in conceiving of the line on which they placed their check mark as a scale. They tended to check either the extremes or the middle, producing a frequency distribution with three peaks. This might have been avoided if more detailed directions had been given, although an experiment conducted by asking the students of one of Mr. Cressey's Sociology 110 classes to fill out the blanks a third time, following a week's interval, failed to

( 39) show that the giving of elaborate instructions produced a very striking difference.[1] A still more serious drawback was the carelessness with which some of the students read their blanks before checking. Although told to read carefully before making their mark, some of them placed a check mark at exactly the opposite end of the scale the second time from the position checked the first time. Eighteen cases of rather extreme disagreement, where the second rating almost exactly reversed the first, were sorted out and both rating sheets returned to the student, with the member of the class who held the key to the code numbers acting as intermediary. A note was attached, asking the student to make a correction if there had been an oversight; otherwise to leave the ratings unchanged, as such variability might conceivably be a true reflection of his shifting attitude. Of these 18 cases, 16 reported that they had misread the instructions. The ratings as corrected were used in computing the final scores. Under such circumstances, a reliability coefficient computed either with or without the subjects who made these corrected ratings included, or with the ratings left in their uncorrected form, would have an uncertain meaning. It was still possible, however, to compare the students' total self ratings as to attitudes toward prohibition laws with (1) the Smith test scores and (2) the case history ratings on attitudes toward prohibition

( 40) laws. This was done. Each of the correlation coefficients was +.80.[2]

Unsatisfactory as the self-rating method as used seemed to be, there might have been a danger signal if the correlations between the self-ratings and either the test or the case histories had been very low, or if one of the two correlation coefficients had differed markedly from the other.

2. Comparison of test scores and case history ratings of attitudes toward prohibition laws with ratings on attitudes toward drinking liquor.

The purpose of attitude tests is more, of course, than to provide a static frequency distribution of attitudes in a given group. It is primarily to provide a scale with which other variables may be correlated. A mere yes or no vote will not serve such a purpose, for a series of reliably spaced intermediate grades is necessary. The case history method does not readily lend itself to this form of quantification if a large number of cases are desired. The test method, if valid, makes it possible to get rather cheaply and quickly the attitude scores of, say, 10,000 people. By throwing these people into various divisions according to the variables studied and making successive subdivisions it should be possible to get fairly adequate control groups. The test scores can be compared with reference to each group, with the use eventually,

(41) perhaps, of some of the more refined statistical methods of treating small samples[3] in order to avoid difficulties involved in partial and multiple correlation when variables are numerous.

In the case of attitudes toward prohibition, a variable with which an interesting comparison might be made is that of attitudes toward drinking liquor. It seemed worth while to find whether or not the Smith test and the judges' ratings of the case histories on attitudes toward prohibition laws would lead to different conclusions when each was correlated with indices of attitudes toward drinking liquor. At the same time, this would provide an additional check on the reliability of judges' ratings of the case history material used in this investigation.

The four graduate student judges rated each of the 238 papers as to attitudes toward drinking liquor. The graphic rating scale was identical with that used in rating attitudes toward prohibition laws, except that the direction was reversed. That is, if each student were found to be as strongly in favor of drinking as he was opposed to prohibition laws, or as strongly opposed to drinking as he was in favor of prohibition laws, the correlation would be -1.00. The reliability of the judges' ratings on attitudes toward drinking liquor was found to be just as high (+.96) as the reliability of the judges' ratings on attitudes toward prohibition laws. The

(42) various reliability coefficients are summarized in Table 4.

Table 4 Correlation of Judges' Ratings on Attitudes Toward Drinking Liquor, in 238 Case Histories
Judges	Correlation
Cottrell v. Faris	+.86
Cottrell v. Stonequist	+.89
Cottrell v. Thompson	+.84
Faris v. Stonequist	+.87
Faris v. Thompson	+.83
Stonequist v. Thompson	+.87
Average	+.86
Cottrell and Faris v. Thompson and Stonequist	+.91
Cottrell and Stonequist v. Faris and Thompson	+.92
Cottrell and Thompson v. Faris and Stonequist	+.92
Average	+.92
Average reliability of composite ratings of four judges, estimated by the Spearman-Brown formula, using as a base either of the above averages carried out to three decimal places	+.96
Composite ratings of four judges and self ratings by subjects on attitudes toward drinking	+.79
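
The Spearman-Brown step in Table 4 can be checked with a short calculation. The sketch below is an illustration, not Stouffer's own working: it assumes the standard Spearman-Brown prophecy formula, r_k = k r / (1 + (k - 1) r), and it uses the two-decimal coefficients printed in the table rather than the three-decimal averages the table says were used. The function and variable names are hypothetical.

```python
def spearman_brown(r, k):
    """Estimated reliability of a composite k times as long as a unit of reliability r."""
    return k * r / (1 + (k - 1) * r)

# Correlations between single judges, as printed in Table 4.
single_judge_rs = [0.86, 0.89, 0.84, 0.87, 0.83, 0.87]
# Correlations between pairs of judges (two against two), as printed in Table 4.
paired_judge_rs = [0.91, 0.92, 0.92]

mean_single = sum(single_judge_rs) / len(single_judge_rs)
mean_paired = sum(paired_judge_rs) / len(paired_judge_rs)

# Base 1: one judge stepped up to a composite of four judges.
from_single = spearman_brown(mean_single, 4)
# Base 2: a two-judge composite stepped up to a composite of four judges.
from_paired = spearman_brown(mean_paired, 2)

print(f"from single-judge average: {from_single:.3f}")
print(f"from two-judge average:    {from_paired:.3f}")
```

Under these assumptions both bases round to the same two-decimal estimate, which is consistent with the table's remark that either average may serve as the base.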

The correlation between the test scores and the judges' ratings of the case histories with respect to attitudes toward liquor was found to be -.58. This was practically the same as the correlation between the judges' ratings of the case

(42) histories with respect to prohibition laws and the judges' ratings of the case histories with respect to drinking liquor, which was -.60. An inspection of the two correlation tables (Tables 17 and 18 in Appendix B) does not disclose any striking dissimilarities.

It was possible also to compare these two correlation coefficients with those obtained by using students' self ratings.[4] The correlation between the test and self ratings of attitudes toward drinking liquor was -.56. The correlation between the judges' ratings of case histories on attitudes toward prohibition laws and self ratings on drinking liquor was -.60. The correlation between the self ratings on attitudes toward prohibition laws and self ratings on attitudes toward drinking liquor was -.47. These various correlations would lead to practically the same interpretations of the relationship between indices of attitudes toward prohibition and attitudes toward drinking liquor in this experimental group.

3. Comparison of test scores and case history ratings of attitudes toward prohibition laws, with relation to antecedent factors in the subjects' experiences.

In a study of several thousand subjects it might be desired to compare the attitude scores made by groups of subjects

( 43) with certain typical backgrounds in home and community as reported on a direct questionnaire. A direct questionnaire was filled out by the 238 subjects in this experiment. (See Appendix A, page 71.) Since this was an incidental aspect of the study and since 238 subjects are too few from which to draw important statistical interpretations as to the relative weight of various background factors, no attempt was made to check the validity of the answers on the direct questionnaire. The categories were crudely defined, but some of them might serve the purpose of showing whether or not the test scores and case history ratings might lead to contradictory conclusions.

The results so far obtained seem to indicate quite clearly that substantially the same conclusions would be reached, whether background factors were compared with the test scores or with the case history ratings. This aspect of the project is still in the testing stage, but Table 5 below may be presented as a fair illustration of what has been found. Thus far, an analysis of a number of somewhat similar tables has revealed nothing contradictory.[5] In Table 5 the background

(44)

Table 5 Correlation Tables Showing Relationship Between Certain Background Factors and Smith Test Scores and Case History Ratings of Attitudes Toward Prohibition Laws Respectively
SMITH TEST SCORES
	Father’s vote dry, neighbourhood dry	Father’s vote dry, neighbourhood not reported dry	Father’s vote not reported dry, neighbourhood dry	Father’s vote not reported dry, neighbourhood not reported dry	Total
9.0 - 9.4				1	1
8.5 - 8.9					0
8.0 - 8.4		1	1	5	7
7.5 - 7.9			7	7	14
7.0 - 7.4	4	2	2	13	21
6.5 - 6.9	2	2	6	17	27
6.0 - 6.4	4	7	3	12	26
5.5 - 5.9	4	3	4	10	21
5.0 - 5.4	6	12	6	4	28
4.5 - 4.9	6	3	4	3	16
4.0 - 4.4	3	2	3	2	10
3.5 - 3.9	11	6	4	3	24
3.0 - 3.4	17	2	2	7	28
2.5 - 2.9	7	2	1	4	14
2.0 - 2.4	1				1
	65	42	43	88	238
r (assuming normality of distribution of x’s) = + .39
CASE HISTORY RATINGS
150 - 159		1		2	3
140 - 149	2			9	11
130 - 139	1	6	7	13	27
120 - 129	5	8	6	18	37
110 - 119	3	2	8	12	25
100 - 109	3	5	6	4	18
90 - 99	7	4	4	5	20
80 - 89	7	4	4	5	20
70 - 79	10	4	6	10	30
60 - 69	19	4	2	6	31
50 - 59	7	4	2	5	18
40 - 49	1				1
	65	42	43	88	238
r (assuming normality of distributions of x's) = +.35

(46) variable is expressed in four categories: (1) subjects whose fathers usually voted dry at liquor elections and whose neighborhoods in which most of their childhood was spent were dry, (2) those whose fathers usually voted dry but whose neighborhoods were not reported dry, (3) those whose fathers were not reported as usually voting dry but whose neighborhoods were dry, and (4) those whose fathers were reported as not voting dry and whose neighborhoods were not reported dry. The distributions of test scores and case history ratings on attitudes toward prohibition laws were found for each of these four classes. By assumption that the unknown distribution of the background variable was Gaussian, coefficients of correlation were computed.[6] The coefficients of correlation are +.39 ± .04, using the test scores, and +.33 ± .04, using the case history ratings. These correlations may be interpreted as low, but

( 47) alike indicative of a clear tendency for a positive correlation. The difference between the two is not statistically significant.[7]
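
The .04 that follows each coefficient can be read as a probable error, and a rough calculation shows why a gap of this size would not count as significant. The sketch below is only an illustration under stated assumptions: it uses the conventional probable-error formula for a product-moment coefficient, PE = 0.6745 (1 - r^2) / sqrt(N), which the source does not say was the formula applied to these coefficients, and it combines the two probable errors as if the coefficients were independent, which they are not, since both come from the same 238 subjects. The function and variable names are hypothetical.

```python
import math

def probable_error_r(r, n):
    """Conventional probable error of a product-moment correlation (assumed formula)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

n_subjects = 238
r_test = 0.39   # background factor vs. Smith test scores, as given in the text
r_case = 0.33   # background factor vs. case history ratings, as given in the text

pe_test = probable_error_r(r_test, n_subjects)
pe_case = probable_error_r(r_case, n_subjects)

# Probable error of the difference, treating the two coefficients as independent.
# This is a simplifying assumption; the coefficients share the same subjects.
pe_difference = math.sqrt(pe_test ** 2 + pe_case ** 2)
ratio = (r_test - r_case) / pe_difference

print(f"PE (test) = {pe_test:.3f}, PE (case) = {pe_case:.3f}")
print(f"difference / PE of difference = {ratio:.2f}")
```

On these assumptions each probable error rounds to .04, and the difference is only a little larger than its own probable error, well short of the several multiples that a rule of thumb of the period would ask for.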

A more delicate test than merely to report the two correlation coefficients is to plot the regressions of the means of the y's on x and the means of the x's on y. This is done in Chart 1, the data being summarized in Table 27 in Appendix B. The general contour of the lines connecting the means will be seen to be nearly enough alike in either of the plots to lead to substantially the same interpretations. This is particularly striking because the correlation of both indices of attitudes with the independent variable was so low that the nature of the scatter in each case about the regression line could have been widely different and yet resulted in about the same correlation coefficient.

Notes

1. The persistence of a memory effect may account for this, however.
2. The judges of the case histories, of course, did not see the students' self ratings.
3. Such as those projected by R. A. Fisher in Statistical Methods for Research Workers.
4. The same limitations apply to students' self ratings on attitudes toward liquor as apply to the self ratings on attitudes toward prohibition laws. See above, p. 39.
5. Four master tables are presented in Appendix A, Tables 23 to 26. From these the student interested in manipulating some of the data may construct a rather large number of tables using various combinations of background factors. The variables here included are sex, parentage (native or foreign), neighborhood in which childhood was spent (whether dry or not), and father's vote at liquor elections (whether dry or not).
6. This is an extension of the method used in computing a biserial r. It was developed by Karl Pearson and reported from Professor Pearson's lecture notes by Holzinger, Statistical Methods for Students in Education, pp. 260-62. The assumption of a Gaussian distribution is made because some sort of a continuous variable seems to be implied and because a Gaussian distribution would seem a better guess than a rectangular distribution or any other convenient form. One test of the justification of the assumption is the distribution of the means of the arrays, as shown in Chart 1 below. The distribution would not seem to be distinctly non-linear. It should be emphasized, however, that the assumption of a Gaussian distribution is irrelevant for an actual comparison of the results obtained by using both test scores and case history ratings as the dependent variables, so long as the Gaussian distribution of the independent variable is assumed in both cases.
7. As a partial check on the method, two coefficients of contingency also were computed, using the test scores split into three class intervals at 4.5 and 6.5. A 3 x 4 table yields a raw contingency coefficient of .43 and a 3 x 3 table, in which the two middle categories of the "background" variable are combined in one, yields a raw contingency coefficient of .40.
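
The contingency check described in the last note can be retraced from the Smith test half of Table 5. The sketch below is an illustration under stated assumptions, not the author's working: it takes the raw contingency coefficient to be Pearson's C = sqrt(chi^2 / (N + chi^2)), reads the split at 4.5 and 6.5 as three bands (below 4.5, 4.5 to 6.4, 6.5 and above), and relies on the counts as transcribed on this page, with blank cells taken as zero. The function and variable names are hypothetical.

```python
import math

# Smith test half of Table 5: (lower class limit, counts in the four background categories).
smith_rows = [
    (9.0, [0, 0, 0, 1]), (8.5, [0, 0, 0, 0]), (8.0, [0, 1, 1, 5]),
    (7.5, [0, 0, 7, 7]), (7.0, [4, 2, 2, 13]), (6.5, [2, 2, 6, 17]),
    (6.0, [4, 7, 3, 12]), (5.5, [4, 3, 4, 10]), (5.0, [6, 12, 6, 4]),
    (4.5, [6, 3, 4, 3]), (4.0, [3, 2, 3, 2]), (3.5, [11, 6, 4, 3]),
    (3.0, [17, 2, 2, 7]), (2.5, [7, 2, 1, 4]), (2.0, [1, 0, 0, 0]),
]

def collapse(rows, cut_points):
    """Pool the score classes into bands separated by the given cut points."""
    n_cols = len(rows[0][1])
    table = [[0] * n_cols for _ in range(len(cut_points) + 1)]
    for lower_limit, counts in rows:
        band = sum(lower_limit >= cut for cut in cut_points)
        for col, count in enumerate(counts):
            table[band][col] += count
    return table

def contingency_coefficient(table):
    """Pearson's raw contingency coefficient, sqrt(chi2 / (N + chi2))."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n + chi2))

table_3x4 = collapse(smith_rows, [4.5, 6.5])
c_3x4 = contingency_coefficient(table_3x4)
print(table_3x4)
print(f"C (3 x 4) = {c_3x4:.2f}")
```

Read this way, the 3 x 4 table gives a coefficient that rounds to the .43 reported in the note, which suggests the transcribed Smith test counts are internally consistent.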

The content of this page is still protected by copyright in the United States of America and can not be reproduced for any purpose other than scholarship.
