An Experimental Comparison of Statistical and Case History Methods of Attitude Research
IV: Other Checks on the Degree of Agreement Shown Between the Test Scores and Case History Ratings
Samuel Stouffer
Three other checks on the degree of agreement between the test scores and case history ratings may be briefly summarized.
1. Comparison of test scores and case history ratings with students' own ratings of their attitudes toward prohibition laws.
As mentioned before, each student rated his own attitude on the day on which he took the Smith test. This was done by placing a check mark on a graphic rating scale similar to that later used by the judges in interpreting the case histories. The score was determined by measuring the distance from the left of the scale to the check mark in tenths of inches. A few days later, each student was asked to rate his attitude again, on an identical scale. The total score of the two ratings was then taken as the index of his self rating.
Preliminary examination of the self ratings showed that the students apparently had had considerable difficulty in conceiving of the line on which they placed their check mark as a scale. They tended to check either the extremes or the middle, producing a frequency distribution with three peaks. This might have been avoided if more detailed directions had been given, although an experiment conducted by asking the students of one of Mr. Cressey's Sociology 110 classes to fill out the blanks a third time, following a week's interval, failed to show that the giving of elaborate instructions produced a very striking difference.[1]
A still more serious drawback was the carelessness with which some of the students read their blanks before checking. Although told to read carefully before making their mark, some of them placed a check mark at exactly the opposite end of the scale the second time from the position checked the first time. Eighteen cases of rather extreme disagreement, where the second rating almost exactly reversed the first, were sorted out and both rating sheets returned to the student, with the member of the class who held the key to the code numbers acting as intermediary. A note was attached, asking the student to make a correction if there had been an oversight; otherwise to leave the ratings unchanged, as such variability might conceivably be a true reflection of his shifting attitude. Of these 18 cases, 16 reported that they had misread the instructions. The ratings as corrected were used in computing the final scores. Under such circumstances, a reliability coefficient, whether computed with or without the subjects who made these corrected ratings, or with the ratings left in their uncorrected form, would have an uncertain meaning.
It was still possible, however, to compare the students' total self ratings as to attitudes toward prohibition laws with (1) the Smith test scores and (2) the case history ratings on attitudes toward prohibition laws. This was done. Each of the correlation coefficients was +.80.[2]
Unsatisfactory as the self-rating method as used seemed to be, there might have been a danger signal if the correlations between the self-ratings and either the test or the case histories had been very low, or if one of the two correlation coefficients had differed markedly from the other.
2. Comparison of test scores and case history ratings of attitudes toward prohibition laws with ratings on attitudes toward drinking liquor.
The purpose of attitude tests is more, of course, than to provide a static frequency distribution of attitudes in a given group. It is primarily to provide a scale with which other variables may be correlated. A mere yes or no vote will not serve such a purpose, for a series of reliably spaced intermediate grades is necessary. The case history method does not readily lend itself to this form of quantification if a large number of cases is desired. The test method, if valid, makes it possible to get rather cheaply and quickly the attitude scores of, say, 10,000 people. By throwing these people into various divisions according to the variables studied and making successive subdivisions it should be possible to get fairly adequate control groups. The test scores can be compared with reference to each group, with the use eventually, perhaps, of some of the more refined statistical methods of treating small samples[3] in order to avoid difficulties involved in partial and multiple correlation when variables are numerous.
In the case of attitudes toward prohibition, a variable with which an interesting comparison might be made is that of attitudes toward drinking liquor. It seemed worth while to find whether or not the Smith test and the judges' ratings of the case histories on attitudes toward prohibition laws would lead to different conclusions when each was correlated with indices of attitudes toward drinking liquor. At the same time, this would provide an additional check on the reliability of judges' ratings of the case history material used in this investigation.
The four graduate student judges rated each of the 238 papers as to attitudes toward drinking liquor. The graphic rating scale was identical with that used in rating attitudes toward prohibition laws, except that the direction was reversed. That is, if each student were found to be as strongly in favor of drinking as he was opposed to prohibition laws, or as strongly opposed to drinking as he was in favor of prohibition laws, the correlation would be -1.00. The reliability of the judges' ratings on attitudes toward drinking liquor was found to be just as high (+.96) as the reliability of the judges' ratings on attitudes toward prohibition laws. The various reliability coefficients are summarized in Table 4.
Judges | Correlation |
---|---|
Cottrell v. Faris | +.86 |
Cottrell v. Stonequist | +.89 |
Cottrell v. Thompson | +.84 |
Faris v. Stonequist | +.87 |
Faris v. Thompson | +.83 |
Stonequist v. Thompson | +.87 |
Average | +.86 |
Cottrell and Faris v. Thompson and Stonequist | +.91 |
Cottrell and Stonequist v. Faris and Thompson | +.92 |
Cottrell and Thompson v. Faris and Stonequist | +.92 |
Average | +.92 |
Average reliability of composite ratings of four judges, estimated by the Spearman-Brown formula, using as a base either of the above averages carried out to three decimal places | +.96 |
Composite ratings of four judges and self ratings by subjects on attitudes toward drinking | +.79 |
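The Spearman-Brown estimate in Table 4 can be checked with the usual formula, r_k = k r / (1 + (k - 1) r). The sketch below is an illustration and not the author's computation. It assumes that the average of the six single-judge correlations is stepped up by a factor of four and the average of the three pair-against-pair correlations by a factor of two, and it uses the two-decimal entries of the table rather than the three-decimal averages mentioned there. The function name `spearman_brown` is introduced here for illustration only.

```python
from statistics import mean

def spearman_brown(r, k):
    """Reliability of a composite k times as long as a unit of reliability r."""
    return k * r / (1 + (k - 1) * r)

# Two-decimal correlations copied from Table 4.
single_judge = [0.86, 0.89, 0.84, 0.87, 0.83, 0.87]  # judge against judge
pair_vs_pair = [0.91, 0.92, 0.92]                    # two judges against two

# Assumption: one judge -> four judges is a fourfold lengthening,
# two judges -> four judges is a twofold lengthening.
from_singles = spearman_brown(mean(single_judge), 4)
from_pairs = spearman_brown(mean(pair_vs_pair), 2)

print(round(from_singles, 2), round(from_pairs, 2))
```

Under these assumptions both routes come to about .96 when rounded to two places, which agrees with the table's statement that either average may serve as the base.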
The correlation between the test scores and the judges' ratings of the case histories with respect to attitudes toward liquor was found to be -.58. This was practically the same as the correlation between the judges' ratings of the case histories with respect to prohibition laws and the judges' ratings of the case histories with respect to drinking liquor, which was -.60. An inspection of the two correlation tables (Tables 17 and 18 in Appendix B) does not disclose any striking dissimilarities.
It was possible also to compare these two correlation coefficients with those obtained by using students' self ratings.[4] The correlation between the test and self ratings of attitudes toward drinking liquor was -.56. The correlation between the judges' ratings of case histories on attitudes toward prohibition laws and self ratings on drinking liquor was -.60. The correlation between the self ratings on attitudes toward prohibition laws and self ratings on attitudes toward drinking liquor was -.47. These various correlations would lead to practically the same interpretations of the relationship between indices of attitudes toward prohibition and attitudes toward drinking liquor in this experimental group.
3. Comparison of test scores and case history ratings of attitudes toward prohibition laws with relation to antecedent factors in the subjects' experiences.
In a study of several thousand subjects it might be desired to compare the attitude scores made by groups of subjects with certain typical backgrounds in home and community as reported on a direct questionnaire. A direct questionnaire was filled out by the 238 subjects in this experiment. (See Appendix A, page 71.) Since this was an incidental aspect of the study and since 238 subjects are too few from which to draw important statistical interpretations as to the relative weight of various background factors, no attempt was made to check the validity of the answers on the direct questionnaire. The categories were crudely defined, but some of them might serve the purpose of showing whether or not the test scores and case history ratings might lead to contradictory conclusions.
The results so far obtained seem to indicate quite clearly that substantially the same conclusions would be reached, whether background factors were compared with the test scores or with the case history ratings. This aspect of the project is still in the testing stage, but Table 5 below may be presented as a fair illustration of what has been found. Thus far, an analysis of a number of somewhat similar tables has revealed nothing contradictory.[5]

Smith test scores | Father's vote dry, neighborhood dry | Father's vote dry, neighborhood not reported dry | Father's vote not reported dry, neighborhood dry | Father's vote not reported dry, neighborhood not reported dry | Total |
---|---|---|---|---|---|
9.0 - 9.4 | 1 | 1 | |||
8.5 - 8.9 | 0 | ||||
8.0 - 8.4 | 1 | 1 | 5 | 7 | |
7.5 - 7.9 | 7 | 7 | 14 | ||
7.0 - 7.4 | 4 | 2 | 2 | 13 | 21 |
6.5 - 6.9 | 2 | 2 | 6 | 17 | 27 |
6.0 - 6.4 | 4 | 7 | 3 | 12 | 26 |
5.5 - 5.9 | 4 | 3 | 4 | 10 | 21 |
5.0 - 5.4 | 6 | 12 | 6 | 4 | 28 |
4.5 - 4.9 | 6 | 3 | 4 | 3 | 16 |
4.0 - 4.4 | 3 | 2 | 3 | 2 | 10 |
3.5 - 3.9 | 11 | 6 | 4 | 3 | 24 |
3.0 - 3.4 | 17 | 2 | 2 | 7 | 28 |
2.5 - 2.9 | 7 | 2 | 1 | 4 | 14 |
2.0 - 2.4 | 1 | 1 | |||
Total | 65 | 42 | 43 | 88 | 238 |
r (assuming normality of distribution of x's) = +.39 | | | | | |

Case history ratings | Father's vote dry, neighborhood dry | Father's vote dry, neighborhood not reported dry | Father's vote not reported dry, neighborhood dry | Father's vote not reported dry, neighborhood not reported dry | Total |
---|---|---|---|---|---|
150 - 159 | 1 | 2 | 3 | ||
140 - 149 | 2 | 9 | 11 | ||
130 - 139 | 1 | 6 | 7 | 13 | 27 |
120 - 129 | 5 | 8 | 6 | 18 | 37 |
110 - 119 | 3 | 2 | 8 | 12 | 25 |
100 - 109 | 3 | 5 | 6 | 4 | 18 |
90 - 99 | 7 | 4 | 4 | 5 | 20 |
80 - 89 | 7 | 4 | 4 | 5 | 20 |
70 - 79 | 10 | 4 | 6 | 10 | 30 |
60 - 69 | 19 | 4 | 2 | 6 | 31 |
50 - 59 | 7 | 4 | 2 | 5 | 18 |
40 - 49 | 1 | 1 | |||
Total | 65 | 42 | 43 | 88 | 238 |
r (assuming normality of distribution of x's) = +.35 | | | | | |

In Table 5 the background variable is expressed in four categories: (1) subjects whose fathers usually voted dry at liquor elections and who spent most of their childhood in neighborhoods that were dry, (2) those whose fathers usually voted dry but whose neighborhoods were not reported dry, (3) those whose fathers were not reported as usually voting dry but whose neighborhoods were dry, and (4) those whose fathers were not reported as usually voting dry and whose neighborhoods were not reported dry. The distributions of test scores and case history ratings on attitudes toward prohibition laws were found for each of these four classes. On the assumption that the unknown distribution of the background variable was Gaussian, coefficients of correlation were computed.[6] The coefficients of correlation are +.39 ± .04, using the test scores, and +.33 ± .04, using the case history ratings. These correlations may be interpreted as low, but both indicate a clear tendency toward a positive correlation. The difference between the two is not statistically significant.[7]
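One conventional way to judge such a difference is through probable errors. The sketch below is an illustration resting on several assumptions and is not the procedure of the text. It reads the .04 reported alongside each coefficient as a probable error, applies the classical product-moment formula PE = 0.6745 (1 - r^2) / sqrt(N) even though the coefficients here were obtained by a different method, and treats the two coefficients as independent although both come from the same 238 subjects. The function names are hypothetical.

```python
import math

def probable_error(r, n):
    """Probable error of a product-moment r from n cases (classical formula)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def difference_in_probable_errors(r1, r2, n):
    """Difference between two coefficients, in units of its probable error.

    Assumes the two coefficients are independent, which is only a rough
    approximation when both are computed on the same subjects.
    """
    pe_diff = math.hypot(probable_error(r1, n), probable_error(r2, n))
    return abs(r1 - r2) / pe_diff

n = 238  # number of subjects in the experiment
print(round(probable_error(0.39, n), 2))   # coefficient quoted for the test scores
print(round(probable_error(0.33, n), 2))   # coefficient quoted for the case history ratings
print(round(difference_in_probable_errors(0.39, 0.33, n), 1))
```

Under these assumptions each probable error rounds to .04, and the difference between the coefficients is only a little more than one probable error, well short of the multiple of three or four that served as a common rule of thumb.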
A more delicate test than merely reporting the two correlation coefficients is to plot the regressions of the means of the y's on x and the means of the x's on y. This is done in Chart 1, the data being summarized in Table 27 in Appendix B. The general contour of the lines connecting the means will be seen to be nearly enough alike in the two plots to lead to substantially the same interpretations. This is particularly striking because the correlation of both indices of attitudes with the independent variable was so low that the nature of the scatter in each case about the regression line could have been widely different and yet resulted in about the same correlation coefficient.
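The means plotted in such a chart are conditional means read off a two-way frequency table. The sketch below shows the arithmetic on a small made-up table. The counts, the class midpoints, and the coding of the four background categories as 1 to 4 are all hypothetical; they are not the data of Table 5 or Table 27.

```python
def conditional_means(table, row_values, col_values):
    """Return (mean of y within each x column, mean of x within each y row).

    table[i][j] is the number of cases in y class i and x category j.
    Assumes no row or column of the table is empty.
    """
    y_on_x = []
    for j in range(len(col_values)):
        n = sum(row[j] for row in table)
        y_on_x.append(sum(row[j] * y for row, y in zip(table, row_values)) / n)
    x_on_y = []
    for row in table:
        n = sum(row)
        x_on_y.append(sum(f * x for f, x in zip(row, col_values)) / n)
    return y_on_x, x_on_y

# Hypothetical frequencies: rows are attitude-score classes,
# columns are background categories coded 1..4.
table = [
    [1, 2, 3, 6],   # score class with midpoint 7.0
    [3, 4, 4, 3],   # midpoint 5.0
    [6, 3, 2, 1],   # midpoint 3.0
]
row_values = [7.0, 5.0, 3.0]
col_values = [1, 2, 3, 4]

y_on_x, x_on_y = conditional_means(table, row_values, col_values)
print([round(m, 2) for m in y_on_x])
print([round(m, 2) for m in x_on_y])
```

Plotting `y_on_x` against the category codes and `x_on_y` against the class midpoints gives the two lines of means. Comparing the shape of those lines for the test scores and for the case history ratings is the check described above.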