Notes on the Significance and Use of the Hillegas Scale for Measuring the Quality of English Composition
Edward L. Thorndike
Teachers College, Columbia University
It is obvious that specimen 294[1] has more merit as English writing than specimen 519. It is not obvious that specimen 294 has more merit than specimen 225. But the same argument by which we justify the assertion that No. 294 has much more merit than No. 519 justifies the assertion that it has somewhat more merit than No. 225. The argument is simply an appeal to experts. Out of one hundred and sixty members of the English Section of the 1912 Conference at the University of Illinois, 70 per cent judged that 294 had more merit than 225. Now, as has been explained by Dr. Hillegas and the author, it is possible to transmute the measures of difference in merit contained in the percentages of judgments of superiority into units of amount of difference, so that, for example, we know that the difference between No. 294 and No. 519 is about three and a half times the difference between No. 294 and No. 225. Calling 1.00 that amount of difference so great (or, better, so small) that only seventy-five out of a hundred such experts rank the two samples correctly, twenty-five putting the worse sample ahead of the better, and deciding on what kind of writing has just zero or just not any merit, we can find samples that are each just 1.00 better than zero; others that are each just 1.00 better than these or 2.00 better than zero; others that are each just 1.00 better than these or 3.00 better than zero; and so on. If sample 580 is taken as zero, samples 94, 519, 534, 196, and 221 are approximately 3.7, 4.7, 5.7, 6.7, and 7.7. Now samples 519 to 221, or amounts of merit 4.7 to 7.7, give roughly the quality of work which our high-school pupils display in examination papers, in set themes and the like. A few high-school compositions will run from 8.0 up to 9.0 and a few from 4.0 down to 3.0.
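The transmutation of percentages into units may be sketched in a few lines. The sketch assumes, as one reading of the method and not as a statement of it, that judgments of superiority follow the normal curve and that 1.00 is the difference at which 75 per cent of the judges agree. The name `units_of_difference` is illustrative only.

```python
from statistics import NormalDist

# Assumption: judgments follow the normal curve, and the unit 1.00 is the
# difference ranked correctly by 75 per cent of the judges.
_UNIT = NormalDist().inv_cdf(0.75)  # about 0.6745 standard deviations


def units_of_difference(fraction_judging_better: float) -> float:
    """Convert the share of judges preferring one sample into scale units."""
    return NormalDist().inv_cdf(fraction_judging_better) / _UNIT


if __name__ == "__main__":
    for pct in (56.25, 65.0, 70.0, 75.0):
        print(f"{pct:6.2f} per cent -> {units_of_difference(pct / 100):.2f}")
```

Under this assumption 70 per cent corresponds to about 0.78 and 56 1/4 per cent to about 0.23, which agrees with the figures quoted for those percentages in the ranking that follows.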
It is the purpose of this paper to measure roughly the difference between the paragraph-writing of boys and girls in high school and that selected as the specially good performances of recognized masters of English prose.
The facts are, very simply, as follows: specimen 258 is by Washington Irving; specimen 217 is by Hawthorne; specimen 296 is by Thackeray. These were chosen for the author by a teacher of English as such samples of the better work of these authors as made convenient units for isolated estimation. Specimen 231 is by a college Freshman; specimen 294 is by a high-school pupil; specimens 221, 225, 196, 245, 329, 338, and 519 are high-school specimens, covering the range from 8.0 to a little below 5.0.
Now specimen 519 has the least merit of any of these. Call its merit, for the present, x. Then, by the judgments of 160 members of the English Section of the Illinois Conference of 1912, the specimens are ranked in the following order and have the following amounts of merit.
(H.S.) 519 is of merit x.
(H.S.) 338 is judged better than 519 by 56 1/4 per cent of the judges, and is of merit x + 0.23 (1.00 being defined above).
(H.S.) 329 is judged better than 338 by 76.6 per cent of the judges, and is of merit x + 0.23 + 1.08, or x + 1.31.
(H.S.) 196 is judged better than 329 by 66 1/4 per cent of the judges, and is of merit x + 1.31 + 0.62, or x + 1.93.
(Thackeray) 296 is judged better than 196 by 65 per cent of the judges, and is of merit x + 1.93 + 0.57, or x + 2.5.
(H.S. and H.S.) 221 and 225 are judged one a trifle worse and one a trifle better than 296 (48.1 per cent and 53.1 per cent), averaging practically the same merit of x + 2.5.
(Coll. Fresh.) 231 is judged better than 296 (Thackeray) by 70 per cent of the judges, and is of merit x + 2.5 + 0.78, or x + 3.28.
(H.S.) 294 is judged better than 231 by 551 per cent of the judges, and is of merit x + 3.28 + 0.21, or x + 3.49.
(Hawthorne) 217 is judged better than 294 by 631 per cent of the judges, and is of merit x + 3.49 + 0.52, or x + 4.01.
(Irving) 258 is judged better than 217 by 55 per cent of the judges, and is of merit x + 4.01 + 0.21, or x + 4.22.
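The running totals in this ranking are simple cumulative sums of the successive steps, and may be checked as follows. The step sizes are those quoted in the list; the variable names are illustrative only.

```python
from itertools import accumulate

# Step sizes (in scale units) quoted in the ranking, from 519 upward.
# Specimens 221 and 225 are taken as level with 296, so they add no step.
steps = [
    ("338", 0.23),
    ("329", 1.08),
    ("196", 0.62),
    ("296 (Thackeray)", 0.57),
    ("231 (Coll. Fresh.)", 0.78),
    ("294", 0.21),
    ("217 (Hawthorne)", 0.52),
    ("258 (Irving)", 0.21),
]

# Cumulative merit above x, rounded to hide floating-point residue.
offsets = [round(total, 2) for total in accumulate(step for _, step in steps)]
merit_above_x = dict(zip((name for name, _ in steps), offsets))

if __name__ == "__main__":
    for name, offset in merit_above_x.items():
        print(f"{name}: x + {offset:.2f}")
```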
Now, assuming the freedom of these 160 judges from any unfairness in favor of the high-school specimens or against the standard writers (whatever prejudice there was worked probably in the other way), it is clear that the difference between the two groups is, in certain important senses, very small. Specimen 519, which has little more merit than the worst of high-school compositions, is only one and a half times as far below the Thackeray passage as that is below the Irving passage. It would in fact be very easy to find many paragraphs in the "standard" essayists, historians, and novelists that would have been credited with less than x + 2.25 merit by this group of teachers of English. A very fair percentage of high-school compositions would be credited above x + 2.25. The paragraph-writing of pupils in our high schools and that of the world's hundred best English writers undoubtedly overlap considerably in merit. Assume for the sake of illustration that the average writing of a fourth-year class is of merit x + 2.0 and that the average writing of the hundred masters is of merit x + 3.75. Then if a number of samples of the former were paired with a number of samples of the latter and judged by this group of teachers, the superiority of the latter would be far from obvious. In fact, in the long run, twelve out of a hundred of the teachers of English would rank the high-school specimen above the master's. Now x + 2.0 is certainly not much too high for fourth-year work in a good school and x + 3.75 is certainly not much too low for the average paragraph of the masters of English prose.
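The figure of twelve out of a hundred can be recovered by running the conversion in the other direction, from a difference in units to a share of judges. The sketch again assumes the normal curve with 1.00 fixed at 75 per cent agreement; `share_ranking_correctly` is an illustrative name, not one used in the original studies.

```python
from statistics import NormalDist

# Assumption: normal-curve judgments, with 1.00 unit = 75 per cent agreement.
_UNIT = NormalDist().inv_cdf(0.75)


def share_ranking_correctly(difference_in_units: float) -> float:
    """Share of judges expected to put the better sample ahead."""
    return NormalDist().cdf(difference_in_units * _UNIT)


# Illustrative averages from the text: masters at x + 3.75, pupils at x + 2.0.
gap = 3.75 - 2.0
misranked = 1.0 - share_ranking_correctly(gap)

if __name__ == "__main__":
    print(f"gap of {gap:.2f} units: about {misranked:.0%} rank the pupil higher")
```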
The fact may be stated more simply if the difference between x and zero (that is, the absolute merit of specimens such as No. 519) is determined. If these one hundred and sixty judges had ranked also in the same way specimens running from No. 519 down to some as bad as No. 580, 580 would have been found to be about equal to x - 4.1. That is, No. 580 would have been put not quite as far below No. 519 as No. 258 was put above it. If the one hundred and sixty judges had each made up artificially a paragraph that represented his notion of zero, or just not any, merit (the merit of a paragraph in which merit in English composition is just barely beginning to be observable), these zero specimens would on the average have been not much, if any, worse than specimen 580. We are fairly safe in saying that x is at least 3.5 and not over 4.5 above absolute zero in the opinion of these one hundred and sixty judges. Call x equal to 4.0. Then the absolute values of the specimens are:
519, representing nearly the lower limit of high-school paragraph-writing, is 4.0.
338 is 4.23.
329 is 5.31.
245 is 5.79.
277 is 5.93.
296 (Thackeray) is 6.5.
221 (H.S.) and 225 (H.S.) are about 6.5.
231 (Coll. Fresh.) is 7.28.
294 (H.S.) is 7.49.
217 (Hawthorne) is 8.01.
258 (Irving) is 8.22.
Consequently, in the judgment of high-school teachers of English, the worst tenth of paragraph-writing of high-school pupils is still nearly half-way from zero toward the best the world knows. What we rightly consider a mediocre composition, such as 245, still represents nearly three-fourths of the progress possible.
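The two fractions just stated follow from the absolute values by simple division. The sketch assumes that the Irving passage (258, at 8.22) may stand for "the best the world knows"; the text does not fix that ceiling exactly, so the constant below is an assumption.

```python
# Absolute values quoted above, taking x = 4.0.
# Assumption: specimen 258 (Irving, 8.22) is used as the ceiling of merit.
CEILING = 8.22

absolute_merit = {"519": 4.0, "245": 5.79}

# Fraction of the distance from zero to the ceiling covered by each specimen.
share_of_progress = {
    specimen: merit / CEILING for specimen, merit in absolute_merit.items()
}

if __name__ == "__main__":
    for specimen, share in share_of_progress.items():
        print(f"specimen {specimen}: {share:.2f} of the way from zero to {CEILING}")
```

On these figures specimen 519 stands at a little under one-half and specimen 245 at about seven-tenths of the ceiling.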
I will not continue with similar comparisons nor draw any of the many, and, as I think, important practical conclusions which these measurements warrant, but will only restate the measurements themselves in a metaphor.
If ten men, A, B, C ... J, ran one at a time a hundred yards in 10, 10 1/2, 11, ... 14 1/2 seconds, respectively, and if, after each successive pair had run, we had to judge (without watches or counting) which ran faster, the judgment of a hundred and sixty observers, even if well trained, would often err. The quicker runner of the two would get only a plurality unless the difference in time was great. We should, applying just the same treatment to the judgments that was applied to the compositions, come out with a difference between A and J of perhaps only 4.0, and find that the fastest running in the world was not, after all, remote from an average high-school boy's performance. If we decided that merit in running began at a rate of five feet or less a second we should find that our average school boy had already made three quarters or more of the progress possible. Teachers of athletics would disagree very widely in the "marks" that they gave to the same feat of running; and we should quarrel bitterly over the respective merits of A and B! We should reflect, perhaps with surprise, that the world pays enormous sums in money and fame for a difference so small that one person out of four cannot see it!
It may be well to meet one criticism which will be thought of by many of the hundred and sixty teachers who gave the judgments: the criticism that these judgments (made in forty minutes) were hasty and necessarily inaccurate. This was the fact, but its only effect on the issue was to make all the differences represented by 1.00 larger than they would have been with more time and care. All the relations shown by careful judgments would be the same as those stated here, but the values themselves would range more widely. The difference between 580 and 519 would be perhaps 5.0 instead of 4.1 and the difference between 519 and 258 would be perhaps 5.2 instead of 4.2. All the differences would be swollen in the same proportions. The difference noted by seventy-five out of a hundred judges working rapidly and called 1.00 here, would, with more care, be noted by say eighty out of a hundred and so be called 1.25 in a study made with very careful judgments. The unit 1.00 means an amount of difference observable by three-fourths of certain specified judges under certain specified conditions. Improve the fineness of discrimination of the judges or the conditions under which they judge, and 1.00 means a smaller difference. Samples 94, 519, 534, 196, and 221 were given values of 3.7, 4.7, 5.7, 6.7, and 7.7 by such expert and careful judgments. By the one hundred and sixty judgments considered here sample 519 is put only 2.5 below sample 221.
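The swelling of the values under more careful judgment may be sketched in the same way. Assuming the normal curve once more, a difference seen by eighty out of a hundred judges is worth about 1.25 of the units fixed at seventy-five out of a hundred, and every distance on the scale is multiplied by that factor. The eighty per cent figure is the illustrative one given in the text, and the variable names are hypothetical.

```python
from statistics import NormalDist

_nd = NormalDist()
_UNIT = _nd.inv_cdf(0.75)  # 1.00 unit = 75 per cent agreement

# Assumption from the text: careful judges would agree 80 times in 100 on a
# difference that hasty judges agree on 75 times in 100.
swelling_factor = _nd.inv_cdf(0.80) / _UNIT

hasty_differences = {"580 to 519": 4.1, "519 to 258": 4.2}
careful_differences = {
    pair: diff * swelling_factor for pair, diff in hasty_differences.items()
}

if __name__ == "__main__":
    print(f"swelling factor: {swelling_factor:.2f}")
    for pair, diff in careful_differences.items():
        print(f"{pair}: about {diff:.1f} under careful judgment")
```

The factor comes out near 1.25, and the rescaled differences fall close to the round figures of 5.0 and 5.2 suggested above.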
The second purpose of this paper is to measure the amount of error to be expected in grading specimens of English writing by a scale such as that furnished by samples 94, 519, 534, 196, and 221 of values 3.7, 4.7, 5.7, 6.7, and 7.7, respectively. It is obvious that, since 1.00 represents a difference which one out of four careful judges will fail to see, the average error in giving any specimen a number in comparison with the scale must be large.
For example, specimen 231 was ranked as:
Worse than 338 by 3 1/8 per cent of the 160 judges
Worse than 329 but better than 338 by 6 1/8 per cent of the judges
Worse than 277 but better than 329 by 15 per cent of the judges
Worse than 296 but better than 277 by 5 5/8 per cent of the judges
Worse than 294 but better than 296 by 25 1/2 per cent of the judges
Better than 294 but worse than 217 by 14 1/2 per cent of the judges
Better than 217 but worse than 258 by 3 1/8 per cent of the judges
Better than 258 by 26 7/8 per cent of the judges
That is, a specimen which the general opinion of the one hundred and sixty ranks as 7.28 is ranked below 4.2 by some; and, since it is ranked above 8.22 by over a fourth of the judges, would probably be ranked as nearly 10.0 by others.
If a new sample (call it N) is really of merit 5.7, for example, even careful judges will tend to regard it as worse than 4.7 in one case out of four, and as better than 6.7 in one case out of four (the same judge, of course, cannot make both of these errors). So (except for the influence of the scale as a whole) one-fourth of the grades assigned to N will be below 4.7 and one-fourth above 6.7; the median error will be 1.00 and the average error about 1.18. The effect of the scale as a whole is complex, and I will not figure out the probabilities for it. A judge comparing our supposed N of real value 5.7, might, for example, regard it as worse than both 519 (4.7) and 196 (6.7), rating it 4.3, if these two and it were the only means of estimate, but might regard it as equal to 534 (5.7) if N and 534 were the only means of comparison. If the whole scale is given and if he is converted to the belief that 534 is half-way between 519 and 196, and recognizes also that N is very much better than 94 (3.7) and not very much worse than 196 (6.7), then he may judge N to be 5.0 or 5.5, improving his judgment markedly.
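The median error of 1.00 and the average error of about 1.18 are what the normal curve gives when one-fourth of the grades fall more than 1.00 below the true value and one-fourth more than 1.00 above it. A minimal check, assuming normally distributed errors of grading (an assumption of this sketch, with illustrative names):

```python
from math import pi, sqrt
from statistics import NormalDist

# Assumption: errors of grading are normal, and a quarter of them exceed
# +1.00 unit, so that 1.00 unit equals 0.6745 standard deviations.
_UNIT = NormalDist().inv_cdf(0.75)
sigma_in_units = 1.0 / _UNIT
errors = NormalDist(mu=0.0, sigma=sigma_in_units)

# Half of all errors lie within +/- the median error.
median_error = errors.inv_cdf(0.75)

# Mean absolute deviation of a normal distribution is sigma * sqrt(2 / pi).
average_error = sigma_in_units * sqrt(2.0 / pi)

# Share of grades placing a sample of true merit 5.7 below 4.7.
share_below_4_7 = NormalDist(mu=5.7, sigma=sigma_in_units).cdf(4.7)

if __name__ == "__main__":
    print(f"median error:  {median_error:.2f}")
    print(f"average error: {average_error:.2f}")
    print(f"graded below 4.7: {share_below_4_7:.2f}")
```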
Professor Hillegas is measuring the errors made in using such a scale; and Mr. Johnston reported at the Illinois Conference a most interesting series of such measurements.[2] Three facts will be proved as such studies progress. First, the errors will be large; second, they will diminish with practice in using such a scale and with improvements in the scale itself; third, they will, at least after sufficient practice, be smaller than the errors now made by teachers in grading paragraph-writing for general merit. The reason for the last fact is that at present a teacher, in grading a composition for general merit, uses a subjective, personal scale of values which, in the nature of the case, cannot, on the average, be as correct as one due to the combined opinions of a hundred or more judges who are on the average as competent as he is. The teacher now adds the errors of his personal subjective scale of values to the errors of comparing a specimen therewith. A scale such as has been referred to here, and such as Professor Hillegas has worked out, eliminates the former errors altogether and, if the teacher has had enough practice with it, cannot increase, and probably will decrease, the errors of comparison.
SPECIMENS OF ENGLISH WRITING REFERRED TO IN THE TEXT
94
SULLA AS A TYRANT
When Sulla came back from his conquest Marius had put himself consul so sulla with the army he had with him in his conquest siezed the government from Marius and put himself in consul and had a list of his enemys printy and the men whoes names were on this list we beheaded.
196
ICHABOD CRANE
Ichabod Crane was a schoolmaster in a place called Sleepy Hollow. He was tall and slim with broad shoulders, long arms that dangled far below his coat sleeves. His feet looked as if they might easily have been used for shovels. His nose was long and his entire frame was most loosely hung to-gether.
217
SELECTION FROM HAWTHORNE
Oh that I had never heard of Niagara till I beheld it! Blessed were the wanderers of old, who heard its deep roar, sounding through the woods, as the summons to an unknown wonder, and approached its awful brink, in all the freshness of native feeling. Had its own mysterious voice been the first to warn me of its existence, then, indeed, I might have knelt down and worshipped. But I had come hither, haunted with a vision of foam and fury, and dizzy cliffs, and an ocean tumbling down out of the sky---a scene, in short, which nature had too much good taste and calm simplicity to realize. My mind had struggled to adapt these false conceptions to the reality, and finding the effort vain, a wretched sense of disappointment weighed me down. I climbed the precipice, and threw myself on the earth feeling that I was unworthy to look at the Great Falls, and careless about beholding them again.
221
GOING DOWN WITH VICTORY
As we road down Lombard Street, we saw flags waving from nearly every window. I surely felt proud that day to be the driver of the gaily decorated coach. Again and again we were cheered as we drove slowly to the post-masters, to await the coming of his majestie's mail. There wasn't one of the gaily bedecked coaches that could have compared with ours, in my estimation. So with waving flags and fluttering hearts we waited for the coming of the mail and the expected tidings of victory.
When at last it did arrive the postmaster began to quickly sort the bundles, we waited anxiously. Immediately upon receiving our bundles, I lashed the horses and they responded with a jump. Out into the country we drove at reckless speed--everywhere spreading like wildfire the news, "Victory!" The exileration that we all felt was shared with the horses. Up and down grade and over bridges, we drove at breakneck speed and spreading the news at every hamlet with that one cry "Victory!" When at last we were back home again, it was with the hope that we should have another ride some day with " Victory."
225
Before the Renaissance, artists and sculptors made their statues and pictures thin, and weak looking figures. They saw absolutely no beauty in the human body. At the time of the Renaissance, artists began to see beauty in musclar and strong bodies, and consequently many took warriors as subjects for their statues. Two of the statues that Michel Angelo, the great sculptor and artist, made, Perseus with the head of Medusa, and David with Goliath's head, are very similar. They show minutely and with wonderful exactness every muscle of the body. Michel Angelo was a great student of the body, especially when it was in a strained position. The position of the figures on the tomb of Lorenzo the Great is so wonderful that one can almost see the tension of the muscles.
231
A FOREIGNER'S TRIBUTE TO JOAN OF ARC
Joan of Arc, worn out by the suffering that was thrust upon her, nether the-less appeared with a brave mien before the Bishop of Beauvais. She knew, had always known that she must die when her mission was fulfilled and death held no terrors for her. To all the bishop's questions she answered firmly and without hesitation. The bishop failed to confuse her and at last condemned her to death for heresy, bidding her recant if she would live. She refused and was lead to prison, from there to death.
While the flames were writhing around her she bade the old bishop who stood by her to move away or he would be injured. Her last thought was of others and De Quincy says, that recant was no more in her mind than on her lips. She died as she lived, with a prayer on her lips and listening to the voices that had whispered to her so often.
The heroism of Joan of Arc was wonderful. We do not know what form her great patriotism took or how far it really led her. She spoke of hearing voices and of seeing visions. We only know that she resolved to save her country, knowing though she did so, it would cost her her life. Yet she never hesitated. She was uneducated save for the lessons taught her by nature. Yet she led armies and crowned the dauphin, king of France. She was only a girl, yet she could silence a great bishop by words that came from her heart and from her faith. She was only a woman, yet she could die as bravely as any martyr who had gone before.
245
I am going to Princeton partly because it was my father's college. I also prefer to go to a college away from home. You get the college life much more that way. My main reason is on account of the great advantages held forth in the preceptorial system. The preceptorial system is organized as follows. Imagine a class, junior for example of perhaps three hundred, divided into sections of twenty five each. For each of these sections there are six preceptors, men engaged to head groups of four or five to talk over their work with them and give them points and suggestions about it. The advantage of this is that the man gets a great deal more individual attention in this manner than he otherwise would. Princeton has high standards of intellectuality as well as athletics.
258
SELECTIONS FROM IRVING
In the meantime, the seasons gradually rolled on. The little frogs which had piped in the meadows in early spring, croaked as bull-frogs during the summer heats, and then sank into silence. The peach-tree budded, blossomed, and bore its fruit. The swallows and martins came, twittered about the roof, built their nests, reared their young, held their congress along the eaves, and then winged their flight in search of another spring. The caterpillar spun its winding-sheet, dangled in it from the great button-wood tree before the house; turned into a moth, fluttered with the last sunshine of summer, and disappeared; and finally the leaves of the button-wood tree turned yellow, then brown, then rustled one by one to the ground, and, whirling about in little eddies of wind and dust, whispered that winter was at hand.
294
Among the beautiful islands on the Canadian side of the St. Lawrence River, there is a deep and narrow channel which separates three small wooded islands from a large fertile one. Of the three islands the largest is rocky and covered with a growth of stately pines and waving hemlocks, and a carpet of moss and ferns. On the second there is quite an assortment of trees, whose foliage during the fall turns to many shades of gold and red, which colors are greatly enhanced by the dark green background of its neighbor. On the third there is a thick growth of brush, with an occasional small tree. These three islands are so close together, that fallen trees and logs make it possible to walk from one to another.
296
SELECTIONS FROM THACKERAY
How one loves to see the burly figure of him, this thick-skinned, seemingly opaque, perhaps sulky, almost stupid man of practice, pitted against some light adroit man of theory, all equipt with clear logic, and able anywhere to give you why for wherefore! The adroit man of theory, so light of movement, clear of utterance, with his bow full-bent and quiver full of arrow-arguments, surely he will strike down the game, transfix everywhere the heart of the matter; triumph everywhere, as he proves that he shall and must do? To your astonishment, it turns out oftenest no. The cloudy-browed, thick-soled, opaque practicality, with no logic-utterance, in silence mainly, with here and there a low grunt or growl, has in him what transcends all logic-utterances; a congruity with the unuttered. The speakable, which lies atop, as a superficial film, or outer skin, is his or is not his: but the doable, which reaches down to the world's center, you find him there!
329
When Abraham was twenty one the family moved to Decatur where he made his first public speech. Here he built a boat and went to New Orleans where for the first time he saw slaves. Then he vowed he put and end to it someday if he could. When he returned he went to New Salem where he was postmaster and store clerk He was then elected to the legislature. He studied law and when twenty eight he was admitted to the bar. Then after a few years he was elected president and office which he filled as few men would at his time. When he was elected his troubles began. He was against slavery the states left the union. At the war which freed the slaves came. During this war Lincoln showed his kind heartness by pardoning so many men. He did not like to see these men shot leaving their wives and families fatherless.
338
This man who is the chief character of this story, is the stingest man in town one day before Christmas and the nicest man on Christmas, and all this comes from a dream. His name is Soloman and in his dream he dreams of coming home to his old cheap looking home, in an old side alley and, as he gets to the door this gosts head appears and as he open the door it departs, lighting a match to go up stairs with, not fearing the gast, and then starts up stairs and he had no sooner reached the top step when there was and awful clammer of chains and bells, As he walks into his roam he hears the sound coming up the stairs nearer and nearer to his room every minute, And after he got in bed and blew out the light, he heard the gast walk right in his room and call him so he got up, being scared and afraid the gost would harm him, the gast told him to sit down beside him which the did, And then he said that he was Soloman partner and had died twenty years ago.
519
De Quincy
First: De Quincys mother was a beautiful women and through her De Quincy inhereted much of his genius.
His running away from school enfluenced him much as he roamed through the woods, valleys and his mind became very meditative.
The greatest enfluence of De Quincy's life was the opium habit. If it was not for this habit it is doubtful whether we would now be reading his writings. His companions during his college course and even before that time were great enfluences. The surroundings of De Quincy were enfluences. Not only De Quincy's habit of opium but other habits which were peculiar to his life. His marriage to the woman which he did not especially care for. The many well educated and noteworthy friends of De Quincy.
534
FLUELLEN
The passages given show the following characteristic of Fluellen: his inclination to brag, his professed knowledge of History, his complaining character, his great patriotism, pride of his leader, admired honesty, revengeful, love of fun and punishment of those who deserve it.
580
LETTER
DEAR SIR: I write to say that it aint a square deal Schools is I say they is I went to a school. red and green and brown aint it hito bit I say he don't know his business not today nor yeaterday and you know it and I want Jennie to get me out.