Social Psychology

Chapter 8: Social Stimulation — Language and Gesture

Floyd Henry Allport

Forms of Social Stimulation. The social behavior of human beings falls naturally into two classes. The first class comprises that behavior which affords stimulation to others; while the second consists of the characteristic responses which one makes to such stimulation. In the present chapter and the following one we shall discuss the first of these two classes; that is, we shall examine the behavior of the individual in its capacity for stimulating others to react. The forms of social stimulation may be classified in a number of ways. They may be treated according to the sense organs by which they are received, such as 'auditory' for languages and cries, 'touch' for pressures in a crowd, and so on. Or they may be grouped according to whether they are usually 'direct,' as language in conversation, or 'contributory,' as facial expressions and movements of others in a crowd. Some stimuli also are used for social control (language, gestures), while others merely enable us to adapt ourselves to the presence and characteristics of those who provide them (sight of others, physiognomy). For convenience, however, the most important forms of stimulating human behavior may be classified under the three main headings of Table III. This table makes use also of the principles of classification just mentioned.

In this chapter we shall consider the first and most important group of social stimuli, namely, vocal expression —  both that of inarticulate sounds and actual speech. Because of their close genetic relation to language, gestures will be included in the same discussion.


Table III. Social Stimuli
I. Vocal Behavior
   Inarticulate Sounds
   Ear; Direct
II. Facial and Bodily Behavior
   Facial and Bodily Expression in Emotion
   Facial Posture in Repose
   Bodily Posture
   Eye; Direct and contributory; Controlling and Self-Adapting
III. Minor Stimulations (Non-expressive behavior and mere presence)
   Sight of Others, Contact, Noise, Odor, Humidity, etc.
   Various Exteroceptive Senses; Contributory

The Organs of Speech: General View. Audible speech is made up of two components, tone and noise. The structures for producing them are located in a succession of air passages leading from the lungs to the lips. Tone is set up by the expired air current setting into pendular vibration the vocal cords of the larynx. Noise results from non-pendular vibrations produced by frictions or explosions of air currents at various parts of the mouth cavity. A general view of the organs of speech in longitudinal section is seen in Figure 15. The expired air from the lungs passes through the trachea (windpipe) into the larynx (Figure 15, ,23) where, the vocal cords (4) being properly adjusted, it produces a tone. Issuing upward through the vestibule of the larynx (20) it is deflected upward and backward by the epiglottis (18), and passes into the pharynx (19) and thence out through the oral or mouth cavity (12).

During speech the velum, or soft palate (Figure 15, 15), with its dependent projection, the uvula (16), is raised so that it extends backward until it almost touches the back wall of the pharynx. In this way the nasal chambers are cut off from communication with the pharynx, and the air current is deflected forward through the mouth. The nasal chambers are separated from the mouth cavity also by the hard palate (11). In ordinary breathing the soft palate drops down close to the tongue, and, almost meeting the upraised epiglottis below, separates the mouth cavity more or less completely from the pharynx. The inspired air therefore passes through the nostrils, down the pharynx (13, 19), into the larynx and trachea, finally reaching the lungs. In expiration it follows the same course. The epiglottis is a movable fold. It is lowered, closing the entrance to the larynx, during the swallowing of food. It is raised during respiration and speech.

Figure 15. The Organs of Speech

The Larynx. The larynx consists essentially of a cartilaginous framework, or box, roughly cylindrical in shape, with both ends left open. Two cartilages form the main part of this framework. The upper and larger one is the thyroid. It is not a complete cylinder, but is open behind; and is between the shape of a U and a V in cross-section. It is placed above the smaller cricoid, or signet-ring-shaped cartilage; and its sides project down and enclose the latter posteriorly, making a joint upon which it is free to rotate back and forth. Across the interior of this framework are stretched two elastic folds of mucous membrane, the vocal cords, which, being continuous with the lining of the trachea and larynx, form with this lining a kind of roof over the windpipe, with an adjustable slit, the glottis, lying between the cords. The thyroid cartilage is represented from different viewpoints in Figures 16 and 17, T.C., and in longitudinal section in Figure 15, 2. The cricoid is shown in Figures 16 and 17, C.C., and in Figure 15, 1, 22. The pivotal joint of the thyroid upon the cricoid is located at p in Figure 16. The vocal cords are designated in Figure 15, 4, and in Figures 16 and 17, v.c. Figure 17, G, shows the position of the glottis.

Figure 16. Diagrammatic View of the Larynx

Figure 17. Diagram of the Larynx

Surmounting the cricoid cartilage is a pair of small, triangular-based, pyramidal cartilages called the arytenoids (Figures 16 and 17, A.C.). The forward points of their bases, the vocal processes (Figures 16 and 17, vp), serve as points of attachment for the posterior ends of the vocal cords. Each vocal cord runs from its point of attachment inside the thyroid cartilage (Figures 16 and 17, a) to the vocal process of its arytenoid. The portion of the glottis lying between the cords is known as the ligamentous glottis (Figure 17, lg); while the shorter portion lying between the arytenoids is called the cartilaginous glottis (Figure 17, cg).

Figure 18. Laryngoscopic Views of the Larynx with Different Positions of the Glottis

A fair understanding of the functions of the larynx may be gained from three lines of inquiry: (1) How is this mechanism, which serves ordinarily for noiseless breathing, converted into a tone-producing instrument? (2) What determines the pitch and loudness of the tones produced? (3) How do the larynx and related structures cooperate in producing vowel sounds?

Laryngeal Tone Production. When the glottis is open, as in Figure 18, B, it is in position for quiet breathing. Vocal utterances in this position are merely whispers, lacking in that true 'voice quality' which depends on the formation of laryngeal tones. In order to produce true speech the cords must be brought close enough together to give periodic vibrations, like reeds, when the air is driven upward between them.

This effect is produced in the following way: The open glottis is triangular in shape, being wider in the cartilaginous than in the ligamentous portion (Figure 17). The articulation of the arytenoid upon the cricoid cartilage (Figure 17, x) is not a definite joint. It serves either as a pivot or as a gliding surface according to the action of the controlling muscles. If its action is pivotal, and if the vocal processes be rotated inward until they meet, the ligamentous glottis will be closed, and the cords brought close together. This rotating movement is produced by the contraction of two sets of paired muscles, the thyro-arytenoid (Figures 16 and 17, thyr.a.m.) and the anterior crico-arytenoid (ant.c.a.m.). In combination with this movement the transverse arytenoid muscle (tr.a.m.) by contracting drags the two arytenoid cartilages bodily together, so that, by a kind of interlocking action, they close the cartilaginous glottis.[2] The glottis as a whole is thus closed or narrowed, and is ready for the production of tone (Figure 18, A).

In opening the glottis the muscles just mentioned relax, and a pull is exerted by their antagonists, the posterior crico-arytenoids (Figures 16 and 17, post.c.a.m.), which, by a lever-like action about x, rotate the vocal processes outward and separate the cords. Extreme outward rotation gives a wide, rhomboidal opening characteristic of labored breathing (Figure 18, C). Inward or outward rotation about x also produces a stretching and tightening effect upon the cords. By a vertical lever action about the same point the vocal processes and cords are also raised (anterior and posterior crico-arytenoids) or lowered (thyro-arytenoids). (See Figure 16.) The nature of the laryngeal tone is probably influenced by these changes. The intrinsic muscles of the larynx, being in a continual state of tonic contraction, constitute an equilibrium of forces. Slight alterations of nerve impulse disturb the balance and produce minutely graded changes in the condition of the cords and glottis, with resulting differentiations of sound.

Pitch and Intensity of Laryngeal Tones. Variation in the pitch of the voice is produced chiefly by the action of the crico-thyroid muscle (Figure 16, c.thyr.m.). By its contraction the thyroid cartilage is either rocked forward and downward, or else pulled bodily forward, on its gliding articulation, p. An arc with p as center through a (Figure 16) describes the course taken by the point of attachment of the vocal cords (a-d) when the thyroid cartilage is tilted. The straight dotted line (Figure 16, d-vp) indicates the new position of the vocal cords, and shows that they are now stretched to a greater length and therefore rendered more tense. Increase in the tension of a vibrating string, of course, produces a rise in pitch.[3] The range in pitch of tones producible by the average human larynx is from two to two and one half octaves.
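The string relation appealed to above can be made concrete with the standard law for an ideal stretched string, in which the fundamental frequency is the square root of tension over linear density, divided by twice the length. The sketch below is only an illustration under that idealization: the vocal cords are not simple strings, and the length, tension, and density values are hypothetical numbers chosen to show the direction of the effect. It also converts the stated range of two to two and one half octaves into frequency ratios.

```python
import math


def string_frequency(length_m, tension_n, linear_density_kg_per_m):
    """Fundamental frequency (Hz) of an ideal stretched string."""
    return math.sqrt(tension_n / linear_density_kg_per_m) / (2.0 * length_m)


def octave_ratio(octaves):
    """Ratio of highest to lowest frequency across a span of octaves."""
    return 2.0 ** octaves


# Hypothetical values, not measurements of any real larynx.
lax = string_frequency(0.015, 0.05, 0.002)
tense = string_frequency(0.015, 0.20, 0.002)

# Quadrupling the tension doubles the frequency, a rise of one octave.
rise = tense / lax

# A range of 2 to 2.5 octaves is a frequency ratio between 4 and about 5.66.
narrow_range = octave_ratio(2)
wide_range = octave_ratio(2.5)
```

In this idealization a longer string of the same tension would sound lower, so the rise in pitch described in the text has to come from the increased tension outweighing the increased length.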

The tones of the larynx are enhanced in their strong, sonorous quality by a system of resonating cavities to whose confined air masses the vibrations are conducted. The ventricles of the larynx (Figure 15, 5), vestibule, pharynx, and nasal and oral cavities are all important resonators. A high tone requires a smaller resonator than a low one. The throat and mouth passages are therefore shortened in high tones by raising the entire larynx, a movement produced by muscles connecting the thyroid cartilage with the hyoid bone above it (Figure 15, 6). By the contraction of muscles running from the thyroid down to the sternum the larynx is lowered for low tones, and the resonating cavities correspondingly lengthened.[4]
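The statement that a high tone requires a smaller resonator can be illustrated with the textbook formula for a uniform tube closed at one end, whose lowest resonance is the speed of sound divided by four times the tube length. This is a rough sketch, not a model of the actual throat and mouth passages: the uniform-tube assumption, the speed of sound of 343 m/s, and the two lengths are all assumptions introduced here for illustration.

```python
def quarter_wave_resonance(length_m, speed_of_sound_m_s=343.0):
    """Lowest resonance (Hz) of an ideal uniform tube closed at one end."""
    return speed_of_sound_m_s / (4.0 * length_m)


# Hypothetical passage lengths with the larynx lowered and raised.
larynx_lowered = quarter_wave_resonance(0.18)
larynx_raised = quarter_wave_resonance(0.16)

# The shorter passage resonates at the higher frequency.
shorter_is_higher = larynx_raised > larynx_lowered
```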

Muscles surrounding the pharynx also cooperate in modifying resonance spaces. There are two special adjustments of the larynx, one for very high (head) tones, the other for deep (chest) tones. Head tones resonate in the head cavities, and are produced with the glottis slightly open, the cords tense and thin, and vibrating only in part. Chest tones resonate through the windpipe and chest cavity, and are made with the cords fairly lax but pressed together, and vibrant throughout their whole extent.

Intensity or loudness of voice depends on the strength of the blast from the lungs, which governs the amplitude of the cordal vibrations.

The Formation of Vowels. The vowels of speech are modified laryngeal tones of varying pitch and quality. The peculiar quality by which the various vowels are distinguished is produced by specific alterations of the size and shape of the resonance chambers. In this way a resonator is produced capable of emphasizing the particular overtones or accompanying tones characteristic of a given vowel.

In U (pronounced as oo in boot) the larynx is depressed, the soft palate highly raised, the front part of the tongue flattened and the back part elevated, and the resonance chamber further prolonged by protruding and rounding the lips. In A (as in father) the larynx is somewhat raised, the mouth open more widely, the soft palate less sharply elevated, the entire tongue depressed, and the lips normal. In O (as in go) and in A (all) the shape of the mouth cavity and lip positions are intermediate between those of U and broad A. E (as in eve) employs a high-pitched, closed, and shortened resonator. The lips are drawn back against the teeth, and the tongue raised and carried forward until it almost touches the hard palate, leaving a large pharyngeal space behind. The soft palate and larynx are considerably elevated. The vowels A (am), E (bet), and A (pay) form a graded series between broad A and long E, the tongue being brought successively forward and upward, the larynx raised, the mouth opening lessened, and the lips drawn back.

Articulate Speech. Consonants. Vowels, as we have seen, are modified laryngeal tones which contribute to language that sonorous and sustained quality called 'voice.' Within the cavity of the mouth are produced the characteristic noises which are blended and joined with the laryngeal tones in articulate speech. In themselves these noises are weak and unsustained, serving merely to initiate or terminate the vowel sounds. They are called consonants. The most important organ for the articulation of consonants with vowels is the tongue. It consists of a mass of muscle tissue capable of movement in any direction, or of modifying its own shape and surface. Muscles attached to the skull, hyoid bone, and lower jaw (Figure 15, 9) draw it respectively upward and backward, downward and backward, and downward and forward. Most consonants are produced by frictions or explosions of the air caused by bringing some part of the tongue into proximity or contact with the teeth, upper gums, or hard or soft palate.

Consonantal sounds are usually classified as fricative and explosive. The former result from the friction of the air in passing through a small opening, such as that made between the tip of the tongue and the teeth in pronouncing th. The latter are minute explosions caused by the air rushing in when two hermetically opposed surfaces are quickly separated. K, for example, is made by a sudden separation of the back of the tongue from the soft palate. Initial p is produced by the expired air forcing apart the closed lips. Terminal explosives (p in dip) are caused by the sudden clapping together of the lips or other surfaces. Fricative sounds may be prolonged for some time, while explosive sounds are momentary. Among fricatives we may further distinguish the open or aspirate (breathing) sound (h), the more closed and stridulous sh, f, or s, and the vibratory r.

The consonants are also classified according to the place where they are articulated. We thus have the labials, p, b, and w, produced by the lips; the dentals, t, d, r, s, and th, articulated as explosives or fricatives by the tongue against the upper teeth, gums, or forward hard palate; the labiodentals, f and v; the marginals, l and y, in which the tip of the tongue approximates the hard palate, and the air passes out over its sides; the palatals, ch, j, and sh, formed between the tongue and hard palate; and the gutturals, k, g, and ng, formed between the tongue and soft palate. M, n, and ng, though articulated and functioning as consonants, are actually voiced elements of vowel character in which the mouth cavity is closed and the soft palate lowered, allowing the air to pass out through the nose. They are called nasals.

There is, finally, a distinction according to whether consonants are produced as obstructions of tone, and hence have a certain voice quality (called sonants), or are simply breathed or made by mouth opening and closing (called surds). For each place of articulation there is a pair, a sonant consonant with its corresponding surd. Thus we have b (sonant) and p (surd), g (sonant) and k (surd), etc. Table IV summarizes the above classifications.

Table IV. English Consonants [5]
Place of Articulation Oral Nasal
Explosive Fricative Continuous Tonal
Surd Sonant Surd Sonant Sonant
Lips p b . . w m
Lips and teeth . . . . f v . .
Tongue and teeth . . . . th (thin) th (thy) . .
Tongue and hard palate (forward) t d s z, r n
Tongue and hard palate (back) ch j sh zh, r . .
Tongue, hard palate and soft palate . . . . . . y, l . .
Tongue and soft palate k g . . . . ng
Various places h . . . . . . . .
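The classifications summarized above lend themselves to a small lookup structure. The sketch below encodes only what the prose and table state: the place classes as listed in the text, the three nasals, and four surd and sonant pairs (p and b, k and g from the text; t and d, f and v from the table rows). The names `PLACE_CLASSES`, `SURD_TO_SONANT`, and `describe` are hypothetical, introduced purely for illustration.

```python
# Place classes as enumerated in the prose description.
PLACE_CLASSES = {
    "labial": ["p", "b", "w"],
    "dental": ["t", "d", "r", "s", "th"],
    "labiodental": ["f", "v"],
    "marginal": ["l", "y"],
    "palatal": ["ch", "j", "sh"],
    "guttural": ["k", "g", "ng"],
}

# Sounds in which the air passes out through the nose.
NASALS = {"m", "n", "ng"}

# Each surd (breathed) consonant mapped to its sonant (voiced) partner.
SURD_TO_SONANT = {"p": "b", "k": "g", "t": "d", "f": "v"}
SONANT_TO_SURD = {sonant: surd for surd, sonant in SURD_TO_SONANT.items()}


def place_of(consonant):
    """Return the place class of a consonant, or None if it is not listed."""
    for place, members in PLACE_CLASSES.items():
        if consonant in members:
            return place
    return None


def voicing(consonant):
    """Return 'surd', 'sonant', or None when the pairing is not recorded here."""
    if consonant in SURD_TO_SONANT:
        return "surd"
    if consonant in SONANT_TO_SURD:
        return "sonant"
    return None


def describe(consonant):
    """Collect the recorded classifications for one consonant."""
    return {
        "place": place_of(consonant),
        "voicing": voicing(consonant),
        "nasal": consonant in NASALS,
    }


example = describe("g")
```

Consonants that the prose does not assign to a place class, such as m and n, simply return None for the place rather than a guessed value.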


Gesture Language in Infants. The earliest form of communication in infancy is not speech, nor indeed vocal expression of any sort, but gesture. The language of gesture develops from natural and serviceable movements originally of purely individualistic significance. The head-shaking gesture illustrates the genetic process. At the beginning the baby turns his head away so as to prevent undesired substances which touch his lips from entering his mouth. This is the stage of simple avoidance or withdrawing. By conditioned response the sight of the undesirable object later calls forth the same reaction, and the effect is now avoidance in advance, or refusal. The movement serves as a sign which is readily understood and reacted to by the person offering the rejected substance. Since the action serves thus to control the behavior of others in a manner useful to the individual it is fixated according to the principles of arc fixation in learning. It is now used as a sign; in other words, it has become a gesture. The movement therefore has passed from a simple avoiding response of no social significance to a truly expressive one, valuable in the control of the social environment for the prepotent interests of the individual who uses it. We have observed the operation of this same principle in the social behavior of the lower animals. It is one of the fundamental laws upon which the acquisition of all habits of communication is based.

The final stage in the head-shaking reaction is that expressing dissent (refusal of acceptance) toward a purely declarative statement. The use of the gesture for mere negation in the indicative mode, however, scarcely develops until after the period of infancy. To the baby every negation is a refusal of some object or proposal: the only mode used is the imperative. Other gestures are used in infancy, such as holding out the hand toward a desired object, or tugging at the hands or clothing of an adult. They are all socially controlling stimuli of an imperative sort, established as satisfiers of prepotent demands within the social sphere.

Pre-linguistic or Laryngeal Stage of Vocal Expression. While the gesture repertory is expanding there is also progress in the strength and variety of tone produced by the larynx. The cry of the newborn child, evoked by hunger or organic distress, is weak, rhythmic, tremulous, and unvaried. The sounds most frequently heard are short a, as in at, and u, as in up, articulated with a few simple consonants apparently formed by random articulatory movements (for example, 'nah,' 'wuh,' 'ha,' etc.). Within the first month marked variations occur in tempo, loudness, pitch, and vowel quality of the vocal utterances, all expressive of the development and differentiation of shades of feeling and emotion.

In the crying of his son at three months of age the writer could discern at least five emotional varieties: the quick exhalations of fretting and annoyance, whining or entreating, the long drawn, detached sleepy cry, spasmodic inspirations of sobbing after hard crying, and the rasping and crescendo cry of anger. By the age of four months thwarting of efforts to feed evokes a quick and decisive anger cry. Laryngeal expression is acquired with far greater facility than the difficult movements of articulation. The baby consequently has mastered all the vowels in the language (and more) long before he can articulate them clearly with the various consonantal noises.

The laryngeal stage, or period of cries, in human expression is comparable to the vocal behavior of the lower animals. The primitive glottal reflexes and coordinations early acquired become part of the general emotional response of the individual. Among animals we have seen that these cries are responded to in appropriate ways by fellow creatures who learn their significance. Similarly in man, the early glottal sounds of infancy acquire significance for the parents as suggesting certain emotional states and needs of the child, and thus bring about the appropriate ministrations. These sounds therefore take on a social significance which is not innate either in parent or child but a product of experience in reactions between them.

As in the case of infra-human vocalization and human gesture we find here a transition from a purely individualistic emotional response to expressive behavior, that is, to behavior as a means of communication and social control. The anger cry, if found effective, quickly assumes the role of an infantile imperative. A spoiled child of three or four, if suddenly thwarted in his wishes, lapses into his earlier method of shouting and screaming, substituting for the unavailing word symbol the more primitive and vehement method of control by the larynx. The 'stereotyped tantrum' is a pre-linguistic form of social control. There is an early development, as the baby grows, from mere crying, through whining, coaxing, scolding, and finally yelling; all of these stages appearing before true speech habits are acquired.

Somewhat later there evolves a distinctly different wail suggestive of a hopeless and 'hurt' feeling. It is accompanied by a facial expression of intense grief. Though appearing significantly in cases where strong desires are blocked by a parent, it is also evoked by severe physical pains, such as a pinched finger or bruised head. Beginning as a mere emotional outburst, it rapidly assumes a value for social control as an appeal for sympathetic ministrations. An additional adaptive value is added in the good 'hurt cry' which a small boy works up in order to have his bullying elder brother punished. Even when fully grown the 'hurt feeling' retains for us the significance of a desire both for sympathy and for the punishment of the offender by having his injustices recognized by others.

Behavior of this sort serves the same purpose for the acquisition of objects or ends as the gesture of head-shaking serves in their rejection. The mode in all cases is imperative, and the effect is the securing of some infantile form of adaptation through the control of others. Laryngeal expression may be regarded as a kind of vocal gesture of infancy.

The Development of Language: Stage 1. Random Articulation with Fixation of Circular Responses. The marvelously intricate and versatile speech mechanism described earlier in this chapter is at birth, like other motor mechanisms, simply a crude possibility. Further growth of the nerves and muscles must combine with practice to produce a repertory of sounds adequate for language. With such development as a basis the social environment furnishes the stimuli necessary for the acquisition of perfected speech habits. The earliest used consonants, which, according to Miss Blanton, occur during the first month of life, are chiefly nasals and gutturals, such as m, n, ng, g, and k (also h, w, and y). These represent easy mouth positions adopted probably as random movements. They are articulated with various long and broad vowel sounds, and with some diphthongs (double vowels), as in "gow" (writer's son at two months).

Overlapping with the period of laryngeal expression and finally succeeding it there appears the stage of random articulation, the babbling and cooing of the child during its second and third half years of life. In this period the early consonants are repeated with better control and supplemented by new ones. The dental and labial explosives, p, b, t, and d, are soon acquired. The fricatives s, f, v, and th are more intricate and come later. L, which requires inversion of the tip of the tongue, may require three years to perfect. R also is difficult as an initial sound. Examples of early mispronunciations are "whing" (sing), "yight" (light), etc. Double consonants cause much difficulty, the second consonant generally being slighted; for example, "p'ease" (please).

With random articulation we enter upon a new phase contrasting somewhat with that of pure laryngeal utterance. The latter is imperative in mode. It arises with some strong, unpleasant emotion due to thwarting or discomfort; and it rapidly assumes the function of social control. 'Baby talk' on the other hand is spontaneous and indicative of a pleasant mood. It is a form of play, a part of the diffuse outflow of energy, rather than an effort at the control of others. If stronger emotions enter the field, bringing in the functions of the sympathetic nervous system, the pleasant prattle at once gives place to the inarticulate cries of the earlier period.[6]

Too much attention has been paid to the acquisition of vocabularies, and too little to the study of the pre-verbal stage of random articulation in infants. This stage not only affords the material for language but gives the practice necessary for the control through the ear of the muscles of speech. The chief significance of the vocal play of babies seems to be in establishing circular reflexes between the sound of the syllable and the response of speaking it.[7] Let us suppose, for example, that the baby utters the syllable da. By so doing he stimulates himself through two channels. He receives certain kinaesthetic sensations from the movement of the vocal organs, and certain auditory sensations from the sound which he produced. It is with the auditory stimulation that we shall be chiefly concerned. Returning to the brain centers these afferent impulses are, or tend to be, redischarged through the same motor pathways as those used in speaking the syllable itself. There are two possible methods of explaining this. We may suppose that the synapses connecting the afferent impulse with the motor outlet of speaking da, having been recently used, are in a state of relatively lowered resistance, and are therefore readily put into operation again. Or we may infer that, in some cases at least, the return stimulations are received while the speaking response is still going on (as in a prolonged vowel sound), and the motor synaptic resistances for da are completely overcome because discharge through those synapses is actually taking place. We have here the exact situation for the formation of a conditioned response. The response da becomes circularly conditioned by the sound da; and this sound when later heard will tend of itself to evoke the response of speaking it. This latter explanation is probably the true one.[8] While the babe is practicing the syllabic elements of his future vocabulary he is therefore also fixating ear-vocal reflexes through which a spoken sound may directly evoke its enunciation. Articulation has now advanced to a stage where it is capable of being controlled through the auditory receptor. The process just described is illustrated diagrammatically in Figure 19, A.
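The circular fixation just described can be pictured as a toy model in which each self-produced syllable strengthens the link between hearing that syllable and speaking it. The sketch below is a deliberately crude illustration, not a physiological account: the class name `EarVocalReflexes`, the unit increment per repetition, and the threshold of three repetitions are all hypothetical choices.

```python
class EarVocalReflexes:
    """Toy store of sound-to-speech links strengthened by babbling."""

    def __init__(self, threshold=3):
        # Hypothetical number of repetitions needed to fixate a reflex.
        self.threshold = threshold
        self.strength = {}

    def babble(self, syllable, repetitions=1):
        """The infant speaks a syllable and hears himself doing so."""
        self.strength[syllable] = self.strength.get(syllable, 0) + repetitions

    def hear(self, sound):
        """Return the spoken response a heard sound evokes, if any."""
        if self.strength.get(sound, 0) >= self.threshold:
            return sound
        return None


reflexes = EarVocalReflexes()
reflexes.babble("da")
before = reflexes.hear("da")    # link still too weak to evoke speech
reflexes.babble("da", repetitions=2)
after = reflexes.hear("da")     # link now fixated
```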

Stage 2. Evoking of the Articulate Elements by the Speech of Others (so-called 'Imitation'). At this point the social influence enters the process of language development. If the ear-vocal reflexes have been sufficiently established for the sound of a word to call forth the response of articulating it, it is no longer necessary that the child himself should speak the stimulating word. It may be spoken by another. The effect will then be that of the child repeating the sounds which he hears others utter. This stage is suggested in Figure 19, B. It is, of course, assumed that only such speech responses as have been acquired through growth and practice will be evoked in this manner. The child does not imitate or duplicate the speech of his elders. There is evoked simply the nearest similar ear-vocal reflex which, with his present limitations of pronouncing, he has been able to fixate. The word "doll," spoken by the parent, would probably be repeated da (a as in father). In this manner whole phrases far beyond the learner's comprehension may be reiterated rote fashion with as fair accuracy as the speech habits already acquired permit. It is essentially a parrot stage. In popular parlance it is known as 'learning by imitation.' The term 'imitation' is however both inexact and misleading, for it suggests that the process is one of learning the speech reactions of others by voluntarily copying them; whereas it is really the touching off of previously acquired speech habits by their conditioning auditory stimuli.[9]

Figure 19. The Development of Language Habits in the Infant
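The idea that an adult's word touches off the nearest reflex already fixated, rather than being copied, can be sketched as a simple matching rule. The rule used below, choosing the fixated syllable that shares the longest opening run of letters with the heard word, is an assumption made for illustration, with spelling standing in crudely for sound; the text does not specify how similarity is determined. The function names are hypothetical.

```python
def common_prefix_length(a, b):
    """Count how many leading characters two strings share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def evoked_response(heard_word, fixated_syllables):
    """Return the fixated syllable nearest the heard word, or None."""
    best, best_length = None, 0
    for syllable in fixated_syllables:
        shared = common_prefix_length(heard_word.lower(), syllable)
        if shared > best_length:
            best, best_length = syllable, shared
    return best


# Syllables assumed to have been fixated already in random articulation.
repertory = ["ba", "da"]

response_to_doll = evoked_response("doll", repertory)
response_to_sing = evoked_response("sing", repertory)
```

With this repertory the word "doll" evokes da, as in the example from the text, while a word with no fixated counterpart evokes nothing.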

Discussion of the Theory involved in Stages 1 and 2. The reader should bear in mind that the process thus far described is largely hypothetical. Precise physiological data are wanting; but in their absence we may review certain lines of evidence in support of the hypothesis.

(1) If vocal responses are circularly fixated, with the sound of speaking them serving as stimulus, we should expect that reiteration of the same syllable over and over would be a necessary result. The baby would learn to mimic himself as a prerequisite for repeating sounds made by others. The facts support this supposition. Reduplication of syllables (da-da-da, etc.) in a tireless manner is a common phenomenon of baby talk. Later many objects are named with doubling of syllables (for example, wah-wah for water), and longer phrases are reiterated as a kind of play.

(2) Only sounds which have been already pronounced in random articulation can be evoked by the speech sounds of others. That is, only those sounds can be evoked which have had a chance to become circularly fixated as ear-vocal reflexes. The spoken word "pencil" was repeated by the writer's son as punka (c and l sounds not yet acquired). The phrase "What is that?" involving difficult consonants, was reproduced as uh i a. The words "down," "doll," and "clock," when spoken to him, were all repeated as da. Ba, similarly, was his reproduction of "box," "bath," "bottle," "block," and "bye."[10]

(3) There exist in the central nervous system mechanisms adequate for the circular fixation of vocal habits. Leaving out of account the cortex, relatively undeveloped in infancy, there are adequate connections between the auditory nuclei of the brain stem and motor fibers controlling the organs of speech. Neither high intelligence nor conscious imitation is necessary for the use of this apparatus. The ear-vocal connection is direct and immediate. The evidence for this is at hand in cases of echolalia in idiots and aphasic patients.[11] These 'human parrots' accurately reiterate whole phrases spoken in their hearing without the slightest comprehension of their meaning. We are probably dealing here with sub-cortical mechanisms representing early formed and circularly fixed responses comparable to those of the baby.

(4) It is well known that congenital or early deafness is usually accompanied by mutism. Deaf-mutes are able to articulate in the manner of the random infantile period (baby talk); but they cannot, without special methods, learn the use of spoken language. Since the ear-vocal reflexes were not and cannot be acquired, some other form, such as eye-vocal reflexes, must be substituted if the knack of speaking words is to be imparted to them. The lack of the usual, early formed, circular vocal reactions is responsible for their mutism.

Without pursuing this question further we may tentatively accept the foregoing explanation of the so-called 'imitative stage' of language development.[12] Word habits have been formed which are capable of being put into effect by the sound of the same words spoken within hearing. The next step is to convert these parrotlike reactions into true language. This step like the preceding is achieved through social agencies.

Stage 3. —  Conditioning of the Articulate Elements (evoked by others) by Objects and Situations. As soon as the stage is reached in which the parent can evoke repetitions of words from the infant at will, the process of teaching him to name objects begins. It does not suffice to say "doll" and hear the child repeat da. The doll

(187) itself is held up for inspection while the learner repeats the word pronounced by the parent or nurse. A conditioned response is thus formed; the afferent visual impulse from the doll discharges its energy through the motor pathways of the speech pattern of pronouncing the word. The object itself thus becomes a stimulus adequate for evoking the response of speaking its name. Figure 19, C and D, illustrate schematically this conditioning process. Stages two and three are practically synchronous in the actual development of the child. We have separated them in the description only for the sake of clearness.

Progress from this point is rapid. A child may learn in this manner to speak the approximate names of several hundred objects while he is still laboring over the exact pronunciation of difficult consonants. The naming, or vocabulary-acquiring, process begins early in the second year and increases by ever-lengthening strides up to six years, at which age the average child has a vocabulary of about three thousand words.

Our explanation thus far has involved only the control of the speech reactions of the child by the adult. Social control, however, soon operates in the reverse direction. The child learns to use his naming habits as demanding habits. Suppose he sees a new and interesting doll out of reach on a shelf. Manipulative tendencies cause him to reach for it. Failing in this, the usual law of trial and error brings into play all possible movements. One of the readiest and easiest of these movements is the pronunciation of the word "doll," a reaction which is moreover elicited by its recent association (conditioning) with the sight of an object of that general sort. The word is therefore spoken, and the pleased parent presents the doll as a reward. The manipulative drive now proceeds unhampered, and the arcs involved in this solution of the problem are fixated for future use. By simple vocal expression the child thus learns to control others. He increases vicariously his own stature, his power, and his sagacity by enlisting these attributes of adults in the service of his needs. Little wonder that his linguistic progress is rapid!

The naming reaction can be conditioned not only by the sight of an object but by other stimuli inherent in the general situation.

(188) The word "doll" may have been evoked at a time when the child was handling the toy, 'talking' to it, or even running to get it. The proprioceptive stimulations arising from these acts therefore become adequate conditioning stimuli for producing the response of speaking the word. In all relations in which the doll itself was formerly experienced the word "doll" may now be called up in consciousness and evoked as an audible or a 'thought' response. At any future time therefore when the child may recall or have the tendency to manipulate such an object through habit, he will be likely to say "doll." The attendant again produces the object; and the arcs involved in this solution are fixated as before. The learner has now reached the advanced stage of demanding objects desired but not seen. Verbs, adverbs, and particles, such as "give," "down," "again," "move," and "no," are acquired and used in the same fashion. Having been learned through social agents in connection with attitudes, postures, and situations, they are now used to control these agents with respect to the situations they represent.

In the learning of language then, as in the stages of laryngeal and gestural expression, we find that social control is a cogent factor. With increasing development, however, other considerations enter. In addition to naming and demanding objects the child begins to talk about them. He discourses to his toys and about them. He verbally reviews bits of the day's experience as he lies in his crib in the evening, and in so doing substitutes word responses for the overt movements he originally employed in living them. In other words, language becomes for him a vehicle of thought.

Development of Response to Language. A few words may be added concerning the understanding of language by the infant, a function which precedes its actual use by some weeks or months. Speech sounds of others stimulate the child in many ways beside the eliciting of ear-vocal reflexes. They control his behavior in consoling him, diverting his attention, and offering signs by which he knows that he is to be tended in various ways. Language serves to condition the prepotent activities of the baby in the same way that the incidental growls or sex sounds condition the withdrawing or approaching responses of the lower animals. Experiments show

(189) that dogs respond very little to words as articulated symbols, but chiefly to the pitch, intensity, and quality of the voice. The earliest effect of vocal stimuli upon the baby is through these same laryngeal components. An infant will cry at a scolding tone of the parent long before the words themselves are understood. By the end of the first year the responses to commands, or to the direction of attention, that is, to some part of the child's body, indicate that he is beginning to understand the meaning of articulate word symbols.[13]

The final achievement of linguistic development is the response to language by the use of language, as in answering a question. This occurs late, usually after a fair mastery of speech has been obtained. Aside from the intellectual difficulty involved, there appears to be a kind of inertia: the child is loath to quit the placid, irresponsible haven of ear-vocal reflexes for the uncharted sea of interrogation.


Infantile and Primitive Language. Although the old notion that the child in his development recapitulates the stages in the evolution of mankind is becoming obsolete, we may still profitably bear in mind some of the facts of child language in seeking to understand its development in the race. In certain aspects the same conditions and explanatory principles apply to each. (1) Neither the child nor aboriginal man was innately endowed with speech. The drives, moreover, and laws of learning by which it had to be acquired are the same for both. (2) Pre-linguistic man, as well as the modern baby, probably possessed as material for language development a set of random laryngeal and articulate utterances. The main differences between the two situations are (1) that the child learns the speech reactions already established in the vocabulary of his elders, while the primitive man had to evolve a language of his own, and (2) that in the former case the language is mastered in

(190) the first few years of an individual's life, whereas in the latter whole tribes were busy for many generations contributing word symbols, and modifying and transmitting linguistic art, before an adequate language was achieved. These statements will be developed in the sections following.

Theories of the Origin of Language. Gesture. Speculation on the roots of language has yielded a considerable crop of theories. Three of the most significant are the gesture theory, the interjectional theory, and the onomatopoetic theory. The gesture theory of Wundt traces the origin of language to gesticulation. Stress is laid upon the precedence of gesture to speech in the infant, and upon the fact that gesture is an effective and spontaneously adopted means of communication among both primitive and civilized when the speakers have no language in common. Anthropologists report detailed, narrative conversations carried on by pantomime between Indians of distant tribes.[14] The American soldier in France had little difficulty in making his wants known through gesture and grimace. Deaf-mutes and aphasic patients are very skilled in this form of communication. Even idiots can be taught to obey commands given in gesture which would be meaningless in verbal form.

Gestures are of three kinds, emotional, demonstrative, and graphic. Bodily movements form a natural part of the primitive emotional reactions. The fist is clenched in anger. The hand is waved to one side and the foot stamped in impatience. Pointing the finger with the hand clenched and palm to the inside is a gesture of threat or accusation. Pointing with the palm down is merely for directing attention. Some of these gestures seem as immediate and innate as facial expressions. Their significance as stimuli is even greater: they are rarely misunderstood. Most emotional gestures belong to the 'halfway stage' of communication; that is, they are of more significance for the one who sees them than for the one who makes them. They afford the former a clue for adapting himself to the mental condition of the latter. Among the lower animals, who possess no language proper, they become important means

(191) for controlling other creatures. Demonstrative gestures consist merely of pointing to the objects one desires to call to attention, allowing the situation to make clear any control which the 'pointer' desires to exercise in regard to them. The vocabulary in this case is, of course, limited to the range of objects within sight. This defect is remedied by the use of descriptive or graphic gestures. One may, for example, represent a house by movements of the hands suggestive of a sloping roof and walls; or he may denote a certain person or animal by mimicking his essential characteristics. Action may be described in a similar manner. A wide range of conversation is possible by the use of graphic gesticulation.[15]

Graphic Gesture in Relation to Infantile and Primitive Language. In several ways graphic gesture resembles the language of the infant and of primitive man. First, it does not lend itself to abstractions. Since all the movements are descriptive of specific things none of them qualifies as a conveyer of abstract meaning. The phrase "all men are mortal" would be difficult to render either in gesture language, or in infant or primitive speech. A word such as "make" can be expressed only by movements suggesting the making of some particular object. Concrete familiar terms are used in lieu of class concepts for new generalizations. Thus a savage, at first sight of a slate pencil, called it a "stone scratch something." The writer's small son, on being initiated into the delights of whipped cream, shouted "more piece o' milk!" So particularistic are primitive languages that some of them have no general pronouns indicating the person in all his relations. Separate words must be used to denote "he sitting," "he running," "he absent," and the like. The descriptive resemblance to graphic gesture is thus clearly shown.

Secondly, the concreteness of these early languages is shown in the flow or succession of images employed. The significance of Santa Claus would be explained by the two-year-old by such impressions as, "Santa drop down chimney —  snow on Santa Claus —  Santa put toys in stocking —  Santa go away —  come again next

(192) Christmas." In a similar manner the Indian might relate the coming of the white man. Everything is impression; interpretation, feelings, motives, cause and effect must all be supplied from the context. The order of words in gestures indicates the same impressionistic treatment. The sentence "The angry teacher strikes the child" would be rendered "Teacher, angry —  child, strikes." Infantile word order is somewhat similar. The elemental languages thus resemble gestures in lying closer to the level of immediate sensory experience than do the abstract expressions of civilized adults.

The third resemblance is in defect of syntax. In gesture, and in much of infant language, the tense must be inferred from the situation. So also with mood: a look of interrogation converts the indicative gesture into a question. A determined or angry countenance gives the mimetic gesture the force of a command. Similarly with infant speech, the single word "doll" may denote tenderness for the object, meditation about it, or an angry desire to have it from the shelf, according to the tone with which it is pronounced and the accompanying gestures. Single words used by children to convey whole commands or other meanings are called 'sentence words.' There is an analogy in primitive speech, though not an exact analogy, in the holophrase, a single word denoting a complete action or situation. Thus in Aztecan the word onictemacac means "I have given something to somebody." In dispensing with parts of speech, and in presenting a total situation in one symbol, the holophrase might be called a 'word gesture.'

To summarize: we have seen that gesture exists in both the infant and the aboriginal adult as an elementary means of communication, and that genetically in both child and race vocal language is peculiarly gestural in its structure. Wundt's theory is further supported by the fact that many primitive tribes combine grimace and gesticulation as an integral part of spoken discourse. It is said that in some cases tribesmen can hardly converse with one another in the dark. Although the gesture theory is thus supported by ethnological and genetic observations, it must, however, be remembered that gestures are visual stimuli, while words are auditory. The similarities between gesture and early language bespeak

(193) the primitive state of the sign-making function underlying both; but they do not explain the transition from manual signification to vocal. Language possesses enormous advantages over gestural expression, advantages which made it certain that in the course of evolution it would replace the latter as an entirely new variety of communication.

The Interjectional and Onomatopoetic Theories. The interjectional theory bases the origin of language upon primitive ejaculations of an emotional sort, which were probably common among aboriginal men as among animals. In so far as emotional growls and cries are products of the larynx there is probably a sound basis for this theory. Tonal differentiations play a large part in primitive tongues, in many cases changes of pitch and quality giving a modified or entirely different meaning to a word whose form otherwise remains the same. Civilized languages also show many traces of an interjectional stage. Changes of intonation are used in fairy stories; the voices of the father, mother, and baby bears, for example, being portrayed by a kind of 'vocal gesture.' "Yes," spoken with a rising inflection, and in widely different languages, asks a question; with a falling tone it denotes certainty. "Ah!" may be so intoned as to convey a feeling of pain, pleasure, surprise, admiration, or reproof. The limitation of the interjectional theory is that it can carry us no further than the laryngeal stage. It offers no foundation for articulate speech.

The onomatopoetic theory ascribes aboriginal speech to the imitation by man of natural sounds, such as the roar of the wind or the cries of animals. Attention is called to the rich variety of onomatopoetic words in the vocabularies of primitive and infantile language. Many of these are probably, however, of recent origin. At best this theory merely states a source for some linguistic elements; it does not explain how the elements were acquired. In answer to this last question the following wholly tentative theory is advanced.

A Social Behavior Account of the Origin of Language. There is a fair agreement among philologists that a laryngeal, or glottal, period comparable to the cries of animals and babies existed for a long time in the history of man before the rise of articulate speech.

(194) We have indicated in Chapter VII how the prepotent behavior of animals is conditioned by the cries of anger, fear, and sex desire of their fellows, and how the makers of these sounds thus learn to use them for controlling the responses of other animals. It is fully conceivable that a similar development occurred in the human race, and that self-adaptation and control of others by inarticulate laryngeal sounds evolved as the earliest language of mankind. Social control, heretofore neglected by philologists, must therefore be recognized as a potent factor in the origin of language.

The real problem, however, arises in explaining the transition from this narrow emotional repertory of the glottis to the enormous array of articulated consonantal and vowel groupings which constitute the most primitive of tongues. Only through the use of words can language achieve its true role as a symbolization of objects. We may begin with the very probable assumption that aboriginal man, like the infant, was possessed of some sort of articulating mechanism, and that he was capable of producing random articulated syllables. The first two stages of our theory of infant language would then apply. That is, he would fixate in himself certain circular reflexes of the ear-vocal sort, and these responses would therefore be capable of being evoked by hearing the same sounds spoken by another. Evidence for the existence of the circular process is seen in the extensive reduplication of syllables which is even more characteristic of words in primitive than in infantile vocabularies.

With the third stage, however, we seem to reach an impasse. The conditioning of syllable responses by the sight of objects, as described in the naming habits of the child, presupposes a social agent who knows the names of objects and can teach them in this way to the learner. There is of course no one who knows the language in question prior to its coming into existence. Our answer to the dilemma is that the first word response occurred and was fixated by a chance articulation spoken by an individual in association with some object or situation, and in the sight and hearing of another individual. The ear-vocal reflex of the spoken syllable would be then conditioned in the speaker by the sight of the object; and, what is equally important, it would be evoked in another

(195) individual and similarly conditioned. Here then we have the basis for the use of the same word-sign or name by two or more persons, the essence, in other words, of language itself. Success in communicating and controlling one's fellows with reference to the object would serve to fixate this conditioned ear-vocal reflex as a permanent habit. With the advancement of human intelligence mankind probably learned to profit by this accidental discovery and, grasping the significance of the principle involved, began to apply it, at first unconsciously and then more or less deliberately, in the coining and adoption of new word-signs. Like roast pigs in Lamb's Essay, words were eventually found possible of achievement by design as well as by accident. Here no doubt entered the influence of appropriateness in the social fixation of object-names. Onomatopes were naturally chosen in great number because they seemed to fit their objects so well; and in a sense the objects denoted taught man their own names through the noises they produced. If the foregoing theory is correct, social stimulation and response lie at the very root of language, and deserve far more attention than they have received in philological discussions.[16]

Written Language. By the aid of writing social stimulation is extended through vast reaches of time and space. Published orders from the army Chief-of-staff may direct the movements of soldiers in the opposite hemisphere. Mosaic law still has its potency in controlling human thought and action. Though subject to modification of meaning through interpolation and through loss of the effects of the intonation and personality of the writer, every bit of language read represents an influence exerted upon one individual by the linguistic mechanisms of another.

In both the child and the race written language is acquired long after the spoken form, which it merely symbolizes. Picture-writing was the earliest form of chirography. Writing, like gesture and speech, was originally graphic in character. The pictures were

(196) gradually reduced to bare conventional symbols of the objects they formerly depicted (an example is the modern Chinese alphabet). Inasmuch as articulated auditory stimuli had proved more versatile than the visual language of gesture, written language advanced by providing visual symbols for the articulated sounds. The first stage of phonetic writing was the rebus. An impression given visually was interpreted by the reader in an auditory fashion, and a different meaning assigned. Thus the word "male" might be represented by a picture of a coat of mail. In this way abstract words, impossible to picturize directly, could be visually represented. The final stage of writing, and one of the greatest social achievements of all time, was the invention of the Phoenician alphabet, in which each language sound has its arbitrary symbol.


Language, the major form of social stimulation, has evolved through that very type of situation in which it exercises its present social function, namely, the stimulation of one person by the vocal reactions of another. Let us briefly sum up the process. It began both in the infant child and the infant man by gestures. These comprised natural emotional expressions, pointing, and descriptive movements. Vocal expression meanwhile came into play. Alike in animals, children, and the race it first took the form of variable cries produced chiefly by the vocal cords. By learning the significance of gestures and tonal interjections animals, children, and men learned to adapt themselves —  that is, to behave appropriately —  toward their fellows. On the other hand in using these stimuli they soon learned that the behavior of others could be controlled for their own interests. Incidental vocal acts were then deliberately performed as coercive signs.

A more refined form of control through the use of sounds as symbols was the next step. A set of word symbols far more elaborate than the range of laryngeal and gestural possibilities was needed. Consonants articulated by the tongue and other mouth parts with the glottal vowel sounds fulfilled this need. Beginning with random articulation, according to the theory advanced, control of the elements by the ear was gained through circular self-

(197) stimulation. The speech elements were then evoked by others and attached as conditioned responses to afferent impulses from objects and situations. In the child these conditioned word habits are the legacy of social inheritance. In the history of language they probably arose fortuitously by chance association, and were developed by human invention. Brought into the service of the prepotent needs, the use of words rapidly extended from mere nomenclature to demanding and controlling others with respect to objects and situations denoted.

Every human advance, whether it be by learning, problem-solving, or invention, must be based ultimately upon some prepotent need (see Chapter III). The control of others in the service of such needs is clearly the drive behind the original acquisition of language. This, rather than the desire to communicate, 'instinct to express,' or other alleged social instinct, has been the guiding principle. To this drive a somewhat later but important allied drive was added, namely, the effort to control the non-social environment. Primitive man did not, of course, speak words to animals, trees, and metals; but he spoke words to himself about them. He used implicitly pronounced language symbols in representing their properties and relations, and in predicting certain things concerning them. In other words, language helped him, as it helps the infant, to learn to think, and to develop a practical and a scientific culture. This use of language, however, was a later development. In the very origin of speech the leading drive was probably the immediate incentive of social control.

Turning from the social foundations of vocal expression to its current value as a social stimulus, we enter upon a field covering the major segment of human life. Making and responding to language stimuli, oral and written, has become deeply rooted in our most vital interests. We can scarcely conceive what human culture, or even human nature itself, would be without this function. The institutions upon which the social order rests are really systems of traditional and recorded language. Education is the socialization and training of the individual through language symbols. The edicts of government and public opinion, in rumor or print, direct his thought and conduct through the same medium.

(198) These forms of control are 'institutionalized'; through them, by means of language, each individual is trained and controlled for the good of all.

In the more personal relations language retains, in a subtle form, its pristine function of control. In conversations we strive to impress upon others our experiences, attitudes, and feelings. In letters we do the same, and also politely request our correspondents to perform services for which we "feel ourselves deeply indebted." The novelist and dramatist control the flow of emotion and imagery in their auditors to suit their own purposes. Even the professor in delivering a scientific lecture controls the thought processes of his students; for communication of ideas is a form of social control.

In most of these instances, however, the community of interest and thought and the pleasure of contact and discussion are so absorbing that the control factor is obscured. Modern man has become socialized both in the character of his demands upon others and in his willingness to meet reasonable demands made upon him. Give and take has become a pleasure in his social life. Hence actual control through language stimuli may be readily brought about without his being aware that he is either employing or submitting to it. Language, therefore, is no longer regarded as a coercion, but as a form of intercourse through which human nature finds its fullest expression.


Meyer (Von), G. H., The Organs of Speech.

Foster, M. A., A Textbook of Physiology (6th edition, revised), part iv, book III, ch. 7 (secs. 1 and 2).

Scripture, E. W., The Elements of Experimental Phonetics, chs. 17-19, 28, 29.

Webster's New International Dictionary (1911 edition, unabridged), pp. xxxviii-xlvii. (Webster's Collegiate Dictionary, pp. vi-xxvi; lii-liv.)

Sapir, Edward, Language: an Introduction to the Study of Speech.

Watson, J. B., Psychology from the Standpoint of a Behaviorist, ch. 9.

Meyer, Max F., The Psychology of the Other One, ch. 14.

Brown, H. C., "Language and the Associative Reflex," Journal of Philosophy, Psychology, and Scientific Methods, 1916, XIII, 645-49.

Edman, I., Human Traits, ch. 10.

Judd, C. H., Psychology, ch. 10.

Chamberlain, A. F., The Child: a Study in the Evolution of Man, ch. 8.


Tanner, A. E., The Child, ch. 16.

Buckman, S. S., "The Speech of Children," Nineteenth Century, 1897, XLI, 793-807.

Romanes, G. J., Mental Evolution in Man, chs. 5-9, 12, 13, 17.

Wundt, W., Elements of Folk Psychology (Translated by Schaub), ch. 1 (secs. 5 and 6).

Marett, R. R., Anthropology, ch. 5.

Weiss, A. P., "Conscious Behavior," Journal of Philosophy, Psychology, and Scientific Methods, 1918, XV, 631-41.


  1. There is difference of opinion regarding the function of this muscle. Some authors maintain that it draws the arytenoids as a whole toward the thyroid, thereby slackening the vocal cords. The contraction of its internal portion is also supposed to thicken the cord itself.
  2. The combined action of these 'adductor' muscles is much like that of a sphincter.
  3. Tension being constant, pitch varies inversely with the length of vibrating bodies. For this reason (that is, because of greater diameter of the larynx) the voices of adult males are pitched lower than those of females.
  4. These movements may be detected by placing the finger on the "Adam's apple" while singing the scale or pronouncing vowels of different pitches.
  5. Adapted, by permission, from Webster's Collegiate Dictionary. (Copyright, 1898, by G. & C. Merriam Company, Springfield, Massachusetts.)
  6. Cf. the theory of antagonistic emotions explained in Chapter IV.
  7. The general mechanism of circular reflexes was described on p. 39.
  8. Direct proof of this process is, of course, difficult to obtain. There are, however, a number of exact analogies, of which the micturition reflex is perhaps the most convincing. When the bladder is partially filled, the mere sound of running water is a sufficient stimulus to cause either an increase in bladder tonus (desire to urinate), or the act of voiding itself. Here, as in the vocal reflexes, the sound produced by the act performed in the past by the individual himself has acquired, by conditioning, a stimulus value for evoking the act itself. Feeling impelled to cough when we hear others cough is a similar example.
  9. Although the theory thus far discussed was developed independently by the writer, he does not claim to be the first one to have advanced it. A concise statement of the principles involved may be found in Smith and Guthrie: General Psychology in Terms of Behavior, p. 132.
  10. Cf. E. L. Thorndike: Educational Psychology, Briefer Course, p. 43.
  11. L. S. Hollingworth: "Echolalia in Idiots," Journal of Educational Psychology, 1917, VIII, 212-19.
  12. A rival theory asserts that every vocal response pattern is connected innately with the sensory pattern produced by the sound of the word in question. Special instinctive mechanisms of imitation supply the ear-vocal connections which we have assumed to be developed through a conditioned circular response within the experience of the individual. The four points in the discussion above might all be construed to fit this theory. (Cf. Professor Hollingworth's article, loc. cit., in regard to echolalia.) It would be necessary, however, to meet the criticisms in regard to maturation theories and inheritance of "perceptual dispositions" raised in Chapter III. The maturation theory is particularly awkward, compared with the circular reflex theory, in connection with the second point of the discussion. Instinctive imitation is at best a speculative hypothesis, while cases of circularly fixated, ear-motor reflexes are clearly established (see footnote to p. 183).
  13. Romanes states the case as follows: "While the understanding of certain tones of the human voice extends at least through the entire vertebrated series, and occurs in infants only a few weeks old; the understanding of words without the assistance of tones appears to occur only in a few of the higher mammalia, and first dawns in the growing child during the second year." (Mental Evolution in Man, p. 124.)
  14. For a good account of gesture language see Romanes: Mental Evolution in Man, ch. 6.
  15. The natural readiness of graphic gesture in daily life is notorious: instance the riotous mimicry of children, or the old 'sell' of the practical joker who asks his fellow man to tell him what an accordion is, and then pokes fun at the naive descriptive gestures made by the hands of the victim.
  16. The theory, though speculative, is not without empirical support. There are many authentic cases of originating word-names, and even languages, among groups of very young children. A pair of identical twins, who through similarities of structure and habit and through constant and affectionate comradeship were predisposed toward identical ear-vocal reflexes, evolved between them a fairly complete language understood only by themselves. For details of this interesting case see Romanes: Mental Evolution in Man, pp. 138-44.
