Connected speech


If anything in the first part of this guide is unfamiliar to you, you should probably take a little time to refresh your memory of the essential concepts in phonology, which are covered in a separate guide.  You should also have worked through the guides to consonants and to vowels before tackling this one.
It is also assumed, in what follows, that you can read and write phonemic transcription.


Isolated or in a stream?

Connected-speech phenomena occur where words meet.  The first distinction to get clear is between the pronunciation of a word in isolation and its pronunciation in a stream of speech.  For example, if you read the words on this list aloud, one at a time, you will probably be pronouncing them in what is called their 'canonical', 'citation' or 'isolation' form.  Here's the list to try.  If you can, transcribe the words on a piece of paper as you pronounce them, and then check your versions against the citation forms given after the list.

are been
have that
from and
ten bottles

In isolation, these words are pronounced /ɑː/, /biːn/, /hæv/, /ðæt/, /frɒm/, /ænd/, /ten/ and /ˈbɒt.l̩z/.

Now memorise this sentence and then say it aloud at normal speed, contracting any words you can.

I have been to town and here are the ten bottles of beer I said that I would get from the shop.

You probably pronounced it something like this:

/aɪv bɪn tə taʊn ənd hɪər ə ðə tem ˈbɒt.l̩z əv bɪər ˈaɪ ˈseðət aɪd ˈɡet frəm ðə ʃɒp/

Find the words from the list in that transcription and compare them with the transcriptions of their isolated forms.  What do you notice?  The features described below account for the differences.


The features of connected speech

There are six main areas to understand.

  1. Weak forms and stress patterns
    We saw examples of these in the sentence above.
    Four notes:
    1. The most common weak forms use the schwa (/ə/) so, for example:
      for is pronounced /fə/
      are is pronounced /ə/
      to is pronounced /tə/
      but is pronounced /bət/ (before a vowel), or /bə/ in other environments
      and so on.
    2. There are other weakenings, such as the replacement of the /iː/ in been with the short /ɪ/ sound.  The word our, pronounced /ˈaʊə/ in its full form in isolation, is usually weakened to /ə/ or /ɑː/ in connected speech.
      Most of these weak forms affect structural (function) words rather than meaning-carrying (content) words, but the reduction of the final sound of father, with the /r/ elided before a sound that is not a vowel (in British English), is also an example of weakening and of another feature of connected speech: elision.
    3. One reason weak forms are so frequent in English has to do with its timing.  It is inaccurate to describe any language as wholly syllable-timed or wholly stress-timed, but English falls clearly towards the stress-timed end of the continuum.
      The following is the theory.
      In some languages, such as French, Italian, Spanish, Cantonese and Mandarin, every syllable is perceived as taking up the same amount of time.  This produces the so-called 'machine-gun' sound of these languages.  So we get:
          I ... went ... to ... Lon ... don ... with ... my ... bro ... ther

      That's syllable timing.
      In other languages, notably English, Dutch, Farsi and the Scandinavian languages, stressed syllables take longer to utter than unstressed ones and tend to occur at roughly regular intervals, so the unstressed syllables in between are compressed.  The result is:
          Iwentto ... L o n d'n ... withmy ... b r o the(r)
      That's stress timing.
      There is, in fact, a third form of timing: mora timing.  In Japanese, for example, a vowel (V) takes the same time to utter as a consonant (C) plus a vowel, so V takes the same time as CV, and CVV takes twice as long as CV.
      Here's a list, but remember that there is a cline from stress-timed to syllable-timed languages rather than an either-or distinction.
      ARABIC (with variations)
      CHINESE LANGUAGES (also tonal)
      THAI (also tonal)
      VIETNAMESE (also tonal)
    4. Special or contrastive stress
      Occasionally, stress may be shifted within an utterance for contrastive reasons, as in, for example:
          A: Did she say she came from London?
          B: No, she said she came TO London.

      in which the stress in B's contribution is unusual because the speaker is placing emphasis on the preposition and it is pronounced forcefully in its full form to make that clear.  In normal circumstances the word is pronounced as /tə/ in connected speech but here it is /tuː/.
  2. Assimilation
    This occurs when a sound is altered because the speaker is anticipating the following sound or influenced by a previous one (or both).
    There are three possibilities and some important observations:
    1. Anticipatory assimilation
      In our example above, ten bottles sounds like tem bottles because the speaker anticipates the voiced bilabial /b/ and changes the alveolar nasal /n/ to the bilabial nasal /m/, which makes the following /b/ easier to pronounce.
      Try saying
          his son and his daughter
      It is pronounced like this:
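          /hɪs sʌn ənd ɪz ˈdɔː.tə/
      The 's' in his son is devoiced to /s/ in anticipation of the voiceless /s/ at the start of son, whereas in his daughter it stays voiced as /z/ before the voiced /d/.  We also drop the 'h' of the second his, which is an example of elision.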
      (Anticipatory assimilation is sometimes, slightly confusingly, called regressive assimilation because the influence works from right to left in the phrase.)
      Anticipatory assimilation, by the way, helps to explain why English alters some prefixes, using 'im-' before words beginning with bilabials (so we have impossible, impolite, immobile etc. rather than *inpossible, *inpolite, *inmobile).  On the other hand, words beginning with alveolar sounds such as /t/ or /d/, or with velar sounds such as /k/ and /ɡ/, will normally take either un- or in- (so we have intolerant, undefined, unconnected, ungrateful etc.).  Unfortunately, this is not an absolute rule, because exceptions such as unmoved and unpleasant are common.
      Anticipatory assimilation also has comedic possibilities:
          I've got a job at a bowling alley.
          Ten pin?
          No, permanent.
      If you have understood the joke, you have appreciated anticipatory assimilation.
    2. Progressive assimilation
      Sounds may change because the speaker is influenced by the preceding sound, i.e., the influence is working from left to right in the phrase.  For example, try saying
          There's not much cider left
      quickly and focus on how the 'c' in cider is pronounced.
      If you say cider individually, the 'c' is pronounced /s/ as one expects (/ˈsaɪ.də/).
      However, in this environment, the influence of the /tʃ/ at the end of much means that the 'c' in cider is pronounced as if it were 'sh', as /ʃ/.  The transcription is, then, not /mʌtʃ ˈsaɪ.də/ but /mʌtʃ ˈʃaɪ.də/.

      A simple example of progressive assimilation occurs with the pronunciation of plural 's' in English.  After words ending in unvoiced consonants such as /t/, /k/ or /p/, the plural 's' is pronounced /s/:
          hats and coats (/hæts.ənd.kəʊts/)
          talks and walks (/tɔːks.ənd.wɔːks/)
          tops and tips (/tɒps.ənd.ˈtɪps/)
      but words ending with voiced consonants such as /d/, /ɡ/ or /b/ will have the 's' pronounced as /z/:
          odds and sods (/ɒdz.ənd.sɒdz/)
          lugs and mugs (/lʌɡz.ənd.mʌɡz/)
          bags and logs (/bæɡz.ənd.lɒɡz/)
      It's even easier to spot the difference in
          cats and dogs (/kæts.ənd.dɒɡz/)

      A similar pattern may be observed with the pronunciation of the regular past-tense ending in English.
      After unvoiced consonants, the -d or -ed ending is usually pronounced as /t/ as in:
          asked (/ɑːskt/)
          spaced (/speɪst/)
          tapped (/tæpt/)
      but following voiced consonants it is voiced as /d/ as in:
          clubbed (/klʌbd/)
          fazed (/feɪzd/)
          dragged (/dræɡd/)
    3. Reciprocal assimilation
      Here, sounds influence each other and may fuse together.  For example, try saying
          Won't you come with us
      quickly and note how won't you is pronounced.  Except in slow, careful speech, it is not /wəʊnt ju/ but /wəʊntʃu/: the /t/ and /j/ sounds have coalesced to make the /tʃ/ sound.
      (Reciprocal assimilation is sometimes called coalescent assimilation, for this reason.)
    4. There are lots of possible assimilation changes in English.
      Assimilation happens like this (after Field, 2008: 150):
      Before the bilabials /m/, /b/ and /p/:
          /n/ assimilates to /m/: then bake it /ðem.beɪk.ɪt/, then put it /ðem.ˈpʊt.ɪt/, then mix it /ðe.mɪks.ɪt/
          /t/ assimilates to /p/ or /ʔ/: that mixture /ðəʔ.ˈmɪks.tʃə/, that bread /ðəp.bred/, that paper /ðəʔ.ˈpeɪ.pə/
          /d/ assimilates to /b/ or /ʔ/: mad man /mæʔ.mæn/, mad boy /mæʔ.ˌbɔɪ/, mad policy /mæb.ˈpɒ.lə.si/
      Before the velars /k/ and /ɡ/:
          /n/ assimilates to /ŋ/: bean cakes /biːŋ.keɪks/, bean good /biːŋ.ɡʊd/
          /t/ assimilates to /k/ or /ʔ/: that cake /ðəʔ.keɪk/, but go /bək.ɡəʊ/
          /d/ assimilates to /ɡ/: bed clothes /beɡ.kləʊðz/
      Before /j/:
          /t/ assimilates to /tʃ/: might you /maɪtʃu/
          /d/ assimilates to /dʒ/: had you /hədʒu/
      Before /ʃ/:
          /s/ assimilates to /ʃ/: glass shop /ˈɡlɑː.ʃɒp/
          /z/ assimilates to /ʃ/: has shut /hæ.ʃʌt/
      The change noted above, when had you is pronounced as /hədʒu/, occurs frequently and is sometimes called yod coalescence.  Other examples include:
          would you as /wʊdʒu/
          could you as /kədʒu/
      and so on, instead of the more careful /wʊd.ju/ and /kəd.ju/.
    5. Voicing and devoicing
      Assimilation, both progressive and regressive, also affects voicing (sometimes known as sonorisation).  For example:
      • the s following an unvoiced consonant will be pronounced as /s/ so we get hat and hats (/hæt/ and /hæts/), make and makes (/ˈmeɪk/ and /ˈmeɪks/) and so on.
      • following a voiced consonant, however, s is usually voiced from /s/ to /z/ so we get rug and rugs (/rʌɡ/ and /rʌɡz/), cab and cabs (/kæb/ and /kæbz/) and so on.
      • some speakers carry this over to other sounds, particularly the /θ/ and may pronounce, for example, baths as /bɑːðz/ and youths as /juːðz/.  Others will retain the /θ/ in the plural forms.
      • regressively, the /v/ in, for example, have is often devoiced before a voiceless consonant such as /t/, so have to is pronounced /ˈhæf.tə/ and love camping /ˈlʌf.ˈkæmp.ɪŋ/.  Not all speakers do this, and many retain the voiced /v/ in expressions such as love camping.
      • a teaching point is that in some languages (German, Dutch, Polish and Russian, for example) voiced stops and fricatives at the end of a word are always devoiced, so speakers of these languages may pronounce bag, club, has, had and cave as /bæk/, /klʌp/, /hæs/, /hət/ and /keɪf/, respectively, instead of /bæɡ/, /klʌb/, /hæz/, /həd/ and /keɪv/.
  3. Elision
    Elision is the omission of a sound, or of several sounds, in connected speech.  A clear example is the tendency in English to use contracted forms, leaving out whole sections of words (hasn't, can't, wouldn't've etc.), but there are other examples, such as:
        the loss of the /d/ in sandwich (/ˈsæn.wɪdʒ/)
        the pronunciation of library as /ˈlaɪ.bri/ or comfortable as /ˈkʌmf.təb.l̩/
        the elision of the middle vowel in business, pronounced /ˈbɪz.nəs/ (compare busyness, pronounced /ˈbɪ.zɪ.nɪs/)
        the dropping of /h/ sounds in rapid speech, as in give it to him rendered as /ɡɪv.ɪt.tu.ɪm/.
    Essentially, five kinds of elision are recognised (as well as the initial /h/ elision):
    1. Function word reduction
      occurs when all or part of a function word such as of is elided as in
          cup of coffee
      being pronounced
          cuppa coffee
      (/kʌpə ˈkɒ.fi/)
      In many cases the word and is reduced to 'n' as in tea 'n' cakes (/tiː n̩ keɪks/).
    2. Polysyllabic word reduction
      occurs in our example of library as /ˈlaɪ.bri/ and also in many other longer words such as probably (/ˈprɒbli/), comfortable (/ˈkʌmf.təb.l̩/) etc.
    3. Cluster reduction
      occurs when a consonant cluster, such as the one at the end of sixths, is simply difficult to pronounce.  The result is usually something like /sɪkθs/ or even /sɪkfs/.  Learners whose languages do not allow the same clusters as English are often tempted to use cluster reduction inappropriately, for example, pronouncing crisps as /krɪps/ rather than /krɪsps/.  For more, see the guides to syllables and phonotactics and the guide to teaching troublesome sounds.
      It is usually /t/, /d/, /p/ and /k/ which are elided in this respect, so, for example:
          text message
      becomes /teks.ˈme.sɪdʒ/,
          mixed
      becomes /mɪst/,
          glimpse
      becomes /ɡlɪms/
          and asked can be pronounced /ˈɑːst/.
      A word that causes persistent problems is clothes because learners feel they should have a go at the consonant cluster at the end /kləʊðz/.  In rapid speech, however, the word is often pronounced /kləʊz/ with the elision of the /ð/.  If learners always say it that way, they will never be misunderstood and it's a good deal easier for them.
      The same phenomenon is observable with the unvoiced /θ/ sound so asthma is pronounced as /ˈæ.smə/.
      Occasionally, elision can become fixed in the language so, for example, the confection now known as ice cream was originally iced cream but the /t/ sound of the letter 'd' was routinely elided and the phrase took on its current spelling.

      There is some overlap and some debate about whether certain phenomena are examples of assimilation or simple elision.
      For example, in the list of assimilation changes above, we have classified the dropping of the /s/ sound when it precedes /ʃ/ as a case of assimilation.  So we get, e.g.:
          face shape
      pronounced as /feɪ.ʃeɪp/
      rather than /feɪs.ʃeɪp/.
      At first sight this appears to be a case of elision because the /s/ is not changed but omitted entirely.  However, there is some evidence that the /ʃ/ sound is lengthened in connected speech, so the correct transcription might properly be /feɪʃ.ʃeɪp/,
      retaining both instances of the phoneme and clearly constituting a change rather than an omission.
      We can avoid the debate altogether and simply refer to both phenomena as simplifications, of course.
      For teaching purposes, a technicality like this is not something on which to dwell.
    4. Adjacent sound elision
      When the sound at the end of one stretch of language is the same as the one at the beginning of the next item, they are usually reduced to a single sound in connected speech so, for example:
          I'm meeting Mary
      is pronounced as: /aɪ.ˈmiːt.ɪŋ.ˈmeər.i/ not /aɪm.ˈmiːt.ɪŋ.ˈmeər.i/
          Don't take that table
      is pronounced as /dəʊn.teɪk.ðæ.ˈteɪb.l̩/ not /dəʊnt.teɪk.ðæt.ˈteɪb.l̩/
      In the transcription here, we have removed the first of the sounds but you can decide whether it is the first or the second which is elided.
      Speakers are not consistent in this: some will retain both sounds or, where possible, as with /m/, lengthen the sound slightly.  Lengthening is not possible with stops such as /t/, /k/ and /d/, but it does occur with fricatives such as /f/ and /s/ and with the nasals.  When it happens, both phonemes appear in the transcription so, e.g.,
          She makes sandwiches
      can be transcribed either as /ʃi.ˈmeɪk.ˈsæn.wɪdʒ.ɪz/ or as /ʃi.ˈmeɪks.ˈsæn.wɪdʒ.ɪz/
    5. Full elision
      We saw above that certain combinations assimilate differently but others can result in the full elision of a sound.
      This elision affects two sounds in particular, the alveolar stops /t/ and /d/, and occurs only when they fall between two other consonants.
      For example, in:
          host presenter the pronunciation is /həʊs.prɪ.ˈzen.tə/ and the /t/ is elided.
          band master the pronunciation is /bæn.ˈmɑːst.ə/ and the /d/ is elided.
      This does not occur invariably and careful speech will reveal the sounds.  However, in rapid speech such elisions are common.  It is also common for the sounds to be assimilated (see above) rather than elided.
  4. Catenation
    Catenation (linking) usually occurs when the consonant sound at the end of one word joins the vowel at the beginning of the next, so we get, for example:
        an orange
    pronounced as
        a norange
    (/ə ˈnɒ.rɪndʒ/)
        right arm
    becomes something like
        rye tarm
    (/raɪ tɑːm/).
    Note, too, the way the pronunciation of
        the boys of Eton
    differs from
        the boys have eaten
    in rapid speech.
    A by-product of catenation, incidentally, is the phenomenon variously known as false splitting, misdivision, false separation or coalescence, in which a word boundary is misplaced.  For example, apron comes from the Old French naperon, and the Middle English a napron was falsely re-divided as an apron, the form we use today.  There are other examples in the guide to word formation.
    In British English, the final 'r' on many words is unsounded so, for example, harbour is pronounced as /ˈhɑː.bə/, whereas in AmE, the standard pronunciation includes the /r/ sound and the pronunciation is /ˈhɑːr.bər/.
    However, when a word ending in 'r' immediately precedes a word with an initial vowel, we get a phenomenon known as the linking /r/ and the sound is produced so, for example:
        My father asked
    will be pronounced as
        /maɪ.ˈfɑːð.ər.ˈɑːskt/ in BrE
    and as
        /maɪ.ˈfɑːð.r̩.ˈæskt/ in AmE.
  5. Juncture
    refers to boundaries between words and awareness of it allows us to distinguish between, for example:
        I scream
        ice cream
        my turn
        might earn
    Usually, the distinction between these pairs is recognisable by stress:
        /aɪ.ˈskriːm/ vs. /ˈaɪs.kriːm/
    or whether a consonant is aspirated:
        /maɪtʰɜːn/ vs. /maɪtɜːn/
    or by noticing the syllabic structure:
        /maɪ.tɜːn/ vs. /maɪt.ɜːn/
    In practice, these cues are usually redundant because the context almost always makes clear what is meant.
    Other examples of juncture provided by Roach (2009: 116) include:
        might rain vs. my train
    (in the first, the /r/ is voiced and in the second it is voiceless)
        all that I'm after today vs. all the time after today
    (in the first, the final /t/ of that is unaspirated and in the second the initial /t/ of time is aspirated).
  6. Intrusion
    is, in contrast, the addition of sounds in connected speech.  The three sounds usually intruded are the approximants /w/, /j/ and /r/ (the first two are often called semi-vowels).  Consider the pronunciation of these phrases and note the transcriptions.
    an intrusive /w/:
        go on (/ɡəʊw ɒn/)
        hoe in (/həʊw.ɪn/)
    an intrusive /j/:
        I ate it (/aɪj et ɪt/)
        fly it (/flaɪj.ɪt/)
    an intrusive /r/:
        law and order (/ˈlɔːr ənd ˈɔː.də/)
        Victoria and Albert Museum (/vɪk.ˈtɔː.rɪər.ənd.ˈæl.bət.mjuː.ˈzɪəm/)
    An intrusive /j/ sound may also occur within individual words so, for example, British English speakers may insert /j/ in words such as tune, fortune, produce, century, due, new, nature, mixture, picture, creature, opportunity, situation and actually, in which the /t/, /n/ or /d/ sound is followed by a /j/ not shown in the spelling.
    (In BrE, the transcriptions are:
    /tjuːn/, /ˈfɔː.tʃuːn/, /prə.ˈdjuːs/, /ˈsen.tjuː.ri/ (or /ˈsen.tʃə.ri/), /djuː/, /njuː/, /ˈneɪ.tʃə/, /ˈmɪks.tʃə/, /ˈpɪk.tʃə/, /ˈkriːtʃ.ə/, /ˌɒ.pə.ˈtjuː.nɪ.ti/, /ˌsɪ.tʃʊ.ˈeɪʃ.n̩/, /ˈæk.tʃuə.li/
    but in AmE, they are usually:
    /ˈtuːn/, /ˈfɔːr.tʃən/, /prə.ˈduːs/, /ˈsen.tʃə.ri/, /duː/, /nuː/, /ˈneɪ.tʃər/, /ˈmɪks.tʃər/, /ˈpɪk.tʃər/, /ˈkriːtʃ.r̩/, /ˌɑː.pər.ˈtuː.nə.ti/, /ˌsɪ.tʃuː.ˈeɪʃ.n̩/, /ˈæk.tʃə.wə.li/.)
    There is a perceptible trend in BrE to follow the AmE pronunciation to some extent, so many British speakers will pronounce, e.g.:
        actually as /ˈæk.tʃuə.li/ rather than /ˈæk.tjuə.li/
        situation as /ˌsɪ.tʃʊ.ˈeɪʃ.n̩/ rather than /ˌsɪ.tjʊ.ˈeɪʃ.n̩/, etc.  Not all speakers of BrE do this.
    In some speakers' production, the intrusive sound is avoided and replaced by a glottal stop so, for example, we may find
        Go out
    produced as /ɡəʊʔaʊt/ rather than /ɡəʊ.ˈwaʊt/,
        The gorilla and me
    produced as /ðə.ɡə.ˈrɪl.ə.ʔənd.miː/
    rather than /ðə.ɡə.ˈrɪl.ər.ənd.miː/, and
        I am here
    produced as /aɪ.ʔæm.hɪə/
    rather than /aɪ.jæm.hɪə/.
    Intrusion, too, has comedic possibilities.
    In the 19th century it was common for ships' biscuits to be attacked by small insects called weevils.  In a famous scene from Patrick O'Brian's book concerning the era, we find the following exchange:
        You see those weevils, Stephen? said Jack solemnly.
        I do.
        Which would you choose?
        There is not a scrap of difference ... there is nothing to choose between them.
        But suppose you had to choose?
        Then I should choose the right-hand weevil; it has a perceptible advantage in both length and breadth.
        There I have you, cried Jack. Don't you know that in the Navy you must always choose the lesser of two weevils?

    (O'Brian 1970)
    If you have got the joke, such as it is, you have understood the nature of the intrusive /w/ in the lesser of two evils (/tuːw ˈiːv.l̩z/).

    Erroneous intrusion:
    Learners whose languages do not have many (or any) consonant clusters are often tempted to intrude a vowel, often a /ə/, /ɪ/ or a /e/, between elements of a difficult cluster.
    Many Arabic speakers, for example, may pronounce screwdriver as /ˈsekəruː.dəraɪ.vər/ rather than /ˈskruː.draɪ.və/, i.e. with six syllables instead of three.  Japanese speakers may do likewise.
    Speakers of many other languages will produce crisps as /krɪspəs/ or /krɪspes/ instead of /krɪsps/, and clothes, which we discussed above, is often produced as /kləʊðez/ or /kləʊðɪz/ instead of /kləʊðz/.
    Speakers of other languages, notably French and Italian, are also tempted to intrude a redundant /h/ sound and pronounce, e.g.,
        He is my ally
    as /hiː ɪz maɪ ˈhæl.aɪ/ when it should be /hiː ɪz maɪ ˈæl.aɪ/.

There are more examples of connected speech phenomena in the course on learning to transcribe.
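The voicing rule for plural 's' and for the regular past-tense ending, described under progressive assimilation, is regular enough to be written down as a procedure.  The following Python code is only a minimal sketch, assuming that the final sound of a word is supplied as a single phonemic symbol; the function names and symbol sets are our own illustration, not part of any standard tool.  It also assumes two forms not discussed in this guide: /ɪz/ after sibilants (as in buses) and /ɪd/ after /t/ and /d/ (as in wanted).

```python
from collections.abc import Callable

# Hypothetical sketch: choose the pronunciation of plural -s and past-tense -ed
# from the final phoneme of the stem, following the voicing rule in the text.

# Voiceless consonants that can end an English stem.
VOICELESS = {"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ"}

# Sibilants add a syllable, /ɪz/, in the plural (assumption: not covered in the text).
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}


def plural_ending(final: str) -> str:
    """Return the plural ending after a stem whose last phoneme is `final`."""
    if final in SIBILANTS:
        return "ɪz"   # e.g. buses
    if final in VOICELESS:
        return "s"    # e.g. hats, talks, tops
    return "z"        # voiced consonants and vowels, e.g. odds, lugs, bees


def past_ending(final: str) -> str:
    """Return the regular past-tense ending after a stem ending in `final`."""
    if final in {"t", "d"}:
        return "ɪd"   # e.g. wanted (assumption: not covered in the text)
    if final in VOICELESS:
        return "t"    # e.g. asked, spaced, tapped
    return "d"        # e.g. clubbed, fazed, dragged


def inflect(stem: list[str], ending: Callable[[str], str]) -> str:
    """Join a stem (a list of phoneme symbols) to the ending chosen for its last sound."""
    return "".join(stem) + ending(stem[-1])


if __name__ == "__main__":
    print(inflect(["k", "æ", "t"], plural_ending))      # kæts
    print(inflect(["d", "ɒ", "ɡ"], plural_ending))      # dɒɡz
    print(inflect(["t", "æ", "p"], past_ending))        # tæpt
    print(inflect(["k", "l", "ʌ", "b"], past_ending))   # klʌbd
```

The sketch treats /θ/ as voiceless throughout, so it gives /bɑːθs/ for baths and ignores the /bɑːðz/ variant that some speakers use.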

