ELT Concourse teacher training

Transcription: a teach-yourself course


This guide also forms Strand 6 of the Teacher Development section.

This guide concerns transcription, not a description of the sounds of English.  For a description of how the sounds of English are made and what mouth parts do, see the in-service guides to pronunciation.
This mini-course is:


The components of the course

This is what the course covers.
If you cannot transcribe or your transcription skills are shaky, do the whole course.
If you are here for a specific area of transcription or returning for some revision, the following will help you find what you need.
Clicking on -index- at the end of each section will return you to this menu.

Why learn to transcribe? The phoneme chart Consonants Voicing
Vowels Marking stress and syllables The schwa and triphthongs Connected speech
Intrusive sounds Assimilated sounds Elided sounds The glottal stop
Dropping the /h/ Syllabic consonants Transcribing intonation More practice



The sounds transcribed here are those of an educated southern British-English speaker.  That is not intended to imply that the dialect is somehow better than others.  It is one of the conventional ways to do these things.
American English pronunciation and any of the other multiple standard forms of the language would be different, especially, but not solely, in the vowel sounds.


Why should you learn to transcribe?

There are teachers of English language who can lead successful careers in the classroom without ever using more than a minimal amount of phonemic transcription.  Some use none at all.  There are, however, five good reasons why knowing how to transcribe sounds is a useful skill for a teacher and knowing how to read transcriptions is a useful skill for learners.  Here they are:

  1. Independence
    1. For the teacher, the ability to transcribe what is heard allows rapid identification of troublesome sounds and other issues that need to be brought to the attention of learners.  One can, of course, rely on a pronouncing or other form of dictionary to do the work but that is time-consuming and not always possible.  Freeing yourself from the need to consult a website or a dictionary for the pronunciation of words allows you to focus on what's important.
    2. For the learner, the ability to read the transcriptions of pronunciation in a dictionary, mono- or bi-lingual, gives autonomous access to how the word should sound without reference to the spelling or to a model.  Many people, dazzled by the spelling, are unaware that, for example, no and know are identically pronounced or that the words right, rite and write also share a single pronunciation.
      By the same token, it may not at first be obvious that in the words troupe, bought, should, cough and tourist the combination of the letter o and u is differently pronounced in each case.
      It is, of course, possible to model the pronunciations in the classroom but the ability to note them down in phonemic script is a valuable learning tool.
  2. Systematicity
    Phonemic transcription is independent of the language insofar as it is systematic.  Many attempts have been made to spell English words phonetically but without some unambiguous system of symbols such attempts fail.  Unless we can rely on a generally accepted system, there is, for example, no easy way to show the pronunciation of diphthongs and the difference between long and short vowels without resorting to a range of odd and obscure marks over letters such as ç, â, œ and so on.  Having a single system works.
  3. Reliability
    As you may know, spelling in English is not a reliable guide to how a word is pronounced so, even if a learner can correctly recognise and produce the different pronunciations of the o in love and move that is not a guide to knowing how shove or hove are pronounced at all.  Access to the phonemic script allows learners instantly to relate the pronunciation of words to one another and not to pronounce hove as if it rhymed with love or shave as if it rhymed with have.
    Teachers who can transcribe have the ability to make this clear and learners who can read transcription are able to make a note of the difference.
  4. Ambiguity
    Even when words are spelled the same, they may be differently pronounced (a phenomenon known as homography) so we get, for example, entrance meaning a way in and a verb meaning bewitch or hold someone's complete attention.  The words are very differently pronounced.  Other examples will include row, minute, live and hundreds more.  Being able instantly to spot the difference is a skill learners need to develop if they are sensibly to use a dictionary and teachers need instantly to point out when teaching.  The best way to do that is via phonemic transcription.
  5. Professionalism
    The ability to use a simple, if technical, area of linguistics is an indicator of professional competence.  An inability to read or write a transcription of how something is pronounced is a handicap when it comes to teaching pronunciation and most learners expect formal pronunciation work to be part of what happens in the classroom.

If even some of that sounds convincing, read on.

- index -


The sounds of English: phonemes, allophones and minimal pairs

We are talking about English sounds here.  The study of language sounds (phonemic analysis) is language specific.  This mini-course is concerned with the transcription of English sounds.
You will not, therefore, find mention of the vowel /ɯ/ (which occurs in Turkish, Korean, Irish and many other languages) or /ɾ/, the tapped 'r' sound of Spanish, which is not a phoneme in English but is common in, e.g., Japanese and other languages.  The chart below does not, therefore, describe all the sounds of language, just the ones that are used in English (and not all of them as we shall shortly see).

In English the sounds /p/ and /b/ are phonemes because changing one to the other affects the meaning of a word.  The sentences
    I gave him a bat
    I gave him a pat
are distinguished from each other only by the initial sound in bat or pat.  However, the sense of each sentence is very different.
This is called the Minimal Pair Test:
If you change a single sound in a word and make a new word, the sound you have changed is a phoneme in that language.
In other languages, most varieties of Arabic, for example, these two sounds, /b/ and /p/, are not phonemes and changing one to the other will not change the meaning of a word (but it might sound odd).
Here are some examples of minimal pairs formed with other consonant sounds.  Don't worry about the transcriptions at this stage; they are just for reference.  Focus on the left-hand column in each of these lists.
words transcriptions
You can readily see that in each case there is only one change to the transcription so any two of these words form a minimal pair.  In this case, we have changed the first sounds to make things simpler but, of course, we can change any sound in a pair to make a minimal pair.  For example:
words transcriptions
are also examples of minimal pairs.
Vowel sounds can also form minimal pairs, so we get, for example:
words transcriptions
which are examples of vowel-sound minimal pairs in English and any two of the words form a pairing with only one change to the vowel.
In some languages, not all these vowel sounds are produced and in some, there are vowels that English does not use at all.  For example, in Greek there is no distinction between a long 'e' sound as in feet and the short sound in fit.  Speakers of the language often use either, both or a sound midway between them without changing the meaning of a word.  The sounds are, therefore, not separate phonemes in that language.
You can see that there are, in fact, two vowel sounds in the words boat, bite and bait but the vowel in these cases is called a diphthong and for the purposes of identifying a minimal pair, that counts as a single vowel.  You may also have seen that the sound spelled with the letter 'j' in jut and jet in the first set is transcribed with two symbols (/dʒ/) and that, too, counts as a single sound.
Allophones are slightly different pronunciations of a phoneme which do not affect the meaning of what is said (although it may sound odd).  We saw above that /p/ and /b/ are not separate phonemes in most varieties of Arabic and nor, incidentally, are /f/ and /v/ in some varieties.  Changing one for the other does not affect the meaning of what you say.  You can, of course, pronounce both sounds in Arabic and a hearer may notice the difference but will not assume that you are using a different word, just slightly different ways of making the sound.  In this case, the two sounds are allophones.
An allophone is any of the various ways to say a phoneme in a language, which do not contribute to distinctions of meaning.  In other words, there may be many ways to pronounce a sound, all slightly different, but the differences do not change the meaning.
All languages have a number of allophones.  For example, in English the sound /t/ can be pronounced with and without a following /h/-like puff of air (aspiration), written [tʰ].  Compare the sounds in stack and tack.  In English, these sounds are not separate phonemes because you can change [t] to [tʰ] without changing the meaning of a word.  In some languages, Mandarin, for example, /t/ and /tʰ/ are separate phonemes and swapping them around will change the meaning of what you say.  The same applies to [k] vs. [kʰ] (ski vs. cat) and [p] vs. [pʰ] (spin vs. pot).  In Ancient Greek, the two different pronunciations of 'p' were full phonemes, and affected the meaning of words.  In Modern Greek that distinction has been lost although speakers still produce both sounds, depending on their accents.
The /l/ sound in English also has two allophones: the light [l] as in lap and the dark version (which has the symbol [ɫ]), which occurs at the end of words like moveable.  The word lull has one of each, the light 'l' at the beginning and the dark 'l' at the end.  It is transcribed fully as [lʌɫ] but, if we are dealing only with English, because the sounds do not form minimal pairs and are, therefore, not counted as separate phonemes in this language, the transcription can be left as /lʌl/.
Allophones of vowels are also quite common.  For example, in Standard British English, the word nurse is transcribed with a long vowel (as /nɜːs/) but in rapid speech the vowel may be shortened to give /nɜs/.  No-one listening will mistake the word or assume that the word with a shorter vowel carries a different meaning so the transcription need not distinguish too carefully.  The sounds are allophones.
In Standard American English the word is transcribed as /ˈnɝːs/ with the tiny /r/ denoting that the 'r' sound is pronounced by most American-English speakers but, again, that is an allophonic, not phonemic, difference because the word remains the same with the same meaning.  In some varieties of British English, too, the /r/ will be pronounced so we will have /nɜːrs/ as the transcription.  In similar varieties, the words beauty and booty may be pronounced identically as /ˈbuː.ti/ although the standard form for beauty is /ˈbjuː.ti/ and for booty, it is /ˈbuː.ti/.  It makes no difference to meaning if your dialect does not distinguish.
Minimal pairs:
As we saw above, pairs of words which are distinguished only by a change in one phoneme are called minimal pairs.  For example, hit-hat, kick-sick, fit-bit, sheep-ship, jerk-dirk, hot-cot, love-live etc. are all distinguished in meaning by a single change to a vowel or a consonant.  That's in English, of course.  It bears repeating that what is an allophone in English may be a phoneme in other languages and vice versa.
Minimal pairs can also be distinguished by where the stress falls.  For example:
If you stress the word export on the first syllable, you are referring to the noun: EXport.  Stress the second syllable and you refer to the verb: exPORT.
Stress the word convict on the first syllable and you refer to a resident of a prison: CONvict.  Stress the second syllable and you refer to the act of finding someone guilty of an offence: conVICT.
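
For those who like to see a rule spelled out mechanically, the Minimal Pair Test described above can be sketched in a few lines of Python.  This is only an illustration, not part of the course materials: the function name is_minimal_pair is invented here, and it assumes each word has already been broken into a list of phonemes, with diphthongs such as /əʊ/ and two-symbol consonants such as /dʒ/ counted as single items.  It does not deal with pairs distinguished only by stress.

```python
def is_minimal_pair(first, second):
    """Return True if two phoneme lists differ in exactly one position."""
    # Words of different lengths cannot differ by a single substituted sound.
    if len(first) != len(second):
        return False
    differences = sum(1 for a, b in zip(first, second) if a != b)
    return differences == 1


# Each word is a list of phonemes (hypothetical hand-made transcriptions).
bat = ["b", "æ", "t"]
pat = ["p", "æ", "t"]
jet = ["dʒ", "e", "t"]
jut = ["dʒ", "ʌ", "t"]
boat = ["b", "əʊ", "t"]
bite = ["b", "aɪ", "t"]

print(is_minimal_pair(bat, pat))    # one consonant changed
print(is_minimal_pair(boat, bite))  # one diphthong changed
print(is_minimal_pair(bat, jut))    # two sounds changed, so not a minimal pair
```

Treating each diphthong as one list item is what makes boat and bite come out as a pair with a single change.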

Click here to take a short test to see if you can match minimal pairs.  There are no transcriptions in this test so you will have to say the words aloud or to yourself to find the pairs.
You can click on the other answers to see what feedback you get.

- index -


English phonemes

Here's the list you'll learn.  If you want to download this chart as a PDF document to keep by you as reference, click here.


- index -



The consonants are the easiest so we can start there.  Most of them are actually the same as the written form but remember that spelling in English is not a reliable guide to pronunciation.



Voicing describes how phonemes may be different depending on whether the vocal cords vibrate or not at the time of pronunciation.  (There are those who will aver that the technically correct term is vocal folds not vocal cords.)
For example, the /k/ sound is made without voicing but the /ɡ/ sound is made with the mouth parts in the same place but with voice added.  Here are some examples of words containing voiced and unvoiced consonants.  The consonant in question is underlined, in bold.  Say them aloud and you will hear the differences.

Unvoiced Voiced Say:
pie buy I went to buy a pie
char jar The Japanese chap
fine vine He grew a fine vine
sip zip He said 'z' or I used to use that
cape gape Have you got a cot?
hat had I had a hat
wreath wreathe Breathe in and then let out the breath
mesh leisure They splashed about with pleasure

In all the words above, the place of articulation (i.e., where in the mouth the sound is made) is identical for both consonants in each pair.  All that changes is whether or not the vocal cords or folds vibrate.
If you put your hand on your throat and say the words sue and zoo, you will see what is meant and feel a slight vibration on the second word (/s/ is unvoiced but /z/ is voiced).
Try saying the words and examples in the table above out loud and you will see that you need to pronounce the voiced consonants with a vibration of the vocal cords and a little more energy than the sounds in the unvoiced cases.

Of the consonants, 16 form pairs of voiced-unvoiced sounds:

Unvoiced Voiced Minimal pairs
/p/ /b/ pit vs. bit
/tʃ/ /dʒ/ cheep vs. jeep
/f/ /v/ fat vs. vat
/s/ /z/ sing vs. zing
/k/ /ɡ/ Kate vs. gate
/t/ /d/ tuck vs. duck
/θ/ /ð/ teeth (plural noun) vs. teethe (verb)
/ʃ/ /ʒ/ ruche vs. rouge
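
The eight pairs in the table above can be held in a simple lookup, which may be handy if you want to generate practice items.  The sketch below is an illustration only; the names VOICED_COUNTERPART and toggle_voicing are invented for the purpose and the lookup contains nothing beyond the eight pairs listed in the table.

```python
# Unvoiced consonant -> voiced counterpart, taken from the table of eight pairs.
VOICED_COUNTERPART = {
    "p": "b",
    "tʃ": "dʒ",
    "f": "v",
    "s": "z",
    "k": "ɡ",
    "t": "d",
    "θ": "ð",
    "ʃ": "ʒ",
}

# The reverse lookup: voiced consonant -> unvoiced counterpart.
UNVOICED_COUNTERPART = {voiced: unvoiced for unvoiced, voiced in VOICED_COUNTERPART.items()}


def toggle_voicing(phoneme):
    """Return the other member of a voiced-unvoiced pair, or None if there is none."""
    if phoneme in VOICED_COUNTERPART:
        return VOICED_COUNTERPART[phoneme]
    return UNVOICED_COUNTERPART.get(phoneme)


print(toggle_voicing("s"))  # the voiced partner of /s/
print(toggle_voicing("ð"))  # the unvoiced partner of /ð/
print(toggle_voicing("m"))  # /m/ is not in any of the eight pairs
```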

You have to listen out for voicing when you are transcribing because voiced and unvoiced consonants are full phonemes in English.

Click here for a little test to see if you can match voiced and unvoiced sounds by saying some words aloud.

- index -

To get us started with transcribing consonants, take a piece of paper and transcribe the consonants in these words, using the right-hand side of the chart.  Look at the example words and check to see if the pronunciation is the same as the words in this test.
Click on the table when you have done that.

guide 2

All the other sounds are transcribed using ordinary English alphabetic letters taking on their usual pronunciation.

Now transcribe only the CONSONANTS shown in capitals in these words.  Do not worry now about the rest of the words.

TobaCCo GooGle CHair JumP BaFFle VaN BaDGe DaDDy PaTH THiS MaZe SHaVe HaNG HuLL RaBBle Way YaNK

When you have done, click to reveal: eye

When you have done the section on vowels which follows, you will be asked to try that again transcribing the whole of each word.
You can then click here eye to reveal the full transcriptions of the words.

If you would like to try an exercise in transcribing the consonants you hear rather than ones you read, click here.

- index -



Here's a list of the vowels in English (authorities may differ slightly about how many there are, incidentally).

/iː/ sleep     /æ/ sat       /ɪə/ here
/ɪ/ kid        /ʌ/ blood     /ʊə/ sure
/ʊ/ put        /ɑː/ part     /ɔɪ/ boy
/uː/ goose     /ɒ/ hot       /eə/ lair
/e/ Fred       /i/ happy     /eɪ/ lace
/ə/ about                    /aɪ/ price
/ɜː/ verse                   /əʊ/ boat
/ɔː/ fought                  /aʊ/ south

What do you notice about the difference between the first two columns and the third column?
Click to reveal: eye

pure vowels

If you haven't already done so, to do this exercise, you may want to download the chart as a PDF document so you can have it at your elbow.  Click here to do that.

Using the chart, transcribe the following words and then click on the table to check your answers.

pure vowels 

If you didn't get the first vowel of ago, or the final one of happy, that doesn't matter (yet).  In the first case the initial vowel was the schwa, transcribed as /ə/, and in the second case, the final vowel is transcribed as /i/ and lies between the short vowel in sit (/sɪt/) and the longer one in seat (/siːt/).
Try another short recognition test by clicking here.


There are 8 diphthongs and they are combinations of pure vowels which merge together.  We have, e.g., /ɪ/ + /ə/ (the sounds we know from bid and ago) following one another to produce /ɪə/ as in merely (mee-err-ly).  You can usually work out what the diphthong is by saying the word it contains very slowly and distinctly.

Using the chart, transcribe the following words and then click on the table to check your answers.

test 2

There is another test of your ability to recognise all the diphthongs here.

You have now transcribed words using all the vowels and consonant sounds of English.

As a check of your knowledge, try the following.

Using the chart, transcribe the following words and then click on the table to check your answers.

test 3

Did you get it right?  One thing to notice is that in rapid connected speech, the transcription of come with me would probably be /kʌm wɪ miː/ without the /ð/ because we usually leave it out.  You may also, depending on how you say things, have had /iɡ's/ or even /ik's/ at the beginning of exactly.  That doesn't matter too much but note the convention for marking the stress on multisyllabic words: it's a ' inserted before the stressed syllable.
There is also the convention of putting a stop (.) between syllables (as in, e.g., sentence: /'sen.təns/).  Your students may not need that but many find it helpful.  More on that in a moment.

- index -


The schwa

The most common vowel in the spoken language has no letter to represent it.
It is, of course, the humble schwa.  If you teach no other phoneme symbol, teach this one.  Including it in your transcriptions is simply a matter of listening out for it and making sure that you aren't being influenced by the spelling of words.  You should also note that the schwa only occurs in unstressed syllables.  You can't stress the schwa.
The schwa may be how any of the traditionally spelled vowels are pronounced:

vowel a schwa in transcribed
a asleep /ə.'sliːp/
e different /'dɪ.frənt/
i definite /'de.fɪ.nət/
o prosody /'prɒ.sə.di/
u tedium /'tiː.dɪəm/
ou tedious /'tiː.dɪəs/
io nation /'neɪʃ.ən/

The schwa also occurs routinely in function words like and, of, for, to etc. which can be transcribed as /ənd/, /əv/, /fə/, /tə/ etc. as that is how they are produced in connected speech.  This is called weakening.

How many schwa sounds can you detect when you say and transcribe this sentence?  Click on the bar when you have an answer.

schwa test

As a check, go back to the section on consonants and try the final test again but this time, transcribe the whole of each word, putting in the correct vowel transcriptions, the stress marks and the schwa.
Click here to go back.

Now you can get a little practice in transcribing the vowels you hear in some simple words.  Click here to do that.



There are those who argue (Wells, for example) that there is actually no such thing as a triphthong in English.  They take the view, roughly summarised, that the vowels in, e.g., player break into two syllables so what we have is simply a diphthong followed by another vowel so the transcription should be:
    /ˈpleɪ.ə/, not /ˈpleɪə/
and that means the diphthong /eɪ/ as in day followed by the schwa in the second syllable.
Wells puts it like this:

I would argue that part of the definition of a true triphthong must be that it constitutes a single V unit, making with any associated consonants just a single syllable.
Given that, do we have triphthongs in English? I claim that generally, at the phonetic level, we don’t. I treat the items we are discussing as basically sequences of a strong vowel plus a weak vowel.
Wells, 2009

Roach, on the other hand, argues differently and states that:

The most complex English sounds of the vowel type are the triphthongs. They can be rather difficult to pronounce, and very difficult to recognise. A triphthong is a glide from one vowel to another and then to a third, all produced rapidly and without interruption.
Roach, 2009:29 (emphasis added)

Crystal states:

The distinction between triphthongs and the more common diphthongs is sometimes phonetically unclear.
Crystal, 2008:497

This is not the place to pit two esteemed phoneticians against each other so we'll stick with the simplest explanation, the one proposed by Wells, and suggest that what is sometimes called a triphthong is, in fact, a glide from a diphthong to another vowel, the schwa, and that there are (or can be) two syllables in such pronunciations.
Here, we will recognise five of these combinations of sounds.  Whether whoever you are transcribing produces all five is a matter of the accent and background of the speaker as well as how carefully and slowly the words are spoken.
Here's the list:

  1. /eɪə/ as in player or mayor.  Start with the diphthong /eɪ/ (as in say) and glide from the end of that to the /ə/.
  2. /aɪə/ as in liar or shire.  Start with the diphthong /aɪ/ (as in nice) and glide to the /ə/.
  3. /ɔɪə/ as in soil or loyal.  Start with the diphthong /ɔɪ/ (as in toy) and glide to the /ə/.
  4. /əʊə/ as in lower or knower.  This one has a schwa at both ends.  Start with the diphthong /əʊ/ (as in coat) and glide to the /ə/.
  5. /aʊə/ as in tower or our.  Start with the diphthong /aʊ/ (as in mouth) and glide to the /ə/.

Try a little practice with these words and then click to reveal an answer.

As far as transcription is concerned, you do not have to take sides in the Roach-Wells debate and can equally well have the transcription with the syllable-marking '.' or without.  It just depends on whether you hear the sound as a single vowel or two syllables and that will vary from speaker to speaker.
See the next section for how we recognise syllables.

- index -


Marking stress and syllables

As we saw, the main stressed syllable is conventionally indicated by ' before the syllable (e.g., /'sɪl.əb.l̩/).
It is sometimes helpful to mark secondary stress in longer words like incontrovertible by a lowered symbol like this:
in which you can see a small ˌ before the /k/ sound indicating that the second syllable carries secondary stress and the main stress falls on the fourth syllable and is shown by the 'vɜː in the transcription.  Most learners find just one stressed syllable enough to cope with.
If we want to show that non-phonemically, we might write:
on the board with an underline lower-case for secondarily stressed syllables but bold, underlined CAPITALS for the main stress.

However, before we can decide where to put the stress mark, we need to identify the syllables in an utterance.  That is not always as easy as it sounds.
A syllable is a unit of pronunciation having one vowel sound, with or without surrounding consonants.
By that definition, all of the following are single syllables:

of various kinds (there is a guide to the difference on this site which you can access here (new tab)).
You can transcribe these individual words without any stress or syllable marks because there is no stress to note and only one syllable in question.  In connected speech, of course, we may need to insert a stress mark if the word carries stress in a longer string of text.  The transcription of those words is, therefore:
    or: /ɔː/
    go: /ɡəʊ/
    ask: /ɑːsk/
    bus: /bʌs/
and there are no other markings.

However, words or utterances of more than one syllable pose a problem because the transcription needs to show both the division into syllables and the place where the stress appears.

Syllables first.
Count the syllables in denationalization and then transcribe the word, putting a '.' between the syllables.
Click to reveal: eye

The rules for deciding where a syllable starts and stops are quite complex in English but there is a rule of thumb we can use to decide, for example, how to divide a word like tumbler.
We could have:
so how do we decide?
Here are the rules:

  1. If there is a choice, attach the consonant to the right-hand syllable, not the left:
    That would mean that the transcription would be:
    and that's fine, but why don't we attach both consonants to the right-hand syllable so that it becomes:
    Here, we need rule 2:
  2. If attaching the consonants to the right-hand element produces a syllable which is forbidden as the beginning of a word in English, move one of them to the left.
    In English, no word can begin /mb/ (although that is allowable in some languages).  We can, however, have a word beginning /bl/, of course, such as black, blur, block etc.
    Therefore, applying both rules, we end up with /'tʌm.blə/.
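
If it helps to see the two rules working together, here is a rough sketch in Python.  It is an illustration under stated assumptions, not a full account of English syllable structure: the names VOWELS, LEGAL_ONSETS, split_cluster, syllabify and format_syllables are invented here, the list of permitted word-initial clusters is a small hand-picked sample, and any single consonant is assumed to be allowed at the start of a syllable (which is a simplification).  Stress is not marked.

```python
# Vowel symbols used in this course (pure vowels and diphthongs).
VOWELS = {
    "ɪ", "e", "æ", "ʌ", "ɒ", "ʊ", "ə", "i",
    "iː", "uː", "ɑː", "ɔː", "ɜː",
    "eɪ", "aɪ", "ɔɪ", "əʊ", "aʊ", "ɪə", "eə", "ʊə",
}

# A small, incomplete sample of clusters that can begin an English word.
LEGAL_ONSETS = {
    ("b", "l"), ("b", "r"), ("p", "l"), ("p", "r"), ("t", "r"), ("d", "r"),
    ("k", "l"), ("k", "r"), ("ɡ", "l"), ("ɡ", "r"), ("f", "l"), ("f", "r"),
    ("s", "t"), ("s", "p"), ("s", "k"), ("s", "t", "r"),
}


def split_cluster(cluster):
    """Split the consonants between two vowels into (left part, right part).

    Rule 1: give as many consonants as possible to the right-hand syllable.
    Rule 2: if that would start the syllable with a forbidden cluster,
    move consonants to the left one at a time until it is allowed.
    """
    for start in range(len(cluster)):
        onset = tuple(cluster[start:])
        # Simplification: a single consonant is always accepted as an onset.
        if len(onset) == 1 or onset in LEGAL_ONSETS:
            return cluster[:start], cluster[start:]
    return cluster, []


def syllabify(phonemes):
    """Divide a list of phonemes into a list of syllables."""
    vowel_positions = [i for i, p in enumerate(phonemes) if p in VOWELS]
    syllables = []
    start = 0
    for left, right in zip(vowel_positions, vowel_positions[1:]):
        left_part, _ = split_cluster(phonemes[left + 1:right])
        boundary = left + 1 + len(left_part)
        syllables.append(phonemes[start:boundary])
        start = boundary
    syllables.append(phonemes[start:])
    return syllables


def format_syllables(syllables):
    """Join syllables with the conventional '.' separator."""
    return ".".join("".join(syllable) for syllable in syllables)


tumbler = ["t", "ʌ", "m", "b", "l", "ə"]
print(format_syllables(syllabify(tumbler)))  # tʌm.blə
```

The cluster /mbl/ is rejected, /bl/ is accepted, and so the /m/ is left behind on the first syllable, which is the division the two rules give for tumbler.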

Now the stress marking.
Once we have applied the rules (or used a bit of common sense and intuition) we can divide multisyllabic items up conventionally and then decide where the stresses fall.

Now transcribe these words and mark the main and secondary stresses on your transcriptions.
Click eye to reveal the answer when you have done that.

- index -


Connected speech

Transcribing connected speech spoken at normal speed, rather than someone reading from a list of words, requires attention to a variety of new factors.  Four are considered here.


Intrusive and linking sounds

There are three sounds which speakers insert between vowels in connected speech.  They need to be included in your transcriptions.  They are:

intrusive /r/
Try saying law and order.  You will hear an /r/ sound like this:
    /lɔːr ənd 'ɔː.də/.
Now transcribe:
    The media are
    I saw uncle Fred
and you'll get the same phenomenon.  Click eye to reveal the answer when you have done that.
intrusive /w/
Try saying I went to evening classes and note what happens between to and evening.  The transcription is:
    /'aɪ 'went tuw 'iːv.n.ɪŋ 'klɑː.sɪz/.
Now transcribe
    do it
    do or die
and you'll see the same effect.  Click eye to reveal the answer when you have done that.
intrusive /j/
Try saying I agree.  You will hear a /j/ sound between the words.  The transcription is:
    /'aɪj ə.'ɡriː/
This effect is common with words ending in 'y'.  Standing alone, the transcriptions of fly, lay and they are /flaɪ/, /leɪ/ and /ðeɪ/ but in combination with following vowels we get the intrusion.
Now transcribe
    fly over
    lay it down
    they aren't
and you will get a similar effect.  Click eye to reveal the answer when you have done that.

You may see an intrusive sound put in superscript (r w j) and that's a good way to draw your learners' attention to the sounds.  There is, however, a case to be made that you don't have to teach these at all because they are the inevitable effects of vowel-vowel combinations in speech.  They aren't, of course, only applicable to English.
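
The pattern in the examples above can be summarised as a lookup from the vowel that ends the first word to the sound that tends to creep in.  The Python sketch below is an illustration only.  The names INTRUSIVE_AFTER, VOWELS and intrusive_sound are invented here, and the groupings of vowels are a generalisation from the examples in this section (law, media, to, do, I, fly, lay, they), so treat them as an assumption rather than a complete rule.  Not every speaker will produce the sound in every case.

```python
# Final vowel of the first word -> sound that may be inserted before a vowel.
# These groupings are generalised from the examples in this section.
INTRUSIVE_AFTER = {
    "ɔː": "r", "ə": "r", "ɑː": "r",
    "uː": "w", "u": "w", "ʊ": "w", "əʊ": "w", "aʊ": "w",
    "iː": "j", "i": "j", "ɪ": "j", "aɪ": "j", "eɪ": "j", "ɔɪ": "j",
}

# Vowels that can begin the second word.
VOWELS = set(INTRUSIVE_AFTER) | {"æ", "e", "ʌ", "ɒ", "ɜː", "ɪə", "eə", "ʊə"}


def intrusive_sound(first_word, second_word):
    """Return 'r', 'w' or 'j' if an intrusive sound is likely, otherwise None.

    Each word is a list of phonemes with diphthongs kept as single items.
    """
    final, initial = first_word[-1], second_word[0]
    if initial not in VOWELS:
        return None
    return INTRUSIVE_AFTER.get(final)


print(intrusive_sound(["l", "ɔː"], ["ə", "n", "d"]))    # law and
print(intrusive_sound(["d", "uː"], ["ɪ", "t"]))         # do it
print(intrusive_sound(["aɪ"], ["ə", "ɡ", "r", "iː"]))   # I agree
```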

Try this next mini-test.  Click on the table to get the answer.

intrusion test

There are times when you have to listen extremely carefully to hear whether a speaker is actually producing the intrusive sound or inserting /ʔ/, a glottal stop (see next section).
For example, many will pronounce
    Go out
as /ɡəʊʔaʊt/ rather than /ɡəʊ.ˈwaʊt/,
    The gorilla and me
as /ðə.ɡə.ˈrɪ.ləʔənd.miː/ rather than /ðə.ɡə.ˈrɪ.lə.rənd.miː/
    I am here
as /ˈaɪʔæm.hɪə/ rather than /ˈaɪ.jæm.hɪə/.

A further issue to listen for is the linking /r/ sound.
In British English, the final 'r' on many words is unsounded so, for example, harbour is pronounced as /ˈhɑː.bə/, whereas in AmE, the standard pronunciation includes the /r/ sound and the pronunciation is /ˈhɑːr.bər/.
However, when a word ending in 'r' immediately precedes a word with an initial vowel, we get the linking /r/ and the sound is produced so, for example:
    My father asked
will be pronounced as
    /maɪ.ˈfɑːð.ər.ˈɑːskt/ in BrE and as
    /maɪ.ˈfɑːð.r̩.ˈæskt/ in AmE.

If you listen carefully to some British English speakers pronouncing words such as tune, fortune, produce, century, nature, mixture, picture, creature, opportunity, situation, actually you may hear an intrusive /j/ sound after the /t/ or /d/ not shown in the spelling.
Therefore, the transcription is actually:
    tune /tjuːn/
    actually /ˈæk.tjuə.li/
    situation /ˌsɪ.tjʊ.ˈeɪʃ.n̩/
etc. although /ˈæk.tʃuə.li/ and /ˌsɪ.tʃʊ.ˈeɪʃ.n̩/ are also heard.  Not all speakers do this.

- index -



The guide to connected speech contains more detail on the different forms of assimilation.  For the purposes of transcribing sounds in connected speech, the various types are not as important as the ability to step away from the written word and transcribe only what you hear.
You must be aware, however, that not all speakers will pronounce everything the same way and the phenomena listed here are not consistently produced by everyone.  Much will depend on how careful the speaker is and what variety of English they use.
Assimilation describes the alteration of sounds under the influence of other sounds in the vicinity.  The guide to the area has this table:

Before these sounds   this sound   assimilates to   for example      transcription
/m/, /b/, /p/         /n/          /m/              then bake it     /ðem.ˈbeɪk.ɪt/
                                                    then put it      /ðem.ˈpʊt.ɪt/
                                                    then mix it      /ðe.ˈmɪks.ɪt/
                      /t/          /p/ or /ʔ/       that mixture     /ðəʔ.ˈmɪks.tʃə/
                                                    that bread       /ðəp.bred/
                                                    that paper       /ðəʔ.ˈpeɪ.pə/
                      /d/          /b/ or /ʔ/       mad man          /ˈmæʔ.mæn/
                                                    mad boy          /ˈmæʔ.ˌbɔɪ/
                                                    mad policy       /mæb.ˈpɒ.lə.si/
/k/, /ɡ/              /n/          /ŋ/              bean cakes       /ˈbiːŋ.keɪks/
                                                    been good        /biːŋ.ˈɡʊd/
                      /t/          /k/ or /ʔ/       that cake        /ðəʔ.ˈkeɪk/
                                                    but go           /bək.ˈɡəʊ/
                      /d/          /ɡ/              bed clothes      /ˈbeɡ.kləʊðz/
/j/                   /t/          /tʃ/             might you        /ˈmaɪtʃu/
                      /d/          /dʒ/             had you          /ˈhædʒu/
/ʃ/                   /s/          /ʃ/              glass shop       /ˈɡlɑː.ʃɒp/
                      /z/          /ʃ/              has shut         /hæ.ˈʃʌt/
In the last case, the assimilation of /s/ and /z/ to /ʃ/, some would aver that the /s/ and /z/ sounds are simply being omitted and that's elision, the topic of the next section.  Others believe that the /ʃ/ sound is, in fact, being extended to nearly double its usual length so this is a case of assimilation.  The distinction, such as it is, is not vital for teaching purposes.
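
The table can also be read as a set of lookups: given the last sound of one word and the first sound of the next, what is the last sound likely to become?  The Python sketch below shows that reading.  It is an illustration only: the names RULES and assimilate_final are invented here, the glottal stop alternatives in the table are left out for simplicity, and the code changes only the final sound, so it does not model the merging of sounds seen in transcriptions such as might you or glass shop.

```python
# Each rule pairs a set of following sounds with the changes they trigger.
# Only the first alternative from the table is used; /ʔ/ is not modelled.
RULES = [
    ({"m", "b", "p"}, {"n": "m", "t": "p", "d": "b"}),
    ({"k", "ɡ"}, {"n": "ŋ", "t": "k", "d": "ɡ"}),
    ({"j"}, {"t": "tʃ", "d": "dʒ"}),
    ({"ʃ"}, {"s": "ʃ", "z": "ʃ"}),
]


def assimilate_final(final_sound, next_sound):
    """Return what the final sound of a word may become before next_sound."""
    for following_sounds, changes in RULES:
        if next_sound in following_sounds and final_sound in changes:
            return changes[final_sound]
    # No rule applies, so the sound is left as it is.
    return final_sound


print(assimilate_final("n", "b"))  # then bake it
print(assimilate_final("d", "k"))  # bed clothes
print(assimilate_final("t", "j"))  # might you
print(assimilate_final("n", "t"))  # no change expected
```

Remember that these are tendencies, not certainties: as noted above, not all speakers produce them consistently.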
Now try some short transcriptions, focusing on assimilated sounds.  These follow the same order as the examples in the table above.
Click on the eye to reveal the answers as you do each one.

  1. golden box
    children must
    put by
    had managed

  2. fine castle
    sit comfortably
    had covered

  3. paint yellow
    would yet

  4. less sugar
    was sure

- index -



It is important, too, to listen carefully for what is not pronounced and this also involves releasing oneself from the spell of the written word and hearing only what is being said, not what one expects to be said.
Again, the guide to connected speech has more detail in this area but here it will be enough to present some examples:

Again, speakers vary in this with some being more careful and correct and others less so (or sloppy as writers to newspapers often describe them).  You have to listen hard to hear what is really being said.

Now try some short transcriptions, focusing on elided sounds.  These do not follow the same order as the examples above.
Click on the eye to reveal the answers as you do each one.

  1. bag of potatoes

  2. he shouldn't've been there

  3. pass that to her

  4. sevenths

- index -


The glottal stop

In your throat, there is a part of your larynx called the glottis (the opening between the vocal folds) and this is where the glottal stop is produced, hence its name.
A glottal stop is formed by briefly blocking the airflow at the glottis and then releasing it.
The symbol for this sound is /ʔ/ and we have seen above a lot of examples of how some sounds are replaced by the glottal stop.

In rapid speech a glottal stop is sometimes substituted for a consonant.  For example, the usual transcriptions for football and Batman are /ˈfʊt.bɔːl/ and /ˈbæt.mən/ but many people will pronounce them /ˈfʊʔ.bɔːl/ and /ˈbæʔ.mən/, using the stop, /ʔ/, instead of the /t/.
Try transcribing
    put on, pick up, hit him
as they might sound in casual rapid speech

We can also have butter as /ˈbʌʔ.ə/ not /ˈbʌt.ə/ or fatter as /ˈfæʔ.ə/ not /ˈfæt.ə/ in some common dialects (London and Scots, for example).

See also the use of the glottal stop to avoid a linking /r/, /w/ or /j/ sound, above.
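
The replacement of a syllable-final /t/ by /ʔ/ can be sketched in a few lines of Python.  This is a rough illustration under stated assumptions, not a model of any real speaker.  The function name glottalise is invented here, and the sketch assumes a transcription in which syllables are separated by dots, so it only changes a /t/ that stands directly before a syllable dot.

```python
import re


def glottalise(transcription):
    """Replace every /t/ that stands directly before a syllable dot with /ʔ/."""
    return re.sub(r"t(?=\.)", "ʔ", transcription)


print(glottalise("ˈfʊt.bɔːl"))  # football
print(glottalise("ˈbæt.mən"))   # Batman
print(glottalise("ˈbʌt.ə"))     # butter, as in some London and Scots speech
```

A /t/ at the beginning of a syllable or at the very end of a word is left untouched by this sketch, so phrases such as those in the exercise above still need your own ear.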

- index -


/h/ dropping and /ŋ/ to /n/ conversion

Dropping the /h/ on him is not always sloppy speech; it is widely acceptable and very common (but not in all dialects).
The /h/ in I have, when not contracted, is often replaced by an intrusive /j/ as in /ˈaɪjæv/ and this happens frequently elsewhere, too (they have, we have, e.g., rendered as /ˈðeɪjəv/, /ˈwijæv/).  Notice, too, the tendency to pronounce the vowel in have as /ə/ in they have but as /æ/ in we have.
Hello is often pronounced /hə.ˈləʊ/, sometimes /hæ.ˈləʊ/, but often /ə.ˈləʊ/ or /æ.ˈləʊ/.  It may be safer to stick with /haɪ/.

Similarly, in many dialects the final /ŋ/ in words ending with -ing is often rendered as /n/ but this is generally considered low status.  We get, e.g., /ˈɡəʊɪn ˈaʊt/ instead of /ˈɡəʊɪŋ ˈaʊt/.  Oddly, some high-status British accents also make this conversion, exemplified by the so-called huntin', fishin' and shootin' set (the /ˈhʌnt.ɪn ˈfɪʃ.ɪn ən ˈʃuːt.ɪn/ set).

Try transcribing:
    Daisy has a dog
as it might sound in casual rapid speech

Now try:
    Are you going out tonight?
as it might sound in casual rapid speech

- index -


Crushing the schwa: syllabic consonants

If you listen very carefully to how someone pronounces a word like stable, you may hear two or three possible pronunciations:

  1. /ˈsteɪb.l̩/
  2. /ˈsteɪ.bəl/
  3. /ˈsteɪ.bᵊl/

The first is more likely to appear in quite rapid speech and the second sounds rather formal and slow.  The third is an intermediate stage in which some people will hear a schwa but aver that it is simply shortened.  That would be transcribed with the symbol for the schwa raised to signify its comparative shortness.
What is happening is that the schwa between /b/ and /l/ is being crushed in normal rapid speech so that the final /l/ sound constitutes a syllable on its own.  Usually, syllables are defined by vowels but, in this case, a consonant alone is the syllable because the schwa is all but impossible to hear.
To transcribe this phenomenon, you need to place a dot before the syllable and insert a small vertical mark below the consonant to signify that it is a syllabic consonant (see above).

There are, in English, three types of syllabic consonant and they affect /l/ (the example above), /n/ and /r/.  Here are some examples.

syllabic /l/
The example of stable above is a case but this type of schwa crushing occurs very frequently with the suffix -able meaning with the ability so, for example, we get:
capable /ˈkeɪ.pəb.l̩/
definable /dɪ.ˈfaɪ.nəb.l̩/
computable /kəm.ˈpjuː.təb.l̩/
uncle /ˈʌŋk.l̩/
and so on.
syllabic /n/
This is not such an obvious phenomenon so two transcriptions of many words are possible.  It all depends on how rapidly and clearly the words are pronounced.  The faster the production, the more likely it is that the final /n/ will constitute a syllable on its own.  Like this:
darken /ˈdɑː.kən/ or /ˈdɑːk.n̩/
open /ˈəʊ.pən/ or /ˈəʊp.n̩/
dragon /ˈdræ.ɡən/ or /ˈdræɡ.n̩/
This is frequent in many varieties of English with the noun-forming suffix -tion.  So we have, e.g.:
meditation /ˌme.dɪ.ˈteɪʃ.n̩/ or /ˌme.dɪ.ˈteɪʃ.ən/
definition /ˌde.fɪ.ˈnɪʃ.n̩/ or /ˌde.fɪ.ˈnɪʃ.ən/
exception /ɪk.ˈsep.ʃn̩/ or /ɪk.ˈsep.ʃən/
This does not occur with the nasal sound /ŋ/.  If, however, that sound is converted to /n/, it may, so hunting could be transcribed as /ˈhʌnt.n̩/ in some varieties of English.
syllabic /r/
Again, this is not always obvious and does not occur in most varieties of British English.  However, in some American and other standard varieties in which a final /r/ is pronounced even when not followed by a vowel sound, it may be syllabic.  So, in rapid speech we may encounter, for example:
indifference /ɪn.ˈdɪ.fə.rəns/ or /ɪn.ˈdɪf.r̩əns/ or /ɪn.ˈdɪ.frəns/
brother /ˈbrʌð.ə/ or /ˈbrʌð.r̩/ or /ˈbrʌð.ər/
reverence /ˈre.və.rəns/ or /ˈrev.r̩əns/ or /ˈre.vrəns/
If you are transcribing the voice of a speaker of AmE, you will need to listen out for a syllabic /r/.
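
The way the schwa is crushed before a final /l/ or /n/ can also be sketched in Python.  This is a minimal illustration, assuming a transcription with dots between syllables and a final syllable made up of one or more consonants, a schwa and then /l/ or /n/.  The function name crush_schwa is invented for the sketch.  Following the convention used in the examples above, the consonant before the schwa is moved into the preceding syllable and the syllabic mark (Unicode U+0329) is placed under the final consonant.

```python
import re

SYLLABIC_MARK = "\u0329"  # combining vertical line below, as in l̩ and n̩


def crush_schwa(transcription):
    """Turn a final syllable of consonant(s) + schwa + /l/ or /n/ into a syllabic consonant."""
    return re.sub(
        r"\.([^.ə]+)ə([ln])$",
        # Move the onset into the preceding syllable, then mark the final
        # consonant as syllabic.
        lambda m: m.group(1) + "." + m.group(2) + SYLLABIC_MARK,
        transcription,
    )


print(crush_schwa("ˈsteɪ.bəl"))         # stable
print(crush_schwa("ˈəʊ.pən"))           # open
print(crush_schwa("ˌme.dɪ.ˈteɪ.ʃən"))   # meditation
```

The sketch does not check that the sounds before the schwa really are consonants and it does not attempt the syllabic /r/, which depends on the variety of English being transcribed.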

Try transcribing these words as they might sound in casual rapid speech:
    paddle, fiddle, doable
    lighten, fasten, chosen

and in a New York accent:
    shudder, hunter, dangerous

- index -


Transcribing intonation

Unlike individual sounds and issues of connected speech, there is no universally accepted or conventional way of transcribing intonation.
In the classroom, most teachers develop their own ways of doing this depending on the features they want to highlight.

  1. Some use slanted and carefully positioned boxes, like this:
    which works well for showing the rise and fall of tone across a sentence but less well for showing the tonic stress.
  2. Some use slanted or straight lines, like this:
    [image: intonation 2]
  3. or wavy lines:
    [image: intonation 3]
  4. and these can be extended to include some recognition of the stressed syllables, like this:
    [image: intonation 3]
  5. As a more consistent way of showing the patterns, you may like to try having a keyed transcription, like this:
    [image: intonation 5]
    in which level = level tone starts here, down = step down in pitch here, up = step up in pitch here and rising = rising tone here.
    Letters in UPPER CASE underlined and in bold show stress.
  6. In the section on intonation on this site, a variety of formats is used depending on what it is that needs describing.  Some of it looks like this:
    representing a sharp rise, a level tone, a rise-fall, a fall-rise, a rising tone and a falling tone respectively.

Whichever system or systems you adopt, you need to make sure that your learners understand its implications.

There is no exercise on this because

  1. you are free to choose, adapt or invent your own system
  2. people usually disagree about how the intonation actually works and
  3. there is simply no evidence that we can equate intonation to speaker emotion or intention on a simple one-to-one basis.
    There are no arguments for teaching intonation in terms of attitude, because the rules for use are too obscure, too amorphous, and too easily refutable.
    (Brazil, D, Coulthard, RM, & Johns, C, 1980, Discourse Intonation and Language Teaching, Harlow: Longman, p120)

You are advised not to use phonemic transcription and intonation diagrams together because that muddies the water and confuses your learners.

- index -


/niːd mɔː ˈpræk.tɪs/?

You can easily get as much practice as you like by opening a book at random, selecting some words and transcribing them.
You can then go online or to a dictionary and check your answers.  A good source for that is PhoTransEdit.
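
If you keep your attempts and the dictionary versions in a file, a rough self-check can be sketched in Python.  This is an optional illustration and not a tool provided by this site.  The names strip_marks and same_sounds are invented here, and the sketch assumes that you only want to compare the sounds themselves, so slashes, stress marks, syllable dots and spaces are ignored.

```python
# Characters that do not represent sounds: slashes, primary and secondary
# stress marks (including the plain apostrophe), syllable dots and spaces.
IGNORED = "/ˈˌ'. "


def strip_marks(transcription):
    """Remove slashes, stress marks, syllable dots and spaces."""
    return "".join(ch for ch in transcription if ch not in IGNORED)


def same_sounds(attempt, reference):
    """Return True if the two transcriptions contain the same sequence of sounds."""
    return strip_marks(attempt) == strip_marks(reference)


print(same_sounds("/'fʊt.bɔːl/", "/ˈfʊtbɔːl/"))
print(same_sounds("/ˈfʊʔ.bɔːl/", "/ˈfʊt.bɔːl/"))
```

A mismatch does not always mean a mistake.  As the sections above show, two transcriptions of the same word may both be defensible, depending on the speaker and the speed of delivery.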

If you would like to get some practice transcribing spoken language, follow the link at the end of this guide to Audio transcription practice.


The pronunciation section of the in-service index on this site has separate guides to consonants, vowels, connected speech and intonation as well as a guide to syllables and phonotactics which discusses syllabic consonants and much else.

- index -

Audio transcription practice, consonants only: the test you did at the end of the section on consonants
Audio transcription practice, vowels only: the test you did at the end of the section on vowels
Audio transcription practice 1: for some practice in transcribing what you hear in short sentences
Audio transcription practice 2: for some more (and more difficult) practice in transcribing what you hear
The pronunciation section of the in-service guides: for more guides to aspects of pronunciation
Weak forms: for a list of weak forms in PDF format
Three more tests: for a set of three recognition-only tests

Crystal, D, 2008, A Dictionary of Linguistics and Phonetics (6th edition), Oxford: Blackwell Publishing
Roach, P, 2009, English Phonetics and Phonology: A practical course (4th edition), Cambridge: Cambridge University Press
Wells, J, 2009, phonetic-blog.blogspot.com/2009/12/triphthongs-anyone.html