ELT Concourse teacher training

Testing and assessing vocabulary


It is almost a truism that vocabulary is tested directly or indirectly in all tests of a learner's language ability.  It is difficult to conceive of any type of test which is lexis-free.  Even when a test item looks like this, for example:

Select the correct answer:
    John couldn't get in because he ____________ left his keys at the office.
    a) was leaving
    b) had left
    c) would leave
    d) has left

and is presumably designed to test the subject's knowledge of past-tense forms in English, it requires the test taker to understand the connection between keys and the ability to get in somewhere, the meaning of the modal auxiliary verb, the meaning of the adverb in get in and the logical connection implied by the word because.
Without that lexical knowledge, it's hard to demonstrate grammatical knowledge and get the right answer.
So, if vocabulary knowledge is routinely a part of testing all the other sorts of language ability ...


... why test vocabulary separately?

This is not the place to set out the different purposes that tests fulfil, whether they are achievement, diagnostic, proficiency or progress tests.  Nor is this the place to discuss the motivating factors that tests sometimes enhance.  We are concerned here with testing vocabulary in particular, not testing in general.
Guides to general areas of testing and lexis are linked in the list of related guides at the end.

There are a number of good reasons for testing vocabulary discretely from other skills and abilities.

  1. Backwash:
    Explicitly testing vocabulary often results in teachers paying more attention to its teaching and being more consistent and discerning about what items they focus on.
    Backwash may also have an effect on the learners.  If they know that vocabulary is going to be tested discretely, they may well be motivated to review what they have encountered and consigned to notebooks, probably in no particular order.  They may even be persuaded to revisit and reorganise their vocabulary notebooks.
  2. As a measure of overall ability:
    Vocabulary knowledge has been shown to be a good indicator of a learner's overall ability in a language so, for diagnostic and placement purposes, vocabulary testing is a useful tool.
  3. Face validity:
    Some learners make very great efforts to acquire vocabulary because they recognise, quite rightly, that although it is difficult to communicate without grammar, it is impossible without words.  If we do not test vocabulary in an identifiably discrete way, learners may not feel that their abilities are being fairly assessed.
  4. Depth vs. breadth:
    Testing vocabulary incidentally, in a mix of other test types, may give us some measure of the breadth of learners' lexical knowledge (i.e., the size of their lexicons in crude terms) but is unlikely to provide anything like the precision we require if we want to measure the depth of their knowledge of lexis.  This means testing vocabulary separately so we can get some estimation of how well items are known, not just how many are recognised.
  5. Learning is remembering:
    Vocabulary learning is not subject to a rule-based approach in the same way that learning grammar rules and applying them can be.  There are distinct patterns, of course, such as collocational aspects, multi-word verbs, synonymy, homonymy and so on but, essentially, learning vocabulary means remembering vocabulary and testing it is a strong motivating factor in encouraging learners to review and recycle what they have encountered.
  6. Revision and review:
    Vocabulary is an area where it has been shown that multiple exposures to lexemes in context are required before items can be said to have been acquired.
    Vocabulary testing provides an opportunity, when giving feedback, to review, recycle and extend learners' knowledge.
  7. Spacing:
    It has been shown that it is better to space out vocabulary learning and recycling rather than concentrate it in blocks of intense effort.  In the trade, this is known as distributed practice.  Such practice, it is argued, allows short- and long-term memories to integrate.
    Testing at regular intervals allows the teacher to space out the learning and recycling at gradually increasing intervals.
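The idea of spacing reviews at gradually increasing intervals can be sketched in code.  Here is a minimal illustration in Python (the one-day starting gap and the doubling multiplier are illustrative assumptions, not research-derived values):

```python
from datetime import date, timedelta

def review_schedule(first_seen: date, reviews: int = 5,
                    first_gap_days: int = 1, multiplier: int = 2):
    """Return review dates at gradually increasing (expanding) intervals.

    The gap doubles after each review here (1, 2, 4, 8, 16 days);
    the starting gap and multiplier are arbitrary choices for the sketch.
    """
    schedule = []
    gap = first_gap_days
    current = first_seen
    for _ in range(reviews):
        current = current + timedelta(days=gap)
        schedule.append(current)
        gap *= multiplier
    return schedule

# A word met on 1 June would be recycled on 2, 4, 8 and 16 June and 2 July.
print([d.isoformat() for d in review_schedule(date(2024, 6, 1))])
```

A teacher could, of course, achieve the same effect with a paper diary; the point is only that each review is scheduled further from the last.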


What to test: depth vs. breadth

Hughes asserts that for the purposes of a placement test, i.e., "in essence a proficiency test" (Hughes, 1989:147):

All we would be looking for is some general indication of the adequacy of the student's vocabulary.

If that were all there was to it, we would simply need to focus our test on the vocabulary items we consider most frequent and useful for our learners, perhaps drawing on something like the General Service List or an academic word list, and design a test to see whether our learners can accurately understand (through some kind of matching task) and use (through a form of gap-fill task) the items we have targeted.
That will give us a rough-and-ready indication of the breadth of their passive and active vocabulary.

Another possibility, of course, lies in asking how well learners know vocabulary items, not how many they know.  This will mean focusing on phenomena such as pronunciation, collocation, colligation, word formation (morphology), word grammar (transitivity, countability etc.) and perhaps some other factors concerning hyponymy, synonymy, simile, metaphor, style, register and idiomaticity.  This is what is meant by focusing on depth of understanding as well as on breadth of knowledge.
For this to work, we need to be a little more imaginative in how we construct test items, as we shall see.


What to test: targeting the test

What you test is dependent on why you test, i.e., what the test is designed to tell you.

  1. Achievement and progress tests
    will focus, if they are to be fair, only on the items taught or encountered on the course.
    They are either (or both):
    1. Formative and frequently carried out to identify what needs to be recycled and reviewed.
    2. Summative and carried out at the end of a course to see how well the items have been acquired as a way of evaluating the success of the programme.
  2. Proficiency, placement and diagnostic tests
    will focus on getting an estimate of the size and depth of learners' knowledge and will depend, usually, on some kind of sampling.  If, for example, we have identified 2000 words that a learner at A2 level should know, a test of 100 randomly selected words from the list will represent a sample of 5%, which is actually rather good statistically, although a sample of 200 words would, of course, be twice as good and twice as time-consuming.
    When learners are at C1 level, however, they are expected to know around 4000-5000 words and random sampling becomes much more difficult because to attain a 10% sample rate, we would need a 400-item test which would take at least 3 hours to complete if one allows 30 seconds per item.
    That is impractical in most settings and explains why vocabulary testing in public examinations is often integrated into testing other aspects of language knowledge and ability.
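The sampling arithmetic above is easy to make explicit.  Here is a small Python sketch (the 30-seconds-per-item figure comes from the discussion above; the function itself is purely illustrative):

```python
def sampling_cost(vocab_size: int, sample_rate: float,
                  seconds_per_item: int = 30):
    """Work out how many items a sampled vocabulary test needs
    and how long it would take at a given speed per item."""
    items = round(vocab_size * sample_rate)
    minutes = items * seconds_per_item / 60
    return items, minutes

# A2: 2000 words at a 5% sample -> 100 items, 50 minutes
print(sampling_cost(2000, 0.05))
# C1: 4000 words at a 10% sample -> 400 items, 200 minutes (over 3 hours)
print(sampling_cost(4000, 0.10))
```

The trade-off is plain: doubling the sample rate doubles both the statistical confidence and the testing time.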


Measuring breadth: vocabulary size

This is the attempt to discover how wide the test taker's vocabulary is in terms of understanding and using lexemes.  It is not a particularly sophisticated measure of language competence but there is evidence that breadth of vocabulary is a good indicator of general language proficiency.  However, there are provisos:

  1. Cross-language facilitation and interference:
    This is not an issue when all the learners share a common first language and no other apart from the target language.
    However, in groups where the learners come from a variety of language backgrounds and/or in which some members of the group may have learned other additional languages, the issue of cross-language influences begins to be felt.
    When learning English, for example, it is unlikely that learners from an Italic language background, or who have learned an Italic language, no matter what their first language is, will have much difficulty understanding a word such as consolidation, because a word which looks similar and carries the same meaning exists in most of these languages (French, Italian, Spanish, Romanian, Portuguese etc., in all of which the word begins consolida-, with the ending differing only slightly if at all from English).
    Speakers of Slavic languages will be slightly more challenged but a cognate word exists in most of them which can be identified with a little effort.
    However, learners from other language backgrounds, especially non-Indo-European ones will have no such support from their first languages and will need to have learnt the word from scratch.  Even in German, where a similar word exists, a more natural translation might be Vertiefung which bears no superficial relationship to the English word.  In other languages, the form of the word bears no relationship to the English word at all:
    samstæðu (Icelandic)
    укрепление (ukrepleniye) (Russian)
    sağlamlaştırma (Turkish)
    ukuhlanganiswa (Zulu)
    vakauttaminen (Finnish)
    pagpapatatag (Filipino / Tagalog)
    fanamafisana (Malagasy)
  2. Register:
    Learners who have certain interests and/or professions may find that some items are well known to them which are obscure to people with other backgrounds.  For example, a learner, whatever his or her first language, who happens to be a chemist will have little difficulty understanding a word like sulphate (or sulfate if you prefer AmE) which is similar if not the same in a very wide range of languages (but not all).  Other learners will be more challenged.
    Equally, a learner with a background in banking might well understand the terms direct debit, exchange rate, deposit etc., whatever her or his first language, where other learners will struggle.
    A learner who is particularly interested in motor racing will probably be familiar with bend, pit-stop, chequered flag and a number of other terms which are obscure to those without any specialist knowledge.

The moral is to try as far as possible in the selection of items to test to avoid bias of this sort and also, where appropriate, to focus only on the items which have been taught, or at least encountered, on the course.  That is feasible if one is designing a progress or achievement test, less so in designing placement, diagnostic or proficiency tests, as we saw.


Ways of measuring vocabulary size

The following are not confined to measuring vocabulary size because they can be used to measure other aspects of lexical knowledge (if they are designed somewhat differently).  Here are some examples and some comments:

Synonym tests are simple to design and administer.  For example:

Choose the item which is closest in meaning to fire:

  1. blaze
  2. combustion
  3. ignition
  4. eruption

This is a test which depends on knowing all the words and being able to match meanings.  A more searching test, sometimes, is to choose words and distractors which are close in form but not meaning.  For example:

Choose the item which is closest in meaning to search:

  1. seek
  2. clench
  3. reach
  4. trench

and a test like that can also be used to test whether learners can distinguish between homophones, like this:

Choose the item which is closest in meaning to feet:

  1. pause
  2. pores
  3. pours
  4. paws

The problem, of course, with test items like these is that there is no context so distractors must be very carefully chosen to eliminate any possibility that, in certain circumstances and with certain meanings, more than one correct answer may exist.

Definition tests are easy to design if you have a learners' dictionary to hand.  Two examples:

Which word means extremely frightened?

  1. scared
  2. horrified
  3. petrified
  4. afraid

Which definition of frown is correct?

  1. an expression showing anger or disapproval
  2. a gesture showing dislike
  3. raising your eyebrows to show surprise
  4. stretching your mouth to show dislike

There are problems with both of these test types because the first depends on understanding that the words are adjectives not verbs and the second, of course, depends on understanding the words in the definitions such as disapproval, gesture (vs. expression), eyebrows etc.

Gap-fill tests can get over the issue of a lack of co-text.  For example:

Fill the gap with the correct word from the list of four:
The computer program __________ much faster processing of information.

  1. empowered
  2. let
  3. enabled
  4. qualified

The drawback with this kind of test is that, although the distractors should not in theory contain words the test taker does not know, it is often very difficult to identify distractors which are conceivable (but wrong) rather than wrong (and obviously so).  Another issue is that the co-text should also not contain unknown or ambiguous items.

Gap-fill tests in which no alternatives are given may also be a way of measuring productive ability rather than recognition.  For example:

Fill the gap with one word only:
Mary lost her key so she __________ mine to get into the flat.

The obvious drawback with this sort of test of vocabulary is that it is very hard to write a series of items in which only one possible word is allowable.  In the gap in this example, borrowed, took, stole, appropriated, nicked and a range of other possibilities are allowable and that complicates marking by introducing an element of judgement of appropriacy.  Would you allow purloined, for example?
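Where several answers are allowable, consistent marking usually means agreeing a key in advance.  A minimal Python sketch (the answer set simply reproduces the possibilities listed above; whether purloined gets in remains a judgement call for the markers):

```python
# Agreed key for: "Mary lost her key so she ____ mine to get into the flat."
# The membership of this set is a marking decision, not a fact about English.
ACCEPTABLE = {"borrowed", "took", "stole", "appropriated", "nicked"}

def mark_gap(answer: str) -> bool:
    """Mark an open gap-fill answer against the agreed key,
    ignoring case and stray whitespace."""
    return answer.strip().lower() in ACCEPTABLE

print(mark_gap("Borrowed"))   # True
print(mark_gap("purloined"))  # False: not on the agreed key
```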
A way around this is to redesign the task to give the first letter and an indication of how many letters the word contains:

Fill the gap with one word only:
Mary lost her key so she b _ _ _ _ _ _ _ mine to get into the flat.

but that, naturally, makes it easier.

Using pictures to elicit productive vocabulary is a technique commonly deployed.  For example:

Write the correct words for the sports next to the pictures:
_______________ 1
_______________ 2
_______________ 3

Unfortunately, this only works for lexical items that can be unambiguously identified from pictures and even then, some items may be representable by more than one word and that complicates marking.

Definition tests, too, can be used to measure productive ability, like this:

Fill the gap with one word only:
Electronic devices which are connected to an amplifier and fit over both ears to play sounds are called:
____________________ .

The drawback is that there are very few words which can be completely unambiguously defined in this way.


Measuring depth: vocabulary use

The first decision concerns the selection of the aspects of a word that you want to test.  In the general guide to teaching lexis, the following were identified as what it may be necessary to know in order to 'know' a word:

  1. what a word means – what it denotes and what it connotes (if appropriate)
  2. how it is connected to other words which mean similar things (e.g., buy, sell, bargain, discount etc.)
  3. what words it commonly goes with (collocation) so we know we can't have a high tree but prefer tall as the adjective, for example
  4. what other meanings it can have (e.g., shop, bank etc. can have different meanings and fall into different word classes)
  5. how the word changes depending on its grammar (e.g., shop, shops, shopping, shopped etc.)
  6. what grammar the word uses (e.g., does it take a direct object, an indirect object, both, a preposition, does it have an odd plural or an irregularity? etc.)
  7. how to pronounce the word.
  8. what kind of situations the word is used in and who might use it.  Is it, for example, typical of a certain register?

Depth of meaning also concerns passive and active vocabulary, of course.  Here are some example test items with a commentary:

Item 1, word knowledge:

Use the word complain in a sentence of a minimum of 8 words.  Your sentence must contain a subject and an object.

Clearly, this sort of test requires subjective marking, although the marker will only be looking for accuracy concerning the target item and will ignore the rest.  It tests a wide range of knowledge, however, because the test taker needs to be able to:

  1. recognise the word class
  2. understand the meaning of the verb
  3. know that it is a prepositional verb usually combined with about or of
  4. use the verb transitively with an appropriate object

For small test samples, this kind of item can be revealing.  The test can also be done orally and that will include a check on whether it can be pronounced adequately.

Item 2, collocation:

Mark with a ✓ or an ✗ which words on the left can be used with the words at the top.
The first one is an example.

             rain    snow    wind    sunshine
heavy        ✓       ✓       ✗       ✗

For variety and a little more precision, test takers can also be invited to put a ? by any item they consider doubtful.

Collocation can also be tested on a scale of naturalness, so we could have:
Item 3:

Mark these phrases with a tick in column 1, 2 or 3.
1 means it is the most natural
2 means it is possible but unnatural
3 means it is very unlikely or impossible
You can use each number as many times as you like.

                     1     2     3
weighty issue
heavy issue
bulky issue
lions groan
lions rumble
lions roar
out of control
beyond control
on control

Collocations of many sorts can be tested this way because there is a cline from wholly unnatural, through slightly unnatural, to fully natural.

Item 4, formality:
Formality sensitivity can be tested in the same way:

Mark these sentences with a tick in column 1, 2 or 3.
1 means it is formal
2 means it is neutral
3 means it is informal
You can use each number as many times as you like.

                                        1     2     3
please pass the salt
give me the salt
would you hand me the salt, please
they tend to be annoying
they are a pain
they are irritating
I'm averse to swimming
I am disinclined to swim
I don't like swimming

Item 5, register:
Register sensitivity can be addressed in the same way:

Mark with a ✓ or an ✗ which words on the left you would expect to hear in the settings at the top.
The first one is an example.

                IT      business    football    theatre
spreadsheet     ✓       ✓           ✗           ✗

Item 6, paradigmatic and syntagmatic relationships:

Mark with a ✓ or an ✗ which words on the left you can associate with the words at the top.
The first one is an example.

             delayed    alteration    computer    light
late         ✓          ✓             ✗           ✗

This is not an easy test to understand in terms of what you have to do so learners need a little training to look for the two types of relationships at which it is aimed.
Again, for variety and a little more precision, test takers can also be invited to put a ? by any item they consider doubtful.
The test encourages the learner to try to recognise words of a similar nature and word class (paradigmatic relationships) as well as those likely to co-occur syntactically (syntagmatic relationships).

Item 7, colligation:
It is possible to test learners' understanding of word grammar in a number of different ways.  For example:

Mark with a ✓ or an ✗ which phrases are correct.
Then, if necessary, write the correct form in the box on the right.

                                              ✓ or ✗    Correction
 1   I am sorry for late
 2   I allowed him to come
 3   She let him to stay
 4   I concealed it under the table
 5   I concealed behind the curtain
 6   They arrived the hotel
 7   He donated them the money
 8   We handed over the doorman the tickets
 9   We expected him to arrive late
10   We hoped her to come early
11   We can probable come
12   It's difficult but please try
13   It's hard but please attempt
14   She's an unwell child
15   I very almost was late

Only four of the above are correct (2, 4, 9 and 12) and the others target specific aspects of colligation which are exemplified in the guide to the area, linked below in the list of related guides.

Item 8, word class:
This is a simple test of what is known about a word:

Mark with a ✓ or an ✗ which words on the left can be used in the word classes at the top.
The first one is an example.

             noun    transitive verb    intransitive verb
bank         ✓       ✓                  ✓
blow out

Item 9, lexical sets:
Sensitivity to sense relations can be tested in the same way:

Word sets:
Mark the odd ones which are not in the same word set as the first word with an ✗.
The first one is an example.

late       delayed    overtime ✗    behind ✗     overdue
change     alter      modify        cancel       postpone
machine    device     tool          utensil      gadget
light      fire       ignite        illuminate   show
taxi       mini-cab   rickshaw      train        rental car
minor      trivial    small         minimum      small

Item 10, testing hyponymy:
This is a key set of relationships to test.  It can be done, for example, like this:

Mark with a ✓ which word at the top includes the meaning of the words on the left.
The first one is an example.

                   facility    shop    building
post office
health centre
town hall

Further grids can use other superordinate sets, such as container / holder / vessel or education / building / institution.

Item 11, synecdoche, simile, metaphor and other matters:
We can also test more sophisticated and difficult areas of lexical relationships, like this:

Which words can best replace the underlined words in:
The White House has decided to impose tariffs on steel.

  1. the US senate
  2. the US President
  3. the President's office

Which words can best replace the underlined words in:
He became an actor.

  1. He went on the stage
  2. He studied acting
  3. He went into the film industry

Complete the similes:

  1. He's like a fish out of __________
  2. It's as fast as __________
  3. She's as angry as __________
  4. I'm as deaf as __________
  5. It went like a __________
  6. They purred like __________

I have a lot on my plate this week means:

  1. I eat too much
  2. I'm very busy
  3. I am worried by many things

Item 12, word formation:
The understanding of affixation can be tested both receptively and productively, like this:

Mark these words as correct (✓) or incorrect (✗).  If a word is incorrect, put the correct form on the right.

             ✓ or ✗    Correction

Productive ability can be tested this way, too.  For example:

Fill the gaps with the correct form of the base word.  Put an ✗ where it is not possible to make a word.
The first one is an example.

             noun    transitive verb    intransitive verb    adjective
snow         snow    ✗                  snow                 snowy
love

As you can see from the item love here, items need to be carefully chosen because a range of derived words may be formable (lovable, lovely, loving, loved etc.).
Another way to do this is to populate a grid with some of the target stems or derivatives and get the learners to complete it with a word or an ✗, like this:

Fill the gaps with the correct form of the words.  Put an ✗ where it is not possible to make a word.
The first one is an example.

             noun    verb    adverb    adjective
snow         snow    snow    ✗         snowy

A simpler way is something like:

Select the correct word:

  1. unpossible
  2. inpossible
  3. impossible

Select the correct word:

  1. dirtity
  2. dirtiness
  3. dirtfulness

Item 13, pronunciation:
Although pronunciation is probably best tested orally for obvious reasons, that is not always practical especially if the test setter and the test taker are not in the same place.  It is possible to test it in writing, however.  For example:

Which word rhymes with hoped?

  1. dropped
  2. shopped
  3. soaped
  4. locked
  5. adopt

Which word contains the same sound as the 's' in sugar (/ʃ/)?

  1. sword
  2. leisure
  3. school
  4. shame
  5. measure
  6. muscle

One can design items of this sort in which the test taker needs to select multiple possibilities.  If the test taker is familiar with the phonemic script, it makes life considerably easier, so we can have, e.g.:

Which words contain the sound /uː/?

  1. sword
  2. foot
  3. lost
  4. loose
  5. sure
  6. should
  7. goose
  8. cruise

and one can add, "as in choose" to the rubric to make it clearer.

This may not be an ideal way of testing pronunciation and it is unlikely that one can focus on anything more than vowel and consonant pronunciation this way but it may be the only way in some settings.  Trying to test features of connected speech, with the possible exception of the weak-form schwa (/ə/), is very difficult.

If you follow some of the guides linked below, you may discover other phenomena concerning lexis which, with a little imagination, you can assess in ways similar to those exemplified above. 

Related guides:
idiomaticity which considers levels of transparency, strong collocation, binomials and so on
collocation a guide to a key area to see what you might be testing
colligation a guide with examples of colligation types that you may consider testing
synonymy which includes explanations of metonymy, synecdoche, simile, metaphor and hyponymy, all of which can be tested
lexical relationships for an overview of synonymy, hyponymy and other terms
testing and assessment a general guide to testing, assessment and evaluation with some key terms explained
the lexis index for a list of other guides in this area

Hughes, A., 1989, Testing for Language Teachers, Cambridge: Cambridge University Press
Schmitt, N., 2000, Vocabulary in Language Teaching, Cambridge: Cambridge University Press