
Doing classroom research: a short course


This guide is about finding things out for yourself.
It is part of the Teacher development section of the site but is linked from numerous other areas and guides because, sooner or later, most teachers start to become curious about what works best for them and their students and what their learners feel about the service they are receiving.
One can, naturally, read books about methodology and teaching techniques, procedures and approaches or take a teacher training course at an appropriate level and many of these resources will focus on what someone else thinks constitutes best and most efficient practice.
There is no harm in relying on the wisdom of others in this respect and most of us will do so at many points in a teaching career.
However, by their nature, these resources will not apply specifically to you, your learners or your and their aims so they will, at best, be generic responses to questions such as:
    How do I motivate my learners?
    What types of tasks appeal most and work best in the classroom?
    How do I ensure that my learners are getting what they need as well as what they want?
    How can I best prepare my learners to use the English they need in their lives?

    Why are my learners unhappy?
and so on.


The starting point: what do you want to find out?

This seems such an obvious question that it's hardly worth taking the time to spell things out.  However, it is surprising how often otherwise good research projects end up being worthless because too little thought is given at the outset to what the project is for and, indeed, what type of project it is.
There are choices to be made and they fall into two main categories: experiments and surveys.


To explain a little:

Experiments are designed to find out What happens if ... .  In other words, they are designed to examine causes and effects and make some kind of measurement of outcomes.
Surveys come in two sorts:
  1. Descriptive surveys are concerned with what not why or how.
  2. Analytic surveys, while they are also concerned with what, are simultaneously focused on why and how.


Six-step research methodology

Whether you are setting out to conduct an experiment or a survey of some kind, you need to follow a simple procedure.  There is nothing immensely time-consuming or complicated about this but it is as well to set it out clearly.  Like this:

1. Ask the question (see above for some examples)
You need to make sure that the question is answerable in less than a few decades so do not ask:
    How can I teach better?
but prefer something a little less ambitious such as:
    Can I teach grammar more interestingly?
    What do my learners feel about the correction techniques I use?
    Which of two possible procedures will work best for this target?


2. Hypothesise an answer to the question
If you don't do this, then you will be stabbing in the dark.  For example, if you want to answer a question like:
    Will a Test-Teach-Test approach appeal to this group of learners?
then your hypothesis will be either:
    A Test-Teach-Test approach will appeal to this group of learners
or
    A Test-Teach-Test approach will not appeal to this group of learners.
Then, when you gather results, you can see clearly which hypothesis the results support.
Hypotheses must be testable statements.  That is to say, they must be disprovable.

3. The experiment or the survey
This is the heart of the project, of course, and needs to be very carefully designed bearing Steps 1 and 2 in mind.  More on this below.

4. Observe the outcomes
For an experiment, this involves looking at the result.  For a survey, this involves assembling the data.
Many people find that assembling the data graphically is helpful because it is often easier to see patterns in graphics (and making the graphic often exposes them) than in columns of figures.
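To see what that can look like in practice, here is a minimal Python sketch with invented data: the `bar_chart` function and the response counts are ours, purely for illustration, but even a crude text bar chart often shows the shape of the responses faster than the raw column of figures does.

```python
# Invented data: how many learners chose each rating (1-6) for one
# questionnaire item.
counts = {1: 0, 2: 1, 3: 4, 4: 7, 5: 3, 6: 1}

def bar_chart(counts):
    """Return one line per rating, with one '#' per response."""
    lines = []
    for rating in sorted(counts):
        lines.append(f"{rating} | {'#' * counts[rating]}")
    return "\n".join(lines)

print(bar_chart(counts))
```

Even at a glance, the chart shows the responses clustering around rating 4, which is harder to see in the dictionary of figures itself.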

5. Analyse the outcomes
If the experiment or survey has been well designed, this will be easy.  If it hasn't, it will be impossible because the data will be dirty and the outcomes unclear.

6. Draw conclusions
Go back to step 2.
    Are the hypotheses supported or refuted?
Take them one by one.



You can see from the example questions above that experiments and surveys have different aims.  Now, then, is the time to ask yourself a few specific questions.  On the answers will depend everything that follows.  The questions are:

  1. Do I want to find out what happens if I change something in my classroom?
  2. Do I want to find out how well a new approach that I am using is working?
  3. Do I want to find out whether I can change something I currently do to make it more effective?
  4. Do I want to know what my learners think?
  5. Do I want to find out what my learners want to learn?
  6. Do I want to find out whether something I currently do is effective?
  7. Do I want to see how my learners' attitudes to what they do affect how well they learn?
Task:
Look at the questions again and write a yes or no answer to each one on a bit of paper.

We will take the areas individually in this guide but it should be remembered that we are not dealing with wholly watertight categories.

Nevertheless, thinking now about the sort of research you are concerned with will save a great deal of time and heartache later.



We will start with experiments because they often lead on to surveys or are the next logical step in a survey.
An experiment is an attempt to manipulate a single variable or set of very closely related variables to try to see what causal effects the variables have on results.


An example

If the torch you use to find your way around in the dark has stopped working, you can carry out an experiment to ascertain why this should be.  So, you proceed as we saw above in a 6-step process:

  1. Ask the question:
        Why has the torch stopped working?
  2. Hypothesise answers:
    1. The torch has stopped working because the battery is exhausted.  This is variable No 1.
    2. The torch has stopped working because the bulb has died.  This is variable No 2.
  3. Experiment:
        Change the battery for a new one.
  4. Observe:
        Does the torch now work?
  5. Analyse:
        Does the torch now work as well as it did before or not at all?
  6. Conclude:
    1. Yes, it works: hypothesis a. is confirmed.
    2. No, it doesn't work: hypothesis b. needs confirming or refuting so now we re-insert the old battery, change the bulb and repeat the observation.
      If the torch now works, we know that hypothesis b. is supported.
      If it doesn't, we need to look for variable No 3.
Task:
Why did we re-insert the old battery in step 6b?

In our professional context, of course, we are dealing with something very much more complex than a torch.  We are dealing with people and therein lies our problem: people are inherently variable.


A teaching example

Let's say, for the sake of this example, that we are setting out to discover why the vocabulary that we are teaching has not been retained as well as it should be by the learners in a particular group.  We know this is the case because we run weekly progress tests which include a test of the vocabulary to which the learners have been exposed.
Results for the vocabulary part of the test have been consistently disappointing.

At the moment we do these things in terms of teaching vocabulary:

  1. We encourage learners to keep a vocabulary notebook in which they write a short sentence or clause to set the lexeme within some co-text.
  2. We get the learners to translate the new words into their L1.
  3. We always introduce new vocabulary in a written text so the learners can see it in context.
  4. We run a topic-based syllabus so all the vocabulary for the week is in the same general lexical field.

Now we want to try something new so we can introduce a new phase and, for this experiment, we want to see if forcing the learners to use all the new vocabulary from a day's lessons in a short piece of written work will help to move the items from the short- to the long-term memory.
We must be very careful to keep the four elements above constant because we only want to see the results of a new procedure.
If we take out one of the things we already do or add in something else, we will never know whether removing that element or introducing a new one has the effects (if any) that we observe.
So here are our six steps:

  1. Ask:
        Why is vocabulary not being adequately learned?
  2. Hypothesise:
    Vocabulary is not being retained because we have failed to provide a task to incorporate new lexis in a piece of written work.
  3. Experiment:
        Provide such a task for each day's lessons for a week.
  4. Observe:
        When we test the learners on this week's work, is there any change in the results?
  5. Analyse:
        Are the learners better or worse at retaining the items?
  6. Conclude:
    1. The learners are better at retaining the items.  Our hypothesis is confirmed and we should continue with this procedure.
    2. The learners are worse or no better at retaining the items.  Our hypothesis is refuted.

Notice that even if conclusion 6b. is the outcome, the experiment hasn't failed because we have shown that the effect is non-existent so we can devote that amount of teaching and learning time to something else.
What this means is that we have to think of another way that vocabulary retention might be enhanced, implement it and run the experiment again.


A little scepticism is called for

Task:
Can you see how this experiment might be flawed?
What other variables might be in play?

These are, of course, things that should have been borne in mind before we undertook the experiment so thinking back from the possible outcomes to the design of the experiment is often a useful technique.
As we said, the problem we have with classroom experimentation is that variables are multiple and it is difficult, some say impossible, to factor them all in or out of the equation.  However, we can try.

Task:
Can you suggest how the effects of problems we have set out here can be minimised?
Here is the list again.  Please think a little about each one before reading on.

    Difficulty level
    Current practice
    The learners
    Text difficulty
    Word class
    Long-term validity

It's not at all easy to achieve all these solutions to the possible issues but it may be the case that we can dismiss most of them quite easily and that allows us to focus only on the important ones.


Experimental fallacy #1

Post hoc ergo propter hoc
(After this, therefore because of this)

The Latin above is often referred to as the post-hoc fallacy.  Just because something is associated with something else, that does not necessarily lead to the conclusion that the relationship is causal.
It has often been noted that it rains more frequently at the weekend than during the working week.  That does not mean that the names we give to days of the week affect the weather directly.
We shall meet this fallacy again when we look at the attempt to carry out analytical surveys.

We may, through this little experiment, have discovered that setting a piece of written work has been associated with better retention of vocabulary items but we cannot be absolutely sure that we have discovered causality because association is not evidence in itself of causality.  It may be the case, for example, that any new task would have had the same effect and it was the novelty value that caused the increase in commitment to learning.
There are ten factors above (and you may have thought of more) that may have intervened to muddy the results of this experiment so we should be careful not to jump to conclusions.
It has been pointed out (see https://upload.wikimedia.org/wikipedia/commons/d/de/PiratesVsTemp%28en%29.svg for example) that we can plot global average temperatures against the number of pirates preying on the world's merchant shipping and see that as the number of pirates falls, the average global temperature rises in direct proportion.
That does not mean that more pirates keep the planet cool.


Experimental fallacy #2

Generalising from the particular

We may, if we have been careful to exclude all the complicating factors and variables we can't control, have discovered something useful about this group of learners.  It may be tempting to make a leap and suggest that instituting a task which forces learners to embed newly encountered lexis in self-produced written texts will generally result in better vocabulary retention for all learners everywhere.
Such a leap would be unfounded and perilous because we cannot make a generalisation of this sort without a great deal more evidence.
We can, and often do, make assumptions about small groups based on what we know about many groups in general but to work the other way and generalise about large numbers from small numbers is not a defensible procedure.
Were it, for example, to be found that the same results were obtained from five other groups in the same institution, we may be justified in suggesting (not claiming) that the task has a beneficial influence on vocabulary retention for the kinds of learners taught by the methods employed in that institution.  To generalise further would not be justified by the evidence.


Descriptive surveys

Descriptive surveys are experiments without the experimental bit.
With an experiment, we set up the factor that we want to investigate and then see what the outcomes are.
With a descriptive survey, we look at the here-and-now reality and investigate what it is and how people feel about it.
So, for example, with our little experiment above concerning vocabulary retention, we could alternatively just introduce a range of ways to help people retain the items and then ask them which one(s) helped them to learn the words they need.
Such a survey might be done via a questionnaire like this, for example:

In this, please score each technique from 1 (does not help me at all) to 6 (is extremely helpful)
Put a ✓ in one box for each technique only
# Technique 1 2 3 4 5 6
1 Writing a short text containing all the new words            
2 Using my vocabulary notebook            
3 Writing the item in a sentence or a clause            
4 Writing the translation next to the item            
5 Drawing a picture to remind me of the meaning            
6 Saying the word aloud many times            
7 Writing the word many times            
8 Using the new word whenever I have the chance            

  1. Why are we asking people to rate things on a scale rather than just say Yes it helps or No, it doesn't help me?
  2. Why are there 6 boxes on the right and not, say, five, seven or nine?
  3. What data will we get from this?
  4. What problems are there with this?

As you can see, a scaled matrix item like this one needs to be designed very carefully if it is going to be helpful and it needs to be administered with a good deal of care.
We could, providing we have the data from the vocabulary tests we administer weekly, then compare the responses to this survey with the results from the tests and try to match what people do with how well they retain the lexis.  We still will not have proven any kind of causality, of course, because it may be the case that people are doing things which are not very effective but, because they have good memories, still managing to retain the lexemes well.
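A sketch of what such a comparison might look like, with invented figures: the `pearson` function and the data below are ours, not part of any survey package, and, as the text stresses, even a high coefficient shows association only, never causality.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    varx = sum((x - mx) ** 2 for x in xs)
    vary = sum((y - my) ** 2 for y in ys)
    return cov / (varx * vary) ** 0.5

# Invented data: each learner's helpfulness rating (1-6) for one technique
# paired with that learner's vocabulary test score (%).
ratings = [2, 3, 3, 4, 5, 5, 6]
scores = [55, 60, 58, 70, 72, 80, 78]

# A value close to +1 suggests an association -- not a cause.
print(round(pearson(ratings, scores), 2))
```

A result near +1 or -1 only tells us the two sets of figures move together; learners with good memories might score well on the tests whatever techniques they rate highly.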
More probably, we would use a survey like this to establish what people currently do as the basis for introducing some learner-training sessions to extend their repertoire in an effort to help them learn.
There are ways to overcome the disadvantages.


Problem #1

Learners don't always know what they think

It often requires a good deal of insight into one's own mental processes to answer questions about what helps you learn.  Not all learners have the skills to do that and, if some in the group do and others don't, there is going to be a mismatch of data types: the informed vs. the stab-in-the-dark answer.
This issue is to do with insight and it can be represented with a Johari window diagram.


This sort of diagram was developed by Joseph Luft and Harrington Ingham to investigate how people see themselves and how others see them.  The name of the technique is a Johari window (derived from the authors' first names).
The contention is that we all have these four 'rooms' in our minds which relate to how we want to be seen, how we really are, how others see us (and we don't) and the subconscious ideas about ourselves which we are not able to articulate.
The four central areas are described by the 'Things ...' sentences around them.
For example, the Facade is the area of things that I know about myself but keep hidden from others.  The Blind Spot refers to things that others see in me which I know nothing about.  Insight is our ability to draw on our unconscious understanding to realise something about ourselves.
Originally, the technique was used for people to select 6 adjectives (from a list of 50 or so) to describe themselves and then their peers were asked to select 6 which described the subject.  Putting these adjectives into the grid, it is claimed, helps people to see themselves as others see them and be better able to use insight and introspection.  We may choose adjectives which we feel we want others to ascribe to us but the list may be very different from the list that others who know us would ascribe to us.  That's the difference between the Facade and the Blind spot.
For our purposes here it stands as a reminder that however carefully a questionnaire or interview process is designed, it ultimately depends on people knowing about themselves and, as the window demonstrates, few of us really do.
Asking a learner, therefore, to be insightful about something as complex as how they learn may be doomed from the outset because few of us can answer that question very accurately if at all.

Circumventing the issue is not easy but we can make inroads by discussing with our learners what they think before asking them to complete a questionnaire, in an effort to bring at least some of our unconscious feelings into the conscious sphere.


Problem #2

Fire or water

There is often a temptation when surveying people's views to ask either-or questions because we know that the data we get from such questions will be very easy to analyse.  We simply have to count the Yes and No answers and compare the numbers.
Unfortunately, the data are frequently useless because there is no room for manoeuvre.
Questionnaires, therefore, which work like this are usually almost a complete waste of everyone's time unless we want the crudest possible view of a situation, even when we include a third option like this:

Please respond by putting a ✓ in the correct box for you
# Statement Yes No ?
1 This lesson was very useful      
2 I learnt a lot of new words      
3 I would like to revise the lesson      
4 I want more lessons like this      

The issues are:
    How do you respond if you found the lesson useful but not very useful?
    What does a lot mean? 10, 15, 20 or what?
    Does item 4 mean you want all future lessons to be like this or only some of them and if the latter, what percentage would be about right?

Only item 3 will give you some usable data because it really is an either-or response you are looking for.  You still need to decide, however, what percentage of Yes responses would lead you to plan the revision lesson.

Overcoming this problem is straightforward and it involves using a scale from a zero response to a maximum response.  If we design the scale rationally, we will be able to capture much finer grained data than simply Yes, No or Maybe.
We should, of course, not take this too far and have an over-extended scale in an effort to capture ever-finer data.  Around six for a scale works well but more than that begins to muddy the water because the gradations are too subtle.


Problem #3

Relativity and gradability

If our questions in any type of questionnaire include items which are subjectively interpretable, then the data we get may be muddied by different interpretations.
For example, question items such as:

  Never Rarely Sometimes Often Frequently
1 How often do you spend time reading through the notes you take in class?          

depend on the learners all having the same understanding of the adverbs and that is a very uncertain assumption.  Native speakers will often disagree about whether three times a week represents Often, Frequently or Sometimes and to expect learners of the language to be more consistent is unrealistic.
A better item would be:

  Less than once a week Once or twice a week Three to five times a week Five to ten times a week More than ten times a week
1 How often do you spend time reading through the notes you take in class?          

because that is measurable (but still subject to people actually being able to count how many times they do something with any accuracy).  It will not tell us how seriously they do this, unless we ask a supplementary question such as:

  Less than 1 hour a week Between 1 and 2 hours a week Between 2 and 5 hours a week Between 5 and 10 hours a week More than 10 hours a week
2 How much time do you spend reading through the notes you take in class?

and even then, we are relying on people being able to estimate accurately something that they have probably not been actively measuring.

Gradable adjectives are another obvious stumbling block in this kind of matrix grid questionnaire because everyone's definition of a short text, a difficult grammar item, a useful word or an important lesson will be different.
Asking learners to grade questions which themselves contain gradable concepts will not provide very useful (or usable) data so it is better to avoid terms such as difficult altogether and to define terms such as short carefully.
We can, of course, always get the learners to grade the responses in something like:

  Very easy Easy About right Quite difficult Very difficult Impossible
1 I found this lesson ...            

which is one way forward but even then people's definitions will vary and what is About right for one learner may be Quite difficult for another who is at a lower level or wants to be challenged less.  We will not know from the results of this question which is which and we will also not know whether some learners actually prefer lessons which are easy while others like to confront difficult materials.


Problem #4

I don't understand the question

An added complication for us is that we are dealing with people whose first language is probably not the language in which we are writing the survey items.
In some situations, this is avoidable because all the subjects of a survey or experiment have the same first language and in that case, naturally, we can write the items in the respondents' first language and avoid any terrible misunderstanding.  That's not 100% a given because people are adept at misinterpreting questions and statements regardless of the language in which they are written.
We can get around this but it is a time-consuming business and involves the use of face-to-face interviews to explain the content of the survey and elicit the result that is fair to the views of the respondents.
Face-to-face interviews are open to bias and we should be careful to avoid that by duplicating the written survey as much as possible so we are not asking open-ended, uninterpretable questions but eliciting the same evidence that a written survey is intended to discover.
If our target respondent group is quite small, such an undertaking is perfectly possible and bias can be further reduced (or, at least, evened out) if only one person does the interviewing.
It is as well to break down the questions and the possible responses like this:
    How often do you spend time reading through the notes you take in class?
    Do you do this
        Less than once a week? (pause for an answer)
        Once or twice a week? (pause for an answer)
        Three to five times a week? (pause for an answer)
        Five to ten times a week? (pause for an answer)
        More than ten times a week? (pause for an answer)

Even when we think we have the answer right, it's as well to double check with something like:
    So, you spend time reading through the notes you take in class between 3 and 5 times a week.  Is that correct?


Problem #5

What is a 'good' result?

What's wrong with these survey items?

Please respond by putting a ✓ in the correct box for you Very easy Easy About right Quite difficult Very difficult Impossible
1 The work we do in class is ...            
2 The materials we use are ...            
  Very boring Boring Not very interesting Quite interesting Very interesting Extremely interesting
3 The texts we use are ...            
4 The topics of the lessons are ...            
  Very strongly disagree Strongly disagree Disagree Agree Strongly agree Very strongly agree
5 I like working in small groups and pairs.            
6 I think the homework we do is useful.

On the face of things, this is quite a good set of items because care has been taken to ensure that the options apply to the types of statement so there are three sets of them.
However, when it comes to analysing the results how are we going to score the responses?
If we give all the left-hand responses a score of 0 and all the right-hand responses a score of 5 then the analysis will be made more difficult because a high score for items 1 and 2 will mean that the classwork and materials are too difficult but a high score for items 3 to 6 will mean that the learners feel very positive about the texts, topics, grouping and homework tasks.
To make matters worse, most teachers will be happy if learners score the first two items around 2 and 3 because that means that most find the difficulty level challenging but not overly so.  Lots of scores around zero or 5 will mean that the work and the materials are at the wrong level.

The moral of the story is to look at a survey questionnaire very carefully to make sure that the scales represent the same importance to us in terms of how positive or negative the outcomes are.  In the case above, this means analysing the outcomes very differently because scores of 0 and 1 are just as negative as scores of 4 and 5.
Alternatively, we can reorder the scales like this:

Please respond by putting a ✓ in the correct box for you Very easy Impossible Easy Very difficult Quite difficult About right
1 The work we do in class is ...            
2 The materials we use are ...            

and that will mean that scores of 0, 1, 2 and 3 are all negative in different ways but scores of 4 and 5 are satisfactory.  Analysis is still complicated, naturally, because responses of 1 and 3 mean things are too difficult whereas responses 0 and 2 mean that things are too easy.  Only responses 4 and 5 are positive in this case.
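One way to handle such mixed scales at the analysis stage can be sketched in Python.  The names and the scoring rule below are our own invention, not a standard procedure: agreement items keep their raw 0-5 score, while difficulty items are scored by their closeness to the ideal answer, About right (position 2 on the original Very easy ... Impossible scale).

```python
IDEAL_DIFFICULTY = 2   # "About right" on the 0-5 difficulty scale
MAX_DISTANCE = 3       # the furthest any response can sit from position 2

def score_difficulty(response):
    """0-5 response -> 0-3 score; higher means closer to 'About right'."""
    return MAX_DISTANCE - abs(response - IDEAL_DIFFICULTY)

def score_agreement(response):
    """0-5 response -> 0-5 score; higher means stronger agreement."""
    return response

print(score_difficulty(2))  # "About right" scores best
print(score_difficulty(5))  # "Impossible" scores worst
```

With a rule like this, a high total always means a positive outcome, so the two kinds of item can be analysed side by side without reordering the printed questionnaire at all.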

Properly designed survey materials as we saw above can give us good data but they do not tell us anything beyond what people think.  Any speculation about the reasons for people's reactions and the implications of them will be just that: speculation.


Qualitative data

Up to now, we have been dealing with gathering quantitative data in a survey and that is how things are normally arranged because we can analyse the outcomes mathematically as we saw above by assigning numbers to the various responses we invite people to give to statements or questions.
Occasionally, however, we want to gather qualitative data from a survey and for that we ask open questions rather than asking people to scale their answers or provide Yes-No-Maybe responses.
The obvious problem with attempting to elicit qualitative data is analysing the results.  For example, let's say we have asked, either via a questionnaire or in an open-class discussion, something like:

  1. What should be the focus of the next two weeks of the course?
  2. How do you prefer to work in class?

and we have noted the following responses:

  1. What should be the focus of the next two weeks of the course?
    1. More vocabulary
    2. More speaking and listening practice
    3. Revise the grammar
    4. Play more games
    5. Do a project
    6. No more tests, please
  2. How do you prefer to work in class?
    1. More slowly
    2. With a group of other students
    3. Alone
    4. By reading texts
    5. With the whole class
    6. In pairs

It is almost impossible to categorise the responses in any sensible manner because they are so disparate.  Some answers to the first question refer to course content, some to procedures, some to grammar and so on.  The answers to the second question vary in type, too, with some referring to grouping, some to speed and some to materials.  To make matters worse, we don't know what prefer really means because it could imply that the respondent felt it was effective or felt it was enjoyable (or both).
Deriving sensible data from this sort of list of answers is, it appears, a forlorn hope.
However, all is not lost because we do not have to stop there and we can use qualitative data to form the basis of a quantitative analysis survey by taking all the answers we have elicited and inserting them into a scalar survey, like this:

Please respond by putting a ✓ in the correct box for you Strongly agree Agree Disagree Strongly disagree
1 I would like more vocabulary work        
2 I want more speaking practice        
3 I need to revise the grammar we have learned        
4 I would like to play more language games in class        
5 I would like to do a project        
6 I do not want any more tests        
7 The lessons move too quickly for me        
8 I think working in groups is useful        
9 I prefer to work alone        
10 I think working in pairs is useful        
11 I prefer to work with the whole class together        
12 I think reading texts helps me to learn        

and from the results of this sort of survey we can take things forward by assigning numbers to the responses and seeing which are most favoured by the learners in the group.
We can also identify outliers for which only a few learners have opted but they have opted very strongly one way or another.
What we have done, in effect, is moved from a qualitative questionnaire to one on which we can conduct a quantitative analysis of the responses.  The learners have, in other words, set the agenda for us.
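To illustrate assigning numbers to the responses, here is a minimal Python sketch with invented data for three of the items above; using the standard deviation to flag items the group is split over is just one possible measure, not the only one.

```python
from statistics import mean, stdev

# Invented responses, scored Strongly agree = 3 ... Strongly disagree = 0,
# one number per learner.
responses = {
    "more vocabulary work": [3, 3, 2, 2, 3, 2],
    "more language games": [1, 1, 0, 3, 3, 0],   # the group is split
    "prefer to work alone": [0, 1, 0, 1, 0, 1],
}

# Rank the items by average support across the group.
ranked = sorted(responses, key=lambda item: mean(responses[item]), reverse=True)
for item in ranked:
    print(item, round(mean(responses[item]), 2))

# A high standard deviation flags items a few learners feel strongly
# about in opposite directions.
polarised = [item for item in responses if stdev(responses[item]) > 1.0]
print(polarised)
```

The ranking shows what most of the group wants, while the polarised list picks out the outliers the text mentions: items only a few learners have opted for, but very strongly one way or the other.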


Analytic surveys

Unlike descriptive surveys, analytic surveys are intended to discover a cause-and-effect relationship.  Our example here is varying the procedural design of lessons to see what effect, if any, that has on the engagement and motivation of our learners.
We are assuming here that previous research has shown that heightened motivation leads to better learning, of course.
That may even be true.


The base line

If we are looking for a causal relationship, we can only discover it by observing changes.  In this sense, analytic surveys are akin to experiments because we want to find out people's current views, then make changes and re-survey them to see what effect the changes had.  Because we are concerned here to measure motivation and engagement alone, that has to be the focus or we will get lost in the data.

There are two ways to set a base line in principle:

  1. We can select two closely matched groups of learners whose teaching differs only in terms of the variable (the lesson procedures) which we are interested in investigating.
    One group will be the control group and one the experimental group.  The control group will (or should) tell us what would have happened if no changes at all had been made.
  2. We can use the same group and conduct a longitudinal survey by gathering base-line data from them, then varying the teaching procedures and re-surveying them to discover what changes, if any, are evident.
    In this way the group becomes its own control group (before and after the experiment in lesson procedures).

There are problems with both ways.

Task:
What are the most obvious problems with these two ways of setting a base line?
Think of some answers before reading on.

Whichever approach we take, we have to try to minimise the effects of these issues but we need to maintain the awareness that we cannot hope to remove them altogether.

There are also advantages with both procedures:

  1. Choosing identical (or nearly so) groups means that we can tightly control what happens because we can conduct the survey quite quickly and elicit responses from both groups simultaneously.  This cuts down on the opportunity for extraneous events to affect our results.
    We have an identifiable control group which means that we can be reasonably certain that any changes between it and the experimental group are caused by the factor which we are altering (lesson procedures in this case).
  2. Using the same group for both surveys should mean that it is acting as its own control group because, if the members of the group are not changed or affected by other factors, we have arrived at the all-things-being-equal state that we seek in all data.
    It is also, in most situations, the only practical procedure.

Six steps again

When we conduct an analytic survey, we still need to take the six by now familiar steps.  The difference is that one of the steps is repeated, like this: 

  1. Ask:
        How can we enhance our learners' motivation and engagement with the lessons?
  2. Hypothesise:
    If we vary the procedural approach we take, mixing up Presentation-Practice-Production with Test-Teach-Test and Task-based learning procedures, the variation in approach will create greater interest and commitment to learning.
  3. Experiment:
        Continue with some PPP lessons but have an equal number of TTT lessons and TBL lessons.
  4. Observe:
        Survey the learners' levels of commitment before and after we make the changes.
  5. Analyse:
        Compare the results.
  6. Conclude:
    1. The variations increase motivation: hypothesis supported.
    2. The variations have no effect: hypothesis refuted.
    3. The variations have the effect of reducing commitment: hypothesis refuted.

These conclusions may well be valid but we shouldn't be too sure of ourselves because post hoc, as we saw, does not necessarily mean propter hoc.



How exactly one wishes to measure levels of commitment and engagement is variable and there are a number of ways to do it:

  1. We can use a survey questionnaire with the items designed to avoid all the disadvantages and potential problems we have outlined above.  We could, for example, use something like:
    In this, please score each item from 1 (strongly agree) to 6 (strongly disagree)
    Put a ✓ in one box for each item only
    # Statement 1 2 3 4 5 6
    1 I enjoy all my lessons            
    2 I try very hard to learn in my lessons            
    3 I like the teachers' approaches to lessons            
    4 I look forward to coming to my lessons            
    5 I feel that I am making good progress            
    6 I am motivated to work hard            
    7 I always do my homework and other tasks            
    8 I never switch off in lessons and disengage            
    and, if we are careful, we can make sure that a high score on each statement reflects a higher level of motivation (providing, of course, that all the learners have understood what the statements mean).
    This questionnaire will have to be administered to both groups twice if we are using two parallel groups and to the same group twice if we are using one group only as its own control.
  2. We can rely on the impressions of the teacher(s) involved in the survey and for them, we might have a different questionnaire something like this:
    SURVEY 1
    In this, please score each item from 1 (I strongly agree) to 6 (I strongly disagree)
    Put a ✓ in one box for each item only
    # Item 1 2 3 4 5 6
    1 This group is strongly motivated to learn English            
    2 Results of informal assessment are encouraging            
    3 Results of formal assessments are improving            
    4 I am enjoying teaching this group            
    5 Teaching this group is easy            
    SURVEY 2
    In this, please score each item from 1 (I strongly agree) to 6 (I strongly disagree)
    Put a ✓ in one box for each item only
    # Item 1 2 3 4 5 6
    1 I have noticed more commitment and motivation in the learners            
    2 Results of informal assessment are encouraging            
    3 Results of formal assessments are improving            
    4 I am enjoying teaching this group more because they are engaged            
    5 Teaching this group has become easier            
    We will, of course, have to administer this twice for a survey which uses two groups but only once for a longitudinal study.
  3. We can use the services of a disinterested but informed observer who visits the group (or each group) at regular and frequent intervals as the survey advances and get them to fill in a simple form such as:
    In this, please score each item from 1 (I strongly agree) to 6 (I strongly disagree)
    Put a ✓ in one box for each item only
    # Item 1 2 3 4 5 6
    1 This group seem very motivated to learn English            
    2 All learners are engaged in all lessons all the time            
    3 All learners are keen to contribute            
    4 There is an excellent learning atmosphere            
    5 The teacher's job is quite easy with this group            
    This is, naturally, open to some subjective opinion but, providing we use the same observer each time, that does not matter too much because we are looking for differences, not absolute numbers.
  4. We can use other data-gathering instruments such as interviews and focus groups.

We can, of course combine the approaches and gather as much data as possible in the time available to us.
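Several of the instruments above use a scale running from 1 (strongly agree) to 6 (strongly disagree), so a motivated learner agreeing with a positive statement produces a low raw number.  The re-coding that makes a high score consistently reflect high motivation, as suggested earlier, can be sketched like this (the learner data are invented):

```python
def recode(score: int) -> int:
    """Flip a 1-6 agree-disagree score so that 6 = strongest agreement."""
    return 7 - score

# Four invented learners responding to "I am motivated to work hard";
# they broadly agree, so their raw scores are low on the 1-6 scale.
raw_scores = [1, 2, 1, 3]
motivation_scores = [recode(s) for s in raw_scores]
print(motivation_scores)  # the strongest agreement now gives the highest score
```

Once every positively worded item is re-coded this way, totals and averages can be compared directly: higher always means more motivated.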



In theory, the analysis of results from a process like this is comparatively simple because we are not looking for absolute numbers, just trying to identify differences and be able to say whether they represent improvement or not.
All we need to do, therefore, is tot up the scores from all the surveys and see how they differ (if they do).
This is not the place for a course in statistical analysis so we have to rely on some less than perfect mathematical reasoning.  If, for example, we find that Survey 2 showed an increase overall of 10% over Survey 1, how significant is that?
The answer depends on how many people were involved and on the difference in real numbers.

If we had twenty learners in a group and 8 questions each scored from 1 to 6, the maximum any learner could record is 48 and the minimum 8, so across the group the maximum is 48 × 20 = 960 and the minimum 8 × 20 = 160.  The difference is 800.
If the total for Survey 1 was 400 and that for Survey 2 was 440, we have a 10% increase but, with 20 learners, an extra 40 points can be produced if a single learner records 6 instead of 1 for all 8 items.  The rest of the learners may have returned exactly the same score on all the items in both surveys and, therefore, only 1 out of 20 learners (5% of the group) has been positively affected by the changes we made.  That learner has been significantly affected but the other 19 are not affected at all.
By the same token, the figure of 10% could be reached if all 20 learners recorded just an extra 2 points each spread over the 8 items in the survey.  That would mean that some improvement is recognisable but it is slight and probably not significant because, on average, it represents only 0.25 extra points per item.
We have, therefore, to break down the results to reveal what has changed not just by how much it has changed.
For example, if that analysis reveals:

# Item Survey 1 Survey 2 Difference %
1 I enjoy all my lessons 30 45 50%
2 I try very hard to learn in my lessons 50 60 20%
3 I like the teachers' approaches to lessons 40 100 150%
then we are safe in suggesting that the changes have brought about a greater liking for the lesson approaches but that the effort learners are putting in is only slightly greater and probably insignificant (it could be achieved if two learners selected 6 instead of 1 on that question or five of the twenty selected 4 instead of 2).
We should be careful not to exaggerate the effects and with small groups in particular not to underestimate the disproportionate effect a few learners can have on the overall results.
In our example case, a quarter of the learners moving just 2 points up or down the scale can make a difference of 20% in the score for a single item.
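The per-item breakdown described above takes only a few lines of code.  The totals are the example figures from the table; the "movers" count is the smallest number of learners who could account for each change on their own, if each moved the full five points (from 1 to 6) on that item:

```python
import math

# (Survey 1 total, Survey 2 total) per statement, from the example above
items = {
    "I enjoy all my lessons":          (30, 45),
    "I try very hard to learn":        (50, 60),
    "I like the teachers' approaches": (40, 100),
}

results = {}
for statement, (before, after) in items.items():
    pct_change = (after - before) / before * 100
    # fewest learners who could cause this change alone (max shift = 5 points)
    movers = math.ceil((after - before) / 5)
    results[statement] = (pct_change, movers)
    print(f"{statement}: {pct_change:+.0f}%, as few as {movers} learner(s)")
```

This makes the point in the text concrete: the 20% rise on the second item could come from just two learners, so a seemingly healthy percentage may reflect a change in very few people.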



The first thing to be clear about is that we can only draw conclusions about the groups of learners that we investigated.  We have no evidence at all that the results would be the same for different groups in different settings.

However, if we have been careful to control the variables sufficiently, we can draw some reasonable conclusions not only about what people felt but also about why their opinions differed (because we only changed one variable in the survey: the nature of lesson procedures).
That's quite a good outcome and should encourage us to develop the process for other groups to see if we can improve their engagement and motivation, too.


A case study

Finally, we'll take a real case study and see how a teacher can research her own classroom and discover not only what is going wrong but why.
The starting point for this little piece of research is:

When I conduct a Presentation-Practice-Production lesson, I frequently do not have time to do the third stage thoroughly and neither the learners nor I can judge how successfully the targets have been learned.
When I use a Test-Teach-Test procedure, I frequently have to cut the final stage short and rarely have time to do any kind of recap and summary in which the learners can judge their progress.
In both cases, I know I have a timing problem but I am not sure what causes it.

This is clearly an important issue because the failure to include a stage in which the teacher and her learners can assess progress means that learning has not been shown so it is possible that the course is moving on and leaving gaps in the learners' knowledge which will later negatively affect the teaching-learning process.
It also has the effect of potentially reducing the learners' commitment and motivation because they are rarely able to perceive progress.

To investigate this area, the teacher has decided to follow a six-step process and combine both a mini-survey and an experiment.  The survey will be of a different kind, and quite subjective, but the experiment will target what she hypothesises may be the root of the problem.

Here's how the process panned out:


Step #1: the questions

The teacher starts from this simple question:

What is causing the timing issue?

The question is deceptively simple, as we shall shortly see.


Step #2: the hypotheses

This teacher had three:

  1. The problem stems from trying to target too much so the presentation and practice stages (or the initial testing and teaching stages) are taking up too much time and unbalancing my lessons.
  2. The problem stems from the fact that I am talking too much and giving unnecessarily long explanations as well as having to repair activities as I go along because I have not been clear the first time.  This has the effect of slowing the pace of the lesson and wasting time.
  3. The problem stems from a combination of factors 1. and 2.

Step #3: the experiment and the survey

Here's the thinking:

I know that I have three hypotheses but if I try to tackle all three at once by changing what I do in two ways, I will never know which change affected the results.  In other words, I will be no better off because I still won't know the source of the problem.  That is a multi-variation issue.
So, I will take hypotheses 1. and 2. in turn to see what effect this has on my lessons.
I am anticipating three possible outcomes:

  1. Planning for less will mean I can cut down on the Presentation and Practice stages (or the initial Testing and Teaching stages) and that will allow more time for checking learning.
    Alternatively, this will have no discernible effect at all.
  2. Planning and monitoring my own explanations, talking time and instructions to make them more precise and economical will save time in the lessons and allow more time for later stages.
    Alternatively, this will have no discernible effect at all.
  3. Both actions 1. and 2. will have a positive effect (this is multi-causality) and need to be taken forward in tandem.
    Alternatively, neither action will have a discernible effect.

This is how I am going to run the research:

  1. I will look carefully at what I have planned for two lessons and re-plan the lessons to cover 50% more time.  That means that the targets will be stretched over three lessons instead of two.
    When I do this, I will also re-plan the presentation and practice stages (or the initial Testing and Teaching phases) to take account of the reduced targets and shorten the stages where possible.
  2. Next, I will look through the procedure of my lesson and write out only what I have to say and have to explain to keep the lesson on track.  I will do this for three lessons that I have already planned and delivered in the past so I know nothing else will be changed.
    I will write my explanations and instructions down and memorise them.  During the lesson, I will focus on what I am saying and monitor my own talk cutting it down as far as possible without losing effectiveness.
  3. Finally, I will run a series of three lessons in which I both cut down on the targets and economise on my talking time.

Step #4: observation

After each stage, I intend to do the following to see what the results are:

  1. I will keep a simple diary recording how long I am able to allow for the final stage of the lesson.  In most cases, I intend this to be around 30% of the time set for each lesson and I will check how close to that figure I get.
  2. I will then fill in this form (the numbers referring to the three lessons):
    # Lesson topics Planned time for Stage 3 Actual time for Stage 3 Difference % How sure am I that the learning was successful? (on a scale of 1 to 6)
    I will do the same for both behaviours: planning and target reduction and teacher talk reduction.
  3. I only have one class at this level and can't run a control group so I will resort to asking the learners how they felt.
    For that, I will use these surveys which, to save classroom time, I will send to them all by email.  I anticipate that I'll get prompt and honest answers because they are cooperative and helpful people (mostly).
    For the planning issue, I will use:
    In the last three lessons (including today's) I have tried to reduce the amount of new language I teach to give us more time to check that you can use it.
    In this, please score each item from 1 (strongly agree) to 6 (strongly disagree)
    Put a ✓ in one box for each item only
    # Statement 1 2 3 4 5 6
    1 I am confident I can use the new language we practised            
    2 I had time in the lesson to use the language I learned in a personal way            
    3 I felt the lessons were at the right speed for me            
    4 I don't need to revise the lesson because I think I can use the language OK            
    For the talking issue, I will use:
    In the last three lessons (including today's) I have tried to improve how I explain things to you and how I give instructions.
    In this, please score each item from 1 (strongly agree) to 6 (strongly disagree)
    Put a ✓ in one box for each item only
    # Statement 1 2 3 4 5 6
    1 I understood all the explanations            
    2 I always knew what to do and how to work on tasks and activities            
    3 I felt comfortable in the lesson and was not rushed or bored            
  4. For the lessons in which I both cut down on talking time and reduce the targets, I will rely on my own survey (above) to see if I have managed to allow for adequate time in the final stages of the lessons.
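The planned-versus-actual comparison in the diary form above can be tallied in a few lines; the timings below are invented purely for illustration:

```python
# Planned vs actual minutes for Stage 3 of each lesson (invented figures)
lessons = [
    ("Lesson 1", 18, 9),
    ("Lesson 2", 18, 12),
    ("Lesson 3", 18, 17),
]

differences = []
for name, planned, actual in lessons:
    # shortfall (or surplus) as a percentage of the planned time
    pct = round((actual - planned) / planned * 100)
    differences.append(pct)
    print(f"{name}: planned {planned} min, actual {actual} min, difference {pct}%")
```

A shrinking shortfall across the three lessons would be the pattern the teacher is hoping to see after each change she makes.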

Step #5: analysis

I made the results into a graphic because that's how I understand figures best and here's what I discovered.
For my own observations and impressions:

[charts 1 and 2 omitted]

From the learners' responses:

[charts 1 and 2 omitted]


Step #6: conclusions

Although there's more to be done, I concluded the following:

  1. The greatest effect on increasing the amount of time for final stages of the lesson was clearly the action of reducing the targets to give more balance between stages.
    My own view was that learning appeared successful whenever I devoted more time to the final stage.
  2. Teacher talk changes had a much smaller effect.  The time allocated for the final stage of the lessons barely increased at all.  I conclude that my timing problems do not stem primarily from the way I explain and give instructions.
  3. When I reduced the targets of the lessons, learners overall seemed content that the challenge level was about right and it is significant that their response to Statement 2 (having time to use the language personally) scored the highest.  That's an encouraging sign.
  4. Varying my teacher talk, on the other hand, had only a small effect as far as we can judge (because we do not have figures from previous lessons).  Furthermore, a figure of less than 3 for the average of all responses to whether the learners understood the explanations clearly signals a problem.  It may be the case that I cut the explanations too severely so I will return to my old ways and then ask the learners to fill in another form to see if I can get the average response up for this crucial item.

Going forward, I will maintain the practice of making sure that I limit the aims of the lessons I deliver so that the explanation and practice phases do not take up a disproportionate amount of time.


A summary and aide memoire


If you feel up to it, you can try a short, multiple-choice revision test on this guide.
