Lies, Damned Lies, and Educational Statistics

One week till the well-deserved summer break, and I hope all fellow teachers get the opportunity to recharge. Summer does, of course, contain the two results days, which secondary teachers now look forward to with as much anxiety and trepidation as the students. Not least because their pay and career prospects now depend at least in part on the performance on one particular day of a group of largely unpredictable teenagers!

However, I’d go further: their pay and careers depend not just on the actual results of those teenagers, but on the ability of school “leaders” to understand and use data. I don’t know which is more terrifying.

This blog has two parts. The first recounts a couple of stories of how, in our targets-and-results-obsessed education culture, statistics based on small sample sizes (like your class) can be desperately misleading, and offers a way to defend yourself against overly simplistic statistical judgements about whether you’ve been a “good” or “bad” teacher. The second goes a bit further, poses the question of whether “good” and “bad” teachers actually exist in anything like the way they are portrayed by media and Ministers, and invites readers to do their own simple statistical search for the “good” teacher(s) in their school.


PART ONE

“Facts are stubborn, but statistics are more pliable.” (Mark Twain)

Let me tell you a story which still scars me a bit.

In 2008, we had a particularly horrible set of GCSE history results in my department. I knew it had been a weak cohort, and for two years (since seeing the class lists for the 2006 options) had been warning the Head that the results in 2008 would be grim. Still, after the predictably poor results, much “leadership” followed: interventions, a departmental review, governors’ monitoring. We seemed to have done worse on the A*-C ratio than most other departments. If you’re looking at a departmental A*-C rate of 52% against a school rate of 76%, then it looks bad.

I was frustrated. I’d known the results would be a drop from the previous year (81.4% A*-C, since you ask), but I wasn’t expecting the significant difference between us and other departments.

So, in search of explanations, for the first time I looked at the student-level data, rather than the teacher data or the department data. What I found was a bit of an eye-opener: of the 65 students taking history that year, history was the worst result for just 8 of them. For 23 of them, it was their best result. The remainder had scored the same grade in history as they had across a range of subjects. In other words, these specific students had on average done better in history than in their other subjects, yet as a subject, we’d come out worse than most others, and had a real dive in our year-on-year figures. Essentially, these 65 students included a very high proportion of students who got low grades across the board in all their subjects. History was optional, and two years prior to these results, we’d had a poor recruitment take-up from the two Year 9 top sets. As a result, we had a cohort which was less able than the school average. The same students were achieving similar or worse grades in other subjects, but the overall percentages for those other subjects looked better because they also included the higher-ability students who had, in that one year, not chosen history GCSE. Could it be that simple? It turns out that it could.

The following year, 2009, we had a much more normal ability distribution, and our results returned to normal. The next year, 2010, we recorded some excellent results, and in 2011, achieved more than 90% A*-C. I was receiving pats on the back, one governor congratulated me for “turning things around”, and everyone was happy. Yet the real reason, as I knew full well, was that in 2008 and 2009, the two Year 9 top sets had been taught by a very charismatic young male teacher, and we’d had a take-up of history approaching 80% of our most able 60 students (we’re a girls’ school)! Since then, dear reader, you will understand that some of my most careful timetabling involves teacher selection for Year 9 top sets….

This was my introduction to how easy it is to misrepresent statistics in the field of spurious school improvement. It was also the first time I started to wonder what difference teachers – including me – actually made. If the results for which we were slammed, or lauded, were actually almost entirely determined by the ability of the students we happened to receive in any given year, then where, exactly, did that leave these labels of “good” and “bad”?

So since 2008, I have analysed my GCSE and A-level results like this every year (and those of my departmental colleagues). I look at student-level data and try to find a pattern. Are students consistently getting worse or better grades in history than in their other subjects? If they were, I’d have something to celebrate or worry about. But they aren’t. One year, I might think we’ve done really well; the next year, there’d be no similar effect. Yet in each year, we get results at both GCSE and A-level which look very good in absolute terms compared to both the national average and the rest of the school, and usually also in value-added terms on measures such as ALPS. The department has a reputation for being very strong, and filled with “outstanding” teachers. But even in our best apparent years, where most of our students seem to be recording history as their best or joint-best result, there are always a few who are actually achieving higher grades elsewhere. Our main achievement seems to be the ability to concentrate a particularly high proportion of well-motivated, able students in our subject, which means our students get high grades. But, by and large, those particular students are also getting high grades in all their subjects. There’s no consistent pattern. Yet there must surely be a consistent pattern if all this talk of “good” and “bad” teachers is to have any meaning.
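
If you fancy running the same check on your own results, here’s a minimal sketch in Python/pandas. The file name, column layout and points conversion are all assumptions (one row per student, one column per subject, grades converted to points); any spreadsheet export will do:

```python
import pandas as pd

# Assumed layout: one row per student, one column per subject,
# grades already converted to points (e.g. A* = 8 down to G = 1).
results = pd.read_csv("gcse_results.csv", index_col="student")

subject = "History"
others = results.drop(columns=subject)

# Each student's history grade minus their average grade everywhere else.
diff = results[subject] - others.mean(axis=1)

print(f"Mean difference vs other subjects: {diff.mean():+.2f} grades")
print("Best or joint-best result in history:",
      (results[subject] >= results.max(axis=1)).sum(), "students")
print("Worst or joint-worst result in history:",
      (results[subject] <= results.min(axis=1)).sum(), "students")
```

A genuine, consistent teacher effect should push that mean difference clearly above or below zero year after year; in my data, it just hovers around zero.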

[Image: still from the film Bad Teacher]

Let’s be honest, if bad teachers were really like Cameron Diaz, half the student population wouldn’t mind so much

Could it be that students, not teachers, are responsible for students’ exam results?

This stuff matters, because the comparison of teachers so endemic in our schools now is often based on non-comparable datasets. Take another example: in one school I know of, the results last year in one subject weren’t as good as the school wanted them to be. At first glance, simply comparing students’ actual grades to targets, there was one particular class which seemed to have done worse than the others. Immediately, a lot of attention – not all very pleasant – began to focus on the teacher of that class. Had he let the side down? Was he a “weak” teacher? Certainly in terms of simple statistics, this teacher’s class seemed to be underachieving.

Yet this teacher understood maths, and did some simple analysis which showed that, yes, a significant number of students in his class had undershot their target for his subject. But those same students had also undershot their targets across ALL their subjects. In other words, their performance in his class was entirely consistent with their performance in all their other subjects with all their other teachers. It was the students who had underachieved, not the teacher. Yet because a significant group of these underachievers happened to be concentrated in his class, his overall results seemed poor. Couple that with the culture of ranking teachers on a scale of “good” to “bad”, and the national narrative which consistently asserts that the main determinant of student outcomes is the teacher’s input, and his reputation, his pay, even his career were under threat – yet the evidence was very clear that his impact on those same students was the same as the impact of other teachers in the same school who were being congratulated for their strong aggregate performance. That teacher experienced last year what I experienced as a Head of Department back in 2008 – the way in which statistics can be misused and misinterpreted to produce inaccurate and damaging conclusions.
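
To make that check concrete, here’s a hedged sketch of the same arithmetic (the file name, column names and the subject under scrutiny are all hypothetical): compute each student’s shortfall against target in every subject, then ask whether the shortfall in the suspect class is any bigger than the same students’ shortfall everywhere else.

```python
import pandas as pd

# Hypothetical layout: one row per (student, subject) pair,
# with target and actual grades already converted to points.
df = pd.read_csv("targets_vs_actuals.csv")  # columns: student, subject, target, actual
df["shortfall"] = df["actual"] - df["target"]

# Pivot to one row per student, one column per subject.
per_student = df.pivot(index="student", columns="subject", values="shortfall")

# The suspect class's shortfall, relative to the same students' shortfall elsewhere.
subject = "Maths"  # hypothetical subject under scrutiny
relative = per_student[subject] - per_student.drop(columns=subject).mean(axis=1)

# If 'relative' centres on zero, the students undershot everywhere, and the
# teacher's aggregate figures are telling you about the class, not the teacher.
print(relative.describe())
```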

And this sort of statistical ignorance is becoming institutionalised. For example, only last week I was shown some new SIMS software (a data-manipulation package which many schools use), which the Capita trainer said would allow me to “compare the effectiveness of my teachers” by sticking a numerical value on the collective difference between their students’ target grades and their actual grades. The teacher with the biggest number must be the best, right? Except they’re not teaching the same kids. What if the teacher with the biggest number just happened to have a class of students who were performing highly across all their subjects, while the poor bugger with the lowest number had a high concentration of students who were consistently low-achieving across all their subjects? That numerical value in fact has no value at all. That won’t, however, stop it being used to wrongly and unfairly label or rank teachers, if it falls into the wrong hands.

So my recommendation to all colleagues, following the results, is to do this self-same student-level analysis. I also strongly urge school “leaders” to show the same degree of interest in the statistics. It’s not hard to do, takes only a few minutes, and ultimately, if you genuinely wish to compare teacher performance, then this is the only reasonable way to do it.

At this point, if all you were interested in was a handy backup plan in case your superficial results look a bit ropey in August, then you can stop reading. Best of luck to you and I hope your results are great.

[Image: teachers cartoon]

If you’re still not bored beyond belief, and are interested in a bit of flash-mob crowd-sourced research, then read on….


PART TWO 

Do “good” and “bad” teachers actually exist?

“Hang on,” I hear you cry, “What if I do this student-level analysis of yours and then find out that actually the students are all achieving less in my class than they are in their other classes?”

Well, that’s a theoretical possibility. It’s entirely possible that the reverse of the effect described above is happening: you could have a class or department recording stellar results, for which everyone is congratulated, when a student-level analysis would show that those students are all very high achievers across the board, and are actually notching up their lowest results in your subject – a fact hidden by that concentration of high-achievers.

I would offer this reassurance, however: I don’t think you’ll find that. What I think you’ll find is that there is no consistent significant difference in same-student performance across different subjects. Mind you, I’m basing that on my own analysis of my own school’s results, and I’m fighting hard to retain an open mind that other schools may actually turn up statistical differences. I’m not going to claim that I’m right and the entire educational establishment is wrong; I’m not Gove. Nevertheless, I put it to you, dear reader, that it’s quite possible that there’s actually no range of teacher quality in your school. There’s simply a tiny minority who are catastrophically bad, and the vast majority who are what we in England might call “fine”, and whose impact on individual student results is actually pretty similar.

Are both left and right united in error?

That’s quite a bold statement, I know. After all, don’t Ministers, journalists and Wilshaw line up every week to tell us that the main determinant of student outcomes is the quality of the teacher? Only last week, Nicky Morgan launched a broadside in the Daily Mail at “coasting” schools, and Wilshaw regurgitated some of his favoured old themes about teachers “failing” white working-class boys. From the other side of the spectrum, meanwhile, Gaby Hinsliff wrote a piece about the glass floor through which wealthier parents protect their offspring’s privileged position, following Will Hutton’s Observer piece from a couple of weeks ago about the schools system. These commentators encompass both ends of the political spectrum, and they differ significantly on what schools policy should be. However, their arguments all rest – along with nearly every article, speech or public statement a politician or journalist ever utters on education – on one basic premise: student outcomes are directly related to teacher “quality”.

“Good” teachers produce “good” results, either for your favoured Disadvantaged-Group-Of-The-Moment, or for all students. This analysis then extends seamlessly from individual teachers to schools, producing “Good” schools and “Bad/Coasting/Failing [delete as appropriate]” schools. The prescriptions for how to deal with low student exam outcomes almost always involve some element of replacing “bad” teachers with “good” teachers, resulting in a “bad” school becoming a “good” school. Often, the only difference between the prescriptions of right and left is that the right tends towards the view that you simply sack the “bad” teacher and recruit a “good” one instead, whereas the left takes the more optimistic view that all teachers can become “good” through CPD, research, support and so on.

At the moment, we have an interesting inconsistency, particularly on the left, where I suppose my general philosophical position usually resides. On the one hand we decry the Government’s cack-handed attempts to elevate some teachers above others, and Ofsted’s institutionalised desire to label some schools failures while lauding others. We argue this is unfair, as such judgements rarely take into account context, and so on. Yet on the other hand, we cling ferociously to the idea that teachers DO make an important difference. Logically, however, if we accept that teachers do make a significant difference, then we have to accept that there will be a range of difference made. This means we’re buying into the very same concept of the “good” teacher and the “bad” teacher which the Government promotes, and upon which nearly all its policies are based. It doesn’t matter if you don’t think such characteristics are fixed, and you want to label that more kindly as the “good” teacher and the “will-be-better-after-CPD” teacher; you’d have to accept, if you think teachers make a difference, that there will be degrees of difference – a scale of effective to ineffective, if you like. To do otherwise would be inconsistent and illogical. Hopefully you’re still with me.

At this point, it may seem like I’m about to start wheeling out that hoary old chestnut of “a good teacher can get 1.5 years of learning growth; a bad teacher gets half a year of learning growth”. I’m certainly not. Jack Marwood, my favourite blogger, demolished this myth some time ago, and his series of blogs on this subject should be compulsory reading for everyone in education.

[Image: Policy Exchange receiving instructions from DFE for their next Free School Impact report]

What if everything you ever thought was wrong?

What if that basic premise is wrong? What if the reality is that the difference nearly all teachers make to a student’s outcomes is actually pretty similar, no matter what the reputation of the teacher? Or marginal? Or even non-existent? That would be a pretty uncomfortable thought. Because if it were true, once teachers had stopped happily shouting “You can stick your PRP up yer bum!” at the Government, we might have to have a moment of reflection and think – err, hang on…? Who wants to be the guy or gal who doesn’t make a difference? Fortunately, there’s an entire industry of commentators, gurus, authors, “leaders”, trainers, and motivational sign-writers out there to tell you that there is indeed a “good” and a “bad”. And we can all aspire to endlessly “improve”. Hooray!

Yet… yet… yet… I worry that the reality on the ground doesn’t seem to support the universal narrative. Of course, it’s very hard to make comparisons between schools because of the different contexts – different peer groups, different timetables, different curricula, different cultures and, crucially, different students. Comparing student outcomes for a maths teacher in a suburban grammar school with one in a tough secondary modern would be crass and stupid. As such, it’s generally something only the DFE and Ofsted do, so we can discount it. All the studies which purport to identify teacher impact try their best to remove external factors such as socio-economic background, school type, subject differences and so on from the equation – trying to isolate that elusive “teacher factor”.

But while I can see how such comparisons can be made on a grand scale involving tens of thousands of students, simply because weight of numbers should iron out statistical oddities, I don’t see how they can be made at the level of the individual teacher, because individual teachers will generally only be teaching a very small number of students in any given year, and are thus prone to something which no study could ever entirely eradicate in such a small sample size – what we might call “The Adolescent Effect”, or “Immature Human Agency”. Frankly, you can call it what you like, but anyone who tells you that this group of 30 teenagers will always respond in the same way to the same inputs as that group of 30 different teenagers has probably never met a teenager. Any study which tries to assess individual teacher effectiveness by comparing different teachers teaching different groups of students is thus automatically flawed beyond hope, in my opinion. You’re simply not comparing like with like – just as the examples I gave above weren’t – they’re different kids, with all the random variations which individual adolescent humans bring to the table; the annoying, loveable little buggers that they are.

However, every year, we actually have a national data set which compares individual teachers’ impacts on hundreds of thousands of individual children who have been through the same school, same culture, same curriculum and so on: we have the data from each and every secondary school. There is thus a way in which we can all use some fairly easily accessible data to establish whether – at least in each individual school – there’s hard evidence of this spectrum of teacher quality. I’d thus like to invite readers to use the statistics from their own schools to see if they can identify these “good” and “bad” teachers. Imagine if we all used our results to try and locate these creatures, but couldn’t? That would throw a cat in amongst the pigeons, given that nearly all educational policy is based on their existence. Or alternatively, you might find them, and you can play a fun game of “Are the actual top teachers the same people my SLT think they are?”. Come on, we all want to play that game.

I’ll apologise to primary colleagues, as this will inevitably be very secondary-focussed.

“Contrariwise,” continued Tweedledee, “if it was so, it might be; and if it were so, it would be; but as it isn’t, it ain’t. That’s logic.”

Let’s start with a logical statement:

If “good” or “bad” teachers make a difference to outcomes, then the same children taught by “good” and “bad” teachers will get different results.

That seems to me to be logically unarguable. This is the basis upon which nearly all current education policy is built. Either recruit or train the “best” teachers, and your students will produce the “best” results. If you allow the “worst” teachers to stay in post, or fail to improve them, then your students will get poor results. And if the same students get the same results with “good” and “bad” teachers, then in what sense are those labels valid? They wouldn’t be. So we need to find evidence which supports this teacher effect.

So, logically, what we would expect to see in any given school – if the statement above is true – is that each student will have a range of results: higher results from the “good” teachers, and lower results from the “bad” teachers. But we need to go further: if teachers make a difference, then we would expect the results of a “good” teacher’s students to be consistently better than the results of a “weak” teacher’s students – or at least to show a consistent, statistically significant difference. If “teacher quality” were the main determinant of outcomes, and “teacher quality” differed significantly, then we would expect to see a simple graph which went something like this:
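
Purely as an illustrative toy (every number below is invented), here’s what the two worlds look like in student-level data: give each student a general level, optionally add a per-teacher effect, and see whether the per-subject averages for the same students fan out or cluster:

```python
import numpy as np

rng = np.random.default_rng(42)
n_students, n_subjects = 200, 8

# Every student has a general level (in grade points)...
student_level = rng.normal(5.5, 1.2, size=(n_students, 1))
# ...each subject's teacher adds an effect (set the scale to 0.0 for a
# world with no teacher effect at all)...
teacher_effect = rng.normal(0.0, 0.5, size=(1, n_subjects))
# ...plus the random noise that individual teenagers bring to the table.
noise = rng.normal(0.0, 0.7, size=(n_students, n_subjects))

grades = student_level + teacher_effect + noise

# With a real teacher effect, the per-subject means (same students!) fan out;
# with a 0.0 scale they cluster tightly around the cohort average.
print(grades.mean(axis=0).round(2))
```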

[Figure 1: As teacher “quality” rises, so does student achievement]

The numbers bit

So can we identify strong or weak teachers through results – as everyone and their dog states we can, repeatedly? The answer is that we should be able to, if they exist, using student-level data.

Here’s an example data set from one teacher – me. I am thought of in my school as a “good” teacher (I suspect AHTs and above would also add “but a pain in the arse”, but that’s not pertinent to this exercise right now). The annual figures for both A*-C and A*-A are WAY above the national average, both for my personal classes and for my department as a whole. In theory, based on a simplistic comparison of my results with those of history teachers in other schools, I am a very “good” teacher. This would be a bogus statement, however, because I teach a school intake which is of above-average ability compared to the national cohort, although it lacks much of a very top end because of the existence of local grammar schools. So my overall results compared to national averages mean nothing at all. I’m not “good” yet.

Still, within the school I teach the same GCSE and A-level students as numerous other teachers. If our earlier statement about the effect of differing teacher quality is true, then we should logically expect me to fit somewhere on a spectrum of teachers, ranging from least to most effective.

So let’s have a look. All anonymised, of course, but these are real results from a few years ago. I’ve only included subjects which are largely written and largely externally examined, and excluded those which only a very small number have taken (e.g. Russian for our native Russian speaker). I accept that different subjects are of different difficulty, but again, if we apply logic and say that one can’t compare across subjects, then what in God’s name are we doing talking about “good” and “bad” teachers in the first place, and what the hell is PRP all about? Given that the entire range of education policies based on the notion of differential teacher quality is predicated on the idea that exam grades in different subjects are comparable, I feel we may as well use that assumption in our search for the “good” and “bad” teachers.

[Table: anonymised student-level results across subjects, year one]

So what patterns jump out? I did really well with student J – hooray for me, give me a performance-related pay rise! But don’t look at students I or R. And damn you, Chemistry teacher, for getting a better result with student Q – but eat my historical dust, mathmo, when it comes to student L!

It’s not obvious, is it? Let’s try a more recent year.

[Table: anonymised student-level results across subjects, a more recent year]

God, look at student E – I suck. Oh, hang on, don’t look at student E, look at student G – I’m still great!

There’s not really a pattern emerging here in terms of teacher impact. I could post more tables, but frankly, I think everyone would get very bored. You can take it from me, however, that all the GCSE and A-level data from my school over the last seven years show the same thing. Incidentally, so do the AS and A2 tables, although those are slightly more difficult as there are fewer subjects per student to compare, and greater diversity of choice.

In other words, there is no evidence of any teacher effect at all in my school. The students I teach get broadly the same results in my subject as they do in all their subjects. Some do better elsewhere, some do better with me, but there’s no consistent pattern.

This might just mean I’m mediocre, of course. Maybe I’m a “coasting” teacher. But then you’d have to apply that description to all my colleagues in the history department, as our individual and collective results are all indistinguishable. And I’ve also done this for other colleagues in other departments (yes, I am competitive, and yes, I was looking to see if anyone was doing better than me). However, my search turned up nothing of significance in comparable subjects. There just isn’t, in my school, any evidence of differential teacher impact. Logic thus requires me to reach the conclusion that there aren’t any significant differences in teacher quality within my school.

It’s the kids, innit?

If no pattern is discernible amongst the teachers, though, one pattern does appear, and it’s not one which particularly supports the general idea of “good” and “bad” teachers. If you look at the results above, you have 33 individual students, whose only connection is that they all happen to have been taught GCSE history by me. I could have given you many, many more, but nobody wants to look at endless lists of stats (well, maybe Jack Marwood does, but it’s something of a minority interest). Now look at their grade distributions. There is a remarkable consistency of grades across all their subjects. Most students’ results “cluster” around a certain level. You have A*/A students achieving those grades across the board, with maybe a solo outlier B. You have “mostly B” students with the odd A or C. You have the occasional C/D borderline student with results either side of that line. There aren’t so many students achieving lower averages, because they don’t tend to choose history for GCSE. Give the compulsory EBacc a couple of years, and I’m sure I’d be showing you their similar new grades in the lower numbers.
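
That clustering claim is a one-liner to check, using the same hypothetical results file as the earlier sketch:

```python
import pandas as pd

# Same assumed layout as before: one row per student,
# one column per subject, grades in points.
results = pd.read_csv("gcse_results.csv", index_col="student")

# Per-student spread across subjects: tightly clustered grades show up
# as a small standard deviation for most students.
spread = results.std(axis=1)
print(spread.describe())
```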

These grades are not all the same, of course. It’s not uncommon to see a “cluster” of grades around a certain level, with a couple of outliers. “Aha!” you might cry, “there’s the evidence of good and bad teachers – the high and low grades.” Except, of course, this is not replicated consistently. If these outlier grades were related to the teacher, then logically you’d expect to see a consistent pattern of the history grades always being the best or joint-best, or always exceeding English, or vice versa. Take the second table above, and look at students A, B and D. These three students were essentially B/C borderline students, with their results clustered around those two grades. For student A, history was her joint-worst result. For student B, it was middling, while for student D, it was her highest result. You can see a similar pattern amongst A/B borderline students in the first table – sometimes history is their best or joint-best result, sometimes not. I guess that makes me an outstanding, coasting and ineffective teacher. This pattern repeats itself across all 200 or so students who take GCSEs in our school every year, across all similar subjects.

In other words, by far the strongest determinant of outcomes in any given subject – in my school at least – is not the teacher, but the student. Students tend to have a general level which they’ll achieve across the board, give or take some oscillations, and there’s no evidence of any consistent teacher impact on those oscillations. It may be that they just put more effort into a subject they liked more, or that they just “got” that subject, or that they simply had a bad day when that subject’s exam was held. The inconsistent nature of the differences suggests pretty strongly that, logically, those differences are much more likely to result from a factor intrinsic to the student than from an extrinsic teacher factor. Exam results, it turns out, are much more likely to be related to the students than the teachers. Who knew?

[Image: teen brain]

Yes, yes, I know, not all of them are like this.

But…but…but….

There are, of course, some valid objections to this sort of comparison. Are all subjects of equal difficulty? No, of course they’re not. So just because a student gets an A in geography doesn’t mean that their teacher was better than the maths teacher who taught the same student to a B. I accept that, but when you look at data sets with this many results over such a long period, even those differences in subject difficulty can be accounted for, because you should expect to see consistent patterns: if geography is a grade easier than maths on average, then the “good teacher” will be visible because their results are more than a grade higher than the maths teacher’s grades, and so on. It’s not there, though. It’s never there. In fact, I’d go further: these subjects have very different inputs. Prior to GCSE in Y11, students will have had FAR more maths, English and science lessons than history lessons. They’ll have had far more “interventions” of the extra-lessons-in-the-holiday variety in some subjects than in others. Yet despite all these differences of inputs from school or teacher, which you’d expect to show up in the results, they don’t. That pattern of clustered results with fairly random outliers still holds: it doesn’t seem to matter what inputs are brought to bear in terms of time, attention, “extras”, priority etc.; the A students get a bunch of As, the B students get a bunch of Bs, and so on.
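
One crude way of handling the difficulty gap, sketched under the same assumptions as before (hypothetical file and columns): measure each subject’s offset against the same students’ other grades, per year, and then ask whether any subject beats its own stable offset year after year.

```python
import pandas as pd

# Hypothetical layout: one row per (year, student, subject), grade in points.
df = pd.read_csv("gcse_results_multi_year.csv")  # columns: year, student, subject, points

# Each grade relative to the same student's average in the same year.
df["relative"] = df["points"] - df.groupby(["year", "student"])["points"].transform("mean")

# Per-subject offset, per year. A persistent difficulty difference shows up
# as a stable offset; a "good teacher" would have to beat that offset every year.
offsets = df.pivot_table(index="subject", columns="year", values="relative", aggfunc="mean")
print(offsets.round(2))
```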

Another objection might be that not all students take the same GCSEs, and the kids who all have the same history teacher might have two different English teachers. Maybe the history teacher is worse than one English teacher but better than the other? This is a bit of a clutching-at-straws argument, but it’s valid. Yet we’re looking for patterns here. If we follow our logical statement above, then somewhere in every school there is a teacher who consistently has her students achieving their best grades in her subject, and there should be some poor soul whose students consistently achieve their worst grades. To accept otherwise would be to see teacher quality as an unending game of rock-paper-scissors, where English is better than maths, maths is better than history, and history is better than English. And where on earth would that leave our supposed straight-line spectrum of teacher quality, upon which our entire educational policy edifice sits? So our logic requires us, despite the objections, to be able to identify those teachers who consistently see their students achieve their best or worst grades in their subject. I can categorically state that, at least in my school, such a person does not exist.

A third point worth noting is that these grades are of course based on raw scores and moving grade boundaries, and in the more subjective, literate subjects, are at serious risk of mis-marking. This could in theory reduce the value of the data as a comparative tool. But that takes us back to the point I made earlier – if we accepted that the data is so ropey that one shouldn’t draw conclusions from it, then that rather makes the whole concept of “good” and “bad” teachers redundant anyway, because there’d be no reliable way of measuring impact at all.

Anyway, the graph of teacher input to student output in my school, when student-level data is considered, essentially seems to go like this:

[Figure 2: As teacher quality rises, so….oh….hang on….]

All that’s left is to suggest that my school is itself a statistical outlier: all our teachers are of a similar standard, which is why we are an “outstanding” school. Well done to us for having recruited/trained such an excellent collection of similarly brilliant teachers. Hmm. Firstly, I don’t think my SLT think that when they look at me. Secondly, that would be a fairly remarkable achievement, because while the teaching workforce in my school is more stable than many, there’s still been a pretty substantial turnover of staff over the seven years since I started crunching these student-level numbers. And that fundamental pattern of no discernible teacher impact has been consistent for those seven years, whether the teachers are young, old, PGCEs, SCITTs, experienced, inexperienced, male, female, blobby, Govian, whatever. The school encompasses teaching styles which vary dramatically, lesson-time allocations which differ considerably, and course changes which differ from subject to subject. Plus all the disruption and change of the last seven years which all schools have had to deal with. But not once, in all that time, with all those exams and all those teachers, has a single teacher effect ever been discernible when you search for actual consistent differential outcomes at individual student level. Hmm. Hmmmm. Hmmmmmmm.

Crowd-sourced research

I have to accept that this last point may be true. It may be the case that all teachers in my school are equally great, or mediocre, or bad. But differentially good and bad teachers must then exist in other schools if education policy is not to be a nonsense. And it’s easy to discover whether they do: any secondary teacher has access to their school’s GCSE results. So I’m inviting all readers to do this same data-crunching.

  • First, have a look at your GCSE results, and see if you have the same “clustering” of results around a certain grade range on an individual-student basis. If instead you find a wide distribution of grades at the individual-student level, with quite a lot of students achieving a broad range of results, then you have good grounds to hope you can identify some “good” and “bad” teachers.
  • Second, identify students taught by a single teacher (it’s probably tactful to start with yourself), and look at their results across the board. Are that teacher’s students consistently achieving their worst or best result in that subject? If so, then you may have identified a “good” or “bad” teacher – although you might want to check another couple of years’ worth of data just to make sure it’s not a one-off fluke.
  • Third, post your outcomes here (don’t name the “bad” teachers, for the love of God!) – there’s a sketch of the data-crunching just after this list. I’m genuinely interested. Like all of us, my own views are heavily influenced by my own experiences, and that experience leads me to conclude that there’s not a whole lot of support for the concept of “bad” or “good” teachers. But I’m open to the possibility that somewhere out there, there is statistical evidence of teachers whose students always do worse or better for them than for other teachers, in subjects of comparable nature. So let’s see it.
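
For anyone who wants to try it, here’s a minimal sketch of that hunt (the file and column names are assumptions: one row per student-subject pair, with an anonymised teacher code):

```python
import pandas as pd

# Hypothetical layout: one row per (student, subject), with the grade in
# points and an anonymised code for the teacher of that class.
df = pd.read_csv("school_results.csv")  # columns: student, subject, teacher, points

# Each grade relative to that student's own average across all their subjects.
df["relative"] = df["points"] - df.groupby("student")["points"].transform("mean")

# Average relative performance per teacher. A genuinely "good" teacher should
# sit well above zero - and do so again next year; most will hover around it.
summary = df.groupby("teacher")["relative"].agg(["mean", "count"])
print(summary.sort_values("mean"))
```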

By the way, I expect that it should be possible to identify the occasional “catastrophic” teacher. You know, the guy who teaches the wrong course for the exam, or the one so incapable of classroom control that all of his students think they’re actually studying “Desk Jumping and Spit-Ball Lobbing” in Period 4 on a Wednesday. My school just doesn’t have any of those walking disasters, although a very long time ago, when I joined the school, we did have one (loooong since departed), so I accept their existence, if not in the numbers which the media seem to believe. So in less fortunate schools, I’d expect to see the occasional teacher whose students’ results are consistently the worst results ALL of them get. What I’m interested in, though, is whether it’s possible to identify that star teacher: the woman whose students always get their best results in her class, year on year. It’s THAT teacher I can’t find in my school. See if you can find him or her in yours.

And if, of course, you don’t want to wait until this year’s results, there’s nothing stopping you digging out 2014’s statistics to play with in that period where you used to have a Y11 class….

45 thoughts on “Lies, Damned Lies, and Educational Statistics”

    • Me too. As I said, I’m trying to keep an open mind about this. Part of my sceptical attitude towards much CPD, for example, stems from what I see as a lack of hard evidence of real effectiveness: if all teachers have a similar impact on individual students (whether you think that’s much, some or none), despite very different approaches, then what point would there be in investing time, effort and possibly money changing my own practice ?

      Much of what is asserted as “best practice” seems to stem from Ofsted inspection documents, or educational Theories-Of-The-Month, however well-meaning. I’m not really interested in such stuff.

      However, if someone was able to say “This is the methodology used by the woman whose history students consistently get their best results in her subject year on year”, I’d be all ears.


  1. I am, quite frankly, excited and at the same time relieved that it is not only me who believes student achievement – and student behaviour, for that matter – is a result of the students themselves! It is high time people realised that as hard as one might try, as creative and innovative as one might be, and however many different strategies one might try, the truth of it is…..
    ‘you can lead a horse to water….but you can’t make it drink!’
    A student that is dedicated, determined and focussed will always achieve better than the ‘class clown’, the negative ‘I can’t do it’ student, or the ‘I don’t care’ student – regardless of how ‘good’ or ‘bad’ the teacher is deemed to be!


  2. This post is brilliant and important!

    Kids themselves will always have the largest impact on their own results (whether through work ethic, ‘natural ability’ or whatever). The ‘quality’ of the teacher does make a difference, though.

    The key messages of the post chime pretty well with how we (as a school) use exam results data (so I am probably biased).

    When appointed to SLT as ‘data-bod’ almost 6 years ago I immediately moved to using ‘in-school variation’ (ISV) as our key analysis. I.e. we look at how students in subject X do compared to how well those same students do in their other subjects. Looking at ‘raw results’ is little more than measuring ‘ability on intake’.

    In RAISEonline you used to get a chart with this calculated (along with the national equivalent picture), so you could say things like ‘nationally, students did 1/3 of a grade worse in GCSE French than in their other subjects but, at our school, they achieved, on average, the same grade as in their other subjects.’

    Sadly that chart is no longer published so we have to use slightly out of date data (although it has been pretty constant over the years).

    As an aside, I had a long conversation with senior staff at Ofqual about how the change to new GCSEs presented an opportunity to properly align the difficulties of subjects, so that a ‘7’ in Geography aligned better with a ‘7’ in French (say), but, whilst acknowledging the problem, they decided it was more important for ease of use to ensure direct comparability between old and new GCSEs at key grade points (i.e. bottom of grade 4 = bottom of grade C).

    We can also apply this in-school variation analysis at class-teacher level. E.g. Mr Smith’s classes this year tended, on average, to perform half a grade better in his subject than in their other subjects.

    Of course, great care must be taken here not to impart too great a certainty to one year’s set of data, as there are a myriad of factors that ‘get in the way’ of the data being clean.

    One thing to note is that if you work in a setted subject and teach a ‘bottom set’, then you have gathered students who are weak in your subject. For some of those students, that weakness will be common to their ‘ability’ across the board, so ISV seems fine as a measure; but for others, their weakness in your subject will be an anomaly, and therefore they will very likely ‘under-perform’ in your subject irrespective of your quality as a teacher. The reverse is true of ‘top sets’ too. We find that (usually) top sets have a higher ISV and lower sets a lower one.

    Crucially, this is not a function of the ability profile (if it was, then ISV would again really be a measure of intake ability and not move us forward at all) but of the impact of setting in clustering ‘anomaly’ students into one or two groups rather than spreading them across all groups randomly.

    We also need to remember that students are likely to have had a range of different teachers in KS3. I know as a HoD that, where a class had had difficulties at KS3 and not made as much progress as hoped, I would put them with my ‘best’ teacher at KS4 to try and sort it out. Obviously in this case the ISV for the teacher is artificially depressed, because the students are not randomly assigned, but again anomalies (those who at the start of KS4 are already ‘behind in maths compared to their other subjects’) are clustered disproportionately in his group.

    So what have we found over the 6 years?

    There are indeed some subjects that, year on year, whether they recruit more able, less able or typical cohorts, do seem to ‘punch above their weight’ compared to the national picture for their subject. It would seem fair to see those as stronger subjects within school.

    Equally there are some staff who do likewise and ‘buck the trend’ irrespective of their group.

    It seems fair to surmise that they are ‘our best’ teachers. We have a couple who consistently come in at the other end of the scale (more about those in a minute). The majority of teachers are somewhere in the middle – occasionally positive for ISV, occasionally negative.

    So what about those staff and subjects who seem constantly to have weak ISV (i.e. kids fairly consistently do worse in their subject/class than in others)?

    The key here is that ISV is a zero-sum statistic within a school. Because the calculations compare results in our school to those elsewhere within our school (nowhere in this analysis so far do we look at national results for ‘similar kids’), there must always be ‘winners’ and ‘losers’ in the statistics – they have to average out at zero within a school.

    That is a really key point – no matter how ‘outstanding’ a school, there must, by definition of ISV, be some subjects and staff doing “more” and “less” well than others. Equally, a school that is really struggling will have some subjects and staff doing “more” or “less” well. That is what ISV captures.
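
    (As a quick, toy illustration of that zero-sum point – simplified to every student taking the same number of subjects – the centring arithmetic guarantees it:)

    ```python
    import numpy as np

    # Toy check: centring each student's grades on their own mean forces the
    # school-wide average of the centred scores to zero, by construction.
    grades = np.random.default_rng(1).integers(3, 9, size=(100, 8)).astype(float)
    relative = grades - grades.mean(axis=1, keepdims=True)  # ISV-style centring
    print(relative.mean())  # ~0 up to floating-point error: winners offset losers
    ```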

    So, back to those staff and subjects – our overall results (progress KS2-4) as a school are strong. We are a solidly good school (towards, but not, outstanding) in that regard.

    That has implications for those staff/subjects with consistently low ISV ‘scores’. They are still perfectly good teachers. In another school they may even have a positive ISV ‘score’ – they are made to look ‘weaker’ because of the quality of other subjects/teachers in our school.

    They certainly don’t need sacking, ‘SLT monitoring’ or whatever. They need the opportunity to learn from those others in school who consistently have high ISV.

    Obviously I have oversimplified our use of ISV, and data more generally, here – believe it or not, for brevity!

    As I said at my interview 6 years ago – and repeat frequently – data never tells you anything – it merely helps point you towards some questions to ask.


    • Many thanks for this excellent and thoughtful reply. I’m heartened by the sensible use of data in your school, and hope that sort of methodology becomes more and more widespread. I also think the rational, measured way you approach the outcomes of the analysis (recognising the flaws in ANY attempt to compare data in this field) is something of a model. Perhaps I should have asked people to hunt for the “good” data-crunchers instead.


  3. Yet another insightful blog highlighting the unwillingness of ‘they who can’t be named’ to accept Einstein’s pithy quote: ‘Not everything that counts can be counted, and not everything that is counted counts.’ The notion of the good/bad teacher relies on the premise that exam/assessment results are the sole best indicator of a student’s learning. Given the failure to demonstrate that this is true, the whole foundation crumbles. If this were an ecological or ethological research project (the study of which is notoriously difficult given its multi-factorial nature), the method and models used would have been thrown out decades ago. Yet still the grim adherence to false mantras persists.
    Recalling my own schooling, and that of my three children, the memory of those teachers who were ‘good’ tends to revolve around characteristics other than ‘average A-C grades’ or ‘points scores.’ The features that mark out ‘memorable’ teachers are things like: they were approachable, they seemed to care, they created such an interest and curiosity in the subject, they were able to explain things clearly, they motivated you etc. etc. Do these qualities always translate into better exam results? The answer is ‘case not proven.’ I’d even go as far to say ‘case unprovable.’
    I once attended an after-school training session where a video (recorded from the Teachers TV channel) was viewed. We watched two lessons: one by a Maths NQT judged to be ‘satisfactory’ (in old Ofsted money) and a History (lead) teacher judged to be ‘outstanding.’ The Maths NQT delivered a lesson on trigonometry to a mixed-ability group where he employed some clearly engaging activities which piqued the curiosity and interest of his mixed bag of students. However, because the students didn’t all appear to have made ‘progress’ in the one-hour time frame, he was marked down. Remember – trigonometry is HARD. The outstanding history lesson consisted of an intense sprint through the evolution of castles, where students were reminded at ten-minute intervals where they were ‘at’ in terms of their national curriculum level and what features of their work would label them as level 5a or 6b by the end. The teacher was thus able to manufacture a self-fulfilling prophecy. After watching this ‘outstanding’ lesson for 15 minutes my brain was fried with nervous energy and my desire to ever visit a castle again completely destroyed. I vowed never to follow any advice which would lead me to teach such a sterile lesson in the future.
    I intend to show your blog to our school’s data manager – it will be interesting to see if any debate develops from it.


  4. In August 2011, the Institute for Fiscal Studies produced a report saying a school’s intake governed academic results. Pretty obvious – but this fact is still ignored. But even the IFS said the most important factor in pupil performance was teacher performance.

    If this is true then I was a bad teacher of English/English Lit, because only one of my students ever achieved a C. That’s because I usually taught Set 4. Only once did I have a tiny number of pupils who achieved a B – that was the first year of GCSE, when I taught a mixed-ability class.

    But hang on! I also taught GCSE double-option Business and Information Studies. Here I was a ‘good’ teacher because the majority achieved C and above. But that was because they were a self-selected, motivated and generally above-average ability group.

    Did I make a difference? Hell, yes! But not as measured by grades. I kept my Set 4s on task (most of the time) and squeezed coursework out of them. They were transfixed when I read aloud (I did this a lot). And my BIS pupils got a thorough grounding in business studies (including ethics, not on the syllabus) and IT (in its infancy, at the time – we used BBC B computers before graduating to more up-to-date BBC kit).

    So, good or bad? Whatever the judgement, I was burnt out after 20 years and opted for early retirement.

    http://www.localschoolsnetwork.org.uk/2011/08/school-intake-governs-academic-achievement-says-ifs-report/


    • Thanks Janet. In a sense, you’re recounting the same experience which prompted me, seven years ago, to actually think harder about this. Before then I probably did blithely accept the universal line of “good” and “bad” teachers. But if student outcomes vary, while teacher inputs remain the same, then the logical conclusion is that it’s the student inputs which determine outcomes. I and my colleagues didn’t teach differently in 2008 than in 2007 or 2009, and we did the same course. Yet our students achieved lower outcomes. Did we all suddenly become “bad” teachers for a year? Or were results determined by students? It seems pretty obvious when put like that, yet it’s a message which is lost in the endless announcements that teachers are the major, even sole, determinant of outcomes. Which isn’t true, IMO.


  5. Interesting.
    Speaking as both a primary teacher and a teacher of children with SEN, I think there *is* a question to answer about, perhaps, teachING quality.
    There is such a narrative, in SEN, of blaming children for not doing well in school (hence the explosion in SEN labelling after the 2001 Code of Practice) that, in this case, we do need to have the conversation.
    Yet another example of being caught on the horns of a very horny dilemma, I feel, and one that strikes at the heart of the blame culture within which we live. Thanks for an interesting post.


    • I’m effectively blind on primary education, Nancy – I experience it only as a parent. It does seem to me, though, that it’s a hugely different kettle of fish. There are arguments to be made about whether a child’s capacity to learn changes as he/she ages (does the “cone” of possible outcomes narrow as the child ages?), but there are also arguments to be had that a primary teacher is much more significant for any given child than a secondary teacher. Even in core secondary subjects, it would be fairly unusual for a secondary teacher to see the same child more than 4 hours a week. A primary teacher would usually see each of his/her students for 4 hours every morning, every day. I could never teach primary – I simply don’t have the patience for very small children – and so I bow to nobody in my admiration of primary teachers. But I think primary and secondary teaching are so different that attempts to measure impact/performance have to be different between the two sectors.

      As I’ve written before, two of my children are at the very bottom end of the ability range. One would almost certainly have been labelled “SEN” before it was made much harder for schools to do so. My experience has been that only once in their journey through primary school has a teacher made a comment which demonstrated a lack of understanding of the issue, and a willingness to “blame” the child. I’ve read accounts of less fortunate parents, though, so I don’t doubt that we’ve been fortunate.


  6. Totally agree with you that ultimately, it’s the student that determines the grade, not the teacher.

    However, as a parent, and (a long time ago!) a student myself, I am aware that teachers can and do affect grades – though possibly not substantially enough in most cases to bring students completely ‘out of the ballpark’ in terms of the overall range of grades. It’s hard to spot that because of the fact that, as you correctly point out, every teacher is not being compared against the same group of students, every subject is not equally hard and every student’s innate ability and effort is not a constant.

    But I think we all know a teacher we or our children have had who was pretty lousy (and I and my kids are fortunate enough to go/have gone to excellent schools, but you still get some of these there). It may not be easy to spot it by a superficial glance at the data, but that does not mean that those teachers do not exist.

    You considered some factors that prevent us drawing reliable conclusions from the data as to which teachers are ‘bad’ or ‘good’. I think there are at least two more factors you omitted that prevent us from reliably reaching those conclusions. As well as the factors you mention, another factor that complicates the equation where poor teachers (though not good teachers) are concerned is that few parents will happily let their child fail subject X, when it comes to high-stakes GCSE or A Level exams, just to prove the point that the teacher is rubbish! They will employ a tutor, or teach them themselves, if they can. So a B-C average child who scrapes a C may not owe that C to the teacher at all, but to their parents’ or tutor’s efforts. After all, in London I believe around 40% of pupils are tutored outside school at some point – and it is precisely in those subjects where pupils are likely to do badly that tutoring will be used, thus hiding the effects of bad teaching at school to at least some degree.

    In the worst-case scenario, provided the subject is optional, parents will of course let or encourage their child to drop the subject, thus hiding the negative impact of the teacher completely. That makes me think you’re most likely to see your ‘bad teacher effect’ in English/maths and maybe science, as they are compulsory, so pupils won’t be able to drop them no matter how lousy the teacher. You’re very unlikely indeed to see the effects in history because pupils who thought they were going to do really badly in it because of a lousy teacher probably won’t take the subject in the first place! There, it might be more instructive to look at proportions of the year who take the subject at all, compared to the other Ebacc subjects – if 90% of the year group is choosing RS over Geography and History, that might tell you quite clearly what pupils at that school, and their parents, think about the overall competence of the Geography and History departments, relative to the RS department!

    Secondly, you also omitted to mention another major cause of ‘statistical noise’ when it comes to determining who is a ‘good’ or ‘bad’ teacher – that (in my kids’ school, at least) subject teachers do not remain constant for any group of pupils over the 5 years prior to sitting the GCSE (I don’t know about A Level, as my kids haven’t got there yet). So final results cannot easily be attributed to the teaching of any one teacher – there is no way of knowing if an A* is largely down to an inspirational Year 7 teacher who gave the pupils good foundations in and a love for the subject, or an exam-focused Year 11 teacher who taught well for the test. And likewise if a year of low grades can be attributed to poor teaching this year, or in any or all of the previous years.

    It’s a shame, I think, that your article started from the premise of focusing only on secondary teachers, as actually the impact you want to see is much easier to spot at primary level in that pupils usually have only 1 teacher so progress will be much more easily attributable to that teacher and pupils cannot ‘drop’ any subjects (though the same caveats apply, obviously, about the impact of previous teachers, outside tutoring or parental coaching, and, above all else, the contribution of the pupil themselves to their results).

    But most significantly, your article failed to question the assumption that grades are what we should be focusing on – in that you based your conclusions entirely on an analysis of grades! As an examiner, I am very aware of what exams fail to measure and the flaws inherent in the process – Daniel Koretz makes good points about the inevitable impact on validity when one attempts to use high-stakes tests to prove not only students’ abilities in a subject but also teachers’ competence or school quality. It is in no way accidental that the results we have fail to reveal who is or isn’t a good (or bad) teacher – it is an inevitable corollary of the system we have in place.

    But more than that, as a parent, I am aware that exam grades will not reveal who is inspiring my child with a lifelong interest in the subject or covering the full range of subject matter, and who is merely teaching a class of bored pupils the narrow sample of knowledge they need to pass the exam. (A measure of good/bad teaching based purely on exam results will, of course, see the latter as the ‘better’ teacher.) As a parent, I don’t only value a full set of A*s – I also want my child to be happy at school, to enjoy the subjects and understand the wider lessons they teach, to remember and be able to apply that knowledge and those skills in real life, where appropriate, and to learn how to think critically etc. etc. So that is reason enough to find performance-related pay for teachers based on grades odious – even if it were actually possible to attribute student grades clearly to the teaching prowess of individual teachers, which you’ve convincingly pointed out is not the case.

    That does not, of course, negate the fact that some teachers DO perform ‘better’ – however you wish to interpret that – than others. You might not be able to spot it through results data, and it might be fruitless and demoralising to attempt to reward it with performance-related pay, and it might even be largely subjective – but that does not mean that ‘better’ or ‘worse’ (if not ‘good’ or ‘bad’) teaching doesn’t exist. (And this variation occurs within the teaching life of an individual teacher, not only between teachers – no teacher teaches every class equally, and an individual’s teaching prowess can improve (or decline) over time.) Just because we cannot measure teaching quality accurately does not mean we should fall into the opposite and equally dangerous trap of assuming that what cannot easily be measured does not exist. As a teacher, I would therefore disagree with you re CPD – part of what I hope I pass on to my students is that however good we are, we can still improve, and learning is a lifelong process and not something we should think of only in relation to exam grades (ours or our students’!). I’m only on your site to comment at all because I am interested in furthering my own understanding. You’re only sharing your ideas because you’re interested in furthering your own and other people’s understanding of what we do as teachers. So maybe your definition of CPD is a bit too narrow?

    In conclusion, I’d tend to agree that there are severe difficulties in measuring ‘good’ or ‘bad’ teaching in practice. (These are the same difficulties that bedevil educational research generally – it is very hard to isolate individual factors.) However, you cannot draw the conclusion from difficulties in measuring the impact of a specific teacher that all teaching has no impact at all, or equal impact. It is certainly extremely unwise and almost certainly counter-productive to attempt to reward or punish teachers financially based on grades (which only tell a small part of a student’s ‘achievement’ in a subject anyway). However, as teachers, we should not be misled by the impossibility of quantifying what we do into the dark alley of assuming that what we do actually makes no difference at all. (That gives weight to those in favour of Mitra’s Hole-in-the-Wall paradigm, which involves no pay for teachers at all!) What we do as teachers does matter, does differ and can be improved. Attempts to quantify this, though, will almost certainly meet with disaster. What we need is a government that appreciates teachers’ contribution and – as in Finland – allows teachers the time and money to be the best teachers they can be. Teachers – and pupils – need teachers to have a decent non-performance-related salary, funding and time for high-quality CPD (MAs etc), and plenty of planning time built into their schedules. Claiming that all teachers start off the same and cannot improve lets governments off the hook in providing either the salaries or the CPD. In fighting back against unjustified inferences based on student grades, such as performance-related pay, I think you risk throwing the baby out with the bathwater: the suggestion that teachers cannot affect grades is too close to an acceptance that teachers are effectively impotent, which I think is neither helpful nor true.

    • Thanks for the lengthy reply Caroline. I think you make many valid points. I agree entirely that multiple teachers/optional subjects and so on make it harder to measure any differences in teacher impact. I suppose the underlying point I’m scratching at here is that, given it’s effectively impossible to reasonably measure individual teacher impact on student outcomes, then why do we have so many policies (not just PRP) which are based on the idea that we can and do ?

      On the primary teacher issue – I agree that primary teacher impact is – logically – going to be greater than secondary teacher impact for a variety of reasons. However, the reason I don’t write about that is that I tend to shy away from opining on matters on which I’m frankly a little unsighted. I’d be interested to read something similar from a primary expert, but I’m not one.

      I think I’d offer a word of caution on the statistic for tutoring which you used. One of the problems of the education “debate” in this country is it tends to be conducted only by people who were themselves from a certain demographic in terms of their own ability, their child’s ability, and their family resources. I find the idea that 40% of ALL London school children receive some form of tutoring outside of school to be literally incredible. I saw a newspaper article last year which referenced a finding that up to 20% of parents in grammar school areas used tutors, largely to cram for the 11+. I can imagine that. But the idea that 40% of all students receive external tuition strikes me as deeply unlikely. I’d be interested in any links you have for that statistic, as it sounds to me like the sort of thing which might fall apart with a bit of probing.

      I think also your last point is interesting. You wrote : “the suggestion that teachers cannot affect grades is too close to an acceptance that teachers are effectively impotent, which I think is neither helpful nor true.”

      I agree that this is a dangerous route – it comes close to supporting, for example, the Government’s view that teacher qualifications are irrelevant. However, returning to logical statements, I don’t think that saying “teachers do not seem to have differential impacts on exam results” is the same as saying “teachers are impotent”. I think there are two points I’d make here :

      1) You yourself referred in your answer to the “other” impacts a teacher can have. “I also want my child to be happy at school, to enjoy the subjects and understand the wider lessons they teach, to remember and be able to apply that knowledge and those skills in real life, where appropriate, to learn how to think critically etc etc etc.” I think that there almost certainly is a difference between teachers with regard to these inputs. The sort of “role model” angle in its broadest sense, of the teacher communicating values, behaviours, self-confidence and self-worth, and so on, is very important. Also important in my view is the teacher’s impact on life outside the classroom – the horizon-broadening trips, the clubs which allow children to form social networks, the sports teams which foster health. Those things ARE important, and some teachers almost certainly contribute more than others. They are, of course, completely unmeasurable, which is why they are now more or less ignored in much of the education system.

      2) However, even if we look at just the exam results, I don’t think what I’m suggesting here is that teachers are impotent even there. One might liken the teacher role to a filament in a lightbulb. For the bulb to work, the filament has to be there. It’s a straightforward task, but it’s an essential one. If the filament is broken (the “catastrophic teacher” as I characterised it above), then the light won’t come on at all. In essence, it’s almost going back to that cheesy old proverb about the teacher opening the door, but the student having to walk through themselves. I guess I’m suggesting that teachers really don’t seem to be making any discernible difference in terms of how far and how fast the students walk through the door, but they still need to open it in the first place. That’s still an important role, rather than an impotent one.

      My final point would be on the issue of continual “improvement”. I understand the motives behind the concept, which are laudable. I’d just make two points again (sorry, I do love my bullet points) :

      1) The concept of continual striving for “improvement” does huge damage to the profession. Just as every teacher knows that meaning X when we say something doesn’t guarantee the listener receives the same message, so we need to understand as a profession that what we are saying about continual improvement, and what the general public are hearing, are not the same thing. You mean “I aim for improvement because I’m always looking for ways to squeeze out another inch of progress – the same way an athlete who is already top of his field strives to shave off another half a second”. The public hears “Teachers aim for improvement. Because they’re rubbish.”

      2) There comes a point when one must consider whether the marginal gains which may arise from certain activities are worth the resources being invested into achieving those gains. For example, consider how much time, effort, money and attention is being utilised in schools across the country in pursuit of alleged gains in performance which – as we’ve already established – we can’t actually measure in any case, and are based on a concept (teachers’ inputs significantly affect student outcomes) which is, itself, extremely difficult to prove. All that performance management time used by SLTs and teachers. All that CPD cash. All the management, and monitoring, and spreadsheets and targets. All to produce a gain which might be marginal, but which, in any case, we almost certainly won’t be able to discern. That’s time, effort and resource which could be going into the wider school experience you mentioned above. There’s a human cost to this pursuit of unknowable marginal gains, which may not be justified expenditure.

      • Thanks for the reply and sorry my post got so lengthy!

        A briefer response – as someone currently doing (and self-funding) an MA at least in part to make myself a ‘better’ teacher, I certainly hope I’m not wasting my money! Whether we ought to expect the taxpayer to pay for this is a moot point: I certainly think I teach better when I’m better prepared so funding teachers to have sufficient preparation time should be a minimum. I take your point that public perceptions of CPD can merge into an impression that teachers are ‘rubbish’, but I think that arguing that teachers play only a minimal role in students’ success plays equally into devaluing the role of teachers. Certainly, if I believed I could have no significant impact on my pupils’ progress I would probably have given up teaching long ago – there are plenty of jobs that are far easier and better paid!

        I simply cannot agree that all teachers are created equal, or remain equal, even if I do absolutely agree that those differences are not easily quantified. Maybe it’s easier to see at primary level, where having one teacher responsible for everything makes it so much clearer. I have 3 kids and, depending on the teacher they have had in any particular year, have seen them make great, some, or no progress. My child hasn’t fundamentally changed in personality or ability year-on-year, so I can be pretty sure that the teacher is the one making a difference here. When I see other students in the same class reporting the same thing, or more than one of my children has the same experience with a particular teacher, I can be pretty damn sure that it’s the teacher having that effect, rather than any other factors.

        Even at secondary, if you drill down to the level of the individual child, a good or bad teacher is pretty easy to spot. Ask the students! – they will almost certainly know which teachers stand out for either positive or negative reasons. (As will their parents, if this information has been shared.) I’ve been to Parents’ Evenings where it was blatantly obvious that the teacher did not have the faintest idea about my child, know their name or anything else about them. I’ve had to teach my child myself because their German teacher had omitted to mention such basics as the fact that all German nouns begin with a capital letter and then told my daughter she would fail German in her end-of-year test. (I’m not a German teacher – I still managed to get my daughter’s mark up from an incredibly low one in a mid-year test to near the top of her class at the end of the year, by going through her written work with her and explaining why she had made the errors she had, something her teacher had not bothered to do.) I’ve had students tell me how boring/ineffective/incompetent their previous teacher had been. Students – and parents – can tell you exactly which teachers were effective and which ones weren’t.

        Re the 40% figure, that came from here:

        A recent Ipsos Mori poll for the Sutton Trust found that 24% of all young people in the UK have received private tuition at some point; in London, the figure rises to 40%.

        See: http://www.theguardian.com/education/2013/oct/25/new-boom-home-tuition

  7. Great blog and interesting (though not surprising) statistics.

    Certainly in my own case, and those of all my friends, we too clustered around a level regardless of the teacher and our interest in the subject. In fact the variation in my grades was entirely down to how I felt on the day of the exam. At GCSE level I did better in Chemistry (a subject I knew nothing about) than Geography (a subject I loved) simply because on the day of the Geography exam I had a headache.

    Also thinking back to my own time in school, the teachers I regard as good didn’t always push us along with the exam syllabus but taught us about interesting things. My Spanish teacher was adamant that there was little point worrying about writing well in a foreign language and that speaking and a wide vocabulary would be far more important to us. Twenty years later I can communicate well verbally in a language I got a zero in higher level writing for. Was he an effective teacher because the lessons were fun and he taught us how to chat up Spanish girls, or a bad teacher because we didn’t get very good scores in the written part of the exam?

    Now my own children are about to start school and I worry that their enjoyment of school seems to be an increasingly small concern. If it can’t be measured then it doesn’t seem to matter.

    • I absolutely love the fact that your Spanish teacher taught you how to chat up Spanish girls. A hero!

      In my Department, I tend not to give instructions to my excellent colleagues, and not just because they’d rightly laugh at me for trying to wear a “leader” hat (perhaps I should make them stand up when I enter the room – apparently that’s a sign of respect). However, one principle which is sacrosanct in the Department is that – insofar as possible, and while ensuring you teach the necessary material effectively – you should try to make the students’ experience enjoyable.

      I suppose if one believed that students were likely to get to the same destination no matter what you did, you may as well ensure they’re happy on their journey.

  8. Great article. Two quick points:

    1. I don’t mean to point out the obvious, but is it not possible that teachers have some effect but it’s smaller than previously thought? Have you turned your statistics into numerical values and found the effect each teacher has compared to all the others?

    2. You mention in the first example that your enthusiastic male teacher in year 9 had a big effect on your intake, which then had a big effect on your results. If your other conclusions are correct, then does this not imply that a good teacher is one students enjoy studying under, since they will keep studying the subjects they enjoy?

    • Hi Nathan

      For your first point, I haven’t done that. While I can read a spreadsheet like the best of them, I’m no professional researcher. There’d be all sorts of issues with that too – how would we translate grades into numerical values across different subjects, for example ? To do a half-decent job, you’d probably have to weight subjects according to factors like % internally marked, % coursework, no. of lessons allocated to that subject (which may differ from school to school) and so on. That’s before we even get to the thorny issue of relative difficulty. I think one could come up with a model, but it would never be uncontroversial. It is interesting, though, that this sort of analysis to identify individual teacher impact hasn’t been done, given all you’d need is the detailed results breakdown from a good sample of schools, and the class lists which go along with that data. You should be able to discover how common or uncommon consistent outlier performances (either at the bottom or the top) are in our schools. If someone were willing to pay me for a year off teaching while I did this, I’d be happy to do it. Otherwise, I suspect that a back-of-an-envelope exercise like this is all we’re going to get.
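
      (For anyone who fancies a go at their own school’s data, here’s a minimal sketch of what that back-of-an-envelope exercise looks like – Python with pandas, with an entirely hypothetical file and column names, and none of the weighting problems above solved. It’s an illustration of the idea, not my actual spreadsheet.)

          # Sketch only. Assumes one row per student per subject, with columns:
          # student, subject, teacher, grade (already converted to points),
          # and that every student takes more than one subject.
          import pandas as pd

          df = pd.read_csv("results.csv")  # hypothetical file

          # Each student's mean grade across all their OTHER subjects.
          total = df.groupby("student")["grade"].transform("sum")
          count = df.groupby("student")["grade"].transform("count")
          df["other_mean"] = (total - df["grade"]) / (count - 1)

          # Residual: how each result compares with that student's other results.
          df["residual"] = df["grade"] - df["other_mean"]

          # Mean residual per teacher, with class sizes so you can judge the noise.
          by_teacher = (df.groupby(["subject", "teacher"])["residual"]
                          .agg(["mean", "count"])
                          .sort_values("mean"))
          print(by_teacher)

      If the “good” teachers were really out there, you’d expect the same names to sit at the extremes of that table year after year, rather than bouncing around the way noise does. That, in essence, is the search I suggested in Part Two.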

      For 2, I don’t draw quite the same conclusion. You see, the “bad” year we had involved a similar number of students as the previous and subsequent “good” years. It’s just that in the “bad” year’s Y9, a young(ish) new male teacher with lots of energy (then) and a sense of humour had taught four lower year 9 sets. He (alright, it was me) had the same impact on recruitment from those lower sets as the later (not me) teacher did on the top Y9 sets. So yes, I agree that a teacher can certainly affect the subjects which students carry on studying, but that had no impact on the outcomes those students achieved, sadly. Basically, my success in attracting less able Y9s to the subject in 2006 completely screwed my department’s results in 2008 ! Talk about perverse incentives….

      • I suppose I am genuinely surprised that the large tracking bodies (ALPS etc) don’t look in detail at this kind of thing because as soon as you mention it [what students are achieving elsewhere], it is clearly a big factor.

        For point 2, you make a good point in that any attempt to use enthusiastic teaching to improve departmental results quickly becomes a cynical attempt to discourage certain students (which would be awful).

        Perhaps there should be some kind of national anonymised data bank which allows people to run their own data analysis. With enough data, people would flock to run analyses, and many of the issues with weighting subjects could be tested against a large baseline of historical results.

        Thanks again for the article.

  9. Totally agree that it is the children who make the hugest difference to their own results. Last year we were in the bottom of the primary school heap, different cohort, same teacher, a few tweaks and now we are right back up where we should be. LA happy, governors happy and staff all know the main reason is the attitude of the children and parents. Next year a much weaker cohort and who knows what kind of testing and the bottom of the heap beckons again. I won’t be judging my teachers on it unless I think there is a genuine issue with their teaching. Life is hard enough in some schools already and the absurd notion that we can make year on year gains is frankly mental. Our system of failure will now start with four year olds and then a child can fail at 4, 6, 7, 11 (fail again twice in year 7) and then at 16 and 18 – no wonder we have the most stressed out children and teachers leaving in droves. If I send my special needs children off to Whitehall for the day on a minibus paid for with my pupil premium grant I may be able to get closer to the magic 85% target we are faced with next year, off to get three quotes asap!

  10. Thanks for putting this out there. I’ve suspected something like this for a while, mostly because science is often taught on the basis of groups with separate teachers for physics, chemistry and biology. (OK, it used to be taught that way, until it was necessary to cover up the lack of physics teachers.) The usual pattern in the results was that there was a big difference between sets; top set had the best VA (normally around 0), lower sets doing worse and worse (bottom sets were normally around -1 to -1.5 grades). However, the results for the same set in different subjects didn’t vary much (a spread of about 0.2 grades, which is pretty much what you get from the randomness at grade borderlines). It really didn’t seem to matter which combination of teachers a given group had, or their perceived reputations.

    There were two exceptions to this. One was a year where the timetable went wonky, and part of a class was split between two teachers (don’t ask…). They did much worse than anyone else. The other was a year when someone very senior (they had a private office and everything) took one of the groups for one subject. They got much better results with that group than their two colleagues, and I’m sure that wasn’t just because they had the time to intervene the hell out of their coursework, back in the days when that was legit. (Actually, I’m not at all sure of that; the improvement in their overall results was exactly what you’d have predicted from the coursework marks.)

    That matches the observation that the really intensive “get them up to 5 Cs” interventions need huge amounts of input to haul a couple of grades up from D to C for one student. And that finding loopholes in exam rules is so popular; bluntly, finding a course that “counts”, or doing multiple resits, or finding a way round the resit rules has a better return on effort than teaching.

    None of which means that there aren’t more or less effective teachers, or that we shouldn’t try to get better. It’s just that the difference is statistically small. So if there are differences, they’re probably down to something else. And maybe some of the huge effort and expenditure put into monitoring and developing teacher effectiveness would be better put into other things that would help pupils.

  11. Hmm. There is a lot here to read and digest, but I get the themes and, well, I can’t help but think they are driven to an extent by an emotional feeling of unfairness.

    OK, I work with schools in using performance data, so I would say that, wouldn’t I? Actually, no. I got involved particularly as I wanted to take the statistics out of evaluating data. Using the right tools, it ought to be possible, if someone understands teaching and learning, to work constructively with school performance data.

    I am not going to make this a long response. There is a lot of information in this article and I certainly don’t wish to try to counter what it says. I would only point out that data itself is not a bad thing. Well, how can it be? The world runs on it. Doctors use it to work out if I am healthy or not. But data is only really useful when it is turned into information. So we need tools to do this. This is what I have been working on for the last 12 years or so.

    The next point is that data tools which turn data into information don’t in themselves make judgments. They provide information which can be considered alongside other information, and humans then make intelligent interpretations of it. By doing this we will wrestle with the ambiguities, the variables, the confidence levels, and the incomplete picture that can all lead to incorrect conclusions being drawn, and build our knowledge of what the information is telling us.

    Along the line someone may make an observation, like “it looks like class A has done better than class B”, and as a result, more research will be done and more evidence will be collected to see if that hypothesis is correct. So, in this example, if there is nothing different between two classes other than the teacher who stands in front of them, it may be that a fair judgment would be that we have a proxy measure here of the effectiveness of teaching between the two groups. This is then worth following up.
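
    To illustrate (a toy sketch only, with invented grade points – not one of our tools): before treating “class A has done better than class B” as a finding, one simple check is to ask how easily a gap of that size could arise by chance.

        # Toy sketch: is the gap between two classes' mean grades bigger than chance?
        import random

        class_a = [7, 6, 8, 5, 7, 6, 9, 4, 6, 7]  # invented grade points
        class_b = [5, 6, 4, 7, 5, 6, 3, 5, 6, 4]

        observed = sum(class_a) / len(class_a) - sum(class_b) / len(class_b)

        # Permutation test: shuffle the pooled grades into two arbitrary "classes"
        # many times, and count how often a gap at least this large appears.
        pooled = class_a + class_b
        n_a = len(class_a)
        extreme = 0
        trials = 10_000
        for _ in range(trials):
            random.shuffle(pooled)
            diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
            if abs(diff) >= abs(observed):
                extreme += 1

        print(f"observed gap: {observed:.2f} grades, p ~ {extreme / trials:.3f}")

    A small p-value doesn’t tell us the teacher caused the gap, and a large one doesn’t prove the classes are identical – but with class-sized samples, it is a useful warning of how much of the apparent “information” may simply be noise.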

    Judgements like ‘good’ or ‘bad’ are subjective value judgements. The four Ofsted categories provide safer ground because they are defined using well-publicised criteria for which evidence can be collected. (However, I agree that individual lessons shouldn’t be graded by inspectors.)

    So, we do have to engage with data, in order to make sure that it is used sensibly. It won’t go away. We need to turn data into information and make sure that people are drawing the right conclusions from it. We need to challenge subjective value judgments, and we need to see this activity as part of a programme of research-led school improvement.

    Mike Bostock

    • Thanks for the comment. If I can clarify – I certainly don’t think data is “bad”. I use it all the time – for example, I’ve based several appeals to exam boards for re-marks on statistical oddities. I just think that too often a particular dataset, like the example I gave, is misused for purposes for which it is manifestly unsuitable. Data isn’t bad. It’s just data. It’s the interpretation which can go badly wrong.

      And this isn’t driven by a sense of unfairness at all. As I mentioned, I’ve somehow, rightly or wrongly, accrued the reputation of being a “good” teacher, and my data presents a picture which could easily be seen as very positive. So I’m not suffering personally from this (or at least haven’t since 2008!). It’s more that I can’t see a sacred cow without wanting to reach for my cattle prod, and the concept of being able to rank teachers from bad to good is a cow which is ripe for prodding.

      • I agree that the cattle prod needs to be used on sacred cows, if that is how data is being regarded.

        Education has been slower than other areas of life, like health, to make good use of diagnostic data. We are getting better at it, I think, but as your article points out, there can be an implicit tyranny if people in power (whether in a school or in Government) make sweeping statements based on data that doesn’t support such certainty.

        Yesterday’s report in The Times of Sir Michael Wilshaw saying that “a quarter of secondary school leaders were not good enough” may be an example of this. It’s a headline grabber, and we do need to ask him to explain how he has reached this conclusion. It is also indicative of a certain blustering top-down leadership style that is not uncommon, and probably a bit prone to having a negative, rather than positive, impact upon those for whom it is intended.

        As to ‘ranking teachers’, well, there is data available in every school which can be used to look at relative performance within a subject. We provide a tool called the Subject Profile which will compare the school ‘subject residual’ across classes. But there are also many variables that need to be considered before we can start comparing ‘teacher impact’. This is why we also provide a Research tool to allow a school to look at the impact of the many influences upon the performance of any one group of pupils.

        Of course, the greatest value comes from teachers using these and other tools to investigate these differences within and across subjects. It is why data tools should be placed in the hands of teachers, to support their research – a bottom-up approach – rather than used by those in power seeking simplistic explanations. We have good evidence that this is a much more effective way of using data to bring about school improvement.

  12. Fascinating post. I am looking at school data in this way, and hope you are wrong. I will post at the end of the week.

    • I look forward to it. However, a brief correction : I didn’t say that there’s no possibility that differences will emerge elsewhere. I’m open to the possibility that my school is an outlier of uniformity.

      If decent evidence turned up from other schools of a consistent, significant differential across comparable subjects, I’d be willing to accept that as a suggestion that the “good” teachers are out there. I guess the next question would be : how many are there ?

      • Why do you insist on saying that the good teachers are only out there if their goodness can be quantified? I don’t in any way share your faith in the reliability of exam results – the fact that it’s dubious whether exam results can even tell us whether we’re looking at a good pupil means it should come as no surprise at all that they shed little light on who is (or isn’t) a good teacher.

        But just because we can’t quantify something doesn’t mean it doesn’t exist. I can’t measure how delicious a meal is – it doesn’t mean all food tastes the same and there is no difference between a well-cooked meal and something cheap out of a packet.

        This seems a fundamental error in your argument.

        I don’t disagree with your underlying conclusion – that teachers should not be rewarded or punished for the results of their pupils – but this is not because all teachers are equally good; it is because it is impossible to reliably measure how good or bad a teacher is. And existing measures used to do so are laughably inadequate.

        • Hello again Caroline.

          I put “good” in quotation marks simply because the “good” which this blog was about was the “good” teacher referred to by journalists, politicians and Wilshaw. When they talk about “good” teachers they are certainly referring to them as producers of higher exam results in their students. Hence the search for the “good” teacher. If you think that I personally measure whether a teacher is “good” or “bad” based on the exam results of his/her students, I’m afraid that you’ve seriously missed the point of this blog. I couldn’t be further from that position.

          However, I also don’t buy into ranking teachers in other subjective ways either. Not least because the things you might value in one of your children’s teachers might not be the same things I value. For example, you wrote of your dismay about a languages teacher who hadn’t taught German grammar effectively so you had to catch up. Another commenter wrote about a languages teacher who didn’t bother to teach written Spanish but DID teach his class of boys how to chat up Spanish girls. In all seriousness, I would much rather my own children were taught by a teacher with that sense of fun and practicality than someone who stuck to the test requirements but left them bored. Genuinely. One person’s “good” teacher – even away from exam results – is another person’s tedious turn-off, and vice versa.

          As I wrote in a different reply, I think that there are many things which I would class as “good” in a teacher, none of which could be measured. But ultimately, they are subjective. Any experienced teacher has observed children who love Teacher A’s lessons, but hate teacher B’s lessons, who sit next to a student in the same class who thinks teacher B is marvellous, and teacher A awful.

          So I tend to be very wary of any attempt to compare teachers on any grounds other than very basic thresholds – a dangerous or chaotic classroom, say, or zero subject knowledge for the subject they’re supposed to be teaching. Because, as you note, it’s impossible to measure them objectively, so all you end up with is a subjective comparison, and your priorities are almost certainly not mine when it comes to what we’d both value in a teacher.

          When I first started teaching, I was an arrogant bugger, convinced that I’d be able to blaze a trail and leave colleagues in the dust. I’m a bit of a “performer” so I quickly acquired a very strong reputation amongst students, staff and parents alike – rightly or wrongly one I still have, I think, based on feedback. I had one colleague who is no longer with the school who had the reverse reputation: I was aware of kids who complained their lessons were boring; some parents went so far as to write in asking for their daughters to be moved out of their class; management was concerned and observed a fair bit. I was, in those early days, convinced that I was the “better” teacher. Yet their students’ exam results were the same as mine and – what’s more – I witnessed students giving lovely cards and gifts to that teacher thanking them for all the help they’d given. Some of those students I hadn’t taught, but some of them I had, and I’d never managed to build that sort of relationship with them, despite all my classroom performances. It taught me some humility and perspective. Just because that teacher was very different from me, did not make them worse. Just because some students felt they got more out of my lessons did not mean that all students did.

          Finishing with your food analogy, I agree that there is a difference between a well-cooked meal and something cheap out of a packet. But my kids would take the delivery pizza every time if they could, while I’d run a mile from anything “quality” which came out of a shell. We’re all different, I guess, and I suppose if I wanted to come up with the best argument for not trying to compare too much, I’d point at someone like Wilshaw, who is so utterly convinced that God broke the teacher mould after him, that his entire prescription for every classroom in the land is that all teachers should just do what he did, and then everyone would be great. He can’t measure that, but he just knows he was the best anyone could be. Grim.

          • Excellent response! Couldn’t agree more – I’m afraid I haven’t read your blog before and commented on this article only. You’re quite right – I did miss the inverted commas and that makes a whole heap of difference.

            I do, however, maintain some difference to your stated views in that unlike you I do think that there is better or worse teaching (if not ‘good’ and ‘bad’ – and if not easily measurable and therefore useful to the Goves and Wilshaws of this world), and that meaningful CPD can help us improve as teachers.

            I’m still very dubious about your labelling of all differences in quality as purely subjective – like your languages example. I do teach languages (though not German) and don’t know a single languages teacher anywhere who would argue that schoolchildren should not be taught basic use of the written language. That all nouns begin with a capital letter is hardly very difficult to teach (the concept can be explained in about one sentence), but it is so fundamental that it beggars belief any teacher of German could omit to mention it – it would be like an English teacher omitting to mention the need for a space between words. Or a history teacher omitting to mention the meaning of those little letters AD and BC. The idea that language teaching involves a focus on either fun and meaningful communication on the one hand, or correct use of grammar on the other hand, was exposed as a false dichotomy about 40 years ago, and I don’t know any language teachers who wouldn’t expect to teach in a way that was relevant and fun, but ALSO increased accuracy. Anything else is just lazy teaching. Obviously, there are many, many ways to achieve this, and the jury is out as to which method is better. It’s also true that not all learners want or need to know reading and writing skills, and ideally, courses would be based on needs analysis so learners are only taught reading and writing skills if this is something that they need or wish to know. But of course secondary-school-aged children are never allowed to make those decisions in any subject in the UK, so I see no particular reason why languages should be different. Certainly, my daughter’s teacher had no reason whatsoever to assume that her students did not wish to be able to read or write in basic German, and to make that judgement without checking with the students in question first is unacceptably presumptuous. My daughter did want to know that; that teacher failed her. (Her. Not Gove, or Wilshaw, or an arbitrary grade target in an exam. Her. The student.)

            Speaking as a teacher, I do know when I’ve taught a really poor lesson, and conversely I know when a lesson has really gone with a bang and the little light bulbs inside a class full of heads have all just switched on. I’m sure all teachers know this too, as regards their teaching. As a teacher, I’d like to have more classes like the latter example than the former – not because I’ll get paid more (I won’t), not because I’ll get the sack if I don’t (though this would be theoretically possible if I produced enough of the former type), nor because of any extrinsic government-inspired carrot or stick – but because as a teacher I enjoy seeing students ‘get’ things I’ve taught. Don’t all teachers? Aren’t we all aware that some of our classes achieve this better than others?

            Unlike you, I certainly wouldn’t claim to be a routinely excellent teacher, though I wish I was; the reality is that I can easily spot the difference in quality within my own classes, so it seems pretty obvious that there will be greater variation still in quality between people than there is within one individual. The fact that it’s undeniably complex to explain (in words, let alone scores) why or how a lesson ‘worked’ does not mean that no significant differences in lesson quality can ever exist.

            I think your original article makes some excellent points and I found your statistical analysis really interesting. But I think that what your arguments successfully prove is that current inferences based on exam grades are invalid. It makes that argument really, really well and convincingly. But by cluttering up your argument with some unrelated and unproven hypotheses – that the quality of all teaching is effectively the same and any differences in views are purely subjective, along with your claim that all CPD is therefore pointless – you actually undermine your main thesis. That’s a real shame, because I think you make a point that really needs to be made.

  13. Or to put it another way, your evidence proves (or at least strongly suggests) that students’ exam grades are very unlikely to be directly attributable merely to the teaching skills of an individual teacher. Therefore, rewards or penalties for TEACHERS based on these grades are invalid.

    But you cannot assume that, because teachers’ role in students’ learning is unclear and hard (impossible?) to quantify, teachers play no significant role in students’ progress at all. You cannot assume that quality of teaching does not significantly vary, either within the work of an individual teacher, or between teachers. You cannot assume that overall quality cannot improve, nor that if it did, its impact would be negligible. (Even though it might be hard to prove what impact it did have.) The fact that something is hard to prove does not mean it categorically cannot exist. It’s not possible to prove or disprove anything based on an absence of evidence!

    All you can say with some certainty is that existing measures to quantify the ‘quality’ of teachers are so flawed as to make any judgement based on those inferences utterly invalid. On this particular point, I agree with you absolutely and also hope to see further research on this too.

    • Thanks Caroline. I think we’ll agree to differ on the secondary issue of whether it’s possible to identify differential teacher quality on anything other than an entirely subjective basis (and indeed, on whether it’s worth bothering to do).

      I would pick you up on one final point though. You wrote that I consider myself an excellent teacher. I don’t. I’ve said I acquired a reputation as a strong teacher, but reputations are smoke and mirrors. I even wrote about this in the blog which prompted the second largest number of responses. Link below.

      https://disidealist.wordpress.com/2014/08/05/im-not-an-outstanding-teacher-nor-is-anyone/

      I also don’t think all CPD is rubbish. Essentially, the more practical it is, the better for me : so CPD from an exam board on a new specification is very useful, whereas anything with the word “leader” in it is going to be vacuous guff.

      Thanks for your thoughtful contributions. I’ve enjoyed chewing things over with you.

  14. Reblogged this on THIS Education Blog and commented:
    A really fascinating read…
    I would recommend that every teacher should take at least 10-15 minutes to read this. It gives a different perspective when analysing the data and outcomes of your students.

  15. Thanks for this. A really good and different perspective. Reblogged on thiseducationblog.wordpress.com – thanks again.

  16. “Facts are stubborn but statistics are more pliable”

    Mark Twain apparently.

    As a parent governor at a large primary, I found your article fascinating; it challenged some of my prejudices, coming (as many governors do) from the private sector.

    I will endeavour to better support and challenge the school I serve.

    Being outside the teaching profession, I still cling to the assumption that teachers matter more than the “raw material” of a cohort.

    Perhaps the impact of a good (no quotes) teacher is not measured by simple results, but instead by any improvement in a pupil’s ability to learn – i.e. an increase in the potential VA for a student. This might be from A to A*, or from C to B, so is not measurable from data; only peer observations, positive feedback, and CPD can help identify and improve this.

    (Note the Oxford comma.)

    Do you agree, or have I misunderstood…?

    • I think I WANT to believe that teachers matter a lot, but all I can find is evidence which suggests that teachers matter a little, if at all. One thing I am absolutely certain about though, is that any attempt to measure individual teacher quality through the exam results of a very small sample of individual students is ultimately an exercise in futility except in the case of catastrophic teachers (those who teach the wrong course so all their class fail, for example).

      All such an attempt does is wrongly ascribe responsibility for exam results to teachers. Teachers who do not sit the exams, or revise for them, or spend two years studying (or not) for them, or develop their brains, attitude to learning and literacy over the preceding 16 years, or establish the peer culture, or set the parental norms. All those factors dictate what the exam outcomes will be. None of those factors are within the gift of the teacher, and all are intrinsic to the student who sits down in the exam hall.

      In short, exam results measure student achievement, not teacher achievement, and we have allowed ourselves to drift a long way away from this reality into a fantasy world where teacher input = exam outcomes. It doesn’t, and it never has.

      Of course, I’m just an anonymous long-winded blogger. For a much shorter, very clear statement which summarises this argument, I’d direct you to this evidence submitted to Parliament:

      http://www.publications.parliament.uk/pa/cm201516/cmpublic/educationadoption/memo/educ19.htm

      The relevant section is here. I cannot emphasise this enough.

      It is clear that individual school outcomes are largely a consequence of their student intakes. Special schools have lower attainment results than the national average because they recruit only children with severe learning challenges. Grammar schools have higher than average results because they recruit only children with high levels of pre-existing attainment. These results are clearly not because one set of schools is good and the other bad. And so it is with all schools. Insofar as we can explain school outcomes (80%-90% accuracy) they are entirely predictable from the prior attainment and socio-economic characteristics of their student intake. There is no consistently better school or type of school. Any purported school ‘effects’ are tiny in comparison to the composition effect, and wildly volatile from year to year. The quickest way for a school to raise its results is to change the nature of its intake. In which case it is no longer dealing with an ‘equivalent intake’.

      This is just plain fact. Evidenced, logical, fact. It’s worth us all pausing for a bit and thinking just how far from this reality we now actually are in education policy, where teachers’ pay is related to student results over which they have almost no influence beyond the catastrophic; where teachers and schools are ranked by student results which are largely determined by their admissions in Y7, over which they have little control; and where the entire education “debate” about teachers and schools is dominated by phrases like “good”, “inadequate”, “coasting”, which are effectively based on the results of students which are simply not within the control of those teachers and schools. It’s a massive exercise in wishful thinking which is destroying careers, ruining reputations and allowing the large-scale transfer of public assets into private control all over the country. All based on a myth. Remarkable, really.
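
      (For anyone who wants to see what that 80%-90% figure means in practice, here’s a minimal sketch of the kind of model involved – hypothetical school-level data and column names; the authors of that evidence will have done something considerably more careful.)

          # Sketch: how much of the variation in school outcomes is "explained"
          # by intake characteristics alone, with no school or teacher effect?
          import pandas as pd
          from sklearn.linear_model import LinearRegression
          from sklearn.model_selection import cross_val_score

          schools = pd.read_csv("schools.csv")  # one row per school (hypothetical)
          X = schools[["mean_prior_attainment", "pct_fsm", "pct_eal"]]
          y = schools["headline_result"]

          # Cross-validated R^2: the share of between-school variance
          # predictable from intake before anything a school does.
          scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
          print(f"variance explained by intake alone: {scores.mean():.0%}")

      If intake alone gets you that close to a school’s results, then whatever is left over for “school effects” – let alone individual teacher effects – has to fit inside the remainder.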

  17. From “Something of Myself”, by Kipling

    “My main interest as I grew older was C—-, my English and Classics Master, a rowing-man of splendid physique, and a scholar who lived in secret hope of translating Theocritus worthily. He had a violent temper, no disadvantage in handling boys used to direct speech, and a gift of schoolmaster’s ‘sarcasm’ which must have been a relief to him and was certainly a treasure-trove to me. Also he was a good and House-proud House-master. Under him I came to feel that words could be used as weapons, for he did me the honour to talk at me plentifully; and our year-in year-out form-room bickerings gave us both something to play with. One learns more from a good scholar in a rage than from a score of lucid and laborious drudges; and to be made the butt of one’s companions in full form is no bad preparation for later experiences. I think this ‘approach’ is now discouraged for fear of hurting the soul of youth, but in essence it is no more than rattling tins or firing squibs under a colt’s nose. I remember nothing save satisfaction or envy when C—- broke his precious ointments over my head.

    I tried to give a pale rendering of his style when heated in a ‘Stalky’ tale, ‘Regulus,’ but I wish I could have presented him as he blazed forth once on the great Cleopatra Ode – the 27th of the Third Book. I had detonated him by a very vile construe of the first few lines. Having slain me, he charged over my corpse and delivered an interpretation of the rest of the Ode unequalled for power and insight. He held even the Army Class breathless.

    There must be still masters of the same sincerity; and gramophone records of such good men, on the brink of profanity, struggling with a Latin form, would be more helpful to education than bushels of printed books. C—- taught me to loathe Horace for two years; to forget him for twenty, and then to love him for the rest of my days and through many sleepless nights.”

  18. Thank you, I really enjoyed reading this. I teach music, and the results of our students often differ significantly from those they get in other subjects, so our situation may seem a bit different. However, as in history, the students are responsible for their results: we teach and they learn. Some have amazing talent and play music for hours a day; others can’t hold a tune or clap in time. The results reflect their effort and ability, not our teaching, which is pretty similar year on year.
    I think that teaching is like being a parent: as long as you are ‘good enough’, you shouldn’t beat yourself up – your students, by and large, will get the grade they deserve.
