Posts Tagged ‘Education’

The New School Accountability Regime in England: Fairness, Incentives and Aggregation

January 21, 2014

Author: Simon Burgess

The long-standing accountability system in England is in the throes of a major reform, with the complete strategy to be announced in the next few weeks. We already know the broad shape of this from the government’s response to the Spring 2013 consultation, and some work commissioned from us by the Department for Education, just published and discussed below. The proposals for dealing with pupil progress are an improvement on current practice and, within the parameters set by government, are satisfactory. But the way that individual pupil progress is aggregated to a school progress measure is more problematic. This blog does not often consider the merits of linear versus nonlinear aggregation, but here goes …

Schools in England now have a good deal of operational freedom in exactly how they go about educating the students in their care. The quid pro quo for this autonomy is a strong system of accountability: if there is not going to be tight control over day to day practice, then there needs to be scrutiny of the outcome. So schools are held to account in terms of the results that they help their students achieve.

The two central components are new measures of pupils’ attainment and progress. These data inform both market-based and government-initiated accountability mechanisms. The former is driven by parental choices about which schools to apply to. The latter is primarily focussed around the lower end of the performance spectrum and embodied in the floor targets – a school falling below these triggers some form of intervention.

Dave Thomson at FFT and I were asked by the Department for Education (DfE) to help develop the progress measure and the accompanying floor target, and our report is now published. Two requirements were set for the measure, along with an encouragement to explore a variety of statistical techniques to find the best fit. It turns out that the simplest method of all is barely any worse in prediction than much more sophisticated ones (see the Technical Annex) so that is what we proposed. The focus in this post is on the requirements and on the implications for setting the floor.

The primary requirement from the DfE for the national pupil progress line was that it be fair to all pupils. ‘Fair’ in the sense that each pupil, whatever their prior attainment, should have the same statistical chance of beating the average. This is obviously a good thing and indeed might sound like a fairly minimal characteristic, but it is not one satisfied by the current ‘expected progress’ measure. We achieved this: each pupil, on whatever level of prior attainment, has an expected progress measure equal to the national average. And so, by definition, each pupil has an expected deviation from that of zero.

The second requirement was that the expected progress measure be based only on prior attainment, meaning that there is no differentiation by gender for example, or special needs or poverty status. This is not because the DfE believe that these do not affect a pupil’s progress; it was explicitly agreed that they are important. Rather, the aim was for a simple and clear progress measure – starting from a KS2 mark of X you should expect to score Y GCSE points – and there is certainly a case to be made that this expectation should be the same for all, and there should not be lower expectations for certain groups of pupils. (Partly this is a failure of language: an expectation is both a mathematical construct and almost an aspiration, a belief that someone should achieve something).

So while the proposed progress measure is ‘fair’ within the terms set, and is fair in that it sets the same aspirational target for everyone, it is not fair in that some groups will typically score on average below the expected level (boys, say) and others will typically score above (girls). This is discussed in the report and is very nicely illustrated in the accompanying FFT blog. There are plausible arguments on both sides here, and the case against going back to complex and unstable regression approaches to value added is strong. This unfairness carries over to schools, because schools with very different intakes of these groups will have different chances of reaching expected progress. (Another very important point emphasised in the report and in the FFT blog is that the number of exam entries matters a great deal for pupil performance).

Now we come to the question of how to aggregate up from an individual pupil’s progress to a measure for the school. In many ways, this is the crucial part. It is on schools not individual pupils that the scrutiny and possible interventions will impact. Here the current proposal is more problematic.

Each pupil in the school has an individual expected GCSE score and so an individual difference between that and her actual achievement. This is to be expressed in grades: “Jo Smith scored 3 grades above the expected level”. These are then simply averaged to the school level: “Sunny Vale School was 1.45 grades below the expected level”. Some slightly complicated statistical analysis then characterises this school level as either a significant cause for concern or just acceptable random variation.

It is very clear and straightforward, and that indeed is its chief merit: it is easily comprehensible by parents, Headteachers and Ministers.

But it has two significant drawbacks, both of which can be remedied by aggregating the pupil scores to school level in a slightly different way. First, the variation in achieved scores around expected progress is much greater at low levels of attainment than at high attainment. This can be seen clearly in Figure 1, showing that the variance in progress by KS2 sharply and continuously declines across the range where the bulk of pupils are. Schools have pupils of differing ability, so the effect is less pronounced at school level, but still evident.

The implication of this is that if the trigger for significant deviation from expected performance is set as a fixed number of grades, then low-performing students are much more likely to cross that simply due to random variation than high-performing students are. By extension, schools with substantial intakes of low ability pupils are much more likely to fall below the floor simply through random variation than schools with high ability intakes are. So while our measure achieves what might be called ‘fairness in means’, the current proposed school measure does not achieve ‘fairness in variance’. The DfE’s plan is to deal with this by adjusting the school-level variance (based on its intake) and thereby what counts as a significant difference. This helps, but is likely to be much more opaque than the method we proposed and is likely to be lost in public pronouncements relative to the noise about the school’s simple number of grades below expected.
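The ‘fairness in variance’ point can be made concrete with a small calculation. The sketch below assumes that a pupil’s deviation from expected progress is normally distributed around zero, and it uses hypothetical values throughout: a standard deviation of 2.0 grades for a low-KS2 pupil, 1.0 grades for a high-KS2 pupil, and a trigger of 1.5 grades below expected. None of these numbers is taken from the report.

```python
from statistics import NormalDist


def chance_below_trigger(sd_grades, trigger_grades):
    """Probability that a zero-mean normal deviation falls below -trigger_grades."""
    return NormalDist(mu=0.0, sigma=sd_grades).cdf(-trigger_grades)


TRIGGER = 1.5  # hypothetical fixed trigger, in grades below expected

# Hypothetical spreads: wider for low prior attainment, narrower for high.
p_low_ks2 = chance_below_trigger(2.0, TRIGGER)
p_high_ks2 = chance_below_trigger(1.0, TRIGGER)

print(f"low-KS2 pupil:  {p_low_ks2:.3f}")
print(f"high-KS2 pupil: {p_high_ks2:.3f}")
```

Under these assumptions the low-KS2 pupil is several times more likely to cross the fixed trigger through chance alone, even though both pupils have the same expected deviation of zero.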

Fig 1: Standard deviation in Value added scores and number of pupils by mean KS2 fine grade (for details – see the report)

The second problem with the proposal is inherent in simple averaging. Suppose a school is hovering close to the floor target, with a number of pupils projected to be significantly below their progress target. The school is considering action and how to deploy extra resources to lift it above the floor. The key point is this: it needs to boost the average, so raising the performance of any pupil will help. Acting sensibly, it will target the resources to the pupils whose grades it believes are easiest to raise. These may well be the high performers or the mid performers – there is nothing to say it will be the pupils whose performance is the source of the problem, and good reason to think it will not be.

While it is quite appropriate for an overall accountability metric to focus on the average, a floor target ought to be about the low-performing students. The linear aggregation allows a school to ‘mask’ under-performing students with high performing students. Furthermore, the incentive for the school may well be to ignore the low performers and to focus on raising the grades of the others, increasing the polarisation of attainment within the school.

The proposal we made in the report solves both of these problems, the non-constant variance and the potential perverse incentive inherent in the averaging.

We combine the individual pupil progress measures to form a school measure in a slightly different way. When we compare the pupil’s achievement in grades relative to their expected performance, we normalise that difference by the degree of variation specific to that KS2 score. This automatically removes the problem of the different degree of natural variation around low and high performers. We then highlight each pupil as causing concern if s/he falls significantly below the expected level, and now each pupil truly has the same statistical chance of doing this. The school measure is now simply the fraction of its pupils ‘causing concern’. Obviously simply through random chance, some pupils in each school will be in this category, so the floor target for each school will be some positive percentage, perhaps 50%. We set out further details and evaluate various parameter values in the report.
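A minimal sketch of the two aggregation rules may help to show how they can disagree. Everything in it is illustrative: the KS2 bands, the standard deviations attached to them, the cut-off of one standard deviation for ‘causing concern’, and the two made-up schools are assumptions, not parameter values from the report.

```python
from statistics import fmean

# Hypothetical standard deviation of progress (in grades) by KS2 band.
SD_BY_KS2_BAND = {"low": 2.0, "middle": 1.5, "high": 1.0}
CONCERN_Z = -1.0  # hypothetical cut-off, in band-specific standard deviations


def simple_average(pupils):
    """Linear aggregation: mean deviation from expected progress, in grades."""
    return fmean(deviation for _, deviation in pupils)


def fraction_causing_concern(pupils):
    """Share of pupils whose normalised deviation falls below the cut-off."""
    flagged = [dev / SD_BY_KS2_BAND[band] < CONCERN_Z for band, dev in pupils]
    return sum(flagged) / len(flagged)


# Each pupil is (KS2 band, actual minus expected GCSE grades).
school_a = [("low", -0.5), ("low", -0.5), ("middle", -0.5), ("high", -0.5)]
school_b = [("low", -3.0), ("low", -3.0), ("middle", 1.0), ("high", 3.0)]

for name, school in (("A", school_a), ("B", school_b)):
    print(name, simple_average(school), fraction_causing_concern(school))
```

Both invented schools are half a grade below expected on the simple average, but only school B has pupils flagged as causing concern: its two low-KS2 pupils are masked in the average by strong results elsewhere.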

The disadvantage of this approach for the DfE is that the result cannot be expressed in terms of grades, and it is slightly more complicated (again, discussed in the report). This is true, but it cannot be beyond the wit of some eloquent graduate in government to find a way of describing this that would resonate with parents and Headteachers.

At the moment, the difference between the two approaches in terms of which schools are highlighted is small, as we make clear in the report. Small, but largely one way: fewer schools with low ability intakes are highlighted under our proposal.

But there are two reasons to be cautious. First, this may not always be true. And second, the perverse incentives – raising inequality – associated with simple averaging may turn out to be important.

The youngest children in each school cohort are over-represented in referrals to mental health services

January 13, 2014

Author: Erlend Berg

It is known that the children who are the youngest in their class tend to do worse, in several respects, than their classmates. On average, they do less well academically throughout their school careers and are less likely to attend university. They have also been found to be less confident in their academic ability and are more likely to report being bullied or unhappy at school, and they are less likely to participate in both youth and professional sports.

Given this, it is perhaps not surprising that these children are also more likely to have mental health problems: they are more likely to be diagnosed with attention disorders, learning disability and dyslexia.

Still, little is known about the consequences for health service provision, and in particular the extent to which these children are over-represented as users of specialist mental health services. In a paper forthcoming in the Journal of Clinical Psychiatry, Shipra Berg and Erlend Berg investigate whether August-born children, who are the youngest in their class in the English educational system, are over-represented in referrals to specialist Child and Adolescent Mental Health Services. The threshold for referral to these services is relatively high, since minor problems are often dealt with by school health workers or family doctors.

The research method is simple. The cut-off date for school entry in England is 1 September. So a child born in August will be among the youngest in his or her class, while a child born in September will be one of the oldest. The researchers obtained dates of birth for all children referred to mental health services in three boroughs of West London for a period of four years, and compared the frequency of birth months of the referred children to the birth-month frequencies in the population.

For example, children born in September represent 8.6% of the population but only 8.0% of referrals. Hence they are 7.3% less likely to be referred to mental health services than the average child.

For August-born children the situation is reversed. Of all children referred to mental health services, 9.4% were born in August. But only 8.6% of the population of children in the relevant age group are born in August. That means that August-born children are 9.1% more likely to be referred than the average child, and 17.8% more likely to be referred than their September-born classmates. These figures are statistically significant, meaning they are very unlikely to be caused by random fluctuations in the data.
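The comparison behind these percentages is a ratio of shares, sketched below using the rounded shares quoted above. Because the inputs are rounded to one decimal place, the results are close to, but not identical with, the 7.3%, 9.1% and 17.8% reported; the assumption here is that the published figures were computed from unrounded data.

```python
def relative_likelihood(share_of_referrals, share_of_population):
    """Excess likelihood of referral relative to the average child."""
    return share_of_referrals / share_of_population - 1.0


# Rounded shares (in per cent) as quoted in the text.
september = relative_likelihood(8.0, 8.6)
august = relative_likelihood(9.4, 8.6)

# August-born relative to September-born classmates.
august_vs_september = (1.0 + august) / (1.0 + september) - 1.0

print(f"September vs average: {september:+.1%}")
print(f"August vs average:    {august:+.1%}")
print(f"August vs September:  {august_vs_september:+.1%}")
```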

When boys and girls are examined separately, the main findings are confirmed for both sexes.

Children in the UK start school at a particularly young age, so an age difference of one year is substantial. The September-born child, who starts school around her fifth birthday, has had a 25% longer life experience than the August-born child, who starts school around his fourth birthday. Clearly, a one-year age difference shrinks as a proportion of life experience as the children grow up. One might therefore expect that the negative effect of being the youngest wears off over time. However, the authors find that the main effect holds for children of both primary-school and secondary-school age. This could mean that being the youngest is detrimental even in secondary school, or alternatively that the disadvantage of being the youngest in primary school has lasting consequences.

It is, in principle, possible to defer a school start to the term (there are three terms per year) in which the child turns five. However, this is rarely practised, because the child would still join the same class they would have been in had entry not been deferred. Deferring entry can therefore mean falling behind in academic and social development even before starting school.

It is worth pointing out that a large majority of children born in August are not referred to mental health services. Other factors, including the children’s home environment, are likely to be more important determinants of mental health than month of birth. Still, August-born children, being the youngest – physically, emotionally and intellectually – in their class, may be more vulnerable than their older peers.

 

RCT + NPD = Progress

October 31, 2013

Author: Simon Burgess

A lot of research for education policy is focussed on evaluating the effects of a policy that has already been implemented. After all, we can only really learn from policies that have actually been tried.  In the realm of UK education policy evaluation, the hot topic at the moment is the use of randomised control trials or RCTs.

In this post I want to emphasise that in schools in England we are in a very strong position to run RCTs because of the existing highly developed data infrastructure. Running RCTs on top of the census data on pupils in the National Pupil Database dramatically improves their effectiveness and their cost-effectiveness.  This is both an encouragement to researchers (and funders) to consider this approach, and also another example of how useful the NPD is.

A major part of the impetus for using RCTs has come from the Education Endowment Foundation (EEF).  This independent charity was set up with grant money from the Department for Education, and has since raised further charitable funding. Its goal is to discover and promote “what works” in raising the educational attainment of children from disadvantaged backgrounds.  I doubt that anywhere else in the world is there a body with over £100m to spend on such a specific – and important – education objective.  Another driver has been the Department for Education’s recent Analytical Review, led by Ben Goldacre, which recommended that the Department engage more thoroughly with the use of RCTs in generating evidence for education policy.

It is probably worth briefly reviewing why RCTs are thought to be so helpful in this regard: it’s about estimating a causal effect. There are of course many very interesting research questions other than those involving the evaluation of causal effects. But for policy, causality is key: “when this policy was implemented, what happened as a result?” The problem is that isolating a causal effect is very difficult using observational data, principally because the people exposed to the policy are often selected in some way and it is hard to disentangle their special characteristics from the effect of the policy. The classic example to show this is a training policy: a new training programme is offered, and people sign up; later they are shown to do better than those who did not sign up; is this because of the content of the training programme … or because those signing up evidently had more ambition, drive or determination? If the former, the policy is a good one and should be widened; if the latter, it may have no effect at all, and should be abandoned.

RCTs get around this problem by randomly allocating exposure to the policy, so there can be no such ambiguity. There are other advantages too, but the principal attraction is the identification of causal effects. Of course, as with all techniques, there are problems too.

The availability of the NPD makes RCTs much more viable and valuable. It provides a census of all pupils in all years in all state schools, including data on demographic characteristics, a complete test score history, and a complete history of schools attended and neighbourhoods lived in.

This helps in at least three important ways.

First, it improves the trade-off between cost and statistical power. Statistical power refers to the likelihood of being able to detect a causal effect if one is actually in operation. You want this to be high – undertaking a long-term and expensive trial and missing the key causal effect through bad luck is not a happy outcome. Researchers typically aim for 80% or 90% power. One of the initial decisions in an RCT is how many participants to recruit. The greater the sample size, the greater the statistical power to detect any causal effects. But of course, also, the greater is the cost, and sometimes this can be considerable. These trade-offs can be quite stark. For example, to detect an effect size of at least 0.2 standard deviations at standard significance levels with 80% power we would need a sample of 786 pupils, half of them treated. If for various reasons we were running the intervention at school level, we would need over 24,000 pupils.
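The pupil-level figure can be reproduced with the textbook sample-size formula for comparing two means. The sketch below assumes a two-sided test at the 5% level, two equally sized arms and a normal approximation; these are my reading of ‘standard significance levels’, not details given in the post. The cluster size (150 pupils per school) and intra-school correlation (0.2) used for the school-level case are hypothetical values chosen only to illustrate the order of magnitude.

```python
from math import ceil
from statistics import NormalDist


def pupils_per_arm(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for a two-arm trial."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


def design_effect(pupils_per_school, icc):
    """Inflation factor when whole schools, not pupils, are randomised."""
    return 1 + (pupils_per_school - 1) * icc


total_individual = 2 * pupils_per_arm(0.2)

# Hypothetical clustering parameters, not taken from the post.
total_clustered = ceil(total_individual * design_effect(150, 0.2))

print(total_individual)  # pupils needed when randomising pupils
print(total_clustered)   # pupils needed when randomising schools
```

With pupil-level randomisation this gives 786 pupils. With the hypothetical clustering values the requirement rises to more than 24,000, the same order as the school-level figure quoted above.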

This is where the NPD comes in. In an ideal world, we would want to be able to clone every individual in our sample and try the policy out on one and compare progress to their clone. Absent that, we can improve our estimate of the causal effect by getting as close as we can to ‘alike’ subjects. We can use the wealth of background data in the NPD to reduce observable differences and improve the precision of the estimate of the intervention effect. Exploiting the demographic and attainment data allows us to create observationally equivalent pupils, one of whom is treated and one of whom is a control. This greatly reduces sampling variation and improves the precision of our estimation. This in turn means that the trade-off between cost and power improves. Returning to the previous numerical example, if we have a good set of predictors for (say) GCSE performance, we can reduce the required dataset for a pupil-level intervention from 786 pupils to just 284. Similarly for the school-cohort level intervention, we can cut back the sample from 24,600 pupils and 160 schools to 9,200 pupils and 62 schools. The correlation that drives this reduction is the one between a ‘pre-test’ and the outcome (this might literally be a pre-test, or it can be a prediction from a set of variables).
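The gain from a good predictor can be sketched with the usual adjustment, in which the required sample shrinks in proportion to the share of outcome variance the pre-test leaves unexplained. The post does not state the correlation it used; the value of 0.8 below is an assumption, chosen because it happens to reproduce the pupil-level figure of 284. The sketch also assumes a two-sided 5% test, equal arms and a normal approximation.

```python
from math import ceil
from statistics import NormalDist


def pupils_per_arm(effect_size, pretest_corr=0.0, alpha=0.05, power=0.80):
    """Sample size per arm, shrunk by the variance a pre-test explains."""
    z = NormalDist()
    multiplier = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    unexplained = 1 - pretest_corr ** 2  # share of outcome variance left over
    return ceil(2 * multiplier * unexplained / effect_size ** 2)


without_pretest = 2 * pupils_per_arm(0.2)
with_pretest = 2 * pupils_per_arm(0.2, pretest_corr=0.8)  # 0.8 is assumed

print(without_pretest, with_pretest)
```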

Second, the NPD is very useful for dealing with attrition. Researchers running RCTs typically face a big problem of participants dropping out of the study, both from the treatment arms and from the control group. Typically this is because the trial becomes too burdensome or inconvenient, rather than because of an objection in principle, since they did sign up in the first instance. This attrition can cause severe statistical problems and can jeopardise the validity of the study.

The NPD is a census and is an administrative dataset, so data on all pupils in all (state) schools are necessarily collected. This obviously includes all national Key Stage test scores, GCSEs and A levels. If the target outcome of the RCT is improving test scores, then these data will be available to the researcher for all schools. Technically this means that an ‘intention to treat’ estimator can always be calculated. (Obviously, if the school or pupil drops out and forbids the use of linked data then this is ruled out, but as noted above, most dropout is simply due to the burden.)

Finally, the whole system of testing from which the NPD harvests data is also helpful. It embodies routine and expected tests so there is less chance of specific tests prompting specific answers. Although a lot about trials in schools cannot be ‘blind’ in the traditional way, these tests are blind. They are also nationally set and remotely marked, all of which adds to the validity of the study. These do not necessarily cover all the outcomes of interest such as wellbeing or health or very specific knowledge, but they do cover the key goal of raising attainment.

In summary, relative to other fields, education researchers have a major head start in running RCTs because of the strength, depth and coverage of the administrative data available. 

Threshold measures in school accountability: asking the right question

September 25, 2013

Author: Simon Burgess

We are in the midst of a significant upheaval in the setting and marking of exams, and the reporting of school exam results. One feature of the system has been the centre of a lot of criticism and highlighted for reform: the focus on the percentage of a school’s pupils that achieve at least 5 GCSEs at grades C to A*, including the scores on English and maths. This is typically the most-discussed metric for (secondary) school performance and is the headline figure in the school league tables.

The point is that this measure is based on a threshold, a ‘cliff-edge’. Get a grade C and you boost the school’s performance; missing a C by a lot or by a little counts the same, and just scraping a C is the same as getting an A*.
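The cliff-edge can be seen in a minimal sketch of the headline measure. The grade scale is the A* to G (and U) scale; the subject names other than English and maths, and the four pupils, are invented for illustration.

```python
GRADE_ORDER = ["U", "G", "F", "E", "D", "C", "B", "A", "A*"]
RANK = {grade: position for position, grade in enumerate(GRADE_ORDER)}


def at_least_c(grade):
    return RANK[grade] >= RANK["C"]


def meets_threshold(grades):
    """At least five GCSEs at C or better, including English and maths."""
    good_passes = sum(at_least_c(g) for g in grades.values())
    has_english = at_least_c(grades.get("English", "U"))
    has_maths = at_least_c(grades.get("maths", "U"))
    return good_passes >= 5 and has_english and has_maths


SUBJECTS = ["English", "maths", "science", "history", "art"]  # hypothetical

all_c = dict.fromkeys(SUBJECTS, "C")
all_a_star = dict.fromkeys(SUBJECTS, "A*")
near_miss = {**all_c, "maths": "D"}   # one grade short in one subject
far_miss = dict.fromkeys(SUBJECTS, "G")

pupils = [all_c, all_a_star, near_miss, far_miss]
school_measure = sum(meets_threshold(p) for p in pupils) / len(pupils)
print(school_measure)
```

The pupil with five Cs and the pupil with five A*s contribute equally, as do the near miss and the far miss, so the invented school scores 0.5 whichever way the grades inside each group move.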

This has been described as distorting schools’ behaviour, forcing schools to focus on pupils around this borderline. The argument is seen as obviously right and strong grounds for change. In this post I want to make two counter-arguments, and to suggest we are asking the wrong question.

First a basic point. One central goal of any performance measure is to induce greater or better-targeted effort. This might just mean “working harder” or it might mean a stronger focus on the goals embodied in the measure at the expense of other outcomes. The key for the principal is to design the best scheme to achieve this. A very common scheme is a threshold one – this can be found for example in the Quality and Outcomes Framework for GPs, service organisations with a target number of clients to see, and of course schools trying to help pupils to achieve at least 5 grades of C or better. An organisation working under a threshold scheme faces very different marginal incentives for effort. Considering pupils: the most intense incentives relate to pupils just below the line: this is where the greatest payoff is to schools to devote the most resources.

The first counter-argument starts by noting that the asymmetry in the incentive is not a newly-discovered flaw; it is a design feature which can be very powerful. If there is a level of achievement that is extremely important for everyone to reach, then it makes sense to set up a scheme that offers very strong incentives to do that – that focusses the incentive around that minimum level. This is precisely what a threshold scheme does.

So rather than simply pointing out that threshold designs strongly focus attention (which is what they’re supposed to do), the questions to ask are: is there some level of attainment that has that characteristic of being a minimum level of competence? And if so, what is it? If society feels that 5 grade C’s is a fair approximation to a minimum level that we want everyone to achieve, then it is absolutely right to have a ‘cliff-edge’ there because inducing schools to work very hard to get pupils past that level is exactly what society wants.  It may be that we are equally happy to see grades increase for the very brightest children, those in the middle or those at the lower end of the ability distribution. Or not: all the main political parties express a desire to raise attainment at the lower end and narrow gaps.

The argument should be about where to put the threshold, not whether to have one or not. Perhaps we are starting to see a recognition of this in the recent policy announcement that all pupils will have to continue studying until they have passed English and Maths.

The second counter-argument is based on a scepticism of what is likely to happen without the 5A*-C(EM) threshold acting as a focal point.

The core strategic decision facing a headteacher is how best to deploy her main resource: the teachers. Specifically: how best to assign teachers of varying effectiveness to different classes. It has been said that schools will be free to focus equally on all pupils.

Well, maybe. Or perhaps we should think of the pressures on the headteacher, in this instance from teachers themselves. Effective teachers are very valuable to a school and any headteacher will be keen to keep her most effective teachers happy and loyal. It seems likely (I have no evidence on this, and would be keen to hear of any) that top teachers would typically prefer to teach top sets. If so, we might see a drift of the more effective teachers towards the more able classes in a school (and therefore on average, the more affluent pupils). The imperative of the C/D threshold gave headteachers an unanswerable argument to push against this.

So threshold metrics have an important role to play in communicating to schools where society wants them to focus their effort. The current threshold, at 5 C grades, may or may not be at the right level; but discussing what the right level is, is a more useful debate to have.

Is education policy a blunt instrument when it comes to ‘social mobility’?

September 20, 2013

Author: Matt Dickson

Earlier this week, Tony Blair’s former speech-writer Philip Collins told a fringe meeting at the Liberal Democrats conference that social mobility was a ‘terrible objective’ and that in any case, education policy could do little to affect it.

“I can’t think of a single education reform in the 20th Century that had a marked impact on relative social mobility at all. Not one,” he remarked.

This conclusion depends on who you think it is important to be “relative” to. On the one hand, you might think it is important to be compared to your own parents, i.e. where you started; on the other hand, you could think it is important to be compared to your peers – where you sit in the distribution compared to your peers from different backgrounds. Let’s think about the former comparison.

The 1972 raising of the minimum school leaving age (RoSLA) has been shown in numerous pieces of research to have increased the education, employment and earnings of the young people affected – relative to their school-mates in the years before the reform. Given that we know that the people who were made to remain in school an additional year were disproportionately from lower socio-economic backgrounds, this policy improved the economic position of young people at the lower end of the economic scale.

“The dull child of the middle class parent has to come down the rung in order for me to go up, otherwise you don’t have social mobility,” is another problem that Collins identified with the objective of social mobility.

However, nobody had to come down the earnings or education ladder in order for the young people affected by RoSLA to move up – so this policy improved the chances that young people with low taste for education and/or lower ability and from poorer backgrounds would gain qualifications, employment and greater earnings. Technically this would be considered “absolute social mobility” and Collins is right in making the assertion that for there to be upward “relative social mobility” there needs to be an offsetting downward move of some.

But Collins is taking a very strong line here – arguably, what we should care about as a society is the extent to which people from all backgrounds can maximize their potential and not have their opportunities curtailed purely because of their parents’ education, income or class. This encapsulates what ‘social mobility’ is all about – and why it remains an important objective.

Moreover, it is an objective that is amenable to policy, as demonstrated by the impact of RoSLA and other education policies of the last fifty years. Another major structural reform in the post-war era was the abolition of selective education in most of the country. Despite on-going controversies, we know that the grammar school system was detrimental to the majority of children from poor households and its ending reduced a major source of income-based differentiation in life chances.

Furthermore, the expansion of higher education in recent decades has seen increases in young people from poorer backgrounds accessing university and the opportunities for progression that this affords. A study by the Institute for Fiscal Studies for the Nuffield Foundation last year showed that while higher education participation has been rising in general over time, it has been rising quickest for young people from the poorest families. This represents genuine ‘social mobility’, driven by a reduction in the educational inequality that separates children from better off and poorer backgrounds.

Taking a longer perspective, one hundred years ago most pupils left school aged 12. People “knew their place” in society and the education system offered very little means of escape for children from poorer families. While the labour market has also changed dramatically since those days, it seems very unlikely that education policy and the revolution in secondary education in particular has had no effect on the chances for poorer pupils of getting on in life.

Education spending, pupil attainment and causality

April 29, 2013

Author: Simon Burgess

In these hard times, spending government money effectively is more important than ever. Last week Fraser Nelson challenged the effectiveness of spending in schools, one of the areas relatively protected from Coalition cuts. He said: “The biggest surprise, though, was the money: no matter how you split the figures, the amount spent didn’t seem to make the blindest bit of difference”, his reading of a report by Deloitte commissioned by the Department for Education.

What is the evidence? In fact, it is surprisingly difficult to establish the impact of spending more money on student achievement. This is partly shortage of data (researchers always want more data), but there is a more fundamental reason too.

Perhaps inadvertently, Fraser Nelson illustrated the difficulty in his first paragraph. He noted the variation in per-pupil expenditure “ranging from £4,500 in Lyme Regis to £10,000 in Salford.” This is absolutely right – there are very significant variations in revenue per pupil. But the key point is that these are not random: extra resources are explicitly and systematically directed towards schools in poorer neighbourhoods. The mechanism, accreting the new schemes of each successive government, may be incomprehensibly complex, but the intent is surely right.

Getting back to our question, on the one hand we have this systematic distribution of resources towards poorer neighbourhoods. On the other hand we know that pupil attainment is typically lower in schools in such neighbourhoods; not for every pupil, not in every school, but on average. So if money has no impact on attainment, and we line up pupil attainment and school expenditure, we will tend to see a negative relationship. This derives solely from the way that money is distributed to schools. The fundamental problem is that there are two things going on with opposite effects: low attainment is associated with more money (via the schools funding system) and more money may be associated with high attainment (via the education process). With no other information, there is simply no way of disentangling these two opposing effects, and by themselves these numbers can tell us nothing about the causal impact of school expenditure on pupil attainment.

So the view that “the amount spent didn’t seem to make the blindest bit of difference” cannot be supported by this evidence.
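The point can be illustrated with a small simulation. This is only a sketch under invented assumptions: the funding rule, the size of the deprivation penalty and the "true" effect of money are all made-up numbers, and the function names are hypothetical. It shows that when money is directed towards deprived schools, the raw correlation between spending and attainment comes out negative whether money has no effect or a positive one, and only turns positive when money is handed out at random.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spending_attainment_correlation(funding_tilt, money_effect,
                                    n_schools=5000, seed=1):
    """Raw correlation between spending and attainment across simulated schools.

    funding_tilt: how strongly money is directed to deprived schools (assumed).
    money_effect: the true causal effect of money on attainment (assumed).
    """
    rng = random.Random(seed)
    spending, attainment = [], []
    for _ in range(n_schools):
        deprivation = rng.gauss(0, 1)  # higher means a poorer neighbourhood
        # Assumed funding rule: deprivation attracts extra money, plus noise.
        spend = funding_tilt * deprivation + rng.gauss(0, 0.5)
        # Assumed outcome: deprivation lowers attainment, money may raise it.
        attain = -2.0 * deprivation + money_effect * spend + rng.gauss(0, 1)
        spending.append(spend)
        attainment.append(attain)
    return pearson(spending, attainment)


# Money targeted at deprived schools: negative in both cases.
no_effect = spending_attainment_correlation(funding_tilt=1.0, money_effect=0.0)
real_effect = spending_attainment_correlation(funding_tilt=1.0, money_effect=0.5)
# Money handed out at random: the positive effect becomes visible.
random_funding = spending_attainment_correlation(funding_tilt=0.0, money_effect=0.5)
```

In the first two cases the sign of the raw relationship is the same, so the sign alone cannot tell us which world we are in.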

What of the wider research evidence, based on studies with a plausibly causal research design? One of the most prominent economists in the field of education, Rick Hanushek from Stanford, is famously sceptical of the value of greater resources for schools. There certainly are studies that show money can matter, but it is probably fair to say that the majority view among economists is that simply providing more resources for schools is not the best option.

The really interesting question is this: why doesn’t more money raise attainment? More money usually helps most things. Either there simply is nothing that schools can buy that raises attainment. This seems unlikely, and would certainly be a surprise to parents paying many thousands of pounds to send their children to private schools. Or there are features of the system which lead schools to spend extra resources on the ‘wrong’ things – things that have little impact on attainment. This might be the manner in which the money is distributed by government (typically short-term, making long-term expenditure decisions risky); or the regulations and agreements governing its spending by schools; or other factors. We have speculated a little about this here.

Coincidentally, the Department for Education has just opened a consultation on school efficiency – they await your views.

Profits in schools – response

March 26, 2013

Author: Simon Burgess

There were some interesting comments on the recent post I wrote on profit-making schools. This post offers a brief reply to those points.

First, one comment was that allowing profit-making is a solution to the lack of capital for schools:

“advocates see profit-making as a way to tap the private finance that might allow supply-side liberalisation, which would in turn allow choice to operate more effectively than it does at present. Theoretically, of course, this boost to capacity could be done with public finance. But it’s questionable whether the necessary level of spare capacity would be politically sustainable given all the other calls on public spending (especially now). So private finance is (arguably) one solution to that problem.”

It may be a solution to that problem, but it is not a necessary solution; there are other ways. The PFI programme has been funding capital spending on schools for over a decade now. Nor is it just a thing of the past: in 2011 Michael Gove announced capital expenditure through PFI of around £2bn to rebuild 300 schools. The latest estimates are that PFI expenditure on education will top £260m in 2012-13, and the whole programme has generated over £7bn for school building. The PFI obviously utilises the profit motive in the capital market to get funds into school building without needing profits in the schools themselves.

Second is the question of just how profits can be made. Given fixed revenue per student, it is not possible to directly make a greater rate of return by raising quality (the indirect route is discussed below). Profits can be made by reducing costs. This may or may not be possible without reducing quality. The possibility is that other agents can come in, re-arrange the budget, reduce costs and maintain quality by raising quality per pound spent. The comment was:

“You also argue that ‘outsiders’ are unlikely to know best how best to deploy their budgets. This seems like an odd argument. The market’s virtue is supposed to be innovation and the ability to scale good practice quickly through incentives to mimic the best. If you don’t think that works then I can’t see why you’d be interested in the practical aspects of for-profit schools, since there wouldn’t even be any benefits in principle.”

It is certainly true that schools are unlikely to be making completely optimal decisions. Our own work shows a huge degree of heterogeneity in schools’ financial decisions which is very unlikely all to be optimal. So they certainly have scope for learning. And schools may be able to learn from each other: a lot of people interpret the success of London schools as down to ‘London Challenge’ – and a lot of people attribute the success of that to collaboration, to learning from other schools. In fact, we are in the design stage of a large-scale RCT to test this out. But the key point is that with the current system for school revenue, allowing profit-making provides incentives to reduce costs but no direct incentive to raise quality. So again profits might be a way of encouraging collaboration, but there are other, easier, ways of doing the same thing.

The indirect channel for profit making to affect quality is a dynamic one. The third comment is:

“Presumably if you designed the admission and information systems properly then schools in which children make more progress will expand (either on site, or on another site) due to increased demand. This could either come from parents choosing higher performing schools or commissioners awarding contracts/charters to higher performing schools. Then, assuming the school makes a fixed profit on each student they ‘process’, they will increase their profit through increased market share. Student progress up > Market share up > Profit up.”

The key here is the word “presumably”. Yes – this is the standard dynamic market process. If this worked in schools, then this would make choice and competition more effective in raising quality. But it does not appear to work well, as we described here. Understanding the best way to reform the revenue stream for schools to encourage expansion is the important part; profit-making may eventually be part of an incentive mechanism, but is currently tangential to the main problem.

I’m an economist; I believe that incentives matter hugely. Indeed, many of the things that I write or say to the Department for Education involve the phrase “you need to make it matter more”. But that is about individual incentives: perhaps making the pay of Headteachers contingent on school outcomes, perhaps introducing some form of performance incentive for teachers. These people can raise quality, and can be rewarded for doing so.

Within the present rules of the game, schools cannot be rewarded for raising quality, because the revenue they would receive is independent of quality. Clearly, profit-making schools can introduce individual performance incentives; but so can – and have – non-profit making schools. Again profit-making is a side issue. It’s the wrong battle to fight.

Should we have profit-making schools?

March 7, 2013

Author: Simon Burgess

Profit-making schools have returned to the education debate in England. This is an emotive issue for many, but an economic analysis is useful in defining the real issues.

There are some simple claims that can be quickly dealt with.

  • “Education is far too important to be left to the mercy of profit-making companies.” Education is undoubtedly very important, for long-run growth, for social mobility, and for personal well-being. But think about possibly the most elemental of human needs, the production and distribution of food. While this is regulated by government, we are happy to leave all the decisions to profit-making companies. No-one seriously advocates the nationalisation of food.
  • “It just won’t work.” It clearly does at a general level. Countries around the world, including those with well-regarded education systems such as Sweden, allow profit-making schools.
  • “No-one should make money out of education.” Obviously they do at the moment: schools buy things from profit-making companies. This obviously has to be the case unless schools are going to start making their own books, desks and computers. So the real issues are (1) what kind of deal can schools get to minimise profiteering, and (2) what services are best bought in from outside as opposed to provided by the school itself.

The appeal of allowing profit is the view that it makes decisions matter more. It provides strong rewards to organisations to innovate, to raise quality, and to do things more efficiently. Crudely, on a per-unit basis, organisations are pushed to improve quality and therefore revenue, or to reduce cost.

What would be the effects of this in the current education system in England? To answer this, we need to think about the parameters of the market.

Start with revenue. Schools get revenue for having students on the books. It is more or less a per-capita fee, albeit with some extras and some adjustment by the LA (for community schools). But to adopt the language of business, this money is for processing the students. The revenue that the school receives for each student depends not at all on the progress that the student makes.

This is central to the issue. Given the current system, there is nothing that profit maximising schools could do to raise their revenue per student by raising quality. Immediately, a great deal of the appeal of profit-making is removed.

The only way that schools could make profits is by driving down costs. This may be fine; it may be that this doesn’t really affect the quality of education if done in a smart way. If not done in a smart way, the quality of education would suffer and attainment would fall. It is clear that even the optimistic scenario does not improve education systemically in any way, either statically or dynamically through encouraging entry. The quality of education is the same, and the overall cost to the taxpayer is necessarily the same.

The counter-argument is that the pressure for profit might reduce slack enough so that the fall in costs allowed for profits and an increase in money spent wisely so that attainment increased. For this to work, it has to be that school budgets are spent very unwisely, and that an outside organisation could identify and cut ‘bad’ spending, take some profit and raise ‘good’ spending. It is certainly true that there is a huge amount of idiosyncratic variation in school financial decisions, variation that is unlikely to all be the result of optimal decision-making. Schools either know how to better deploy their budgets but are not sufficiently incentivised to do so, or they do not know. If they do not know, it is unlikely that outsiders will do (other schools may know; but that is another issue, only very clumsily mimicked by profit-making). Profit-making may answer the first point, but so do two other approaches, discussed below.

So profit-making is pointless at best: under the current market set-up, improvements in attainment would not make money (so would not happen) with profit-making schools, and cutting costs would make money but would either reduce attainment or leave it unchanged.
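The arithmetic behind this argument is simple enough to write down. The sketch below uses invented figures (a per-capita fee of £5,000 and three made-up cost levels) purely to illustrate the logic: because revenue per student does not depend on progress, the only lever that moves profit is cost.

```python
# Hypothetical per-capita fee, paid regardless of the progress pupils make.
REVENUE_PER_STUDENT = 5000


def profit_per_student(cost_per_student):
    """Profit under the current funding rules: revenue is fixed, only cost varies."""
    return REVENUE_PER_STUDENT - cost_per_student


# Three illustrative strategies (all cost figures are invented).
break_even = profit_per_student(5000)    # spend the whole fee
raise_quality = profit_per_student(5200) # spend more to raise attainment
cut_costs = profit_per_student(4700)     # spend less, quality same or lower

print(break_even, raise_quality, cut_costs)
```

Raising quality shows up only as a cost, never as extra revenue, which is why the revenue rule rather than the ownership rule is the thing to change.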

There are alternative strategies that might get some of the benefits of the innovative drive that profits might unleash, but in a more productive way: paying for attainment and incentivising cost reductions through resources for the school.

Paying for attainment. A positive step that keeps the current non-profit system intact but provides some of the same incentive is tying schools’ revenue to their pupils’ attainment. This would be straightforward to administer in principle, but there are some critical issues to resolve before it could be implemented. Chief among these is: should we pay for the simple ‘output’ of the school (GCSE points) or for pupil progress? There are good arguments both ways, to be visited in another post.  Of course, schools do much more than produce attainment, but this is the focus of policy.

Incentivising greater efficiency in other ways.   What if any surplus generated by this process had to be re-invested in the schools? Perhaps schools need some strong incentive to reduce costs. This might well be true, but this is not profit-making: profit-making by definition means the taking of monetary reward out of the school. An alternative scheme would be essentially equivalent to a team (school)-based incentive scheme in which the incentive is not money for the teachers, but resources for the school – resources saved are kept in the school. This is again potentially a good idea, worth looking at and some way short of profit-making.

Profit making in schools would neither solve all schools’ problems nor signal the end of civilisation; the issue provokes strong feelings, but largely misses what should be the central policy concerns. Big gains in levels of attainment depend on raising average teacher effectiveness and big gains in equity depend on weakening the importance of proximity as an admissions rule and on changing the allocation of effective teachers across schools. None of these would be strongly or directly affected by for-profit schools. However, there are certainly merits in piloting policies that link schools’ revenue per student to the progress of that student, and incentivising cost reductions through keeping the surplus in the school.

Teacher performance pay without performance pay schemes

December 18, 2012

Author: Simon Burgess

Amid the macroeconomic gloom, the Autumn Statement contained a line about teachers’ pay. The School Teachers’ Review Body recommends “much greater freedom for individual schools to set pay in line with performance”. Consultations and proposals are expected in the near future.

But simply giving schools the freedom to do this may be a rather forlorn hope of anything much happening. It is not clear that there is a substantial demand from schools for performance-related pay (PRP) schemes that has only been thwarted by bureaucratic restrictions. It is hard to see high-powered, tough-minded PRP schemes being introduced by more than a handful of schools, not least because we have not seen large scale deviations from national pay bargaining in academies in England despite their new freedoms to do so.

If that path seems unpromising, there are other ways of facilitating a greater reflection of performance in pay, discussed shortly. But first – is PRP for teachers a good idea in the first place? Does it raise pupil attainment? What are the ‘side effects’?

This is a question that economists have produced a good deal of research on. And to summarise a lot of diverse work briefly, the international evidence is mixed. Those on both sides of the argument can point to high quality studies by leading researchers that find substantial positive effects, or no effects. In both cases, interestingly, there appeared to be little evidence of gaming or other unwanted effects of the incentives.

There is little evidence specifically for England. Our own research found a substantial positive effect of the introduction of a PRP scheme, but given the varied results found elsewhere it would seem unwise to place too much weight on this one study. The underlying performance pay scheme was poorly designed but nevertheless had a positive effect on the progress of pupils taught by eligible teachers relative to ineligible ones.

And design is key. There are many reasons why a simple high-powered incentive pay scheme might be detrimental to pupil progress, which we have discussed here and here. These include the fact that teachers have multiple tasks to do, the problems of measuring the outcomes of some of those tasks, the complex mixture of team and individual contributions, and the potential impacts on intrinsic motivation. The overall message is that incentives work, but schemes have to be very carefully designed to achieve what the schemes’ proponents truly intend.

There is another way to facilitate a closer link between pay and performance that does not require any school to introduce a performance pay scheme.

Published performance information in a labour market can change the way that the market rewards that performance. The critical features are first that the organisation’s own output depends in an important way on this performance characteristic of an individual; second that the organisation has some discretion in the pay offers it can make to new hires; and thirdly that the performance information is public – is available and verifiable outside the current employer. In this case, the pay structure of the market will reflect the performance rankings: high-performing individuals will be paid more.

In teaching, the first two of these three conditions are met: teacher quality matters hugely for schools, and schools have some discretion over pay. Now, suppose we had a simple, useful and universal measure of each teacher’s performance in raising the attainment of her pupils (obviously we don’t at the moment; I come back to this below), and that this was published nationally, primarily for the attention of Headteachers. The idea is that Headteachers trying to improve the attainment of their pupils would be on the look-out for high performing teachers when they had a vacancy to fill. Armed with this performance information, they might try offering a higher wage (or something else – it doesn’t have to be money) to tempt them to join their own school. Equally, the teacher’s current school may respond by raising the offer there. Over time, this process will tend to raise the pay of high-performing teachers relative to low-performing ones, whom no-one is trying to bid for.

This idea should not be a strange one. A number of professions have open measures of performance. Just today it is reported that performance measures for more surgeons will be made public in the summer of 2013; this is already true for heart surgeons.

It is well-known that PRP does two things: it motivates and it attracts. The outcome for pay described here will tend to make teaching more attractive to people who are excellent teachers and less attractive to those who aren’t.

There are a number of problems with this idea, though perhaps fewer than might appear at first glance. First, it could be argued that a performance measure derived from teaching in one school is not relevant to teaching in another school. Obviously each child and each school is unique, but it seems very unlikely that there is no commonality of context between one school and the next. Observation suggests this: teachers moving from one school to another are not counted as having zero experience, and Headteachers are often appointed from outside a school.

Second, there might be a fear that the teacher labour market would become chaotic, with everyone churning around from school to school in search of a quick gain. We have to recognise that there is substantial turnover of teachers now < http://www.bristol.ac.uk/cmpo/publications/papers/2012/wp294.pdf >. But the main point is that it does not require much actual movement to make the market work. Schools can make counter offers to try to retain their star teachers and the end result is the same – higher salaries for high-performing teachers.

Third, any measure would be noisy, partial and imperfect. Of course, all such measures are. Whether a measure is perfect is not really the question; the question is how noisy and imperfect it is, and whether it contains enough information to be useful. One advantage in this case is that the consumers of these performance indicators are the people best able to judge their usefulness and their shortcomings: Headteachers. If such metrics are not useful, Headteachers will simply ignore them; there would be no compulsion to use them. Even in labour markets with some of the most detailed and finely measured performance indicators (for example, football or baseball) there are many moves between employers that do not work out. It is worth re-emphasising that these performance measures are bound to be imperfect and incomplete, but broad measures of performance may nevertheless be very useful.

There are useful parallels to be drawn from another profession: academics. For academics, the combination of very detailed and public performance information and a context where research performance matters a great deal to universities seems to have had a substantial effect on academics’ pay.

The Research Assessment Exercise (RAE) and more recently the Research Excellence Framework (REF) have made a strong research performance very important to a university’s standing and its income. But the critical factor for academics is that an individual’s research performance is public knowledge, through very detailed recording of the impact of their research papers. Departments and universities aiming to improve their ranking seek out star researchers and attempt to bid them away with higher salaries (plus other things such as research facilities). These offers may well be matched by their current employer, but the end result is that salaries now seem to be much more closely correlated with research productivity than before the RAE/REF (I say “seem” as there does not appear to be any evidence on this, so this is casual empiricism). This is a lot of what drives many young researchers to put in very long work hours: having a paper published in a top scientific journal early in a career has a substantial lifetime payoff even in a world with few or low-powered incentive schemes. If you check out academics’ websites you will invariably see their academic output prominently displayed.

Again, an important feature is that these indices of research output are largely consumed by other academics who are aware of their strengths and weaknesses. So although they are far from perfect, they are used by precisely the people best placed to calibrate their usefulness appropriately.

If we are to go down a path of tying teacher pay more closely to performance, and yet respect the rights of increasingly autonomous schools to determine their own pay systems, then this might be an option to consider.  The challenge is to devise a measure that is simple, useful and universal. It would measure the progress made by the pupils that teachers taught, it would have to deal with normal variations in performance by averaging over a number of classes and a few years, and be on a common metric.  This is not straightforward, but if it gave rise to a robust broad measure of performance it could form a part of performance pay for teachers, and performance management more broadly. It could also have substantial effects on the pay of high-performing teachers.
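To make the averaging idea concrete, here is a minimal sketch of how such a measure might be put together. Everything specific in it is an assumption made for illustration: the record layout, the function name `teacher_progress_measure`, the idea that class-level progress has already been put on a common (say, standardised) scale, and the minimum of four classes before a figure is reported.

```python
from statistics import mean


def teacher_progress_measure(records, min_classes=4):
    """Average class-level pupil progress per teacher over classes and years.

    records: iterable of (teacher, year, class_id, class_progress), where
    class_progress is assumed to be on a common metric across schools.
    Teachers with fewer than min_classes observations get no figure, since
    a single class or year is too noisy to say much.
    """
    by_teacher = {}
    for teacher, _year, _class_id, class_progress in records:
        by_teacher.setdefault(teacher, []).append(class_progress)
    return {
        teacher: mean(values)
        for teacher, values in by_teacher.items()
        if len(values) >= min_classes
    }


# Invented example data: two teachers, only one with enough classes.
records = [
    ("Teacher A", 2010, "9X", 0.2),
    ("Teacher A", 2010, "10Y", 0.4),
    ("Teacher A", 2011, "9X", 0.1),
    ("Teacher A", 2011, "10Y", 0.3),
    ("Teacher B", 2011, "8Z", 0.9),
]
measure = teacher_progress_measure(records)
```

A real measure would also need to adjust for pupils' prior attainment and for the noise in small classes; the sketch only shows the averaging step described above.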

Who fails wins? The impact of failing an Ofsted Inspection

March 27, 2012

Rebecca Allen and Simon Burgess

What is the best way to deal with under-performing schools? This is a key policy concern for an education system. There clearly has to be a mechanism for identifying such schools. But what should then be done with schools which are highlighted as failing their pupils? There are important trade-offs to be considered: rapid intervention may be an over-reaction to a freak year of poor performance, but a more measured approach may condemn many cohorts of students to under-achieve.

This is the issue that Ofsted tackles. Its inspection system identifies failing schools and supervises their recovery. How effective is this? Is it even positive, or does labelling a school as failing push it to ever lower outcomes for its students?

It’s not clear what to expect. Ofsted inspections are often dreaded, and a fail judgement seen as being disastrous. It has been argued it triggers a ‘spiral of decline’, with teachers and pupils deserting the school, leading to further falls in performance. But it might also be a fresh start, with renewed focus on teaching and learning, leading to an improvement in exam scores. Equally, we might expect nothing much to happen: after all, the policy ‘treatment’ for those schools given a Notice to Improve is very light touch. It is neither strongly supportive (typically no or few extra resources) nor strongly punitive or directive (schools face no sanctions or restrictions on their actions). Schools are instructed to focus intensively on pupil performance, and are told to expect a further inspection within a year. In addition – and possibly the most important factor – the judgement that the school is failing is a public one, usually widely reported in the local press.

Our research shows that the Ofsted inspection system works. Schools that just failed their Ofsted significantly improved their performance over the next few years, relative to schools that just passed. The impact is statistically significant and sizeable. In terms of the internationally comparable metric of effect sizes, our main results suggest an improvement of around 10% of a standard deviation of pupil scores. This is a big effect, with a magnitude similar to a number of large-scale education interventions. Translated into an individual pupil’s GCSE grades, this amounts to a one grade improvement (for example, B to A) in one or two GCSEs. From the school’s perspective, the gain is an extra five percentage points in the proportion of pupils gaining five or more GCSEs at grades A*-C.
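The comparison of schools that just failed with schools that just passed can be sketched in a few lines. This is an illustration of the general idea only, not the estimation used in the research: it assumes a hypothetical continuous inspection score with a known fail threshold, and the function name, bandwidth and data are all invented.

```python
from statistics import mean


def just_failed_minus_just_passed(schools, threshold, bandwidth):
    """Difference in mean later improvement between just-fail and just-pass schools.

    schools: iterable of (inspection_score, later_change_in_results).
    Only schools within `bandwidth` of the threshold are compared, on the
    assumption that they were similar before the inspection.
    """
    just_failed = [change for score, change in schools
                   if threshold - bandwidth <= score < threshold]
    just_passed = [change for score, change in schools
                   if threshold <= score <= threshold + bandwidth]
    return mean(just_failed) - mean(just_passed)


# Invented data: changes are in standard deviations of pupil scores.
schools = [
    (48, 0.15), (49, 0.05),   # just failed
    (51, 0.00), (52, 0.00),   # just passed
    (30, 0.50), (80, -0.10),  # far from the threshold, ignored
]
gap = just_failed_minus_just_passed(schools, threshold=50, bandwidth=5)
```

Schools far from the threshold are left out deliberately: they differ from each other in many ways besides the inspection outcome.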

Our findings suggest that the turn-around arises from proper improvements in teaching and learning, not gaming to boost exam performance through switching to easier courses. First, the impact is significantly higher in the second year post visit than the first, and remains level into the third and fourth year after the inspection. So it is not simply a quick fix to satisfy the inspectors when they return twelve months later. Second, we find a stronger effect on the school’s average GCSE score than on the headline measure of the percentage of students gaining at least 5 good passes; if the schools’ responses were aimed at cosmetic improvement, we would expect the reverse. We also find similar positive effects on maths results and on English results.

It could be argued that these results are implausibly large given that the ‘treatment’ is so light touch and schools are given no new resources to improve their performance. But the treatment has two parts, and neither is trivial. First, the instruction to the school to improve its performance may empower headteachers and governors to take a tougher and more proactive line about school and teacher performance. This may not be a minor channel for improvement. Behavioural economics has provided a good deal of evidence on the importance of norms: the school management learning that what they might have considered satisfactory performance is unacceptable may have a major effect. The second part of the treatment derives from the fact that the judgement is a public statement and so provides a degree of public shame for the school leadership. Ofsted fail judgements are widely reported in local press and this is usually not treated as a trivial or ignorable announcement about the school. It seems plausible that this too will be a major spur to action for the school.

Where do we go from here? Our results suggest Ofsted’s identification of just-failing schools and the use of Notice to Improve measures is an effective policy, triggering the turn-around of these schools. We need to be clear that our research does not address the question of what to do about schools that comprehensively fail their Ofsted inspection. Possibly this light-touch approach can be extended. Since leaving the Headship of Mossbourne school to become the new Director of Ofsted, Sir Michael Wilshaw has argued that schools just above the fail grade should also be tackled: that ‘satisfactory’ performance is in fact unsatisfactory. Such interventions in ‘coasting’ or ‘just-ok’ schools are very likely to be of the same form as Notice to Improve. Our results suggest that this is potentially a fruitful development with some hope of significant returns.

This research is available on the CMPO website and the IoE website.