Britain’s youth are better educated than any previous generation and desperate to work. Young people all over Europe have borne the brunt of this recession, not only in their job prospects but also in their wages, with young people’s wages in the UK falling to levels last seen in 1998. The widespread problems of youth employment stem primarily from a lack of jobs, but in the UK we are enjoying a mini jobs boom, with employment up by over 1 million in the last two years. Yet young people remain at the margins of the labour market, getting just 12% of that increase in employment despite making up 40% of the unemployed. With unemployment generally falling, the critical position of young people is perhaps one of the most underappreciated features of our labour market today. The problem here, then, is one of policy failure – a system that is exceptionally badly set up to meet their needs – rather than a jobs shortage or an unwillingness to work. With scarring effects on their future job prospects and earnings, which add up to costs to the Exchequer years into the future, this policy failure is costing us all dearly.
Not everything the government has done has failed, though: the raising of the participation age (RPA) to 17, for instance, has led to a substantial increase in 17 year olds staying in school. There has also been a moderate increase in apprenticeships among the young, and the Work Programme has had some success at getting long-term unemployed youth into work. But the current system around the school-to-work transition has three deep-seated flaws. Firstly, our system prioritises staying in education (until 18 from next year) and then switches to an exclusive focus on job search, with tight restrictions on combining education and training with on-going efforts to find work. The number of young people leaving school lacking qualifications is falling, but poor educational attainment and a lack of good quality vocational skills among those who don’t go to university is a long-standing problem, closely linked to our failure to build a better quality labour market. The Wolf Review showed that five out of ten young people reach 18 without good English and Maths, and our priority must be to put that right. But the social security system also needs to play its part for those who fall through the system. For too long, we’ve tolerated a situation where a system designed for adults actively dissuades young people who don’t have the skills they need for work from addressing that gap at the start of their careers.
Secondly, our system leaves young people other than apprentices lacking any work experience. Young people need guaranteed work, which helps get them back into the labour market with a CV and work experience to bring to future employers. Any set of reforms seeking to address youth unemployment needs to support work experience, either combined with formal training as a Traineeship or through a work experience programme.
Thirdly, no government agency has an extensive reach into, or engagement with, employers. The government’s hiring subsidy to employers was an outstanding policy failure, with almost no take-up. This was essentially because there was no agency responsible for, or capable of, marketing the programme to employers and helping them navigate the bureaucracy. This was the role of Job Centre Plus in the past, but it is now concerned only with monitoring and supporting job search by claimants; recruitment has gone on-line and employer engagement has disappeared. Work Programme providers and other bespoke private organisations are building such links, but they are designed for single-purpose functions and are not available to the government for engaging employers in any new policy drive.
The media focus of the proposals outlined by Ed Miliband last week was on the means-testing of unemployment benefits, but at their heart is a phased combination of continuing education, training and work experience with required and supported job search between the ages of 18 and 21. It can be seen as combining the old Educational Maintenance Allowance, which only required education participation, with benefits requiring job search. At younger ages, those with poor qualifications combine study with workplace-based training and job search. At older ages, or for those with decent qualifications, the focus switches to work experience and job search, whilst for those aged over 21 the focus is again exclusively on effective job search until long-term unemployment becomes a risk, when the Job Guarantee will kick in.
Thus these proposals contain two important elements for improving the current model: a phased, rather than abrupt, move from education to an exclusive focus on job search – with those who have poor qualifications, little training and no work experience combining efforts to redress these shortcomings whilst maintaining job search – plus a central role for work experience. The third element remains to be clearly addressed: how to gain employer engagement with this new model. For me this should be led by local partnerships formed from schools, FE colleges, local authorities and employers, which would oversee the tracking and engagement of young people at risk of becoming NEET and engage employers about apprenticeships, traineeships, work experience and, of course, hiring young people.
The next few weeks might be a horrible time of year if you are 15 or 16. There are some big decisions coming up. On the one hand: the final exams for the GCSE courses, completing two years of work leading up to this moment. There is a lot of studying still to do, notes to be read, exercises to be worked through, understanding to be really nailed down. Final revision.
But on the other hand: the World Cup. In Brazil. England qualified, and while no-one thinks of England as favourites … who knows? Who would want to miss watching Gerrard and the team confounding the pundits and cruising into the semis?
What to do? This is a classic question of time preference: jam today (watching the game) versus jam tomorrow (getting the grades and higher lifetime income). What is the trade-off between grades and goals?
Our research <http://cmpo.wordpress.com/2011/12/06/a-report-of-two-halves/> can help. We have studied <http://www.bristol.ac.uk/cmpo/publications/papers/2011/wp276.pdf> the decisions of about 3.5m students facing this dilemma in previous summers. We compared the GCSE performance of as-good-as-identical students in years with World Cups (or the European Championship) and years with no exam-time distractions.
On average, grades were slightly lower in World Cup years. We interpreted this as some students taking some time out from studying to keep an eye on the tournament. While there are other possible explanations, our statistical techniques rule out more or less everything else.
That’s on average. Some groups of students saw sizeable declines in their grades. Again, the interpretation of this has to be that they prioritised the tournament and seriously cut down on study time.
How much does this matter? It depends on how close to the key borderline the student’s performance is likely to be. Achieving at least 5 good passes (C grade and above) including English and maths is widely regarded as a necessary minimum for further education or getting a good job. For students who are near this borderline, a grade or so either way matters a lot.
Missing out on 5 good GCSE grades can be very costly. Estimates suggest an average total lifetime cost of around £30,000. This seems a very hefty price to pay for watching some football.
The moral of all this research and numbers: if you are likely to be close to the 5 Cs borderline, stick with the studies, let others suffer the pain of watching England, and get the grades. In the future, you will have earned the money – and the right – to sit back and fully enjoy World Cups.
The role of grammar schools is still a hotly contested topic in education policy in England. We contribute to this debate by showing that earnings inequality is higher under a selective system in which pupils are allocated to secondary schools based on their performance in tests at age 11. While selective systems have declined since their heyday in the mid-1960s, a number of areas retain a selective system and some believe <http://blogs.telegraph.co.uk/news/tobyyoung/100273771/lets-make-every-school-a-grammar-school/> that this system should again be expanded.
In our recent paper, we moved away from typical questions around grammar schools, such as whether access to them is fair (it isn’t) and what the impact of grammar schools is for the marginal student (debatable), to ask about the longer-term impact of this type of system on earnings inequality.
Using a nationally representative panel data source, Understanding Society, we considered the adult earnings distributions of over 2,500 individuals born between 1961 and 1983, comparing those who grew up in an area operating a selective schooling system with those who grew up in very similar areas operating a comprehensive schooling system.
We ensure that the areas we are comparing are very similar by matching areas that are comprehensive to selective areas based on the average hourly wage, unemployment rate and proportion of private schools in both areas. The rich data source also allows us to control for things that may be driving the choice of area and the later earnings distributions, such as parental education and occupation when the individual was 14, gender, age, ethnicity and current area of residence.
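The matching idea described above can be sketched in a few lines: each selective area is paired with its most similar comprehensive area on the observable characteristics. The area names and figures below are invented for illustration, and the paper’s actual matching procedure may well differ (for instance in how covariates are standardised or weighted).

```python
# Sketch of matching each selective area to its most similar comprehensive
# area on observables. Areas and figures are hypothetical, not the paper's
# data, and the paper's actual matching procedure may differ.

# (average hourly wage, unemployment rate, share of private schools)
selective_areas = {"S1": (11.0, 0.06, 0.08), "S2": (9.5, 0.09, 0.04)}
comprehensive_areas = {"C1": (10.8, 0.065, 0.07),
                       "C2": (9.6, 0.088, 0.05),
                       "C3": (14.0, 0.03, 0.15)}

def distance(a, b):
    # plain Euclidean distance over the covariates; a real implementation
    # would standardise the covariates first so no one of them dominates
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def best_match(area, pool):
    # comprehensive area with the smallest distance to this selective area
    return min(pool, key=lambda name: distance(area, pool[name]))

matches = {s: best_match(v, comprehensive_areas)
           for s, v in selective_areas.items()}
print(matches)  # each selective area paired with its nearest comparator
```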
We therefore compare the adult earnings of people who have very similar characteristics, live as adults in very similar areas and grew up in very similar areas: the main difference being that one area operated a selective system and the other a comprehensive system.
When we consider these two groups then, we see that earnings inequality is greater for those who grew up in areas operating a selective system compared to those who grew up in comprehensive areas. Comparing individuals of similar characteristics, the variance of earnings (2009-2012) for those who grew up in selective areas is £29.22 compared to £23.10 in non-selective areas. Put another way, the difference in pay between those at the 90th percentile of the wage distribution and those at the 10th percentile for those who grew up in a selective system is £13.14 an hour compared to £10.93 an hour in comprehensive systems.
On a personal level, if you grow up in a selective system and end up with earnings at the 90th percentile, you earn £1.31 more an hour (statistically significant) than a similar individual who grew up in a comprehensive system. At the other end of the scale, if you grow up in a selective system and don’t do so well, earning at the 10th percentile, you earn 90p less an hour (statistically significant) than a similar individual who grew up in a comprehensive system.
We can also compare the 90-10 wage gap between selective and non-selective areas to the overall 90-10 wage gap in the sample. As noted, in selective areas the 90-10 wage gap is £2.21 an hour higher than in comprehensive areas. This accounts for 18% of the overall 90-10 wage gap in our sample. So selective systems account for a large proportion of inequality in earnings. The message is clear. Grammar systems create winners and losers.
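To make the 90-10 gap concrete, here is a minimal sketch of how such a gap is computed. The wage lists are invented and the percentile rule is a simple nearest-rank convention, not the paper’s estimation method.

```python
# Illustrative sketch: a 90-10 hourly wage gap for two hypothetical groups.
# The wages below are made up; they are not the Understanding Society data.

def percentile(sorted_vals, p):
    """Nearest-rank percentile on a sorted list (simple convention)."""
    idx = round(p / 100 * (len(sorted_vals) - 1))
    idx = max(0, min(len(sorted_vals) - 1, idx))
    return sorted_vals[idx]

def gap_90_10(wages):
    s = sorted(wages)
    return percentile(s, 90) - percentile(s, 10)

selective = [5.0, 6.5, 8.0, 10.0, 12.5, 16.0, 20.0, 25.0, 30.0]
comprehensive = [6.0, 7.0, 8.5, 10.0, 12.0, 14.0, 17.0, 20.0, 24.0]

print(gap_90_10(selective))      # wider spread of hourly wages
print(gap_90_10(comprehensive))  # narrower spread of hourly wages
```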
There are also interesting differences by gender. If we look separately at males and females, we see that males in selective systems at the top of the earnings distribution do significantly better than their non-selective counterparts (£2.25 an hour) while there is no difference for those at the bottom of the earnings distribution.
For females, the picture is the opposite. Females growing up in selective systems who do well look very similar to successful females from non-selective systems, but those who do badly earn significantly less (87p an hour) than their comprehensive-system counterparts. We think this could be because males were outperforming females at school for the cohorts we consider, so that within selective systems more males attended grammars and more females attended secondary moderns, although we cannot observe this directly.
What lies behind these differences? Inequality in earnings comes from inequality in qualifications and these in turn might derive from differences in peer effects and teacher effectiveness between the systems. We speculate that in the 1970s and 1980s more able teachers might have been more effectively sorted in a selective system into schools with high attaining pupils. The evidence on peer effects in the UK is mixed but the evidence on teacher effectiveness points to this as a possible key mechanism.
Whatever might be driving this phenomenon, our research shows that inequality is increased by selective schooling systems. If this is combined with evidence <http://www.bris.ac.uk/cmpo/publications/papers/2006/abstract150.html> that sorting within selective systems is actually more about where you are from rather than your ability, then selective systems may not be the drivers of social mobility that some claim. The pros and cons of a system which creates greater inequality will doubtless continue to be passionately debated. What we cannot ignore is that there are losers as well as winners in this story.
Should the government ban junk food near schools?
Last Friday, Jamie Oliver called for a crackdown on the selling of junk food near schools, arguing that it is completely at odds with the government’s investments to tackle childhood obesity. But is there really a causal link between fast food outlets near schools and childhood obesity?
Currie et al. (2010) directly investigate this. As they argue, the fact that fast food restaurants and obesity have both increased over time does not prove such a link. To explore this relationship, they use the exact geographic location of fast food restaurants in California, linked to data on 3 million school children, to examine whether schools’ proximity to a fast food restaurant affects pupils’ obesity rates. They show that fast food restaurants near schools significantly increase childhood obesity. More specifically, having a fast food restaurant within 0.1 mile of a school increases the probability of obesity by 1.7 percentage points (or 5.2%).
To explore the sensitivity of these analyses, they also study the effect of other (non-fast food) restaurants, but find no effect of these outlets on obesity. Furthermore, they investigate whether future fast-food restaurants are associated with today’s obesity: if they were, it would suggest that fast food restaurants may simply locate in areas where obesity is increasing independently of the restaurants. Their results show that only current locations matter.
These findings therefore support Jamie Oliver’s concern, suggesting the government should consider introducing policies to restrict fast food restaurants from opening near schools. In fact, some local authorities have already taken this approach. For example, the London borough of Waltham Forest will not give planning permission to new hot food takeaways if they are within 400 metres from a school, youth facility or park.
Jamie Oliver has had much success in convincing the government to improve the nutritional contents of school lunches. Recent research has shown that the new nutritional guidelines have not only improved the quality of school food (School Food Review; SFR) but also improved children’s exam results and reduced their absences (Belot and James, 2010). In fact, because of the benefits of school meals, Professor Terence Stephenson, chair of the Academy of Medical Royal Colleges, argues in Sunday’s Observer that academies and free schools may actually be damaging children’s health, as they are allowed to opt out of these nutritional guidelines.
Despite the benefits of school meals, their take-up is currently very low at around 43%, down from about 70% in the 1970s. In a recent study, I show that the main reason for this rapid decline was the introduction of two Acts of Parliament in 1980 and 1988, which increased the price of school meals, leading a large proportion of pupils to shift to packed lunches. With school meals currently substantially healthier than the average packed lunch (SFR), the government should consider ways of increasing their take-up. One approach they have taken is to introduce free meals for all primary school pupils in England in their first three years. As the increased price of meals in the 1980s was the main driver of the drop in take-up, offering free meals to all children may again lead to an increase in their consumption. In addition, if alternative outlets, such as fast food restaurants, are simply not available near schools, pupils will not be tempted to swap their healthy school meal for some unhealthy fish and chips.
Ron Johnston, David Manley and Nabil Khattab.
At their party’s 2014 spring conference in York, eight northern Liberal Democrat MPs presented a report entitled Grim up North? Rebalancing the British Economy, using it as the basis for an argument that the forthcoming budget should pay greater attention to investment that would narrow the north:south divide. Using a range of data, they showed that ‘The South has, in every single category of economic affairs spending, been cut by less than the English average’ – with the obvious consequence that the North has suffered most, and yet it has the biggest problems.
Their argument was taken up by the media. The Guardian, for example, highlighted what the MPs identified as ‘fundamental unfairness’ in the coalition government’s treatment of the North – which was making it harder for the party to win seats there. The Independent backed up the story with a small amount of data indicating a north:south divide in both unemployment rates and house prices. A few weeks earlier, it had published an article by a leading economist and former member of the Bank of England Monetary Policy Committee, Danny Blanchflower, entitled The North is still not feeling this recovery, in which he said that ‘The South is seeing recovery and the rest of the country is being left behind’ – an argument based largely on house price changes.
These contributions – and many others like them – raise two important questions:
- Is life grimmer ‘up North’ than elsewhere; and
- Has it got even grimmer since the recession bit in 2008?
To address these questions, we use labour market data from the quarterly ONS Labour Force Survey; the data analysed are for the first quarter of each year between 2002 and 2013 inclusive. These are available both for the region in which the c.40,000 respondents each quarter live and (for some of the data) for that in which they work. Where both are available, we use the latter. (In the data for place of work, Central London is separated from Inner London; it is not in the region-of-residence data.)
Rather than just use a single indicator of labour market health, we look at the following range:
- The percentage of the workforce who are unemployed, according to ILO definitions – this is one of the commonest measures of local economic health;
- The percentage of the unemployed who have been so for three months or more, often taken as a clear indicator of recession;
- The percentage of those in employment who are working part-time – following the arguments of some of the coalition government’s political and other critics that many of the claimed ‘new jobs’ being created, especially in the private sector, are part-time, low-quality jobs;
- Following that argument, the percentage of those working part-time who are doing so because they wanted, but could not find, a full-time job;
- The percentage of people who are in jobs for which they are over-qualified, using conventional measures of that situation – as the labour market gets tighter, so more people (whether working full- or part-time) may find it necessary to take such jobs because nothing else is available;
- The percentage of those aged 45-64 who have left the labour force and are no longer either in work or seeking it (what some call discouraged workers) – given cuts since the coalition government took power in 2010 both in benefits and in who might be entitled to them, fewer people in those older adult age-groups may have found leaving the workforce a viable option post-2008; and
- The median gross hourly income of those in work.
Data for all of these are presented in the tables below for two groups of years: pre-recession (2002-2008) and recession (2009-2013). In all of the tables, we separate out: regions in the North of England; regions in the Midlands and East of England, plus the Southwest; London and the Southeast; and Wales, Scotland and Northern Ireland. For each indicator, we give the regional average for each of the two periods, plus the percentage change between the two. And in each column we highlight in bold the figures that are above the national average, given at the foot of each table.
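The construction of each table can be sketched as follows. The regional rates below are invented for illustration; only the 4.9 per cent pre-recession national unemployment rate appears in the text, and the recession-period national rate is chosen so that the change matches the reported 57.1 per cent growth.

```python
# Sketch of building one table: average an indicator over the two periods
# for each region, take the percentage change, and flag (here with '*')
# values above the national average. Regional figures are hypothetical.

rates = {  # region -> (avg 2002-2008, avg 2009-2013), per cent unemployed
    "Tyne and Wear": (6.5, 9.8),
    "Inner London": (7.0, 9.5),
    "Southeast": (3.5, 5.9),
}
national = (4.9, 7.7)  # pre-recession rate from the text; 7.7 implied by
                       # the reported 57.1 per cent national growth

def pct_change(before, after):
    return 100.0 * (after - before) / before

for region, (pre, rec) in rates.items():
    flag = "*" if rec > national[1] else " "
    print(f"{region:15s} {pre:5.1f} {rec:5.1f}{flag} {pct_change(pre, rec):6.1f}%")
```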
If the conventional wisdom regarding a north:south divide is valid, then the figures in bold should be concentrated in two parts of the table – the first eight rows, covering regions in the North of England; and the last four rows, covering Wales, Scotland and Northern Ireland. Regions elsewhere – notably in London and the Southeast – should have few figures in bold, since they are supposed to have the more buoyant economies, and ‘conventional wisdom’ suggested that they were less affected by the recession.
Unemployment (Table 1)
The first set of three columns in Table 1 does not show a pattern of unemployment that conforms to expectations. Whereas six of the eight northern regions had unemployment rates exceeding the national average of 4.9 per cent in the pre-recession period, with a peak of 6.5 per cent in Tyne and Wear, both Inner and Outer London also had rates greater than the national average whereas three of the non-English regions did not. Indeed, the unemployment rate in this period of relative prosperity was higher in Inner London than anywhere else in the country. Before 2009 the divide was between London and the North, on the one hand, and the rest of the UK on the other (with the West Midlands Metropolitan County and Strathclyde being the main outliers). In the subsequent period – 2009-2013 – the pattern was very similar: the unemployment blackspots were now more clearly in the northern metropolitan counties but London was not far behind, and still with rates above the national average.
But the third column suggests that it did get grimmer up north in relative terms. Ten of the twenty regions experienced a percentage growth in unemployment rates above the national figure of 57.1 per cent – and the two London regions were not among them. London and the Southeast, it seemed, suffered less from job losses in the recession than many other regions – but so did Merseyside, the non-metropolitan northern regions and Scotland.
Long-term unemployment (Table 1)
London also experienced less growth than the average – 28 per cent – in the percentage of the unemployed who had been out-of-work for three months or more once the recession set in, although it was relatively high there in the pre-recession years. In many of the northern regions, plus Wales, Northern Ireland and Strathclyde, over 60 per cent of the unemployed post-2008 had been so for more than three months; in relative terms it didn’t get much worse there once the recession set in, but other parts of the country were catching up.
Part-time employment (Table 2)
London stands out (especially Central and Inner London) in these data on the geography of part-time working as having much lower percentages prior to the recession. The percentage working part-time nationally increased only slightly (by 6.6 per cent) during the subsequent recession years – and London had by far the highest rates of increase. If the recession forced more people into part-time work, therefore, this characterised London much more than areas further north; there, increased unemployment was the norm.
Working part-time out of necessity rather than choice (Table 2)
The LFS surveys ask those working part-time if they are doing so out of choice (perhaps because they are students or carers) or out of necessity: the latter wanted full-time work, but couldn’t find any. Just under 10 per cent gave that answer pre-recession but the percentage almost doubled after 2008. As with the geography of unemployment, the pattern shown by these data is not a simple north:south divide: both before the recession and after it set in there was a split between the South, on the one hand, and – on the other – the North, plus London, plus the four regions outside England. Exacerbation of the problem was not concentrated in the latter group of regions, however, but in those where relatively few part-timers were so employed out of necessity before the slump, notably in the Midlands and the Southeast (as well as Greater Manchester). What was a regional problem became more of a national one in recession conditions.
Over-qualification (Tables 3-4)
In a buyers’ labour market workers are more likely to feel constrained to take positions for which they are over-qualified than when the demand for labour – especially skilled labour – outstrips supply. This suggests that the percentage of those in work who held jobs for which they were over-qualified would: (a) be concentrated in those regions with higher levels of unemployment and under-employment; and (b) increase in number most in those where unemployment also increased. The data in Table 3 are consistent with that argument to a considerable extent.
For full-time workers, with a small number of exceptions (notably West Yorkshire Metropolitan County), holding a job for which you were over-qualified was a characteristic of over one-quarter of all employees in both periods in both the northern regions and those outwith England; the inter-regional differences were not too substantial, however, with only one percentage above 30 pre-2009 and none below 20. Although many of the northern regions with high percentages pre-2009 experienced an above-average increase in their percentages over-qualified, there was also some levelling-off of the regional differences, with large increases in London and the East of England. Among part-timers, on the other hand, most regions in the North of England experienced above-average growth in over-qualified workers – although the largest, from a relatively small base, was in Outer London. Among part-timers who wanted full-time work but couldn’t find it, there was an increase of nearly one-third between the two periods – with much of the increase being in London and the Southeast (Table 4).
Discouraged workers aged 45-64 (Table 4)
This is the one indicator where we anticipated a fall in the percentage involved – and it occurred: some 5 per cent fewer in that age group had opted out of the labour market post-2008 than was the case pre-2009 (Table 4). And more so than for many of the other indicators analysed here, there is a clear ‘traditional’ north:south divide. Older working-age adults were more likely to have opted out of the workforce in the northern regions (many possibly because of employment-related health conditions), but less so in the recession years than previously. Were they forced back into the labour market by benefit cuts, or….?
Income (Table 5)
Although a substantial number of LFS respondents decline to give information on their income, enough do for the geography to be clearly delineated. Overall, in both periods, the ‘grim up north’ argument is clearly sustained. Median incomes in Central London were almost twice those in the northern regions, and gross hourly incomes elsewhere in London and in the Southeast were also significantly higher than elsewhere for both full- and part-time employees. But the divide was not exacerbated in relative terms by the recession: median incomes grew by less than the average in most of London as well as in the Southeast for both groups of workers, and employees in the North and Midlands benefited slightly more.
In conclusion: grimness is not just a northern problem?
There is no simple story to be told, therefore, about the geography of the recession that set in after the credit crunch began in 2008. On most labour market indicators it is grim up north: it was before the recession and it remained so afterwards. But not all parts of the ‘North’ experienced worse conditions than the ‘South’ on all indicators, either before or during the recession; the geography was more nuanced than that. And it certainly didn’t universally get even grimmer up north. On some of our indicators, instead of the north:south divide getting wider, it got narrower – with even London’s labour markets suffering badly relative to the national situation. With no simple pattern, it is not sensible to go for simplistic policies that simply favour the North: parts of the North are indeed in trouble, but they aren’t necessarily alone, and some of the suffering is being shared across the country more widely than is sometimes appreciated.
Table 1. Percentages Unemployed (of those economically active) and Unemployed for three months or more (of those unemployed)
Table 2. Percentages Working Part-Time and Part-Time out of Necessity (of those working part-time)
Table 3. Percentage in Jobs for which they are Over-Qualified by Region of Workplace
Table 4. Percentages of the Unemployed who were Unemployed for Three Months or more and of Persons aged 45-64 who were Not Economically Active (Discouraged Workers)
Table 5. Gross Median Hourly Income (£) by Region of Workplace
Author: Simon Burgess
The New School Accountability Regime in England: Fairness, Incentives and Aggregation
The long-standing accountability system in England is in the throes of a major reform, with the complete strategy to be announced in the next few weeks. We already know the broad shape of this from the government’s response to the Spring 2013 consultation, and some work commissioned from us by the Department for Education, just published and discussed below. The proposals for dealing with pupil progress are an improvement on current practice and, within the parameters set by government, are satisfactory. But the way that individual pupil progress is aggregated to a school progress measure is more problematic. This blog does not often consider the merits of linear versus nonlinear aggregation, but here goes …
Schools in England now have a good deal of operational freedom in exactly how they go about educating the students in their care. The quid pro quo for this autonomy is a strong system of accountability: if there is not going to be tight control over day to day practice, then there needs to be scrutiny of the outcome. So schools are held to account in terms of the results that they help their students achieve.
The two central components are new measures of pupils’ attainment and progress. These data inform both market-based and government-initiated accountability mechanisms. The former is driven by parental choices about which schools to apply to. The latter is primarily focussed on the lower end of the performance spectrum and embodied in the floor targets – a school falling below these triggers some form of intervention.
Dave Thomson at FFT and I were asked by the Department for Education (DfE) to help develop the progress measure and the accompanying floor target, and our report is now published. Two requirements were set for the measure, along with an encouragement to explore a variety of statistical techniques to find the best fit. It turns out that the simplest method of all is barely any worse in prediction than much more sophisticated ones (see the Technical Annex) so that is what we proposed. The focus in this post is on the requirements and on the implications for setting the floor.
The primary requirement from the DfE for the national pupil progress line was that it be fair to all pupils. ‘Fair’ in the sense that each pupil, whatever their prior attainment, should have the same statistical chance of beating the average. This is obviously a good thing and indeed might sound like a fairly minimal characteristic, but it is not one satisfied by the current ‘expected progress’ measure. We achieved this: each pupil, on whatever level of prior attainment, has an expected progress measure equal to the national average. And so, by definition, each pupil has an expected deviation from that of zero.
The second requirement was that the expected progress measure be based only on prior attainment, meaning that there is no differentiation by gender for example, or special needs or poverty status. This is not because the DfE believe that these do not affect a pupil’s progress, it was explicitly agreed that they are important. Rather, the aim was for a simple and clear progress measure – starting from a KS2 mark of X you should expect to score Y GCSE points – and there is certainly a case to be made that this expectation should be the same for all, and there should not be lower expectations for certain groups of pupils. (Partly this is a failure of language: an expectation is both a mathematical construct and almost an aspiration, a belief that someone should achieve something).
So while the proposed progress measure is ‘fair’ within the terms set, and is fair in that it sets the same aspirational target for everyone, it is not fair in that some groups will typically score on average below the expected level (boys, say) and others will typically score above (girls). This is discussed in the report and is very nicely illustrated in the accompanying FFT blog. There are plausible arguments on both sides here, and the case against going back to complex and unstable regression approaches to value added is strong. This unfairness carries over to schools, because schools with very different intakes of these groups will have different chances of reaching expected progress. (Another very important point emphasised in the report and in the FFT blog is that the number of exam entries matters a great deal for pupil performance).
Now we come to the question of how to aggregate up from an individual pupil’s progress to a measure for the school. In many ways, this is the crucial part. It is on schools not individual pupils that the scrutiny and possible interventions will impact. Here the current proposal is more problematic.
Each pupil in the school has an individual expected GCSE score and so an individual difference between that and her actual achievement. This is to be expressed in grades: “Jo Smith scored 3 grades above the expected level”. These are then simply averaged to the school level: “Sunny Vale School was 1.45 grades below the expected level”. Some slightly complicated statistical analysis then characterises this school level as either a significant cause for concern or just acceptable random variation.
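As a sketch of this averaging (the pupil scores below are hypothetical; the real calculation runs on GCSE points for the whole cohort):

```python
def school_progress(pupils):
    """Proposed school measure: the simple mean, across the school, of
    each pupil's deviation (in grades) from their expected GCSE score."""
    deviations = [actual - expected for actual, expected in pupils]
    return sum(deviations) / len(deviations)

# Hypothetical school: (actual, expected) score pairs for four pupils.
pupils = [(52, 49), (40, 44), (61, 60), (35, 41)]
print(school_progress(pupils))  # → -1.5, i.e. 1.5 grades below expected
```

A single number per school, read directly as "grades above or below expected" – which is exactly the comprehensibility the proposal is after.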
It is very clear and straightforward, and that indeed is its chief merit: it is easily comprehensible by parents, Headteachers and Ministers.
But it has two significant drawbacks, both of which can be remedied by aggregating the pupil scores to school level in a slightly different way. First, the variation in achieved scores around expected progress is much greater at low levels of attainment than at high attainment. This can be seen clearly in Figure 1, which shows that the variance in progress declines sharply and continuously with KS2 score across the range where the bulk of pupils sit. Schools have pupils of differing ability, so the effect is less pronounced at school level, but it is still evident.
The implication of this is that if the trigger for significant deviation from expected performance is set as a fixed number of grades, then low-performing students are much more likely to cross that simply due to random variation than high-performing students are. By extension, schools with substantial intakes of low ability pupils are much more likely to fall below the floor simply through random variation than schools with high ability intakes are. So while our measure achieves what might be called ‘fairness in means’, the current proposed school measure does not achieve ‘fairness in variance’. The DfE’s plan is to deal with this by adjusting the school-level variance (based on its intake) and thereby what counts as a significant difference. This helps, but is likely to be much more opaque than the method we proposed and is likely to be lost in public pronouncements relative to the noise about the school’s simple number of grades below expected.
Fig 1: Standard deviation in value added scores and number of pupils by mean KS2 fine grade (for details, see the report)
The second problem with the proposal is inherent in simple averaging. Suppose a school is hovering close to the floor target, with a number of pupils projected to be significantly below their progress target. The school is considering action and how to deploy extra resources to lift it above the floor. The key point is this: it needs to boost the average, so raising the performance of any pupil will help. Acting sensibly, it will target the resources to the pupils whose grades it believes are easiest to raise. These may well be the high performers or the mid performers – there is nothing to say it will be the pupils whose performance is the source of the problem, and good reason to think it will not be.
While it is quite appropriate for an overall accountability metric to focus on the average, a floor target ought to be about the low-performing students. The linear aggregation allows a school to ‘mask’ under-performing students with high performing students. Furthermore, the incentive for the school may well be to ignore the low performers and to focus on raising the grades of the others, increasing the polarisation of attainment within the school.
The proposal we made in the report solves both of these problems, the non-constant variance and the potential perverse incentive inherent in the averaging.
We combine the individual pupil progress measures to form a school measure in a slightly different way. When we compare the pupil’s achievement in grades relative to their expected performance, we normalise that difference by the degree of variation specific to that KS2 score. This automatically removes the problem of the different degree of natural variation around low and high performers. We then highlight each pupil as causing concern if s/he falls significantly below the expected level, and now each pupil truly has the same statistical chance of doing this. The school measure is now simply the fraction of its pupils ‘causing concern’. Obviously simply through random chance, some pupils in each school will be in this category, so the floor target for each school will be some positive percentage, perhaps 50%. We set out further details and evaluate various parameter values in the report.
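A minimal sketch of this alternative aggregation (the pupil figures and the prior-attainment-specific standard deviations here are made up; in practice the latter come from the national distribution of progress at each KS2 fine grade):

```python
def concern_fraction(pupils, z_cut=-1.96):
    """Alternative school measure: normalise each pupil's deviation by
    the standard deviation specific to their KS2 prior-attainment group,
    flag pupils falling significantly below expected progress, and
    report the flagged fraction for the school."""
    flagged = sum(
        1 for actual, expected, group_sd in pupils
        if (actual - expected) / group_sd < z_cut
    )
    return flagged / len(pupils)

# (actual, expected, sd for that pupil's KS2 group) — low attainers get
# a larger sd, so the same raw shortfall need not flag them.
pupils = [(30, 40, 4), (50, 52, 6), (44, 47, 2), (20, 35, 7)]
print(concern_fraction(pupils))  # → 0.5: two of the four pupils flagged
```

Because every pupil is flagged with the same probability under pure chance, a school cannot mask under-performers with high performers: only lifting flagged pupils above the cut-off moves the measure.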
The disadvantage of this approach for the DfE is that the result cannot be expressed in terms of grades, and it is slightly more complicated (again, discussed in the report). This is true, but it cannot be beyond the wit of some eloquent graduate in government to find a way of describing this that would resonate with parents and Headteachers.
At the moment, the difference between the two approaches in terms of which schools are highlighted is small, as we make clear in the report. Small, but largely one way: fewer schools with low ability intakes are highlighted under our proposal.
But there are two reasons to be cautious. First, this may not always be true. And second, the perverse incentives – raising inequality – associated with simple averaging may turn out to be important.
Author: Mike Peacey
Can peer review be improved?
Scientific publishing is under the spotlight at the moment. The long-standing model of scientists submitting their work to a learned journal, for consideration by their peers who ultimately decide whether it is rigorous enough to warrant publication (a process known as peer review), is now hundreds of years old – the first scientific journals date from the 17th Century. Is it time for change? Two major factors are driving the scrutiny which this traditional publication model is facing: the astonishing growth in scientific output in recent years, and technological innovations that make print publishing increasingly anachronistic. At the same time, there are concerns that many published research findings may be incorrect. The true extent of this problem is difficult to know with certainty, but pressure on academics to publish (the “publish or perish” culture) may incentivise the publication of novel, eye-catching findings. However, the very nature of these findings (in other words, that they are surprising or unexpected) means that they are more likely to be incorrect – extraordinary claims require extraordinary evidence. Peer review has been criticised because it sometimes fails by allowing such claims to be published, often when it is clear to many scientists that the claims are extremely unlikely to be true. Is peer review as unsuccessful as is sometimes claimed? And how might it be improved? We explored this question recently in a mathematical model of reviewer and author behaviour.
In our model we considered a number of scientists who each, sequentially, obtain private information (e.g., through conducting experiments) about a particular hypothesis. The result of each experiment will never be perfect, but will on average be correct (with more controversial topics providing noisier signals). Once they have completed their experiment, the scientists each write academic papers with the objective of advancing knowledge. Each paper is then reviewed by a peer before a decision is made on whether or not it is published. When a paper is published, the manuscript begins to partially influence the conclusions that later scientists reach. As a result, the amount of new information transmitted decreases. In other words, authors begin to “herd” on a specific topic. We found that the extent to which this herding occurs (and hence the confidence we can have in a hypothesis being correct) will depend on the particular way in which peer review is conducted. When reviewers are encouraged to be as objective as possible they do not use all the information available to them, and so their decisions convey nothing of their own private information to other scientists. When reviewers are allowed a degree of subjectivity when making a decision (i.e., to use their judgement about whether the results are likely to be correct, as well as the more objective characteristics of the paper they’re reviewing), the peer review process transmits more information and this allows science to be self-correcting.
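The herding dynamic can be illustrated with a standard information-cascade toy model (this is a simplification for exposition, not the model in the paper, and all the numbers are illustrative). Each scientist sees the record of published conclusions plus one noisy private signal; once the public record leads by two or more, it outweighs any single signal, so the scientist publishes with the herd and their private information never reaches the literature:

```python
import random

def publication_record(true_state=1, signal_accuracy=0.7,
                       n_scientists=50, seed=1):
    """Sequential toy model: each scientist combines the public record of
    published conclusions with one private signal that is correct with
    probability `signal_accuracy`, then publishes the better-supported
    conclusion (their own signal breaks ties)."""
    rng = random.Random(seed)
    published = []
    for _ in range(n_scientists):
        signal = true_state if rng.random() < signal_accuracy else 1 - true_state
        lead = sum(1 if p == 1 else -1 for p in published)
        if lead >= 2:        # public record outweighs one private signal:
            conclusion = 1   # the scientist herds and the signal is lost
        elif lead <= -2:
            conclusion = 0
        else:
            conclusion = signal
        published.append(conclusion)
    return published
```

Once a cascade starts it is self-reinforcing: every later paper repeats the herd conclusion regardless of private evidence, and an unlucky early run can lock the literature onto the wrong hypothesis. This is why a review process that transmits some of the reviewer's private judgement can help make the system self-correcting.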
Models such as ours necessarily simplify reality, and typically focus on one aspect of a process to determine how important it is to that process. So the results of our model certainly shouldn’t be taken as definitive; rather, they can help to identify some interesting questions which can then be followed up empirically. For example, reviewers usually have to rate manuscripts on a number of dimensions, such as novelty and likely impact. One question might be whether asking reviewers to explicitly rate how believable they find a study’s results would provide useful information. Our results also suggest that opening up other channels through which scientists can make their private information known could also be valuable. These could include post-publication peer review, which is growing in popularity, and prediction markets to capture these signals at an aggregate level. The landscape of scientific publishing is likely to change dramatically over the next few years, as open access, self-archiving, altmetrics and other technology-driven innovations become increasingly common. This provides an opportunity to implement changes to a model of scientific publishing that has otherwise remained essentially unchanged for decades.
Park IU, Peacey MW, Munafò MR. Modelling the effects of subjective and objective decision making in scientific peer review. Nature. 2013. doi: 10.1038/nature12786.
Author: Erlend Berg
The youngest children in each school cohort are over-represented in referrals to mental health services
It is known that the children who are the youngest in their class tend to do worse, in several respects, than their classmates. On average, they do less well academically throughout their school careers and are less likely to attend university. They have also been found to be less confident in their academic ability and are more likely to report being bullied or unhappy at school, and they are less likely to participate in both youth and professional sports.
Given this, it is perhaps not surprising that these children are also more likely to have mental health problems: they are more likely to be diagnosed with attention disorders, learning disability and dyslexia.
Still, little is known about the consequences for health service provision, and in particular the extent to which these children are over-represented as users of specialist mental health services. In a paper forthcoming in the Journal of Clinical Psychiatry, Shipra Berg and Erlend Berg investigate whether August-born children, who are the youngest in their class in the English educational system, are over-represented in referrals to specialist Child and Adolescent Mental Health Services. The threshold for referral to these services is relatively high, since minor problems are often dealt with by school health workers or family doctors.
The research method is simple. The cut-off date for school entry in England is 1 September. So a child born in August will be among the youngest in his or her class, while a child born in September will be one of the oldest. The researchers obtained dates of birth for all children referred to mental health services in three boroughs of West London for a period of four years, and compared the frequency of birth months of the referred children to the birth-month frequencies in the population.
For example, children born in September represent 8.6% of the population but only 8.0% of referrals. Hence they are 7.3% less likely to be referred to mental health services than the average child.
For August-born children the situation is reversed. Of all children referred to mental health services, 9.4% were born in August. But only 8.6% of the population of children in the relevant age group are born in August. That means that August-born children are 9.1% more likely to be referred than the average child, and 17.8% more likely to be referred than their September-born classmates. These figures are statistically significant, meaning they are very unlikely to be caused by random fluctuations in the data.
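The "X% more likely" figures are just ratios of referral share to population share. Using the rounded percentages quoted above (so the results differ slightly from the paper's, which are computed from unrounded counts):

```python
def relative_likelihood(referral_share, population_share):
    """How much more (or less) likely than the average child a group is
    to be referred: the ratio of its share of referrals to its share of
    the population, minus one."""
    return referral_share / population_share - 1

august = relative_likelihood(9.4, 8.6)     # ≈ +9.3% vs the average child
september = relative_likelihood(8.0, 8.6)  # ≈ -7.0% vs the average child

# August-born vs September-born classmates: ratio of the two ratios.
august_vs_september = (1 + august) / (1 + september) - 1  # ≈ +17.5%
```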
When boys and girls are examined separately, the main findings are confirmed for both sexes.
Children in the UK start school at a particularly young age, so an age difference of one year is substantial. The September-born child, who starts school around her fifth birthday, has had a 25% longer life experience than the August-born child, who starts school around his fourth birthday. Clearly, a one-year age difference shrinks as a proportion of life experience as the children grow up. One might therefore expect that the negative effect of being the youngest wears off over time. However, the authors find that the main effect holds for children of both primary-school and secondary-school age. This could mean that being the youngest is detrimental even in secondary school, or alternatively that the disadvantage of being the youngest in primary school has lasting consequences.
It is, in principle, possible to defer a school start to the term (there are three terms per year) in which the child turns five. However, this is rarely practised, because the child would still join the same class they would have been in had entry not been deferred. Deferring entry can therefore mean falling behind in academic and social development even before starting school.
It is worth pointing out that a large majority of children born in August are not referred to mental health services. Other factors, including the children’s home environment, are likely to be more important determinants of mental health than month of birth. Still, August-born children, being the youngest – physically, emotionally and intellectually – in their class, may be more vulnerable than their older peers.
Author: Simon Burgess
RCT + NPD = Progress
A lot of research for education policy is focussed on evaluating the effects of a policy that has already been implemented. After all, we can only really learn from policies that have actually been tried. In the realm of UK education policy evaluation, the hot topic at the moment is the use of randomised controlled trials or RCTs.
In this post I want to emphasise that in schools in England we are in a very strong position to run RCTs because of the existing highly developed data infrastructure. Running RCTs on top of the census data on pupils in the National Pupil Database dramatically improves their effectiveness and their cost-effectiveness. This is both an encouragement to researchers (and funders) to consider this approach, and also another example of how useful the NPD is.
A major part of the impetus for using RCTs has come from the Education Endowment Foundation (EEF). This independent charity was set up with grant money from the Department for Education, and has since raised further charitable funding. Its goal is to discover and promote “what works” in raising the educational attainment of children from disadvantaged backgrounds. I doubt that anywhere else in the world is there a body with over £100m to spend on such a specific – and important – education objective. Another driver has been the Department for Education’s recent Analytical Review, led by Ben Goldacre, which recommended that the Department engage more thoroughly with the use of RCTs in generating evidence for education policy.
It is probably worth briefly reviewing why RCTs are thought to be so helpful in this regard: it’s about estimating a causal effect. There are of course many very interesting research questions other than those involving the evaluation of causal effects. But for policy, causality is key: “when this policy was implemented, what happened as a result?” The problem is that isolating a causal effect is very difficult using observational data, principally because the people exposed to the policy are often selected in some way and it is hard to disentangle their special characteristics from the effect of the policy. The classic example to show this is a training policy: a new training programme is offered, and people sign up; later they are shown to do better than those who did not sign up; is this because of the content of the training programme … or because those signing up evidently had more ambition, drive or determination? If the former, the policy is a good one and should be widened; if the latter, it may have no effect at all, and should be abandoned.
RCTs get around this problem by randomly allocating exposure to the policy, so there can be no such ambiguity. There are other advantages too, but the principal attraction is the identification of causal effects. Of course, as with all techniques, there are problems too.
The availability of the NPD makes RCTs much more viable and valuable. It provides a census of all pupils in all years in all state schools, including data on demographic characteristics, a complete test score history, and a complete history of schools attended and neighbourhoods lived in.
This helps in at least three important ways.
First, it improves the trade-off between cost and statistical power. Statistical power refers to the likelihood of being able to detect a causal effect if one is actually in operation. You want this to be high – undertaking a long-term and expensive trial and missing the key causal effect through bad luck is not a happy outcome. Researchers typically aim for 80% or 90% power. One of the initial decisions in an RCT is how many participants to recruit. The greater the sample size, the greater the statistical power to detect any causal effects. But of course, also, the greater is the cost, and sometimes this can be considerable. These trade-offs can be quite stark. For example, to detect an effect size of at least 0.2 standard deviations at standard significance levels with 80% power we would need a sample of 786 pupils, half of them treated. If for various reasons we were running the intervention at school level, we would need over 24,000 pupils.
This is where the NPD comes in. In an ideal world, we would want to be able to clone every individual in our sample, try the policy out on one and compare progress to their clone. Absent that, we can improve our estimate of the causal effect by getting as close as we can to ‘alike’ subjects. Exploiting the wealth of demographic and attainment data in the NPD allows us to create observationally equivalent pupils, one of whom is treated and one of whom is a control. This greatly reduces sampling variation and improves the precision of our estimate of the intervention effect, which in turn means that the trade-off between cost and power improves. Returning to the previous numerical example, if we have a good set of predictors for (say) GCSE performance, we can reduce the required dataset for a pupil-level intervention from 786 pupils to just 284. Similarly for the school-cohort level intervention, we can cut back the sample from 24,600 pupils and 160 schools to 9,200 pupils and 62 schools. The relevant correlation is between a ‘pre-test’ and the outcome (this might literally be a pre-test, or it can be a prediction from a set of variables).
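The quoted pupil-level sample sizes can be reproduced from the standard two-arm power formula, with a pre-test of correlation ρ with the outcome deflating the residual variance by (1 − ρ²). The value ρ = 0.8 below is an assumption on my part: it is the correlation that reproduces the 786 and 284 quoted here, not a figure taken from the report.

```python
from math import ceil
from statistics import NormalDist

def pupils_needed(effect_size, power=0.80, alpha=0.05, pretest_corr=0.0):
    """Total pupils for a two-arm, pupil-level trial: normal
    approximation, two-sided test, equal arms. A pre-test correlated
    with the outcome deflates the residual variance by (1 - rho^2)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)           # 0.84 for 80% power
    n_per_arm = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    n_per_arm *= 1 - pretest_corr ** 2   # gain from the pre-test
    return 2 * ceil(n_per_arm)

print(pupils_needed(0.2))                   # → 786, as quoted
print(pupils_needed(0.2, pretest_corr=0.8)) # → 284, as quoted
```

The school-level figures work the same way in principle, but additionally depend on the intra-school correlation of outcomes, so they are not reproduced here.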
Second, the NPD is very useful for dealing with attrition. Researchers running RCTs typically face a big problem of participants dropping out of the study, both from the treatment arms and from the control group. Typically this is because the trial becomes too burdensome or inconvenient, rather than an objection of principle: after all, they signed up in the first place. This attrition can cause severe statistical problems and can jeopardise the validity of the study.
The NPD is a census and is an administrative dataset, so data on all pupils in all (state) schools are necessarily collected. This obviously includes all national Key Stage test scores, GCSEs and A levels. If the target outcome of the RCT is improving test scores, then these data will be available to the researcher for all schools. Technically this means that an ‘intention to treat’ estimator can always be calculated. (Obviously, if the school or pupil drops out and forbids the use of linked data then this is ruled out, but as noted above, most dropout is simply due to the burden.)
Finally, the whole system of testing from which the NPD harvests data is also helpful. It embodies routine and expected tests so there is less chance of specific tests prompting specific answers. Although a lot about trials in schools cannot be ‘blind’ in the traditional way, these tests are blind. They are also nationally set and remotely marked, all of which adds to the validity of the study. These do not necessarily cover all the outcomes of interest such as wellbeing or health or very specific knowledge, but they do cover the key goal of raising attainment.
In summary, relative to other fields, education researchers have a major head start in running RCTs because of the strength, depth and coverage of the administrative data available.
Author: Paul Gregg
How should long-term unemployment be tackled?
Earlier in the week, George Osborne announced new government plans for the very long-term unemployed. The government’s flagship welfare-to-work programme, the Work Programme, lasts for two years, and so there has been a question about what happens to those not finding work through it. Currently only 20% of those starting the Work Programme find sustained employment, although many more cycle in and out of employment.
Very long-term unemployment (2+ years) is strongly cyclical, almost disappearing from 1998 to 2009, but it has returned with the protracted period of poor economic performance. This cyclicality is a strong indicator that it is not driven by a large group of workshy claimants. Rather, the state of the economy leaves a few who, unable to get work quickly, face ever-increasing employer resistance to hiring them. Faced with an ample choice of newly unemployed applicants, employers see these people as unnecessary risks with outdated skills.
Very long-term unemployment is thus not a new phenomenon, and a large range of policies has been tried before, so we have a very good idea of what does and does not work. The proposals had three elements. The first, which got the headlines, was that claimants would be made to ‘Work for the Dole’. The effects of requiring people to go into work placements depend a lot on the quality of the work experience offered. Such schemes have three main effects. First, some people leave benefits ahead of the required employment; this is called the deterrent effect, and it is stronger the more unpleasant and low paid the placement is (e.g. work for the dole). Then, whilst on the placement, job search and job entry tend to dip as the person’s time is absorbed by working rather than applying for jobs. Finally, the work experience gained raises job search success on completion of the placement. This effect is stronger for high-quality placements, in terms of the experience gained and of being with a regular employer who can give a good reference if the person has worked well.
The net effect of many such programmes, including work for the dole, has often been little or even negative. Australia and New Zealand have both tried and abandoned Work for the Dole policies because they were so ineffectual in getting people into work. The best effects from work experience programmes come where job search is actively required and supported during the placement, where the placement is with a regular employer rather than a “make work” scheme, and where the placement provider is incentivised to care about the employment outcomes of the unemployed person after the work placement ends. The Future Jobs Fund under the previous Labour government, which placed young people into high-quality placements and paid a wage, was clearly a success in terms of improving job entry, although the government cut it.
This element of the government’s plans has little chance of making a positive difference. However, the other elements may be more positive. Some of the very long-term unemployed (the mix across the elements is not yet clear) will be required to sign on daily. This probably means that the claimant will have to attend a Jobcentre Plus office every day to look for and apply for jobs on the suite of computers there. This is very similar to the Work Programme but more intense, and perhaps with less support for CV writing, presentation, etc. It may increase the frequency of job applications but perhaps not their quality, and may prove no more successful than the Work Programme. The third element is to attend a new, as yet unspecified, programme; with so few details available it is hard to comment on this part.
The overall impression is of an announcement that rehashes previous, rather unsuccessful programmes, founded on a belief that the long-term unemployed are workshy rather than unfortunates needing intensive help to overcome employer resistance and return to work.