The UK 2024 General Election result is not quite what it seems

This blog is mainly about (social) science, but my interest in and concern about poverty and inequality makes it a bit about politics too. And this is just a brief post to explain the UK Labour Party’s landslide election win on 4th July 2024; what it is, and importantly is not.

On the progressive side of the argument, there are always two analyses of how to move forward. One (the brave new world approach) is that you have to offer something bold and visionary, a thorough-going reform of all of our current institutions, to capture and energise the broad, latent desire for a better life for most people. The other (the steady as she goes approach) is that you have to move to the centre, not scare the media or business, and propose more or less a continuation of the status quo, but with slightly better intentions, a bit more competence, and a promise that you will use your newly gained power and any windfall to help people disfavoured by the current settlement as and when you can.

On the face of it, Keir Starmer’s 2024 landslide represents a triumph for the second approach. This is particularly so when contrasted with Jeremy Corbyn’s campaign in 2017. 2017 was a brave new world manifesto, and an other-worldly leader. 2024 was a ruthlessly steady-as-she-goes, disciplined message, offering very little actual reform of our institutions, no extra taxation and not much extra expenditure, and ruling out radical reforms like basic income and wealth taxes. It looks like the strategy was a triumph, winning 410 seats in the 650-seat House of Commons. My colleagues here in France are asking me how Starmer did it: how did he crack making the centre-left great again? A long article in yesterday’s Le Parisien presented Starmer as a determined genius, who has shown that you will win people’s votes and defuse right-wing populism by moving to the centre ground and being relentlessly sensible and orthodox. It lamented, of course, that no-one has managed to do this in France, where we face the second round of legislative elections this Sunday, and the prospect of a right-populist government.

The UK result seems, in other words, to argue against the need for – or wisdom of – a radical reform agenda like the one we have just proposed in Act Now, and for an incremental, more centrist approach.

Not so fast though. The large Labour landslide last night is a very particular product of the way the UK election system works, and of the disarray of the other parties. It does not say anything very general about what people want from the state right now. There are very important ways in which Labour lost last night.

First, the votes. In 2017, Labour got 12.9 million votes. With 99% of the votes counted, in 2024, they have 9.7 million. So, they have lost about 3 million voters. (Even compared to Corbyn’s ‘worst performance since 1935’, namely 2019, Labour have lost about half a million votes). How could Labour possibly have done so well in terms of parliamentary numbers this time with such a low vote? It is simply because the other main party has lost even more: the Conservatives polled 14 million in 2017 and 2019, and only about 7 million last night. This is because they presided over a series of scandals and a lot of instability. If there is one thing people like about conservatives, it is that they conserve effectively, not lurch from crisis to crisis. And Labour also gained a lot of seats in Scotland last night, where circumstances are rather particular: the incumbent party there, the Scottish National Party, had had a series of scandals and changes and shed votes, allowing Labour to move from second place to first in a number of places.

Where have the lost voters all gone, three million from Labour and seven million from the Conservatives? Three or four million have gone to the populist right in the form of the Reform party. A smaller number have gone away from the Labour party to the left, in the form of Greens and independents, who have done well in this election. But the biggest beneficiary of all is ‘Did not vote’. ‘Did not vote’ had a better performance than Labour in this election by some margin: almost 20 million (non) votes to Labour’s 10 million-ish. At the 1945 General Election, 73% of all registered electors voted; last night it was more like 60%.

So, in brief, does Starmer’s victory say anything very generalisable or reproducible about how to advance the progressive cause in the reality of current electoral politics, and how to defuse the populist right? Perhaps not. The populist right has done pretty well in this election, and has only been kept from having more seats in parliament by the first-past-the-post electoral system. And it can hardly be said that Labour has been elected with a great surge of popular support; they look more like beneficiaries of the bizarreness of the electoral system. There is a large segment of the population disaffected with all of the current offerings, which may well be captured by other political offerings in the future, as they were in the Brexit vote (and as has happened here in France this year).

The truth is we still don’t know how many votes Labour could have got with a more ambitious, redistributive, green, Act Now style offering, because the experiment has not been done. Perhaps most critically, the election process seems to have done little to foster public discussion and deliberation about how society should actually be reformed to make things better. This seems like a missed opportunity. Perhaps, now that the election is over with, the democratic process can begin.

Act Now!

UK readers may be wondering why Prime Minister Rishi Sunak has called a snap General Election to be held on July 4th 2024, when the law did not require him to do so for another six months, and his governing party was trailing in the opinion polls. I can now reveal the answer.

Act Now!, our book setting out a series of policies to make Britain a better, fairer, healthier place to live, was due to publish in July. Clearly, Rishi could not bear to risk the public seeing what we were proposing. Best get the election out of the way before the content becomes widely known, or you’ll all be wanting it!

I am happy to report that Manchester University Press have worked miracles to accelerate publication. We will be launching in London on June 27 2024, and copies should be in the shops at least a week earlier (order copies here by the way). The pre-publication strapline ‘a must read ahead of the next General Election’ will still work, just.

The point of this post is to tell you a bit more about Act Now!, how it came about, and who its sinister-sounding author, the Common Sense Policy Group, really is.

At the end of the 1930s, it was clear that the institutions of Britain needed a reboot. There had been a series of crises: the Great Depression, financial instability, populism, social unrest, and then war in Europe (sound familiar at all?). It was widely agreed that things were not working, but it was less easy to set out practical reforms that would make things better; especially reforms around which people of different political stripes and traditions could come together.

It is here that the first Beveridge report of November 1942, Social Insurance and Allied Services, comes in. The first Beveridge Report is an unlikely bestseller, much of it being composed of descriptive historical and statistical information. On the other hand, Beveridge had moments of inspiring rhetoric: it is to Beveridge that we owe political nostrums like: ‘[a] revolutionary moment in the world’s history is a time for revolutions, not for patching’; and that only with a ‘comprehensive policy of social progress’ can the ‘five giants’ of Want, Disease, Ignorance, Squalor and Idleness ever be slain. Beveridge laid out in specific detail what a comprehensive policy of social progress would look like for Britain in 1942. Within weeks, the report was apparently the most talked-about topic in the country, and a frequent theme in letters between serving soldiers and their families. Beveridge gave many different people a way of transforming their generalized desire for things to get better into concrete institutional forms; and hence served as a coordination point for the population’s desire for change.

Beveridge is overwhelmingly associated in people’s minds with the UK Labour government of 1945, which implemented substantial parts of the programme. However, the report influenced all the major political parties, by providing, directly to the people, a concrete plan that politicians could agree with or disagree with, at their peril, but not really ignore. Whoever had won that election would have had to implement much of it. The report was generated outside the processes of any particular political party. William Beveridge was a civil servant and an academic. He briefly became an MP, for the Liberal Party, a couple of years after the report had been published, and then, ironically, lost his seat in the 1945 election to the Labour Party that went on to implement many of his ideas.

It’s grandiose to compare one’s own work to giants like Beveridge. Nonetheless, Beveridge inspired a group of us last year, when we challenged ourselves to ask: what would ‘a comprehensive policy of social progress’ look like for Britain in 2024, rather than 1942? Though our views are politically diverse, we are broadly in agreement that Britain’s current institutions are generating: excessive poverty, inequality and hunger; a decline in the liveability of everyday civic life; needlessly poor health; insufficient sustainability; and a disengagement from the democratic process. Diagnosis is easy, though: what is the treatment?

We set out to write a short book proposing policy measures that would help, across all the major areas of domestic governance. The author team is made up of 17 academics, politicians, and people from the voluntary sector. Although different people took the lead for different policy areas, this is a collective document, not an edited volume. That’s a challenge of course, since we didn’t all agree either substantively or stylistically (for this reason, and because he did not want his ideas to be smoothed to averageness by collaboration, Beveridge published his report under his sole name, although he had consulted many people in the course of his deliberations). We persevered, and we have produced a text that has its own voice, not a series of different voices, or the voice of any one of us.

It was an, ahem, interesting process. We were very clear we wanted each section to outline a good policy for a particular area, explain why it is good, and say how it could be brought about; nothing else. Nonetheless, the academics amongst us just couldn’t quite resist spending pages and pages diagnosing the ideological and historical origins of the current ills; and the politicians amongst us could not quite resist pen-portraiting ourselves on factory floors, in hospitals and green fields, sharing the values, shaking the hands and kissing the babies of the electorate. These are the déformations professionnelles of our respective professions. But, with a lot of good-natured collaboration and a lot of last-minute hacking and editing, the text went to the publisher at the turn of 2024. The idea was for the book to launch at the Houses of Parliament in July, and be available over the summer when party manifestoes were being pondered ahead of an expected general election in October 2024 or so. That’s the plan that has just gone out the window.

We still hope the book will be of interest, even though the conversation it was designed to contribute to has been foreshortened. The UK is presumably about to get a Labour government, but their initial manifesto will be a slim, steady-the-ship document. What larger political settlement Labour will work towards in the years ahead is still to be determined. And, the new parliament may include more liberals and more Greens than the current one (we count the deputy leader of the Green party of England and Wales amongst our author team). There is still a chance for the political conversation to move into new territory after the election, and hence still a chance for Act Now! to be relevant.

I won’t say much about what the Act Now! programme is. I hope you will read the book and see what you think. There are not many jokes but it costs under a tenner. What I will say is that the measures are costed and feasible: we include amongst our team the former chief economist of the Institute for Public Policy Research. Moreover, we have done a lot of opinion polling, and the proposals seem to be popular. In all our public opinion research, we find a tremendous public appetite for quite far-reaching progressive change. Don’t let the newspapers tell you the electorate is socially conservative and basically committed to the status quo; that characterisation may well describe newspaper owners, but it does not describe the electorate. People want the hope of a better society in the future than in the past.

This is my first foray into these non-academic waters (teaser, not the last: look out for another volume early in 2025). I am delighted and surprised by the enthusiasm of the reaction to Act Now! so far. I don’t kid myself that our thinking is that original or our prose that magisterial. I think it just shows that many people share the same appetite that started us off on the journey: to identify a practical progressive vision for our times.

Philip Pettit of Princeton was kind enough to describe it as: ‘a coherent, radical, and feasible manifesto for government. Given the chance, it would ignite enthusiasm, win the young back to politics, and enable people to enjoy security and freedom in their life with one another and with the powers that be. It calls us back to a realistic image of the good society.’ For Will Snell of the Fairness Foundation, this is: ‘a genuinely radical and comprehensive plan to rebuild our society, economy and democracy from the ground up….unusual in that it combines a bold overarching vision with detailed, evidence-based policy proposals, and demonstrates that they are popular with the public. The question now is whether our politicians are prepared to listen.’ Peter Jones of Newcastle University sees ‘an inspiring, imaginative and radical vision for Britain’s future, equal in ambition to the challenges the country faces.’ And David Wilson of Birmingham City University urges you to ‘read it if you want to see what real pragmatic reforms could do and use it to remind yourself that there was a time when our politics wasn’t inert, ineffective and indolent.’

Former archbishop of Canterbury Rowan Williams gifted us a beautiful and uplifting introduction. In his view, ‘the proposals in this book are detailed and pragmatic, set out with careful attention to how they might be implemented and how they might be funded. These chapters are not an idealistic rant demanding some sort of total recalibration of how we live. But they are unmistakeably radical, in the sense that they interrogate what the political establishment of both left and right take for granted, what they think is achievable and acceptable….[They] look towards [a] shift in the imagination – the spirit – of Britain, returning repeatedly to that fundamental challenge of how we sustain a social order that does justice to the most humane, generous and grounded instincts of our communities.’ To have our intent heard in this way is a thousand feet above what Matthew Johnson and I could have hoped for, on an unsuccessful walk to Benwell Nature Park, Newcastle, in the rain, back at the outset of this process.

What about our authorial name, the Common Sense Policy Group? Names containing ‘common sense’ are often found badging groups of extreme libertarians who want to undermine the state, or social conservatives whose mission is to defend traditional social hierarchies. What business does a self-identified progressive group have describing their politics as ‘common sense’?

We did this on purpose. We’d like to reclaim the notion of common sense for people who want an ever better, fairer, more humane society for ever more of the people. The political right presents progressive ideas as theoretical, abstruse, ideological, as coming from foreign intellectuals with a tenuous relation to place, tradition or practicality. They invoke common sense in opposition to this, to justify current inequalities of resources and power, and bind us into grim understandings of the limits on what is possible. It may not be pretty, they tell you; but common sense tells you there is no alternative.

We take a different view. The present, concentrated, social-good-hostile political settlement is not the only way that human societies could or have lived. It’s not the most ‘natural’ way, nor the way that most faithfully reflects the moral and imaginative capacities of most people. It is one rather specific way of being a society, a way that was carefully constructed, justified and propagated quite recently by a small sector of the population who had enough power and saw it as in their interests to do so. The insistence that it is common sense, for example, that most of the growth of the UK economy goes to a small group of highly wealthy people, or that our rivers are full of sewage whilst our water companies pay dividends to their private owners, is an ideological move. It’s not common sense at all.

On the contrary, there is a case to be made that there exists, in humans, a common sense, or at least a commonly held set of moral sentiments, that is the wellspring of the progressive impulse. People at all times and places have cared about the least well off in society. They have been averse to certain types of inequality and motivated by fairness. They have had the sense that social relationships and institutions, not just individual resources, are important. They are happy to contribute to public goods as long as others also do so. Undoubtedly, they value their autonomy, freedom and privacy too; these are also part of their common sense. These widely-shared human sentiments provide us with an infrastructure for imagining visions of how societies can, not just be good enough, but become ever better as material abundance increases through technical progress.

In short, the Common Sense Policy Group wants to flip the flop on the invocation of ‘common sense’ in political conversation. We don’t agree that it is strange alien ideologies that give people the idea of a better, fairer, more equal society; nor that it is common sense that provides the grim reminder that we have to tolerate and normalise the current concentration of wealth, health and power. On the contrary, the defenders of the current concentrations reflect one ideological group in one place, at one time, motivated by some highly abstract conceptions of how things should work. It’s common sense that says: time to act.

The double dividend of safety

A guest blog in which Gillian Pepper states the obvious…..

A picture of Gillian Pepper of Northumbria University

Some time ago now, I was chatting with Daniel over lunch. I told him that Richard Brown and I were continuing to find evidence in support of a theoretical model that Daniel published over a decade ago. Daniel surprised me with his response. He declared that the conclusions of his model (which I will explain in a moment) were so obvious that it would be surprising if they weren’t true. He had a point. And yet, we continue to act as if the obvious weren’t obvious. Perhaps, Daniel and I agreed, our conclusions would need to be repeated numerous times and to many audiences before they can permeate the collective consciousness. As a starting point, Daniel invited me to write this guest blog.

The model of the “obvious”

Though Daniel’s original model contains various details and assumptions, the key points are as follows:

  1. We are all exposed to health risks which, no matter what we do, will reduce our life expectancies. That is, there are risks beyond our behavioural control. For example, short of never leaving our homes, we could never entirely eliminate our risk of death due to transport accidents. Daniel originally referred to this as extrinsic mortality risk, borrowing a construct from evolutionary biological models of senescence. We now call it uncontrollable mortality risk.
  2. Some people are exposed to greater overall risk than others, and some are less able to mitigate the risks they face. That is, there are inequalities in exposure to risk. Depending upon where in the world you happen to live, and what resources you have available to improve your safety, there are myriad uncontrollable risks that might affect you. If you’re unlucky, war, violence, natural disasters, or extremes of weather might be hazards you face on a regular basis. Or perhaps the risks you face might be less obvious issues, such as mould and damp in your home, a polluted neighbourhood, or flammable cladding on your building. Whilst these issues may seem controllable to a relatively affluent person, they can still be classified as uncontrollable for those who can’t afford to move to a better neighbourhood, or to make the necessary repairs to their housing.
  3. Uncontrollable risks reduce the future benefits of healthy behaviour. If there’s a non-zero chance that we will be struck down by an uncontrollable force before reaching an age at which the consequences of our lifestyle choices will be felt, then the temptation to indulge in short-term rewarding but long-term damaging behaviours, such as alcohol consumption, will be greater, especially when there is some benefit of that indulgence in the present (e.g. improved social bonding).
  4. There is also a trade-off: time, money, and effort spent on health cannot be spent on other things that matter to us. Daniel’s model examines varying strengths of trade-off but, in general, the idea is that efforts spent on taking care of our health conflict to some extent with other things that might be important to us. Anyone who has experienced sleep deprivation due to caring responsibilities or eaten unhealthy convenience food due to time pressures at work will readily understand such trade-offs.
  5. Consequently, exposure to uncontrollable risk should reduce our motivation towards healthy behaviour: investing effort in health means diverting it from other priorities when, regardless of our efforts, we might not live to see the long-term payoffs of taking better care of ourselves. This, I believe, is an unconscious driving force behind health motivation: one of a number of reasons (there will, of course, be other drivers too) that it can feel so difficult to do those things which we know would in some sense be better for our health.
  6. Finally, the model suggests there will be a compound effect of extrinsic risk and health behaviour. An important implication of this is that people who, through no fault of their own, can do little to control the risks they face, will be less motivated to take care of their health (mitigate the risks they can control) than those of us who are lucky enough to feel safe and in control of our lives. And this will make the gulf in their achieved life expectancy even wider than it would have been for structural reasons. Social disparities in health behaviour can thus be seen as a downstream consequence of structural inequalities, rather than whim or ignorance, as some might assume.
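To make the logic of points 3 to 6 concrete, here is a toy optimisation in Python. It is not Daniel’s published model: the square-root returns to health effort, the benefit value of 1.5 and all function names are assumptions of this sketch. A share of effort e (between 0 and 1) spent on health costs e units of present enjoyment, while the long-term payoff is only collected if we escape both the uncontrollable risk m and the controllable risk that e reduces.

```python
import math

BENEFIT = 1.5  # hypothetical value of reaching old age in good health


def expected_payoff(effort, m):
    """Toy payoff for spending a share `effort` (0..1) of our resources on health.

    Whatever is not spent on health is enjoyed now: (1 - effort).
    The long-term benefit is only collected if we escape the uncontrollable
    risk m AND the controllable risk; sqrt(effort) is an assumed, concave
    probability of escaping the controllable part.
    """
    return (1 - effort) + (1 - m) * math.sqrt(effort) * BENEFIT


def best_effort(m, steps=1000):
    """Grid search for the health effort that maximises the toy payoff."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda e: expected_payoff(e, m))


def survival_to_old_age(m):
    """Achieved survival when people choose their best effort for risk m."""
    return (1 - m) * math.sqrt(best_effort(m))


for m in (0.0, 0.2, 0.4, 0.6):
    print(f"uncontrollable risk {m:.1f}: best effort {best_effort(m):.3f}, "
          f"survival to old age {survival_to_old_age(m):.3f}")
```

In this toy, the best effort falls as m rises, and because people also invest less in their health, achieved survival falls faster than (1 - m) alone would imply: a crude version of the compound effect in point 6. For this particular payoff the optimum can also be found by calculus, e* = ((1 - m) × 1.5 / 2)²; the grid search is used only so the sketch does not lean on that closed form.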

To summarise the general idea: if you believed that, despite best efforts, you might die young due to war or natural disaster, would you worry much about whether you were eating enough fruits and vegetables? Probably not. And that was Daniel’s point. It would be rather surprising if people living in environments laden with threat were keen to quit smoking and forgo junk food. Nonetheless, we’ve dedicated a fair bit of time to testing this model.

We first tested the model by devising a measure of perceived uncontrollable mortality risk and assessing its relationship with self-reported health behaviour. When that study uncovered surprisingly large associations between perceived uncontrollable risk and health behaviour, we sought evidence of a causal relationship. We ran experiments designed to alter people’s levels of perceived control and measure their subsequent food choices. These found that people who were primed to feel that their personal risk levels were largely controllable were more likely to choose fruit than chocolate as a reward for taking part in the study. Richard Brown and I collected data during the COVID-19 pandemic to assess whether perceptions of uncontrollable risk had increased, and whether this was related to health behaviours in the UK (relatedly, we worked with Calvin Isch and colleagues to look at perceptions of uncontrollable risk in the USA). We found that perceived uncontrollable mortality risk had increased due to the pandemic and that it was associated with greater odds of smoking and lower odds of meeting Government guidelines on diet and exercise. More recently, Richard and I have published a replication and mini meta-analysis on the topic.

So, why all this effort to look for an association which would be puzzling if not present? Well, the answer is that the idea has some important implications. One of these implications is something I like to call the double dividend of safety.

The double dividend of safety

The idea of the double dividend of safety is simply that, if we make people safer by reducing those risks which they can’t avoid for themselves, we can expect that they will become more motivated to take care of their own health. So, we get the primary benefit of the initial improvement in safety, and the additional, secondary benefit of improved health from better health behaviour. That’s two benefits. A double dividend. If you think you’ve heard of the double dividend concept before, it may well be because you’ve encountered it in the context of environmental taxes. In this context, “double dividend” refers to the idea that environmental taxes should not only reduce pollution (the first dividend), but also reduce overall tax system costs if the revenue generated is used to displace other taxes that slow economic growth (the second dividend).
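The two dividends can be separated with a little arithmetic, reusing the same hypothetical toy payoff (square-root returns to health effort, benefit value 1.5; both are assumptions, not estimates from our studies). Cutting uncontrollable risk from 0.4 to 0.2 first raises survival with behaviour held fixed, and then raises it again once people re-optimise how much effort they put into their health.

```python
import math

BENEFIT = 1.5  # hypothetical value of reaching old age in good health


def optimal_effort(m):
    # Closed-form optimum of the toy payoff (1 - e) + (1 - m) * BENEFIT * sqrt(e)
    return min(1.0, ((1 - m) * BENEFIT / 2) ** 2)


def survival(m, effort):
    # Must escape the uncontrollable risk m and the controllable risk set by effort
    return (1 - m) * math.sqrt(effort)


def double_dividend(m_before, m_after):
    e_before = optimal_effort(m_before)
    e_after = optimal_effort(m_after)
    baseline = survival(m_before, e_before)
    # First dividend: a safer environment, with health behaviour unchanged
    first = survival(m_after, e_before) - baseline
    # Second dividend: people respond to the safer environment by investing more in health
    second = survival(m_after, e_after) - survival(m_after, e_before)
    return first, second


first, second = double_dividend(0.4, 0.2)
print(f"first dividend {first:.2f}, second dividend {second:.2f}")
```

With these made-up parameters it prints `first dividend 0.09, second dividend 0.12`, so in this toy the behavioural dividend is actually the larger of the two. Nothing here says how large it would be in any real population.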

Understanding the double dividend of safety (rather than environmental tax) is important for numerous reasons. Among them is the fact that public health goals are often approached in silos. Behaviour-change programmes tend to operate in isolation, with practitioners rarely able to address the wider problems affecting those whom they seek to serve. This is not news, of course. Healthcare leaders have pointed out the need to break down this siloed approach. However, the double dividend of safety gives us another reason to call for joined-up thinking.

The concept could also be used to “sell” safety. You might think this unnecessary. Isn’t the importance of safety another one of those things that should be blindingly obvious? However, in a recent conversation with a Campaigns Manager at a global safety charity, I was surprised to learn that it can be difficult to persuade those in power that safety is important. “Safety isn’t sexy”, he said. This came as a surprise to me, but perhaps it shouldn’t have. Those who have the power to make change for others, on average, probably don’t have much experience of being unsafe. As Daniel mentioned in a recent blog on inequality, when the ruling classes have so little contact with what the majority experience it becomes difficult for them to make decisions that work for the public good. Yet, it remains true that public health funds are spent on giving the general public information and tools (usually in the form of websites and apps) in attempts to improve health behaviour. For example, the UK Government’s Better Health Campaign, which purportedly cost £10m. Such efforts make it clear that there is a desire to improve health behaviour.

What if we were instead to shift our focus to making people safer? The double dividend of safety suggests that they would automatically be more motivated to take care of their health: a double win. Whilst this might initially seem like the harder (and probably more expensive) path to take, I’m willing to bet that it would also be the more gainful one in the long run.

Your study should not be like a mansion

Lately, I’ve been coming across a lot of proposed study designs that were like mansions. There I was, appreciating the well-proportioned main research questions and the generosity of the outcome measures, when a little door got opened up in the panelling, and it became evident there were whole wings beyond the part where I came in; wings with other measures, sometimes in wildly different styles, and objectives of their own, and additional treatments, and turrets and gargoyles and intervening variables. The wings were somehow connected to the hall I had entered by, making for one big, rambling complex. Yet, they were somehow separable, in that you could live for years in one without needing to go into the others. Indeed, you could imagine them being parcelled off into entirely separate flats.

Schloss Ringberg, Bavaria. Your study really should not be like this. If you want to know why, read about the lives of Friedrich Attenhuber and Duke Luitpold.

Your study should not be like a mansion. It should be more like a single room than a mansion. Your study should follow the principles of Bauhaus or Japanese minimalism. Clutter should be removed until rock bottom simplicity has been achieved; then the design should be decluttered all over again. The ambition should be repeatedly refined and made narrower. There should ideally be a single objective. Outcomes should be measured with the best available measure, and no others. Control variables should be designed out of existence where possible. Mediators and moderators – do you need them? Why? You haven’t answered the first question yet. The analysis strategy should have the aching simplicity of Arvo Pärt’s Spiegel im Spiegel. Anything that can be put off to another future study should be, leaving this one as clear and austere as humanly possible.

I am aware that I have always made my studies too complicated in the past, and I see the desire to do so almost without exception in the younger researchers I work with. I am wondering where the desire to over-complicate things comes from.

Part of it, I am sure, comes from the feeling that there is a potential upside to having more measures, and no cost. You’ve got the people there anyway, why not give them that extra personality questionnaire? Or stick in that extra measure of time perspective, or locus of control, or intolerance of uncertainty? The extra burden on them is small; and surely, if you have a superset of the things you first thought of, then you can find out all the things you first thought of, and maybe some more things as well.

We were taught to think this way by the twin miracles of multiple regression and the factorial experimental design. The first miracle meant, we thought, that we could put more predictors in our statistical model without undermining our ability to make estimates of the effects of the ones we already have. In fact, things might even get better. Our R² value would only go up with more ‘control’ variables, and our estimates would become more precise because we had soaked up more of the extraneous variance.

The second miracle meant, in an experimental study, that we could cross-factor an additional treatment with the first, without affecting our ability to see the effects of the existing one. Let’s do the thing we planned, but have half the participants do it in an inflatable paddling pool, or wearing noise-cancelling headsets. Our ability to detect the original effect will still be there when we average across this treatment. And we will know about the effects on our outcome of being in a paddling pool, to boot!

The truth is, though, that nothing comes for free. Cross-factoring another experimental treatment can make it difficult to say anything very generalizable about the effects of the original treatment. We wanted to know whether, in the world, caffeine improves memory performance, and we discover that whether it helps or hinders depends on whether you are standing in a paddling pool or not. But, in life, in the real world conditions where one might use caffeine to boost memory, one has not, as a rule, been asked to stand in a paddling pool. What then is the take home message?

As for the miracle of multiple regression, this is even more problematic. The idea that including some extra variable X2 in your regression leaves you still able to estimate the effects of X1 on Y in an unbiased way holds only in a subset of the possible cases, namely when X2 has an effect on Y but is not affected by X1, Y or any of their unmeasured consequences. It is very hard to be sure that these conditions apply to your study. This fact is not widely appreciated, with the consequence that whole swathes of the social and behavioural sciences include far too many variables in their regressions, including many that they should not (see here and here; I am looking at you, sociology, and you, epidemiology). Your thing does not become more true if you have controlled for more other things; it usually becomes more obscure. In fact, if you see it in a complex analysis with lots of additional covariates (especially if you see it only then), this increases the chances that it is in fact a statistical artifact (see here for a case study).
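To see how this can go wrong, here is a minimal simulation in Python. It is an illustration under an invented data-generating process, not an analysis from the papers linked above: X1 has no effect at all on Y, and X2 is a common consequence of both. Regressing Y on X1 alone recovers the true null; adding X2 as a ‘control’ manufactures an effect out of nothing. The helper `ols_coefs` is a hypothetical name introduced here.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Assumed world: X1 has NO causal effect on Y.
x1 = rng.normal(size=n)
y = rng.normal(size=n)
# X2 is caused by both X1 and Y (a 'collider'), so it should not be controlled for.
x2 = x1 + y + rng.normal(size=n)


def ols_coefs(outcome, *predictors):
    """Ordinary least squares; returns [intercept, b1, b2, ...]."""
    design = np.column_stack([np.ones_like(outcome), *predictors])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs


b_without = ols_coefs(y, x1)[1]   # estimate of X1's effect without X2
b_with = ols_coefs(y, x1, x2)[1]  # estimate of X1's effect 'controlling' for X2

print(f"effect of X1 without X2: {b_without:+.3f}")
print(f"effect of X1 with X2:    {b_with:+.3f}")
```

Under these assumptions the population coefficient on X1 after ‘controlling’ for X2 works out at -0.5, even though the true effect is zero.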

Another exacerbating factor is psychology’s obsession with identifying mediators. It’s all very well to show how to change some outcome, but what’s the mechanism by which your intervention works? Does it work by changing self-esteem, or locus of control, or stress? Again, we were taught we could answer mechanism questions at no cost to the integrity of our study by throwing in some potential mediating variables, and running a path analysis (where you run your regression model first without and then with the inclusion of the potential mediator, and compare results). But, again, with the exception of some special cases, doing this is bad. Not only does adding a mediator often lead to overestimation of the degree of mediation, it actually imperils your estimation of the thing you cared about in the first place, the average causal effect. There is a whole slew of papers on this topic (here, here and here), and they all come to the same conclusions. Don’t clutter your study with mediators in the first instance; they will probably confuse the picture. Identify your causal effect properly and simply. Answering further questions about mechanism will be hard and will probably require new studies (maybe whole careers) dedicated to just that. (Similar comments apply to moderators.)
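A similarly minimal sketch for the mediator case, again under an assumed data-generating process: X is randomised, X affects Y only through M, and an unmeasured variable U influences both M and Y. The simple regression of Y on X recovers the average causal effect. Adding M to the model yields a ‘direct effect’ that is not zero even though, by construction, it should be, so the apparent proportion mediated comes out above 100%. This is one illustrative case of the problem, not a summary of the cited papers; `ols_coefs` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

x = rng.integers(0, 2, size=n).astype(float)  # randomised treatment (0 or 1)
u = rng.normal(size=n)                         # unmeasured cause of both M and Y
m = x + u + rng.normal(size=n)                 # mediator
y = m + u + rng.normal(size=n)                 # X affects Y ONLY via M


def ols_coefs(outcome, *predictors):
    """Ordinary least squares; returns [intercept, b1, b2, ...]."""
    design = np.column_stack([np.ones_like(outcome), *predictors])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs


total = ols_coefs(y, x)[1]       # average causal effect of X (true value 1)
direct = ols_coefs(y, x, m)[1]   # 'direct' effect after adding M (true value 0)
share_mediated = (total - direct) / total

print(f"total effect:      {total:.2f}")
print(f"'direct' effect:   {direct:.2f}")
print(f"apparent mediated: {share_mediated:.0%}")
```

In this set-up the population ‘direct’ effect is -0.5, so the difference method reports about 150% mediation.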

What underlies the impulse to over-complicate, at root, is fear of being found insufficient. If I have only one predictor/manipulation and one outcome, how will I be judged? Can I still get published? Does it look too simple? What if the result is null? This is what I hate most about science’s artificial-scarcity-based, ‘significance’-biased, career-credentialing publication system. People feel they need a publishable unit, in a ‘good’ journal, which means they have to have a shiny result. They feel like they can increase their chances of getting one by putting more things in the study. This imperative trumps actual epistemic virtue.

So, complexity creeps in as a kind of bet-hedging against insecurity. Let’s add in that explicit measure of the outcome variable, as well as the implicit one. In fact, there are a couple of different explicit scales available: let’s have them both! That gives us lots of possibilities: the explicit measures might both work, but not the implicit one; or one of the explicit measures might look better than the other. There might even be an interaction: the treatment might affect the implicit measure  in participants who score low on the explicit measures – wouldn’t that be cool? (Answer: No). Even if the intervention does not work we might get a different paper validating the different available measures against one another. But the problem is that you can’t make a study which is at the same time an excellent validation study of some different measures of a construct, and also a test of a causal theory in that domain. It looks like a capacious mansion, but it’s just a draughty old house none of whose wings is really suitable to live in.

If you put in more objectives, more measures, and more possible statistical models, you are more likely to get a statistically significant result, by hook or by crook. This does not make the study better. We are drowning in statistically significant results: every paper in psychology (and there are a lot of papers) contains many of them. It’s not clear what they all mean, given the amount of theoretical wiggle room and multiple testing that went into their construction. Their profusion leads to a chaotic overfitting of the world with rococo ‘theories’ whose epistemic lessons are unclear.  We need fewer new significant results, and more simple and clear answers (even descriptive ones) to more straightforward questions. Your study could be the first step.
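The arithmetic behind ‘by hook or by crook’ is simple. Assuming every null hypothesis is true and the tests are independent (both simplifying assumptions), the chance of at least one p < 0.05 among k tests is 1 - 0.95^k. A minimal Python sketch:

```python
def prob_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one p < alpha among n_tests independent
    tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 10, 20, 50):
    print(f"{k:>2} tests: {prob_any_false_positive(k):.0%} chance of a 'finding'")
```

With 20 tests that is already about 64%; with 50, over 90%. Correlated tests change the exact figures but not the general lesson.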

Perhaps the main unappreciated virtue of simpler studies, though, is that they make the researcher’s life more pleasant and manageable. (Relatedly, an often overlooked benefit of open science is that it makes doing science so much more enjoyable for the researcher.) When you double the number of variables in a study, you increase the possible analyses you might conceivably run by at least a factor of eight, and perhaps more. Don’t tell me you will have the strength of character to not run them all, or that, having discovered one of those analyses gets a cute little significance star, you will not fret about how to reframe the study around it. You will spend months trying out all the different analyses and not be able to make your mind up. This will be stressful. You will dither between the many possible framings of the study you could now write. Your partner will forget what you look like. Your friends’ children will no longer be toddlers and will have PhDs and children of their own. Under socialism, data analysis will be simpler than seems even imaginable under the existing forces and relations of production. Until then, consider voluntary downsizing of your mansion.

Note. Some studies have a lot of measures by design. I am talking about ‘general purpose’ panel and cohort studies like NHANES, Understanding Society, the SOEP, and the UK National Child Development Study. Rather than being designed to answer a specific question, these were envisaged as a resource for a whole family of questions, and their datasets are used by many different researchers. They have thousands of variables. They have been brilliant resources for the human sciences. On the other hand, using them is full of epistemic hazard. Given the profusion of variables and possible analyses, and the large sample sizes, you have to think about what null-hypothesis significance testing could possibly mean, and maybe try a different approach. You should create a Ulyssean pact before you enter their territories, for example through pre-registering a limited set of analyses even though the data already exist, and pre-specifying smallest meaningful association strengths, rather than null hypotheses. Even in these studies, the designers are conscious of trying not to have too many alternative measures of the same thing. Still, it remains the case that a lot of what I say in this post does not really apply to the designers of those projects. Your study should not be like a mansion, unless it actually is a mansion.

The Changing Cost of Living study, part two: The dynamics of poverty, anxiety and depression


Readers may be aware of the Changing Cost of Living study, which we carried out in this team from Autumn 2022 to Autumn 2023.  I wrote an earlier post here explaining what the study was and why we did it. The study is now complete and the first paper is available here as a preprint.

Briefly, we were interested in how people’s mental health (notably for today’s purposes, anxiety and depression) related to their financial circumstances. Many, many previous studies have found that people with worse financial situations are more anxious and more depressed. So what could we possibly add?

The problem with the literature on income and mental health is not one of knowing whether there is an association; there is.  It is knowing how to interpret this causally. There are three possibilities here:

  1. Having a low income causes anxiety and depression.
  2. Having anxiety and depression causes people’s incomes to decline, because their symptoms interfere with their ability to work or their career progression.
  3. Some other variable both causes anxiety and depression and causes income to decline. Anxiety, depression and low income find themselves mingling because they are all here as consequences of something else, be it genes or schooling or exposure to lead paint.

Causal understandings matter in social science, because they orient you to the place you should be intervening if you want to make human life better. Roughly speaking, the stronger pathway 1 is, the more you are pointed toward redistribution and the relief of poverty; the stronger pathway 2 is, the more valuable it seems to improve treatment for mental health conditions as the first priority; and it is unclear where pathway 3 points, but it is probably somewhere else.

Teasing apart the three causal pathways is challenging. The particular way we wanted to do it in the Changing Cost of Living study was through an intensive longitudinal study, in which the same people’s financial situations, anxiety and depression were measured repeatedly and frequently over time. So, our participants (around 470 adults from the UK and France, towards the middle and lower end of the income distribution) filled in a financial diary every month, specifying all the sums that had come in during the previous month, and the main non-negotiable sums that had gone out (rent/mortgage, energy and water bills, local taxes). This allowed us to calculate an income-to-costs measure: what is the ratio of your incomings to your unavoidable outgoings? Over the period of the study, both incomes and particularly costs were changing a lot: this was a period of high inflation in things like energy bills, especially in the UK. The participants also filled in standard clinical measures of anxiety and depression every month.
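The income-to-costs measure is simple arithmetic. Here is a sketch of how one month’s diary entry might be turned into the ratio. The class and field names are hypothetical; the actual variable definitions and data processing are those described in the paper, not this code.

```python
from dataclasses import dataclass


@dataclass
class MonthlyDiary:
    """One month's diary entry (hypothetical field names, for illustration)."""
    total_income: float       # all sums that came in during the month
    rent_or_mortgage: float   # non-negotiable outgoings...
    energy_and_water: float
    local_taxes: float

    @property
    def essential_costs(self) -> float:
        return self.rent_or_mortgage + self.energy_and_water + self.local_taxes

    @property
    def income_to_costs(self) -> float:
        """Ratio of incomings to unavoidable outgoings."""
        if self.essential_costs <= 0:
            raise ValueError("essential costs must be positive")
        return self.total_income / self.essential_costs


entry = MonthlyDiary(total_income=1800, rent_or_mortgage=650,
                     energy_and_water=200, local_taxes=50)
print(entry.income_to_costs)  # 2.0
```

A ratio of 1 means income just covers the essential outgoings; 2 means it is twice as large.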

These data allow us to ask a question about each person’s average position, and a question about their fluctuations, as follows:

  • Average position: Were participants who, on average over the year, had lower income-to-costs ratios, more anxious and depressed on average over the year?
  • Fluctuations: Was a given participant more anxious and depressed than usual following a month in which their income-to-cost ratio was worse than usual?
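One common way to separate the two questions statistically is to split each person’s monthly predictor into their personal average (the between-person part) and that month’s deviation from it (the within-person part), and to enter both into the same regression. The sketch below does this on simulated data. The log2 scale (so that doubling the ratio is one unit), the slopes, and the noise levels are all assumptions chosen for illustration, and the paper’s actual models may differ (for example, by using multilevel models with random effects).

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_months = 2000, 12

# Hypothetical log2 income-to-costs ratios: a stable personal level plus monthly wobble.
usual = rng.normal(1.0, 0.6, size=(n_people, 1))
log2_ratio = usual + rng.normal(0.0, 0.3, size=(n_people, n_months))

# Between part: each person's own average. Within part: this month's deviation from it.
between = np.repeat(log2_ratio.mean(axis=1, keepdims=True), n_months, axis=1)
within = log2_ratio - between

# Simulated symptom scores with ASSUMED slopes: -4 per doubling between people,
# -1 per doubling within a person (a one-quarter ratio, chosen only for illustration).
symptoms = 10 - 4.0 * between - 1.0 * within + rng.normal(0.0, 2.0, size=between.shape)

X = np.column_stack([np.ones(symptoms.size), between.ravel(), within.ravel()])
(intercept, b_between, b_within), *_ = np.linalg.lstsq(X, symptoms.ravel(), rcond=None)

print(f"between-person slope: {b_between:.2f}")
print(f"within-person slope:  {b_within:.2f}")
```

Because the within-person deviations average to zero for each person, the two slopes are estimated separately: the between slope answers the average position question, and the within slope the fluctuation question.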

The average position question is the less interesting: I would have eaten my hat if there had been any other answer than ‘yes!’, and so indeed it proved.

The fluctuation question is interesting in the light of the three causal pathways outlined above. If pathway 3 is important and some third variable, genes or childhood experience or schooling, is responsible for the association between income and mental health, then there is really no reason for the fluctuations in income to be coupled to the fluctuations in mental health. So to the extent to which the answer to the fluctuations question is yes, this is probably picking up some kind of causal relation. (This is the relation known in social science as Granger causality, which is a kind of causality-lite. Really  getting at causality in social science is hard – what really causes behaviour, after all? – but if wiggles in one variable produce wiggles in the other, that seems causalish in the way that the association between having an expensive coffee machine and liking opera does not).

Does the answer to the fluctuation question tell us anything about the relative importance of pathways 1 and 2? We argue that it does, especially because we studied fluctuations over the short timescale of one month. The reasoning is as follows. Let us say that an increase in your anxiety and depression leads your income to decline. That’s probably not going to happen on the timescale of a single month. A worsening of your symptoms could lead you to need time off; to move to part-time working; or to miss out on promotions you otherwise would have got. But these are the kinds of effects that will take months or years to accumulate. If my symptoms suddenly get worse, that might well show up in a worse financial situation in a year’s time, or five years’ time. But my income is unlikely to drop a lot this very month, not least because most employers give you sick pay at least for a while. So, we argue, if my financial situation has been worse just this very month, and I feel more anxious and depressed this very month, the most parsimonious conclusion is that I feel anxious and depressed as an immediate result of the financial situation; i.e. pathway 1 is the primary driver. This argument is not completely water-tight, but I think it is reasonable. In other words, the size of the fluctuation in anxiety and depression when income-to-costs fluctuates is a lower bound estimate of the causal effect of finances on these outcomes.

That build up was rather long; what about some results? Figure 1 summarises the data.

Figure 1. Anxiety (panel A) and depression (panel B) scores for people with different ratios of income to essential costs on average over the year, shown in a typical month (black solid line), a month where the ratio was twice its usual value (dotted green line), and a month where the ratio was half its usual value (red dashed line).

The solid line shows the typical anxiety or depression score for someone whose income is only equal to their essential costs (1); for whom it is twice their essential costs (2); or 3 or 4 times. As you can see, the lower your average income-to-cost ratio, the greater your symptoms (this is the answer to the average position question, above). But the dotted and dashed lines show what happens when a person’s income-to-cost ratio fluctuates from their typical value: a month where things are twice as good as normal (green dots) or only half as good (red dashes). This is the answer to the fluctuation question: yes, in a month where your finances are less good than usual, your mental health is less good than usual too.

The within-person month-to-month associations are about one quarter the strength of the between-person differences. That means that if I double your income overnight, I will (extrapolating from our analysis) make your anxiety and depression immediately better; but I will only make up about 25% of the expected difference between you and someone whose income has always been twice yours. There are different ways to think about why this is the case, but the easiest one for me is the assumption that a lot of the bad things about poverty are cumulative: one month of shortfall is manageable, two months is harder, three months is worse, and so on. In this light, the fact that you can get as much as a quarter of the overall effect in a single month is striking.

Of course, conclusions about causality without performing a true randomized trial must be tentative. We present additional analyses and consider alternative interpretations in the paper. But I do feel these data, simple though they are, support an emerging (actually, re-emerging) view of anxiety and depression as conditions that should be conceived of socioecologically. That is, though the causality is undoubtedly complex, we should not rush to the brain of the depressed individual as the sole causal focus, still less to drugs (or for that matter cognitive behavioural therapy) as the only relevant types of possible intervention. That brain is responding to socioecological factors: the distribution of material resources in the environment, power, social support. Those are the main causes of the causes. If we understand and influence the distribution of those things in the population, our potential effect on population health is immense. Yes, the effect of an income fluctuation on depression was only a point or two on a scale; but the same is also true of the effects of antidepressant medications, whose efficacy is measured using similar scales, and which also have very modest effects on average. Multiply the odd point on the scale by the millions of people who find themselves in poverty, disenfranchised, and isolated, and you can see the vast potential for political decisions to improve (or exacerbate) human suffering.

The Changing Cost of Living study was a collaboration and I would like to thank my collaborators and the funders. And of course, the participants, who completed an average of over 10 financial diaries each and stuck with us in great numbers over the course of the year. The work was funded by the French Agence Nationale de la Recherche (ANR); the UK NIHR; the University of York Cost of Living Research Group; and the UK Prevention Research Partnership (MR/S037527/1) collaboration, ActEarly. For a full funding statement, see the paper.

Universal Basic Income already is a targeted system

A common response to our work on Universal Basic Income as an anti-poverty policy is the following: ‘Well, that’s going to cost a lot of money. Rather than giving money to everyone, including lots of people who don’t need it, it would be better to target all that money on the poorest people, who really need it. You will create a bigger anti-poverty effect that way’. There is a version of this argument here, for example.

Though this argument seems intuitively right, it’s actually not, oddly. UBI schemes of the kind we have advocated are in fact both universal (everyone gets them) and really well targeted at the poor. In fact, UBI schemes can be designed to have any precision and profile of social targeting that a policy-designer could imagine. In this post, I try to explain why.

The important thing to bear in mind is that the fiscal interaction between the state and the citizens is a two-way thing: there are both taxes (citizen to state) and transfers (state to citizen). When assessing how targeted a system is, you have to consider both of these: what is the net effect of the taxes and transfers on the income of each individual or household?

This means that although with a UBI, the transfer is universal, the net effect can be anything you like: you just set tax rates and thresholds appropriately. You can make it regressive, flat, progressive, targeted at the bottom 3%, targeted at the bottom 18%, or anything else you want to do.
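In illustrative notation (introduced here, not taken from the original post), let y be gross income, T_0(y) the current tax schedule, T_UBI(y) the tax schedule after the reform, and B the universal payment. Ignoring behavioural responses, the net change Δ(y) in someone’s income is the payment minus the change in their tax bill, so any desired profile Δ*(y) can be hit by choosing the new tax schedule to match:

```latex
\begin{aligned}
\Delta(y) &= B - \bigl[T_{\mathrm{UBI}}(y) - T_0(y)\bigr] \\
T_{\mathrm{UBI}}(y) &= T_0(y) + B - \Delta^{*}(y)
  \;\Longrightarrow\; \Delta(y) = B - \bigl[B - \Delta^{*}(y)\bigr] = \Delta^{*}(y)
\end{aligned}
```

Setting Δ*(y) equal to a payment for the bottom decile and zero elsewhere gives exactly the kind of targeted profile discussed next.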

In fact, here’s a theorem: the net effect of any non-universal benefit, for example a means-tested one, can be redesigned as a UBI with appropriate modification to the tax code. In the box below is a proof of this theorem (it’s really not complex). Here is the intuitive example. Let’s say you want just the bottom 10% of the income distribution to get £100 a week. You could make a transfer of £100 a week to the bottom 10%; or you could, with equivalent financial effect, give everyone £100 a week and claw an extra £100 a week back from the top 90% by changing their tax thresholds and rates.
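The £100 example can be checked numerically. The sketch below uses invented weekly incomes (lognormally distributed, purely for illustration) and compares a transfer made only to the bottom 10% with a universal £100 clawed back from everyone else through an extra tax. The function names are hypothetical.

```python
import numpy as np


def targeted_scheme(eligible: np.ndarray, payment: float):
    """Absent by default: only eligible households receive the payment."""
    transfers = np.where(eligible, payment, 0.0)
    extra_tax = np.zeros(eligible.shape)
    return transfers, extra_tax


def universal_scheme(eligible: np.ndarray, payment: float):
    """Present by default: everyone receives the payment, and it is
    clawed back through extra tax from everyone who is not eligible."""
    transfers = np.full(eligible.shape, payment)
    extra_tax = np.where(eligible, 0.0, payment)
    return transfers, extra_tax


rng = np.random.default_rng(0)
incomes = rng.lognormal(mean=6.0, sigma=0.6, size=1000)  # invented weekly incomes, £
eligible = incomes <= np.quantile(incomes, 0.10)          # the bottom 10%

results = {}
for name, scheme in (("targeted", targeted_scheme), ("universal", universal_scheme)):
    transfers, extra_tax = scheme(eligible, 100.0)
    results[name] = {
        "net_income": incomes + transfers - extra_tax,
        "gross_transfers": transfers.sum(),
        "net_cost": (transfers - extra_tax).sum(),
    }

same_outcome = np.allclose(results["targeted"]["net_income"],
                           results["universal"]["net_income"])
print("identical net incomes:", same_outcome)
for name, r in results.items():
    print(f"{name:>9}: gross transfers £{r['gross_transfers']:,.0f}, net cost £{r['net_cost']:,.0f}")
```

Both schemes leave every household with the same net income and cost the state the same net amount; what differs is only the gross flow of money in each direction.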

It follows that the right distinction to make is not between targeted transfer systems on the one hand and universal ones on the other. As we have seen, universal systems can be targeted. The right distinction is between ‘absent by default’ systems (the transfer is not made unless you actively apply for it), and ‘present by default’ systems (the transfer is made automatically to everyone). The question then becomes: why prefer a present-by-default system over an absent-by-default one? Why is it better, instead of giving £100 to 10% of the population, to give £100 to 100% of the population and then claw it back from 90% of them?

Actually, there are really good reasons. Absent-by-default schemes have a number of drawbacks. On the administrative side, you need an army of assessors and officers to examine people’s applications and try to keep track of their changing circumstances. But this is really hard: how can you tell how much need someone is in, or how sick they are? It means the state getting intrusively involved in people’s personal lives, and making judgements that are very difficult to make. On the user side, demonstrating eligibility is difficult and often humiliating. Even in terms of what they set out to achieve, absent-by-default systems fail. The point of the social safety net is to provide security and certainty. These are the things that people in adversity most need, and which help them make good decisions in life. Yet absent-by-default schemes like those that currently operate in most countries generate insecurity (rulings on eligibility can change at any time) and uncertainty (applicants don’t know if their application is going to succeed or be knocked back, or even when a decision will be made). And in an absent-by-default system, whatever support does arrive comes retrospectively, after a delay during which the application has been assessed. By then the person’s circumstances could have changed again, or they may have fallen into an even worse predicament of homelessness or debt, which will cost even more to sort out.

The other great drawback of absent-by-default systems is that they always generate perverse incentives. If you have to demonstrate unemployment in order to continue to qualify, then you have an incentive not to take some part-time work offered to you; if you have to demonstrate poverty, you have an incentive never to build up savings; and if you have to demonstrate ill health, you have an incentive to remain sick. It is very hard to avoid these perverse incentives in an absent-by-default system.  

Why do countries mostly have absent-by-default systems, when those systems have such obvious drawbacks? Sir William Beveridge was aware of their drawbacks back in the 1940s when he designed the UK’s current system. He favoured presence by default for child benefit and for pensions, and this has largely been maintained. He was against means testing for some of the reasons described above, but more means testing has crept into the UK system over the decades. He did however make more use of absence by default than a full UBI would. That’s because the economic situation seventy years ago was so different from the one we face now.  

Seventy years ago, people of working age tended to have single, stable jobs that paid them the same wage over time, and this wage was generally sufficient for their families to live on. The two circumstances where they needed the social safety net were cyclical unemployment, and inability to work due to illness or accident. These circumstances were rare, exceptional, and easy to detect: it is relatively easy to see if a factory has been shut down, or someone has broken a leg in an industrial accident.  

By contrast, today, many people have multiple concurrent or consecutive economic activities. The incomes from these fluctuate wildly and unpredictably, as in the gig economy or on zero-hours contracts. They are often insufficient to live on: 61% of working-age adults in poverty in the UK today live in a household where at least one person works. The situations of need that Beveridge’s systems were designed to respond to were rare and exceptional. In the UK and other affluent countries today, need is frequent, can crop up anywhere, and waxes and wanes over time. An absent-by-default system cannot keep up with, or even assess, situations like this in any kind of reasonable or efficient way. The perverse incentives also loom large, as people avoid taking on more hours or activities so as not to trigger withdrawal of benefits.

You might by now be convinced that a transfer system that is both universal and well targeted at the poor is logically possible. But I need to convince you that it is practically possible too. Figure 1 shows the net effect on people in different deciles of the income distribution of a starter Universal Basic Income scheme for the UK, as recently modelled by Reed and colleagues. The important thing about this scheme is that it is realistic and affordable. With just modest changes to the tax code, chiefly the abolition of the personal zero-tax earnings allowance, it is fiscally neutral. That means the government does not have to spend any more money than it already does, even in the short term, in order to bring the scheme in. As you can see, although everyone would get the payments, the net benefit would be by far the greatest for the poorest 10% of the population; there would be a smaller gain for everyone below the median; and there would be a net cost only for the richest, who would be net payers-in to an even greater extent than they already are.

Figure 1. Effects of the introduction of the starter Basic Income scheme on the incomes of households in different income deciles in the UK (percentage point change on the status quo). From Reed et al. (2023).   

This starter scheme would see a universal payment of £63 a week, about 70 euros. Sixty-three pounds does not seem like very much, but the scheme would still have a dramatic immediate impact on poverty. Using the conventional definition of poverty as 60% of the median income, the number of working age adults in poverty would fall by 23%, and children in poverty by 54%. The well-being impact would be larger than these figures imply, because people would have the predictability of a regular amount coming in each week that they knew would always be there. The long-run distributional consequences could be even more positive, as certainty and the lack of perverse incentives allow people towards the bottom of the income distribution to be more active and become more productive.
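The conventional relative poverty measure is easy to compute. This sketch, on invented incomes, shows why even a modest flat payment lowers the headcount: the median rises by the full payment, so the poverty line rises by only 60% of it, while everyone below the line gains the full amount. It deliberately ignores the tax changes, household equivalisation and housing costs that a real microsimulation such as Reed et al.’s would include, and it does not reproduce their figures. The function name is hypothetical.

```python
import numpy as np


def poverty_rate(incomes: np.ndarray, threshold_share: float = 0.6) -> float:
    """Share of units below threshold_share x median income (relative poverty)."""
    line = threshold_share * np.median(incomes)
    return float(np.mean(incomes < line))


rng = np.random.default_rng(3)
weekly_income = rng.lognormal(mean=6.0, sigma=0.7, size=10_000)  # invented incomes, £/week

before = poverty_rate(weekly_income)
after = poverty_rate(weekly_income + 63.0)  # flat payment only; no tax changes modelled
print(f"poverty rate before: {before:.1%}, after: {after:.1%}")
```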

It probably is that bad

The discipline of psychology is wringing its hands about its failure to make enough substantial and dependable scientific progress over the last fifty years of effort. First, we blamed our methods: hypothesizing after the results were known, researcher degrees of freedom, p-hacking and the rest. Then, we went after the theories: theories in psychology were so arbitrary, so vague in their relation to anything we could measure, so foundation-less, and so ambiguous, that tests of them were both much too easy, and much too hard. They were much too easy in that they could always be deemed a success. They were much too hard, in that the tests did not generally lead to substantive, cumulative, coherent knowledge. Reforms to the theory-building process were urged (here, here, here). Actually, these critiques were not new: Paul Meehl had made them floridly decades earlier. Writing in the last century, but in a parlance that would have been more at home in the century before, he compared the psychology researcher to: “a potent-but-sterile intellectual rake, who leaves in his merry path a long train of ravished maidens but no viable scientific offspring.”

I read all this with interest, but I remember thinking: “it can’t really be that bad.” I didn’t come across that many obviously terrible theories or terrible tests in my day to day life, or so I felt. I assumed that authors writing about the theory crisis – who obviously had a point in principle – were exaggerating how bad the situation was, for rhetorical effect.

Recently, Joanna Hale and her colleagues have made the important contribution of creating a database of theories in psychology (more specifically, theories that relate to behaviour change). All of the theories are represented in a common formal and graphical way. The database is here, and the paper describing its construction is here.

The database gives us a few useful things about each theory. First, a list of the constructs, the things like self-efficacy or self-esteem or normative motivation or health beliefs or whatever, which constitute its atomic elements. Second, a list of the relations between them (self-efficacy influences normative motivation, self-esteem is part of self-efficacy). And third, combining the first and second, a nice graphical representation, a kind of directed acyclic graph or DAG: which construct, according to the theory, does what to which other construct?

The genius of this database is that our array of theoretical tools (76 different theories, no less) is laid out before us on the bench in utter clarity. I have to say, my immediate reaction is: oh dear, it really is that bad.

Why do I say this? If you look at figure 1 I think you will see why. I chose this theory more or less at random; most are not much different.

Fig 1. Example representation of a theory, from the theory database. I chose it pretty much at random; they are nearly all like this.

The first and most obvious problem is that the theories contain many links, an average of 31 and a maximum of 89. And that is the direct connections. A connected network with 31 direct links probably has thousands of distinct indirect ways of getting from any upstream construct A to any downstream construct B. Some of these pathways will be mutually suppressive: A has a positive influence on B; but also a positive influence on M which has a negative influence on B. So what should be the empirical covariance of A and B, given that the directions of these associations are specified by the theory but their strengths are not? The theory is consistent with: positive (the direct pathway is dominant); negative (the indirect is dominant); or null (the two pathways cancel each other out). In short, pretty much any pattern of associations between non-adjacent constructs could probably be accommodated in the theory’s big, leaky tent. It’s generally unclear what the systemic effect will be of intervening at any point or in any direction. Moreover, with 31 links and null hypothesis significance testing at p < 0.05, something is definitely going to be associated with something else; there will always be statistically significant results to discuss, though their actual significance will be unclear.
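To make the ‘leaky tent’ concrete, here is a small sketch that enumerates every directed path between two constructs in a signed graph and multiplies the signs along each path. The six-link ‘theory’ is invented for illustration, not taken from the database, and the code assumes the graph is acyclic, as the text describes the diagrams. The function name `signed_paths` is hypothetical.

```python
from collections import defaultdict

# A made-up six-link 'theory': (cause, effect, sign of influence).
EDGES = [
    ("A", "B", +1),
    ("A", "M", +1),
    ("M", "B", -1),
    ("A", "N", +1),
    ("N", "M", +1),
    ("N", "B", +1),
]


def signed_paths(edges, source, target):
    """Return every directed path from source to target as (path, sign),
    where sign is the product of the edge signs along the path.
    Assumes the graph is acyclic; a cycle would cause unbounded recursion."""
    children = defaultdict(list)
    for cause, effect, sign in edges:
        children[cause].append((effect, sign))

    found = []

    def walk(node, path, sign):
        if node == target:
            found.append((path, sign))
            return
        for nxt, edge_sign in children[node]:
            walk(nxt, path + [nxt], sign * edge_sign)

    walk(source, [source], 1)
    return found


paths = signed_paths(EDGES, "A", "B")
for path, sign in paths:
    print(" -> ".join(path), "(+)" if sign > 0 else "(-)")

n_positive = sum(1 for _, sign in paths if sign > 0)
n_negative = len(paths) - n_positive
print(f"{n_positive} positive and {n_negative} negative routes from A to B")
```

Even in this toy, two routes from A to B are positive and two are negative, so the theory predicts no particular sign for their overall association unless the strengths of the links are specified.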

The multiplicity of links is an obvious problem that hides, I think, the much more fundamental one. Psychology’s problem is really one of ontology. In other words, what should be our atoms and molecules? What is in our periodic table? What is the set of entities that we can put into boxes to draw arrows between; that we can fondly imagine entering into causal relationships with other entities, and making people do things in the world?

In the 76 theories, there were 1290 unique constructs. Even allowing for fuzzy matching of names, 80% of those constructs only appeared in a single theory. No construct appeared in all the theories. Only ‘behaviour’ and ‘social’ appeared in more than half the theories, and those are hardly tightly defined. It’s like having 76 theories of chemistry, 80% of which name completely unique types of building block (mine’s got phlogiston!), and which contain no type of building block common to them all.

The fact that we lack a stable ontology is really what makes our results so hard to interpret. Let’s take the theory known as ‘the systems model of behaviour change’ (figure 2). The theory distinguishes between (inter alia): the perception of self; the perception of social influence; attitudes; motivation to comply; and health intention. These constructs are all supposed to enter into causal relations with one another.

Figure 2. The database diagram for the Systems Model of Behaviour Change.

Suppose we measure any two of these, say motivation to comply and health intention. We find they correlate significantly, at say r = 0.4. At least four things could be going on: (1) the theory is confirmed, and motivation to comply influences health intention; (2) our measurement of at least one of the constructs is impure: one of the questions that we think of as measuring motivation to comply is really a bit reflective of health intention too, so the correlation is a methodological artifact; (3) motivation to comply and health intention are really the same thing, measured in two noisy ways using different questions; (4) neither motivation to comply nor health intention is really a thing, so the correlation is meaningless.

The difficulty, it seems to me, is that the measurement issues cannot be solved whilst the ontological ones are still live (and, probably, vice versa). If we are not sure that X and Y are really two things, then we never know how to interpret the covariance of their measurements. Maybe we haven’t measured them purely; maybe we shouldn’t be measuring them separately; or maybe we have learned something about the structure of the world. None of the various criteria of measurement reliability or validity that circulate in psychology really helps that much. This criticism is most obviously applicable to correlational research, but it affects experimental manipulations too.

The ontological problem is really hard. You might think you can purify your ontology, get it down to bedrock, by eliminating all entities that could be redescribed in more atomic terms. But this approach cannot be right. Taken literally, it would require us to remove from our ontologies all kinds of very useful things: atoms, elements, organisms, and genes, for example. There would be no science but subatomic physics, and it would take forever to get anywhere. No, there are all kinds of things we want to hang onto even though we know they can be eliminated by redescription.

The criterion for hanging on to an ontological category has to be much looser. Something like this: it reliably shows up for different observers; it is an aspect of a stable level of physical or biological organisation; and it proves itself useful and generative of understanding. Though this is far from clear-cut, most of psychology’s ontological cabinet probably does not comply. In fact, who knows where the 1290 constructs in the database come from? Probably a combination of ordinary language, folk psychology, loose analogies, academic carpet-bagging, and random tradition. That’s really our problem.

There is no simple answer to this difficulty. The much-advocated ‘doing more formal modelling’ is not a solution if the entities whose relations are being formally modelled are flaky. Some ontologies are better than others. Generally the better ones (i.e. more stable, and leading to clearer and more accurate predictions) are more rooted in either biology (that is, either neuroscience or evolutionary biology), or in computational frameworks that have proved themselves predictive in a detailed way on lower-level problems (I am thinking of Bayesian models of cognition and their social applications, as discussed here). But, for many parts of ‘whole-person’ psychology, scientific performance in some coefficient of determination sense is not enough. We also want to maximise intuitive gain. The theoretical terms have to generate some insight, not least in the very people whose behaviour is their subject. Some ontologies probably do more for intuitive gain, others for biological realism, and it is hard to find the middle way.

One thing is for sure: we don’t need any more than our (at least) 1290 constructs. Perhaps there ought to be a global ontological non-proliferation treaty. I imagine a permanent conference, domiciled in St. Kitts or Geneva, where teams quietly work towards voluntary reduction in psychology’s ontological stockpiles. Volunteers?

Phoebe and the paradox of tragedy

All summer long, our little cat Phoebe spent much of her time squeezed onto the kitchen window bar, gazing out into the garden. How sweet, I thought, she is looking out on the sunshine and flowers. Coming home and encountering her yet again in position, we would say ‘she must really like sitting there’.

Weeks went by and it began to unsettle me. This is starting to seem obsessive; she has become like Edward in Harold Pinter’s A Slight Ache, staring up the lane from the scullery to see if the silent match-seller is in sight. When I came down to make the morning tea, finding her unresponsive, curled up in her bed, gradually went from being charming to worrying. Has she worn herself out with her vigil? What if looking out all day and night is actually a horrific and exhausting chore? What if it is making her ill? Eventually, in an insomniac moment, I could not stop myself descending at 3am, to discover the silhouette of Phoebe by moonlight, at her post, peering fixedly into the gloom.

We took her to the country for a couple of weeks. My anguish heightened a notch when, on returning and being released from her cat carrier, she sprinted downstairs to her post, where the window now had a greasy streak from her pressed nose, and stuck there.

A little later, the evidence became incontrovertible. Two huge tom cats were having a territory war in the street outside; one or other would patrol past every day or two. It was appalling for her to spy one, brazenly milling outside (female cats outside oestrus – and Phoebe is neutered – have no interest in sex; but they are worried about competition and violence). She would fill the kitchen with low moans, her tail bushed, almost foaming at the lips in her terror, but could not pull her gaze away. It took her many minutes to calm down. Phoebe didn’t like looking out from her post; she could not help it. The garden was not a scene of beauty; it was a site of mesmerising threat.

Phoebe and her problems remind me of David Hume’s essay Of tragedy. Why do people pay attention to representations – tragedies, horror movies, dark paintings – whose emotional effects include, prominently, negative emotions such as fear, anxiety and disgust? As you can imagine there is a substantial literature on this problem (see here for example). All of the many offered solutions seem to me special pleading or failures to really resolve it. We admire the beauty of the construction (Hume’s answer). We enjoy the moral evaluation that our own negative feelings are justified, or enjoy seeing the baddie get their comeuppance. Or, negative emotions produce arousal and we like arousal (even if negative in source) when we have nothing much else to do. Or, the negative emotions in question are not really negative but counter-evidentially positive, maybe because you have distance from or control over them, or because you frame them a certain way. And so on. But the point is, Phoebe spent her summer on the window bar without seeing any baddies get their comeuppance; without admiring any artistry; without being paradoxically ennobled; without having any detachment or control; certainly without moral vindication. Yet, there she was.

Whether the paradox of tragedy is even a paradox rather depends on your underlying model of how minds work. Intuitively, humans adopt the naïve utility calculus as their working model of the mind. That is, we assume that people (or cats) do actions because they like them; their liking is what makes them do the actions. Hence, the paradox: if they are doing things that seem to make them feel awful, those things must not really make them feel awful (or they wouldn’t be doing them); there must be some convoluted way in which they actually like those things, all evidence to the contrary. Thence all the various epicycles to explain how this could be the case, to square the evidence of our senses with the obvious violation of the naïve utility principle.  

But the naïve utility calculus is an everyday working model – a folk theory – not a good scientific account of how minds actually work. Minds are bundles of evolved mechanisms, mechanisms that generate attention and motivation to certain categories of things: conspecific and allospecific threats, potential food, potential allies, potential shelter, and so on. We don’t attend to those things for some higher-order reason such as we like them, or we estimate that we will have greater hedonic utility if we attend to them. We attend to them because they capture our attention, given the design of the mental mechanisms we have. As John Tooby and Leda Cosmides argued, humans are best conceived of as adaptation executors, not maximizers of fitness, utility, or pleasure. A fortiori they would apply this to cats too. From this point of view, there is no paradox whatever about Phoebe allocating all her time to a vigil that made her feel dreadful. Her mind was telling her she needed to keep an eye on that stuff, like it or not. Even in humans, it’s really, really difficult to switch off mechanisms when they are doing their thing, even though we have enough self-reflection to understand sometimes that we are self-harming in the process. Think of behavioural addictions, or devastating unrequited love.

How does this help with the paradox of tragedy in art? For one thing it shows that if you base your philosophy and psychology on a folk theory of the mind, you will generate apparent scientific puzzles that have no solution (other than: don’t start from there!). As my colleagues Edgar Dubourg and Nicolas Baumard have argued, artistic representations are cultural technologies. They are deliberately made by producers in order to capture the attention of consumers, just like you would make a shoe to protect a foot. Those producers understand something about the minds of audiences, like cobblers understand something about the anatomy of feet. So, naturally, the producers include ingredients that are good at causing mental adaptations in the audience to allocate attention: they include predators and rivals, love objects and moral regularities, places of shelter and places to flee, and so on. (There is a TEDX talk in French by Edgar on his work here.)

Producers typically combine a range of these ingredients, to keep up the interest and lessen habituation. Thus, there is a very large set of possible genres and sub-genres with different mixes of ingredients. But there is no requirement that artistic representations have to be positive in some general affective sense; they just have to succeed at making other minds pay attention. For humans, they have other requirements too, in order to endure. They have to hang together in some kind of plausible way, and the most durable ones have to repay cognitive reflection and communication. Artistic representations can be co-opted for other purposes such as teaching or the creation of coalitions. But those things are not their functions, and, to the point, there is no particular paradox if they incite mainly negative emotion rather than mainly positive emotion.

Ludwig Wittgenstein wrote that if a lion could speak, we would not be able to understand it. I think the point is that what is relevant and attention-grabbing to a creature depends on what kind of mind that creature has; and that in turn depends, in Wittgenstein’s words, on the creature’s ‘form of life’. In more contemporary parlance, we would call this the ecology within which the creature’s mental mechanisms have evolved and developed. An unmoving tom cat sitting on the pavement is not my idea of an attention-grabbing spectacle (though come to think of it, isn’t there a late Samuel Beckett play that is more or less that?). Who knows how captivating, how nuanced, how dreadful, it was to Phoebe? If cats made art, maybe that is the art they would make.

October update: Autumn has come, the tom cats have gone, and Phoebe has left her position on the window bar.

Does greater inequality cause worse health? No! And: kind of yes!

The question of whether greater economic inequality makes people’s health and wellbeing worse is an important one. The literature has been moving fast over recent years, and the debate has moved on somewhat since my previous essay.

It can all get a bit technical and econometric at times. The questions most people care about are: (1) is the relationship between inequality and health really a cause and effect one?; and: (2) if we make our country more equal, for example by increasing benefits or redistributing from the rich to the poor, will we improve health? Somewhat paradoxically, I am going to answer (from my current understanding of the literature): ‘kind of no’ to the first question; and ‘yes’ to the second.

First, let’s look at the very brief history.

Act One:

In which a slew of papers appears, showing that countries, or US states, with greater inequality in incomes had lower life expectancy, worse health and mental wellbeing, and a host of other poor outcomes like lower trust and higher crime, when compared to countries or states with lower inequality. Inequality here was measured as the dispersion of the distribution of incomes, typically captured by the Gini coefficient; and many of these studies controlled for the median per capita income of the country. It’s not how rich you are, the argument went, it’s how big the gap is within your society. Big gap is bad, above and beyond how much money people have in absolute terms. The finding of the Gini-bad health correlation was sufficiently recurrent as to produce claims that this was a general finding about humans and what they need to be healthy.
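For readers who have not met it, the Gini coefficient can be computed as the mean absolute difference between all pairs of incomes, divided by twice the mean income. A value of 0 means everyone has the same income; values near 1 mean one household has nearly everything. Here is a minimal sketch; the income lists are made up for illustration.

```python
from itertools import product


def gini(incomes: list[float]) -> float:
    """Gini coefficient: mean absolute difference over all ordered pairs,
    divided by twice the mean. O(n^2), fine for small illustrative lists."""
    n = len(incomes)
    mean = sum(incomes) / n
    total_abs_diff = sum(abs(a - b) for a, b in product(incomes, repeat=2))
    return total_abs_diff / (2 * n * n * mean)


print(gini([50_000] * 4))                  # perfect equality
print(gini([0, 0, 0, 200_000]))            # one household has everything
print(gini([40_000, 50_000, 60_000]))      # modest spread
print(gini([20_000, 50_000, 80_000]))      # same median, wider spread
```

With this definition the maximum for n households is (n − 1)/n, which is why the second example comes out at 0.75 rather than 1. The last two lists share a median of $50,000 but differ in dispersion, which is exactly the contrast that controlling for median income leaves untouched.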

Act Two:

In which a medley of articles, mostly by economists, argues that the correlation between income inequality and average wellbeing is a kind of statistical artefact. When inequality is greater, the poorest people within that society are also poorer. If you think about it, it must be true that, other things being equal, when the inequality is greater, the poor are poorer. Imagine you have two countries both with median incomes of $50,000, one of which is more equal and the other more unequal. Visualize them as two distributions of incomes, distributions whose medians are in the same place. The poorest people in the unequal one must be poorer in absolute terms (further to the left) than the poorest people in the equal one. That’s just what it means for it to be a more dispersed distribution. Now, what’s really bad for health and wellbeing is being poor, and what is more, it’s a non-linear relationship. At a certain point, being a bit poorer makes your health a lot worse. So if the poorest people in society X are a bit poorer, their health is a lot worse, and hence the average health of the whole population is lower. (In a more unequal society, the richest people are also richer than the rich in a more equal society; but beyond a certain point, being richer does not increase your health much, so the positive effect of greater inequality on health – via the rich getting even healthier – is statistically small).
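The Act Two argument is, in effect, Jensen’s inequality. Here is a minimal sketch, assuming a hypothetical saturating health function, 1 − exp(−income / 25,000), that is steep at low incomes and nearly flat at high ones, and two made-up three-person countries with the same $50,000 median. Health depends only on each person’s own income, yet the more unequal country has lower average health.

```python
import math
import statistics


def health(income: float) -> float:
    """Hypothetical concave health score in [0, 1): rises steeply at low
    incomes and flattens out at high incomes. Illustrative only."""
    return 1 - math.exp(-income / 25_000)


# Made-up countries with identical medians but different spreads.
countries = {
    "equal": [40_000, 50_000, 60_000],
    "unequal": [20_000, 50_000, 80_000],
}

mean_health = {}
for name, incomes in countries.items():
    mean_health[name] = statistics.mean(health(x) for x in incomes)
    print(f"{name}: median income {statistics.median(incomes):,}, "
          f"mean health {mean_health[name]:.3f}")

# The poorest lose far more health than the richest gain.
print(f"poorest: {health(20_000) - health(40_000):+.3f}, "
      f"richest: {health(80_000) - health(60_000):+.3f}")
```

Someone on $50,000 has exactly the same health in both countries; the whole gap in average health comes from the different position of the poorest, with the rich gaining too little to offset it.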

Controlling for the median incomes of the two countries does not eliminate the confound between the income of the poorest people and inequality: in my example, the median incomes of the two societies are the same. Thus, a series of studies argued that the correlation between countries’ Ginis and measures of aggregate health or wellbeing mostly (though perhaps not entirely) comes down to the poorest people in the more unequal countries having lower individual incomes. Tom Dickins and I recently published an example of the genre, using data from 28 European countries. We showed that the association between the Gini coefficient and average health or life satisfaction is greatly attenuated once you control for individual income.
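The “control for individual income” step can be illustrated with simulated data. This is a minimal sketch, not the model used in the paper, and it rests on several assumptions: NumPy is available; 30 hypothetical countries have lognormal incomes with the same median but different spreads; health is a made-up concave function of individual income and nothing else; and the estimates are plain OLS via `numpy.linalg.lstsq`. On its own, the country Gini gets a clearly negative coefficient. Once the non-linear income term is added, that coefficient shrinks towards zero.

```python
import math

import numpy as np

rng = np.random.default_rng(2024)

n_countries, n_people = 30, 2_000
median_income = 30_000  # same median in every hypothetical country

incomes, ginis = [], []
for sigma in np.linspace(0.3, 0.9, n_countries):
    # Lognormal incomes: median exp(mu) is fixed, only the spread varies.
    incomes.append(rng.lognormal(np.log(median_income), sigma, n_people))
    # The Gini coefficient of a lognormal distribution is erf(sigma / 2).
    ginis.append(np.full(n_people, math.erf(sigma / 2)))

income = np.concatenate(incomes)
gini = np.concatenate(ginis)

# Hypothetical concave health function: depends only on own income.
health = 1 - 10_000 / income + rng.normal(0, 0.1, income.size)


def ols(y, *columns):
    """Least-squares coefficients for y on the given columns, intercept first."""
    X = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


gini_only = ols(health, gini)[1]
with_income = ols(health, gini, 1 / income)[1]
print(f"Gini coefficient, no income control:   {gini_only:+.3f}")
print(f"Gini coefficient, with income control: {with_income:+.3f}")
```

Because the income term here matches the simulated health function exactly, the attenuation is essentially complete. With real data the functional form has to be estimated, which is one reason “most (though perhaps not entirely)” is the honest summary.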

The difference between the poverty-pushers and the inequality-urgers is very subtle: in practice, both think it would be better for health in countries like the UK and USA if economic resources were redistributed. The real difference between their claims only becomes clear when you examine some wildly improbable counterfactual scenarios. Imagine an economic windfall that doubled the real incomes of everyone in the bottom half of the income distribution, and trebled the real incomes of everyone in the top half. Do you think average health would get better or worse? For a poverty-pusher, the answer is better, because everyone’s incomes have gone up, including big income increases for the people currently facing poverty. For a purist inequality-urger, the answer is worse, because it is the gap that matters per se. We seem unlikely to get to see the results of that experiment any time soon. In the meantime, both camps agree that bringing the incomes of the worst-off closer to the median of the distribution in countries like the US and UK is a good goal: it would reduce both poverty and inequality. Both camps also agree that taxing wealth or very high incomes is a reasonable way to do this: for the inequality-urgers, that’s a good in itself, because it reduces the gap. For poverty-pushers, it’s because the rich can most afford to contribute more without suffering any meaningful decline in their well-being.
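The windfall thought experiment is easy to check with numbers. Here is a minimal sketch using a made-up six-household economy and the standard pairwise formula for the Gini coefficient. Everyone ends up richer, the poorest household’s income doubles, and yet the Gini rises, so the two camps’ predictions really do come apart.

```python
from itertools import product


def gini(incomes):
    """Mean absolute difference over ordered pairs, divided by twice the mean."""
    n = len(incomes)
    mean = sum(incomes) / n
    return sum(abs(a - b) for a, b in product(incomes, repeat=2)) / (2 * n * n * mean)


# Made-up six-household economy, sorted from poorest to richest.
before = [10_000, 20_000, 30_000, 40_000, 60_000, 100_000]
half = len(before) // 2
# Hypothetical windfall: bottom half doubled, top half trebled.
after = [x * 2 for x in before[:half]] + [x * 3 for x in before[half:]]

print(f"poorest income: {before[0]:,} -> {after[0]:,}")
print(f"Gini: {gini(before):.3f} -> {gini(after):.3f}")
```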

Act Three:

In which social psychologists repeat the search for correlations between the Gini coefficient and subjective measures of health and wellbeing. They improve on the work of Act One by using larger data sets, often including data over time rather than just a single cross-section. They tend to focus on inequality over smaller areas, such as US counties, rather than larger ones, such as US states or countries. This is, they argue, a double virtue. There are thousands of US counties, and only 50 US states, so you have much more statistical power when you use the smaller unit. Plus, people are really bad at knowing anything about the inequality of their entire country. They spend most of their lives moving around the smaller place where they live, and meeting the other people who live there. The Gini coefficient of a whole country is unlikely to be related to anything in people’s actual lived experience of inequality; the Gini coefficient of their town or county just might be. So it’s a better test of the causal potency of inequality to affect people if you use the local-scale measure. And, in fact, when people on low incomes move to rich neighbourhoods, thereby increasing the inequality they experience in their daily lives, their health and wellbeing improve rather than get worse.

The Act Three studies conclude that there is no consistent relationship between inequality, as measured by the Gini coefficient, and happiness or health (for example here and here). Their studies are big and they are pretty firm about this. Their argument, note, is different from the Act Two guys. The Act Two guys said that there was a relationship, but it was largely explained away by individual income. The Act Three guys said there was no relationship to explain away in the first place.

And so?

And so, dear reader, where is the hypothesis that more inequality is bad for health, as they say, at?

First, I don’t think the current evidence supports the contention that the magnitude of the gap is in itself directly harmful to human health to any substantial extent. The primary grounds for saying this come from the Act Two studies: when you control for the curvilinear effects of individual income, most (though maybe not absolutely all) of the association between inequality and health goes away. Inequality is associated with poor population health because, when the inequality is bigger, a greater fraction of the people face material scarcity. But it is material scarcity that actually puts the causal boot in at the individual level. Concretely, the proximal reason people living in poverty in Tennessee have terrible health is not that the difference between their incomes and those of millionaires elsewhere in the state is too big. It’s that their incomes are not sufficient to live well on, given the society they live in. (I put this last rider in because I always get the response: yes, but their incomes are higher than those of the rich of yesteryear. Well, maybe, but a lot of things cost more now than they did in yesteryear, including some really important things like food and access to healthcare.)

The secondary grounds for saying that inequality is not in itself the causal agent of harm are that, when you measure the size of the income gap at the local-area scale, like the town or county, it seems to explain no variation in health outcomes (see Act Three). But the local-area scale is the one at which people are most likely to actually experience inequality in their lives. It’s odd in a way. When you measure inequality at a huge scale where people would be unlikely to be able to actually detect it (the country), you find associations. When you measure it at a scale closer to their lived experience, those associations are absent. This does rather support the view that it can’t be the inequality per se that is the causal force at the proximal level. (By the way, I think the reason the associations hold better at the large scale of countries than at the small scale of cities or counties is that the former contain a broader range of incomes, and the effect is largely mediated by individual income, as per Act Two.)

However, despite saying that inequality is not an important direct influence on health at the individual level, I do think that if we reduced inequality in developed countries, population health would improve. I am pretty much as sure of this as I am of anything in social science. This is simply because when you change inequality, you change the distribution of individual incomes. Specifically, you raise the incomes of the poor, for whom it will make a vast difference in health and wellbeing; and slightly reduce the incomes of the rich, who will scarcely feel it. So, the total amount of well-being goes up. (An important corollary of my position is that raising the incomes of the poor would improve population health whether or not it reduced the wealth of the rich, that is, regardless of its impact on the Gini. It’s just that, as it happens, the best levers we have for improving the incomes of the poor will also reduce the Gini.)
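The arithmetic behind “the total amount of well-being goes up” is again the curvature of the income-health relationship. Here is a minimal sketch, assuming a hypothetical saturating health function (illustrative, not estimated from data), a made-up four-household economy, and a transfer of $5,000 from the richest household to the poorest. A second scenario simply tops up the poorest household without taxing anyone, matching the corollary that raising the incomes of the poor helps regardless of what happens to the Gini.

```python
import math


def health(income: float) -> float:
    """Hypothetical concave health score: steep at low incomes, flat at high."""
    return 1 - math.exp(-income / 25_000)


def total_health(incomes):
    """Sum of the hypothetical health scores across households."""
    return sum(health(x) for x in incomes)


before = [12_000, 30_000, 60_000, 150_000]  # made-up households
transfer = 5_000
# Redistribution: richest gives $5,000 to poorest; total income unchanged.
after = [before[0] + transfer, *before[1:3], before[3] - transfer]
# Top-up only: poorest gains $5,000, nobody else changes.
top_up_only = [before[0] + transfer, *before[1:]]

print(f"before:        {total_health(before):.4f}")
print(f"redistributed: {total_health(after):.4f}")
print(f"top-up only:   {total_health(top_up_only):.4f}")
```

Under this curve, the poorest household gains roughly two hundred times as much health as the richest loses, which is the sense in which the rich “will scarcely feel it”.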

So: is the association between (country-level) inequality and population health causal, or not? Here, you have to say ‘it depends what you mean by causal’. On one view of causality, the one on which we say, for example, that AIDS is caused by HIV, the answer is no: I don’t think we have identified, in the Gini coefficient, a pathogenic agent that causes the individual-level harm in such a way as to satisfy Koch’s postulates. On the other hand, what people generally care about when they talk about cause is something like: would it regularly make a difference to health if we reduced the inequality of the income distribution? On this view of causality, which is sometimes referred to as an interventionist or manipulationist view, I would have to say yes. Across the range of conditions that presently exist in developed countries, available interventions that reduced inequality would generally (unless they had some weird negative by-products, like causing a famine or a war) improve population health and wellbeing, possibly by a lot. Sorry if that’s rather a philosopher’s conclusion, but it seems to make sense of the conflicting literature.

There’s one more thing to say about the size of the gap in society. It may not per se have much effect on most people’s wellbeing. But I’ve been persuaded, notably by Darren McGarvey’s book The Social Distance Between Us, that it could have a big effect on the wisdom of our leaders. Broadly speaking, when social gaps are big, the people in power make worse decisions, and the people not in power are less able to hold them to account. This is because the ruling caste has so little contact with what the rest of the people are actually experiencing, and vice versa, that it is almost impossible for them to make appropriate decisions that work for the public good. And their constituents become disengaged, which means less public deliberation and input into the processes that are supposed to make the country better. The consequences of this gulf between the imaginary world that politicians are making policy for and the actual world of people’s lived experience are so evident that I scarcely need provide examples (consider the UK of the last twenty years, e.g.). If this factor is important, then it’s actually a different kind of argument for reducing inequality: by doing so, we could get better institutions and better solutions to the challenges that we face.

What do people want from a welfare system?

All industrialised societies feature some kind of welfare system: institutions of the state that transfer material resources to certain categories of people or people who find themselves in certain kinds of situation. Non-industrialised societies have systems of social transfers too, albeit sometimes more informal and not organised by the state. People seem to think this is a good thing, or at least necessary. This raises the question: what do the public think a good welfare system would be like? How generous do they want it to be, and how would they like it to distribute its resources?

Polls in European nations consistently find most people expressing strong support for the welfare state. But there is a problem with this: when asked, a lot of people express support for tax cuts too. And for lots of other things, things that probably can’t all be achieved at the same time. This has led to one view in political science that most people’s policy preferences are basically incoherent (and hence, not much use in setting public policy). There is another interpretation, however.

Imagine you ask me whether I would like more generous benefits for people with disabilities, and I say yes; and you ask me if I would like tax cuts, and I say yes to that too. This might seem incoherent. But really, you should interpret my first answer as meaning that, other things being equal (i.e. if this move could be made without perturbing anything else), I would favour more generous benefits for people with disabilities; and my second as meaning that, other things being equal, I would favour tax cuts. Well, doh. Of course if you could have lower taxes and everything else remain just as good, that would be nice. If you ask me about tax cuts without telling me about what would have to be discontinued to allow for them, you are implying they could be made with no loss to other social goods. But favouring tax cuts that cause no loss to other social goods is a totally different position from favouring tax cuts at the expense of something else. We should not confuse the two (and hence, by the way, you should distrust polls that say 107% or whatever of the British public want tax cuts; 107% of them want better hospitals too). There is nothing incoherent about favouring other-things-being-equal tax cuts, but also preferring spending on benefits to be maintained in the event that the two goals conflict.

In other words, just asking people baldly about one thing, like tax cuts, doesn’t really tell you about the most interesting question, which is: given that different social goods, all of which we might want, are in conflict, how do you – the public – want them to be traded off against one another? How much more tax would you pay for higher benefits, or how much more poverty would you tolerate in order for taxes to be lower?

A popular method for studying how people make policy trade-offs is the conjoint survey. The researcher thinks of all the possible dimensions a policy could vary on. Let’s imagine our policy is a meal. It could vary on the dimension of cost (with levels: $1, $10, $50, etc.); deliciousness (1-10); style (French, Chinese, Italian, Ethiopian); nutritional value; carbon footprint; and so on. Now, we randomly generate all the possible meals within this multiverse, using all the combinations of levels of each attribute. Then we repeatedly present randomly chosen pairs of these policies, and the respondent says which one they think is better.
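The design step can be sketched in a few lines. Here is a minimal illustration using the meal example, assuming three hypothetical attributes with made-up levels (the real survey’s attributes and levels differ). `itertools.product` enumerates every profile in the “multiverse”, and each choice task draws two distinct profiles at random.

```python
import itertools
import random

# Hypothetical attributes and levels, following the meal example.
ATTRIBUTES = {
    "cost": [1, 10, 50],
    "deliciousness": list(range(1, 11)),
    "style": ["French", "Chinese", "Italian", "Ethiopian"],
}

# Every combination of levels: 3 * 10 * 4 = 120 possible meals.
profiles = [
    dict(zip(ATTRIBUTES, levels))
    for levels in itertools.product(*ATTRIBUTES.values())
]


def choice_tasks(n_tasks: int, seed: int = 0):
    """Draw n_tasks random pairs of distinct profiles to show respondents."""
    rng = random.Random(seed)
    return [tuple(rng.sample(profiles, 2)) for _ in range(n_tasks)]


print(len(profiles))
for left, right in choice_tasks(3):
    print(left, "vs", right)
```

Real conjoint designs often add constraints (for example, excluding implausible combinations), but the fully crossed version is the simplest to analyse.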

Because of the random generation, some of the policies are unicorns: the utterly delicious meal that costs $1 and has minimal carbon footprint. And some are donkeys: the $100 disgusting meal. But when you give enough choices to enough participants, you begin to be able to estimate the underlying valuation rules that are driving the process of choice. In effect, you are doing multiple regression: you are estimating the other-things-being-equal effect on the probability of a policy getting chosen when its deliciousness is 6 rather than 5, or its cost $20 rather than $10. Valuation rules allow you to delineate preferences about trade-offs, by comparing the strength of a dispreference on one dimension with the strength of a preference on another. For example, people might be prepared to pay $3 for each increment of deliciousness. The trade-offs can be different for different groups of respondents: maybe those on low incomes will only pay $1 for each increment of deliciousness, meaning that in life they end up with cheaper and less delicious meals.
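The “multiple regression” step can be sketched with simulated respondents. This is a minimal illustration with several assumptions. It uses NumPy and a hypothetical chooser whose “true” valuation rule (invented here) makes each deliciousness point worth $3 of cost. It also assumes, for simplicity, that choice probabilities are linear in the differences between the two meals, and it fits a linear probability model with `numpy.linalg.lstsq`. This is one simple estimator of other-things-being-equal effects, not necessarily the exact specification used in the study. The ratio of the two coefficients recovers the willingness to pay per deliciousness point.

```python
import numpy as np

rng = np.random.default_rng(7)
n_tasks = 50_000

# Hypothetical attribute levels for the two meals (columns) in each task.
deliciousness = rng.integers(1, 11, size=(n_tasks, 2))      # 1..10
cost = rng.choice(np.arange(2, 21, 2), size=(n_tasks, 2))   # $2..$20

# Invented 'true' valuation rule: +1 deliciousness is worth $3 of cost.
# Choice probabilities are assumed linear (they stay within 0.14..0.86).
beta_delicious, beta_cost = 0.024, -0.008
d_delicious = deliciousness[:, 0] - deliciousness[:, 1]
d_cost = cost[:, 0] - cost[:, 1]
p_left = 0.5 + beta_delicious * d_delicious + beta_cost * d_cost
chose_left = (rng.random(n_tasks) < p_left).astype(float)

# Linear probability model: other-things-being-equal effect of each attribute.
X = np.column_stack([np.ones(n_tasks), d_delicious, d_cost])
coef, *_ = np.linalg.lstsq(X, chose_left, rcond=None)
_, b_delicious, b_cost = coef

print(f"effect of +1 deliciousness: {b_delicious:.4f}")
print(f"effect of +$1 cost:         {b_cost:.4f}")
print(f"implied $ per deliciousness point: {-b_delicious / b_cost:.2f}")
```

Fitting the same model separately for different groups of respondents is how group differences in trade-offs, like the low-income example above, would show up.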

In a new study, Joe Chrisp, Elliott Johnson, Matthew Johnson and I used a conjoint survey to ask what 800 UK-resident adults want out of the welfare system. We made all of our welfare systems somewhat simple (a uniform weekly payment with one level for 18-65 year olds and a higher level for 65+). We then varied four kinds of dimensions:

1) Generosity: How big are the payments?

2) Funding: What rates of personal income tax should people pay to fund it? And would there be other taxes like wealth or carbon taxes?

3) Conditionality: Who would get it? What would they have to do to demonstrate or maintain entitlement?

4) Consequences: What would be the effect of the policy on societal outcomes, specifically, the rate of poverty, the degree of inequality, and the level of physical and mental health?

People in fact made very coherent-looking valuations, at least on average. And, yes, other things being equal, they wanted income taxes to be lower rather than higher. But the strongest driver of choice was the effect on poverty: people want the welfare system to reduce poverty, and they like it when it reduces poverty a lot (figure 1).

Figure 1. Estimated marginal effects on the probability of policy choice of rates of income tax (top); and effect on poverty (bottom). The dots are central estimates and the lines, 95% confidence intervals.

In the figure, a value to the left of the vertical line means that having that feature made people less likely to choose the policy, all else equal; and a value to the right of the vertical line means that having that feature made people more likely to choose the policy. This is relative to a reference level, which in this case is the current UK income tax rates for the upper graph, and the current rate of poverty for the lower one. So, the more a welfare system reduces poverty, the more likely respondents are to choose it; the more it increases poverty, the less likely they are to choose it; and the effect is graded – the bigger the reduction in poverty, the better.

There were other features that also affected preferences. People like the idea of funding welfare from a wealth tax or a corporate or individual carbon tax, relative to the government borrowing more money. And they quite liked the welfare system to improve physical and mental health, and reduce inequality – or at least, not to make these things worse. However, none of these was as strong as the desire to see poverty reduced.

We also varied who would get the benefit (citizens, residents, permanent residents), and what the conditions would be (having to be unemployed, means testing, and so on). None of these design features made much difference. This is something of a surprise since a big theme in the recent literature on public preference over welfare systems is the idea of deservingness: people don’t want welfare payments to go to the wrong kind of people, where wrong is conceived as slackers, free-riders or foreigners, and this saps, or can be deployed in order to sap, their support for welfare institutions. The way I read our results, these deservingness concerns are mostly pretty weak in the grand scale of things. People want a welfare system to reduce poverty in the best value-for-money way; they don’t care too much about the design choices of the institution so long as it does this.

The findings shown in figure 1 allow us to pit a given income tax rise against a given effect on poverty. For example, would people be prepared to pay ten more percentage points of income tax in order to halve the poverty rate? You work this out simply by summing the relevant coefficients (negative for the tax rise, positive for the poverty cut) and seeing whether the result is greater than zero. This exercise reveals a zone of possible acceptability: a range of income tax rises that people would find acceptable in return for a sufficiently large cut in poverty (figure 2).

Figure 2. Zones of acceptability and unacceptability for combinations of income tax rises and poverty change. The area shown in red would on average be unacceptable, and that shown in yellow would be acceptable. The status quo is shown in white.
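For readers who want to see the arithmetic, here is a minimal Python sketch of the summing exercise. The coefficient values are invented for illustration and are not the estimates plotted in figure 1. The sketch also assumes, as simple summing does, that the tax and poverty effects just add up, with no interaction between them.

```python
# Hypothetical marginal effects on the probability of choosing a policy,
# relative to the status quo. Illustrative values only, NOT our estimates.
TAX_EFFECTS = {0: 0.0, 5: -0.06, 10: -0.12, 20: -0.22}       # income tax rise (percentage points)
POVERTY_EFFECTS = {0: 0.0, -25: 0.08, -50: 0.15, 25: -0.10}  # change in poverty rate (%)


def is_acceptable(tax_rise, poverty_change):
    """A package is acceptable on average if the summed effects are above zero."""
    return TAX_EFFECTS[tax_rise] + POVERTY_EFFECTS[poverty_change] > 0


def acceptability_zone():
    """Classify every tax/poverty combination except the status quo itself."""
    return {
        (tax, pov): is_acceptable(tax, pov)
        for tax in TAX_EFFECTS
        for pov in POVERTY_EFFECTS
        if (tax, pov) != (0, 0)
    }


# Ten extra points of income tax to halve poverty: -0.12 + 0.15 > 0.
print(is_acceptable(10, -50))  # True with these made-up numbers
```

Mapping `acceptability_zone()` onto a grid and colouring the True and False cells gives a chart of the same kind as figure 2.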

These findings are quite noteworthy. Really substantial income tax rises, of ten percentage points or more, would on average be acceptable to our respondents, as long as they delivered a big enough decrease in poverty. British political parties currently work on the consensus that any talk of income tax rises is politically unfeasible. The Labour Party is rapidly distancing itself from any hint of tax rises of any kind, including a wealth tax, which our results and other research suggest would be popular. When the Liberal Democrats proposed a 1% increase in the basic rate of income tax in 2017, it was viewed as politically risky. Our results suggest they could have been an order of magnitude bolder and still have been popular.

A worry you might well be having at this point is: well yes, this was all true of the particular sample you studied, but maybe they were particularly left-wing; it wouldn’t play out that way in the population more broadly. In fact, we already went some way to mitigating this by weighting our sample to make it representative of voting behaviour at the 2019 General Election. Also, and more interestingly, people of different sub-groups (left/right, young/old) differed only rather modestly in their valuations. Figure 3 repeats figure 2 separately for people who voted Conservative and Labour in 2019. You might think that we would see that Conservative voters want lower tax at any cost, while Labour voters want redistribution at any price. Not at all: both groups have a trade-off frontier; it just looks a bit different, with Labour voters valuing poverty reductions a bit more highly relative to tax rates than Conservative voters do. But both groups have an area of possible acceptability of income tax rises, and these areas overlap. Ten percentage points on income tax to halve poverty, for example, would be acceptable even to Conservative voters, and therefore a fortiori to other groups.

Perhaps these results are not surprising. We already know that there is strong support for a social safety net, and that people care about the outcomes of the worst off. Our findings just show that people accept it has to be paid for. So really the pressing question is: how have politicians come to believe that tax rises are completely politically impossible in contemporary Britain, when this and other research suggests that this is not the case? For example, a review in The Guardian of Daniel Chandler’s recent book Free and Equal, which proposes a moderate Universal Basic Income and tax rises to fund it, basically said: nice idea, but who’s going to vote for that in the Red Wall? (The Red Wall refers to electoral districts in the Midlands and North of England thought of as something of a bellwether.) Yet both our present study and our previous research in the Red Wall give the same answer: most people. Chandler’s proposals are exactly in the zone that commands broad assent in the Red Wall, even amongst people who have recently voted Conservative.

Without wanting to go too dark on you, I have to remind you of the evidence that the opinions of the average voter don’t actually matter very much in politics as it stands (at least in the USA). What parties propose is influenced by the views of the rich and by organized business, and pretty much unresponsive to the views of everyone else. Interestingly, this narrow sectional interest gets mythologised and re-presented as ‘the views of the person in the street’; but this is mainly a kind of ‘proletariat-washing’. A small group of people who have a lot of power and influence don’t want to reduce poverty by raising taxes. The Labour party, by choosing not to propose doing so, is courting this group. What gets passed off as wooing the public is really wooing the elite. They might have judged, perhaps rightly, that wooing this group successfully is necessary to win power, but let’s not confuse this with following public preference. The median British voter may well favour something much more transformational.

How can I explain this to you?

One of the big problems of the social and human sciences is the number of different kinds of explanations there are for what people do. We invoke a great range of things when we talk about why people do what they do: rational choice, conscious or unconscious motivations, meanings, norms, culture, values, social roles, social pressure, structural disadvantage…not to mention brains, hormones, genes, and evolution. Are these like the fundamental forces in physics? Or can some of them be unified with some of the others? Why are there so many? It is not even clear what the exhaustive list is; which elements on it could or should be rephrased in terms of the others; which ones we can eliminate, and which ones we really need.

It’s bad enough for those of us who do this for a living. What do the general public make of these different constructs? Which ones sound interchangeable to them and which seem importantly different? The explanation-types are sometimes grouped into some higher-order categories, such as biological vs. social. But how many of these higher groupings should there be, and what should be their membership?

In a recent paper, Karthik Panchanathan, Willem Frankenhuis and I studied how people understand different types of explanations; specifically, UK adults who were not professional researchers. We gave participants an explanation for why some people do something. For example: in a certain town, a large number of murders are committed every year, and researchers have ascertained that the explanation is… followed by one of 12 explanations. We then presented the participants with the 11 other explanations and asked them: how similar is this new explanation for the behaviour to the one you already have? Thus, in an exploratory way, we were mapping out people’s representations of the extent to which one explanation is the same as or different from another.

The basic result is shown in figure 1. The closer two explanations are to one another on the figure, the more similar they were seen as being. We used a technique called cluster analysis to ask how many discrete groupings it is statistically optimal to divide the graph into. The answer was three (though it depends a bit on the parameter values used). There was one grouping (hormones, genes and evolution) that definitely stood apart from all the rest. These are obviously exemplars of what people have in mind when they speak of ‘biological explanations’. The remainder of the explanations was more of a lump, but when it did divide, it fell into one group that was more about things originating in the individual actor’s head (choice, motivation, meaning, psychological traits); and another that was more to do with the expectations, pressures, and obligations that come from the way the wider social group is structured (culture, social roles, social pressure, opportunity); in other words, forces that came into the actor from outside, from society.

Figure 1. Network representation of how similar participants viewed different explanations as being. A shorter distance between two explanations means they were viewed as more similar, a longer distance that they were viewed as more dissimilar. The key is: HORmones; GENes; EVOlution; a psychological TRAit; MOTivation; CHOice; MEAning; CULture; social ROLe; social PREssure; CHIldhood experience; and OPPortunity.
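To give a feel for what the clustering step does, here is a minimal average-linkage agglomerative clustering in Python, run on a small made-up dissimilarity table for six of the twelve explanations. The dissimilarities are invented, the sketch simply takes the number of clusters (three) as given rather than choosing it statistically as we did, and it is not necessarily the exact algorithm used in the paper.

```python
from itertools import combinations

ITEMS = ["HOR", "GEN", "CHO", "MOT", "CUL", "ROL"]

# Hypothetical pairwise dissimilarities (0 = seen as identical, 1 = totally different).
DISSIM = {
    ("HOR", "GEN"): 0.1, ("HOR", "CHO"): 0.9, ("HOR", "MOT"): 0.8,
    ("HOR", "CUL"): 0.9, ("HOR", "ROL"): 0.9, ("GEN", "CHO"): 0.9,
    ("GEN", "MOT"): 0.8, ("GEN", "CUL"): 0.8, ("GEN", "ROL"): 0.9,
    ("CHO", "MOT"): 0.2, ("CHO", "CUL"): 0.5, ("CHO", "ROL"): 0.6,
    ("MOT", "CUL"): 0.5, ("MOT", "ROL"): 0.5, ("CUL", "ROL"): 0.2,
}


def dissimilarity(a, b):
    """Look up a pair of explanations in either order."""
    return DISSIM[(a, b)] if (a, b) in DISSIM else DISSIM[(b, a)]


def cluster_distance(c1, c2):
    """Average linkage: mean dissimilarity over all cross-cluster pairs."""
    pairs = [dissimilarity(a, b) for a in c1 for b in c2]
    return sum(pairs) / len(pairs)


def average_linkage(items, k):
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [frozenset([item]) for item in items]
    while len(clusters) > k:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


for group in average_linkage(ITEMS, 3):
    print(sorted(group))
```

With these invented numbers the biological pair (hormones, genes) separates out first, then the "in the head" pair (choice, motivation) and the social-structural pair (culture, social role), mirroring the three-way split described above.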

What we recovered, perhaps reassuringly, was a set of distinctions that is widely used in philosophy and social science. Our participants saw some explanations as biological, based on sub-personal processes that are not generally amenable to reflection or conscious volition. These were perceived as a different kind of thing from intentional psychological explanations, based on mental processes that the person might be said to have some voluntary say in or psychological awareness of, and be responsible for. These in turn were perceived as somewhat different from social-structural explanations, which are all about how the organisation and actions of a wider network of people (society) constrain, or at least strongly incentivise, individuals to act in certain ways. In other words, our participants roughly saw explanations as falling into the domains of neuroscience, economics, or sociology.

So far, so good. However, it got a bit murkier when we investigated perceptions of compatibility. Philosophers have been keen to point out that although reductionist neuroscience explanations, intentional psychological explanations, and social-structural explanations are explanations of different styles and different levels, they are in principle compatible with one another. They will be, once we have polished off the small task of knowing everything about the world, completely inter-translatable. Every behaviour that has an intentional explanation has, in principle, a reductionist neurobiological explanation too. When you privilege one or the other, you are taking a different stance, not making a competing claim about what kind of entity the behaviour is (it’s a perspectival decision, not an ontological commitment). In other words, when you give a neuroscience explanation of a decision, and I give an intentional psychological one, it is not like a dispute between someone who says that Karl Popper was a human, and someone who says that Karl Popper was a horse. Both our accounts can be equally valid, just looking at the behaviour through a different lens.

In our study, we asked a different group of people how compatible all the different types of explanation were, where we told participants that ‘compatible’ means both explanations can be true at the same time. The degree of rated compatibility was almost perfectly predicted by how similar the explanations had been rated by the people in the first sample (figure 2). In other words, explanations, for our participants, can only be true at the same time to the extent that they are similar (a norm explanation and a culture explanation for the same thing can both be true; a norm explanation and a hormonal explanation cannot). This is not really normatively right. An explanation for a fatal car accident can be given in terms of the physics (such and such masses, such and such velocities, such and such forces), and also in terms of the intentional actions (the driver’s negligence, the pedestrian’s carelessness, the mechanic’s malevolence). These explanations would be quite dissimilar, but perfectly compatible.

Figure 2. The compatibility of two explanations (rated by one group of people) plotted against the similarity of those two explanations (rated by a separate group).
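"Almost perfectly predicted" here means a very strong correlation between the two sets of ratings. A minimal sketch of that check in Python, with a hand-rolled Pearson correlation and invented mean ratings for five pairs of explanations (these are not our data):

```python
import math

# Invented mean ratings for five explanation pairs (NOT our data):
# similarity from the first sample, compatibility from the second.
similarity = [1.0, 2.0, 3.0, 4.0, 5.0]
compatibility = [1.2, 1.9, 3.1, 3.9, 5.0]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


print(round(pearson_r(similarity, compatibility), 3))
```

On a normative view, the car-accident case would be a point with low similarity but high compatibility, sitting well off the line that these made-up ratings follow.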

Our respondents’ incompatibilism, if it turns out to be typical of a wider group of people, could be problematic for science, and nowhere more so than in the case of ‘biological’ explanations for human behaviour. Because these were seen as the most dissimilar from intentional or social-structural explanations, they ended up being seen as rather incompatible with them. In other words, if you say that XX’s violent outbursts are due to levels of a particular hormone, people perceive you as asserting that it must not be the case that XX is motivated by a genuine sense of moral anger; or that XX has been forced into their position by a lifetime of discrimination. Really, all three things could be simultaneously true, and could be important, but that may not be what people infer. Thus it seems worth stating – again and again, even if to you this feels obvious – that studying the neurobiological or evolutionary bases of something does not mean that the intentional level is irrelevant, or that social factors cannot explain how whatever it is came to be the case. We scientists usually see these different levels as all parts (more accurately, views) of the same puzzle; but certain audiences – many, perhaps – might see giving an explanation at a different level as more like claiming that the jigsaw puzzle is actually a chess set.

What is going on when researchers choose one kind of explanation rather than another? For example, what is at stake when we say ‘depression is a biological condition’? If what I have said about explanations being in-principle inter-translatable is true, then depression is a biological condition, but no more so than supporting Red Star FC, or having insufficient money to pay the rent, are biological. Depression is also a psychological condition, and also a social-structural one. Everything is an everything condition. In other words, ‘depression is a biological condition’ ought to assert precisely nothing at all, since explaining biologically is no more than a stance, a stance that can be taken about anything that happens to humans. The subset of ‘biological’ conditions is the whole set, perfectly overlapping with the sets of psychological and social ones.

Yet, when people say ‘depression is biological’, they often seem to think they have asserted something, and indeed are taken to have done so. What is that thing?

When you choose to advance one type of explanation rather than another, you haven’t logically ruled anything in or out, but you have created a different set of implicatures. You are making salient a particular way of potentially intervening on the world, and downgrading other possible ways of intervening. This comes from the basic pragmatics of human communication. Explanations, under the norms of human communication, should be not just true, but also relevant. In other words, when I explain an outcome to you, I should through my choice of words point you to the kind of things you could modify that would make a difference to that outcome. (Causal talk is all about identifying the things that could usefully make a difference to an outcome, not all of the things that contributed to its happening. When I explain why your house burned down on Wednesday, ‘there was oxygen in the atmosphere on Wednesday’ is a bad explanation, whereas ‘someone there was smoking on Wednesday’ is a good one.)

So, when I say ‘depression is a biological disorder’, I am taken to mean: if you want to do something about this depression thing, then it is biological interventions like drugs that you should be considering. And thus, by implication, depression is not something you can best deal with by talking, or providing social support, or increasing the minimum wage. Choosing an explanatory framing is, in effect, a way of seizing the commanding heights of a debate to make sure the search for remedies goes the way you favour. This is why Big Pharma spent so many millions over the years lobbying for psychiatric illnesses to be seen as ‘biological disorders’ and ‘diseases of the brain’ (all those findings and books about this that you read in the 1980s and 1990s – they were basically Big Pharma communications, sometimes by proxy). This sets the stage for seeing more meds as the primary response to the suffering in society. We found some evidence consistent with this in our study: when we provided a ‘biological’ explanation for a behaviour, participants spontaneously inferred that it would be hard to change that behaviour, and that drug-style interventions were more likely to be the way to do it successfully.

The hostility of social scientists to ‘biological’ explanations is somewhat legendary (in fact, like a lot of legends, it’s common knowledge in some vague sense but a bit difficult to really pin down). When social scientists say ‘X [X being morality, literature, gender roles, or whatever] cannot be explained by mere biology!’, what they mean to say is not: ‘I deny that the creatures doing X are embodied biological creatures causally dependent on their nervous systems, arms, and feet to do it.’ What they are saying is something much more like: ‘I am worried that if you frame X in terms of biology, the debate will miss the important ways in which social-structural facts, or deliberate reasoning processes, have actually made the key difference to how X has come out.’ And perhaps: ‘I am particularly worried that couching X in biological terms will lead all kinds of people to assume that X must always be as it is, and could not be re-imagined in healthier ways.’ Hence, as sociologist Bernard Lahire recently put it: ‘to get too close to biology in the social sciences is to risk being accused of naturalising the current social world, of being conservative.’ In effect, what social scientists are saying is not that the biology stuff is untrue, but that it is not the most relevant stuff we could be talking about.

A very similar point applies to the argument between rational-choice approaches and social-structural ones. You know the old quip: economics is all about how people make choices, and sociology is all about how they have no choices to make. Essentially, critics of rational-choice economics are not saying: ‘I deny that X came about because lots of people did one thing rather than something else they could have done, and this is causally due to their agency and relative valuations of the various courses open to them’. They are saying something more like ‘I am worried that by focussing on the choice processes of the individuals involved, we will neglect the broader social configurations and institutions that are responsible for the fact that the options they had to choose between were all bad ones’; or, ‘I am particularly worried that by invoking the language of choice, the only interventions we will end up thinking about are information-giving and silly nudges, not reforming society so people have better opportunities in the first place’ (for a big debate on this topic, see here).

What to do? Fortunately, the implicature that adopting a biological framing means the appropriate level of intervention is pharmacological is a defeasible one (on defeasible and non-defeasible implicatures, see here). That is, it will be assumed to be true unless the contrary is specified. You can, without contradiction, say: ‘depression is a biological condition, but it turns out that the best way to reduce its prevalence is to improve the social safety net, because it is brought on by poverty and isolation’. As it turns out, you not only can say this; you need to.

Treating causes, not symptoms

Yesterday saw the launch of our report ‘Treating causes not symptoms: Basic Income as a public health measure’. The report presents the highlights of a recently-ended research project funded by the National Institute for Health and Care Research (NIHR). This has been an interdisciplinary endeavour, involving policy and political science folk, health economists, behavioural scientists, community organizations, and two think tanks.

The public has Manichean intuitions about health. On the one hand, people feel very strongly that an affluent society should compensate and protect its members against the spectre of ill health. This is particularly true in the UK with its strong tradition of socialized care. They will support spending large amounts of money to make good health inequalities. But when you suggest that the best way for society to make good health inequalities is by removing the poverty that lies upstream of them, people often baulk. You can’t do that, surely?

I think there are a few reasons for this reaction. One is that different lay models of causation govern the two domains. Ill health seems to be all about luck, the kind of luck society should insure us against. No one sets out to get ill. Poverty seems, perhaps, to be more about character, effort and intentional action, and hence is a domain where people generally feel that individuals should fend for themselves. If there were financial handouts, some people (it is feared) would set out to live off them; but no-one suggests that because the hospitals are free, people set out to become ill and live in them. In reality, the differences between the domains of health and wealth are not so clear: health outcomes reflect character and choices as well as luck and circumstances; and financial outcomes involve a lot of luck and structural barriers as well as effort. So, some difference in kind between the two domains is not a sufficient reason for limiting policy interventions to just one of them.

Another reason for resistance is that people assume that the cost of reducing poverty is so enormous that trying to intervene at that level is simply unfeasible. Clearing up the mess we might just be able to afford; but the price tag of avoiding the mess in the first place is astronomical. It’s not clear that this is right. If reducing poverty is expensive, not reducing it is really expensive too. Already, around 45% of UK government expenditure is on the National Health Service. That direct cost is so high because the population is so sick. As well as the illnesses that need treating, there is all the work that people cannot do due to ill health. A quarter of working-age adults in the UK have a long-standing illness or condition that affects their productivity. Many of these involve stress, depression and anxiety, conditions where the income gradient is particularly steep.

These considerations raise at least the theoretical possibility that if we reduced poverty directly, via cash transfers, there might not have to be a net increase in government spending. Yes, there would be an outlay. But on the other hand, health would improve; healthcare expenditures would go down; the cost of cleaning up other social pathologies like crimes of desperation would be reduced; people would be more productive; and hence tax takes would increase. And, as I have long argued, there would be a double dividend. As we reduced people’s exposure to the sources of ill health that they cannot control, they would spontaneously take more interest in looking after themselves in the domains they can control, because it would be more worth their while to do so. Eliminating poverty is an investment that might not just be affordable, but even profitable.

It’s all summed up in figure 1 (yes, you can tell I am becoming a real social scientist: I have a barely-legible diagram with lots of boxes and many arrows between them). Reducing poverty hits the social determinants of health. It’s cleaning the stream near its source. Downstream, the individual determinants of health are improved; downstream of that, there are better health outcomes; and downstream of that, all the social and economic benefits. Depending on the costs, and on the strengths of the various connections, there might be cash transfer systems that would pay for themselves.

Figure 1. Direct and indirect economic effects of basic income.

This is the possibility that the NIHR funded us to model. That they would do so gives you some indication of the health crisis we are living through. The NIHR is a hard-headed funder whose mission is to get the best possible cost-benefit ratio for the UK healthcare pound. Even they, hardly utopian or politically radical by mission, can see that paying for ever-better sticking plasters might not be the only course worthy of serious consideration.

To work through the net consequences of a cash transfer scheme as per figure 1 involves a lot of estimates: estimating the effect of the scheme on the distribution of household incomes; estimating the effects of household income on physical and mental health; and estimating the effects of better physical and mental health on economic behaviour and tax revenues. Each of these steps is, of course, full of uncertainty. It’s been a privilege to work alongside my health economics colleagues who have made serious attempts to estimate these things, as best they can, based on data. There were some things we were not able to estimate and made no attempt to include in the models. For example, I suspect that making people’s lives more predictable, as you would with a basic income, has a positive health value above and beyond the actual amount of income you give them. This is not factored into the calculations. Neither is the likely reduction in crime, and hence in the fear of crime. Thus, if anything, I think our estimates of potential benefits of reducing poverty are pessimistic.
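The logic of the chain can be caricatured by collapsing it into three numbers: the upfront cost, how many QALYs each £1 million of transfers buys, and how much each QALY gained comes back as NHS savings plus extra tax. The Python sketch below does this with invented values, chosen only to show the structure; the real models work at household level and are far more detailed, and none of the parameter names come from them.

```python
def net_cost_m(outlay_m, qalys_per_m, offset_per_qaly_k):
    """Net annual cost, in £ millions, of a cash transfer scheme (toy model).

    outlay_m:          upfront cost of the transfers, £m (hypothetical)
    qalys_per_m:       QALYs gained per £m transferred (hypothetical)
    offset_per_qaly_k: NHS savings plus extra tax per QALY gained, £ thousands (hypothetical)
    """
    qalys_gained = outlay_m * qalys_per_m                  # income -> health
    offsets_m = qalys_gained * offset_per_qaly_k / 1000    # health -> savings and revenue, in £m
    return outlay_m - offsets_m


# Same outlay and health effect; only the strength of the last link differs.
print(net_cost_m(10_000, 10, 50))   # 5000.0: half the outlay comes back
print(net_cost_m(10_000, 10, 100))  # 0.0: the scheme pays for itself
```

The point of the toy is only that whether a scheme pays for itself depends multiplicatively on every link in the chain, which is why the uncertainty in each estimate matters.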

I urge you to have a look at the report to see whether you find the case compelling (and there are more detailed academic papers in the pipeline). We consider three scenarios: a small basic income of £75 a week for adults under 65, with the state pension for over 65s staying as it is now; and then a medium scheme (£185) and a generous scheme (£295). I will focus here on the small scheme since the results here, for me, indicate what a no-brainer this kind of action is. Our small scheme is already fiscally neutral, with just some small changes to tax, means-tested benefits and national insurance. In other words, this scheme would cost the government nothing even without factoring in the population health benefits. Yet, it would be redistributive, with 99% of the poorest decile of households increasing their incomes by more than 5%. And because the poorest households are the ones where there is most ill health, its benefits would be dramatic despite its modest size.

Our model suggests that the small basic income scheme could prevent or postpone 124,000 cases of depressive disorder per year, and 118,000 cases of physical ill-health. The total benefit to UK population health is estimated at 130,000 QALYs per year. The QALY is a somewhat mysterious entity beloved of health economists. Very roughly, we can think of one QALY as an additional year of perfect health for one person, or two extra years in a state of poorer health that they value only half as much. So, if 130,000 people, instead of dying, lived a year in perfect health, then 130,000 QALYs would be gained. That’s a lot. The Department of Health values a QALY at £30,000 for cost-benefit purposes. That is, if you want to be hard-headed, then it’s worth paying up to £30,000 to achieve an extra QALY of population health. That means it would be worth paying up to £3.9 billion a year for our basic income scheme, if it were to be evaluated as purely a health policy (imagine it, for example, as a drug, or a type of physical therapy). As I have already stressed, the scheme is fiscally neutral: it costs the government no more than the current system of taxes, allowances, and benefits does. The scheme is, arguably, a healthcare intervention worth £3.9 billion, available today at zero cost using only technologies that already exist. The predicted health benefits of the medium and generous schemes were much larger still; but of course, their upfront cost is larger too.
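The arithmetic in this paragraph is easy to check by hand, but here it is spelled out as a short Python snippet. The QALY definition used (years lived multiplied by a quality weight between 0 and 1) is the standard simplification, and the two inputs are the figures quoted above.

```python
def qalys(years, quality_weight):
    """QALYs = years lived x quality weight (1.0 = perfect health)."""
    return years * quality_weight


# One year in perfect health and two years valued at half are both one QALY.
assert qalys(1, 1.0) == qalys(2, 0.5) == 1.0

QALYS_GAINED_PER_YEAR = 130_000   # modelled gain from the small scheme
VALUE_PER_QALY_GBP = 30_000       # cost-benefit threshold quoted in the text

worth_paying = QALYS_GAINED_PER_YEAR * VALUE_PER_QALY_GBP
print(f"£{worth_paying / 1e9:.1f} billion a year")  # £3.9 billion a year
```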

Naturally, there are many uncertainties in an exercise such as this. We took the observed associations between income and health as causal, assuming that if you boosted income, health would follow. This is an inference, and a contentious one. The way we made it – by looking at within-individual health changes when income declined or increased – is probably about the best way currently possible. But, its validity is something reasonable people could dispute. For me it brings home the serious need for proper trials of cash transfer policies, something we have written about elsewhere. Then the causal basis of the projections could be much stronger. Even accepting the limitations though, I think the case is hard to ignore. This project has made me feel more strongly than ever that there are better societies out there, in a galaxy not at all far from our own; and that we lack only rational and imaginative leaders to guide us there.

On poverty and addiction

Reading descriptions of the lives of people living in adverse economic conditions, something that will strike you over and over again is how often addiction comes up: to alcohol, to tobacco, to other drugs, or to behaviours such as gambling. There is addiction in all strata of society, but, from the novels of Zola to today, it seems especially prevalent where people have the least access to money and power. Is this really true, and, if so, how could we possibly explain it?

Epidemiological evidence confirms that it is really true. In the USA, the prevalence of smoking is about twice as high amongst those in routine/semi-routine occupations as amongst managers and professionals. Smokers of all classes try to quit; managers and professionals are more likely to succeed. Addictive substances often show double dissociations with class: people with more money can afford to consume more of the substance; but people with less money are more likely to end up consuming to the point where it causes them life problems. So, for example, higher-SES young people in France consume more cannabis overall; but lower-SES young people are more likely to be frequent users. The double dissociation is particularly clear for alcohol. In many studies, it is people of higher SES who consume more alcohol on average, but people of lower SES who are most likely to die from the consequences of alcohol. As for behavioural addictions, the companies that run gambling machines know where to put them: in the areas of the highest economic deprivation.

Yes, addiction is related to poverty. But so are many other things. The existence of socioeconomic gradients is such a pervasive feature of affluent societies that it extends to almost everything you can measure. Is addiction more steeply related to income than other things? The only evidence for this that I have been able to find comes from the study of alcohol-related mortality. People of lower SES are more likely to die from the consequences of alcohol; but of course they are more likely to die tout court. Probst et al. meta-analysed studies that compared the SES difference in alcohol-attributable mortality to the SES difference in overall mortality in the same populations. They concluded that the high-low SES differences in alcohol-related mortality were typically 1.5-2 fold larger than the high-low SES differences in mortality overall.
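One way to make that comparison concrete, assuming each SES difference is expressed as a rate ratio of low-SES to high-SES mortality, is to take the ratio of the two rate ratios. The rates below are invented round numbers per 100,000 people, not figures from Probst et al.

```python
def rate_ratio(low_ses_rate, high_ses_rate):
    """How many times higher the death rate is in the low-SES group."""
    return low_ses_rate / high_ses_rate


# Invented deaths per 100,000 per year (illustration only).
all_cause = rate_ratio(low_ses_rate=1200, high_ses_rate=800)   # 1.5
alcohol = rate_ratio(low_ses_rate=90, high_ses_rate=30)        # 3.0

# How much steeper the alcohol gradient is than the general mortality gradient.
print(alcohol / all_cause)  # 2.0, i.e. at the top of the 1.5-2 range
```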

Let’s assume that these gradients reflect something about poverty causing increased addictive behaviour (not of course an easy thing to demonstrate, but I’ll come back to that at the end). How can we explain why?

First, we need to characterise what kinds of substances and activities can create addiction. Jim Orford, in his wide-ranging book Power, Powerlessness and Addiction, suggests that things can be addictive if they (a) have the capacity to produce a short-term boost in mood; (b) can be consumed frequently in small chunks; and (c) entrain processes that tend to increase their own consumption over time. If you focus on (a) and (b), the socioeconomic gradient seems very intelligible. In the flow of everyday experience, people facing greater economic adversity are often in worse mood (there is abundant evidence that this is true, and there are good reasons for it); and, plausibly, they have less access to alternative mood-boosting inputs that come with affluence and high status.

There are other bodies of thought we can draw on to fill out this idea. There is the ‘rational addiction’ tradition that comes from economics. The essence of this idea is that people might, under some circumstances, choose to consume an addictive substance, even in the full knowledge that this will lead to future dependence. They will do so when it maximises their long-term utility; in other words when the value they place on all the mood boosts they will get outweighs the disutility of the present and future costs of use. The literature on rational addiction has got a bit bogged down in some rather inside-baseball issues, such as whether people reduce consumption in response to price rises that have been announced but not implemented yet. This is an important test because it establishes whether they are considering future consumption, not just present consumption, in their decisions to consume; but it distracts from the more general insights the rational addiction model might provide.
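A toy version of the rational addiction calculation, written in Python, may make the idea more tangible. Everything in it is an assumption for illustration, not the model from that literature: the mood boost per period is constant, the costs of use grow linearly as dependence builds, future periods are discounted exponentially by a factor `delta`, and not consuming is worth zero. The person starts consuming only if the discounted total is positive.

```python
def discounted_value(mood_boost, base_cost, cost_growth, periods, delta):
    """Discounted sum of (mood boost - cost of use) over the periods considered."""
    total = 0.0
    for t in range(periods):
        cost_t = base_cost + cost_growth * t      # costs rise as dependence builds
        total += delta ** t * (mood_boost - cost_t)
    return total


def starts_consuming(**params):
    """Consume if this beats abstaining, whose value is normalised to zero."""
    return discounted_value(**params) > 0


# Same costs and discounting; only the value of the mood boost differs.
affluent = dict(mood_boost=1, base_cost=1, cost_growth=1, periods=3, delta=0.5)
adverse = dict(affluent, mood_boost=2)   # worse baseline mood, so the boost is worth more

print(starts_consuming(**affluent), starts_consuming(**adverse))  # False True
```

Nothing about self-control differs between the two cases; only the option set does.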

As often with rational actor models, the rational addiction model seems like a kind of useful fiction. On one hand it is obviously false. People usually don’t make those computations, certainly not explicitly. Plus, the rational addiction model in its original form cannot account for the fact that people constantly try to quit, often without success; or that they spend money on having other people stop them from consuming. To explain these phenomena, you need to add something called time-inconsistent preferences: my preferences over what happens at time point t can flip as t approaches the present. On the other hand, the rational addiction model is many-fold better than unhelpful and non-explanatory appeals to ‘lack of self-control’ or ‘the culture of poverty’. It sees people who consume as full and normal agents, albeit agents constrained by the option sets available to them. Those option sets are often not great. In poverty, either the marginal benefits of addictive consumption might be higher (because your mood is often worse, which makes boosting it more valuable), or the opportunity costs of addictive consumption are lower (for example, the job you could lose is awful anyway, or there is no prospect of ever converting the cigarette money into owning a house).
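One common way of formalising time-inconsistent preferences is hyperbolic discounting, where a reward's present value falls as 1/(1 + k × delay) rather than exponentially. The sketch below uses that form as an illustrative assumption; the parameter k, the discount factor `delta` and the reward sizes are invented. It compares a smaller-sooner reward (say, the mood boost of using now) with a larger-later one (say, the benefit of having stayed off), at different distances from the moment of choice.

```python
def hyperbolic_value(amount: float, delay: float, k: float = 1.0) -> float:
    """Present value under hyperbolic discounting: amount / (1 + k * delay)."""
    return amount / (1.0 + k * delay)


def exponential_value(amount: float, delay: float, delta: float = 0.95) -> float:
    """Present value under exponential discounting: amount * delta ** delay."""
    return amount * delta ** delay


def prefers_sooner(value_fn, distance: float, small: float = 10.0,
                   large: float = 15.0, gap: float = 5.0) -> bool:
    """True if the small reward, `distance` away, beats the large reward,
    which arrives `gap` time units after the small one."""
    return value_fn(small, distance) > value_fn(large, distance + gap)


# As the choice draws nearer, does the agent switch to the sooner reward?
for distance in (20, 10, 5, 0):
    print(distance,
          prefers_sooner(hyperbolic_value, distance),
          prefers_sooner(exponential_value, distance))
```

With these numbers the hyperbolic agent plans to wait for the larger reward while the choice is far off, but switches to the sooner one as it approaches (the two values tie at a distance of 9). The exponential agent ranks the options the same way at every distance, which is why exponential discounting alone cannot produce the repeated failed attempts to quit.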

A related and useful literature is that on pain and analgesia. Addictive substances tend to be analgesic: they reduce pain. Much of the drug addiction in the USA and other developed countries involves opioid drugs. These are so effective at pain relief that they have long been used in surgery. Indeed, it is their approved medical use that lies at the root of the iatrogenic addiction crisis. What is less well known is that alcohol, nicotine and cannabis all have fairly well-studied analgesic effects. It is not a metaphor when people say they drink or smoke to ease the pain.

Pain is socioeconomically distributed. There’s evidence of socioeconomic gradients in severe pain in Austria, and dental pain in the UK. Physical pain and emotional pain are on the same continuum (anti-inflammatories like ibuprofen reduce depressive symptoms after all), and I wager that emotional pain shows at least as much of a gradient as physical pain does, probably more. Studies conclude that the socio-economic gradient of pain is currently unexplained; but perhaps, in fact, its explanation is all too obvious. Pain is the unpleasant experience associated with the appraisal that you are being damaged. The ability to feel pain is there for a reason. If you can get out of the painful situation, you will. But if you have no alternative but to go on being damaged, then self-medication looks like the next best thing.

If poverty causes pain and low mood, then really there is no mystery to the fact that people in poverty rely more heavily on mood-boosters. Property (c) of addictive substances – they catalyse their own use – is still a problem. But you can see why it is easier to start and more difficult to stop if you face poverty and adversity. This leads to the simple prediction that increasing people’s incomes will reduce their consumption of analgesic-addictives.

What I love about this prediction is how much it runs counter to most people’s intuitions. When Nicaragua introduced a direct cash transfer programme, a senior official predicted that “husbands [would be] waiting for wives to return in order to take the money and spend it on alcohol”. A brake on the introduction of cash transfers in Kenya was “the widespread belief that cash transfers would either be abused or misdirected in alcohol consumption”. Does the evidence back up these intuitions?

Reader, it does not. For the World Bank, David Evans and Anna Popova reviewed all the studies that they could find looking at the impact of a change in income on either consumption of or expenditure on alcohol and tobacco. They concluded that “almost without exception, studies find either no significant impact or a significant negative impact of [cash] transfers.” Of the 17 estimates that came from randomized controlled trials, 14 went in the negative direction, and the 3 in the other direction were small. It’s worth thinking about how strong a test this is. Some people, who were generally in low- and middle-income countries and had strong financial constraints, were suddenly given higher incomes to spend. It’s not just that they did not use it to increase their expenditure on these addictive goods. They more often than not decreased it. Doing social science, I have heard it said, is the search for the small set of things that is both surprising and true. I would add a third criterion: it should also make a difference. It would be good if this were one of that set.

Innateness is for animals

Innate or acquired? Genes or culture? Nature or nurture? Biological or psychological? People are inveterately fond of trying to divide human capacities into two sorts. Commentators often seem to think that determining which capacity goes in which box is the main preoccupation of the evolutionary human sciences. (And because there is ‘evolutionary’ in the name, they think the evolutionary human sciences must be about claiming capacities for the innate/genes/nature side that the social sciences had wanted to put in acquired/culture/nurture; not really.)

In fact, innate/acquired, nature/nurture sorting is not something most of us are especially interested in. Our main hustle is that it is always both, rendering the distinction (at least as applied to mature adult capacities) somewhere between arcane and unhelpful. If it’s acquired, it’s because there are innate resources that make this possible; if it’s culture, it’s because the human genome enables this possibility, and so on. We are not interested in sorting, but in figuring out how and why things actually work. To butcher the famous exchange from The African Queen: the nature/nurture distinction, Mr. Allnut, is what we are put in this world to rise above.

But still, the widespread desire to sort capacities into two kinds persists. Why? Philosophers who have examined the problem agree that the innate/acquired dichotomy, and its companion nature/nurture, are folk or lay concepts: distinctions that didn’t originally arise from formal scientific enquiry, and that lack clear definitions in most people’s minds. Many but not all scientific constructs begin life as folk concepts: ‘water’ did, for example, but ‘the Higgs boson’ did not. Folk concepts can go on to give rise to useful scientific concepts. There is genuine debate in philosophy about whether a useful scientific concept of innateness can be constructed, and if so what it should be (see e.g. here and here). But regardless of how this debate is resolved, we can ask where the folk concept of innateness comes from and how people use it.

In a new paper, I argue that the folk concept of innateness is made for animals. More exactly, we have a specialized, early-developing way of thinking about animals (this way of thinking is sometimes known as intuitive biology). The folk concept of innateness comes as part of its workings. When we think about animals, we are typically concerned to rapidly learn and make available the answers to a few highly pertinent questions. First, what kind is it? Second, what’s the main generalization I need to know about that kind (will it eat me, for example, or can I eat it)? The cognition that we develop to deliver these functions is built for speed, not subtlety. It assumes that all the members of the kind are for important purposes the same (if one tiger’s dangerous, they all are), and that their key properties come straight out of some inner essence not modifiable by circumstance (doesn’t matter if you raise a tiger with lambs, it’s going to try to eat them sooner or later). When people (informally) describe a capacity as ‘innate’, part of our ‘nature’ and so on, what they mean is just this: the capacity is typical (it’s not just the one individual that has it, but the whole of their kind), and fixed (that capacity is not modifiable by circumstance). In other words, they think about that capacity the way they think about the capacities of animals.

Unfortunately, animals are not really like this. In fact, in animal species, individuals are different from one another, and far from interchangeable. This is so contrary to most people’s perceived experience that Darwin had to spend dozens of pages in the first part of On the Origin of Species convincing the reader that it was the case, since variation was so crucial to how his idea of natural selection worked. Moreover, animal behaviour is actually very strategic and flexible: it could well be that by raising your tiger differently, you end up with a very differently behaving beast. But intuitive biology is not there to make us good zoologists. It’s there to make us eat edible things and not get eaten by inedible ones.

The idea that the folk concept of innateness is part of intuitive biology is not new. All my paper does is to test some obvious predictions arising from it. Working with UK-based non-academic volunteers, I found that how ‘innate’ people reckon a capacity is in humans is almost perfectly predicted by the extent to which they think other animals have it too (figure 1A). If you present people with the same capacity possessed either by an animal or a human, they think it is more likely to be innate in the animal case (with a huge effect size; figure 1B). And even with an alien creature, if you tell people that one of its capacities is innate, they imagine that alien as less human-like than if you tell them that it had to learn its capacity, or tell them nothing at all (figure 1C). So, there is a special connection between ‘X being an animal’ and ‘X’s capacities seeming innate’.

Figure 1. Some results from my studies. A. People think a capacity is innate in humans to the extent they also think it is present in other animals. B. People think the same capacity is more likely to be innate when it is found in an animal than a human. C. People think an alien is less human-like if they are told that one of its capacities is innate than if not told this.

If innateness is for animals, then we should intuitively think the capacities of humans are not innate. Indeed, several studies have shown that lay people have this prior (here and here). This is because our dominant mode for thinking about people is quite different from our dominant mode for thinking about animals. With other people, we are generally trying to manage some kind of ongoing, individual-to-individual dynamic relationship, for example of collaboration or competition. To be able to do this, you need to track individual persons, not kinds, and track what they currently know, believe, possess or are constrained by, not rely on a few context-free generalities. In other words, when we think about people (for which we use intuitive psychology), we naturally incline to thinking about what is idiosyncratic, thoughtful and contingent. Whereas for animals we pay insufficient spontaneous attention to their uniqueness and context, for humans we only pay attention to that. This sense of the idiosyncratic, the thoughtful and the contingent is what people seem to mean when they talk informally about behaviours being not innate, not in the genes, not biological and so on.

However, my participants readily assented that some capacities of humans were innate, capacities like basic sensing, moving, circadian rhythms, and homeostatic drives like hunger and thirst. These are the things about humans that you can still think about using intuitive biology: the capacities of humans qua animals. They are not the things that affect the depth of a friendship or the bitterness of a dispute; the things about people qua social agents. We tend to view other people as dual-aspect beings, having basic, embodied animal features, and complex, idiosyncratic person features; we think about these, respectively, with intuitive biology and intuitive psychology. We kind of know that these are two aspects of the same entity, but the link between the two aspects can go a bit screwy sometimes, leading to beliefs in dualism, ethereal agents, souls that leave bodies, and other shenanigans. What is often odd and jangling for people is when the language of animal bodies (genes, evolution and so on) is used in explanations for the capacities of individual people as social agents (their knowledge, decisions, and morality). That feels like it can’t be right.

This is rather a problem for researchers like me, who believe that our embodied natures and our capacities as social agents have rather a lot to do with one another (indeed, are descriptions of the same thing). If you talk about an evolved, innate or biological basis to human moral and social capacities, your audience may take you to be saying something quite different from what you intend. Specifically, you may be taken as wanting to reduce humans to beasts; to deny the critical influence of context; or to argue that human social systems must always come out the same. None of these actually follows from saying that a capacity has an evolved, innate or biological basis. It’s the folk concepts bleeding through into the scientific debate. And folk concepts, Mr. Allnut, are what we are here to rise above.

Diamond open access journals in psychology and adjacent areas

Here is a list of diamond open access journals that may be of interest if you are publishing in psychology and adjacent fields. Diamond open access journals provide academic publishing services for free both to the reader (no paywall), and the author (no article processing fee). By switching to diamond open access journals, researchers could greatly reduce the costs of publishing to the sector, freeing up funds for jobs and grants (more on diamond open access and the arguments for it here).

The first table consists of journals that specifically include psychology or cognitive science in their mission statement. The second consists of journals in adjacent fields. I have indicated where journals are indexed on Scopus, since findability is often an important motivation for authors. Many of the journals are indexed in other places too. Where possible I have indicated something about financial and institutional backing. A major contributor in the psychology diamond open access space is the Leibniz Institute for Psychology (ZPID), which funds the Psychology Open platform as part of its open science mission. There are some other journals on that platform I have not put into the table, mainly because their missions were more specialized.

I would like to dynamically update the list, and also extend it to a broader range of subject areas. Please email me if you have suggestions. Thanks to Shelina Vishram I have already added a number of health-related journals to table 2.

Creating these tables, I am struck by just how wide our range of diamond options already is, and how interesting and engaged a lot of the mission statements are.

Open Mind: “covers the broad array of content areas within cognitive science, using approaches from cognitive psychology, computer science and mathematical psychology, cognitive neuroscience and neuropsychology, comparative psychology and behavioral anthropology, decision sciences, and theoretical and experimental linguistics.” Currently supported by MIT Press and MIT Libraries. Since 2017. Indexed on Scopus.
Europe’s Journal of Psychology: “publishing original studies, research, critical contributions, interviews and book reviews written by and intended for psychologists worldwide… a generalist and eclectic approach”. Recent editorial statement here. ZPID. Since 2005. Indexed on Scopus. Accepts registered reports.
Journal of Social and Political Psychology: “publishes articles that substantially engage with and advance the understanding of social or political problems, their reduction, and the promotion of social justice, from social or political psychological perspectives”. More here. ZPID. Since 2013. Indexed on Scopus.
Social Psychological Bulletin: “original empirical research, theoretical review papers, scientific debates, and methodological contributions in the field of basic and applied social psychology. SPB actively promotes… open science… integrative approach to social psychological science and is committed to discussing timely social issues.” ZPID. Before 2018 was Psychologia Społeczna. Indexed on Scopus.
Interpersona: An International Journal on Personal Relationships: research “on all kinds of human relationships, from weak ties to close relationships, and their relations with society and culture”. Interdisciplinary approach spans psychology, sociology, anthropology, biology, health etc. See here. ZPID on behalf of Grupo de estudos em Avaliação, Terapia e Emoções. Since 2007. Indexed on Scopus.
Global Environmental Psychology: “Theoretical and applied work on the relationship between people and their environment with a psychological emphasis”. More here. ZPID with International Association of People-Environment Studies and Deutsche Gesellschaft für Psychologie. Brand new journal, open for submissions.
Clinical Psychology in Europe: “We aim to publish contributions that reflect the current developments in clinical psychology, this can include stimulating papers that help towards the developments of clinical psychology research and interventions as well as advancements in diagnostics, classification, developments in treatments and improving outcomes.” ZPID with European Association of Clinical Psychology and Psychological Treatment. Since 2019. Indexed on Scopus and PubMed.
Personality Science: “Premier outlet for insights on personality and individual differences – cutting across traditional disciplinary boundaries… to unify the nascent field of a personality-centered science by bringing together work from different disciplines and perspectives even outside of psychology.” ZPID. Official journal of the European Association of Personality Science. New in 2020.
Methodology: European Journal of Research Methods for the Behavioral and Social Sciences: “A platform for interdisciplinary exchange of methodological research and applications in [psychology, sociology, economics, pol. sci. etc.], including new methodological approaches, review articles, software information, and instructional papers that can be used in teaching. Three main disciplines are covered: data analysis, research methodology, and psychometrics.” ZPID. Official organ of the European Association of Methodology. Coalesced in 2020 from Metodología de las Ciencias del Comportamiento and Methods of Psychological Research-Online. Indexed on Scopus.
Measurement Instruments for the Social Sciences: “publishes high-quality, open access measurement instruments intended for scientific use across various disciplines (e.g., sociology, psychology, education, political science, economics etc.)… advances social science measurement and methodology also through systematic reviews, test reviews, meeting reports, and best practice approaches.” ZPID. Since 2019.
Table 1: Some diamond open access publishing options in psychology.

Biolinguistics: “A peer-reviewed journal exploring (theoretical) linguistics that takes the biological foundations of human language seriously.” ZPID. Since 2007. Indexed on Scopus.
Dialectica: “A general analytic philosophy journal and the official organ of the European Society of Analytic Philosophy.” Swiss Academy of Humanities and Social Sciences, inter alia. Has existed since 1947. Became diamond in 2023. Indexed on Scopus.
Peer Community Journal: a journal allied to the constellation of ‘Peer Communities in…’ (PCIs). The PCIs do the peer reviewing and the Peer Community Journal will then publish the version of record. Currently there are PCIs in Neuroscience, Evolutionary Biology, Health and Movement Science, Ecology, Mathematical and Computational Biology, Network Science, Organizational Studies, and others. The list will grow in future. 150 supporting organisations including the CNRS, INRAE, INSERM and other major French funders, plus many universities in several countries. If your paper is reviewed by a PCI you can still send it to a different journal if you wish. Many other journals will also accept the PCI’s reviewing process. See here.
Asian Journal of Social Health and Behavior: “empirical and theoretical contributions studies related to mental health and addiction, social support, socioeconomic inequality, behavior change techniques, health policy and clinical practice”. Social Determinants of Health Research Center, Qazvin, Iran. Indexed on Scopus. Formerly Social Health and Behavior.
Health Behavior Research: “dedicated to the translation of research to advance policy, program planning, and/or practice relevant to behavior change.” Journal of the American Academy of Health Behavior. Since 2017.
European Journal of Health Communication: “open access journal for high-quality health communication research with relevance to Europe or specific European countries”. Universities of Zurich and Amsterdam. Since 2020.
JAMK Journal of Health and Social Studies: “a forum for original research and scholarship in the field of health care and social studies about health care delivery, care management, organization, labor force, policy, decision making along with research methods relevant to nurses, social workers and other related professionals”. JAMK University of Applied Sciences (Finland). You can write your paper in Finnish (but you don’t have to). Since 2018.
Public Health Research and Practice: “publishes innovative, high-quality papers that inform public health policy and practice, paying particular attention to innovations, data and perspectives from policy and practice”. Sax Institute (an Australian public health not-for-profit). Indexed on Scopus. Continues the established New South Wales Public Health Bulletin.
Table 2: Some diamond open access publishing options in fields adjacent to psychology.

The political economy of scientific publishing, and the promise of diamond open access

The claim that scientific publishing is broken is not even surprising any more. There are a number of different problems. Some of these are epistemic: a large number of bad or totally meaningless articles are published every year, diluting the credibility of science; undue weight is given to sexy claims in a small number of shiny journals, whose articles are disproportionately likely to be discovered to be misleading or even fraudulent; negative results still often go unpublished, and so on. Some of the big problems are instead economic, and that’s what I want to talk about here.

Scientific publishing is hugely concentrated in the hands of a few big corporations: Wiley-Blackwell, Springer Nature, Elsevier, SAGE and Taylor & Francis have been estimated to control 75% of the market. More importantly, the profit margins of these companies are really large: estimates vary, but 30-50% is typical for the major corporations in the sector. This gives them vastly higher profit margins than Apple, Google, Coca-Cola etc. Your local supermarket chain probably has a profit margin of a few percentage points at most.

One estimate of the size of the academic publishing industry I found is $19 billion per year. Applying a 40% profit proportion to this, we can say in a back-of-an-envelope way that international academia is giving away about $7.5 billion a year to a few large corporations, not to cover the costs of the services those corporations provide, but as rent. And you know how it is: $7.5 billion here, $7.5 billion there, and soon you can be talking real money. It is of the same order as the expenditure of the US National Science Foundation for 2020 ($8.3 billion). Europe’s premier research funder, the ERC, only gives out €2.2 billion in grants per year. One careful estimate was that, of the money spent on the social sciences in Austria, about a fourth went to servicing the publishing industry. And of course, this is not really our money: it is money from governments, effectively a public subsidy for a public good, a big chunk of which is going unproductively into corporate coffers. Governments could be getting billions of dollars more research for their money, and we could be giving more jobs to our students and post-docs; or they could get the same amount of research for a smaller subsidy and build more bike lanes.
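The back-of-an-envelope calculation is easy to redo across the whole range of margin estimates mentioned above (30-50%). This minimal Python sketch simply multiplies the $19 billion industry estimate by each margin and compares the result with the NSF's 2020 expenditure; it inherits all the uncertainty of those inputs, and, as in the text, it treats the margin as a share of total revenue.

```python
INDUSTRY_SIZE_USD = 19e9   # estimated annual size of academic publishing
NSF_2020_USD = 8.3e9       # US National Science Foundation expenditure, 2020


def annual_rent(industry_size_usd: float, profit_margin: float) -> float:
    """Profit taken out of the sector, treating the margin as a share of revenue."""
    return industry_size_usd * profit_margin


for margin in (0.30, 0.40, 0.50):
    rent = annual_rent(INDUSTRY_SIZE_USD, margin)
    print(f"{margin:.0%} margin: ${rent / 1e9:.1f} billion "
          f"({rent / NSF_2020_USD:.0%} of NSF 2020 spending)")
```

The central 40% figure gives about $7.6 billion, roughly nine-tenths of the NSF's 2020 budget; at a 50% margin the rent would exceed that budget, and even at 30% it is over two-thirds of it.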

Corporate profit was not always involved in the dissemination of academic knowledge, nor is it necessary that it be so. For those of you who do not know the history, the idea that academic communication belonged to corporations, and could be monetised for profit, was substantially due to Czech-British tycoon Robert Maxwell. He paved the way, in a context of rapid growth of academia in the second half of the twentieth century, by wooing academics, creating journals and charging subscriptions to access them (the story is told here). Fast forward, and funders and authors rightly began to baulk at publicly funded epistemic labour being sequestered behind paywalls. The resulting ‘open access’ movement has helped with the problem of reader access by bringing down the paywalls, but it has not helped with the problem of corporate rents. The big publishers still own the titles. They now charge ‘article processing fees’ to authors, on average a couple of thousand dollars, and protect their profit margins that way.

In an efficiently functioning market, these rates of profit could not exist. Why? Because other corporations should enter the market, offering the same service for a lower price and just a 30% profit rate (still not bad!), and capture the market; then still others should come in with a lower price still and just a 20% profit rate….and so on, until the rate of profit is delta: the minimal amount of profit that gets a capitalist out of bed in the morning. Academics would then be getting their publishing services for just fractionally above what it really costs to produce them. That’s how markets are supposed to work, and why certain people are so keen on them. But it is certainly not working here. Why?

Academics, when they submit a paper for publication, are not just looking for a company that will make a nice PDF and put it online on a reliable server. They could do that themselves. They are paying for someone to credential it: give it a status of being epistemically reliable, important, and generally worth reading. The irony of academic publishing is that the actual credentialling labour – the careful reading by an editor, the independent peer reviewing, the accumulated culture of good practice in the field – is not provided or even paid for by the profitable publishing corporation. It is done by other academics who do it without payment, or more precisely using the time already paid for by their university and government employers. The fruit of this labour is then privatised and sold by the publishing corporations we have become entangled with. The part of the value of publishing an article that is not given for free by academic volunteers might be as low as a couple of hundred euros.

Because the credentialing value of a journal depends heavily on its established reputation in the field, authors are not elastic to price: they will not readily substitute a newer, cheaper journal for an existing over-priced one. When people are not elastic to price, markets fail, and socially efficient solutions are not found. Authors are, effectively, conservative and myopic: they care about getting this paper published and looking as good, from a credentialling perspective, as possible; this means they are doomed to stick with the established journals even at massive personal and collective cost (and, for the most prestigious credentialing by the shiniest journals, the article processing charge may be several-fold higher than the average). The collective consequences for our sector when we are all individually myopic in this way are not something we have thought about enough.

What we face is, exactly, the famous tragedy of the commons. If I get one paper into a reputed but overpriced journal in my field, all of the reputational and career benefit of this accrues to me privately (including to my students and co-authors: we are often doing this out of quite admirable concern for those close to us). However, the cost – that academia collectively continues to waste millions of dollars paying more for its publishing services than it needs to – is evenly distributed across the whole of the sector. For example, if I spend some funds from a French funder on article processing charges, that funder can give out fewer grants. But that loss is distributed across all the researchers in France, whereas the reputational benefit of my publication flows wholly to me. So we end up with a situation where the commons – the public pot of research funding – is overgrazed, and the publishing corporations are getting fat.
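The structure of this dilemma can be written down as a simple public-goods payoff. In the sketch below, an illustration with invented numbers rather than a model from the literature, each paper placed in an overpriced journal gives its author a private reputational benefit `benefit`, and drains `cost` from a funding pot shared equally by `n` researchers. The function name and parameters are hypothetical; the cost is set larger than the benefit so that each such paper is a net loss to the sector, and for simplicity the reputational benefit is treated as fixed.

```python
def my_payoff(i_publish: bool, others_publishing: int, n: int,
              benefit: float = 1.0, cost: float = 2.0) -> float:
    """Payoff to one researcher in a group of n sharing a single funding pot.

    Publishing gives me the whole private benefit, but each paper's cost
    (whoever publishes it) is split equally across all n researchers.
    """
    papers = others_publishing + (1 if i_publish else 0)
    shared_loss = cost * papers / n
    return (benefit if i_publish else 0.0) - shared_loss


n = 100
# Whatever the others do, publishing is individually better by benefit - cost/n...
gain = my_payoff(True, 50, n) - my_payoff(False, 50, n)
# ...yet if everyone publishes, everyone ends up worse off than if no one did.
everyone = my_payoff(True, n - 1, n)
no_one = my_payoff(False, 0, n)
print(round(gain, 2), everyone, no_one)
```

With these numbers each researcher gains 0.98 by publishing, whatever the others do, but universal publishing leaves everyone at -1.0 rather than 0. That gap between the individually and collectively rational choice is what the norms and rules discussed next are meant to close.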

The good news is that although common good resources are always vulnerable, tragedies are not actually inevitable. Elinor Ostrom received a Nobel memorial prize, indeed, for showing that, in fact, communities throughout the world do find various ways of making their commons sustainable (and, as she is famous for saying, if it works in practice, you can make it work in theory). We should be able to do likewise. It would be quite logical (helpful, even) for our funders and employers simply not to allow us to use their money or time on activities that create profit for publishing corporations. My employer, the CNRS, has made clear that whilst it wants us to publish our work open access, it does not want us to pay article processing fees (instead, it, along with other funders in Europe, wants to support not-for-profit journals directly). I hope other employers will follow suit, and even take a stronger line on this. One of the things about tragedies of the commons is that you do sometimes need to impose some constraints on individual behaviour, for the sake of the public good (and to make those constraints common knowledge in the form of norms and rules). This is obvious, since it is exactly the uncoordinated myopic activities of individuals that produces the tragedy in the first place.

The big question, of course, is: if not the status quo, then what? One of the issues is that there are multiple ideas in circulation about how to reform scientific publication, whereas we need to coordinate on one. Perhaps eventually we can abandon the plethora of different journals altogether, in favour of a single archive doing the combined job of preprint server, peer review forum, and publication venue of record. There are a few contenders for this already in operation, notably Peer Community In, which started out with a focus on ecology and biology, but can expand organically into any area. Others are the European Union’s Open Research Europe and the Wellcome Trust’s Wellcome Open Research. These latter two are restricted, for now, to reports of research that was funded by the respective organisations. However, whilst waiting for such a universal system to emerge, something we can all do rapidly, with minimal change to our working practices, is to support diamond open access journals as a first resort.

A diamond open access journal is a journal that is free to read, and does not charge authors any article processing fees. Where then is the money coming from, you might reasonably ask. The answer varies from journal to journal, but the actual costs, if much of the editorial work and peer reviewing is done by academics for free, are pretty modest, often amounting to editorial assistance and server space, plus the use of one of the open-source or not-for-profit content management platforms that already exist. Some funders (the CNRS included) figure that they can simply fund or co-fund these journals directly, and still save money compared to giving their people money to pay article processing fees to the big corporations. Other bodies support diamond journals just as a contribution to the common intellectual good. In the psychological sciences, quite a few of these journals exist already, are widely indexed, and are ready for us to use. I have published a separate post listing the main ones I know about and where to find them. If we can transition to using diamond open access journals over the next five years, we will free up an enormous saving for our funders, a saving that they will hopefully plough back into jobs for young researchers, among other things. And, I would add, it would free us from the epistemic and other distortions our entanglement with for-profit publication has brought about too.

Something you might well be saying, and I would understand, is: that sounds very high-minded, but my student/post-doc/collaborator needs a job, and it will make a huge difference to the way they are evaluated to get their paper in X or Y traditional high-esteem journal. I (or they) just can’t afford to take the broader view. I absolutely recognise this dilemma from my own life. I would however make three points.

The first is that we could at least have the conversation every time we publish a paper. There are indeed times – perhaps many times, at first – when we will conclude that there are strong grounds for going with this or that traditional journal. Certainly, this is still happening in my life. But it need not be every time; sometimes all your co-authors have tenure or have already achieved visibility in that space. Maybe we can allow ourselves a certain number of dirty submissions a year, but go diamond for the rest. Corporate publishing sobriety should at least be part of the discussion, alongside other factors. Interestingly, during all my years in academia, with many dozens of co-authors, these kinds of ethical and political issues have almost never cropped up in discussions of publication strategy. It feels like we ought to try to change that.

Second, although the CV advantage of a prestigious or established journal is a real thing, it’s maybe not that big, in most cases. Sure, a paper in Nature, Science or PNAS is going to make people look twice; but do evaluators really notice or care about the difference between Cognition (corporate) and Open Mind (diamond), or between Social Psychological and Personality Science (corporate) and the Journal of Social and Political Psychology (diamond)? The magnitude of the reputational difference is probably small, and might not even be in the direction you think.

Which leads me to my final point: the way people are evaluated is itself constantly evolving. The journals that are well-reputed now are not those that were well-reputed a decade ago, partly because of the choices we made. Without going all Anthony Giddens on you, we structure the field of credential value as well as being structured by it. A decade ago, people didn’t want to pre-register or share their data as they thought it could disadvantage them. What a weird idea that seems now, when it is common knowledge that good open science practices are one of the first things employers look for. Likewise, there could soon be a job market and professional premium for having adopted a thoughtful publication policy (politique de la publication). We could help accelerate this cultural change. Senior people could go out of their way to support newer diamond initiatives, either with their submissions or their editorial activities.

There is always a risk, when you publish in a newer or unknown journal, that people think your paper is there because it was rejected by the better known ones. Something we could do, on our websites and in our talks, is to point out positive reasons why we choose this outlet (‘Journal of first choice’; ‘we chose this journal because it provides fee-free access to both readers and authors’; ‘we chose this journal because it is a non-profit that keeps resources within the scientific community’; ‘we chose it because of its commitment to open science and epistemology’, etc.). As we become more socially aware about the consequences of our publishing behaviour, it could be good to justify our publication decisions in the way that we have become used to justifying sample sizes.

Companion post: list of diamond open access journals in psychology.

Bayes Factor blues, and an unfashionable defence of p-values

Like many researchers, I have been trying to up my inferential game recently. This has involved, for many projects, abandoning the frequentist Null Hypothesis Significance Testing (NHST) framework, with its familiar p-values, in favour of information-theoretic model selection, and, more recently, Bayesian inference.

Until last year, I had been holding out with NHST and p-values specifically for experimental projects, although I had mostly abandoned them for epidemiological ones. Why the difference? Well, in the first place, in an experiment or randomised controlled trial, but usually not in an epidemiological study, the null hypothesis of no effect is actually meaningful. It is something whose truth you genuinely care about. Most possible experimental interventions are causally inert (you know, if you wear a blue hat you don’t run faster; if you drink grenadine syrup you don’t become better at speaking Hungarian; and if you sing the Habanera from Bizet’s Carmen in your fields, as far as I am aware, your beetroots don’t grow any faster). So, although the things we try out in experiments are not usually quite as daft as this, most interventions can reasonably be expected to have zero effect. This is because most things you could possibly do – indeed, most of the actions in the universe – will be causally inert with respect to the specific outcome of interest.

In an experiment, a primary thing we want to know is: does my intervention belong to the small class of things that have a substantial effect on this outcome, or, more likely, does it belong to the limitless class of things that make no real difference? We actually care about whether the null hypothesis is true. And the null hypothesis really is that the effect is zero, rather than just small: it is precisely zero in expectation, because assignment to experimental conditions is random. Because the causally inert class is very large whereas the set of things with some causal effect is very small, it makes sense for the null hypothesis of no effect to be our starting point. Only once we can exclude it – and here comes the p-value – do other questions such as how big our effect is, what direction, whether it is bigger than those of other available interventions, what mediates it, and so on, become focal.
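As a minimal sketch of why random assignment makes the null of exactly zero a natural reference point, here is a two-group randomisation (permutation) test in Python. It assumes a simple two-arm experiment with one continuous outcome; the function name and the data values are hypothetical, invented for illustration. Under the null, group labels are exchangeable, so the p-value is simply the share of random relabellings that produce a mean difference at least as large as the one observed.

```python
import random
from statistics import fmean

def permutation_p_value(treated, control, n_perm=10_000, seed=1):
    """Two-sided randomisation test of the null hypothesis of zero treatment effect.

    Hypothetical helper for illustration, not any particular package's API.
    """
    rng = random.Random(seed)
    observed = abs(fmean(treated) - fmean(control))
    pooled = list(treated) + list(control)
    n_t = len(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # re-randomise the group labels, as if the null were true
        diff = abs(fmean(pooled[:n_t]) - fmean(pooled[n_t:]))
        if diff >= observed - 1e-12:  # small tolerance guards against float rounding
            extreme += 1
    # Add-one correction so the Monte Carlo p-value is never exactly zero
    return (extreme + 1) / (n_perm + 1)

# Invented outcome scores for a tiny two-arm experiment
treated = [5.1, 6.0, 5.8, 6.4, 5.9, 6.2]
control = [4.8, 5.0, 5.3, 4.9, 5.2, 5.5]
print(permutation_p_value(treated, control))
```

For these twelve invented scores, only 8 of the 924 possible ways of splitting them into two groups of six give a difference as large as the observed one, so the exact p-value is about 0.009, and the Monte Carlo estimate should land close to that.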

So, I was pretty happy with NHST uniquely for the case of simple, fully designed experiments where a null of no effect was relevant and plausible, even if this approach was not optimal elsewhere.

However, in a few recent experimental projects (e.g. here), I turned to Bayesian models with the Bayes Factor taken as the central criterion for which hypothesis (null or non-null) the data support. (For those of you not familiar with the Bayes Factor, there are many good introductions on the web.) The Bayes Factor has a number of appealing features as a replacement for the p-value as a measure of evidential strength (discussed here for example). First, it is (purportedly, see below) a continuous measure of evidential strength: as the evidence for your effect gets stronger, it gets bigger and bigger. Second, it can also provide evidence for the null hypothesis of no effect. That is, a Bayes Factor analysis can in principle tell you the difference between your data being inconclusive with respect to the experimental hypothesis, and your data telling you that the null hypothesis is really supported. The p-value cannot do this: a non-significant p-value is, by itself, mute on whether the null is true, or the null is false but you don’t have enough data to be confident this is the case.

Finally, perhaps the most appealing feature of the Bayes Factor is that you can continuously monitor the strength of evidence as the data come in, without inflating your false positive rate. Thus, instead of wastefully testing to some arbitrary number of participants, you can let the data tell you when they decisively support one hypothesis or the other, or when more information is still needed.

All of these features were useful, and my collaborators and I set off. However, as I have learned the hard way, things get sketchier behind the facade. First, the Bayes Factor is terribly sensitive to the prior chosen for the effect of interest (even if you compute it with the widely used and simple Savage-Dickey density ratio). And often, there are multiple non-stupid choices of prior. With modest amounts of data, these choices can give you Bayes Factors not just of different magnitudes, but actually in different directions.
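To see the prior sensitivity concretely, here is a minimal Python sketch of the Savage-Dickey density ratio under strong simplifying assumptions: the data are summarised by an effect estimate `d_hat` with a known standard error `se` (a normal approximation to the likelihood), and the prior on the effect under the alternative is a normal distribution centred on zero with standard deviation `prior_sd`. These choices, the function names and the numbers are illustrative assumptions, not the priors used by any particular package or in the projects mentioned above. With a conjugate normal prior the posterior is also normal, so the ratio can be computed exactly, and it can be cross-checked against the ratio of marginal likelihoods.

```python
from statistics import NormalDist

def savage_dickey_bf01(d_hat, se, prior_sd):
    """BF01 = posterior density at zero / prior density at zero (normal-normal model)."""
    prior = NormalDist(0.0, prior_sd)
    # Conjugate update: precisions add, posterior mean is precision-weighted
    post_precision = 1 / prior_sd**2 + 1 / se**2
    post_mean = (d_hat / se**2) / post_precision
    posterior = NormalDist(post_mean, post_precision ** -0.5)
    return posterior.pdf(0.0) / prior.pdf(0.0)

def marginal_bf01(d_hat, se, prior_sd):
    """Same quantity via the ratio of marginal likelihoods, as a cross-check."""
    under_null = NormalDist(0.0, se).pdf(d_hat)
    under_alt = NormalDist(0.0, (se**2 + prior_sd**2) ** 0.5).pdf(d_hat)
    return under_null / under_alt

# Same (invented) data, an estimate two standard errors from zero;
# only the width of the prior on the effect changes
for prior_sd in (0.1, 0.5, 1.0, 2.0):
    print(prior_sd, round(savage_dickey_bf01(0.2, 0.1, prior_sd), 2))
```

With these invented numbers the narrowest prior gives BF01 of roughly 0.5 (favouring an effect), while the widest gives roughly 2.7 (favouring the null): same data, opposite verdicts. Wide priors spread their mass over large effects that the data rule out, which penalises the alternative (the behaviour behind the Jeffreys-Lindley paradox).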

And then, as I recently discovered, it gets worse than this. For modest samples and small effects (which, realistically, is the world we live in), Bayes Factors have some ugly habits. As the amount of data increases in the presence of a true experimental effect, the Bayes Factor can swing from inconclusive to rather decidedly supporting the null hypothesis, before growing out of this phase and deciding that the experimental hypothesis is in fact true. From the paper it is a bit unclear whether this wayward swing is often numerically large enough to matter in practice. But, in principle, it would be easy to stop testing in this adolescent null phase, wrongly concluding that you have no experimental effect. If substantive, this effect would undermine one of the key attractions of using the Bayes Factor in the first place.
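A minimal simulation sketch of how one might look for this behaviour: it draws observations one at a time from a small true effect and, after every batch, recomputes the log Bayes Factor for the null (positive means evidence for the null, matching the convention of the figure). It assumes a one-sample design with known standard deviation and a normal prior centred on zero; the effect size, prior width, batch size and seed are arbitrary choices, and this is not a reproduction of Huisman's analysis. Whether a null-ward phase appears on a given run depends on the seed, the effect size and the prior.

```python
import math
import random

def log_bf01(d_hat, se, prior_sd):
    """Log Bayes Factor for the null (positive = evidence for the null).

    Normal likelihood with known standard error; normal(0, prior_sd) prior under H1.
    """
    v0 = se**2                  # variance of d_hat under the null
    v1 = se**2 + prior_sd**2    # marginal variance of d_hat under the alternative
    # log N(d_hat; 0, v0) - log N(d_hat; 0, v1)
    return 0.5 * math.log(v1 / v0) - 0.5 * d_hat**2 * (1 / v0 - 1 / v1)

def bf_trajectory(true_effect=0.2, sigma=1.0, prior_sd=0.5, n_max=400, step=20, seed=7):
    """Return (n, log BF01) after every `step` observations of a true, small effect."""
    rng = random.Random(seed)
    total = 0.0
    path = []
    for n in range(1, n_max + 1):
        total += rng.gauss(true_effect, sigma)
        if n % step == 0:
            d_hat = total / n
            se = sigma / math.sqrt(n)
            path.append((n, log_bf01(d_hat, se, prior_sd)))
    return path

for n, lbf in bf_trajectory():
    print(f"n={n:4d}  log BF01={lbf:+.2f}")
```

Rerunning with different seeds, or with several simulated experiments at once, shows how often the trajectory passes through a null-favouring stretch before turning, which is the practical question the paper leaves open.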

Disorderly behaviour of the Bayes Factor as sample size increases in the presence of a true effect. Shown is the log Bayes Factor for the null (i.e., higher is more null). From this paper by Leendert Huisman.

What to do? Better statisticians than me will doubtless have views. I will however make a couple of points.

The first is that doing Bayesian inference and using Bayes Factors are very much not the same thing. Indeed, the Bayes Factor is not really an orthodox part of the Bayesian armamentarium. People who are Bayesians at heart don’t discuss or use the Bayes Factor at all. They may, for all I know, even regard it as degenerate or decadent. They estimate parameters and characterise their uncertainty around those estimates. The sudden popularity of Bayes Factors represents experimentalists doing NHST and wanting a Bayesian equivalent of the familiar ‘test of whether my treatment did anything’. As I mentioned at the start, in the context of the designed experiment, that seems like a reasonable thing to want.

Second, there are frequentist solutions to the shortcomings of reliance on the p-value. You can (and should) correct for multiple testing, and prespecify as few tests as possible. You can couple traditional significance tests on the null with equivalence testing. Equivalence testing asks the positive question (is my effect positively equivalent to zero?), not just whether it is positively different from zero. The answers to the two questions are not mutually coupled: with an inconclusive amount of data, your effect is neither positively different from zero, nor positively equivalent to it. You just don’t know very well what it is. With NHST coupled to equivalence testing, you can hunt your effect down: has it gone to ground somewhere other than zero (yes/no); has it gone to ground at zero (yes/no); or is it still at large somewhere? Equivalence testing should get a whole lot easier now with the availability of the updated TOSTER R package by Aaron Caldwell, building on the work of Daniel Lakens.
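The two questions can be put side by side in a few lines of Python. This is a minimal sketch assuming a normal approximation with a known standard error, an equivalence bound `bound` chosen in advance, and the two one-sided tests (TOST) procedure for equivalence; the function and argument names are hypothetical, and this is not the TOSTER package's interface (TOSTER works with t-tests and other designs rather than this large-sample approximation).

```python
from statistics import NormalDist

def nhst_and_tost(d_hat, se, bound, alpha=0.05):
    """Classic test against zero plus a TOST equivalence test within +/- bound."""
    z = NormalDist()  # standard normal
    # Is the effect different from zero? (two-sided test of H0: effect = 0)
    p_diff = 2 * (1 - z.cdf(abs(d_hat) / se))
    # Is the effect equivalent to zero? Reject both H0: effect <= -bound ...
    p_lower = 1 - z.cdf((d_hat + bound) / se)
    # ... and H0: effect >= +bound
    p_upper = z.cdf((d_hat - bound) / se)
    p_equiv = max(p_lower, p_upper)  # both one-sided tests must reject
    return {
        "different_from_zero": p_diff < alpha,
        "equivalent_to_zero": p_equiv < alpha,
        "p_diff": p_diff,
        "p_equiv": p_equiv,
    }

# Three invented results, all with an equivalence bound of 0.2
found_elsewhere = nhst_and_tost(d_hat=0.5, se=0.1, bound=0.2)
found_at_zero = nhst_and_tost(d_hat=0.01, se=0.05, bound=0.2)
still_at_large = nhst_and_tost(d_hat=0.1, se=0.15, bound=0.2)
```

The three calls correspond to the three outcomes of the hunt: run to ground somewhere other than zero, run to ground at zero, and still at large (neither test rejects). Choosing `bound` is exactly the question of what counts as "true enough", and it should be fixed before seeing the data.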

Equivalence testing, though, does force us to confront the interesting question of what it means for the null hypothesis to be true enough. This is a bit weaker than it being truly true, but not quite the same as it being truly false, if you take my meaning. For example, if your intervention increases your outcome measure by 0.1%, the null hypothesis of zero is not literally true; you have somehow done something. But, damnit, your mastery of the causal levers does not seem very impressive, and the logical basis or practical justification for choosing that particular intervention is probably not supported. So, in equivalence testing, you have to decide (and you should do so in advance) what the range of practical equivalence to zero is in your case, i.e. what size of impact you will treat as too small to support your causal or practical claim.

So that seems to leave the one holdout advantage of the Bayes Factor approach, the fact that you can continuously monitor the strength of evidence as the data come in, and hence decide when you have enough, without causing grave problems of multiple testing. I won’t say much about this, except that there are NHST methods for peeking at data without inflating false positives. They involve correcting your p-values, and they are not unduly restrictive.
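A minimal simulation sketch of both the problem and one simple fix. It assumes a one-sample z-test with known standard deviation, five equally spaced looks at the data, and a true effect of exactly zero, so every "significant" result is a false positive. Testing at the nominal 0.05 at every look inflates the overall false positive rate; a Bonferroni split of alpha across the looks (conservative) keeps it below 0.05; and Pocock's group-sequential boundary (a nominal level of about 0.0158 per look for five looks) brings it close to 0.05. The function name, look schedule and number of simulations are illustrative choices.

```python
import math
import random
from statistics import NormalDist

def false_positive_rate(alpha_per_look, looks=(20, 40, 60, 80, 100), n_sims=2000, seed=3):
    """Share of simulated null experiments declared 'significant' at any look."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha_per_look / 2)  # two-sided critical z
    hits = 0
    for _ in range(n_sims):
        data = [rng.gauss(0.0, 1.0) for _ in range(max(looks))]  # true effect is zero
        for n in looks:
            z = sum(data[:n]) / math.sqrt(n)  # z-statistic for the mean, sigma = 1
            if abs(z) > crit:
                hits += 1
                break  # stop testing at the first 'significant' peek
    return hits / n_sims

naive = false_positive_rate(0.05)           # same threshold at every look
bonferroni = false_positive_rate(0.05 / 5)  # alpha split evenly across five looks
pocock = false_positive_rate(0.0158)        # Pocock nominal level for five looks
print(naive, bonferroni, pocock)
```

Because the same seed regenerates the same simulated datasets on every call, the three rates differ only in the threshold applied, which makes the comparison direct.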

So, if your quest is still for simple tests of whether your experimental treatment did something or nothing, the Bayes Factor does have some downsides, and you still have other options.

The paradoxes of relational mobility

In some societies, people perceive that others can desert their existing social relationships fairly easily, in favour of alternative partners. In other societies, people feel their social relationships are more permanent fixtures, never able to be abandoned. Let’s call these high-relational-mobility societies and low-relational-mobility societies respectively. It seems intuitive that people’s trust of one another will be greater in low-relational-mobility societies than in high-relational-mobility societies. Why? Well, in those societies, you have the same interaction partners for a long time; you can know that they aren’t just going to walk away when they get a better offer; they are in it for the long term, and so their time horizon is indefinite. Seems like a recipe for trust.

Interestingly, the empirical relationship is the other way around: where relational mobility is high, people have higher trust. Moreover, as shown in a recent paper by Sakura Arai, Leda Cosmides and John Tooby, individuals who perceive that others could walk away from them at any moment are actually more trustworthy, and less punitive. How can we explain this apparently paradoxical relationship?

The answer resembles classical arguments for the invisible hand in economics. In a market where buyers can shift vendors easily, there are many vendors, and it is easy for new vendors to enter, I can be pretty confident that the price and service will be good. A vendor who took excessive profits, who offloaded costs onto the buyer, or who was generally obnoxious would instantly and en masse be deserted. In a competitive and free market with low entry costs, I can trust that pretty much any partner I meet should treat me ok, merely from the fact of their existence.

Two allegories. First, high-relational-mobility societies are like what Parisians believe restaurants in Paris to be: necessarily good, because there are so many restaurants in Paris and Parisian diners are so discerning that any restaurant that was not amazing and good value would have already ceased to exist. (By the way, from my own experience, I am extremely sceptical about this: not about the cogency of the explanation, but about the generalisation about Parisian restaurants that it is supposed to explain. I have however heard it from several independent sources.)

Second allegory: low-relational-mobility societies are more like the academic publishing market. We are stuck with a few massive actors (you know the ones), and our individual addictions to the status and prestige indicators they control mean that we, as a community, accept appalling behaviour – profit gouging, dubious editorial practices, frankly crap service to authors – rather than walking away.

In a world where the others in your social network have the option to fairly easily walk away, you have to treat those others pretty well (so that they won’t); and they have to treat you pretty well (so that you won’t). Of course, if this meant that all relationships became transitory, ephemeral interactions, this might become pretty lonely and grim. That is not necessarily the case, however: the experience of interpersonal intimacy is actually higher in high-relational-mobility societies. People value deep, durable and predictable relationships; relational mobility gives them the chance to cultivate those that suit them, and the nuclear option of walking away ensures a minimum acceptable threshold.

This also links to coercion and punishment. Social relationships inherently involve conflicts of interest. Thus, at some point in a social relationship, you always find yourself wanting someone to do something different than they spontaneously want to. At this point, one option is to punish them: to impose costs on them that they will find aversive. This might sometimes be effective in changing their behaviour, but it’s a horrible and humiliating way to treat someone.

If you know a person cannot walk away, punishment is a pretty effective tool, since it changes the relative payoffs of their different options in favour of the one you want them to choose; and they can’t do much about it. But, if you know that someone subjected to the humiliation of punishment could just exit, you’d be much better off not trying to punish them (why should they put up with it?). You should compromise on your demands of them instead. Thus, counter-intuitively, a good outside option on both sides can in principle make social relationships more dependable, more mutually beneficial, and freer from interpersonal domination. (Though of course, if one party has exit options and the other doesn’t, that’s an asymmetry of power, and not likely to be healthy.)
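The logic can be made concrete with a deliberately crude toy model. Person A wants person B to do something B would rather not do, and A can either punish B or compromise. If punished, B stays and complies only if what is left of the relationship still beats B's outside option; otherwise B walks away and A gets nothing. Compromise is assumed always to keep B on board. The function name and all payoff numbers are hypothetical, chosen only to illustrate the argument, not taken from the Arai, Cosmides and Tooby study.

```python
def a_best_move(b_outside_option, b_value_of_relationship=10.0, punishment=4.0,
                a_if_b_complies=8.0, a_if_compromise=6.0, a_if_b_exits=0.0):
    """Return A's better move given B's outside option (all payoffs hypothetical)."""
    # After punishment, B stays (and complies) only if staying still beats leaving
    b_stays = b_value_of_relationship - punishment >= b_outside_option
    payoff_if_punish = a_if_b_complies if b_stays else a_if_b_exits
    return "punish" if payoff_if_punish > a_if_compromise else "compromise"

for outside in (0.0, 3.0, 7.0, 9.0):
    print(outside, a_best_move(outside))
```

Once B's outside option rises above what a punished relationship is worth to B (here, above 6), punishing stops paying and A does better to compromise.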

Among participants from Japan and the US, people with higher relational mobility scores (that is, who perceive others as more relationally mobile) pay less to punish in an economic game. From this paper.

This is a rich idea, foreshadowed and probed in Albert Hirschman’s classic book Exit, Voice and Loyalty. It seems to tie together lots of disparate applications. To link to one of my other areas of interest, it crops up in one of the arguments for Universal Basic Income. In a world where UBI gives every individual a minimal walk away option from every job, the labour market should get better. Humiliating and unhealthy employment practices should be lessened, as employers who treated their employees this way would go the (alleged) way of bad Parisian restaurants. This leads to the counterintuitive prediction that people would work more, or at least more happily and productively, in a world where they were paid a bit for not working.

More generally, relational mobility could play some role in explaining the expanding moral circle, the observation that as societies have become richer and more urbanised, the unacceptability of humiliation and cruelty has deepened, and been extended to successively broader sets of others. Surely, if modern economic development has done one thing, it has increased relational mobility, though unevenly (more for the rich than the poor for example, more in the metropolis than the periphery). Perhaps this is the cultural consequence.

At least, modern economic development has increased relational mobility relative to (often authoritarian) agrarian and early modern societies. Some foraging societies were probably rather different. There’s a long-standing anthropological argument that one factor maintaining egalitarianism and freedom from domination in mobile hunter-gatherers is the ability of the dominated to simply melt away and go elsewhere. Much more difficult when you are tied to a particular plot of land or irrigation resource.

To pivot to an entirely different level of analysis, narcissists, famously, have highly conflictual interpersonal relationships that nevertheless persist for years (often at great cost to the partner). Narcissists are particularly prone to trying to control their partners through punishment. Although they frequently threaten to leave (presumably as a punishment), they seldom actually do. One thing that may be going on here is that narcissists have such an inflated sense of their own worth that they struggle to believe their partners could have outside options (there is some evidence consistent with this). Thus, they like to stay, and manipulate their partner into continuing to provide benefits to them, without feeling any imperative to treat that partner well in turn.

All in all, the topic of relational mobility, at all kinds of scales, seems like an important one that requires further unifying research and theory. Is higher relational mobility an unalloyed good? Does it come with particular psychological or political costs? How does it relate to the balance of kin-based and non-kin based relationships, which has been implicated in the social evolution of trust and of economic institutions? What role does it play in ‘modernity’ more generally?

Perhaps most pressingly for me, can the power of relational mobility be harnessed from the political left? The celebration of the positive power of consumer choice has come to be strongly associated with the neoliberal right. It’s easy to see through the smokescreen here: neoliberal dismantling and privatisation of public services was rhetorically justified by the progressive power of consumer choice. In many cases it actually ended up meaning the handover of a lot of public and household money to unaccountable capitalist oligopolies (chumocracies, indeed), without much practical empowerment of the citizen. Still, people on the social democratic left have an uneasy relationship with the idea that citizens ought to be able to choose, including choosing to opt out. Maybe, though, as in the Universal Basic Income example above, there are instances where the left can make friends with the idea.

The Changing Cost of Living Study: part one, cross-sectional results

People with lower incomes have worse mental health: more depression, more anxiety. On the face of it, this seems to be a strong pragmatic and ethical case for income redistribution: if we raised the incomes of the poorest in society, we could avoid all the health and social costs of those difficulties, and, more importantly, relieve suffering. The association between income and mental health also seems to ask us to reconsider what ‘mental health disorders’ really are: it becomes less relevant to think of them as principally ‘brain diseases’ or ‘chemical imbalances’, and more compelling to understand them as psychological responses to particular socioeconomic realities.

However, nothing is simple. Lower incomes could be associated with worse mental health for at least three reasons:

  1. Lower income causes worse mental health;
  2. Worse mental health leads people to earn lower incomes;
  3. Some as-yet-unmeasured variable affects both income and mental health (and there are many you could think of).

In a cross-sectional study (that is, a study where you measure income once and mental health once), you really can’t get anywhere much in deciding the relative importance of these three possibilities. A longitudinal study is immediately a bit better. This is a study where you measure income and mental health on multiple successive occasions in the same people. If, when income decreases, mental health gets worse; and when income increases, mental health gets better, then it starts to look like possibility 1. is important. And this is especially true if the reason for income going down is not to do with something the person has done: what economists call an income shock, where some random external event causes income to change. A term researchers use for a situation when a random shock is applied to a load of people is a natural experiment. Instead of the researcher having to deliberately apply a treatment to the participants, something just happens to them anyway, and the researcher tries to hang on and learn about cause and effect in the process.
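Here is a minimal simulation sketch of why repeated measurement helps. It assumes a hypothetical data-generating process in which a stable, unmeasured trait `u` raises both income and wellbeing (possibility 3), on top of a true causal effect of income on wellbeing of 0.5 (possibility 1). A pooled cross-sectional regression absorbs the confound and overstates the effect; a within-person (fixed-effects) estimate, which compares each person only with themselves over time, recovers it. This only removes confounders that are constant over time; it does not by itself deal with time-varying confounders or with possibility 2. The names and numbers are arbitrary, and this is not the study's pre-registered analysis.

```python
import random
from statistics import fmean

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_panel(n_people=500, n_waves=12, beta=0.5, confounding=1.0, seed=11):
    """Hypothetical monthly panel of (person_id, income, wellbeing) records."""
    rng = random.Random(seed)
    panel = []
    for i in range(n_people):
        u = rng.gauss(0, 1)  # stable, unmeasured trait affecting both variables
        for _ in range(n_waves):
            income = u + rng.gauss(0, 1)  # month-to-month income shock
            wellbeing = beta * income + confounding * u + rng.gauss(0, 1)
            panel.append((i, income, wellbeing))
    return panel

def within_person_slope(panel):
    """Demean income and wellbeing within each person, then regress."""
    by_person = {}
    for pid, x, y in panel:
        by_person.setdefault(pid, []).append((x, y))
    xs, ys = [], []
    for obs in by_person.values():
        mx = fmean(x for x, _ in obs)
        my = fmean(y for _, y in obs)
        xs.extend(x - mx for x, _ in obs)
        ys.extend(y - my for _, y in obs)
    return ols_slope(xs, ys)

panel = simulate_panel()
cross_sectional = ols_slope([x for _, x, _ in panel], [y for _, _, y in panel])
within_person = within_person_slope(panel)
print(round(cross_sectional, 2), round(within_person, 2))
```

With this set-up the pooled slope should come out near 1.0, roughly double the true 0.5, while the within-person slope should land close to 0.5.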

Sadly, in 2022, we are living through a natural experiment about income and mental health. Prices are rising at double-digit rates. Incomes are not (except, apparently, for chief executives of large corporations, whose incomes rose by 23% last year). Many of the things whose prices are rising fastest are not discretionary items, in particular the energy needed to cook and stay warm. What matters for wellbeing is not income per se, but non-committed income: the income you have left over after you have paid for the things you must have in order to maintain your identity and basic subsistence. Non-committed incomes in countries like the UK and France are going, by all projections, to fall for most people over the coming period, for reasons that really aren’t of their making. If possibility 1. (income affects mental health) is true, then, sadly, mental health is going to get worse for a lot of people.

In view of this natural experiment, a consortium of colleagues (Coralie Chevallier, Matthew Johnson, Elliott Johnson, and Kate Pickett) and I have organised a hastily assembled longitudinal study, The Changing Cost of Living Study. We have recruited small cohorts of about 250 volunteers in the UK and France, and we are going to be in touch with them every month over the coming year, to understand how their psychological states are changing with the economics of their households. To be clear, we don’t wish to profit intellectually from people’s misfortunes. But these things are happening to them anyway. As socially committed researchers, we wish to document the effects and bear witness to them, for anyone who is prepared to listen.

The Changing Cost of Living Study only started last month, so we don’t have any longitudinal results yet. However, we have the baseline responses from our participants. These allow us to get to know our cohort a bit, and do a cross-sectional study at the outset. I’ll be presenting some of those cross-sectional results in this blog. For those of you who are interested, the protocol of the study is pre-registered here.

A central input measure in our survey is non-committed income, the amount of income you have left over once your unavoidable obligations (taxes, rent or mortgage, water and energy) are paid for. We calculate this each month from what our participants told us came into their household, minus what they had to pay in mortgage or rent; council tax; water; and energy bills. We equivalise this for the number of people in the household (in other words, we take account of the fact that a household of 4 needs more money than a household of 2). And then, for display purposes, we divide people into quintiles (the 20% with the lowest non-committed incomes, the next 20%, and so on to the top of the distribution).
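A minimal sketch of the calculation just described. The field names are hypothetical, and the post does not say which equivalence scale the study uses, so the square-root scale here (dividing by the square root of household size) is an assumption; the modified OECD scale is a common alternative. Quintiles are assigned by rank, so each holds, as near as possible, a fifth of respondents.

```python
import math
from dataclasses import dataclass

@dataclass
class Household:
    income: float       # monthly household income
    housing: float      # rent or mortgage
    council_tax: float
    water: float
    energy: float
    size: int           # number of people in the household

def equivalised_non_committed_income(h: Household) -> float:
    committed = h.housing + h.council_tax + h.water + h.energy
    # ASSUMPTION: square-root equivalence scale; the study may use a different one
    return (h.income - committed) / math.sqrt(h.size)

def quintiles(values):
    """Assign quintile 1 (lowest 20%) to 5 (highest 20%) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank * 5 // len(values) + 1
    return groups

# A hypothetical household of four: 2000 in, 1200 committed, 800 left over
example = Household(income=2000, housing=800, council_tax=150, water=40, energy=210, size=4)
print(equivalised_non_committed_income(example))
```

With the square-root scale, the 800 left over by this four-person household counts as 400 per equivalent adult; a different scale would change the level but the same ranking logic applies.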

On the output side, there are many measures. We are particularly interested in depression, anxiety, time preference, and risk preference (more on these below). For depression and anxiety, we use clinical scales that were developed in the context of deciding if people had levels of clinical concern, and might need drug treatment. So, substantial scores don’t represent mild complaining about life, but potentially serious suffering.

Just about managing?

First, let’s look at how people are managing to get by. We asked two questions: how are you managing financially (where 1 is not at all, 5 is very well); and how do you estimate your risk of destitution (grande pauvreté in French). As you can see in figure 1, in both countries, our cohort spans the socioeconomic distribution pretty completely: the people in the lowest quintile are really struggling to get by, and seriously worried about destitution; whilst the people in the top quintile are doing pretty well. The people in the lower non-committed income quintiles are having a harder time in France than in the UK (for example, in the lowest quintile in France, the average subjective risk of destitution is 50%), but we can’t conclude anything about the difference between the two countries in general. We recruited in a different way in each case, and it just reflects the different samples that joined the study.

Figure 1. Ratings of ‘managing financially’ and ‘risk of destitution’ by non-committed income and country, baseline data. The points represent the mean; the little boxes the inter-quartile range; and the squiggly thing, which is called a violin, the distribution of responses across the possible values.

Depression and anxiety

Figure 2 shows the levels of depression across the income quintiles. These scales are typically cut up into ‘none’, ‘mild’ and ‘moderate-severe’, where ‘moderate-severe’ indicates clinical concern that might well lead to medical treatment (‘mild’ might too, sometimes). The first thing that might strike you about this figure is that the population as a whole is not doing too well: there are more people with some level of depression than without, in both countries. This is not an uncommon finding these days; and at the moment, there is quite a lot to be ‘depressed’ about.

Figure 2. Proportions of respondents reporting none, mild or moderate-severe depression, by income quintile and country.

Central to the question here, however, is how the rate varies across the quintiles. You see this most clearly by looking at the heights of the ‘none’ bars. These are highest in the highest income quintile, and lowest in the lowest. In the bottom 20% of the distribution, most people have some level of depression, and for a big chunk of them, it would be classed as moderate or severe. Anxiety tells a very similar story (figure 3; and indeed, despite their different names, depression and anxiety go closely together in most regards).

Figure 3. Proportions of respondents reporting none, mild or moderate-severe anxiety, by income quintile and country.

Can’t afford to wait?

We’ve also asked about time preference (would you prefer a smaller, immediate amount of money over a larger one that will come in the future?) and risk preference (would you prefer a smaller, sure amount, or a gamble that could produce nothing or could produce a lot?). Figure 4 shows how those measures distribute across the income quintiles.

Figure 4. Time preference (top) and risk preference (bottom), by non-committed income quintile and country.

For time, we reproduce the common finding that people whose non-committed incomes are lower can’t afford to wait. This makes sense; if you are worried about paying this month’s rent, you are simply not in a position to make an investment that does not pay off for several months. This is a kind of tax of poverty: you can never afford to do the things that affluent people can do that will make them even more affluent in future, because you are always using all your resources to fight the immediate fire.
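The link between such choices and impatience can be made explicit with a little arithmetic. Assuming exponential discounting (an assumption; hyperbolic models are also common), someone indifferent between an amount `sooner` now and `later` after `months` months has a monthly discount rate r satisfying sooner = later / (1 + r)^months, and anyone who prefers the immediate amount is revealing a rate at least that high. The function name and amounts are hypothetical, not the study's actual questions.

```python
def implied_monthly_discount_rate(sooner, later, months):
    """Monthly rate at which `sooner` now and `later` after `months` are worth the same.

    Assumes exponential discounting; preferring `sooner` implies a rate at least this high.
    """
    return (later / sooner) ** (1 / months) - 1

# Hypothetical choice: 100 now rather than 150 in six months
r = implied_monthly_discount_rate(100, 150, 6)
print(f"{r:.1%} per month, {(1 + r) ** 12 - 1:.0%} per year")
```

Taking 100 now over 150 in six months implies a rate of about 7% a month, roughly 125% a year: a perfectly rational choice if the rent is due this week, and a concrete measure of what the poverty tax described above can cost.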

For risk preference, the pattern is not very clear, but there might be a hint that people with lower non-committed incomes are more averse to risks (i.e. prefer a sure thing over a gamble). This again makes sense and concurs with previous findings: if one euro to spend could make a big difference for you, you can’t afford to stake that euro on a risky gamble. On the face of it, you might think risk-taking is a bad thing, and therefore that risk aversion is a desirable consequence of poverty. However, to work well, society needs people to take risks: to set up new companies, invest in developing rare skills, move to new places. People in poverty may not be able to afford to do this, even if it would be socially beneficial, since they don’t have the buffer for the case where it goes wrong. So they are often stuck, unable to make the individually and socially useful moves they might wish to make.
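One standard way to formalise why people with little slack avoid risks is a concave utility function. In this minimal sketch, utility is assumed to be logarithmic in wealth (an assumption chosen for simplicity), and the gamble is a hypothetical 50/50 chance of gaining 60 or losing 40, which has a positive expected value of +10. The function name and numbers are illustrative only.

```python
import math

def accepts_gamble(wealth, gain, loss, p_gain=0.5):
    """True if a log-utility agent prefers the gamble to keeping `wealth` for sure."""
    if loss >= wealth:
        return False  # the bad outcome would leave nothing: log utility is -infinity
    expected_utility = p_gain * math.log(wealth + gain) + (1 - p_gain) * math.log(wealth - loss)
    return expected_utility > math.log(wealth)

for wealth in (100, 1000):
    print(wealth, accepts_gamble(wealth, gain=60, loss=40))
```

The same favourable gamble is declined with a buffer of 100 and accepted with a buffer of 1000: the potential loss bites much harder when it is a large share of everything you have.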

These are some of our findings at baseline (there are plenty of other measures; contact me if you want to discuss the study further). It looks like we find more or less the cross-sectional income gradients that we might expect. But the real value of the study will be longitudinal. We look forward to understanding how things change over the coming months, especially with winter coming and energy bills rising. We will report ongoing findings here. In the meantime, we would like to thank our participants for joining us in this endeavour, and our funders for allowing it to happen.

The Changing Cost of Living Study is funded by the University of York Cost of Living Research Group; the ActEarly Collaboration (UK Prevention Research Partnership), and the Agence Nationale de la Recherche (ANR-21-CE28-0009). An ongoing results webpage can be found at: This shows the descriptive statistics from the cohort and will update automatically each month.

How should we reduce the wellbeing costs of poverty?

Unless you have been in hiding for the past forty years, you will know that even in countries that are rich in aggregate, poverty is really bad for wellbeing – bad for physical health, bad for mental health, and bad for satisfaction with life in general. Definitions of poverty for developed nations generally include some notion of relativeness: it’s about having less than most people in your society. Under this definition, you can’t ever entirely make poverty go away, since numerical equality of income and wealth is unlikely (though, of course, you can make the gaps smaller, and this seems generally to be a good idea, for all kinds of reasons including those discussed below). So it is worth asking: are there places where the wellbeing burden of relatively low income is smaller, and places where it is bigger? And what do those places do differently?

I have been having a look at this using the data from the European Quality of Life Survey (2012). (This is a digression from a larger ongoing project with Tom Dickins investigating the consequences of inequality using that dataset, see pre-registration here. There are more details of the sample and measures in that document). I first plotted life satisfaction against income (measured in multiples of the median for the country) by country.

Overall, people on relatively low incomes are less satisfied with life than those with incomes above the median. Beyond that, the life satisfaction dividend of income quickly tends to saturate. However, the figure seems to show lots of fascinating heterogeneity in the shape of the relationship. Some of this captures real things, like the compressed income distribution of Denmark, and the very dispersed one of the UK. Much of the variation in shape above the median, though, is probably pretty spurious: there are very small numbers of respondents with incomes above about 4 times the median, so those trends are not very precisely estimated (and just don’t ask about Austria). And half of all the people (by the definition of the median) are crammed into the little area between 0 and 1 on the horizontal axis, so it’s a misleading scale.

What if we split respondents into those whose incomes are above and below the country median? By comparing the mean life satisfactions of those two groups for each country we can get a sense of the psychological cost of being relatively poor.
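As a minimal sketch of the calculation just described, assuming each respondent has an income (in multiples of the country median) and a life satisfaction score: split at the median, then subtract the mean satisfaction of the lower half from that of the upper half, so that a positive gap means the poorer half is less satisfied. The function name and the toy data are hypothetical, not the EQLS data, and respondents exactly at the median are put in the upper group, which is an arbitrary choice.

```python
from statistics import mean, median

def rich_poor_gap(incomes, satisfactions):
    """Mean life satisfaction above the median minus mean below it.

    Respondents exactly at the median go to the upper group here
    (an arbitrary, illustrative choice)."""
    med = median(incomes)
    below = [s for i, s in zip(incomes, satisfactions) if i < med]
    above = [s for i, s in zip(incomes, satisfactions) if i >= med]
    return mean(above) - mean(below)

# Hypothetical toy data for two made-up countries (not EQLS values):
# incomes in multiples of the country median, life satisfaction on a 1-10 scale.
countries = {
    "A": ([0.4, 0.8, 1.0, 1.5, 3.0], [6, 7, 7, 8, 8]),
    "B": ([0.3, 0.6, 1.0, 2.0, 5.0], [4, 5, 7, 8, 8]),
}
gaps = {name: rich_poor_gap(inc, ls) for name, (inc, ls) in countries.items()}
print(gaps)
```

In this toy example country B has the larger gap, so being in its lower half carries the bigger psychological cost.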

This seems more satisfying: those on low incomes are everywhere (except Austria) less satisfied than those on high incomes, but the magnitude of the gap is quite variable: compare, say, Denmark to Poland. So, the question becomes: what accounts for cross-country variation in the size of this gap? (There are other questions too, such as what accounts for cross-country differences in the overall mean, but here I consider only the rich-poor gap).

A couple of candidate factors leap to mind. First, there is the inequality of the income distribution itself. Where the gaps are bigger, being at the bottom of the distribution might be worse than where the gaps are smaller. This could be true for several reasons: for a start, where the income distribution is more dispersed, many of those below the median are a long way below it in absolute terms, with all the material problems that is going to cause. Or maybe, as Kate Pickett and Richard Wilkinson have tirelessly argued, where the gaps are bigger, people notice them more, and this puts them into a psychologically more unpleasant mode: stressed, competitive, and paranoid about social position. This would affect the poor more strongly. So one candidate for explaining the size of the rich-poor life satisfaction gap is the inequality of the income distribution of the country, which we measure with something called the Gini coefficient.
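For readers who have not met it, the Gini coefficient can be computed from the mean absolute difference between all pairs of incomes, divided by twice the mean income. It is 0 when everyone has the same income and approaches 1 when one person has everything. The sketch below assumes a plain list of individual incomes; it illustrates the measure only, and is not necessarily how the country-level figures used in this analysis were obtained.

```python
def gini(incomes):
    """Gini coefficient via the mean absolute difference:
    G = sum_i sum_j |x_i - x_j| / (2 * n**2 * mean(x))."""
    n = len(incomes)
    mu = sum(incomes) / n
    total_abs_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return total_abs_diff / (2 * n * n * mu)

print(gini([10, 10, 10, 10]))  # 0.0: perfectly equal incomes
print(gini([0, 0, 0, 40]))     # 0.75: the maximum, (n - 1) / n, for n = 4
```

The all-pairs sum is O(n²), which is fine for illustration; for large samples a sorted-income formula gives the same value faster.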

Another possibility is that being relatively poor is more tolerable in countries with good access to public services for everyone, especially healthcare. In the health inequalities literature, this is often referred to as the neomaterialist hypothesis (for example in this paper here). This is somewhat confusing, since it is not appreciably more materialist than all of the other possible hypotheses, nor obviously more neo, nor quite as Marxist as it sounds. Anyway, we have measures of this in the dataset: ratings of problems accessing healthcare, and ratings of problems accessing other services such as culture, public transportation and other amenities. I calculated the mean ratings of problems of access to these things just for the people whose incomes were below the country median. I then plotted the size of the rich-poor life satisfaction gap against our three potential explanatory variables: the Gini coefficient, problems accessing healthcare, and problems accessing other services. (Note, on the plot below, the gap is expressed so that positive 0.5 means the poorer half of the population have life satisfaction that is 0.5 scale points lower than the richer half. And yes, the one outlier with a negative gap – the poor are happier – is Austria).

On the face of it, there seem to be positive associations between all three predictors and the size of the life satisfaction gap. However, causal inference is tricky, not least because, unsurprisingly, all three predictors are also somewhat correlated with one another: in more unequal countries, it’s also more difficult for people with low incomes to access healthcare. I ran a model selection algorithm. The best fitting model simply contains problems accessing healthcare (and problems accessing healthcare also has the strongest bivariate correlation with the size of the life satisfaction gap, 0.5). In other words, if people on low incomes can easily access healthcare, the burden of their low income for their satisfaction with life is substantially mitigated. For every standard deviation reduction in problems accessing healthcare, the life satisfaction gap between rich and poor shrinks by half a standard deviation.

However, a second model with both problems accessing healthcare and the Gini coefficient as predictors comes out almost equally likely to be the best model to explain the data. In other words: the data are nearly as compatible with an explanation where both the dispersion of the income distribution and the problems the relatively poor have accessing healthcare contribute to the life satisfaction gap. Even in this model, though, problems accessing healthcare is the variable with the larger beta coefficient.

This is not yet a proper analysis of these data – this is only a taster, and no firm conclusions can yet be drawn. However, it does look like this particular dataset (and outcome measure) points one way rather than another in ongoing debates about how to level up health and wellbeing: should we be prioritising making cash transfers (i.e. increasing low incomes), or providing universal (free) basic services, thus alleviating some of the problems lack of money leads to? This is a complex debate, in which I have usually been on the cash transfer side. Doubtless both are required. However, this dataset does seem to point to the importance of excellent and accessible services for making modern life tolerable at all rungs of the income distribution.

To subscribe to this blog via email, enter your address in the Subscribe box on the right.

Breaking cover on the watching eyes effect

I have seldom had much to say on the watching eyes effect. Even though it is the most cited research I have ever been involved in, it was always a side project for me, and also for Melissa Bateson, and so neither of us has been very active in the debate that goes on around it. Along with our students, we did an enjoyable series of field experiments using watching eyes to impact prosocial and antisocial behaviour. The results have all been published and speak for themselves: not much more to say (we really don’t have a file drawer). However, I have just finished reading not one but two unrelated books (this one and this one) that cite our watching eyes coffee room experiment as a specimen of the species ‘cute psychology effect that failed to survive the replication crisis’, and so I feel I do need to break cover somewhat and make some remarks.

In our coffee room experiment, we found that contributions to an honesty box for paying for coffee substantially increased when we stuck photocopied images of eyes on the wall in the coffee corner, compared to when we stuck images of flowers on the wall. This makes the point that people are generally nicer, more cooperative, more ethical, when they believe they are being watched, a point that I believe, in general terms, to be true.

The account of the experiment’s afterlife in both books goes something like: this was a fun result, it made intuitive sense, but it subsequently failed to replicate, and so it belongs to the class of psychology effects that are not reliable, or at least, whose effect size is very much smaller than originally thought. It is certainly true that many psychology effects of that vintage turn out not to be reliable in just such a way; and also true that there are many null results appearing using watching eyes manipulations. I just want to point out, though, that the statement that our coffee room results have failed to replicate is not, to my knowledge, a correct one (and my knowledge might be the problem here; I have not really kept up with this stuff as well as I should).

The key point arising from our coffee room experiment was that: in (1) real-world prosocial tasks, when (2) people do not know they are taking part in an experiment, (3) few real eyes are around, and (4) the rate of spontaneous prosociality is low, then displaying images of watching eyes can increase the rate of prosocial compliance. I do not know of any attempt at a direct replication, with either a positive or a null result. We can’t do one because we don’t have a kitchen with an honesty box any more, and besides, our study population knows all about our antics by now. Someone else should do one. Indeed, many people should.

There have been some conceptual replications published, preserving all of features (1)–(4), but focusing on a different behaviour and setting than paying for one’s coffee in a coffee room. Some of these are by our students (here and here for example). Some are not: for example, see this 2016 study on charitable donations in a Japanese tavern or izakaya and the anti-dog littering campaign developed and evaluated by charity Keep Britain Tidy. All of these can be considered positive replications in that features (1)–(4) were present, a watching eyes image intervention was used, and there was a positive effect of the eye images on the behaviour. The effect sizes may have been smaller than in our original study: it is hard to compare directly given the different designs, and I have not tried to do so. But all these studies found evidence for an effect.

Given the existence of positive conceptual replications, and the lack, to my knowledge, of any null replication, why did both books describe our coffee room result as one that had not replicated? They were referring, no doubt, to the presence in the literature of several studies in which (a) participants completed an artificial prosociality task such as a Dictator Game, when (b) they knew they were taking part in an experiment, (c) they were therefore under the observation of the experimenter in all conditions, and (d) the rate of prosociality was high at baseline; and the watching eyes effect was null.

It’s perhaps not terribly surprising that watching eyes effects are often null under circumstances (a)-(d), instead of (1)-(4). When the rate of prosociality is already high, it is not easy for a subtle intervention to make it any higher. Besides, anyone who knows they are taking part in an experiment already feels, quite realistically, that their behaviour is under scrutiny, so some eye images are unlikely to do much more on top of that. That’s the whole concern about studying prosociality in the lab: baseline rates of prosociality may be atypically high, exactly because people know that the experimenter is watching. But this should not be confused with the claim that the watching eyes effect has been shown to be unreliable under the rather different circumstances (1)-(4). That might turn out to be the case too, but, to my knowledge, it has not thus far.

There are two possible sources of the book authors’ confusion with respect to the afterlife of the effects observed in our coffee room study. The first is that they are using our coffee room experiment as a metonym for the whole of the watching eyes literature. The original studies of the watching eyes effect, the ones that preceded ours (notably this one), were done under circumstances (a)-(d), and as we have seen, those effects have not reliably replicated. But it is fallacious to infer from this that our rather different studies have not replicated. Something about watching eyes effects did not replicate, our study is something about watching eyes, therefore our study did not replicate. Doesn’t quite follow. By chance, we might have stumbled on a set of circumstances where watching eyes effects are real and potentially useful, even though they turn out to be more fragile and transitory in the domain – experimental economic games – where they were first documented. To find out whether this is right, we need replications that have the right properties. Doing more (easy because in the lab) replications with the wrong properties does not seem to add much at this point.

Second, the book authors were probably influenced by a published meta-analysis arguing that watching eyes do not increase generosity. Whatever its merits, that meta-analysis, by design, only included studies done under circumstances (a)-(c) (and therefore for which (d) is usually true). It did not include our coffee room study, any of our conceptual replications of our coffee room study, or any of the conceptual replications of our coffee room study done by anyone else. So, it can hardly be taken as showing that the effects in our coffee room study are not replicable. That would be like my claiming that Twenty-Twenty cricket matches are short and fun, and you responding by saying that you have been to a whole series of test matches and they were long and boring, not short and fun. True, but not relevant to my claim. My claim was not that all cricket is short and fun, only that certain forms of it may be.

It’s really important, in psychology, that we attempt and publish replications, do meta-analyses, and admit when findings turn out to be false positives. But, it’s also important to understand what the implicational scope of a non-replication is. Replication study B says nothing about the replicatory potential of the effects in study A if constitutive pillars of study A’s design are completely absent from study B, even if the manipulation is similar. Also, we really ought to do more field experiments, where participants do not know they are in an experiment and are really going about their business, if the question at hand is to do with real-world behaviour and interventions thereupon.

I am quite happy to accept the truth however the dust settles on the watching eyes effect, but for real-world prosocial behaviours in field settings when no-one is really watching and participants don’t know they are taking part in an experiment, I’m not prepared to bet against it just yet.

Subscribe to this blog by entering your email in the subscribe box on the right. Regular posts on psychology, behavioural science and society.

Live fast and die young (maybe)

Quite a few big ideas have made it across from evolutionary theory into the human sciences in the last few years. I can’t think of any that has been more culturally successful than the ‘live fast, die young’ principle. This principle, which was originally articulated by George C Williams in the late 1950s, says something like the following: if you live in a world where the unavoidable risk of mortality is high, you should prioritise the present relative to the future. Specifically, you should try to reproduce sooner, even at the expense of your neglected body falling apart down the line. After all, what is the point in investing in longevity when some mishap will probably do you in anyway before you reach a peaceful old age?

The principle was originally invoked to explain inter-species differences in the timing of reproduction, and in senescence (the tendency of bodies to fall apart in multiple ways after a certain number of years of life, without clear external causes). But it has come to crop up everywhere: in psychology (to explain individual differences in impulsivity, and the impact of early-life trauma), in sociology (to explain socioeconomic differences), in anthropology and history (to explain social change). I’ve even found it invoked in explaining how travel agencies responded to the disruption caused by the pandemic. And then of course, there is the famous story of Henry Ford, asking engineers to tour the scrapyards of America in search of Model T parts that were still in good condition in the scrapped carcasses. They found that the kingpin was never worn out; ‘make them less well!’ came the response.

On the face of it, this is a beautiful example of theory guiding observation, science in the hypothetico-deductive mode working well. Williams produced an a priori theoretical argument. Subsequent data from many species supported its central prediction: species that experience a higher mortality rate in their natural environments mature sooner (and smaller); have larger litters; have shorter inter-birth intervals, and may senesce sooner. Then, it was like the joke about confirmation bias: once you are aware of it, you see it everywhere. For people spotting it (for example in the behaviour of travel agencies), it was nice to link back to the prestige of evolutionary biology and the idea that the pattern was predicted by theory. But there is a problem. The problem is not with the bit that says that the living fast and dying young pattern of behaviour occurs; this does indeed seem to be an empirical regularity of some generality. The problem lies in the bit that says theory predicts it will occur.

It seems to be a well-kept secret that there is no consensus in evolutionary biology that Williams’ theoretical argument was correct. That’s putting it mildly. As a 2019 review in Trends in Ecology and Evolution put it: ‘[Williams’] idea still motivates empirical studies, although formal, mathematical theory shows it is wrong’. The authors of that paper suggest that Williams’ argument persists not because it is sound, but because it is intuitive. Intuitive it definitely is. I remember acting as editor for another important paper showing mathematically that, other things being equal, the risk of unavoidable mortality can have no impact on the optimal timing of reproduction, or any other trade-off parameter for that matter. And I remember thinking: but it obviously must, you need to rewrite the equations until they say so! (I didn’t say this in my editorial review of course).

The difficulty is that, until now, there has been no explanation of why Williams’ argument does not work that is anything like as intuitive as the original argument was. A new paper by Charlotte de Vries, Mattias Galipaud and Hanna Kokko comes as close as anything I have ever encountered to giving me an intuitive magic for the failure of the argument that is nearly strong enough to battle the intuitive magic of the argument itself. (For those of you who don’t know Hanna, the combination of brilliant theoretical insight and limpid clarity in communication is exactly the kind of behaviour she has form for.) The paper, as well as explaining the difficulty with Williams’ original argument, signposts the ways we might rescue the live fast, die young principle, and in the process shows how the scientific method – theory leads to prediction leads to test – is not entirely as we like to imagine.

Alright, let’s roll our sleeves up. Let us imagine a population living in a dangerous world, whose members put everything into reproducing in the first year of their lives. Then they are knackered, and die off even if the dangers of their world haven’t got to them first. There is no selection for reproducing less when young and being healthy for a second year, because so few are going to make it that far anyway. Now, due to a change in the ecology, the rate of unavoidable mortality goes down. Now, many more individuals can be around in the second year. By reproducing a bit less in the first year, individuals can be in better health in the second year, leading to higher total lifetime reproductive success. And this delaying is now more worth doing, because they now have a better chance of making it through and reaping the benefit. This is Williams’ original argument, and it still seems to make a lot of sense.

We are talking about evolution, though, and as Ernst Mayr taught us, thinking about evolution requires thinking about populations, not about isolated individuals. If the rate of mortality goes down, the population is going to grow exponentially, at a faster rate than it did previously. In an exponentially growing population, an offspring that you have sooner is more valuable in fitness terms than one you have later. De Vries et al. give us a nice figure to see why this is the case.

The thing that gets maximised by evolution is the proportionate representation in the population of some lineage or type. And it is easy to see that in a growing population, an offspring placed into the population earlier (the left hand star on the figure) can become ancestor to a greater fraction of the population by time 3 than an offspring placed into the population later (the right hand star). This is because it is placed in when the cone is narrower, and its descendants begin their exponential growth in number sooner. So, when a population is growing exponentially, there is a fitness bonus attaching to any offspring you manage to have soon. To look at this the other way about, there is a relative fitness penalty attached to any offspring you have later in time.
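The cone argument can be put in numbers under one simplifying assumption: every lineage, once started, grows by the same yearly factor as the population as a whole. Then an offspring added one year earlier ends up as a larger fraction of the population by exactly that yearly growth factor. The function and parameter names below are illustrative, not taken from de Vries et al.

```python
def lineage_share(birth_year, horizon, growth_rate, initial_pop=100.0):
    """Fraction of the population at `horizon` descended from one offspring
    added at `birth_year`, assuming (simplification) that its lineage grows
    by the same per-year factor as the whole population."""
    lineage = growth_rate ** (horizon - birth_year)
    population = initial_pop * growth_rate ** horizon
    return lineage / population

for lam in (1.0, 1.5):
    early = lineage_share(birth_year=1, horizon=3, growth_rate=lam)
    late = lineage_share(birth_year=2, horizon=3, growth_rate=lam)
    print(lam, early / late)  # relative value of the earlier offspring
```

With a growth factor of 1 (a stable population) the ratio is 1, so there is no penalty for delay; with 1.5 the earlier offspring is worth 1.5 times as much as the later one.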

So when the rate of unavoidable mortality goes down, two things happen. The chances of making it to a second year go up, increasing the expected return on investments in reproductive capacity in the second year. And, because the population begins to grow exponentially, the relative fitness penalty for an offspring being placed a year later gets bigger. Your chances of having an offspring in the second year go up; and the relative value to your fitness of an offspring a year later goes down; and these two effects perfectly cancel one another. The change in the risk of unavoidable mortality ends up having no effect at all on your optimal trade-off between reproduction and health.
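A toy two-age-class model makes the cancellation explicit. Assume (illustrative choices, not the model in de Vries et al.) that every individual escapes unavoidable mortality with probability p each year, has fecundity f1 = effort at age one, survives intrinsically to age two with probability q = 1 − effort², and has f2 = 1 offspring there. In the Euler–Lotka equation p·f1/λ + p²·q·f2/λ² = 1, the factor p appears once per year of age, just like the discount for population growth, so λ equals p times a quantity that does not depend on p. The effort that maximises λ is therefore the same whatever p is.

```python
import math

def growth_factor(effort, p):
    """Dominant root lam of the two-age-class Euler-Lotka equation
        p*f1/lam + p**2*q*f2/lam**2 = 1
    under a hypothetical trade-off: f1 = effort, q = 1 - effort**2, f2 = 1.
    p is the yearly probability of escaping unavoidable mortality."""
    f1, q, f2 = effort, 1 - effort ** 2, 1.0
    # Rearranged: lam**2 - p*f1*lam - p**2*q*f2 = 0; take the positive root.
    return (p * f1 + math.sqrt((p * f1) ** 2 + 4 * p ** 2 * q * f2)) / 2

def best_effort(p, steps=1000):
    """Grid search for the first-year effort that maximises population growth."""
    efforts = [i / steps for i in range(steps + 1)]
    return max(efforts, key=lambda e: growth_factor(e, p))

for p in (0.3, 0.6, 0.9):
    print(p, best_effort(p))
```

For all three values of p the search returns an effort of about 0.577 (1/√3, the analytic optimum of this toy trade-off): lower unavoidable mortality raises the growth rate but leaves the optimal trade-off untouched.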

Aha, you might say. But this penalty for delayed offspring only applies in exponentially growing populations. So Williams’ argument would work in a population where mortality decreased, but the population size remained stable. Indeed, but the trouble is that the only thing that keeps populations from growing exponentially is mortality. Imagining mortality reducing without that causing the population to grow exponentially is like imagining putting air into a balloon without that balloon starting to get bigger.

Exponentially growing populations are eventually limited and stabilized by competition amongst their members (so-called density-dependent regulation). The way to rescue Williams’ argument is to incorporate some kind of density-dependent regulation that checks population growth, but still rewards those who delay reproductive effort. The problem is that there are many ways that density-dependent regulation can work (as competition increases, the old lose out, the young lose out, fecundity is reduced amongst inexperienced breeders, it is reduced amongst experienced breeders, new juveniles can’t find nest sites, old adults can’t defend their nest sites, etc.). De Vries et al. consider ten different scenarios for density-dependent regulation. They find that under some of these, reducing unavoidable mortality selects for delaying reproduction (the Williams pattern); in some, reducing unavoidable mortality selects for accelerating reproduction (the anti-Williams pattern); and in some, reducing unavoidable mortality has no effect either way.

There is perhaps no way of saying a priori which of the ten scenarios for density-dependent regulation actually captures what happens in any particular real population. So, we can’t know a priori what our theoretical model should be. Instead, de Vries et al. say, reasonably enough, we can use the fact that we do observe the live fast, die young pattern across many species to narrow down what the right theoretical model is. In other words, we can predict what our theory should be using data. That is, since populations with higher mortality do evolve earlier reproduction, we can infer that they should be modelled using the models in which this is predicted to happen. Those are models where the burden of density-dependent competition falls particularly on juveniles trying to start reproducing, or on fecundity. So, since we stumbled on an empirical regularity (populations with higher mortality evolve earlier reproduction and senescence), we learned something, indirectly, about what assumptions we should make about how populations work.
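One such rescue can be sketched in the same kind of toy model. Assume (again an illustrative choice, not a reproduction of any of de Vries et al.’s scenarios) that density dependence acts only through a survival factor D applied to newborns, with D set by the resident population so that it just replaces itself. A rare mutant then spreads if and only if D times its expected lifetime offspring exceeds 1, so the favoured strategy is simply the one that maximises expected lifetime offspring, and p no longer cancels out.

```python
def lifetime_offspring(effort, p):
    """Expected lifetime offspring per newborn (R0), before the density-dependent
    newborn survival factor D. Hypothetical trade-off as before:
    f1 = effort, intrinsic survival to age two q = 1 - effort**2, f2 = 1."""
    f1, q, f2 = effort, 1 - effort ** 2, 1.0
    return p * f1 + p ** 2 * q * f2

def best_effort_juvenile_dd(p, steps=1000):
    """Assumption: D multiplies every offspring's fitness contribution equally and
    is fixed by the resident (D * R0_resident = 1), so a mutant invades iff
    D * R0_mutant > 1. The uninvadable effort therefore maximises R0."""
    efforts = [i / steps for i in range(steps + 1)]
    return max(efforts, key=lambda e: lifetime_offspring(e, p))

for p in (0.3, 0.6, 0.9):
    print(p, best_effort_juvenile_dd(p))
```

Here the best first-year effort is 1/(2p), capped at 1: about 1.0 for p = 0.3, 0.833 for p = 0.6 and 0.556 for p = 0.9. Lower unavoidable mortality now selects for holding back in year one (the Williams pattern), because in this special case the population-level calculation reduces to the single-individual sum of expected offspring that Williams had in mind.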

I take three lessons away from this example. One, when people say ‘evolutionary theory predicts…’, they are often just peddling an intuition, an intuition that may not work out, or may only work out under restrictive assumptions. In this case, if we wanted to say ‘theory predicts’ the live fast, die young pattern, what we ought to say is ‘perfectly reasonable theory predicts either this pattern, or the opposite, or no pattern at all, depending on assumptions’.

Two, our intuitions don’t do population thinking. We compute what the payoff to one individual would be of doing A rather than B. We don’t spontaneously think about what the changing population distribution would be if A rather than B became common. But evolution is a population process, and so you can’t work out what will happen without modelling populations (you often get misleading results just by totting up costs and benefits to one representative individual).

Three, theory does not really come before data, even in a relatively ‘theoretical’ and mathematically formalized discipline like evolutionary biology. It’s the empirical findings that tell us what kinds of theoretical models we should construct, and how to constrain them. This is worth noting for those who are advocating more formal theory as a way out of psychology’s current crisis. Sure, we need to build more models, but model-making prior to any data is blind. You don’t just figure out the right model in a vacuum and go off and test it. Data – observation of the world, often rather descriptive and interest-driven – tell us what theoretical assumptions we should be making, almost as much as theoretical models then tell us which further data to collect. It’s a cycle, a game of tag between models and data, in which data predict theory as well as the opposite.


The bosses pretend to have theories, and we pretend to test them

Leo Tiokhin has hosted a new blog series on the use of formal models in metascience and, more generally, in psychology. The starting point for the series is the increasing recognition that psychology’s weaknesses don’t just lie in its recent replicatory embarrassments. The underlying theories that all those (possibly non-replicable) experiments aim to test are also weak. That is: the theory as stated could give rise to multiple patterns in the data, and the data could be compatible with multiple theories, given how vaguely these are stated. In my contribution to Leo’s series, I invoked the old Soviet joke: the bosses pretend to have theories, and we pretend to test them.

Several contributors to the series point out the virtue, given this problem, of formalizing theories in mathematical or computational models. This undoubtedly has merit: if you convert a verbal psychological theory into a formal model, then you expose all your tacit assumptions; you are forced to make decisions where you had left these vague; you discover if your conclusions really must follow from your premises; and you are left with a much tighter statement of what your do-or-die predictions are. This is all good, and true.

However, my contribution, and also to some extent the one by Willem Frankenhuis, Karthik Panchanathan and Paul Smaldino, provide a line of argumentation in the opposite direction. Theories in psychology are often weak exemplars of theories. One move is to make them stronger through formalization. The opposite move is to not claim that they are theories. I think for many areas of psychology, that makes a lot more sense. There are many important avenues of scientific enquiry that do not exactly have theories: descriptions of psychological phenomena; ontologies of psychological processes; uncovering of which things pattern together; working out which levers move which parts in which direction, and which levers move none. These enquiries can certainly feature, and meaningfully test, hypotheses, in a local kind of way, but may not be underlain by anything as grandiose as a fully-identified theory.

Of course, there is always some kind of idea underlying research questions. Often in psychology this is better described as an interpretative framework, or a proto-theory. To try to press it into the mould of fully identified theory may be to subject it to the heartbreak of premature definition, which can take years to get over. The problem has been that psychologists have had to claim to be using a ‘theory’ to get their papers published (a little-recognized form of publication bias). A psychology paper has needed to start with a ‘theory’ with a three-letter acronym like an opera has needed to start with an overture: terror management theory, error management theory, planned behaviour theory, reasoned action theory, social identity theory, social cognitive theory, social norms theory, regulatory focus theory, regulatory fit theory, life history theory, life course theory – I think you know the game. I even coined an acronym for the generation of these acronyms: CITEing, or Calling It Theory for Effect.

None of these frameworks is ready to be implemented as a computational model, which raises all kinds of interesting questions. What kinds of beasts are these? Would it be better if we did not need to invoke them or their ilk at all, and could just state what questions we want to answer, what parallels there are elsewhere, and what our hunches are? Although having theories in science is great, it might not be a prerequisite. Precisely stating your theory is especially excellent when you do actually have one. You should not feel pressured to state one as a rhetorical move, if in fact you are doing description or proto-theory.

People often misunderstand the pre-registration revolution as being the requirement to only do confirmatory analyses. But, as Willem and I have argued, it’s not this at all. It’s the freedom to do confirmatory analyses when these are appropriate, and exploratory ones when these are appropriate, and be clear and unashamed about which it really is: don’t muddle the one with the trappings of the other. Likewise with theories: be clear when you really have one, and be clear when you don’t. Just as having better theories can lead to a better psychology, so, possibly, can invoking no theory, in some cases.

People sometimes link this point to the claim that ‘psychology is a young science’, not ready for its Newton or Darwin yet. That’s starting to look a little disreputable. I personally think there should be a one hundred year cut-off on the old ‘young science’ ploy, which means psychology has overstayed its welcome. The deeper problem for theories in psychology, as David Pietraszewski and Annie Wertz have recently argued, is not insufficient time, but a constant flip-flop about what its proper level of analysis is. Some frameworks work at the intentional level of analysis (the intuitive way we speak about people, with beliefs, desires, feelings, selves, things they can deliberately control and things they can’t), and others at the functional level of analysis (i.e. how the information-processing mechanisms actually work, which may or may not look isomorphic to intentional-level descriptions of the same processes). Add to the mix that evolutionary psychologists are sometimes also thinking at the level of ultimate causation (fitness consequences), and there is a heady recipe for total incoherence about what we are trying to do and what kind of thing would make us satisfied we had in fact done it. Hence the constant churn of theories that seem adequate to some people and entirely inadequate to others. This is the big problem psychology needs to sort out: what level of analysis do we want in an explanation? The answer may be sometimes one, sometimes another, but ‘theories’, and authors, need to say clearly which they are trying to do.

Big thanks to Leo for hosting this interesting series.

My muse is not (or, possibly, is) a horse

I’ve written one thing in my life that people really want to read: a 2017 essay called Staying in the game. When I first posted it, the unprecedented traffic over a couple of days caused my web site host to suspend the service. A lot of people commented or emailed when it came out. Many people have read it since. Every few months it has a little outbreak of virality, usually via Twitter or Facebook. The most recent one was this week. Given that people seem to be interested in the essay, and more generally in understanding the creative processes of their fellow academics, I thought it might be fun to write some more about the history of this essay, how it came about.

Staying in the game existed for some time, in several versions. It tries to do several things. It contains a self-help or how-to guide for actual or aspiring academics, a kind of Seven habits of moderately effective (and slightly nerdy) people. There is something of the confessional in it (and that, I think, is what people, especially younger academic colleagues, like). I wanted to say that it is OK, normal, permitted, to struggle in your academic career, to not do as well as you hoped or think you ought to have done. We senior people have been there too. There are longueurs and surprises, so we should all be compassionate to ourselves and not make too much of a big deal out of it. And there is a third part, which is about my ambivalence, both personal and philosophical, towards the markers of value that we tend to draw on in academia–the prizes, the status markers, the impact factors, and so forth. If you want to give me these, that would be welcome (my address is freely available); but I worry about them.

Each of these three parts was originally conceived of as a separate project, possibly a whole separate book in some cases, but in any event a completely separate paper or chapter. But when I started to write an essay called My muse is not (or, possibly, is) a horse, the different threads kept tangling each other so much that in the end I thought, well damn it, I might as well just deal with them all here and now, and that is what Staying in the game ends up doing. I never wrote all the other bits. And the starting point – whether or not one’s muse should be likened to a horse – fell to the cutting-room floor.

The how-to guide part was inspired by my observation, from reading various writers’ and mathematicians’ accounts of their process, of how much convergence there seemed to be: 2-3 hours concentrated time, usually in the morning, every day, with no multi-tasking, and a fair dose of quiet ritual surrounding it. I was going to systematically review these, and the various Writer’s Way type courses, pointing up the points of convergence, linking this to some light consensual evolutionary psychology about which ways of working are natural for human beings. This was possibly going to be a whole book project, but it ended up barely a few pages with some promissory examples. When I started the essay that was to become Staying in the game I kept forward-referring to this as-yet-nonexistent scholarly work, to such a degree that I realised it might not be necessary for me to ever actually do the scholarly work: I could just assert the things I wanted to say, not as forward references to a future piece of scholarship, but just, in the traditional Oxford manner, as assertions.

The actual thing I started to write was, as I have mentioned, an essay called My muse is not (or, possibly, is) a horse. The title comes from a wonderful letter written by Nick Cave to MTV in 1996, when he had been nominated for a best male artist award. (Disclaimer: this letter, which I found in a book, is wonderful. I know nothing else about Nick Cave, either his music or his political views.) Cave begins the letter with extremely gracious thanks to MTV for supporting and recognising him. I love this: his purpose in the letter is to spurn their accolade, but he begins humbly and with generous recognition of their benign intentions, not condescending or disdaining his nominators in any way, but genuinely thanking them. Then he goes on:

Having said that, I feel that it is necessary for me to request that my nomination…be withdrawn and furthermore any awards or nominations…that may arise in future years be presented to those who feel more comfortable with the competitive nature of these award ceremonies. I myself do not…I have always been of the opinion that my music…exists beyond the realms inhabited by those who would reduce things to mere measuring.

My relationship with my muse is a delicate one at the best of times and I feel that it is my duty to protect her from influences that may offend her fragile nature. She comes to me with [a] gift, and in return I treat her with the respect I feel she deserves-in this case this means not subjecting her to the indignities of judgement and competition.

I have thought a lot about this, in the academic context. We professors are a strange combination of plumbers and poets. No-one would expect a poet to be able to produce any particular poem, to order, to a timetable (except the poor poet laureate: how awful must that be?). For poets, we recognise the individuality and autonomy of the muse: they need to write about what they want to write about, whenever they want to write it. Plumbers, though: you would rather like it if they came to your house at the appointed time and could fix your leak on demand, like a reliable automaton. And we professors are somewhere in between. Our work, both the topic and the style, is deeply personal, creative, unpredictable: the ideas we will have, the ways of expressing them we will dream up, the ways we go about them and the scope of the bits we chew on. And yet, as a community we think it is quite fine to assess ourselves on simple common yardsticks. We pretty much expect – and account for in spreadsheets – so much volume per person per unit time, and quality measurable on a linear scale. This is quite problematic. I agree that professors, in the public pay and providing a public good, ought to work hard and produce enough value to justify their subvention. And we need to figure out what research is better or worse. But we are not machines: we are people making meaning. What meaning we create is highly personal and hard to account for retrospectively, let alone prospectively. The value to society, although very important, is hard to assess. So when universities review our performance each year, often quite crudely and numerically, or we review ourselves, this is totally understandable and also quite reductive. I don’t have any good answers. I merely point out the tensions.

I can see why Nick Cave would want to opt out of the process of judgment. What he fears, actually, is not being judged a failure, but being judged a success: he knows that could change the authenticity of what he does. There’s a lot about that in Staying in the game: the compromise, as an academic, between what you have to do to earn your crust, what you get rewarded for, and what you do from personal identity and the search for meaning. Cave sums it up this way:

My muse is not a horse and I am in no horse race, and if indeed she was, I still would not harness her to this tumbrel… muse may spook! May bolt! May abandon me completely!

My first thought on reading was that Nick Cave is a person who knows a lot about muses. My second thought, though, was that quite possibly he is a person who does not know a lot about horses.

The point of saying his muse is not a horse is that a horse is an automaton, a machine that can be put to work in service of any goal, substitutable, predictable, quantifiable (so many horsepower). Whereas the muse… the muse is… well, each muse is unique; beautiful when flowing; temperamental; departs from type-specific expectations in unruly ways; needs to be wooed and soothed and petted and given the best possible living conditions; has individual needs and strengths; is stubborn, foul, resentful at times; and needs to be treated as having final value, not just as instrumental. Like a living thing really, rather than a machine. An animal. A big, powerful animal, one that can be domesticated but has a wild ancestor and is stubborn at times. Like a… well, like a horse. Indeed, in the metaphor in the passage above, the muse is first not a horse, and then, in fact, a horse, a horse that should not be harnessed to the wrong thing (a tumbrel, of all things: why does no-one ever mention tumbrels except in the context of the French revolutionary terror?), for fear she may bolt.

This post has no conclusion. I said all I needed to say in Staying in the game and deleted the rest, but it has been nice to resurrect some of the history here. Five years on, I still like Staying in the game, though I do worry that it is a bit too normatively preachy: fat shaming for people who like to check their email in the morning. It was meant to liberate people from anxiety about work, but more than once I have had people say to me ‘Oh, Daniel, I can’t possibly talk to you, I haven’t done my proper work yet today!’ I’m sorry about that. We are all doing the best we can. I am certainly no better than you.

Nick Cave ends his letter on a perfect note:

So once again, to the people at MTV, I appreciate the zeal and energy….I truly do and say thank you and again I say thank you but no…no thank you.

Why does inequality produce high crime and low trust? And why doesn’t making punishments harsher solve the problem?

Societies with higher levels of inequality have more crime, and lower levels of social trust. That’s quite a hard thing to explain: how could the distribution of wealth (which is a population-level thing) change decisions and attitudes made in the heads of individuals, like whether to offend? After all, most individuals don’t know what the population-level distribution of wealth is, only how much they have got, perhaps compared to a few others around them. Much of the extra crime in high-inequality societies is committed by people at the bottom end of the socioeconomic distribution, so clearly individual-level resources might have something to do with the decision; but that is not so for trust: the low trust of high-inequality societies extends to everyone, rich and poor alike.

In a new paper, Benoit de Courson and I attempt to provide a simple general model of why inequality might produce high crime and low trust. (By the way, it’s Benoit’s first paper, so congratulations to him.) It’s a model in the rational-choice tradition: it assumes that when people offend (we are thinking about property crime here), they are not generally doing so out of psychopathology or error. They do so because they are trying their best to achieve their goals given their circumstances.

So what are their goals? In the model, we assume people want to maximise their level of resources in the very long term. But – and it’s a critical but – we assume that there is a ‘desperation threshold’: a level of resources below which it is disastrous to drop. The idea comes from classic models of foraging: there’s a level of food intake you have to achieve or, if you are a small bird, you starve to death. We are not thinking of the threshold as literal starvation. Rather, it’s the level of resources below which it becomes desperately hard to participate in your social group any more, below which you become destitute. If you get close to this zone, you need to get out, and immediately.

In the world of the model, there are three things you can do: work alone, which is unprofitable but safe; cooperate with others, which is profitable just as long as they do likewise; or steal, which is great if you get away with it but really bad if you get caught (we assume there are big punishments for people caught stealing). Now, which of these is the best thing to do?

The answer turns out to be: it depends. If your current resources are above the threshold, then, under the assumptions we make, it is not worth stealing. Instead, you should cooperate as long as you judge that the others around you are likely to do so too, and just work alone otherwise. If your resources are around or below the threshold, however, then, under our assumptions, you should pretty much always steal. Even if it makes you worse off on average.

This is a pretty remarkable result: why would it be so? The important thing to appreciate is that with our threshold, we have introduced a sharp non-linearity in the fitness function, or utility function, that is assumed to be driving decisions. Once you fall down below that threshold, your prospects are really dramatically worse, and you need to get back up immediately. This makes stealing a worthwhile risk. If it happens to succeed, it’s the only action with a big enough quick win to leap you back over the threshold in one bound. If, as is likely, it fails, you are scarcely worse off in the long run: your prospects were dire anyway, and they can’t get much direr. So the riskiness of stealing – it sometimes gives you a big positive outcome and sometimes a big negative one – becomes a thing you should seek rather than avoid.
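The effect of the threshold can be seen in a deliberately stripped-down, one-shot sketch. This is not the dynamic model from the paper: it assumes, purely for illustration, that long-term prospects are a step function (1 at or above the threshold, 0 below it), it leaves out the cooperation option, and the gains, the punishment and the chance of getting away with it are all made-up numbers.

```python
# Hypothetical one-shot illustration of a desperation threshold.
# Not the model in the paper: names and numbers are invented for this sketch.

THRESHOLD = 0.0  # the desperation threshold, shown as zero in figure 1


def prospects(resources: float) -> float:
    """Assumed step utility: 1.0 at or above the threshold, 0.0 below it."""
    return 1.0 if resources >= THRESHOLD else 0.0


def compare(resources: float, work_gain: float = 0.2, steal_gain: float = 3.0,
            punishment: float = 5.0, p_success: float = 0.3) -> dict:
    """Return (expected resources, expected prospects) for each action."""
    p_fail = 1.0 - p_success
    # Working alone: a small, certain gain.
    work = (resources + work_gain, prospects(resources + work_gain))
    # Stealing: a big gain if it succeeds, a big punishment if it fails.
    steal_mean = p_success * (resources + steal_gain) + p_fail * (resources - punishment)
    steal_prospects = (p_success * prospects(resources + steal_gain)
                       + p_fail * prospects(resources - punishment))
    return {"work alone": work, "steal": (steal_mean, steal_prospects)}


for r in (2.0, -1.0):  # one agent above the threshold, one below it
    print(r, compare(r))
```

With these invented numbers, the agent at -1 ends up with fewer resources on average by stealing (about -3.6, against -0.8 for working alone), yet stealing is the only action that gives any chance (0.3) of being back above the threshold. The agent at 2 keeps its prospects at 1.0 by working alone, and would cut them to 0.3 by stealing.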

Fig. 1. The right action to choose, in Benoit’s model, according to your current resources and the trustworthiness of others in your population. The threshold of desperation is shown as zero on the x-axis.

So, in summary, the optimal action to choose is as shown in figure 1. If you are doing ok, then your job is to figure out how trustworthy your fellow citizens are (how likely to cooperate): you should cooperate if they are trustworthy enough, and hunker down alone otherwise. If you are desperate, you basically have no better option than to steal.
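The regions of figure 1 can be read as a simple decision rule. The sketch below is hypothetical: in the paper the boundaries come out of the model rather than being fixed in advance, so the margin around the threshold and the single trust cut-off are placeholder values, and ‘trust’ is taken to mean an estimated probability that others will cooperate.

```python
def best_action(resources: float, trust: float, threshold: float = 0.0,
                margin: float = 0.5, trust_cutoff: float = 0.5) -> str:
    """Hypothetical reading of figure 1.

    resources: current resources (desperation threshold at `threshold`)
    trust: estimated probability that others in the population will cooperate
    margin, trust_cutoff: placeholder values, not taken from the paper
    """
    if resources < threshold + margin:
        return "steal"       # around or below the threshold
    if trust >= trust_cutoff:
        return "cooperate"   # doing ok, and others look trustworthy
    return "work alone"      # doing ok, but others look untrustworthy


for resources, trust in [(-1.0, 0.9), (3.0, 0.9), (3.0, 0.1)]:
    print(resources, trust, best_action(resources, trust))
# -1.0 0.9 steal
# 3.0 0.9 cooperate
# 3.0 0.1 work alone
```

Note that trust only matters for agents who are doing ok; below the threshold the rule ignores it entirely, which is the pattern the summary above describes.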

Now then, we seem to be a long way from inequality, which is where we started. What is it about unequal populations that generates crime? Inequality is basically the spread of the distribution of resources: where inequality is high, the spread is wide. A wide spread pretty much guarantees that at least some individuals will find themselves down below the threshold at least some of the time; and figure 1 shows what we expect them to do. If the spread is narrower, then fewer people hit the threshold, and fewer people have incentives to start offending. Thus, the inequality of the resource distribution ends up determining the occurrence of stealing, even though no agent in this model ‘knows’ what that distribution looks like: each individual knows only what resources they have, and how other individuals behaved in recent interactions.
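The link between spread and desperation can be made concrete with a static snapshot. As a simplifying assumption that is not part of the paper, suppose resources are normally distributed with the same mean in every population and only the spread differs; the share of people below the threshold is then just the normal cumulative distribution evaluated at the threshold. The mean and spreads used here are arbitrary.

```python
from statistics import NormalDist

THRESHOLD = 0.0  # desperation threshold


def share_below_threshold(mean: float, spread: float, threshold: float = THRESHOLD) -> float:
    """Fraction of a normal resource distribution that lies below the threshold."""
    return NormalDist(mu=mean, sigma=spread).cdf(threshold)


# Same average resources, increasingly unequal populations.
for spread in (1.0, 2.0, 4.0):
    print(f"spread={spread}: {share_below_threshold(2.0, spread):.3f}")
# spread=1.0: 0.023
# spread=2.0: 0.159
# spread=4.0: 0.309
```

In this toy version, quadrupling the spread multiplies the share of desperate individuals more than thirteen-fold, even though average resources are unchanged.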

What about trust? We assume that individuals build up trust through interacting cooperatively with others and finding that it goes ok. In low-inequality populations, where no-one is desperate and hence no-one starts offending, individuals rapidly learn that others can be trusted, everyone starts to cooperate, and all are better off over time. In high-inequality populations, the desperate are forced to steal, and the well-off are forced not to cooperate for fear of being victimized. One of the main results of Benoit’s model is that in high-inequality populations, only a few individuals actually ever steal, but still this behaviour dominates the population-level outcome, since all the would-be cooperators soon switch to distrusting solitude. It is a world of gated communities.

Another interesting feature is that making punishments more severe has almost no effect at all on the results shown in figure 1. If you are below the threshold, you should steal even if the punishment is arbitrarily large. Why? Because of the non-linearity of the utility function: if your act succeeds, your prospects are suddenly massively better, and if it fails, it is scarcely possible to be any worse off than you already are. This result could be important. Criminologists and economists have puzzled over why making sentences tougher does not seem to deter offending in the way intuition suggests it ought to. This is potentially an answer. When you have basically nothing left to lose, it really does not matter how much people take off you.
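A stripped-down sketch shows why, under a simplifying assumption that is not the paper’s model: long-term prospects are a step function, 1 at or above the threshold and 0 below it. For an agent already below the threshold, a failed theft cannot push prospects below zero, so the size of the punishment never enters the comparison. The gains, the punishments and the success probability are invented values.

```python
THRESHOLD = 0.0  # desperation threshold


def prospects(resources: float) -> float:
    """Assumed step utility: 1.0 at or above the threshold, 0.0 below it."""
    return 1.0 if resources >= THRESHOLD else 0.0


def steal_beats_working(resources: float, punishment: float, steal_gain: float = 3.0,
                        work_gain: float = 0.2, p_success: float = 0.3) -> bool:
    """Compare expected prospects of stealing vs working alone in a single round."""
    expected_steal = (p_success * prospects(resources + steal_gain)
                      + (1.0 - p_success) * prospects(resources - punishment))
    expected_work = prospects(resources + work_gain)
    return expected_steal > expected_work


# An agent below the threshold, facing ever harsher punishments.
for punishment in (1, 10, 1_000, 1_000_000):
    print(punishment, steal_beats_working(-1.0, punishment))
# 1 True
# 10 True
# 1000 True
# 1000000 True
```

The step function is the extreme case; any utility that flattens out below the threshold gives the same qualitative result, because a bigger punishment then buys almost no extra deterrence.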

In fact, our analyses suggest some conditions under which making sentences tougher would actually be counterproductive. Mild punishments disincentivize at the margin. Severe sentences can make individuals so much worse off that there may be no feasible legitimate way for them to ever regain the happy zone above the threshold. By imposing a really big cost on them through a huge punishment, you may be committing them to a life where the only recourse is ever more desperate attempts to leapfrog themselves back to safety via illegitimate means.

So if making sentences tougher does not solve the problems of crime in high-inequality populations, according to the model, is there anything that does? Well, yes: and readers of this blog may not be surprised to hear me mention it. Redistribution. If people who are facing desperation can expect their fortunes to improve by other means, such as redistributive action, then they don’t need to employ such desperate means as stealing. They will get back up there anyway. Our model shows that a shuffling of resources so that the worst off are lifted up and the top end is brought down can dramatically reduce stealing, and hence increase trust. (In an early version of this work, we simulated the effects of a scenario we named ‘Corbyn victory’: remember then?).
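A toy calculation illustrates the redistribution point. The scheme below is hypothetical, not the one simulated in the paper: everyone keeps a fraction (1 - rate) of their resources, and the rest is pooled and shared out equally, which pulls everyone towards the mean without changing the total. The resource values are made up, and the sketch only counts who is below the threshold; it does not model the knock-on effects on trust.

```python
from __future__ import annotations


def redistribute(resources: list[float], rate: float) -> list[float]:
    """Hypothetical flat levy at `rate`, with the pot shared equally among everyone."""
    share = rate * sum(resources) / len(resources)
    return [(1.0 - rate) * r + share for r in resources]


def count_below(resources: list[float], threshold: float = 0.0) -> int:
    """Number of individuals below the desperation threshold."""
    return sum(1 for r in resources if r < threshold)


population = [-2.0, -1.0, 0.5, 1.0, 2.0, 3.0, 4.0, 8.0, 12.0, 20.0]  # invented values
for rate in (0.0, 0.25, 0.5):
    after = redistribute(population, rate)
    print(rate, count_below(after), round(sum(after), 6))
# 0.0 2 47.5
# 0.25 1 47.5
# 0.5 0 47.5
```

Total resources stay at 47.5 throughout; only their spread changes, and with it the number of people facing the desperate choice.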

The idea of a desperation threshold does not seem too implausible, but it is a key assumption of our model, on which all the results depend. Our next step is to try to build experimental worlds in which such a threshold is present – it is not a feature of typical behavioural-economic games – and see if people really do respond as predicted by the model.

De Courson, B., Nettle, D. Why do inequality and deprivation produce high crime and low trust? Scientific Reports 11, 1937 (2021).

Why is Universal Basic Income suddenly such a great idea?

The idea of an unconditional basic income, paid to all (UBI), has a long history. Very long in fact. Yet, although the policy has been deemed philosophically and (sometimes) economically attractive, it has generally languished in the bailiwick of enthusiasts, mavericks, philosophers and policy nerds (these are, by the way, overlapping categories). But now, with the global pandemic, UBI is very much back in the spotlight. Previous sceptics are coming out with more enthusiastic assessments (for example, here and here). Spain apparently aims to roll out a UBI scheme ‘as soon as possible’ in response to the crisis, with the aim that this becomes a ‘permanent instrument’ of how the Spanish state works. And even the US Congress’s relief cheques for citizens, though short-term, have a UBI-like quality to them. So why, all over the place, does UBI suddenly seem like such a great idea?

Answering this question requires answering another, prior one: why didn’t people think it was such a great idea before? To understand why people’s objections have gone away, you need to understand what they were before, as well as why they seem less compelling in this time of upheaval. UBI is a policy that appears to suffer from ‘intuition problems’. You can model it all you like and show that it would be feasible, efficient and cost effective; but many people look at it and think ‘Mah! Giving people money without their having to do anything! Something wrong with that!’. It’s like a musical chord that is not quite in tune; and that’s a feeling that it is hard to defeat with econometrics. But intuitions such as these might be very context-dependent: and the context of society certainly has changed in the last few months.

To try to understand if the acceptability of UBI to the public has changed for these pandemic-affected times, and, if so, why, Matthew Johnson, Elliott Johnson, Rebecca Saxe and I collected data on April 7th from 400 UK and 400 US residents. This was not a representative sample from either country, but we had a good balance of genders and a spread of age.

We first described a UBI policy to respondents, and asked them to rate how good an idea they found it, both for normal times, and for the times of this pandemic and its aftermath. As the figure below shows, they almost universally thought it was a better idea for times of the pandemic and its aftermath than for normal times: on average, 16 points better on a 1-100 scale.

Ratings of how good an idea a UBI scheme is, for normal and pandemic times, UK and USA samples. Shown are medians, inter-quartile ranges, and the distribution of the data.

Actually, these participants thought UBI was a better idea for normal times than I would have expected, which is hard to interpret without some historical data on this participant pool. Support for UBI has been found to vary a lot, in the past, depending on how you frame the policy and what alternatives you pit it against. In our study, it was not up against any alternative scheme; just rated as a good or bad idea.

Now, why was UBI thought a better idea for pandemic times than normal times? We listed nine of the most obvious advantages and disadvantages of the policy, and asked respondents to say how important they felt each of these would be for their overall assessment of the policy – again, as a policy for normal times, and for pandemic times. The advantages were: knowing there is a guaranteed income reduces stress and anxiety; the policy is simple and efficient; the universality gives a value to every individual in society; and the system cannot be cheated. The disadvantages were: it’s expensive; you would be paying money to the rich, who do not need it; people might use it irresponsibly, like on gambling or drugs; people would be less prone to work for money; and people who did not deserve it would get it. All of these pros and cons were rated as having some importance for the desirability of the policy in normal times, though naturally with different weightings for different people.

Rated importance of nine advantages and disadvantages for the overall assessment of the desirability of UBI, for normal times, and for the times of the pandemic and its aftermath.

So what was different when viewing the policy for pandemic times? Basically, three key advantages (reduces stress and anxiety; efficiency and simplicity; and giving a value to every individual) became much more important in pandemic times; whilst three of the key drawbacks (people might use it irresponsibly; the labour market consequences; and receipt by the undeserving) became rather less important. I guess these findings make sense; given the rapidity with which the pandemic has washed over the population, you really need something simple and efficient; given how anxiety-provoking it is, it is imperative to reassure people; and given that millions of people are economically inactive anyway, not through their own choice, potential labour market consequences are moot. Rather to our surprise, the expense of the policy was not rated as the most important consideration for normal times; and nor had this become a less important consideration now, when figures of £330 billion or $1 trillion seem to be flying around all over the place.

The strongest predictors of supporting UBI in normal times were rating highly: the importance of stress and anxiety reduction; the efficiency of the policy; and the valuing of every individual. So it is no mystery that in pandemic times, when those particular three things are seen as much more important, the overall level of support for the policy should go up. In other words, what the pandemic seems to do is make all people weight highly the considerations that the most pro-UBI people already rated highly for normal times anyway. Perhaps the most intriguing of the pandemic-related shifts in importance of the different factors was the increase in importance of giving every individual in society a value. It is not obvious to me why the pandemic should make us want every individual to have a value, any more than we should want this the rest of the time. Perhaps because the pandemic is some kind of common threat, that we can only solve by all working collaboratively? Perhaps because the pandemic reminds us of our massive interdependence? Because we are all in some important sense alike in the face of the disease?

Whatever the reason, our respondents felt it was more important, in these times, for every person in society to be accorded a value. And for me, that is one of the most philosophically appealing aspects of UBI. Not that it decreases income inequality, which, unless it is very large, it will probably not do to any appreciable extent; not just that it gives people certainty in rapidly fluctuating times, which it would do; but that its existence constitutes a particular type of social category, a shared citizenship. Getting your UBI would be one of those few things that we can all do – like having one vote or use of an NHS hospital – to both reflect and constitute our common membership of, and share in, a broader social entity. In other words, in addition to all its pragmatic and administrative appeal, UBI bestows a certain dignity on everyone, that may help promote health, foster collective efficacy, and mitigate the social effects of the myriad and obvious ways we are each valued differently by society. And these times, apparently, are making the value of this unconditional dignity more apparent.

One last point: people who consider themselves on the left of politics were more favourable to UBI than those on the right (particularly in the US sample; which is interesting given the places of Milton Friedman and Richard Nixon in UBI’s pedigree). But the boost in support for the policy that came from the pandemic applied absolutely across the political spectrum. Even those on the right wing of our sample thought it was a pretty good idea for pandemic times (with, of course, the caveat that this was not a representative sample, and we did not offer them any alternative to UBI that they might have preferred). So, just possibly, an advantage of UBI schemes in this uncertain time is that pretty much everyone, whatever their ideology, can see what the appeal of the scheme is. That may yet prove important.

Support for UBI for normal times (solid lines) and pandemic times (dotted lines), for the UK and USA, against self-placement on a scale of 1=left-wing to 100=right-wing.

June 2nd 2020 update: We have now written this study up. You can download the preprint here.

This is no time for utilitarianism!

An interesting feature of the current crisis is the number of times we hear our leaders proclaiming that they are not weighing costs against benefits: ‘We will do whatever it takes!’. ‘We will give the hospitals whatever they need!’. And even, memorably, from the UK Chancellor, ‘We will set no limit on what we spend on this!’. No limit. I mean when did the UK Treasury ever say that? Maybe only during the war, which is a clue.

Such statements seem timely and reassuring just at the moment. When people are timorous enough to question whether some of this largesse might actually be sensible – for example, whether the long-term costs of some decisions might be greater than the benefits – it seems in incredibly poor taste. But people are dying! Those commentators are roundly excoriated on social media for letting the side down.

All of this is something of a puzzle. The whole essence of evidence-based policy, of policy modelling, is that you always calculate benefits and costs; of course this is difficult, and is never a politically neutral exercise, given that there are so many weightings and ways one might do so. Nonetheless, the weighing of costs and benefits is something of a staple of policy analysis, and also a hallmark of rationality. So why, in this time of crisis, would our politicians of all stripes be so keen to signal that they are not going to do the thing which policymakers usually do, which is calculate the costs and benefits and make the trade-offs?

Calculating costs and benefits comes from the moral tradition of utilitarianism: weighing which course provides the greatest good for the greatest number. What our politicians are saying at the moment comes from the deontological moral tradition, namely the tradition of saying that some things are just intrinsically right or wrong. ‘Everyone should have a ventilator!’; ‘Everyone should have a test!’; ‘No-one should be left behind!’. Deontological judgements are more intuitive than utilitarian ones. So the question is: in this crisis in particular, should our leaders be so keen to show themselves deontologists?

‘We will fight until the marginal cost exceeds the marginal benefit!’, said no war leader ever.

Some clue to this comes from recent research showing that people rate those who make utilitarian decisions as less trustworthy and less attractive to collaborate with than those who make deontological decisions. The decisions come from the infamous trolley problem: would you kill one person to save the lives of five? Across multiple studies, participants preferred and trusted decision-makers who would not; decision-makers who just thought you should never kill anyone, period.

The authors of this research speculate on the reasons we might spontaneously prefer deontologists. If you are to be my partner-in-arms, I would like to know that you will never trade me off as collateral damage, never treat me as a mere means to some larger end. I want to know that you will value me intrinsically. Thus, if you want to gain my trust, you need to show not just that you weight my outcomes highly, but that you will not even calculate the costs and benefits of coming to my aid. You will just do whatever it takes. Hence, we prefer deontologists and trust them more.

I am not sure this account quite works, though it feels like there is something to it. If I were one of the parties in the unfortunate trolley dilemma, then under a veil of Rawlsian ignorance I ought to want a utilitarian in charge, since I have a five-fold greater chance of being in the set who would benefit from a utilitarian decision than being the unfortunate one. If my collaboration partners are rationally utilitarian, I am by definition a bit more likely to benefit from this than lose, in the long run. But maybe there is a slightly different account that does work. For example, mentally simulating the behaviour of deontologists is easier; you know what they will and won’t do. Utilitarians: well, you have no idea what set of costs and benefits they might currently be appraising, so you are slightly in the dark about what they will do next. So perhaps we prefer deontologists as collaboration partners because at least we can work out what they are likely to do when the chips are down.

In a time of crisis, like this one, what our leaders really need is to be trusted, to bring the populace along with them. That, it seems to me, is why we are suddenly hearing all this deontological rhetoric. They are saying: trust us, come with us, we are not even thinking about the costs, not even in private. And there is a related phenomenon. Apparently, deontological thinking is contagious. When we see others following moral rules irrespective of cost, it makes us more prone to do so too. I suspect this is because of the public-good nature of morality: there is no benefit to my abiding by a moral rule unless everyone else is going to do so. We are quite pessimistic about the moral behaviour of others, especially in times of crisis, and so we need the visible exemplar, the reassurance that others are being deontological, before we are prepared to be so ourselves. In the current crisis, society needs people to incur costs for public-good benefits they cannot directly perceive, which is why, again and again, our leaders rightly proclaim not just the rules, but the unconditional moral force of those rules. Don’t calculate the infection risk for your particular journey; just don’t make it! (This is also why leaders who proclaim the rules but do not follow them themselves, as in the case of Scotland’s chief medical officer, are particular targets of negative attention.)

I am not saying this outbreak of deontology is a bad thing; even in the long run it will be hard to write the definitive book on that. Indeed, perhaps it would be nice to have a bit more of this deontological spirit the rest of the time. The UK government recently decided that every homeless person should have the offer of a place to stay by the end of the week. Whatever the cost. To which I respond: why could that not have been true in any week in the last thirty years? Why only now? In normal life, governments are utilitarian about such matters, not weighing homelessness reduction as highly as other policy goals, and not prepared to do the relatively little it actually takes because they believe the costs are too high. Evidently, the populace’s intuitive preference for deontologists extends only to certain moral decisions, and certain times (such as times when we are all facing the same external threat). At other times, governments can get away with meanness and inaction: the populace does not notice, does not care, or can be convinced that solving the problem is too hard. Many people in progressive policy circles are no doubt asking: if we can achieve so much so fast in this time of crisis, how can we hang on to some of that spirit for solving social problems when the crisis is over?

Are people selfish or cooperative in the time of COVID-19?

On March 12th 2020, in a press conference, the UK’s chief scientific advisor Patrick Vallance stated that, in times of social challenge like the current pandemic, the people’s response is an outbreak of altruism. On the other hand, we have seen plenty of examples in the current crisis of bad behaviour: people fighting over the last bag of pasta, price gouging, flouting restrictions, and so on. So there is probably the raw material to tell both a positive and a negative story of human nature under severe threat, and both might even be true.

Rebecca Saxe and I are trying to study intuitive theories of human nature. That is, not what people actually do in times of threat or pandemic, but what people believe other people will do in such times. This is important, because so much of our own behaviour is predicated on predictions about what others will do: if I think everyone else is going to panic buy, I should probably do so too; if I think they won’t, there is no need for me to do so. We have developed a method where we ask people about hypothetical societies to which various events happen, and get our participants to predict how the individuals there will behave ‘given what you know about human nature’.

Our most recent findings (unpublished study, protocol here) suggest that our 400 UK participants’ intuitive theories of the response of human nature to adversity are more pessimistic than optimistic. For example, we asked what proportion of the total harvest (a) should ideally; and (b) would in practice get shared out between villagers in two agrarian villages, one living normally, and one facing an epidemic. Participants said the amount that should ideally be shared out would be nearly the same in the two cases; but the amount that actually would get shared out would be much lower in the epidemic (figure 1). Why? In the epidemic, they predicted, villagers would become more selfish and less moral; less cooperative and more nepotistic; less rule-bound and more likely to generate conflict (figure 2). One consequence of all of this predicted bad behaviour was that our participants endorsed the need for strong leadership, policing, and severe punishment in the epidemic village more than the baseline village, and felt there was less need to take the voices of the villagers into account. This is the package often referred to as right-wing authoritarianism, so our data suggest that the desire for this can be triggered by a perceived social threat and the expectation of lawlessness in the response to it.

Figure 1. How much ideally should, and actually will get shared out in a normal village, and a village facing an epidemic. The epidemic is seen as massively reducing actual sharing, not the amount of sharing that is morally right. n = 400, for full protocol see here.
Figure 2. How much various morally good and bad behaviours will be seen in a normal village, and in one facing an epidemic, as people are told to work together. n = 400, for full protocol see here.

We also asked the same participants about their predictions of the response of their fellow citizens to the current real pandemic (the data were collected last Friday, March 20th). There was really strong endorsement of the proposition that other people will behave selfishly; and rather low or variable endorsement of the proposition that others will behave cooperatively (figure 3). Overall, our participants gave slightly more endorsement to the idea that the pandemic will lead to conflict and distrust than the idea that it will lead to solidarity.

Figure 3. During the current pandemic, how much do you agree that others will behave selfishly (red); and that they will behave cooperatively (blue). n = 400, for full protocol see here.

So how do we square this with Vallance’s claim that there will be an outbreak of altruism, and indeed with the evidence that, in under 24 hours, more than a quarter of a million people have registered as NHS volunteer responders? Well, Saxe and I are studying intuitive theories of human nature (my expectation of how you all will behave), not human nature itself (how you all actually behave). And there may be systematic gaps between our intuitive theories of behaviour and that behaviour itself. It might even make sense that there should be such gaps. For example, what may matter for people is often avoiding the worst-case scenarios (giving all your labour when no-one else gives any; forbearing to take from the common pot when everyone else is emptying it fast), rather than predicting the most statistically likely scenarios. Thus, our intuitive theories may sometimes function to detect outcomes that are actually rare but costly not to see coming when they do come (what is often known as error management). And we don’t know, when our participants say that they think that others will be selfish during the pandemic, whether they mean they think that ALL others will be selfish, or that there is a small minority who might be selfish, but this minority is important enough to attend to.
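The error-management idea can be sketched as a simple expected-cost comparison. The probabilities and costs below are hypothetical and only illustrate the logic: when being caught out by selfishness is much more costly than needlessly bracing for it, it pays to expect selfishness even when selfishness is rare.

```python
def should_expect_selfishness(p_selfish: float,
                              cost_missed: float,
                              cost_false_alarm: float) -> bool:
    """Return True if bracing for selfishness has the lower expected cost.

    p_selfish: believed probability that others will behave selfishly.
    cost_missed: cost of trusting others when they turn out to be selfish
        (e.g. giving all your labour when no-one else gives any).
    cost_false_alarm: cost of bracing for selfishness when others cooperate.
    """
    expected_cost_if_trusting = p_selfish * cost_missed
    expected_cost_if_bracing = (1 - p_selfish) * cost_false_alarm
    return expected_cost_if_bracing < expected_cost_if_trusting


# Hypothetical numbers: selfishness is believed to be rare (10%).
print(should_expect_selfishness(0.10, cost_missed=1.0, cost_false_alarm=1.0))
print(should_expect_selfishness(0.10, cost_missed=20.0, cost_false_alarm=1.0))
```

Rearranging the inequality, bracing pays whenever the probability of selfishness exceeds cost_false_alarm / (cost_false_alarm + cost_missed), which with the second set of costs is 1/21, or about 5%.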

There may be very good reasons for a prominent figure like Vallance to state his expectation of an outbreak of altruism. Humans can not only behave prosocially, but also signal their intention to do so, and thus break the spiral of ‘I am only doing this because I think everyone else is going to do so’. So, if intuitive theories of human nature have a hair-trigger for detecting the selfishness of others, then it becomes important not just to actually be cooperative with one another, but to signal clearly and credibly that we are going to do so. This is where what psychologists call ‘descriptive norms’ (beliefs about what others are doing) become so important. I will if you will. I will if you are.

One more thing of interest in our study: I have a longstanding interest in Universal Basic Income as a policy measure. We asked our 400 participants whether government assistance, in this pandemic time and in normal times, should come unconditionally to every citizen, or be based on assessment of needs. We find much stronger support for unconditionality in these times (43%) than in normal times (19%). This may be the moment when Universal Basic Income’s combination of extreme simplicity, ease of administration, and freedom from dependency on complex and difficult-to-track information really speaks for itself. So much that seemed politically impossible, completely off the table, as recently as January, has now actually happened, or is being quite seriously discussed. And, perhaps, once you introduce certain measures, once the pessimistic theories of human nature are defeated in their predictions of how others will respond, then people will get a taste for them.

The view from the top of the hierarchy of evidence

About five years ago I began doing meta-analyses. (If, as they say, you lose a tooth for every meta-analysis you conduct, I am now gumming my way through my food.) I was inspired by their growing role as the premier source of evidence in the health and behavioural sciences. Yes, I knew, individual studies are low-powered, depend on very specific methodological assumptions, and are often badly done; but I was impressed by the argument that if we systematically combine each of these imperfect little beams of light into one big one, we are sure to see clearly and discover The Truth. Meta-analysis was how I proposed to counter my mid-life epistemological crisis.

I was therefore depressed to read a paper by John Ioannidis, he of ‘Why most published research findings are false’ fame, on how the world is being rapidly filled up with redundant, mass-produced, and often flawed meta-analyses. It is, he argues, the same old story of too much output, produced too fast, with too little thought and too many author degrees of freedom, and often publication biases and flagrant conflicts of interest to boot. Well, it’s the same old story, but now at the meta-level.

Just because Ioannidis’s article said this didn’t mean I believed it, of course. Perhaps it’s true in some dubious research areas where there are pharmaceutical interests, I thought, but the bits of science I care about are protected from the mass production of misleading meta-analyses because, among other reasons, the stakes are so low.

However, I have been somewhat dismayed in preparing a recent grant application on post-traumatic stress disorder (PTSD) and telomere length. The length of telomeres (DNA-protein caps on the ends of chromosomes) is a marker of ageing, and there is an argument out there (though the evidence is weaker than you might imagine, at least for adulthood) that stress accelerates telomere shortening. And having PTSD is certainly a form of stress. So: do people suffering from PTSD have shorter telomeres?

It seems that they do. There are three relevant meta-analyses all coming to the same conclusion. One of those was done by Gillian Pepper in my research group. It was very general, and only a small subset of the studies it covered were about PTSD in particular, but it did find that PTSD was associated with shorter telomere length. As I wanted some confidence about the size of the difference, I looked closely at the other two, more specialist, meta-analyses.

A meta-analysis specifically on PTSD (by Li et al) included five primary studies, and concluded that PTSD was associated with shorter telomere length, with a pooled difference of -0.19 (95% confidence interval -0.27 to -0.10). All good; but then I thought: 0.19 what? It would be normal in meta-analyses to report standardised mean differences; that is, differences between groups expressed in terms of the variability in the total sample of that particular study. But when I looked closely, this particular meta-analysis had expressed its differences absolutely, in units of the T/S ratio, the measure of relative telomere length generally used in epidemiology. The problem with this is that the very first thing you ever learn about the T/S ratio is that it is not comparable across studies. A person with a T/S ratio of 1 from one particular lab might have a T/S ratio of 1.5 or 0.75 from another lab. The T/S ratio tells you about the relative telomere lengths of several samples run in the same assay on the same PCR machine with the same control gene at the same time, but it does not mean anything that transfers across studies in the way that ‘1 kilo’, ‘1 metre’ or ‘400 base pairs’ do.
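Why standardizing rescues cross-study comparison can be shown with a small sketch. It assumes, purely for illustration, that two labs’ T/S scales differ by a constant multiplicative factor (real between-lab differences need not be this tidy), and it uses hypothetical summary statistics for a single study. The raw T/S difference changes with the lab; the standardized mean difference (here Cohen’s d with a pooled SD) does not.

```python
import math


def cohens_d(mean_case, sd_case, n_case, mean_ctrl, sd_ctrl, n_ctrl):
    """Standardized mean difference (cases minus controls) using a pooled SD."""
    pooled_var = (((n_case - 1) * sd_case**2 + (n_ctrl - 1) * sd_ctrl**2)
                  / (n_case + n_ctrl - 2))
    return (mean_case - mean_ctrl) / math.sqrt(pooled_var)


# Hypothetical summary statistics for one study, as measured in lab A.
lab_a = dict(mean_case=0.95, sd_case=0.20, n_case=50,
             mean_ctrl=1.00, sd_ctrl=0.20, n_ctrl=50)

# Assumption for illustration: lab B's T/S scale is lab A's times 1.5, which
# multiplies every mean and SD but leaves the sample sizes alone.
SCALE = 1.5
lab_b = {k: v * SCALE if k.startswith(("mean", "sd")) else v
         for k, v in lab_a.items()}

for name, lab in (("A", lab_a), ("B", lab_b)):
    raw = lab["mean_case"] - lab["mean_ctrl"]
    print(f"lab {name}: raw T/S difference {raw:.3f}, "
          f"standardized difference {cohens_d(**lab):.3f}")
```

Pooling the raw differences from labs A and B would mix two scales, exactly like averaging inches with centimetres; pooling the standardized differences would not.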

If you don’t use standardized mean differences, integrating multiple T/S ratio studies to obtain an overall estimate of how much shorter the telomeres of PTSD sufferers are is a bit like taking one study that finds men are 6 inches taller than women, and another study that finds men are 15 centimetres taller than women, and concluding that the truth is that men are taller than women by 10.5. And the problems did not stop there: for two of the five primary studies, standard errors from the original papers had been coded as standard deviations in the meta-analysis, resulting in the effect sizes being overstated by nearly an order of magnitude. The sad thing about this state of affairs is that anyone who habitually and directly worked with T/S data would be able to tell you instantly that you can’t compare absolute T/S across studies, and that a standard deviation of 0.01 for T/S in a population study simply couldn’t be a thing. You get a larger standard deviation than that when you run the very same sample multiple times, let alone samples from different people. Division of labour in science is a beautiful thing, of course, and efficient, but having the data looked over by someone who actually does primary research using this technique would very quickly pick up nonsensical patterns.
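The size of that distortion follows from the definition of the standard error of a mean, SE = SD/√n. Entering an SE where an SD belongs shrinks the denominator of the standardized difference by a factor of √n, and so inflates the effect by the same factor; the fix is SD = SE × √n. A minimal sketch with hypothetical numbers, assuming equal group sizes and equal SDs:

```python
import math


def standardized_difference(mean_diff: float, sd: float) -> float:
    # With equal group sizes and equal SDs, the pooled SD is simply `sd`.
    return mean_diff / sd


n = 50                       # hypothetical participants per group
true_sd = 0.20               # hypothetical T/S SD within each group
se = true_sd / math.sqrt(n)  # standard error of each group mean, about 0.028
mean_diff = -0.05            # hypothetical T/S difference, cases minus controls

correct = standardized_difference(mean_diff, true_sd)
miscoded = standardized_difference(mean_diff, se)   # SE typed into the SD column
recovered_sd = se * math.sqrt(n)                    # the fix: SD = SE * sqrt(n)

print(f"correct {correct:.2f}, miscoded {miscoded:.2f}, "
      f"inflation x{miscoded / correct:.2f} (sqrt(n) = {math.sqrt(n):.2f})")
print(f"SD recovered from the SE: {recovered_sd:.2f}")
```

With 50 per group the inflation is about sevenfold, and with 100 per group it is tenfold, which is how a mis-coded column can overstate an effect by nearly an order of magnitude.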

I hoped the second meta-analysis (by Darrow et al.) would save me, and in lots of ways it was indeed much better. For PTSD, it included the same five studies as the first, and sensibly used standardized mean differences rather than raw differences. However, even here I found an anomaly. The authors reported that PTSD was associated with a much bigger difference in telomere length than other psychological disorders were. This naturally piqued my interest, so I looked at the forest plot for the PTSD studies. Here it is:

Excerpt from figure 2 of meta-analysis by Darrow et al.

You can see that most of the five studies find PTSD patients have shorter telomeres than controls by maybe half a standard deviation or less. Then there is one (Jergovic 2014) that apparently reports an almost five-sigma difference in telomere length between PTSD sufferers and controls. Five sigma! That’s the level of evidence that you get when you find the Higgs boson! It would mean that PTSD sufferers had telomeres something like 3500 base pairs shorter than controls. It is simply inconceivable given everything we know about telomeres, and indeed given everything we know about whole-organism biology, epidemiology and life. There really are not any five-sigma effects.

Of course, I looked it up, and the five-sigma effect is not one. This meta-analysis too had mis-recorded standard errors as standard deviations for this study. Correcting this, the forest plot should look like this:

Forest plot of the PTSD data from the meta-analysis by Darrow et al., with the ‘standard deviations’ corrected to standard errors in the study by Jergovic 2014.

Still an association overall, but the study by Jergovic 2014 is absolutely in line with the other four studies in finding the difference to be small. Overall, PTSD is no more strongly associated with telomere length than any other psychiatric disorder is. (To be clear, there are consistent cross-sectional associations between telomere length and psychiatric disorders, though we have argued that the interpretation of these might not be what you think it is.) What I find interesting is that no-one, author or peer-reviewer, looked at the forest plot and said, ‘Hmm…five sigma. That’s fairly unlikely. Maybe I need to look into it further’. It took me all of ten minutes to do this.
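Part of that ten-minute check can be automated. The sketch below screens rows of a meta-analytic data table for the two warning signs discussed here: an SD too small to be a real between-person SD, and an implausibly large standardized difference. Both thresholds are hypothetical rules of thumb chosen for illustration, the pooled SD assumes equal group sizes, and a flag is only a prompt to go back to the primary paper.

```python
from dataclasses import dataclass

# Hypothetical rules of thumb, for illustration only.
MIN_CV = 0.05       # SD/mean below this is implausible for between-person T/S
MAX_ABS_SMD = 2.0   # cross-sectional group differences this large are implausible


@dataclass
class StudyRow:
    name: str
    mean_case: float
    sd_case: float
    mean_ctrl: float
    sd_ctrl: float


def plausibility_flags(row: StudyRow) -> list[str]:
    """Return warnings for one row of a meta-analytic data table."""
    flags = []
    for group, mean, sd in (("case", row.mean_case, row.sd_case),
                            ("control", row.mean_ctrl, row.sd_ctrl)):
        if sd / mean < MIN_CV:
            flags.append(f"{group} SD of {sd} looks too small: is it an SE?")
    # Pooled SD assuming equal group sizes.
    pooled_sd = ((row.sd_case ** 2 + row.sd_ctrl ** 2) / 2) ** 0.5
    d = (row.mean_case - row.mean_ctrl) / pooled_sd
    if abs(d) > MAX_ABS_SMD:
        flags.append(f"standardized difference of {d:.1f} is implausibly large")
    return flags


for row in (StudyRow("plausible", 0.95, 0.20, 1.00, 0.20),
            StudyRow("suspicious", 0.95, 0.01, 1.00, 0.01)):
    print(row.name, plausibility_flags(row) or "no flags")
```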

I don’t write this post to be smug. This was a major piece of work well done by great researchers. It probably took them many months of hard labour. I am completely sure that my own meta-analyses contain errors of this kind, probably at the same frequency, if not a higher one. I merely write to reflect the fact that, in science, the main battle is not against nature, but against our own epistemic limitations; and our main problem is not insufficient quantity of research, but insufficient quality control. We are hampered by many things: our confirmation biases, our acceptance of things we want to believe without really scrutinizing the evidence carefully enough (if the five-sigma had been in the other direction, you can be sure the researchers would have weeded it out), our desire to get the damned paper finished, the end of our funding, and the professional silos that we live in. And, as Ioannidis argued, vagaries in meta-analyses constitute a particular epistemic hazard, given the prestige and authority accorded to meta-analytic conclusions, sitting as they are supposed to do atop the hierarchy of evidence.

These two meta-analyses are of a relatively simple area and cover the same five primary studies, and though they come reassuringly to the same qualitative conclusion, I still have no clear sense of how much shorter the telomeres of people with PTSD are than those of other people. The effect sizes for the five primary studies as reported by Darrow et al. and by Li et al. are no better correlated than chance. So the two meta-analyses of the same five studies don’t even agree on which study found the largest effect:

Two published meta-analyses of the same five studies show no better than chance agreement in their views of what the relative effect sizes were. Even allowing for the fact that they measure the effects on different scales, you might at least hope the rank order would be the same.
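Comparing rank orders rather than raw values sidesteps the different-scales problem, since any order-preserving rescaling leaves the ranks unchanged. Here is a sketch of Spearman’s rank correlation, together with an exact permutation check of how much five studies can tell you; the effect sizes are hypothetical, not those reported by either meta-analysis.

```python
import math
from itertools import permutations


def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for untied data: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def exact_p_at_least(rho_obs, n):
    """Share of all n! possible rank orders with rho >= rho_obs (one-sided)."""
    base = list(range(1, n + 1))
    hits = sum(spearman_rho(base, list(p)) >= rho_obs - 1e-12
               for p in permutations(base))
    return hits / math.factorial(n)


# Hypothetical effect sizes for five studies, NOT the published values.
raw_ts = [-0.19, -0.05, -0.30, -0.10, -0.02]      # raw T/S differences
rescaled = [2.5 * v for v in raw_ts]              # same order, different scale
reordered = [-0.4, -0.9, -0.1, -0.3, -0.6]        # a disagreeing meta-analysis

print("rescaled: rho =", spearman_rho(raw_ts, rescaled))
print("reordered: rho =", spearman_rho(raw_ts, reordered))
print("chance of perfect agreement with 5 studies:", exact_p_at_least(1.0, 5))
```

With only five studies there are 120 possible rank orders, so even perfect agreement would arise by chance about 0.8% of the time, and moderate agreement is hard to tell apart from chance at all.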

I hoped that meta-analysis would lift us above the epistemic haze, and perhaps it still will. But let’s not be too sanguine: as well as averaging out human error and researcher degrees of freedom, it is going to introduce a whole extra layer. What next? Meta-meta-analysis, of course. And after that...?

Blue/Orange to play Durham

We are delighted to announce that Blue/Orange will play in Durham on Tuesday March 21st 2017, at 19:30, at the Empty Shop HQ in Framwellgate Bridge, DH1 4SJ. Tickets are available from here.

Wes is in no condition to sell fruit

This simple space is going to be wonderful for the piece. I like productions where I can carry the set down with me on the train. And don’t forget, of course, performances in Newcastle on the Friday and Saturday of the same week, at Northern Stage.

And this little birdie got none……

We’ve just published a new paper on the effects of early-life adversity in starlings. We are particularly interested in how early adversity affects the shortening of telomeres. Telomeres are the protective DNA caps on the ends of our (and their) chromosomes, whose length is often used as a marker of biological age. We have found previously that nestlings who are smaller than their brood mates lose telomeres faster in the first few weeks of life. A bad start ages you.

However, we didn’t know what it is about being at a disadvantage in the brood that accelerates telomere loss. Is it that you don’t get so much to eat? Or is it the stress of having to struggle and beg more to hold your own when those around you are bigger and stronger? Or a bit of both? We decided to test this in a hand-rearing experiment. Here, we would play the parents and thus decide what each bird experienced in its formative days.

Many collaborators contributed to this study.
Clare Andrews did much of the hard work

We took four siblings from each of eight wild broods. From each family, one sibling was fed all it wanted nine times a day; a second was fed nine times a day but only 70% of what the first sibling got; the third was fed all it wanted nine times a day but had to do an additional 18 minutes a day of begging; and the fourth, who had it toughest of all, received 70% of what the third sibling did, and also did the extra begging.

Telomeres shortened rapidly in early life, but they shortened differentially according to early experience

The birds all survived; in fact these adversities are well within the natural range of what a wild starling might experience. Though they all fledged into normal adult birds, we found a difference in their telomeres (in red blood cells) when they were two months old. The more adversity we had given them, the greater the magnitude of their telomere shortening over their early life. Everyone’s telomeres get shorter in this period anyway, since a lot of cell division is going on, but the birds with more adversity showed more shortening. It seems that both the amount you get to eat and the amount you have to struggle for it affect the pace of your cellular ageing, and do so additively (that is, if you have both adversities, it’s worse than having either one alone). This is important, since we know in both mammals and birds that conditions experienced in early life can affect survival. Cellular ageing might point to a mechanism by which this could occur.
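What ‘additively’ means can be illustrated with the 2 × 2 structure of the experiment: full versus restricted food, crossed with normal versus extra begging. The cell means below are hypothetical, chosen to be exactly additive, and this is not the statistical model used in the paper. Additivity means the interaction contrast is zero, so the doubly deprived group’s shortening equals the baseline plus both single effects.

```python
import math

# Hypothetical mean telomere shortening (arbitrary units) in each cell of the
# 2 x 2 design: food (full vs 70%) crossed with begging (normal vs extra).
s = {
    ("full", "normal"): 1.0,
    ("restricted", "normal"): 1.3,
    ("full", "extra"): 1.2,
    ("restricted", "extra"): 1.5,
}

# Main effects: average the effect of one factor across levels of the other.
food_effect = ((s["restricted", "normal"] - s["full", "normal"])
               + (s["restricted", "extra"] - s["full", "extra"])) / 2
begging_effect = ((s["full", "extra"] - s["full", "normal"])
                  + (s["restricted", "extra"] - s["restricted", "normal"])) / 2

# Interaction contrast: does restricted food matter more (or less) when the
# bird must also beg? Zero means the two adversities simply add up.
interaction = ((s["restricted", "extra"] - s["full", "extra"])
               - (s["restricted", "normal"] - s["full", "normal"]))

additive_prediction = s["full", "normal"] + food_effect + begging_effect
print(f"food effect {food_effect:.2f}, begging effect {begging_effect:.2f}, "
      f"interaction {interaction:.2f}")
print(f"predicted shortening with both adversities {additive_prediction:.2f}, "
      f"observed {s['restricted', 'extra']:.2f}")
```

A non-zero interaction would mean the two adversities amplify (or buffer) each other rather than simply adding up.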


We also made some other curious observations. The birds showed some differences in adult inflammation according to their developmental histories, but some combinations of adversity increased adult inflammation, whilst others reduced it. Birds that had had little food, but not had to beg for it, went on to become relatively obese as juveniles (which is interesting given the links between childhood stress and obesity in humans), but birds that had had little food and had to beg for it remained lean. Thus, it looks like early-life experience matters for your biology as an adult, but it matters in complex ways, not as simple as saying ‘more early-life adversity = more adult problems’.

The behavioural constellation of deprivation

We are used to the idea that the poor behave in a certain way (living for the day, devil-may-care, fatalistic, impulsive, enjoying life while they can), whilst the rich are more future-oriented, self-controlled and cautious. Just read the novels of Zola, for example, for vivid descriptions of the appeal of present consumption over saving amongst those with the poorest lot in society.


There’s actually a lot of evidence that, statistically, this is more than just a myth. People of lower socioeconomic status (in Britain, for example) do tend to discount the future more heavily, and are less health-conscious and future-oriented. This can lead to a particular discourse about poverty: that people’s poverty is the consequence of their impulsive behaviours, and hence that poverty is in some sense the fault of the poor, the result of their own failings (there would be no other good reason to be poor, right?).

In a new paper coming out in the journal Behavioral and Brain Sciences, Gillian presents a different analysis of this phenomenon. What if the relative present-orientation that often goes with poverty were not a failing or weakness (or some kind of primordial character trait), but a sensible response to certain kinds of structural conditions?

Imagine a world where you felt that, regardless of what efforts you made, you would be likely to be killed or lose everything at a relatively young age. What would you do? Would it make you all the more careful to eat your kale, have your testicular cancer screening and often check the pressure in your car tyres? Probably not. You would probably quite sensibly conclude that there was not much payoff to doing those things, since something else would probably get you long before the benefits of those tedious efforts could be realised. You’d try to enjoy the life you could have while you had it. Thus, your present-orientedness would be an appropriate response to the conditions under which you had to live.

That, in essence, is Gillian’s argument. People of lower socioeconomic position in contemporary societies tend to be exposed to worlds where they are in less of a position to realise the returns on future-oriented investments, because more uncontrollable bad shocks happen to them than happen to the rich (and, with growing inequality, a more conditional benefits system, and increased economic precariousness, this is tending to become even more true). If you accept that this is true, then much of their average behaviour, ex hypothesi, makes sense. It’s not a moral failing; you would respond that way too.
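The core of this argument can be put as a toy calculation. This is a minimal sketch, not the model in Gillian’s paper: it assumes a constant yearly risk of an uncontrollable bad shock, and all the numbers are hypothetical. A future-oriented effort costs something now and pays off only if you are still around to collect.

```python
def expected_net_payoff(benefit, cost, delay_years, annual_hazard):
    """Expected value of paying `cost` now for `benefit` that arrives after
    `delay_years`, if each year carries an uncontrollable probability
    `annual_hazard` of being killed or losing everything (constant hazard)."""
    p_still_around = (1 - annual_hazard) ** delay_years
    return benefit * p_still_around - cost


# Hypothetical numbers: an effort costing 4 now that returns 10 in 20 years.
for hazard in (0.01, 0.05, 0.10):
    payoff = expected_net_payoff(benefit=10, cost=4, delay_years=20,
                                 annual_hazard=hazard)
    verdict = "worth it" if payoff > 0 else "not worth it"
    print(f"annual hazard {hazard:.2f}: expected net payoff {payoff:+.2f} ({verdict})")
```

With these numbers the break-even hazard is 1 − 0.4^(1/20), roughly 4.5% a year; above it, the same effort stops paying for anyone, whatever their character.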

Of course, it’s not quite that simple, as a perusal of the paper will reveal. One big theme of the paper is the idea that the consequences of poverty become progressively embedded via feedback processes. If you start out down one lifestyle path, perhaps for a small but comprehensible reason, the alternative track becomes further and further away, and the chance of reaching it, or the point of trying, less and less. This kind of embedding can even become transgenerational, as the behavioural strategies of one generation determine the starting input given to the next. But the big take-home message of the paper is that structural disadvantage is in the explanatory driving seat for the behaviour of the poor. The interventions (for example, for health inequalities) that are of the greatest importance are those that address the structural disadvantages that make poor people unlikely, whatever they do, to have positive futures they can control and rely on.


Behavioral and Brain Sciences is a peer commentary journal, so we are waiting with bated breath to see what colleagues in the social as well as the biological sciences make of it all. One of the issues in this area is that so many disciplines have something to say about social inequality that it is easy to end up (a) relabelling ideas that in fact already exist in other disciplines; or (b) having ideas that are actually a bit different from the received ideas, but are mis-recognised as being something rather different from what you intend them to be. Let’s hope we have steered between that particular Scylla and Charybdis.



Hitting the Wall & Blue/Orange


I am very excited about our imminent production of Matthew Warburton’s Hitting the Wall at Northern Stage on November 30th.

In 2012, Wayne Soutter, a middle-aged father of two, attempted to swim the as-yet unconquered sea-channel between the Mull of Kintyre and Ireland. Hitting The Wall is a theatrical recreation of that extraordinary endeavour. Cold seas. Strong winds. Treacherous tides and 50-foot jellyfish. What could possibly go right?

Based on blogs and interviews with Wayne and Paul (the boat captain on the attempt), Hitting The Wall asks why we choose to do things that might better be left undone.

It’s going to be a fun evening, with a chance to discuss the mad endeavour (actually, both mad endeavours, the swim and the play about the swim) with the creative team afterwards.

You can buy tickets from here.

I am also equally ecstatic to announce that Straw Bear’s next production after that will be Joe Penhall’s extraordinary Blue/Orange, on March 24th and 25th 2017. Blue/Orange is about many things, most especially schizophrenia, race and Welsh rarebit, and was described (when it originally appeared in 2000) as the finest new play in the English language for a generation. We are presenting it in association with Brain Awareness Week. The cast has been finalised. More updates soon.

When methods meet

The Scottish Graduate School of Social Science has made some interesting short films about the different methods available to social scientists, and in particular, whether they can fruitfully be brought together. In one of the films, the ethnographer Sam Hillyard and I discussed classic ethnography and experiments: can they be brought together, how are they different, and in what ways are they alike?

You can access the film and an associated worksheet here.