How should we reduce the wellbeing costs of poverty?

Unless you have been in hiding for the past forty years, you will know that even in countries that are rich in aggregate, poverty is really bad for wellbeing – bad for physical health, bad for mental health, and bad for satisfaction with life in general. Definitions of poverty for developed nations generally include some notion of relative standing: it’s about having less than most people in your society. Under this definition, you can’t ever entirely make poverty go away, since numerical equality of income and wealth is unlikely (though, of course, you can make the gaps smaller, and this seems generally to be a good idea, for all kinds of reasons including those discussed below). So it is worth asking: are there places where the wellbeing burden of relatively low income is smaller, and places where it is bigger? And what do those places do differently?

I have been having a look at this using the data from the European Quality of Life Survey (2012). (This is a digression from a larger ongoing project with Tom Dickins investigating the consequences of inequality using that dataset, see pre-registration here. There are more details of the sample and measures in that document). I first plotted life satisfaction against income (measured in multiples of the median for the country) by country.

Overall, people on relatively low incomes are less satisfied with life than those with incomes above the median. Beyond that, the life satisfaction dividend of income quickly tends to saturate. However, the figure seems to show lots of fascinating heterogeneity in the shape of the relationship. Some of this captures real things, like the compressed income distribution of Denmark, and the very dispersed one of the UK. Much of the variation in shape above the median, though, is probably pretty spurious: there are very small numbers of respondents with incomes above about 4 times the median, so those trends are not very precisely estimated (and just don’t ask about Austria). And half of all the people (by the definition of the median) are crammed into the little area between 0 and 1 on the horizontal axis, so it’s a misleading scale.

What if we split respondents into those whose incomes are above and below the country median? By comparing the mean life satisfactions of those two groups for each country we can get a sense of the psychological cost of being relatively poor.

This seems more satisfying: those on low incomes are everywhere (except Austria) less satisfied than those on high incomes, but the magnitude of the gap is quite variable: compare, say, Denmark to Poland. So, the question becomes: what accounts for cross-country variation in the size of this gap? (There are other questions too, such as what accounts for cross-country differences in the overall mean, but here I consider only the rich-poor gap).

A couple of candidate factors leap to mind. First, there is the inequality of the income distribution itself. Where the gaps are bigger, being at the bottom of the distribution might be worse than where the gaps are smaller. This could be true for several reasons: for a start, where the income distribution is more dispersed, many of those below the median are a long way below it in absolute terms, with all the material problems that is going to cause. Or maybe, as Kate Pickett and Richard Wilkinson have tirelessly argued, where the gaps are bigger, people notice them more, and this puts them into a psychologically more unpleasant mode: stressed, competitive, and paranoid about social position. This would affect the poor more strongly. So one candidate for explaining the size of the rich-poor life satisfaction gap is the inequality of the income distribution of the country, which we measure with something called the Gini coefficient.
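For readers unfamiliar with it, the Gini coefficient is half the mean absolute difference between all pairs of incomes, divided by the mean income: 0 means perfect equality, and values approaching 1 mean the income is concentrated in very few hands. Here is a minimal sketch of the computation on made-up incomes (not the EQLS data):

```python
def gini(incomes):
    """Gini coefficient via the mean absolute difference.

    G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean(x)).
    0 = perfect equality; values near 1 = extreme inequality.
    """
    n = len(incomes)
    mean = sum(incomes) / n
    # Sum of absolute differences over all ordered pairs
    mad = sum(abs(xi - xj) for xi in incomes for xj in incomes)
    return mad / (2 * n * n * mean)

# Hypothetical examples (not real survey values):
print(gini([1, 1, 1, 1]))     # perfectly equal incomes -> 0.0
print(gini([1, 2, 3, 10]))    # dispersed incomes: 56/128 = 0.4375
```

Real statistical agencies compute this on equivalized household incomes with survey weights, but the underlying quantity is just this pairwise comparison.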

Another possibility is that being relatively poor is more tolerable in countries with good access to public services for everyone, especially healthcare. In the health inequalities literature, this is often referred to as the neomaterialist hypothesis (for example in this paper here). This is somewhat confusing, since it is not appreciably more materialist than all of the other possible hypotheses, nor obviously more neo, nor quite as Marxist as it sounds. Anyway, we have measures of this in the dataset: ratings of problems accessing healthcare, and ratings of problems accessing other services such as culture, public transportation and other amenities. I calculated the mean ratings of problems of access to these things just for the people whose incomes were below the country median. I then plotted the size of the rich-poor life satisfaction gap against our three potential explanatory variables: the Gini coefficient, problems accessing healthcare, and problems accessing other services. (Note, on the plot below, the gap is expressed so that positive 0.5 means the poorer half of the population have life satisfaction that is 0.5 scale points lower than the richer half. And yes, the one outlier with a negative gap – the poor are happier – is Austria).

On the face of it, there seem to be positive associations between all three predictors and the size of the life satisfaction gap. However, causal inference is tricky, not least because, unsurprisingly, all three predictors are also somewhat correlated with one another: in more unequal countries, it’s also more difficult for people with low incomes to access healthcare. I ran a model selection algorithm. The best-fitting model simply contains problems accessing healthcare (and problems accessing healthcare also has the strongest bivariate correlation with the size of the life satisfaction gap, 0.5). In other words, if people on low incomes can easily access healthcare, the burden of their low income for their satisfaction with life is substantially mitigated. For every standard deviation reduction in problems accessing healthcare, the life satisfaction gap between rich and poor shrinks by half a standard deviation.
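That last translation is not a coincidence: when both variables are standardized (z-scored), the simple regression slope is exactly the Pearson correlation, which is why a bivariate correlation of 0.5 corresponds to half a standard deviation of gap per standard deviation of access problems. A sketch with made-up numbers (the real EQLS values are not reproduced here):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def standardized_slope(x, y):
    """OLS slope after z-scoring both variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    zx = [(a - mx) / sx for a in x]
    zy = [(b - my) / sy for b in y]
    return sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)

# Hypothetical country-level numbers, purely illustrative:
access_problems = [0.2, 0.5, 0.9, 1.4, 2.0, 1.1]
satisfaction_gap = [0.1, 0.4, 0.5, 0.9, 1.1, 0.3]
r = pearson(access_problems, satisfaction_gap)
b = standardized_slope(access_problems, satisfaction_gap)
# r and b are identical: a correlation of 0.5 means the gap
# shrinks by half an SD per SD of improvement in access
```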

However, a second model with both problems accessing healthcare and the Gini coefficient as predictors comes out almost equally likely to be the best model to explain the data. In other words: the data are nearly as compatible with an explanation where both the dispersion of the income distribution and the problems the relatively poor have accessing healthcare contribute to the life satisfaction gap. Even in this model, though, problems accessing healthcare is the variable with the larger beta coefficient.

This is not yet a proper analysis of these data – this is only a taster, and no firm conclusions can yet be drawn. However, it does look like this particular dataset (and outcome measure) points one way rather than another in ongoing debates about how to level up health and wellbeing: should we be prioritising making cash transfers (i.e. increasing low incomes), or providing universal (free) basic services, thus alleviating some of the problems lack of money leads to? This is a complex argument, and I have usually been on the cash transfer side. Doubtless both are required. However, this dataset does seem to point to the importance of excellent and accessible services for making modern life tolerable at all rungs of the income distribution.

To subscribe to this blog via email, enter your address in the Subscribe box on the right.

Breaking cover on the watching eyes effect

I have seldom had much to say on the watching eyes effect. Even though it is the most cited research I have ever been involved in, it was always a side project for me, and also for Melissa Bateson, and so neither of us has been very active in the debate that goes on around it. Along with our students, we did an enjoyable series of field experiments using watching eyes to impact prosocial and antisocial behaviour. The results have all been published and speak for themselves: not much more to say (we really don’t have a file drawer). However, I have just finished reading not one but two unrelated books (this one and this one) that cite our watching eyes coffee room experiment as a specimen of the species ‘cute psychology effect that failed to survive the replication crisis’, and so I feel I do need to break cover somewhat and make some remarks.

In our coffee room experiment, we found that contributions to an honesty box for paying for coffee substantially increased when we stuck photocopied images of eyes on the wall in the coffee corner, compared to when we stuck images of flowers on the wall. This makes the point that people are generally nicer, more cooperative, more ethical, when they believe they are being watched, a point that I believe, in general terms, to be true.

The account of the experiment’s afterlife in both books goes something like: this was a fun result, it made intuitive sense, but it subsequently failed to replicate, and so it belongs to the class of psychology effects that is not reliable, or at least, whose effect size is very much smaller than originally thought. It is certainly true that many psychology effects of that vintage turn out not to be reliable in just such a way; and also true that there are many null results appearing using watching eyes manipulations. I just want to point out, though, that the statement that our coffee room results have failed to replicate is not, to my knowledge, a correct one (and my knowledge might be the problem here, I have not really kept up with this stuff as well as I should).

The key point arising from our coffee room experiment was this: when (1) the prosocial task is real-world, (2) people do not know they are taking part in an experiment, (3) few real eyes are around, and (4) the rate of spontaneous prosociality is low, then displaying images of watching eyes can increase the rate of prosocial compliance. I do not know of any attempt at a direct replication, with either a positive or a null result. We can’t do one because we don’t have a kitchen with an honesty box any more, and besides, our study population knows all about our antics by now. Someone else should do one. Indeed, many people should.

There have been some conceptual replications published, preserving all of features (1) – (4), but focusing on a different behaviour and setting than paying for one’s coffee in a coffee room. Some of these are by our students (here and here for example). Some are not: for example, see this 2016 study on charitable donations in a Japanese tavern or izakaya and the anti-dog-fouling campaign developed and evaluated by the charity Keep Britain Tidy. All of these can be considered positive replications in that features (1)-(4) were present, a watching eyes image intervention was used, and there was a positive effect of the eye images on the behaviour. The effect sizes may have been smaller than our original study: it is hard to compare directly given the different designs, and I have not tried to do so. But, all these studies found evidence for an effect.

Given the existence of positive conceptual replications, and the lack, to my knowledge, of any null replication, why did both books describe our coffee room result as one that had not replicated? They were referring, no doubt, to the presence in the literature of several studies in which (a) participants completed an artificial prosociality task such as a Dictator Game, when (b) they knew they were taking part in an experiment, (c) they were therefore under the observation of the experimenter in all conditions, and (d) the rate of prosociality was high at baseline; and the watching eyes effect was null.

It’s perhaps not terribly surprising that watching eyes effects are often null under circumstances (a)-(d), instead of (1)-(4). When the rate of prosociality is already high, it is not easy for a subtle intervention to make it any higher. Besides, anyone who knows they are taking part in an experiment already feels, quite realistically, that their behaviour is under scrutiny, so some eye images are unlikely to do much more on top of that. That’s the whole concern about studying prosociality in the lab: baseline rates of prosociality may be atypically high, exactly because people know that the experimenter is watching. But this should not be confused with the claim that the watching eyes effect has been shown to be unreliable under the rather different circumstances (1)-(4). That might turn out to be the case too, but, to my knowledge, it has not thus far.

There are two possible sources of the book authors’ confusion with respect to the afterlife of the effects observed in our coffee room study. The first is that they are using our coffee room experiment as a metonym for the whole of the watching eyes literature. The original studies of the watching eyes effect, the ones that preceded ours (notably this one), were done under circumstances (a)-(d), and as we have seen, those effects have not reliably replicated. But it is fallacious to say thereby that our rather different studies have not replicated. Something about watching eyes effects did not replicate; our study is something about watching eyes; therefore our study did not replicate. Doesn’t quite follow. By chance, we might have stumbled on a set of circumstances where watching eyes effects are real and potentially useful, even though they turn out to be more fragile and transitory in the domain – experimental economic games – where they were first documented. Testing whether this is right requires replications that have the right properties. Doing more (easy because in the lab) replications with the wrong properties does not seem to add much at this point.

Second, the book authors were probably influenced by a published meta-analysis arguing that watching eyes do not increase generosity. Whatever its merits, that meta-analysis, by design, only included studies done under circumstances (a)-(c) (and therefore for which (d) is usually true). It did not include our coffee room study, any of our conceptual replications of our coffee room study, or any of the conceptual replications of our coffee room study done by anyone else. So, it can hardly be taken as showing that the effects in our coffee room study are not replicable. That would be like my claiming that Twenty-Twenty cricket matches are short and fun, and you responding by saying that you have been to a whole series of test matches and they were long and boring, not short and fun. True, but not relevant to my claim. My claim was not that all cricket is short and fun, only that certain forms of it may be.

It’s really important, in psychology, that we attempt and publish replications, do meta-analyses, and admit when findings turn out to be false positives. But, it’s also important to understand what the implicational scope of a non-replication is. Replication study B says nothing about the replicatory potential of the effects in study A if constitutive pillars of study A’s design are completely absent from study B, even if the manipulation is similar. Also, we really ought to do more field experiments, where participants do not know they are in an experiment and are really going about their business, if the question at hand is to do with real-world behaviour and interventions thereupon.

I am quite happy to accept the truth however the dust settles on the watching eyes effect, but for real-world prosocial behaviours in field settings when no-one is really watching and participants don’t know they are taking part in an experiment, I’m not prepared to bet against it just yet.


Live fast and die young (maybe)

Quite a few big ideas have made it across from evolutionary theory into the human sciences in the last few years. I can’t think of any that has been more culturally successful than the ‘live fast, die young principle’. This principle, which was originally articulated by George C Williams in the late 1950s, says something like the following: if you live in a world where the unavoidable risk of mortality is high, you should prioritise the present relative to the future. Specifically, you should try to reproduce sooner, even at the expense of your neglected body falling apart down the line. After all, what is the point in investing in longevity when some mishap will probably do you in anyway before you reach a peaceful old age?

The principle was originally invoked to explain inter-species differences in the timing of reproduction, and in senescence (the tendency of bodies to fall apart in multiple ways after a certain number of years of life, without clear external causes). But it has come to crop up everywhere: in psychology (to explain individual differences in impulsivity, and the impact of early-life trauma), in sociology (to explain socioeconomic differences), in anthropology and history (to explain social change). I’ve even found it invoked in explaining how travel agencies responded to the disruption caused by the pandemic. And then of course, there is the famous story of Henry Ford, who asked engineers to tour the scrapyards of America looking for parts of Model T cars that were still in good condition in scrapped carcasses. They found that the kingpin never wore out; ‘make them less well!’, came the response.

On the face of it, this is a beautiful example of theory guiding observation, science in the hypothetico-deductive mode working well. Williams produced an a priori theoretical argument. Subsequent data from many species supported its central prediction: species that experience a higher mortality rate in their natural environments mature sooner (and smaller); have larger litters; have shorter inter-birth intervals, and may senesce sooner. Then, it was like the joke about confirmation bias: once you are aware of it, you see it everywhere. For people spotting it (for example in the behaviour of travel agencies), it was nice to link back to the prestige of evolutionary biology and the idea that the pattern was predicted by theory. But there is a problem. The problem is not with the bit that says that the living fast and dying young pattern of behaviour occurs; this does indeed seem to be an empirical regularity of some generality. The problem lies in the bit that says theory predicts it will occur.

It seems to be a well-kept secret that there is no consensus in evolutionary biology that Williams’ theoretical argument was correct. That’s putting it mildly. As a 2019 review in Trends in Ecology and Evolution put it: ‘[Williams’] idea still motivates empirical studies, although formal, mathematical theory shows it is wrong’. The authors of that paper suggest that Williams’ argument persists not because it is sound, but because it is intuitive. Intuitive it definitely is. I remember acting as editor for another important paper showing mathematically that, other things being equal, the risk of unavoidable mortality can have no impact on the optimal timing of reproduction, or any other trade-off parameter for that matter. And I remember thinking: but it obviously must, you need to rewrite the equations until they say so! (I didn’t say this in my editorial review of course).

The difficulty is that, until now, there has been no explanation of why Williams’ argument does not work that is anything like as intuitive as the original argument was. A new paper by Charlotte de Vries, Mattias Galipaud and Hanna Kokko comes as close as anything I have ever encountered to giving me an intuition for the failure of the argument that is nearly strong enough to battle the intuitive magic of the argument itself. (For those of you who don’t know Hanna, the combination of brilliant theoretical insight and limpid clarity in communication is exactly the kind of behaviour she has form for.) The paper, as well as explaining the difficulty with Williams’ original argument, signposts the ways we might rescue the live fast, die young principle, and in the process shows how the scientific method – theory leads to prediction leads to test – is not entirely as we like to imagine.

Alright, let’s roll our sleeves up. Let us imagine a population living in a dangerous world, whose members put everything into reproducing in the first year of their lives. Then they are knackered, and die off even if the dangers of their world haven’t got to them first. There is no selection for reproducing less when young and being healthy for a second year, because so few are going to make it that far anyway. Now, due to a change in the ecology, the rate of unavoidable mortality goes down. Now, many more individuals can be around in the second year. By reproducing a bit less in the first year, individuals can be in better health in the second year, leading to higher total lifetime reproductive success. And this delaying is now more worth doing, because they now have a better chance of making it through and reaping the benefit. This is Williams’ original argument, and it still seems to make a lot of sense.

We are talking about evolution, though, and as Ernst Mayr taught us, thinking about evolution requires thinking about populations, not about isolated individuals. If the rate of mortality goes down, the population is going to grow exponentially, at a faster rate than it did previously. In an exponentially growing population, an offspring that you have sooner is more valuable in fitness terms than one you have later. De Vries et al. give us a nice figure to see why this is the case.

The thing that gets maximised by evolution is the proportionate representation in the population of some lineage or type. And it is easy to see that in a growing population, an offspring placed into the population earlier (the left hand star on the figure) can become ancestor to a greater fraction of the population by time 3 than an offspring placed into the population later (the right hand star). This is because it is placed in when the cone is narrower, and its descendants begin their exponential growth in number sooner. So, when a population is growing exponentially, there is a fitness bonus attaching to any offspring you manage to have soon. To look at this the other way about, there is a relative fitness penalty attached to any offspring you have later in time.

So when the rate of unavoidable mortality goes down, two things happen. The chances of making it to a second year go up, increasing the expected return on investments in reproductive capacity in the second year. And, because the population begins to grow exponentially, the relative fitness penalty for an offspring being placed a year later gets bigger. Your chances of having an offspring in the second year go up; the relative value to your fitness of an offspring a year later goes down; and these two effects perfectly cancel one another. The change in the risk of unavoidable mortality ends up having no effect at all on your optimal trade-off between reproduction and health.
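The cancellation can be made concrete in a toy two-age-class model (my own illustration, not the model in de Vries et al.). Suppose per-year extrinsic survival is p, first-year reproductive effort is m1, and second-year fecundity m2 is discounted by an intrinsic survival cost s(m1) of having worked hard in year one. The Euler-Lotka equation, 1 = p·m1/λ + p²·s(m1)·m2/λ², determines the growth factor λ that natural selection maximises in a density-independent population. Substituting x = p/λ removes p from the equation entirely, so the effort that maximises λ turns out not to depend on extrinsic mortality at all:

```python
import math

def s(m1, m_max=4.0):
    """Intrinsic survival to year two: a convex cost of first-year effort."""
    return max(0.0, 1.0 - (m1 / m_max) ** 2)

def growth_rate(m1, p, m2=10.0):
    """Growth factor lambda from the Euler-Lotka equation
    1 = p*m1/lambda + p^2*s(m1)*m2/lambda^2.
    Substituting x = p/lambda gives a quadratic in x that does not
    contain p: this is the cancellation described above."""
    a = s(m1) * m2                 # coefficient of x^2
    if a == 0:
        x = 1.0 / m1               # degenerate case: 1 = m1 * x
    else:
        x = (-m1 + math.sqrt(m1 * m1 + 4 * a)) / (2 * a)
    return p / x                   # lambda; maximised when x is minimised

def best_effort(p):
    """Grid-search the first-year effort that maximises lambda."""
    grid = [i * 0.01 for i in range(1, 401)]   # m1 in (0, 4]
    return max(grid, key=lambda m1: growth_rate(m1, p))

# Lowering extrinsic survival changes lambda, but not the optimal schedule:
print(best_effort(p=0.9), best_effort(p=0.5))  # same optimal m1 either way
```

The parameter values and the form of s(m1) are arbitrary choices for illustration; the point is only that p drops out of the optimisation, exactly as the population-thinking argument says it should.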

Aha, you might say. But this penalty for delayed offspring only applies in exponentially growing populations. So Williams’ argument would work in a population where mortality decreased, but the population size remained stable. Indeed, but the trouble is that the only thing that keeps populations from growing exponentially is mortality. Imagining mortality reducing without that causing the population to grow exponentially is like imagining putting air into a balloon without that balloon starting to get bigger.

Exponentially growing populations are eventually limited and stabilized by competition amongst their members (so-called density-dependent regulation). The way to rescue Williams’ argument is to incorporate some kind of density-dependent regulation that checks population growth, but still rewards those who delay reproductive effort. The problem is that there are many ways that density-dependent regulation can work (as competition increases, the old lose out, the young lose out, fecundity is reduced amongst inexperienced breeders, it is reduced amongst experienced breeders, new juveniles can’t find nest sites, old adults can’t defend their nest sites, etc.). de Vries et al. consider ten different scenarios for density-dependent regulation. They find that under some of these, reducing unavoidable mortality selects for delaying reproduction (the Williams pattern); in some, reducing unavoidable mortality selects for accelerating reproduction (the anti-Williams pattern); and in some, reducing unavoidable mortality has no effect either way.

There is perhaps no way of saying a priori which of the ten scenarios for density-dependent regulation actually captures what happens in any particular real population. So, we can’t know a priori what our theoretical model should be. Instead, de Vries et al. say, reasonably enough, we can use the fact that we do observe the live fast, die young pattern across many species to narrow down what the right theoretical model is. In other words, we can predict what our theory should be using data. That is, since populations with higher mortality do evolve earlier reproduction, we can infer that they should be modelled using the models in which this is predicted to happen. Those are models where the burden of density-dependent competition falls particularly on juveniles trying to start reproducing, or on fecundity. So, since we stumbled on an empirical regularity (populations with higher mortality evolve earlier reproduction and senescence), we learned something, indirectly, about what assumptions we should make about how populations work.

I take three lessons away from this example. One, when people say ‘evolutionary theory predicts…..’, they are often just peddling an intuition, an intuition that may not work out, or may only work out under restrictive assumptions. In this case, if we wanted to say ‘theory predicts’ the live fast die young pattern, what we ought to say is ‘perfectly reasonable theory predicts either this pattern, or the opposite, or no pattern at all, depending on assumptions’.

Two, our intuitions don’t do population thinking. We compute what the payoff to one individual would be of doing A rather than B. We don’t spontaneously think about what the changing population distribution would be if A rather than B became common. But evolution is a population process, and so you can’t work out what will happen without modelling populations (you often get misleading results just by totting up costs and benefits to one representative individual).

Third, theory does not really come before data, even in a relatively ‘theoretical’ and mathematically formalized discipline like evolutionary biology. It’s the empirical findings that tell us what kinds of theoretical models we should construct, and how to constrain them. This is worth noting for those who are advocating more formal theory as a way out of psychology’s current crisis. Sure, we need to build more models, but model-making prior to any data is blind. You don’t just figure out the right model in a vacuum and go off and test it. Data – observation of the world, often rather descriptive and interest-driven – tell us what theoretical assumptions we should be making, almost as much as theoretical models then tell us which further data to collect. It’s a cycle, a game of tag between models and data, in which data predict theory as well as the opposite.


The bosses pretend to have theories, and we pretend to test them

Leo Tiokhin has hosted a new blog series on the use of formal models in metascience and, more generally, in psychology. The starting point for the series is the increasing recognition that psychology’s weaknesses don’t just lie in its recent replicatory embarrassments. The underlying theories that all those (possibly non-replicable) experiments aim to test are also weak. That is: the theory as stated could give rise to multiple patterns in the data, and the data could be compatible with multiple theories, given how vaguely these are stated. In my contribution to Leo’s series, I invoked the old Soviet joke: the bosses pretend to have theories, and we pretend to test them.

Several contributors to the series point out the virtue, given this problem, of formalizing theories in mathematical or computational models. This undoubtedly has merit: if you convert a verbal psychological theory into a formal model, then you expose all your tacit assumptions; you are forced to make decisions where you had left these vague; you discover if your conclusions really must follow from your premises; and you are left with a much tighter statement of what your do-or-die predictions are. This is all good, and true.

However, my contribution, and also to some extent the one by Willem Frankenhuis, Karthik Panchanathan and Paul Smaldino, provide a line of argumentation in the opposite direction. Theories in psychology are often weak exemplars of theories. One move is to make them stronger through formalization. The opposite move is to stop claiming that they are theories. I think for many areas of psychology, that makes a lot more sense. There are many important avenues of scientific enquiry that do not exactly have theories: descriptions of psychological phenomena; ontologies of psychological processes; uncovering of which things pattern together; working out which levers move which parts in which direction, and which levers move none. These enquiries can certainly feature, and meaningfully test, hypotheses, in a local kind of way, but may not be underlain by anything as grandiose as a fully-identified theory.

Of course, there is always some kind of idea underlying research questions. Often in psychology this is better described as an interpretative framework, or a proto-theory. To try to press it into the mould of fully identified theory may be to subject it to the heartbreak of premature definition, which can take years to get over. The problem has been that psychologists have had to claim to be using a ‘theory’ to get their papers published (a little-recognized form of publication bias). A psychology paper has needed to start with a ‘theory’ with a three-letter acronym like an opera has needed to start with an overture: terror management theory, error management theory, planned behaviour theory, reasoned action theory, social identity theory, social cognitive theory, social norms theory, regulatory focus theory, regulatory fit theory, life history theory, life course theory – I think you know the game. I even coined an acronym for the generation of these acronyms: CITEing, or Calling It Theory for Effect.

None of these frameworks is ready to be implemented as a computational model, which raises all kinds of interesting questions. What kinds of beasts are these? Would it be better if we did not need to invoke them or their ilk at all, and could just state what questions we want to answer, what parallels there are elsewhere, and what our hunches are? Although having theories in science is great, it might not be a prerequisite. Precisely stating your theory is especially excellent when you do actually have one. You should not feel pressured to state one as a rhetorical move, if in fact you are doing description or proto-theory.

People often misunderstand the pre-registration revolution as being the requirement to only do confirmatory analyses. But, as Willem and I have argued, it’s not this at all. It’s the freedom to do confirmatory analyses when these are appropriate, and exploratory ones when these are appropriate, and be clear and unashamed about which it really is: don’t muddle the one with the trappings of the other. Likewise with theories: be clear when you really have one, and be clear when you don’t. Just as having better theories can lead to a better psychology, so, possibly, can invoking no theory at all, in some cases.

People sometimes link this point to the claim that ‘psychology is a young science’, not ready for its Newton or Darwin yet. That’s starting to look a little disreputable. I personally think there should be a one hundred year cut-off on the old ‘young science’ ploy, which means psychology has overstayed. The deeper problem for theories in psychology, as David Pietraszewski and Annie Wertz have recently argued, is not insufficient time, but a constant flip-flop about what its proper level of analysis is. Some frameworks work at the intentional level of analysis (the intuitive way we speak about people, with beliefs, desires, feelings, selves, things they can deliberately control and things they can’t), and others at the functional level of analysis (i.e. how do the information processing mechanisms actually work, which may or may not look isomorphic to intentional-level descriptions of the same processes). Add to the mix that evolutionary psychologists are sometimes also thinking at the level of ultimate causation (fitness consequences), and there is a heady recipe for total incoherence about what we are trying to do and what kind of thing would make us satisfied we had in fact done it. Hence the constant churn of what seem like adequate theories to some people, and seem entirely unlike adequate theories to other people. This is the big problem psychology needs to sort out: what level of analysis we want in an explanation. The answer may be sometimes one, sometimes another, but ‘theories’, and authors, need clearly to say which they are trying to do.

Big thanks to Leo for hosting this interesting series.

My muse is not (or, possibly, is) a horse

I’ve written one thing in my life that people really want to read: a 2017 essay called Staying in the game. When I first posted it, the unprecedented traffic over a couple of days caused my web site host to suspend the service. A lot of people commented or emailed when it came out. Many people have read it since. Every few months it has a little outbreak of virality, usually via Twitter or Facebook. The most recent one was this week. Given that people seem to be interested in the essay, and more generally in understanding the creative processes of their fellow academics, I thought it might be fun to write some more about the history of this essay, how it came about.

Staying in the game existed for some time, in several versions. It tries to do several things. It contains a self-help or how-to guide for actual or aspiring academics, a kind of Seven habits of moderately effective (and slightly nerdy) people. There is something of the confessional in it (and that, I think, is what people, especially younger academic colleagues, like). I wanted to say that it is OK, normal, permitted, to struggle in your academic career, to not do as well as you hoped or think you ought to have done. We senior people have been there too. There are longueurs and surprises, so we should all be compassionate to ourselves and not make too much of a big deal out of it. And there is a third part, which is about my ambivalence, both personal and philosophical, towards the markers of value that we tend to draw on in academia – the prizes, the status markers, the impact factors, and so forth. If you want to give me these, that would be welcome (my address is freely available); but I worry about them.

Each of these three parts was originally conceived of as a separate project, possibly a whole separate book in some cases, but in any event a completely separate paper or chapter. But when I started to write an essay called My muse is not (or, possibly, is) a horse, the different threads kept tangling each other so much that in the end I thought, well damn it, I might as well just deal with them all here and now, and that is what Staying in the game ends up doing. I never wrote all the other bits. And the starting point – whether one’s muse should or should not be likened to a horse – fell to the cutting-room floor.

The how-to guide part was inspired by my observation, from reading various writers’ and mathematicians’ accounts of their process, of how much convergence there seemed to be: 2-3 hours concentrated time, usually in the morning, every day, with no multi-tasking, and a fair dose of quiet ritual surrounding it. I was going to systematically review these, and the various Writer’s Way type courses, highlighting the points of convergence, linking this to some light consensual evolutionary psychology about which ways of working are natural for human beings. This was possibly going to be a whole book project, but it ended up barely a few pages with some promissory examples. When I started the essay that was to become Staying in the game I kept forward-referring to this as-yet-nonexistent scholarly work, to such a degree that I realised it might not be necessary for me to ever actually do the scholarly work: I could just assert the things I wanted to say, not as forward references to a future piece of scholarship, but just, in the traditional Oxford manner, as assertions.

The actual thing I started to write was, as I have mentioned, an essay called My muse is not (or, possibly, is) a horse. The title comes from a wonderful letter written by Nick Cave to MTV in 1996, when he had been nominated for a best male artist award. (Disclaimer: this letter, which I found in a book, is wonderful. I know nothing else about Nick Cave, either his music or his political views.) Cave begins the letter with extremely gracious thanks to MTV for supporting and recognising him. I love this: his purpose in the letter is to spurn their accolade, but he begins humbly and with generous recognition of their benign intentions, not condescending or disdaining his nominators in any way, but genuinely thanking them. Then he goes on:

Having said that, I feel that it is necessary for me to request that my nomination…be withdrawn and furthermore any awards or nominations…that may arise in future years be presented to those who feel more comfortable with the competitive nature of these award ceremonies. I myself do not…I have always been of the opinion that my music…exists beyond the realms inhabited by those who would reduce things to mere measuring.

My relationship with my muse is a delicate one at the best of times and I feel that it is my duty to protect her from influences that may offend her fragile nature. She comes to me with [a] gift, and in return I treat her with the respect I feel she deserves – in this case this means not subjecting her to the indignities of judgement and competition.

I have thought a lot about this, in the academic context. We professors are a strange combination of plumbers and poets. No-one would expect a poet to be able to produce any particular poem, to order, to a timetable (except the poor poet laureate, how awful must that be?). For poets, we recognise the individuality and autonomy of the muse: they need to write about what they want to write about, whenever they want to write it. Plumbers though, you would rather like it if they came to your house at the time appointed and could fix your leak on demand, like a reliable automaton. And we professors are somewhere in between. Our work, both the topic and the style, is deeply personal, creative, unpredictable; the ideas we will have, the ways of expressing them we will dream up; the ways we go about them and the scope of the bits we chew on. And yet, as a community we think it is quite fine to assess ourselves on simple common yardsticks. We pretty much expect – and account for in spreadsheets – so much volume per person per unit time, and quality measurable on a linear scale. This is quite problematic. I agree that professors, in the public pay and providing a public good, ought to work hard and produce enough value to justify their subvention. And we need to figure out what research is better or worse. But we are not machines: we are people making meaning. What meaning we create is highly personal and hard to account for retrospectively, let alone prospectively. The value to society, although very important, is hard to assess. So when universities review our performance each year, often quite crudely and numerically, or we review ourselves, this is totally understandable and also quite reductive. I don’t have any good answers. I merely point out the tensions.

I can see why Nick Cave would want to opt out of the process of judgment. What he fears, actually, is not being judged a failure, but being judged a success: he knows that could change the authenticity of what he does. There’s a lot about that in Staying in the game, the compromise, as an academic, between what you have to do to earn your crust, what you get rewarded for, and what you do from personal identity and the search for meaning. Cave sums it up this way:

My muse is not a horse and I am in no horse race, and if indeed she was, I still would not harness her to this tumbrel… muse may spook! May bolt! May abandon me completely!

My first thought on reading was that Nick Cave is a person who knows a lot about muses. My second thought, though, was that quite possibly he is a person who does not know a lot about horses.

The point of saying his muse is not a horse is that a horse is an automaton, a machine that can be put to work in service of any goal, substitutable, predictable, quantifiable (so many horse power). Whereas the muse…the muse is….well, each muse is unique; beautiful when flowing; temperamental; departs from type-specific expectations in unruly ways; needs to be wooed and soothed and petted and given the best possible living conditions; has individual needs and strengths; is stubborn, foul, resentful at times; and needs to be treated as having final value, not just as instrumental. Like a living thing really, rather than a machine. An animal. A big, powerful animal, one that can be domesticated but has a wild ancestor and is stubborn at times. Like a…well like a horse. Indeed, in the metaphor of the passage above, the muse is first not a horse, and then, in fact, a horse, a horse that should not be harnessed to the wrong thing (a tumbrel, of all things, why does no-one ever mention tumbrels except in the context of the French revolutionary terror?), for fear she may bolt.

This post has no conclusion. I said all I needed to say in Staying in the game and deleted the rest, but it has been nice to resurrect some of the history here. Five years on, I still like Staying in the game, though I do worry that it is a bit too normatively preachy: fat shaming for people who like to check their email in the morning. It was meant to liberate people from anxiety about work, but more than once I have had people say to me ‘Oh, Daniel, I can’t possibly talk to you, I haven’t done my proper work yet today!’ I’m sorry about that. We are all doing the best we can. I am certainly no better than you.

Nick Cave ends his letter on a perfect note:

So once again, to the people at MTV, I appreciate the zeal and energy….I truly do and say thank you and again I say thank you but no…no thank you.

Why does inequality produce high crime and low trust? And why doesn’t making punishments harsher solve the problem?

Societies with higher levels of inequality have more crime, and lower levels of social trust. That’s quite a hard thing to explain: how could the distribution of wealth (which is a population-level thing) change decisions and attitudes made in the heads of individuals, like whether to offend? After all, most individuals don’t know what the population-level distribution of wealth is, only how much they have got, perhaps compared to a few others around them. Much of the extra crime in high-inequality societies is committed by people at the bottom end of the socioeconomic distribution, so clearly an individual’s level of resources might have something to do with the decision; but that is not so for trust: the low trust of high-inequality societies extends to everyone, rich and poor alike.

In a new paper, Benoit de Courson and I attempt to provide a simple general model of why inequality might produce high crime and low trust. (By the way, it’s Benoit’s first paper, so congratulations to him.) It’s a model in the rational-choice tradition: it assumes that when people offend (we are thinking about property crime here), they are not generally doing so out of psychopathology or error. They do so because they are trying their best to achieve their goals given their circumstances.

So what are their goals? In the model, we assume people want to maximise their level of resources in the very long term. But – and it’s a critical but – we assume that there is a ‘desperation threshold’: a level of resources below which it is disastrous to drop. The idea comes from classic models of foraging: there’s a level of food intake you have to achieve or, if you are a small bird, you starve to death. We are not thinking of the threshold as literal starvation. Rather, it’s the level of resources below which it becomes desperately hard to participate in your social group any more, below which you become destitute. If you get close to this zone, you need to get out, and immediately.

In the world of the model, there are three things you can do: work alone, which is unprofitable but safe; cooperate with others, which is profitable just as long as they do likewise; or steal, which is great if you get away with it but really bad if you get caught (we assume there are big punishments for people caught stealing). Now, which of these is the best thing to do?

The answer turns out to be: it depends. If your current resources are above the threshold, then, under the assumptions we make, it is not worth stealing. Instead, you should cooperate as long as you judge that the others around you are likely to do so too, and just work alone otherwise. If your resources are around or below the threshold, however, then, under our assumptions, you should pretty much always steal. Even if it makes you worse off on average.

This is a pretty remarkable result: why would it be so? The important thing to appreciate is that with our threshold, we have introduced a sharp non-linearity in the fitness function, or utility function, that is assumed to be driving decisions. Once you fall down below that threshold, your prospects are really dramatically worse, and you need to get back up immediately. This makes stealing a worthwhile risk. If it happens to succeed, it’s the only action with a big enough quick win to leap you back over the threshold in one bound. If, as is likely, it fails, you are scarcely worse off in the long run: your prospects were dire anyway, and they can’t get much direr. So the riskiness of stealing – it sometimes gives you a big positive outcome and sometimes a big negative one – becomes a thing you should seek rather than avoid.
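The logic of the non-linearity can be sketched in a few lines of code. This is a toy illustration of my own, not the actual model from the paper: here utility drops to a flat floor below the threshold, and all the numbers (the wage, the loot, the penalty, the probability of being caught) are made-up assumptions chosen only to show the shape of the argument.

```python
# Toy sketch of the threshold non-linearity (my illustration, not the
# paper's actual model; all parameter values are made-up assumptions).

THRESHOLD = 0.0   # the desperation threshold
FLOOR = -5.0      # utility of destitution: flat, however far below you fall

def utility(resources):
    """Linear in resources above the threshold, a flat floor below it."""
    return resources if resources >= THRESHOLD else FLOOR

def expected_gain(action, resources):
    """Expected change in utility from one round of the chosen action."""
    if action == "work":   # safe: a small, certain gain
        return utility(resources + 0.5) - utility(resources)
    if action == "steal":  # risky: big gain if undetected, big loss if caught
        p_caught = 0.6
        expected = ((1 - p_caught) * utility(resources + 2.0)
                    + p_caught * utility(resources - 3.0))
        return expected - utility(resources)
    raise ValueError(action)

def best_action(resources):
    return max(("work", "steal"), key=lambda a: expected_gain(a, resources))

# Note: stealing loses resources on average (0.4 * 2.0 + 0.6 * -3.0 = -1.0),
# yet below the threshold it maximises expected utility, because only a
# successful theft can vault you back above the threshold in one bound.
```

With these assumed numbers, `best_action(1.0)` comes out as `"work"` and `best_action(-1.0)` as `"steal"`: the desperate agent prefers the risky option even though it is resource-losing on average.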

Fig. 1. The right action to choose, in Benoit’s model, according to your current resources and the trustworthiness of others in your population. The threshold of desperation is shown as zero on the x-axis.

So, in summary, the optimal action to choose is as shown in figure 1. If you are doing ok, then your job is to figure out how trustworthy your fellow citizens are (how likely to cooperate): you should cooperate if they are trustworthy enough, and hunker down alone otherwise. If you are desperate, you basically have no better option than to steal.

Now then, we seem to be a long way from inequality, which is where we started. What is it about unequal populations that generates crime? Inequality is basically the spread of the distribution of resources: where inequality is high, the spread is wide. A wide spread pretty much guarantees that at least some individuals will find themselves down below the threshold at least some of the time; and figure 1 shows what we expect them to do. If the spread is narrower, then fewer people hit the threshold, and fewer people have incentives to start offending. Thus, the inequality of the resource distribution ends up determining the occurrence of stealing, even though no agent in this model ‘knows’ what that distribution looks like: each individual only knows what resources they have, and how other individuals behaved in recent interactions.
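The arithmetic of the spread is easy to sketch. This is a back-of-envelope illustration of my own, not the paper's simulation: the normal distribution and the particular means and spreads are assumptions, used only to show that, with the mean held fixed, widening the spread pushes more of the population below the threshold.

```python
# Back-of-envelope sketch (mine, not the paper's simulation): with mean
# resources held fixed, a wider spread means more individuals below the
# desperation threshold. Distribution and parameters are assumptions.
import random

random.seed(42)  # for reproducibility

THRESHOLD = 0.0

def fraction_desperate(mean, spread, n=100_000):
    """Fraction of a Normal(mean, spread) population below the threshold."""
    return sum(random.gauss(mean, spread) < THRESHOLD for _ in range(n)) / n

low_inequality = fraction_desperate(mean=2.0, spread=1.0)   # ~2% desperate
high_inequality = fraction_desperate(mean=2.0, spread=3.0)  # ~25% desperate
```

Tripling the spread, without changing the mean at all, takes the desperate fraction from roughly two percent to roughly a quarter of the population; and, per figure 1, those are the individuals with an incentive to steal.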

What about trust? We assume that individuals build up trust through interacting cooperatively with others and finding that it goes ok. In low-inequality populations, where no-one is desperate and hence no-one starts offending, individuals rapidly learn that others can be trusted, everyone starts to cooperate, and all are better off over time. In high-inequality populations, the desperate are forced to steal, and the well-off are forced not to cooperate for fear of being victimized. One of the main results of Benoit’s model is that in high-inequality populations, only a few individuals actually ever steal, but still this behaviour dominates the population-level outcome, since all the would-be cooperators soon switch to distrusting solitude. It is a world of gated communities.

Another interesting feature is that making punishments more severe has almost no effect at all on the results shown in figure 1. If you are below the threshold, you should steal even if the punishment is arbitrarily large. Why? Because of the non-linearity of the utility function: if your act succeeds, your prospects are suddenly massively better, and if it fails, you are scarcely any worse off than you already were. This result could be important. Criminologists and economists have puzzled over why making sentences tougher does not seem to deter offending in the way it intuitively feels like it ought. This is potentially an answer. When you have basically nothing left to lose, it really does not matter how much people take off you.
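The punishment-insensitivity can be checked in the same toy terms (again my own illustration with assumed parameter values, not the paper's model): once any failed theft lands you on the same utility floor, the size of the punishment drops out of the desperate agent's calculation, while it still bites for the comfortable agent.

```python
# Toy check (assumed parameters, not the paper's model): below the
# threshold, every failed theft lands on the same utility floor, so the
# size of the punishment makes no difference; above it, punishment bites.

THRESHOLD = 0.0
FLOOR = -5.0   # flat utility of destitution

def utility(resources):
    return resources if resources >= THRESHOLD else FLOOR

def steal_value(resources, punishment, p_caught=0.6, loot=2.0):
    """Expected utility after one attempted theft."""
    return ((1 - p_caught) * utility(resources + loot)
            + p_caught * utility(resources - punishment))

desperate, comfortable = -1.0, 3.0

# For the desperate agent, the expected value is identical whatever the fine...
desperate_values = {p: steal_value(desperate, p) for p in (1.0, 10.0, 100.0)}
# ...whereas for the comfortable agent, a harsher fine makes stealing worse
# (at least until the fine is so large that they too would hit the floor).
comfortable_values = {p: steal_value(comfortable, p) for p in (1.0, 10.0, 100.0)}
```

In this sketch `desperate_values` contains one and the same number for fines of 1, 10 and 100, whereas `comfortable_values` drops as the fine rises: severity only deters those with something left to lose.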

In fact, our analyses suggest some conditions under which making sentences tougher would actually be counterproductive. Mild punishments disincentivize at the margin. Severe sentences can make individuals so much worse off that there may be no feasible legitimate way for them to ever regain the happy zone above the threshold. By imposing a really big cost on them through a huge punishment, you may be committing them to a life where the only recourse is ever more desperate attempts to leapfrog themselves back to safety via illegitimate means.

So if making sentences tougher does not solve the problems of crime in high-inequality populations, according to the model, is there anything that does? Well, yes: and readers of this blog may not be surprised to hear me mention it. Redistribution. If people who are facing desperation can expect their fortunes to improve by other means, such as redistributive action, then they don’t need to employ such desperate means as stealing. They will get back up there anyway. Our model shows that a shuffling of resources so that the worst off are lifted up and the top end is brought down can dramatically reduce stealing, and hence increase trust. (In an early version of this work, we simulated the effects of a scenario we named ‘Corbyn victory’: remember then?).

The idea of a desperation threshold does not seem too implausible, but it is a key assumption of our model, on which all the results depend. Our next step is to try to build experimental worlds in which such a threshold is present – it is not a feature of typical behavioural-economic games – and see if people really do respond as predicted by the model.

De Courson, B., Nettle, D. Why do inequality and deprivation produce high crime and low trust? Scientific Reports 11, 1937 (2021).

Why is Universal Basic Income suddenly such a great idea?

The idea of an unconditional basic income, paid to all (UBI), has a long history. Very long in fact. Yet, although the policy has been deemed philosophically and (sometimes) economically attractive, it has generally languished in the bailiwick of enthusiasts, mavericks, philosophers and policy nerds (these are, by the way, overlapping categories). But now, with the global pandemic, UBI is very much back in the spotlight. Previous sceptics are coming out with more enthusiastic assessments (for example, here and here). Spain apparently aims to roll out a UBI scheme ‘as soon as possible’ in response to the crisis, with the aim that this becomes a ‘permanent instrument’ of how the Spanish state works. And even the US Congress’s relief cheques for citizens, though short-term, have a UBI-like quality to them. So why, all over the place, does UBI suddenly seem like such a great idea?

Answering this question requires answering another, prior one: why didn’t people think it was such a great idea before? To understand why people’s objections have gone away, you need to understand what they were before, as well as why they seem less compelling in this time of upheaval. UBI is a policy that appears to suffer from ‘intuition problems’. You can model it all you like and show that it would be feasible, efficient and cost effective; but many people look at it and think ‘Mah! Giving people money without their having to do anything! Something wrong with that!’. It’s like a musical chord that is not quite in tune; and that’s a feeling that is hard to defeat with econometrics. But intuitions such as these might be very context-dependent: and the context of society certainly has changed in the last few months.

To try to understand if the acceptability of UBI to the public has changed for these pandemic-affected times, and, if so, why, Matthew Johnson, Elliott Johnson, Rebecca Saxe and I collected data on April 7th from 400 UK and 400 US residents. This was not a representative sample from either country, but we had a good balance of genders and a spread of ages.

We first described a UBI policy to respondents, and asked them to rate how good an idea they found it, both for normal times, and for the times of this pandemic and its aftermath. As the figure below shows, they almost universally thought it was a better idea for times of the pandemic and its aftermath than before – on average, 16 points better on a 1-100 scale.

Ratings of how good an idea a UBI scheme is, for normal and pandemic times, UK and USA samples. Shown are medians, inter-quartile ranges, and the distribution of the data.

Actually, these participants thought UBI was a better idea for normal times than I would have expected, which is hard to interpret without some historical data on this participant pool. Support for UBI has been found to vary a lot, in the past, depending on how you frame the policy and what alternatives you pit it against. In our study, it was not up against any alternative scheme; just rated as a good or bad idea.

Now, why was UBI thought a better idea for pandemic times than normal times? We listed nine of the most obvious advantages and disadvantages of the policy, and asked respondents to say how important they felt each of these would be for their overall assessment of the policy – again, as a policy for normal times, and for pandemic times. The advantages were: knowing there is a guaranteed income reduces stress and anxiety; the policy is simple and efficient; the universality gives a value to every individual in society; and the system cannot be cheated. The disadvantages were: it’s expensive; you would be paying money to the rich, who do not need it; people might use it irresponsibly, like on gambling or drugs; people would be less prone to work for money; and people who did not deserve it would get it. All of these pros and cons were rated as having some importance for the desirability of the policy in normal times, though naturally with different weightings for different people.

Rated importance of nine advantages and disadvantages for the overall assessment of the desirability of UBI, for normal times, and for the times of the pandemic and its aftermath.

So what was different when viewing the policy for pandemic times? Basically, three key advantages (reduces stress and anxiety; efficiency and simplicity; and giving a value to every individual) became much more important in pandemic times; whilst three of the key drawbacks (people might use it irresponsibly; the labour market consequences; and receipt by the undeserving), became rather less important. I guess these findings make sense; given the rapidity with which the pandemic has washed over the population, you really need something simple and efficient; given how anxiety-provoking it is, it is imperative to reassure people; and given that millions of people are economically inactive anyway, not through their own choice, potential labour market consequences are moot. Rather to our surprise, the expense of the policy was not rated as the most important consideration for normal times; and nor had this become a less important consideration now, when figures of £330 billion or $1 trillion seem to be flying around all over the place.

The strongest predictors of supporting UBI in normal times were rating highly: the importance of stress and anxiety reduction; the efficiency of the policy; and the valuing of every individual. So it is no mystery that in pandemic times, when those particular three things are seen as much more important, the overall level of support for the policy should go up. In other words, what the pandemic seems to do is make all people weight highly the considerations that the most pro-UBI people already rated highly for normal times anyway. Perhaps the most intriguing of the pandemic-related shifts in importance of the different factors was the increase in importance of giving every individual in society a value. It is not obvious to me why the pandemic should make us want every individual to have a value, any more than we should want this the rest of the time. Perhaps because the pandemic is some kind of common threat, that we can only solve by all working collaboratively? Perhaps because the pandemic reminds us of our massive interdependence? Because we are all in some important sense alike in the face of the disease?

Whatever the reason, our respondents felt it was more important, in these times, for every person in society to be accorded a value. And for me, that is one of the most philosophically appealing aspects of UBI. Not that it decreases income inequality, which, unless it is very large, it will probably not do to any appreciable extent; not just that it gives people certainty in rapidly fluctuating times, which it would do; but that its existence constitutes a particular type of social category, a shared citizenship. Getting your UBI would be one of those few things that we can all do – like having one vote or use of an NHS hospital – to both reflect and constitute our common membership of, and share in, a broader social entity. In other words, in addition to all its pragmatic and administrative appeal, UBI bestows a certain dignity on everyone, that may help promote health, foster collective efficacy, and mitigate the social effects of the myriad and obvious ways we are each valued differently by society. And these times, apparently, are making the value of this unconditional dignity more apparent.

One last point: people who consider themselves on the left of politics were more favourable to UBI than those on the right (particularly in the US sample; which is interesting given the places of Milton Friedman and Richard Nixon in UBI’s pedigree). But the boost in support for the policy that came from the pandemic applied absolutely across the political spectrum. Even those on the right wing of our sample thought it was a pretty good idea for pandemic times (with, of course, the caveat that this was not a representative sample, and we did not offer them any alternative to UBI that they might have preferred). So, just possibly, an advantage of UBI schemes in this uncertain time is that pretty much everyone, whatever their ideology, can see what the appeal of the scheme is. That may yet prove important.

Support for UBI for normal times (solid lines) and pandemic times (dotted lines), for the UK and USA, against self-placement on a scale of 1=left-wing to 100=right-wing.

June 2nd 2020 update: We have now written this study up. You can download the preprint here.

This is no time for utilitarianism!

An interesting feature of the current crisis is the number of times we hear our leaders proclaiming that they are not weighing costs against benefits: ‘We will do whatever it takes!’. ‘We will give the hospitals whatever they need!’. And even, memorably, from the UK Chancellor, ‘We will set no limit on what we spend on this!’. No limit. I mean when did the UK Treasury ever say that? Maybe only during the war, which is a clue.

Such statements seem timely and reassuring just at the moment. When people are timorous enough to question whether some of this largesse might actually be sensible – for example, whether the long-term costs of some decisions might be greater than the benefits – it seems in incredibly poor taste. But people are dying! Those commentators are roundly excoriated on social media for letting the side down.

All of this is something of a puzzle. The whole essence of evidence-based policy, of policy modelling, is that you always calculate benefits and costs; of course this is difficult, and is never a politically neutral exercise, given that there are so many weightings and ways one might do so. Nonetheless, the weighing of costs and benefits is something of a staple of policy analysis, and also a hallmark of rationality. So why, in this time of crisis, would our politicians of all stripes be so keen to signal that they are not going to do the thing which policymakers usually do, which is calculate the costs and benefits and make the trade-offs?

Calculating costs and benefits comes from the moral tradition of utilitarianism: weighing which course provides the greatest good for the greatest number. What our politicians are saying at the moment comes from the deontological moral tradition, namely the tradition of saying that some things are just intrinsically right or wrong. ‘Everyone should have a ventilator!’; ‘Everyone should have a test!’; ‘No-one should be left behind!’. Deontological judgements are more intuitive than utilitarian ones. So the question is: in this crisis in particular, should our leaders be so keen to show themselves deontologists?

‘We will fight until the marginal cost exceeds the marginal benefit!’, said no war leader ever.

Some clue to this comes from recent research showing that people rate those who make utilitarian decisions as less trustworthy and less attractive to collaborate with than those who make deontological decisions. The decisions come from the infamous trolley problem: would you kill one person to save the lives of five? Across multiple studies, participants preferred and trusted decision-makers who would not; decision-makers who just thought you should never kill anyone, period.

The authors of this research speculate on the reasons we might spontaneously prefer deontologists. If you are to be my partner-in-arms, I would like to know that you will never trade me off as collateral damage, never treat me as a mere means to some larger end. I want to know that you will value me intrinsically. Thus, if you want to gain my trust, you need to show not just that you weight my outcomes highly, but that you will not even calculate the costs and benefits of coming to my aid. You will just do whatever it takes. Hence, we prefer deontologists and trust them more.

I am not sure this account quite works, though it feels like there is something to it. If I were one of the parties in the unfortunate trolley dilemma, then under a veil of Rawlsian ignorance I ought to want a utilitarian in charge, since I have a five-fold greater chance of being in the set who would benefit from a utilitarian decision than being the unfortunate one. If my collaboration partners are rationally utilitarian, I am by definition a bit more likely to benefit from this than lose, in the long run. But maybe there is a slightly different account that does work. For example, mentally simulating the behaviour of deontologists is easier; you know what they will and won’t do. Utilitarians: well, you have no idea what set of costs and benefits they might currently be appraising, so you are slightly in the dark about what they will do next. So perhaps we prefer deontologists as collaboration partners because at least we can work out what they are likely to do when the chips are down.

In a time of crisis, like this one, what our leaders really need is to be trusted, to bring the populace along with them. That, it seems to me, is why we are suddenly hearing all this deontological rhetoric. They are saying: trust us, come with us, we are not even thinking about the costs, not even in private. And there is a related phenomenon. Apparently, deontological thinking is contagious. When we see others following moral rules irrespective of cost, it makes us more prone to do so too. I suspect this is because of the public-good nature of morality: there is no benefit to my abiding by a moral rule unless everyone else is going to do so. We are quite pessimistic about the moral behaviour of others, especially in times of crisis, and so we need the visible exemplar, the reassurance, that others are being deontological, to reassure ourselves into doing so. In the current crisis, society needs people to incur costs for public-good benefits they cannot directly perceive, which is why, again and again, our leaders rightly proclaim not just the rules, but the unconditional moral force of those rules. Don’t calculate the infection risk for your particular journey; just don’t make it! (This is also why leaders who proclaim the rules but do not follow them themselves, as in the case of Scotland’s chief medical officer, are particular subjects of negative attention.)

I am not saying this outbreak of deontology is a bad thing; even in the long run it will be hard to write the definitive book on that. Indeed, perhaps it would be nice to have a bit more of this deontological spirit the rest of the time. The UK government recently decided that every homeless person should have the offer of a place to stay by the end of the week. Whatever the cost. To which I respond: why could that not have been true in any week in the last thirty years? Why only now? In normal life, governments are utilitarian about such matters, not weighing homelessness reduction as highly as other policy goals, and not prepared to do the relatively little it actually takes because they believe the costs are too high. Evidently, the populace’s intuitive preference for deontologists extends only to certain moral decisions, and certain times (such as times when we are all facing the same external threat). At other times, governments can get away with meanness and inaction: the populace does not notice, does not care, or can be convinced that solving the problem is too hard. Many people in progressive policy circles are no doubt asking: if we can achieve so much so fast in this time of crisis, how can we hang on to some of that spirit for solving social problems when the crisis is over?

Are people selfish or cooperative in the time of COVID-19?

On March 12th 2020, in a press conference, the UK’s chief scientific advisor Patrick Vallance stated that, in times of social challenge like the current pandemic, the people’s response is an outbreak of altruism. On the other hand, we have seen plenty of examples in the current crisis of bad behaviour: people fighting over the last bag of pasta, price gouging, flouting restrictions, and so on. So there is probably the raw material to tell both a positive and a negative story of human nature under severe threat, and both might even be true.

Rebecca Saxe and I are trying to study intuitive theories of human nature. That is, not what people actually do in times of threat or pandemic, but what people believe other people will do in such times. This is important, because so much of our own behaviour is predicated on predictions about what others will do: if I think everyone else is going to panic buy, I should probably do so too; if I think they won’t, there is no need for me to do so. We have developed a method where we ask people about hypothetical societies to which various events happen, and get our participants to predict how the individuals there will behave ‘given what you know about human nature’.

Our most recent findings (unpublished study, protocol here) suggest that our 400 UK participants’ intuitive theories of the response of human nature to adversity are more pessimistic than optimistic. For example, we asked what proportion of the total harvest (a) should ideally, and (b) would in practice, get shared out between villagers in two agrarian villages, one living normally, and one facing an epidemic. Participants said the amount that should ideally be shared out would be nearly the same in the two cases; but the amount that actually would get shared out would be much lower in the epidemic (figure 1). Why? In the epidemic, they predicted, villagers would become more selfish and less moral; less cooperative and more nepotistic; less rule-bound and more likely to generate conflict (figure 2). One consequence of all of this predicted bad behaviour was that our participants endorsed the need for strong leadership, policing, and severe punishment in the epidemic village more than in the baseline village, and felt there was less need to take the voices of the villagers into account. This is the package often referred to as right-wing authoritarianism, so our data suggest that the desire for this can be triggered by a perceived social threat and the expectation of lawlessness in the response to it.

Figure 1. How much ideally should, and actually will get shared out in a normal village, and a village facing an epidemic. The epidemic is seen as massively reducing actual sharing, not the amount of sharing that is morally right. n = 400, for full protocol see here.
Figure 2. How much will various morally good and bad behaviours be seen in a normal village, and one facing an epidemic, as people are told to work together. n = 400, for full protocol see here.

We also asked the same participants about their predictions of the response of their fellow citizens to the current real pandemic (the data were collected last Friday, March 20th). There was really strong endorsement of the proposition that other people will behave selfishly; and rather low or variable endorsement of the proposition that others will behave cooperatively (figure 3). Overall, our participants gave slightly more endorsement to the idea that the pandemic will lead to conflict and distrust than the idea that it will lead to solidarity.

Figure 3. During the current pandemic, how much do you agree that others will behave selfishly (red); and that they will behave cooperatively (blue). n = 400, for full protocol see here.

So how do we square this with Vallance’s claim that there will be an outbreak of altruism, and indeed the evidence that, in under 24 hours, more than a quarter of a million people have registered as NHS volunteer responders? Well, Saxe and I are studying intuitive theories of human nature (my expectation of how you all will behave), not human nature itself (how you all actually behave). And there may be systematic gaps between our intuitive theories of behaviour and that behaviour itself. It might even make sense that there should be such gaps. For example, what may matter for people is often avoiding the worst-case scenarios (giving all your labour when no-one else gives any; forbearing to take from the common pot when everyone else is emptying it fast), rather than predicting the most statistically likely scenarios. Thus, our intuitive theories may sometimes function to detect actually rare outcomes that are very bad not to see coming when they do come (what is often known as error management). And we don’t know, when our participants say that they think that others will be selfish during the pandemic, whether they mean that ALL others will be selfish, or that there is a small minority who might be selfish, but this minority is important enough to attend to.

There may be very good reasons for prominent figures like Vallance to point out his expectation of an outbreak of altruism. Humans can not only behave prosocially, but also signal their intention to do so, and thus break the spiral of ‘I am only doing this because I think everyone else is going to do so’. So, if intuitive theories of human nature have a hair-trigger for detecting the selfishness of others, then it becomes important not just to actually be cooperative with one another, but to signal clearly and credibly that we are going to do so. This is where what psychologists call ‘descriptive norms’ (beliefs about what others are doing) become so important. I will if you will. I will if you are.

One more thing of interest in our study: I have a longstanding interest in Universal Basic Income as a policy measure. We asked our 400 participants whether government assistance, in this pandemic time and in normal times, should come unconditionally to every citizen, or be based on assessment of needs. We found much stronger support for unconditionality in these times (43%) than in normal times (19%). This may be the moment when Universal Basic Income’s combination of extreme simplicity, ease of administration, and freedom from dependency on complex and difficult-to-track information really speaks for itself. So much that seemed politically impossible, completely off the table, as recently as January, has now actually happened, or is being quite seriously discussed. And, perhaps, once you introduce certain measures, once the pessimistic theories of human nature are defeated in their predictions of how others will respond, then people will get a taste for them.

The view from the top of the hierarchy of evidence

About five years ago I began doing meta-analyses. (If, as they say, you lose a tooth for every meta-analysis you conduct, I am now gumming my way through my food.) I was inspired by their growing role as the premier source of evidence in the health and behavioural sciences. Yes, I knew, individual studies are low-powered, depend on very specific methodological assumptions, and are often badly done; but I was impressed by the argument that if we systematically combine each of these imperfect little beams of light into one big one, we are sure to see clearly and discover The Truth. Meta-analysis was how I proposed to counter my mid-life epistemological crisis.

I was therefore depressed to read a paper by John Ioannidis, he of ‘Why most published research findings are false’ fame, on how the world is being rapidly filled up with redundant, mass-produced, and often flawed meta-analyses. It is, he argues, the same old story: too much output, produced too fast, with too little thought and too many author degrees of freedom, and often publication biases and flagrant conflicts of interest to boot. Well, it’s the same old story, but now at the meta-level.

Just because Ioannidis’ article said this didn’t mean I believed it, of course. Perhaps it’s true in some dubious research areas where there are pharmaceutical interests, I thought, but the bits of science I care about are protected from the mass production of misleading meta-analyses because, among other reasons, the stakes are so low.

However, I have been somewhat dismayed in preparing a recent grant application on post-traumatic stress disorder (PTSD) and telomere length. The length of telomeres (DNA-protein caps on the ends of chromosomes) is a marker of ageing, and there is an argument out there (though the evidence is weaker than you might imagine, at least for adulthood) that stress accelerates telomere shortening. And having PTSD is certainly a form of stress. So: do people suffering from PTSD have shorter telomeres?

It seems that they do. There are three relevant meta-analyses all coming to the same conclusion. One of those was done by Gillian Pepper in my research group. It was very general, and only a small subset of the studies it covered were about PTSD in particular, but it did find that PTSD was associated with shorter telomere length. As I wanted some confidence about the size of the difference, I looked closely at the other two, more specialist, meta-analyses.

A meta-analysis specifically on PTSD (by Li et al.) included five primary studies, and concluded that PTSD was associated with shorter telomere length, with a difference of -0.19 (95% confidence interval -0.27 to -0.10). All good; but then I thought: 0.19 what? It would be normal in meta-analyses to report standardised mean differences; that is, differences between groups expressed in terms of the variability in the total sample of that particular study. But when I looked closely, this particular meta-analysis had expressed its differences absolutely, in units of the T/S ratio, the measure of relative telomere length generally used in epidemiology. The problem with this, however, is that the very first thing you ever learn about the T/S ratio is that it is not comparable across studies. A person with a T/S ratio of 1 from one particular lab might have a T/S ratio of 1.5 or 0.75 from another lab. The T/S ratio tells you about the relative telomere lengths of several samples run in the same assay on the same PCR machine with the same control gene at the same time, but it does not mean anything that transfers across studies like ‘1 kilo’, ‘1 metre’ or ‘400 base pairs’ do.
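This calibration problem is exactly what standardised mean differences absorb. Here is a minimal sketch (in Python, with entirely made-up numbers; `raw_and_smd` is just an illustrative helper, not anything from the papers) of how rescaling every sample by a machine-specific factor changes the raw T/S difference but leaves the SMD untouched:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical samples: controls vs PTSD, true gap of ~0.3 within-group SDs.
controls = rng.normal(1.000, 0.25, 200)
ptsd = rng.normal(0.925, 0.25, 200)

def raw_and_smd(c, p):
    raw = p.mean() - c.mean()                              # difference in T/S units
    pooled = np.sqrt((c.var(ddof=1) + p.var(ddof=1)) / 2)  # pooled SD
    return raw, raw / pooled                               # SMD (Cohen's d)

# The same biological samples read out on two machines whose calibration
# differs by a factor of 1.5 (T/S is only relative within one assay run).
for scale in (1.0, 1.5):
    raw, smd = raw_and_smd(controls * scale, ptsd * scale)
    print(f"calibration x{scale}: raw T/S difference {raw:+.3f}, SMD {smd:+.2f}")
```

The raw difference scales with the machine’s calibration, so pooling raw T/S differences across labs mixes incommensurable units; the SMD is invariant under any rescaling, which is why meta-analyses normally report it.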

If you don’t use standardized mean differences, integrating multiple T/S ratio studies to obtain an overall estimate of how much shorter the telomeres of PTSD sufferers are is a bit like taking one study that finds men are 6 inches taller than women, and another study that finds men are 15 centimetres taller than women, and concluding that the truth is that men are taller than women by 10.5. And the problems did not stop there: for two of the five primary studies, standard errors from the original papers had been coded as standard deviations in the meta-analysis, resulting in the effect sizes being overstated by nearly an order of magnitude. The sad thing about this state of affairs is that anyone who habitually and directly worked with T/S data would be able to tell you instantly that you can’t compare absolute T/S across studies, and that a standard deviation of 0.01 for T/S in a population study simply couldn’t be a thing. You get a larger standard deviation than that when you run the very same sample multiple times, let alone samples from different people. Division of labour in science is a beautiful thing, of course, and efficient, but someone who actually does primary research using this technique, looking over the data, would very quickly have picked up these nonsensical patterns.
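The size of that second error is easy to quantify: the standard error of a mean is the standard deviation divided by √n, so coding an SE as an SD shrinks the denominator of the effect size by √n, and inflates the resulting d by the same factor. A toy calculation (all numbers invented purely for illustration):

```python
import math

# Hypothetical study: 100 participants per group.
n_per_group = 100
within_sd = 0.5                           # within-group SD of telomere length
se = within_sd / math.sqrt(n_per_group)   # standard error of the mean = 0.05
mean_diff = -0.25                         # PTSD minus control group means

d_correct = mean_diff / within_sd   # -0.5: a moderate effect
d_miscoded = mean_diff / se         # -5.0: SE coded as SD gives a "five-sigma" monster

# The inflation factor is exactly sqrt(n): an order of magnitude at n = 100.
print(d_correct, d_miscoded, d_miscoded / d_correct)
```

At typical sample sizes of around 100 per group, √n ≈ 10, which is why the overstated effects were out by roughly an order of magnitude.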

I hoped the second meta-analysis (by Darrow et al.) would save me, and in lots of ways it was indeed much better. For PTSD, it included the same five studies as the first, and sensibly used standardized mean differences rather than just differences. However, even here I found an anomaly. The authors reported that PTSD was associated with a much bigger difference in telomere length than other psychological disorders were. This naturally piqued my interest, so I looked at the forest plot for the PTSD studies. Here it is:

Excerpt from figure 2 of meta-analysis by Darrow et al.

You can see that most of the five studies find PTSD patients have shorter telomeres than controls by maybe half a standard deviation or less. Then there is one (Jergovic 2014) that apparently reports an almost five-sigma difference in telomere length between PTSD sufferers and controls. Five sigma! That’s the level of evidence that you get when you find the Higgs boson! It would mean that PTSD sufferers had telomeres something like 3500 base pairs shorter than controls. It is simply inconceivable given everything we know about telomeres; given everything, indeed, we know about whole-organism biology, epidemiology and life. There really are not any five-sigma effects.

Of course, I looked it up, and the five-sigma effect is not one. This meta-analysis too had mis-recorded standard errors as standard deviations for this study. Correcting this, the forest plot should look like this:

Forest plot of the PTSD data from the meta-analysis by Darrow et al., with the ‘standard deviations’ corrected to standard errors in the study by Jergovic 2014.

Still an association overall, but the study by Jergovic 2014 is absolutely in line with the other four studies in finding the difference to be small. Overall, PTSD is no more strongly associated with telomere length than any other psychiatric disorder is. (To be clear, there are consistent cross-sectional associations between telomere length and psychiatric disorders, though we have argued that the interpretation of these might not be what you think it is). What I find interesting is that no-one, author or peer-reviewer, looked at the forest plot and said, ‘Hmm…five sigma. That’s fairly unlikely. Maybe I need to look into it further’. It took me all of ten minutes to do this.

I don’t write this post to be smug. This was a major piece of work well done by great researchers. It probably took them many months of hard labour. I am completely sure that my own meta-analyses contain errors of this kind, probably at the same frequency, if not a higher one. I merely write to reflect the fact that, in science, the main battle is not against nature, but against our own epistemic limitations; and our main problem is not insufficient quantity of research, but insufficient quality control. We are hampered by many things: our confirmation biases, our acceptance of things we want to believe without really scrutinizing the evidence carefully enough (if the five-sigma had been in the other direction, you can be sure the researchers would have weeded it out), our desire to get the damned paper finished, the end of our funding, and the professional silos that we live in. And, as Ioannidis argued, vagaries in meta-analyses constitute a particular epistemic hazard, given the prestige and authority accorded to meta-analytic conclusions, sitting as they are supposed to do atop the hierarchy of evidence.

These two meta-analyses cover a relatively simple area and the same five primary studies, and though they come reassuringly to the same qualitative conclusion, I still have no clear sense of how much shorter the telomeres of people with PTSD are than those of other people. The effect sizes found in the five primary studies as reported by Darrow et al. and by Li et al. are no better correlated than chance. So the two meta-analyses of the same five studies don’t even agree on which study found the largest effect:

Two published meta-analyses of the same five studies show no better than chance agreement in their views of what the relative effect sizes were. Even allowing for the fact that they measure the effects on different scales, you might at least hope the rank order would be the same.
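The rank-order check itself is only a few lines of Spearman’s rho. The effect sizes below are invented placeholders, not the published estimates; the sketch just shows the mechanics of comparing how two meta-analyses rank the same five studies:

```python
# Per-study effect sizes as reported by two meta-analyses of the same five
# studies (made-up numbers for illustration, not the published values).
meta_a = {"s1": -0.50, "s2": -0.40, "s3": -0.30, "s4": -0.20, "s5": -0.10}
meta_b = {"s3": -0.45, "s1": -0.35, "s4": -0.28, "s5": -0.22, "s2": -0.12}

def ranks(effects):
    ordered = sorted(effects, key=effects.get)  # most negative (largest) effect first
    return {study: i for i, study in enumerate(ordered)}

ra, rb = ranks(meta_a), ranks(meta_b)

# Spearman's rho from rank differences: rho = 1 - 6*sum(d^2) / (n*(n^2 - 1))
n = len(meta_a)
rho = 1 - 6 * sum((ra[s] - rb[s]) ** 2 for s in meta_a) / (n * (n**2 - 1))
print(f"Spearman rho = {rho:.2f}")  # 0.20 for these numbers: barely any agreement
```

Using ranks rather than raw values sidesteps the fact that the two papers report effects on different scales; even so, a rho near zero means the two syntheses disagree about which studies drive the result.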

I hoped that meta-analysis would lift us above the epistemic haze, and perhaps it still will. But let’s not be too sanguine: as well as averaging out human error and researcher degrees of freedom, it is going to introduce a whole extra layer. What next? Meta-meta-analysis, of course. And after that…?