So another series of the BBC’s fun reality show, Traitors, has now finished. Since I’m trying to rewrite a module about learning within organisations, I thought it would be a good moment to pause and reflect on the difficulties of fostering group co-operation. Traitors is a simple set-up: put a group of strangers in a closed environment and make them compete for a pot of money. Now the interesting bit - invite three members of the ‘cast’ to betray the rest by becoming traitors! This affords them the opportunity to eliminate fellow group members and take the money for themselves.
This proposition is intriguing because it offers fascinating insights into game theory and the unwritten rules of co-operation. A quick recap for those unfamiliar with the basics. Game theory is the mathematical study of strategic decision making between self-interested actors. Humans use heuristics to make simple day-to-day decisions. Examples of these heuristics are:
“do what the herd does”
“stick to what you know”
But interpersonal decision making does not lend itself to these kinds of static heuristics because it has a dynamic element: what you do will affect what another person does … and what they do in response will affect what you do … and so on. So you need strategies that are dynamically effective and can respond adequately to the behaviour of others.
The classic and favourite game theory thought experiment is the ‘prisoner’s dilemma’. It is a pretty simple set-up:
You and one other person commit a crime. The police catch up with you and put you in separate holding cells. Now you are offered a deal:
Confess to the crime and rat out your partner and you get 3 years in prison. If you both confess, you both get 5 years
You’re not sure how much the police know so your initial response is to say nothing and hope to be released without charge. But the police throw a spanner into the works:
If you don’t confess - and your partner does - you get 7 years in prison
It may not be immediately obvious but this presents 4 discrete strategic outcomes:
You betray your partner and she says nothing: you get 3 years in prison, she gets 7
You both say nothing and you are both released without charge
Your partner betrays you and you say nothing and you get 7 years in prison while she gets 3
You betray each other and both get 5 years in prison
Which one do you choose?
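The four outcomes can be laid out as a small pay-off table. Here is a toy Python sketch using the sentences above (the move names are my own); with these numbers, betrayal is the loss-limiting choice, which is why it can look like the safe bet:

```python
# Prisoner's dilemma pay-offs from the set-up above: years in prison for
# (you, partner), keyed by (your_move, partner_move).
PAYOFFS = {
    ("silent", "silent"): (0, 0),  # both released without charge
    ("betray", "silent"): (3, 7),
    ("silent", "betray"): (7, 3),
    ("betray", "betray"): (5, 5),
}

def worst_case(my_move):
    """Longest sentence you can receive if you commit to my_move."""
    return max(PAYOFFS[(my_move, partner)][0] for partner in ("silent", "betray"))

# Betrayal limits your losses: your worst case is 5 years rather than 7.
print(worst_case("silent"))  # 7
print(worst_case("betray"))  # 5
```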
In the prisoner’s dilemma, your best strategy may appear to be betrayal since it limits your losses. But human relations are more complex than this. For example, in multi-round games, betraying a partner may lead to a retaliatory response. So while you may reduce your immediate prison sentence by betraying your partner, if you are part of a wider criminal syndicate, the costs of defection (ratting out your partner) may outweigh the immediate advantages of reducing your prison sentence. To put it in plainer terms, is a shorter prison sentence worth a knife in your back?
Game theory lets us think about the complexity of decision choices made within and between groups and, as it turns out, offers a helpful explanation for evolutionary co-operation among non kin, known as reciprocal altruism. Game playing strategies can be tested algorithmically by programming a decision heuristic into a machine and asking it to play a set number of rounds. For example, your rule may be:
Repeat whatever you did in the previous round if it was successful. If it failed, do the opposite
This is known as the Pavlov strategy and is one of many different approaches trialled in the famous game theory tournaments run in 1980 by the political scientist Robert Axelrod. Anyone with a passing knowledge of Axelrod’s work knows that ‘tit for tat’ (i.e. do to your opponent whatever she just did to you) proved the most effective (yield optimising) game playing strategy over multiple rounds. But what if you can’t see who you are playing against? Now you become reliant upon honest signals to identify your opponent. This is the premise of Traitors. The show has some additional complex elements:
you can’t see your adversary (the ‘traitors’ are hidden to all but one another)
this is a multi-round game where friendships and trust begin to develop
alongside the daily ritual of eliminating a fellow team member is the opportunity to maximise rewards by co-operating on challenges
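Testing strategies like Pavlov and tit for tat algorithmically can be sketched in a few lines. This toy round-robin assumes the standard iterated-dilemma point scores (3 for mutual co-operation, 5/0 for defecting against a co-operator, 1 for mutual defection) rather than the prison sentences above; the function names are illustrative, not Axelrod's actual tournament code:

```python
# Standard iterated-dilemma pay-offs: points to the row player.
# "C" = co-operate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(mine, theirs):
    return theirs[-1] if theirs else "C"  # copy the opponent's last move

def always_cooperate(mine, theirs):
    return "C"  # the 'poor trusting fool'

def always_defect(mine, theirs):
    return "D"

def pavlov(mine, theirs):
    # Win-stay, lose-shift: repeat your last move if it paid off, else switch.
    if not mine:
        return "C"
    if PAYOFF[(mine[-1], theirs[-1])] >= 3:
        return mine[-1]
    return "D" if mine[-1] == "C" else "C"

def play(strat_a, strat_b, rounds=200):
    """Play two strategies against each other; return their total scores."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(hist_a, hist_b), strat_b(hist_b, hist_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b
```

Over repeated rounds, tit for tat matches an always-co-operator point for point, but punishes a persistent defector from the second round onwards, which is what makes it so robust across mixed fields of opponents.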
What I love about Traitors is how the complex dynamics of social relationships play out over time. Every evening, a ‘round table’ is held where the group is given the opportunity to express their views regarding the identity of the traitors and seek to vote them out. Initially, levels of trust between participants are low. Unconsciously, they look for seemingly obvious signals of trustworthiness: likeability, social copying, conformity, sameness - and instantly distrust the weirdos, oddballs (and bigheads). The group looks for consistent behaviour and are thrown by characters who appear edgy or atypical. The group correctly intuits that traitors must be experiencing a higher cognitive load as they try to balance the cognitive dissonance of lying while trying to form trusting relationships … but they almost always pick this up in the wrong people (in the early rounds, the oddballs are targeted for their nervous or awkward behaviour). The most successful traitors are past masters at managing this dissonance, and some are effortlessly skilled in projecting conformist, non aggressive groupthinky behaviours while hiding their true feelings. Typically they have careers that call for this skill - sales, recruitment, etc. - where their true feelings must linger behind a likeable, friendly, unthreatening carapace. As the game proceeds, the non-conformist behaviour of the oddballs is better contextualised and true bonds of friendship emerge between unlikely and dissimilar characters. This is a joy to behold and very much mediated by trust created within the team-games.
As Traitors reaches its final rounds, the pay-off decisions begin to change substantially. For the last standing traitors, the reward is now big enough to compel risky behaviour and the dopamine rush begins to bubble to the surface as the traitors silently revel in their success. Because the game has a finite number of rounds, the incentives for co-operation diminish and the last standing traitor(s) begin to eliminate former allies indiscriminately. Although there will be some accountability once they are exposed at the end of the game, the very fact that the game is over means all participants can return to their former lives with low costs for non co-operation. This is the end-game and the traitors have an enormous advantage over the ‘poor trusting fools’ (one of the strategies trounced in Axelrod’s game theory tournaments where the player always co-operates). No spoilers but one of the final participants in this year’s series was adeptly herded to the final because of their propensity to always co-operate.
So what lessons can we draw when thinking about how to incentivise learning in organisations? Learning is the essence of co-operative, reciprocal altruism. Sharing your knowledge with others gives up some of your own advantage with a pay-off that only occurs at the level of the group (company, organisation, economy). So what environments will incentivise learning and which will encourage cheating / hoarding and betrayal? Let’s consider some useful questions that may help us design this environment:
Are there rewards or costs for betraying / undermining (competing with) other group members?
Are there rewards or costs for betraying / undermining (competing with) other groups?
Are there rewards or costs for betraying / undermining (competing with) stakeholders (customers, suppliers, intermediaries)?
Is there an endpoint pay-off or do the rounds continue ad infinitum?
Are people held accountable for their behaviour?
Let’s consider some worked examples for these questions:
Are group members rewarded for competing against one another? The best example of this is stack ranking or rank & yank, where employees are rated on a curve. Stack ranking is known to undermine learning and innovation but companies still love it because they (erroneously) believe in simplistic nonsense about survival of the fittest promoting industriousness. It does, but usually to the detriment of the system and in the form of gaming
Are groups rewarded for competing against one another? Imagine an organisation that undertakes some form of annual budgeting. This produces a finite set of resources which are then fought over in a zero sum game. It’s going to be pretty hard to get those department heads to co-operate. Smaller co-operative alliances will also form to try and optimise outcomes for their group members. This is readily apparent in political machinations and horse-trading that occurs as Governments prepare their annual budgets
Is cheating on customers / stakeholders rewarded? The go-to example here is Enron where rewards were poorly matched against effort and sanctions for cheating were non-existent. Enron also endorsed stack ranking so there was a toxic mix of within group competition in addition to poor behaviour towards outsiders. Enron generated dizzying, bloated rewards for cheaters but the same behaviours can occur under conditions of diminishing returns. For example, Wells Fargo’s notorious ‘going for Gr-eight’ strategy of requiring their salespeople to sell 8 banking products to every customer. Imagine that in a small town with 5-10 other banks …
Is there an end date to the game? Participants will tailor their decision strategies to the timing of a game. Let’s say a group with strong feelings of hostility towards an out group is offered rewards (or cessation of sanctions) for co-operation. What happens when they see an end to this deal? The break-up of the Soviet Union offers a trenchant reminder of what happens when delicately balanced co-operation structures are upended. Rewatching the American drama series Narcos, I see the same fascinating dynamics play out as the conditions for co-operation (in that case, the Colombian drug gangs’ joint opposition to extradition) are removed. The end of colonial empires wrought similar bloodletting as delicately balanced ethnic equilibria collapsed
Is there accountability for actions? People are less likely to cheat if they feel their actions are being observed by others, so much so that a classic behavioural nudge is to put a set of eyes on a sign asking people to refrain from a behaviour. Where there is no accountability, cheating will become endemic. 50 years of financialisation in the West led to the 2008 crisis where the suckers (poor trusting fools paying their taxes) picked up the cheaters’ bill. The recent bailout of Silicon Valley Bank suggests that the policy response to the 2008 crisis has done little to curtail behaviour that reduces group fitness
Taking a purely game theoretical approach to fostering learning and innovation in organisations, you would want to design an environment that:
Recognises and rewards co-operative behaviour (within group and between groups) not just individualistic self-maximising behaviour
Rewards at the level of the system not just at the level of the individual
Sanctions non co-operative (cheating, hoarding, blocking, bullying) behaviours that undermine group resilience (without, by the way, accidentally incentivising groupthink)
Offers transparent frameworks of accountability to uphold these values
Avoids or at least reduces the potential for zero sum (you win, I lose) games, especially between departments
Recognises breakpoints and their potential to upset delicate co-operative ecosystems
Looks to optimise the overall system, not individual parts of the system
If this all sounds kumbaya there is an important caveat that reveals the reason why many organisations don’t incorporate these modes of working. Learning and innovation are not, in themselves, maximising strategies. When we maximise, we seek to gain the highest amount of reward without regard for system consequences (think: Peter Thiel’s beloved power law rule). In the typical game theoretical set-up that demonstrates this, participants are given the opportunity to cut down trees in a forest that regenerates slowly. Maximisers chop the forest down as quickly as possible to win the game. Optimisers chop slowly enough for the trees to regenerate. This prolongs the game and ultimately leads to higher yields for everyone (albeit with less variability of rewards among participants, i.e. a more normal distribution of such rewards).
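The forest game is easy to sketch. In this toy version (the regrowth rate, stock and round count are my own arbitrary assumptions, not from any particular tournament), the maximiser clear-cuts on round one while the optimiser harvests just under the regrowth rate:

```python
def harvest(forest, take_fraction, regrow_rate=0.1, rounds=50):
    """Total yield from repeatedly cutting take_fraction of the forest,
    which then regrows by regrow_rate of its remaining stock each round."""
    total = 0.0
    for _ in range(rounds):
        cut = forest * take_fraction
        total += cut
        forest = (forest - cut) * (1 + regrow_rate)
    return total

# The maximiser clear-cuts; the optimiser takes only what will regrow.
greedy = harvest(1000, take_fraction=1.0)   # everything, round one
steady = harvest(1000, take_fraction=0.09)  # just under the regrowth rate
print(greedy < steady)  # True
```

Clear-cutting caps the total yield at the initial stock; harvesting below the regrowth rate keeps the forest alive and, over enough rounds, comfortably out-earns the maximiser.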
Companies, investors or shareholders that see themselves in a short term, zero sum race will chop down the forest to win the market - whether this is reducing wages, cutting quality, evergreening patents or enshittification. Any surplus is pocketed instead of being re-invested for the long term. For the people running or investing in these companies, the short term is all that matters and by the time their vision of the long term arrives, they’ll all be holed up in a bunker or ruining another planet.
If we really want companies to behave like learning organisations and allow a little slack for the forest to regenerate, we need to establish a level of system resilience one level higher than that of the organisation and run back through our checklist:
Reduce pay-offs for companies that undermine market resilience (e.g. by erecting barriers to entry, monopolistic practices)
Reduce pay-offs for companies that undermine system resilience (e.g. by failing to pay the cost of externalities such as pollution or lowered industrial growth)
Create international frameworks of accountability that prevent companies from banking (offshoring) surpluses without contributing to associated costs
Avoid ‘beggar thy neighbour’, zero sum policies between countries
Recognise the potential for breakpoints (such as industry disruptions) to upset delicately balanced enterprise ecosystems
Design institutions that optimise the overall system (human flourishing on a healthy planet?), not individual parts of the system
Well that’s the plan anyway. Based on two series of the Traitors, I’d say the odds are even for co-operative strategies prevailing. Not unlike the odds we face after two rounds of global, industrial conflict. Still, the game continues and there is plenty of room for the ‘faithful’ to work together to defeat the small group of cheaters that are busy chopping down the forest.