Punishment opportunities can create strong incentives to cooperate in social dilemmas such as the Prisoner's Dilemma or the more general public goods game. However, punishment alone is unable to establish persistent cooperative behavior. Additional prerequisites must be met, such as structured populations with limited local interactions or, in the present case, that individuals carry some sort of reputation which indicates whether they will readily punish non-cooperative actions.

The assumption that the cooperative stage is immediately followed by a punishment stage results in four distinct strategic patterns: G1, social types that cooperate and punish defectors; G2, paradoxical or bully types that defect but nevertheless punish defectors (this strategy is called paradoxical because it fares poorly against its own kin); G3, asocial types that neither cooperate nor punish; and G4, mild types that cooperate but avoid costly punishment. Through reputation, individuals may occasionally learn about the punishing type of their opponent. If the opponent does not punish, then G1 and G4 can temporarily switch to defection because they know they can get away with it. Similarly, if the opponent does punish, then G2 and G3 can momentarily opt for cooperation to avoid punishment. Interestingly, it turns out that the former mechanism is much more important.
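To make the four types and the two reputation mechanisms concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a pairwise interaction is a two-player public goods game (multiplication factor r, cost c), that a punisher pays gamma for each defecting co-player and that a defector is fined beta by each punisher, and that mu and nu act as described above. The names `TYPES`, `coop_prob` and `pair_payoff` are hypothetical and not taken from the lab.

```python
# Hypothetical sketch, not the lab's code: the four strategic types and their
# expected pairwise payoffs. Assumes a two-player public goods game (r, c),
# a fine beta per punisher and a cost gamma per punished defection.
TYPES = {
    "G1": (True, True),    # social: cooperate and punish
    "G2": (False, True),   # paradoxical/bully: defect but punish
    "G3": (False, False),  # asocial: defect, no punishment
    "G4": (True, False),   # mild: cooperate, no punishment
}


def coop_prob(me, other, mu=0.0, nu=0.0):
    """Probability that `me` actually cooperates when matched with `other`."""
    cooperates = TYPES[me][0]
    other_punishes = TYPES[other][1]
    if cooperates:
        # With probability mu a cooperator learns that `other` does not punish
        # and switches to defection for this interaction.
        return 1.0 if other_punishes else 1.0 - mu
    # With probability nu a defector learns that `other` punishes and cooperates.
    return nu if other_punishes else 0.0


def pair_payoff(me, other, r, c, beta, gamma, mu=0.0, nu=0.0):
    """Expected payoff of `me` in one interaction with `other`."""
    p_me = coop_prob(me, other, mu, nu)
    p_other = coop_prob(other, me, mu, nu)
    # Contributions are multiplied by r and split between the two players.
    payoff = r * c * (p_me + p_other) / 2 - c * p_me
    if TYPES[me][1]:
        payoff -= gamma * (1 - p_other)  # cost of punishing a defection
    if TYPES[other][1]:
        payoff -= beta * (1 - p_me)      # fine for one's own defection
    return payoff


if __name__ == "__main__":
    params = dict(r=1.6, c=1.0, beta=1.2, gamma=0.3)
    for a in TYPES:
        row = [pair_payoff(a, b, **params) for b in TYPES]
        print(a, " ".join(f"{v:6.2f}" for v in row))
```

Without reputation, two G2 players both defect and punish each other, so each loses gamma + beta, which is why the paradoxical type fares poorly against its own kin.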

In well-mixed populations, punishment and reputation lead to a bi-stable situation in which the initial configuration determines whether the population ends up in the social or the asocial homogeneous state. By adjusting the parameter values, the social state, in which individuals cooperate and punish defectors, can always be made risk-dominant, which means that it has a larger basin of attraction than the asocial state, where everybody defects and nobody punishes.

Dynamical regimes

All of the following examples and suggestions are meant as inspiration for further experimenting with the virtual lab. If your browser has JavaScript enabled, the following links open a new window containing a running lab with all necessary parameters set appropriately.

Population compositions, manifolds & fixed points

Each point in the interior of the simplex S4 indicates a particular composition of the four strategies in the population. Each corner of S4 indicates a homogeneous population of all social, mild, asocial or bully players, respectively. All corners are fixed points, but only the social and asocial ones are stable (closed dots), while the other two are saddle points and hence unstable (open dots).
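Which corners are stable can be checked with a small linear-stability sketch. At the corner of strategy i, the replicator dynamics has eigenvalues A[j][i] - A[i][i], the payoff advantage of a rare j-player among i-players; the corner is stable when all of them are negative. The sketch assumes two-player public goods payoffs with reputation and the illustrative values r = 1.6, c = 1, fine 1.2, punishment cost 0.3, mu = 0.1, nu = 0.05; these are not the lab's defaults, and other parameter choices can change the outcome.

```python
# Hypothetical sketch: corners of S4 as fixed points and their stability.
# Payoff model and parameter values are illustrative assumptions.
TYPES = [(True, True), (False, True), (False, False), (True, False)]  # G1..G4 as (cooperates, punishes)
NAMES = ["G1 social", "G2 bully", "G3 asocial", "G4 mild"]


def coop_prob(i, j, mu, nu):
    """Probability that type i actually cooperates against type j."""
    cooperates, other_punishes = TYPES[i][0], TYPES[j][1]
    if cooperates:
        return 1.0 if other_punishes else 1.0 - mu  # exploit a known non-punisher
    return nu if other_punishes else 0.0            # appease a known punisher


def payoff_matrix(r, c, beta, gamma, mu, nu):
    """A[i][j]: expected payoff of type i against type j."""
    A = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            pi, pj = coop_prob(i, j, mu, nu), coop_prob(j, i, mu, nu)
            a = r * c * (pi + pj) / 2 - c * pi
            if TYPES[i][1]:
                a -= gamma * (1 - pj)  # cost of punishing the co-player's defection
            if TYPES[j][1]:
                a -= beta * (1 - pi)   # fine for one's own defection
            A[i][j] = a
    return A


def replicator_rhs(x, A):
    """dx_i/dt = x_i (P_i - mean payoff)."""
    P = [sum(A[i][j] * x[j] for j in range(4)) for i in range(4)]
    mean = sum(xi * p for xi, p in zip(x, P))
    return [xi * (p - mean) for xi, p in zip(x, P)]


def classify_corner(i, A):
    """Linear stability of the homogeneous state i (vertex e_i of S4)."""
    eigenvalues = [A[j][i] - A[i][i] for j in range(4) if j != i]
    if max(eigenvalues) < 0:
        return "stable"
    if max(eigenvalues) > 0:
        return "unstable"
    return "marginal"


A = payoff_matrix(r=1.6, c=1.0, beta=1.2, gamma=0.3, mu=0.1, nu=0.05)
for i, name in enumerate(NAMES):
    corner = [1.0 if k == i else 0.0 for k in range(4)]
    assert all(v == 0 for v in replicator_rhs(corner, A))  # every corner is a fixed point
    print(name, classify_corner(i, A))
```

With these assumed values the social and asocial corners come out stable and the other two unstable.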

For the replicator dynamics, the quantity x1 x3/(x2 x4) = k, where xi denotes the frequency of strategy Gi, is an invariant of motion. This leads to a foliation of the simplex S4 into invariant manifolds Wk, which are much more convenient for illustrating the dynamics. However, strictly speaking these invariant manifolds exist only for infinitely large populations; in finite populations k fluctuates. For this reason a small figure along the bottom of the graph indicates the current value of k.
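Under simple payoff assumptions the invariant follows in one step. If each payoff splits into a part that depends on the player's own action and a part that depends on its own punishment trait, then A[G1][j] + A[G3][j] = A[G2][j] + A[G4][j] for every co-player type j, and hence d/dt ln(x1 x3/(x2 x4)) = P1 + P3 - P2 - P4 = 0. The sketch below checks this numerically for assumed two-player public goods payoffs with reputation (the same illustrative model and values as elsewhere on this page, not the lab's code), integrating the replicator dynamics with a fourth-order Runge-Kutta scheme.

```python
# Hypothetical sketch: numerical check that k = x1 x3 / (x2 x4) is conserved.
TYPES = [(True, True), (False, True), (False, False), (True, False)]  # G1..G4 as (cooperates, punishes)


def coop_prob(i, j, mu, nu):
    cooperates, other_punishes = TYPES[i][0], TYPES[j][1]
    if cooperates:
        return 1.0 if other_punishes else 1.0 - mu
    return nu if other_punishes else 0.0


def payoff_matrix(r, c, beta, gamma, mu, nu):
    A = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            pi, pj = coop_prob(i, j, mu, nu), coop_prob(j, i, mu, nu)
            a = r * c * (pi + pj) / 2 - c * pi
            if TYPES[i][1]:
                a -= gamma * (1 - pj)  # punishing costs gamma per defection
            if TYPES[j][1]:
                a -= beta * (1 - pi)   # being punished costs beta
            A[i][j] = a
    return A


def replicator_rhs(x, A):
    P = [sum(A[i][j] * x[j] for j in range(4)) for i in range(4)]
    mean = sum(xi * p for xi, p in zip(x, P))
    return [xi * (p - mean) for xi, p in zip(x, P)]


def rk4_step(x, A, dt):
    """One classical fourth-order Runge-Kutta step."""
    d1 = replicator_rhs(x, A)
    d2 = replicator_rhs([xi + dt / 2 * d for xi, d in zip(x, d1)], A)
    d3 = replicator_rhs([xi + dt / 2 * d for xi, d in zip(x, d2)], A)
    d4 = replicator_rhs([xi + dt * d for xi, d in zip(x, d3)], A)
    return [xi + dt / 6 * (a + 2 * b + 2 * e + f) for xi, a, b, e, f in zip(x, d1, d2, d3, d4)]


def invariant_k(x):
    """k = x1 x3 / (x2 x4) with x ordered as G1, G2, G3, G4."""
    return x[0] * x[2] / (x[1] * x[3])


A = payoff_matrix(r=1.6, c=1.0, beta=1.2, gamma=0.3, mu=0.1, nu=0.05)
x = [0.30, 0.20, 0.25, 0.25]
k_start = invariant_k(x)
for _ in range(1000):  # integrate up to t = 10
    x = rk4_step(x, A, dt=0.01)
k_end = invariant_k(x)
print(f"k at t=0: {k_start:.6f}, k at t=10: {k_end:.6f}")
```

The frequencies change along the way, but k stays at its initial value of 1.5 up to integration error, so the trajectory never leaves its manifold Wk.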


Time evolution of the frequency of the four strategic types G1 to G4 in well-mixed populations where individuals engage in different kinds of interactions.

Color code: Bully, Asocial, Mild

Punishment: pairwise interactions

Despite the incentive created by the punishment opportunity, cooperation will eventually break down, resulting in a homogeneous state of the G3 strategy (defect, no punishment). Note that this scenario, i.e. a prisoner's dilemma followed by a punishment opportunity, is closely related to the ultimatum game.


Punishment: group interactions

For social interactions in larger groups, the results are essentially identical to those for pairwise interactions (depicted for N = 10). The only difference is that the fixed point Q is shifted towards the mild strategy G4. This means that the system may linger in a cooperative state for a longer time, but eventually random shocks will push it into the unstable segment Q-G4, from which the asocial state is inevitably reached.
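The shift of Q can be estimated on the G1-G4 edge, where everyone cooperates and x1 is the fraction of punishers. Assuming that each punisher fines each defecting co-player beta, a rare defector in a group of N earns rc(N-1)/N - beta(N-1)x1, while the residents earn (r-1)c; this holds for a G3 as well as a G2 invader, since the latter has no defectors to punish. Defectors can therefore invade when x1 < c(1 - r/N)/(beta(N-1)), which locates Q. The helper names are hypothetical, and N = 2 stands in for pairwise interactions under this assumed model.

```python
# Hypothetical sketch: position of Q on the G1-G4 edge without reputation.
# Assumes an N-player public goods game (1 < r < N) in which each punisher
# fines each defecting co-player beta; x1 is the fraction of G1 on the edge.


def defector_advantage(x1, r, c, beta, n):
    """Payoff of a rare defector minus the residents' payoff (r - 1) c."""
    public_good = r * c * (n - 1) / n     # n - 1 cooperating co-players
    expected_fine = beta * (n - 1) * x1   # expected number of punishers times beta
    return public_good - expected_fine - (r - 1) * c


def q_position(r, c, beta, n):
    """Value of x1 at which the defector advantage changes sign."""
    return c * (1 - r / n) / (beta * (n - 1))


for n in (2, 5, 10):
    q = q_position(r=1.6, c=1.0, beta=1.0, n=n)
    print(f"N = {n:2d}: Q at x1 = {q:.3f}")
```

With r = 1.6, c = 1 and beta = 1, Q lies at x1 = 0.2 for N = 2 but at about 0.093 for N = 10, i.e. closer to G4, which lengthens the stable segment Q-G1.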


Reputation: pairwise interactions

The dynamics of the system change significantly when reputation is introduced. With a small probability mu, players may learn that they are facing a non-punisher and take advantage of this information by temporarily switching to defection, knowing they can get away with it. Similarly, with a small probability nu, they may learn that they are matched with a punisher and temporarily switch to cooperation in order to avoid punishment. The latter mechanism turns out to be less important in this case.
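One way to see why mu matters is to look at the G1-G4 edge, where everyone is a cooperator type. Without reputation, G1 and G4 earn identical payoffs there, so the edge is neutral and only random drift moves the population. With mu > 0, G1 occasionally exploits non-punishing G4 players, while nobody exploits G1. Under assumed two-player public goods payoffs, this gives G1 a constant advantage of r c mu / 2 along the whole edge; nu plays no role because no defector types are present. `edge_advantage` and the parameter values are hypothetical.

```python
# Hypothetical sketch: payoff advantage of G1 over G4 on the G1-G4 edge.
# Assumes a two-player public goods game with fine beta, punishment cost gamma,
# and reputation probabilities mu (exploit a non-punisher) and nu (appease a punisher).
TYPES = {"G1": (True, True), "G2": (False, True), "G3": (False, False), "G4": (True, False)}


def coop_prob(me, other, mu, nu):
    cooperates, other_punishes = TYPES[me][0], TYPES[other][1]
    if cooperates:
        return 1.0 if other_punishes else 1.0 - mu
    return nu if other_punishes else 0.0


def pair_payoff(me, other, r, c, beta, gamma, mu, nu):
    p_me, p_other = coop_prob(me, other, mu, nu), coop_prob(other, me, mu, nu)
    payoff = r * c * (p_me + p_other) / 2 - c * p_me
    if TYPES[me][1]:
        payoff -= gamma * (1 - p_other)  # cost of punishing
    if TYPES[other][1]:
        payoff -= beta * (1 - p_me)      # fine for defecting
    return payoff


def edge_advantage(x1, r, c, beta, gamma, mu, nu):
    """Mean payoff of G1 minus that of G4 when a fraction x1 are G1 and the rest G4."""
    mix = {"G1": x1, "G4": 1.0 - x1}

    def mean_payoff(me):
        return sum(w * pair_payoff(me, o, r, c, beta, gamma, mu, nu) for o, w in mix.items())

    return mean_payoff("G1") - mean_payoff("G4")


params = dict(r=1.6, c=1.0, beta=1.2, gamma=0.3, nu=0.05)
for mu in (0.0, 0.1):
    print(mu, [round(edge_advantage(x1, mu=mu, **params), 4) for x1 in (0.0, 0.5, 1.0)])
```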

Through reputation, an interior fixed point m appears and the system becomes bi-stable: the initial configuration determines whether the social or the asocial state is reached.
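Bistability can be reproduced in a small sketch: integrate the replicator dynamics for assumed two-player public goods payoffs with reputation (mu = 0.1, nu = 0.05) from two initial compositions, one close to the social corner G1 and one close to the asocial corner G3. Explicit Euler steps are used for simplicity; the payoff model and parameter values are illustrative assumptions, not the lab's defaults.

```python
# Hypothetical sketch, not the lab's code: bistability under replicator dynamics.
TYPES = [(True, True), (False, True), (False, False), (True, False)]  # G1..G4 as (cooperates, punishes)


def coop_prob(i, j, mu, nu):
    """Probability that type i actually cooperates against type j."""
    cooperates, other_punishes = TYPES[i][0], TYPES[j][1]
    if cooperates:
        return 1.0 if other_punishes else 1.0 - mu  # exploit a known non-punisher
    return nu if other_punishes else 0.0            # appease a known punisher


def payoff_matrix(r, c, beta, gamma, mu, nu):
    """A[i][j]: expected payoff of type i against type j."""
    A = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            pi, pj = coop_prob(i, j, mu, nu), coop_prob(j, i, mu, nu)
            a = r * c * (pi + pj) / 2 - c * pi
            if TYPES[i][1]:
                a -= gamma * (1 - pj)  # cost of punishing the co-player's defection
            if TYPES[j][1]:
                a -= beta * (1 - pi)   # fine for one's own defection
            A[i][j] = a
    return A


def replicator_rhs(x, A):
    """dx_i/dt = x_i (P_i - mean payoff)."""
    P = [sum(A[i][j] * x[j] for j in range(4)) for i in range(4)]
    mean = sum(xi * p for xi, p in zip(x, P))
    return [xi * (p - mean) for xi, p in zip(x, P)]


def evolve(x, A, dt=0.01, steps=20000):
    """Explicit Euler steps, renormalised to stay on the simplex."""
    for _ in range(steps):
        x = [xi + dt * d for xi, d in zip(x, replicator_rhs(x, A))]
        total = sum(x)
        x = [xi / total for xi in x]
    return x


A = payoff_matrix(r=1.6, c=1.0, beta=1.2, gamma=0.3, mu=0.1, nu=0.05)
near_social = evolve([0.90, 0.03, 0.04, 0.03], A)
near_asocial = evolve([0.04, 0.03, 0.90, 0.03], A)
print("from near G1:", [round(v, 3) for v in near_social])
print("from near G3:", [round(v, 3) for v in near_asocial])
```

With these assumed parameters, the first run ends essentially at G1 and the second at G3: the outcome depends only on where the population starts.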


Reputation: group interactions

The transition from pairwise interactions to interactions in larger groups again results in only minor changes. The interior fixed point m moves towards G4, which suggests that the basin of attraction of the social state G1 actually increases with group size.

Virtual lab

The applet below illustrates the different components. Along the bottom there are several buttons to control the execution and the speed of the simulations. Of particular importance are the Params button and the data views pop-up list on top. The former opens a new panel that allows you to set and change various parameters concerning the game as well as the population structure. The latter displays the simulation data in different ways. Clicking on the examples below opens a new window with a larger applet and all parameters preset accordingly.

Color code: Bully, Asocial, Mild
New social, New bully, New asocial, New mild
Payoff code: Low to High

Note: The pale strategy colors are very useful for getting an intuition of the activity in the system. The shades of grey of the payoff scale are augmented by bluish and reddish shades, which indicate the payoffs for mutual cooperation and mutual defection, respectively.

Params: Pop-up panel to set various parameters.
Views: Pop-up list of different data presentations.
Reset: Reset the simulation.
Run: Start/resume the simulation.
Next: Advance to the next generation.
Pause: Interrupt the simulation.
Slider: Idle time between updates. At the right end, your CPU clock determines the update speed, while at the left end updates are made roughly once per second.
Mouse: Mouse clicks on the graphics panels generally start, resume or stop the simulations.
Data views
Structure - Strategy: Snapshot of the spatial arrangement of strategies. Mouse clicks cyclically change the strategy of the respective site, which allows custom initial configurations to be prepared.
Mean frequency: Time evolution of the strategy frequencies.
Simplex S4: Frequencies plotted on a manifold of the simplex S4. Mouse clicks set the initial frequencies of strategies (the manifold k is determined by the initial frequencies set on the parameter panel).
Structure - Fitness: Snapshot of the spatial distribution of payoffs.
Mean Fitness: Time evolution of the mean payoff of each strategy together with the average population payoff.
Histogram - Fitness: Histogram of payoffs for each strategy.

Game parameters

The list below is restricted to the few parameters particularly related to punishment and reputation in the public goods game. Follow the link for a complete list and descriptions of all other parameters, e.g. those referring to the update mechanisms of players and the population.

multiplication factor r of the public good.
cost of cooperation c (investment into the public good).
fine imposed on a defecting co-player through punishment.
cost of punishment, which has to be borne by the punisher.
Rep. Mu:
reputation - probability of learning that all co-players are non-punishers and taking advantage of this knowledge by temporarily switching to defection.
Rep. Nu:
reputation - probability of learning that at least one co-player punishes and taking advantage of this information to avoid punishment by temporarily switching to cooperation.
Init coop/punish, init coop/none, init defect/punish, init defect/none:
initial fractions of the social (cooperate and punish, G1), mild (cooperate but do not punish, G4), bully/paradoxical (defect and punish, G2) and rational (defect, don't punish, G3) strategies. If this does not add up to 100%, the values are scaled accordingly.
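The rescaling of the initial fractions, and the value of k that they fix for the Simplex S4 view, can be sketched as follows. Note that the panel lists the strategies in the order G1, G4, G2, G3. The function names are hypothetical, and whether the lab computes k exactly this way is an assumption.

```python
# Hypothetical sketch of the initial-fraction handling on the parameter panel.
# Panel order: coop/punish (G1), coop/none (G4), defect/punish (G2), defect/none (G3).


def initial_state(coop_punish, coop_none, defect_punish, defect_none):
    """Scale the four entries so that they sum to one."""
    raw = {"G1": coop_punish, "G4": coop_none, "G2": defect_punish, "G3": defect_none}
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("at least one initial fraction must be positive")
    return {g: v / total for g, v in raw.items()}


def manifold_k(x):
    """k = x1 x3 / (x2 x4); nan or inf where the denominator vanishes."""
    num, den = x["G1"] * x["G3"], x["G2"] * x["G4"]
    if den == 0:
        return float("nan") if num == 0 else float("inf")
    return num / den


x = initial_state(40, 20, 20, 40)  # entries sum to 120%, so they are rescaled
print(x, manifold_k(x))
```

For example, entering 40, 20, 20 and 40 gives x1 = x3 = 1/3 and x2 = x4 = 1/6, so the trajectory starts on the manifold with k = 4.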