Empirical Project 2 Solutions
These are not model answers. They are provided to help students, including those doing the project outside a formal class, to check their progress while working through the questions using the Excel or R walkthroughs. There are also brief notes for the more interpretive questions. Students taking courses using Doing Economics should follow the guidance of their instructors.
Part 2.1 Collecting data by playing a public goods game
Note
Unless otherwise specified, numerical values are shown to one decimal place.
 The example data used here is from Excel walkthrough 2.1.
The average contributions over the course of the game fluctuate around a mean value of about 11.
 For Period 1, the contribution in the example game (9) is lower than the average contribution in Herrmann et al. (2008). Results in Herrmann et al. (2008) display a downward pattern over time, unlike those in the example game, which fluctuate without a clear trend.

There are many possible reasons why results may be similar or different, including:
 social norms about how much people should contribute
 groups are not anonymous in your experiment (you know who your group members are even though the contribution of each member is anonymous); if you are friends with your group members you may be able to sustain higher contributions.
Part 2.2 Describing the data
 Solution figure 2.2 shows the mean contribution in each period for both experiments.
Period  Without punishment  With punishment 

1  10.58  10.64 
2  10.63  11.95 
3  10.41  12.66 
4  9.81  12.97 
5  9.31  13.33 
6  8.45  13.50 
7  7.84  13.57 
8  7.38  13.64 
9  6.39  13.57 
10  4.38  12.87 
Mean contributions by period, with and without punishment.
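As a rough sketch of how such period means can be computed, the snippet below uses hypothetical per-group contributions (illustrative values only, not the experimental data):

```python
# Hypothetical per-group contributions in each period
# (illustrative values only, not the experimental data).
contributions = {
    1: [10.0, 11.0, 10.5, 11.5],
    2: [9.0, 10.0, 9.5, 10.5],
    3: [8.0, 9.0, 8.5, 9.5],
}

# Mean contribution in each period, as plotted in the figure.
period_means = {p: sum(v) / len(v) for p, v in contributions.items()}
for period, m in sorted(period_means.items()):
    print(f"Period {period}: mean contribution = {m:.2f}")
```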
 Solution figure 2.3 shows the comparison of mean contributions over time.
 The mean contributions in the two experiments are almost identical in Period 1 (10.58 without punishment versus 10.64 with punishment). Over time, the mean contribution in the experiment with punishment remains relatively stable, even increasing slightly, while that in the experiment without punishment falls steadily. The divergence over time leads to a large difference (about 8.5) by the end of Period 10.
 Solution figure 2.4 shows the mean contribution in the first and last period for both experiments.

Solution figure 2.5 provides the standard deviation for Periods 1 and 10 for both experiments.
To check the rule of thumb, verify that most values fall inside an interval spanning two standard deviations and centred on the mean (that is, from one standard deviation below the mean to one standard deviation above it), which corresponds to the intervals shown by the square brackets:
 [8.6, 12.6] for Period 1, without punishment
 [7.5, 13.7] for Period 1, with punishment
 [2.3, 6.5] for Period 10, without punishment
 [9.1, 16.7] for Period 10, with punishment.
Inspecting the data shows that the rule of thumb applies here.
Period  Without punishment  With punishment 

1  2.02  3.21 
10  2.19  3.90 
Standard deviations in both experiments.
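A minimal sketch of this rule-of-thumb check, using Python's standard-library `statistics` module and made-up contribution values (the real data would come from the walkthrough files):

```python
from statistics import mean, stdev

# Hypothetical per-group contributions for one period
# (illustrative values only, not the experimental data).
values = [8.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 14.0]

m, s = mean(values), stdev(values)
# Interval spanning two standard deviations, centred on the mean.
lower, upper = m - s, m + s
inside = sum(lower <= v <= upper for v in values)

print(f"mean = {m:.1f}, standard deviation = {s:.1f}")
print(f"interval = [{lower:.1f}, {upper:.1f}]")
print(f"{inside} of {len(values)} values fall inside the interval")
```

With these made-up values, most (though not all) observations fall inside the interval, which is what the rule of thumb predicts.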

The fact that the mean contributions for both experiments in Period 1 are almost the same does not mean that the two sets of data are the same. The standard deviation in the experiment with punishment is greater, meaning that its data are spread out over a wider range of values around the mean.
This example emphasizes the importance of using more than one summary statistic to get a better picture of what the data looks like.
 The maximum and minimum for Periods 1 and 10 for both experiments are given in Solution figure 2.6.
Without punishment  With punishment  

Period  Minimum  Maximum  Minimum  Maximum 
1  7.96  14.10  5.82  16.02 
10  1.30  8.68  6.20  17.51 
Minimum and maximum values for both experiments.
 Solution figure 2.7 provides summary tables for Periods 1 and 10 for both experiments.
Mean  Standard deviation  Minimum  Maximum  

Contribution (Period 1, without punishment)  10.58  2.02  7.96  14.10 
Contribution (Period 10, without punishment)  4.38  2.19  1.30  8.68 
Contribution (Period 1, with punishment)  10.64  3.21  5.82  16.02 
Contribution (Period 10, with punishment)  12.87  3.90  6.20  17.51 
Summary tables for contributions in both experiments.
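The four summary statistics in such a table can be reproduced with a short helper function; the contribution values below are hypothetical, not the experimental data:

```python
from statistics import mean, stdev

def summarize(values):
    """Mean, standard deviation, minimum and maximum of a list."""
    return mean(values), stdev(values), min(values), max(values)

# Hypothetical per-group contributions (illustrative values only).
period_1 = [9.0, 10.0, 10.5, 11.0, 12.0]
period_10 = [3.0, 4.0, 4.5, 5.0, 6.0]

for label, vals in (("Period 1", period_1), ("Period 10", period_10)):
    m, s, lo, hi = summarize(vals)
    print(f"{label}: mean={m:.2f}, sd={s:.2f}, min={lo}, max={hi}")
```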

The two experiments have almost the same mean in Period 1. Over time, the mean contribution in the experiment without punishment decreases, while that in the experiment with punishment increases relative to the Period 1 values.
The standard deviation in the experiment with punishment is greater in both periods. The standard deviations in both experiments increase over time, although the increase for the experiment with punishment is greater.
The difference between the maximum and minimum (the range) is greater for the experiment with punishment in both periods.
Part 2.3 Did changing the rules of the game have a significant effect on behaviour?
 Solution figure 2.8 provides a possible outcome for the experiment.
Outcome sequence 1  Outcome sequence 2 

Tails  Heads 
Heads  Heads 
Heads  Tails 
Heads  Tails 
Tails  Tails 
Tails  Tails 
Example data from two coin-toss experiments.
 In the example above, both the number of heads and the sequence of heads and tails are different, illustrating that we can still get different results under controlled conditions due to chance.
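The two coin-toss sequences can be mimicked with a quick simulation (the seed is arbitrary); re-running it with different seeds illustrates how much outcomes vary by chance alone, even under identical conditions:

```python
import random

random.seed(42)  # arbitrary seed; change it to see different outcomes

def toss_sequence(n=6):
    """Simulate n fair coin tosses."""
    return [random.choice(["Heads", "Tails"]) for _ in range(n)]

sequence_1 = toss_sequence()
sequence_2 = toss_sequence()
print("Sequence 1:", sequence_1, "-> heads:", sequence_1.count("Heads"))
print("Sequence 2:", sequence_2, "-> heads:", sequence_2.count("Heads"))
```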

The p-value is 0.88.
Note that in this case, the p-value had to account for the fact that the data in both experiments were generated by the same groups of people, since each group plays the game under both the ‘with punishment’ and ‘without punishment’ conditions.^{1}
It is important to account for this pairing in the hypothesis test. Imagine that the data for each condition came from different people. Even if the treatment had no effect, you would expect some difference in means, because the people in the two groups are likely to differ in a number of respects, which may result in slightly different contributions. But since the data for both experiments came from the same groups of people, these between-group differences do not exist. We can therefore be more confident that the differences we observe are due to the difference in experimental conditions.
 Given that the null hypothesis (our assumption when calculating the p-value) is that the means for both groups are the same, the probability of observing a difference in means as large as or larger than the one observed is 0.88, which is far higher than the significance level. We therefore cannot reject the null hypothesis at the 5% level, meaning that the difference in means is not statistically significant.
 The p-value is 0.00001.
 Given that the null hypothesis is that the means for the two groups are the same, the probability of observing a difference in means as large as or even larger than the one observed is 0.00001, which is lower than the significance level. We therefore reject the null hypothesis, and the difference in means is significant at the 5% level. In other words, under our assumption that the means for both groups are the same, it is very unlikely that we would observe a difference in means of the same size as that in the data given. The observation of data with such a difference suggests that our assumption is probably false.
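The walkthroughs compute this p-value with a paired t-test. One way to see the same logic using only standard-library Python is a paired (sign-flip) permutation test on hypothetical data: under the null hypothesis of no treatment effect, each group's with/without difference is equally likely to be positive or negative.

```python
import random
from statistics import mean

random.seed(1)  # arbitrary seed, for reproducibility

# Hypothetical Period 10 mean contributions for six groups that
# played under both conditions (illustrative values only).
with_punishment = [13.0, 12.5, 14.0, 11.5, 13.5, 12.0]
without_punishment = [4.5, 5.0, 3.5, 4.0, 5.5, 4.0]

diffs = [w - wo for w, wo in zip(with_punishment, without_punishment)]
observed = mean(diffs)

# Sign-flip permutation test: randomly flip the sign of each paired
# difference and count how often the permuted mean difference is at
# least as extreme as the observed one.
n_permutations = 10_000
extreme = 0
for _ in range(n_permutations):
    permuted = [d * random.choice((1, -1)) for d in diffs]
    if abs(mean(permuted)) >= abs(observed):
        extreme += 1
p_value = extreme / n_permutations

print(f"observed difference in means: {observed:.2f}")
print(f"permutation p-value: {p_value:.4f}")
```

Because the test flips signs within each pair rather than reshuffling individuals across two independent samples, it respects the fact that the same groups generated both sets of data.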
 If the two groups of data have larger standard deviations, it is more likely that the observed difference in means is due to chance. Even when the difference in means is large, as in the case shown in Figure 2.10, the large spread indicates that the difference is likely to be due to chance and is therefore not statistically significant. To determine significance we should consider the relative size of the difference to the standard deviation.

The same group of people did both experiments, so they have the same characteristics other than the punishment options. This means that if the groups differ in contributions, it is likely to be due to the difference in punishment options. Controlling for characteristics allows us to isolate the effect of the punishment option.
There is one other potential explanation: all groups did the non-punishment game first and the punishment game after that, so the differences could have been the result of some learning process. This explanation, however, seems unlikely, as both experiments start with virtually identical mean contributions in Period 1.
On the whole, we are therefore confident that the differences in Period 10 are due to the punishment treatment.
 In addition to the punishment options available, the contributions in periods after Period 1 can be affected by the strategic behaviour of the subjects in response to observation of past actions taken by other subjects in the group. This effect is not present in Period 1, so the difference in contributions is therefore more likely to reflect the effects of punishment options alone.

There can be systematic differences between the lab and the outside world that influence human behaviour. The environment of the lab experiment has unique features that are not present in the real world. The experience of being a subject, the subjects’ awareness that they are being monitored, the power of the experimenter, and the significance of the situation can all cause subjects’ behaviour to be different from behaviour in the real world. As a result, we need to be cautious when extrapolating lab findings to the outside world.
There are many limitations to discuss, including the following examples.
 An individual’s behaviour in a situation can be affected by the degree of anonymity. Differing degrees of anonymity between an experiment scenario and its real-world counterpart reduce the generalizability of lab findings. For example, subjects in the experiment may be unrelated to each other and forbidden from communicating with each other. In the real world, however, people can be related through family, work, and friendship, and are usually able to communicate with each other. For the experimental situation to be representative of its real-world counterpart, it is therefore important to ensure that the degree of anonymity is similar to that of the real world.
 Lab experiments aim to isolate the effects of changes in one variable by controlling all other variables. Human behaviour is dependent on a large number of variables, many of which are not even measurable. Examples of such variables include past experiences, social norms, and ability. It is difficult, if not impossible, for economists to completely control for all these variables. Because of this, economists may fail to obtain the ceteris paribus effects of interest. Economists should explore new ways to measure these variables and control for them, especially if they affect the relationships being studied.
 Behaviour can be dependent on the size of the stake involved. Stakes in the real world are typically higher than in experiments. The lab findings may therefore reveal little about real-world behaviour.
 Individuals with certain characteristics tend to select themselves into experiments, resulting in samples that are not representative of the population. For example, college students interested in the research and seeking additional income are more likely to become experimental subjects. If these characteristics affect the relationships of interest, then the results are biased. This sample selection problem means lab findings may not generalize to the outside world. Researchers should try random sampling from the population rather than relying on volunteers, and should adopt econometric methodologies that combat the effects of sample selection.
 Subjects in the typical experiment face limited and well-specified instructions and choices. In the real world, the choice set can be infinitely large and vaguely defined. Individuals may even be able to influence rules in the real world. Lab experiments usually last no longer than a few hours, whereas in the real world, many decisions are made over long periods of time. Individuals may behave differently as the time horizon expands.
In general, in experimental social science studies involving human subjects, it is difficult to ensure that the sample is perfectly representative of the population of interest and that the situation in the experiment is representative of its real-world counterpart. These difficulties limit the generalizability of lab findings to the real world. To alleviate these problems, researchers can use random sampling and design the experiment to be as realistic as possible. Models of laboratory behaviour can be used to anticipate issues, inspiring adjustments to be made to the study. Designs that recognize the weaknesses of the lab can be adopted. Researchers can also use econometric methodologies that combat the problems and extract valuable information from imperfect data.

Benedikt Herrmann, Christian Thöni, and Simon Gächter. 2008. ‘Antisocial punishment across societies’. Science 319 (5868): p. 1363, first column. ↩