2. Collecting and analysing data from experiments: Solutions
These are not model answers. They are provided to help students, including those doing the project outside a formal class, to check their progress while working through the questions using the Excel, R, or Google Sheets walkthroughs. There are also brief notes for the more interpretive questions. Students taking courses using Doing Economics should follow the guidance of their instructors.
Part 2.1 Collecting data by playing a public goods game
Note
Unless otherwise specified, numerical values are shown to two decimal places.
 The example data used in Questions 1 and 2 is from Excel walkthrough 2.1.
As shown in Solution figure 2.1, the average contributions over the course of the game fluctuate around a mean value of about 11.
For Period 1, the contribution in the example game (9) is lower than the average contribution in Herrmann et al. (2008). Results in Herrmann et al. (2008) display a downward pattern over time, unlike those in the example game, which fluctuate without a clear trend.

There are many possible reasons why results may be similar or different, including:

- social norms about how much people should contribute
- the fact that groups are not anonymous in your experiment (you know who your group members are, even though the contribution of each member is anonymous); if you are friends with your group members, you may be able to sustain higher contributions.
Part 2.2 Describing the data
 Solution figure 2.2 shows the mean contribution in each period for both experiments.
Period  Without punishment  With punishment
1       10.58               10.64
2       10.63               11.95
3       10.41               12.66
4       9.81                12.97
5       9.31                13.33
6       8.45                13.50
7       7.84                13.57
8       7.38                13.64
9       6.39                13.57
10      4.38                12.87
 Solution figure 2.3 shows the comparison of mean contributions over time.
The mean contributions in the two experiments are almost the same in Period 1. Over time, the mean contribution in the experiment with punishment remains relatively stable, even increasing slightly, while that in the experiment without punishment falls steadily. This divergence leads to a large difference (about 8.5) by the end of Period 10.
 Solution figure 2.4 shows the mean contribution in the first and last period for both experiments.

Solution figure 2.5 provides the standard deviation for Periods 1 and 10 for both experiments.
To check the rule of thumb, verify that most values fall within two standard deviations of the mean. The corresponding intervals, shown in square brackets and rounded to one decimal place, are:

- [8.6, 12.6] for Period 1, without punishment
- [7.5, 13.7] for Period 1, with punishment
- [2.3, 6.5] for Period 10, without punishment
- [9.1, 16.7] for Period 10, with punishment.
Inspecting the data shows that the rule of thumb applies here.
Period  Without punishment  With punishment
1       2.02                3.21
10      2.19                3.90
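A quick way to check the rule of thumb is to compute each interval as mean ± 2 standard deviations and count how many observations fall inside it. A minimal Python sketch using made-up Period 1 contributions (the values are illustrative, not the experimental data):

```python
from statistics import mean, stdev

def two_sd_interval(data):
    """Return (lower, upper) bounds of mean +/- 2 standard deviations."""
    m, s = mean(data), stdev(data)
    return (m - 2 * s, m + 2 * s)

def share_within(data):
    """Fraction of observations inside the two-SD interval."""
    lo, hi = two_sd_interval(data)
    return sum(lo <= x <= hi for x in data) / len(data)

# Hypothetical Period 1 contributions for one experiment.
period1 = [8.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 13.0]
lo, hi = two_sd_interval(period1)
print(round(lo, 1), round(hi, 1), share_within(period1))
```

If the rule of thumb holds, `share_within` should be close to 1 for each period and experiment.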

The fact that the mean contributions for both experiments in Period 1 are almost the same does not mean that the two sets of data are the same. The standard deviation for the experiment with punishment is greater, meaning that its data are spread over a wider range of values around the mean.
This example emphasizes the importance of using more than one summary statistic to get a better picture of what the data looks like.
 The maximum and minimum for Periods 1 and 10 for both experiments are given in Solution figure 2.6.
        Without punishment   With punishment
Period  Minimum  Maximum     Minimum  Maximum
1       7.96     14.10       5.82     16.02
10      1.30     8.68        6.20     17.51
 Solution figure 2.7 provides summary tables for Periods 1 and 10 for both experiments.
Mean  Standard deviation  Minimum  Maximum
Contribution (Period 1, without punishment)  10.58  2.02  7.96  14.10
Contribution (Period 10, without punishment)  4.38  2.19  1.30  8.68
Contribution (Period 1, with punishment)  10.64  3.21  5.82  16.02
Contribution (Period 10, with punishment)  12.87  3.90  6.20  17.51

The two experiments have almost the same mean in Period 1. Over time, the mean contribution in the experiment without punishment decreases, while that in the experiment with punishment increases relative to the Period 1 values.
The standard deviation in the experiment with punishment is greater in both periods. The standard deviations in both experiments increase over time, although the increase for the experiment with punishment is greater.
The difference between maximum and minimum for the experiment with punishment is greater.
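A summary table like Solution figure 2.7 combines the four statistics discussed above. A small Python helper, shown here with invented Period 10 contributions (all numbers are illustrative, not the experimental data):

```python
from statistics import mean, stdev

def summarise(data):
    """Mean, standard deviation, minimum, and maximum, to 2 d.p."""
    return {
        "mean": round(mean(data), 2),
        "sd": round(stdev(data), 2),
        "min": round(min(data), 2),
        "max": round(max(data), 2),
    }

# Hypothetical Period 10 contributions for the no-punishment game.
period10 = [1.3, 3.0, 4.0, 4.5, 5.5, 8.7]
print(summarise(period10))
```

Running `summarise` on each period-and-experiment combination produces one row of the summary table.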
Part 2.3 How did changing the rules of the game affect behaviour?
 Solution figure 2.8 provides a possible outcome for the experiment.
Outcome sequence 1  Outcome sequence 2
Tails               Heads
Heads               Heads
Heads               Tails
Heads               Tails
Tails               Tails
Tails               Tails
 In the example above, both the number of heads and the sequence of heads and tails are different, illustrating that we can still get different results under controlled conditions due to chance.
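This chance variation is easy to reproduce by simulation. A minimal Python sketch (the seed and the number of flips are arbitrary choices):

```python
import random

def flip_sequence(n, rng):
    """Simulate n fair coin flips, returning a list of 'Heads'/'Tails'."""
    return [rng.choice(["Heads", "Tails"]) for _ in range(n)]

rng = random.Random(42)  # fixed seed so the example is reproducible
seq1 = flip_sequence(6, rng)
seq2 = flip_sequence(6, rng)
print(seq1)
print(seq2)
```

Even though both sequences are generated under identical conditions (the same fair coin), they will generally differ in both the number of heads and their order.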

The p-value is 0.88.
Note that in this case, the p-value had to account for the fact that the data in both experiments were generated by the same groups of people, since each group plays the game both with and without the punishment condition.
It is important to account for this fact when calculating the p-value. If the data for each condition came from different groups of people, we might expect that some of the variation in contributions is due to the fact that the people in each group differ slightly in a number of aspects, such as how altruistic they are. Since the data for both experiments came from the same groups of people, we have controlled for these differences between groups. Thus, we can be more confident that the differences we observe are due to the difference in experimental conditions.
Our hypothesis (and the assumption made when calculating the p-value) is that the means for both groups are the same. If this hypothesis were correct, the probability of observing a difference in means as large as, or larger than, the one observed is 0.88. Thus, under these conditions, it would not be unusual to observe the data that we did.
The p-value is 0.00001.
Our hypothesis is that the means for the two groups are the same. The p-value is calculated assuming that this hypothesis holds. Its value of 0.00001 indicates that the probability of observing a difference in sample means as large as, or larger than, the one observed is 0.00001. Under this hypothesis, it would be very unusual to observe the data that we did, so we conclude that our hypothesis is not compatible with the observed data (in the context of a hypothesis test, we would most likely reject the null hypothesis based on the statistical evidence).
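The p-values quoted above come from the walkthroughs. One way to see how a paired comparison can be tested, respecting the fact that the same groups generated both sets of data, is a sign-flip permutation test: under the null hypothesis of no treatment effect, each group's within-group difference is equally likely to be positive or negative. This is an illustration of the idea, not necessarily the method the walkthroughs use, and the differences below are invented:

```python
import random
from statistics import mean

def paired_permutation_pvalue(diffs, n_resamples=5000, seed=1):
    """Two-sided sign-flip permutation test on paired differences.

    Randomly flips the sign of each difference and counts how often
    the resampled mean is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(diffs))
    count = 0
    for _ in range(n_resamples):
        flipped = [d * rng.choice((1, -1)) for d in diffs]
        if abs(mean(flipped)) >= observed:
            count += 1
    return count / n_resamples

# Hypothetical 'with punishment minus without punishment' differences,
# one per group of players.
diffs = [8.1, 7.5, 9.0, 6.8, 8.4, 7.9, 9.2, 8.0]
print(paired_permutation_pvalue(diffs))
```

Because every invented difference is large and positive, very few sign-flipped resamples reproduce the observed mean, so the p-value comes out small.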
If the two groups of data have larger standard deviations, it is more likely that the observed difference in means is due to chance. Even when the difference in means is large, as in the case shown in Figure 2.10 (Figure 2.7 in the R version), a large spread means that the difference could well be due to chance. It is therefore important to consider the size of the difference relative to the standard deviation.
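The role of the spread can be illustrated by comparing two invented pairs of samples with the same difference in means (about 3) but very different standard deviations, using a simple permutation test (all numbers are made up for illustration):

```python
import random
from statistics import mean

def permutation_pvalue(a, b, n_resamples=2000, seed=7):
    """Two-sided permutation test for a difference in means between
    two independent samples: reshuffle the pooled data and count how
    often the resampled difference is at least as extreme as observed."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        resampled_a, resampled_b = pooled[:len(a)], pooled[len(a):]
        if abs(mean(resampled_a) - mean(resampled_b)) >= observed:
            count += 1
    return count / n_resamples

# Same difference in means (3) but very different spreads.
tight_a = [10.0, 10.5, 9.5, 10.2, 9.8]
tight_b = [13.0, 13.5, 12.5, 13.2, 12.8]
wide_a = [4.0, 16.5, 2.5, 18.2, 8.8]
wide_b = [7.0, 19.5, 5.5, 21.2, 11.8]

p_tight = permutation_pvalue(tight_a, tight_b)
p_wide = permutation_pvalue(wide_a, wide_b)
print(p_tight, p_wide)
```

With the tight samples the observed difference is extreme relative to the spread, so very few reshuffles reproduce it; with the wide samples many do, giving a much larger p-value.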

The same group of people did both experiments, so they have the same characteristics other than the punishment options. This means that if the groups differ in contributions, it is likely to be due to the difference in punishment options. Controlling for characteristics allows us to isolate the effect of the punishment option.
There is one other potential explanation: all groups played the non-punishment game first and the punishment game afterwards, so the differences could be the result of some learning process. This explanation, however, seems unlikely, as both experiments start with almost identical mean contributions in Period 1.
On the whole, we can therefore be confident that the differences in Period 10 are due to the punishment treatment.
In addition to the punishment options available, the contributions in periods after Period 1 can be affected by the strategic behaviour of the subjects in response to the past actions they observe other subjects in the group taking. This effect is not present in Period 1, so the difference in contributions there more likely reflects the effect of punishment options alone.

There can be systematic differences between the lab and the outside world that influence human behaviour. The environment of the lab experiment has unique features that are not present in the real world. The experience of being a subject, the subjects’ awareness that they are being monitored, the power of the experimenter, and the significance of the situation can all cause subjects’ behaviour to be different from behaviour in the real world. As a result, we need to be cautious when extrapolating lab findings to the outside world.
There are many limitations to discuss, including the following examples.
An individual’s behaviour in a situation can be affected by the degree of anonymity. Differing degrees of anonymity between an experimental scenario and its real-world counterpart reduce the generalizability of lab findings. For example, subjects in the experiment may be unrelated to each other and forbidden from communicating with each other. In the real world, however, people can be related through family, work, and friendship, and are usually able to communicate with each other. For the experimental situation to be representative of its real-world counterpart, it is therefore important to ensure that the degree of anonymity is similar to that of the real world.
 Lab experiments aim to isolate the effects of changes in one variable by controlling all other variables. Human behaviour is dependent on a large number of variables, many of which are not even measurable. Examples of such variables include past experiences, social norms, and ability. It is difficult, if not impossible, for social scientists to completely control for all these variables. Because of this, social scientists may fail to obtain the ceteris paribus effects of interest. Social scientists should explore new ways to measure these variables and control for them, especially if they affect the relationships being studied.
Behaviour can be dependent on the size of the stake involved. Stakes in the real world are typically higher than in experiments. The lab findings may therefore reveal little about real-world behaviour.
 Individuals with certain characteristics tend to select themselves into experiments, resulting in samples that are not representative of the population. For example, college students interested in the research and seeking additional income are more likely to become experimental subjects. If these characteristics affect the relationships of interest, then the results are biased. The sample selection problem means lab findings cannot be generalized to the outside world. Researchers should try random sampling of the population rather than relying on volunteers. Econometric methodologies that mitigate the effects of sample selection should be adopted.
Subjects in the typical experiment face limited and well-specified instructions and choices. In the real world, the choice set can be infinitely large and vaguely defined. Individuals may even be able to influence the rules in the real world. Lab experiments usually last no longer than a few hours, whereas in the real world, many decisions are made over long periods of time. Individuals may behave differently as the time horizon expands.
In general, in experimental social science studies involving human subjects, it is difficult to ensure perfectly that the sample is representative of the population of interest and that the situation in the experiment is representative of its real-world counterpart. These difficulties limit the generalizability of lab findings to the real world. To alleviate these problems, researchers can use random sampling and design the experiment to be as realistic as possible. Models of laboratory behaviour can be used to anticipate issues, inspiring adjustments to the study. Designs that recognize the weaknesses of the lab can be adopted. Researchers can also use econometric methodologies that mitigate the problems and extract valuable information from imperfect data.