Empirical Project 2 Working in R

Getting started in R

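For this project you will need the following packages: readxl (to import data from Excel spreadsheets) and tidyverse (which provides functionality used later in the walk-throughs).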

If you need to install either of these packages, run the following code:

install.packages(c("readxl","tidyverse"))

You can import the libraries now, or when they are used in the R walk-throughs below.

library(readxl)  
library(tidyverse)

Part 2.1 Collecting data by playing a public goods game

Note

You can still do Parts 2.2 and 2.3 without completing this part of the project.

Before taking a closer look at the experimental data, you will play a public goods game like the one in the introduction with your classmates to learn how experimental data can be collected. If your instructor has not set up a game, follow the instructions below to set up your own game.

Instructions: How to set up the public goods game

Form a group of at least four people. Choose one person to be the game administrator. The administrator will monitor the game, while the other people play the game.

Administrator

  1. Create the game: Go to the ‘Economics Games’ website, scroll down to the bottom of the page, and click ‘Create a Multiplayer Game and Get Logins’. Then click ‘Externalities and public goods’. Under the heading ‘Voluntary contribution to a public good’, click ‘Choose this Game’. Enter in the number of people playing the game, and select ‘1’ for the number of universes. Then click ‘Get Logins’. A pop-up will appear, showing the login IDs and passwords for the players and for the administrator.
  2. Start the game: Give each player a different login ID. The game should be played anonymously, so make sure that players do not know the login IDs of other players. You are now ready to start the first round of the game. There are ten rounds in total.
  3. Confirm that all the rounds are complete: On the top right corner of the webpage, click ‘Login’, enter your login ID and password, and then click the green ‘Login’ button. You will be taken to the game administration page, which will show the average contribution in each round, and the results of the round just played. Wait until all the players have finished playing ten rounds before refreshing this page.
  4. Collect the game results: Once the players have finished playing ten rounds, refresh this page. The table at the top of the page will now show the average contribution (in euros) for each of the ten rounds played. Select the whole table, then copy and paste it into a new worksheet in Excel.

Players

  1. Login: Once the administrator has created the game, go to the ‘Economics Games’ website. On the top right corner, click ‘Login’, enter the login ID and password that your administrator has given you, then click the green ‘Login’ button. You will be taken to the public goods game that your administrator has set up.
  2. Play the first round of the game: Read the instructions at the top of the page carefully before starting the game. In each round, you must decide how much to contribute to the public good. Enter your choice for each universe (group of players) that you are a part of (if the same players are in two universes, then make the same contribution in both), then click ‘Validate’.
  3. View the results of the first round: You will then be shown the results of the first round, including how much each player (including yourself) contributed, the payoffs, and the profits. Click ‘Next’ to start the next round.
  4. Complete all the rounds of the game: Repeat steps 2 and 3 until you have played ten rounds in total, then collect the results of the game from your administrator.

Use the results of the game you have played to answer the following questions.

  1. Make a line chart with average contribution as the vertical axis variable, and period (from 1 to 10) on the horizontal axis. Describe how average contributions have changed over the course of the game.

R walk-through 2.1 Plotting a line chart with multiple variables

Use the data from your own experiment to answer Question 1. As an example, we will use the data for the first three cities of the dataset that will be introduced in Part 2.2.

Period <- seq(1,10)

Copenhagen <- c(14.1,14.1,13.7,12.9,12.3,11.7,10.8,10.6,9.8,5.3)
Dniprop <- c(11.0,12.6,12.1,11.2,11.3,10.5,9.5,10.3,9.0,8.7)
Minsk <- c(12.8,12.3,12.6,12.3,11.8,9.9,9.9,8.4,8.3,6.9)

# Put the data into a data frame
data_ex <- data.frame(Period,Copenhagen,Dniprop,Minsk)

# Plot the first series, then add the other two with lines()
plot(data_ex$Period, data_ex$Copenhagen, type = "l", lwd = 2, col = "blue", xlab = "Round", ylim = c(4,16), ylab = "Average contribution")
lines(data_ex$Period, data_ex$Dniprop, col = "red", lwd = 2)  # Select colour and line width
lines(data_ex$Period, data_ex$Minsk, col = "green", lwd = 2)
title("Average contribution to public goods game: without punishment")
legend("bottomleft", legend=c("Copenhagen", "Dniprop", "Minsk"),
  col=c("blue", "red", "green"), lwd = 2, lty = 1, cex=1.2)

Figure 2.1 Average contributions in different locations.
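As a possible alternative to adding one lines() call per city, the chart can be drawn in a single call with the base R function matplot, which plots one line per column of a matrix. The sketch below reuses the example data from this walk-through; the helper names cities, line_cols and y_mat are introduced here purely for illustration.

```r
# Example data from the walk-through, held in a data frame
data_ex <- data.frame(
  Period     = 1:10,
  Copenhagen = c(14.1, 14.1, 13.7, 12.9, 12.3, 11.7, 10.8, 10.6, 9.8, 5.3),
  Dniprop    = c(11.0, 12.6, 12.1, 11.2, 11.3, 10.5, 9.5, 10.3, 9.0, 8.7),
  Minsk      = c(12.8, 12.3, 12.6, 12.3, 11.8, 9.9, 9.9, 8.4, 8.3, 6.9)
)

cities    <- c("Copenhagen", "Dniprop", "Minsk")  # columns to plot
line_cols <- c("blue", "red", "green")            # one colour per city
y_mat     <- as.matrix(data_ex[, cities])         # matplot expects a matrix

# One call draws all three lines; colours are matched to columns in order
matplot(data_ex$Period, y_mat, type = "l", lty = 1, lwd = 2, col = line_cols,
        xlab = "Round", ylab = "Average contribution", ylim = c(4, 16),
        main = "Average contribution to public goods game: without punishment")
legend("bottomleft", legend = cities, col = line_cols, lwd = 2, lty = 1)
```

This approach may be convenient when your own experiment has many series, because adding a city only requires extending cities and line_cols.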

  2. Compare your line chart with Figure 3 of Herrmann et al. (2008) [1]. Comment on any similarities or differences between the results (for example, the amount contributed at the start and end, or the change in average contributions over the course of the game).
  3. Can you think of any reasons why your results are similar to (or different from) those in Figure 3? You may find it helpful to read the ‘Experiments’ section of the Herrmann et al. (2008) study for a more detailed description of how the experiments were conducted.

Part 2.2 Describing the data

We will now use the data used in Figures 2A and 3 of Herrmann et al. (2008), and evaluate the effect of the punishment option on average contributions. Rather than compare two charts showing all of the data from each experiment, as the authors of the study did, we will use summary measures to compare the data, and show the data from both experiments (with and without punishment) on the same chart.

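First, download and save the data. The spreadsheet contains two tables: one with the average contributions in the experiment without punishment, and one with the average contributions in the experiment with punishment.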

R walk-through 2.2 Importing the datafile into R

Notice that a single Excel worksheet contains both of the tables you need. Note down the cell range of each table: in this case, A2:Q12 for the data without punishment and A16:Q26 for the data with punishment. We will now use these ranges to import the data into separate dataframes.

library(tidyverse) # This package provides useful functionality later.
library(readxl)

data_N <- read_excel("Public-goods-experimental-data.xlsx",range = "A2:Q12")
data_P <- read_excel("Public-goods-experimental-data.xlsx",range = "A16:Q26")

Look at the data either by opening the dataframes from the Environment window or by typing data_N or data_P into the Console.

You can see that in each period (row), the average contribution varies across countries, in other words, there is a distribution of average contributions in each period.

The mean and variance are two ways to summarize distributions. We will now use these measures, along with other measures (range and standard deviation), to summarize and compare the distribution of contributions in both experiments.

  1. Using the data for Figures 2A and 3 of Herrmann et al. (2008):

R walk-through 2.3 Calculating the mean using a loop or the apply function

Calculate mean contribution

We calculate the mean using two different methods, to illustrate that there are usually many ways of achieving the same thing. First we use a loop (for data_N) that calculates the average over all but the first column (data_N[row,2:17]; data_N[row,-1] selects the same columns only as long as no extra columns have been added to the dataframe). Then we use the apply function (for data_P).

# Use a loop for data_N

data_N$meanC <- 0

for (row in 1:nrow(data_N)) {
  data_N$meanC[row] <- rowMeans(data_N[row,2:17])
}

# Use the apply function for data_P
data_P$meanC <- apply(data_P[,2:17], 1,mean)

The apply function takes another function (the mean function in this case) and applies it to all rows in data_P[,2:17]. The second input, 1, tells apply to work row by row. An entry of 2 would have calculated column means instead. Type ?apply in your console for more details, or see R walk-through 2.5 for further practice.
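To see the difference between the two settings of the second input, here is a minimal sketch on a small made-up dataframe (toy and its values are hypothetical, chosen so that the means are easy to check by hand).

```r
# Hypothetical toy data: 3 periods (rows) and 4 locations (columns A to D)
toy <- data.frame(Period = 1:3,
                  A = c(10, 8, 6),
                  B = c(12, 10, 8),
                  C = c(14, 12, 10),
                  D = c(16, 14, 12))

row_means <- apply(toy[, 2:5], 1, mean)  # 1 = rows: one mean per period
col_means <- apply(toy[, 2:5], 2, mean)  # 2 = columns: one mean per location

row_means  # 13 11 9
col_means  # A = 8, B = 10, C = 12, D = 14

# rowMeans() is a built-in shortcut for the row-wise case
all.equal(unname(row_means), unname(rowMeans(toy[, 2:5])))
```

Note that rowMeans works on all rows at once, so the loop is not strictly needed when you only want row averages.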

Plot mean contribution

Now we will produce the line charts for the mean contributions.

plot(data_N$Period,data_N$meanC, type = "l",col="blue", lwd=2, xlab = "Round",ylim = c(4,14), ylab="Average contribution")
lines(data_P$meanC, col="red", lwd=2)
title("Average contribution to public goods game")
legend("bottomleft", legend=c("Without punishment", "With punishment"),
  col=c("blue", "red"), lty = 1, cex=1.2, lwd=2)

Figure 2.2 Average contribution to public goods game, with and without punishment.

The difference is stark: with punishment, contributions increase and then stabilize at around 13 units, whereas without punishment they decrease steadily from around 11 units to around 4 units over the ten rounds.

  2. Instead of looking at all periods, we can focus on contributions in the first and last period. Plot a column chart showing the mean contribution in the first and last period for both experiments. Your chart should look like Figure 2.3 below.

R walk-through 2.4 Drawing a column chart to compare two groups

To make a column chart, we will use the barplot function. We first extract the four data points we need (Period 1 and 10 for with and without punishment) and place them into a matrix, which we then input into the barplot function.

temp_d <- c(data_N$meanC[1],data_N$meanC[10],data_P$meanC[1],data_P$meanC[10])
temp <- matrix(temp_d, nrow = 2, ncol = 2, byrow = TRUE)
temp
##          [,1]      [,2]
## [1,] 10.57831  4.383769
## [2,] 10.63876 12.869879
barplot(temp, main="Mean contributions in a public goods game", ylab = "Contribution",
  beside=TRUE, col=c("Blue","Red"), names.arg = c("Round 1","Round 10"))
# Use filled boxes in the legend so that it matches the colours of the columns
legend("bottomleft", legend = c("Without punishment","With punishment"), fill = c("Blue","Red"))

Figure 2.3 Mean contributions in a public goods game.

Tip

Experimenting with these charts will help you to learn how to use R. The details of how to specify the column chart can be complicated, but you can see from Figure 2.3 what the options main, ylab, col and names.arg do. To figure out what the beside option does, try switching the option from TRUE to FALSE and see what happens.

variance
A measure of dispersion in a frequency distribution, equal to the mean of the squares of the deviations from the arithmetic mean of the distribution. The variance is used to indicate how ‘spread out’ the data is. A higher variance means that the data is more spread out. Example: The set of numbers 1, 1, 1 has zero variance (no variation), while the set of numbers 1, 1, 999 has a high variance of about 221,334 (large spread).
standard deviation
A measure of dispersion in a frequency distribution, equal to the square root of the variance. The standard deviation has a similar interpretation to the variance. A larger standard deviation means that the data is more spread out. Example: The set of numbers 1, 1, 1 has a standard deviation of zero (no variation or spread), while the set of numbers 1, 1, 999 has a standard deviation of about 470.5 (large spread).

The mean is one useful measure of the ‘middle’ of a distribution, but is not a complete description of what our data looks like. We also need to know how ‘spread out’ the data is in order to get a clearer picture and make comparisons between distributions. The variance is one way to measure spread: the higher the variance, the more spread out the data is.

A similar measure is the standard deviation, which is the square root of the variance. It is commonly used because there is a handy rule of thumb for large datasets: most of the data (about 95% if there are many observations) will lie within two standard deviations of the mean.
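The following sketch works through these definitions on a small made-up set of numbers (x is hypothetical). It also illustrates a detail worth knowing: R's var and sd functions divide by n − 1 rather than n, so they differ slightly from the ‘mean of squared deviations’. The last lines check the two-standard-deviation rule of thumb on simulated bell-shaped data, which is an assumption of this example rather than a property of the experimental data.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)    # hypothetical example data, mean = 5
n <- length(x)

deviations <- x - mean(x)                 # distance of each value from the mean
pop_var    <- sum(deviations^2) / n       # mean of the squared deviations
samp_var   <- sum(deviations^2) / (n - 1) # what var() computes

pop_var         # 4
sqrt(pop_var)   # 2
all.equal(samp_var, var(x))       # TRUE
all.equal(sqrt(samp_var), sd(x))  # TRUE

# Rule of thumb: share of simulated normal observations within 2 sd of the mean
set.seed(1)
z <- rnorm(10000)
share_within <- mean(abs(z - mean(z)) < 2 * sd(z))
share_within    # should be close to 0.95
```

With only a handful of observations the two versions of the variance can differ noticeably; with many observations the difference becomes negligible.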

  3. Using the data for Figures 2A and 3 of Herrmann et al. (2008):

R walk-through 2.5 Calculating and understanding the standard deviation

In order to calculate these standard deviations and variances, we will use the apply function, which we introduced in R walk-through 2.3. As we saw, apply is a command asking R to apply a function to each row or each column of a dataframe, and the basic structure is as follows: apply(dataframe, dimension (1 for rows or 2 for columns), function to apply). So to calculate the variances, use the following command:

data_N$varC <- apply(data_N[,2:17],1,var)

Here we take data_N[,2:17] and apply the var function to each row (recall that the entry 1 does this; 2 would indicate columns). Note that we exclude the first column from the calculation, as that indicates the period and not one of the locations given in columns 2:17. The result is saved to a new variable called varC.

The same principle is now extended to the standard deviation calculation and the data_P dataframe.

data_N$sdC <- apply(data_N[,2:17],1,sd)
data_P$varC <- apply(data_P[,2:17],1,var)
data_P$sdC <- apply(data_P[,2:17],1,sd)

To determine whether 95% of the observations fall within two standard deviations of the mean, we can use a line chart. As we have 16 countries in every period, we would expect about one observation (0.05 × 16 = 0.8) to fall outside this interval.

citylist <- names(data_N[2:17])

plot(data_N$Period,data_N$meanC, type = "l",col="blue", lwd=2, xlab = "Round",ylim = c(0,20),    
  ylab="Average contribution")
lines(data_N$meanC+2*data_N$sdC,col="red",lwd=2) # mean + 2 sd
lines(data_N$meanC-2*data_N$sdC,col="red",lwd=2) # mean – 2 sd

for(i in citylist) {
  points(data_N[[1]],data_N[[i]])
}

title("Contribution to public goods game without punishment")
legend("bottomleft", legend=c("Mean", "+/- 2 sd"),
  col=c("blue", "red"), lwd=2, lty = 1,cex=1.2)

Figure 2.4 Contribution to public goods game without punishment.

None of the observations falls outside the mean ± two standard deviations interval for the without punishment public goods game. Let’s see the equivalent picture for the version with punishment.

citylist <- names(data_N[2:17])

plot(data_P$Period,data_P$meanC, type = "l",col="blue", xlab = "Round",ylim = c(0,22), ylab="Average contribution")
lines(data_P$meanC+2*data_P$sdC,col="red") # mean + 2 sd
lines(data_P$meanC-2*data_P$sdC,col="red") # mean – 2 sd

for(i in citylist) {
  points(data_P[[1]],data_P[[i]])
}

title("Contribution to public goods game with punishment")
legend("bottomleft", legend=c("Mean", "+/- 2 sd"),
  col=c("blue", "red"), lty = 1,cex=1.2)

Figure 2.5 Contribution to public goods game with punishment.

Here it looks as if we only have one observation outside the interval (in Period 8). In that aspect the two experiments look similar. However, from comparing these two plots, we see that the game with punishment displays a greater variation of responses than the game without punishment. In other words, there is a larger standard deviation and variance for the observations coming from the game with punishment.
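Reading the number of outlying points off a chart can be error-prone, so the same check can also be done numerically. The sketch below uses a small made-up matrix (toy, with two periods in rows and eight locations in columns); the values are hypothetical and chosen so that exactly one observation in the second row lies outside the interval.

```r
# Hypothetical toy data: 2 periods (rows), 8 locations (columns)
toy <- rbind(c(8, 9, 10, 10, 11, 11, 12, 13),
             c(10, 10, 10, 10, 10, 10, 10, 26))

row_mean <- apply(toy, 1, mean)  # mean of each period
row_sd   <- apply(toy, 1, sd)    # standard deviation of each period

# TRUE where an observation lies more than two sd away from its row mean.
# The vectors have one entry per row, so R recycles them down each column.
outside <- abs(toy - row_mean) > 2 * row_sd

n_outside <- rowSums(outside)    # count of outlying observations per period
n_outside                        # 0 1
```

With the walk-through data, using as.matrix(data_N[,2:17]) or as.matrix(data_P[,2:17]) in place of toy should give the corresponding count for each period, assuming the dataframes were imported as described above.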

range
The interval formed by the smallest (minimum) and the largest (maximum) value of a particular variable. The range shows the two most extreme values in the distribution, and can be used to check whether there are any outliers in the data. (Outliers are a few observations in the data that are very different from the rest of the observations.)

Another measure of spread is the range, which is the interval formed by the smallest (minimum) and the largest (maximum) values of a particular variable. For example, we might say that the number of periods in the public goods experiment ranges from 1 to 10. Once we know the most extreme values in our dataset, we have a better picture of what our data looks like.

  4. Calculate the maximum and minimum value for Periods 1 and 10 separately, for both experiments.

R walk-through 2.6 Finding the minimum, maximum, and range of a variable

To calculate the range for both experiments and for all periods, we will use the apply function again.

data_P$rangeC <- apply(data_P[,2:17],1,range)

Unfortunately, when you execute this command you are likely to get an error message that reads something like this: Error in $<-.data.frame(*tmp*, "rangeC", value = c(5.81818199157715, : replacement has 2 rows, data has 10.

Let’s investigate why this is the case by picking one data row and calculating the range.

range(data_N[1,2:17])
## [1] 7.958333 14.102941

You can see that we get two values, the minimum and the maximum. However, we need the range as a single number (maximum minus minimum). So we need a short intermediate step, where we calculate the minimum and maximum for every period in our dataset (using the apply function) and save the result in a new variable called temp.

temp <- apply(data_N[,2:17],1,range)
temp
##           [,1]      [,2]     [,3]     [,4]     [,5]      [,6]      [,7]
## [1,]  7.958333  6.272727  6.25000  5.97500  5.42500  4.546875  3.921875
## [2,] 14.102941 14.132353 13.72059 12.89706 12.33824 11.676471 10.779412
##          [,8]     [,9]    [,10]
## [1,]  3.15625 2.171875 1.300000
## [2,] 10.63235 9.764706 8.681818

Now we use the difference between the respective maximums and minimums to define our new range variable rangeC.

data_N$rangeC <- temp[2,] - temp[1,]

And now we do the same for data_P:

temp <- apply(data_P[,2:17],1,range)
data_P$rangeC <- temp[2,] - temp[1,]
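The intermediate step can also be folded into a single call. As a minimal sketch on made-up data (toy is hypothetical), we can pass apply a small function of our own that returns one number per row: diff applied to the two values returned by range gives the maximum minus the minimum.

```r
# Hypothetical toy data: 3 periods (rows) and 3 locations (columns)
toy <- data.frame(Period = 1:3,
                  A = c(10, 8, 6),
                  B = c(12, 10, 8),
                  C = c(14, 12, 11))

# range() returns c(min, max); diff() turns that pair into max - min
toy$rangeC <- apply(toy[, 2:4], 1, function(x) diff(range(x)))
toy$rangeC   # 4 4 5
```

Because the function returns a single value for each row, the result has one entry per period and can be stored directly as a new column.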

Let’s create a chart of the ranges for both experiments for all periods in order to compare them.

plot(data_N$Period,data_N$rangeC, type = "l",col="blue", lwd=2, xlab = "Round",ylim = c(4,14),   
  ylab="Range of contributions")
lines(data_P$rangeC,col="red",lwd=2)
title("Range of contributions to public goods game")
legend("bottomright", legend=c("Without punishment", "With punishment"),
  col=c("blue", "red"), lwd=2,lty = 1,cex=1.2)

Figure 2.6 Range of contributions to public goods game.

This chart confirms what we found in R walk-through 2.5, which is that there is a greater spread (variation) of contributions in the game with punishment.

  5. A concise way to describe the data is in a summary table. With just four numbers (mean, standard deviation, minimum value, maximum value), we can get a general idea of what the data looks like.

R walk-through 2.7 Creating a table of summary statistics

We have already done most of the work for creating this summary table in R walk-through 2.6. Since we also want to display the minimum and maximum values, we should add these to the dataframes.

data_N$minC <- apply(data_N[,2:17],1,min)
data_N$maxC <- apply(data_N[,2:17],1,max)
data_P$minC <- apply(data_P[,2:17],1,min)
data_P$maxC <- apply(data_P[,2:17],1,max)

Now we display the statistics. Here we enclose our command in the round function, which reduces the number of digits displayed after the decimal point and makes the table a little easier to read.

print("Public goods game without punishment")
## [1] "Public goods game without punishment"
# We want to see Rows 1 and 10, Column 1 (which contains the period number)
# and Columns 18 to 23 (which contain the statistics).
round(data_N[c(1,10),c(1,18:23)], digits=2)

## # A tibble: 2 x 7
##   Period meanC  varC   sdC rangeC  minC  maxC
##    <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1      1 10.58  4.08  2.02   6.14  7.96 14.10 
## 2     10  4.38  4.78  2.19   7.38  1.30  8.68

Let’s repeat this command for the version with punishment.

# round(..., digits=2) below limits the output to two decimal places
print("Public goods game with punishment")
## [1] "Public goods game with punishment"
round(data_P[c(1,10),c(1,18:23)], digits=2)
## # A tibble: 2 x 7
##   Period meanC  varC   sdC rangeC  minC  maxC
##    <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1      1 10.64 10.29  3.21  10.20  5.82 16.02
## 2     10 12.87 15.19  3.90  11.31  6.20 17.51
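If you only need the summary table and not the intermediate columns, one possible shortcut is to compute all four statistics for each row in a single apply call. The sketch below uses made-up data (toy is hypothetical) and the names stats and summary_table are introduced here only for illustration.

```r
# Hypothetical toy data: 3 periods (rows) and 4 locations (columns)
toy <- data.frame(Period = 1:3,
                  A = c(10, 8, 6),
                  B = c(12, 10, 8),
                  C = c(14, 12, 10),
                  D = c(16, 14, 12))

# For each row, return a named vector of the four summary statistics.
# apply() stacks the results as columns, so t() flips them back into rows.
stats <- t(apply(toy[, 2:5], 1, function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}))

summary_table <- data.frame(Period = toy$Period, round(stats, 2))
summary_table
```

Each row of summary_table then describes one period, in the same spirit as the tables shown above.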

Part 2.3 Did changing the rules of the game have a significant effect on behaviour?

The punishment option was introduced into the public goods game in order to see whether it could help sustain contributions, compared to the game without a punishment option. We will now use a method called a hypothesis test to compare the results from both experiments more formally.

By comparing the results in Period 10 of both experiments, we can see that the mean contribution in the experiment with punishment is 8.5 units higher than in the experiment without punishment (see Figure 2.3). Is it more likely that this difference is due to chance, or is it more likely to be due to the difference in experimental conditions?

  1. You can conduct another experiment to understand why we might see differences in behaviour that are due to chance.

The important point to note is that even when we conduct experiments under the same controlled conditions, due to an element of randomness we may not observe the exact same behaviour each time we do the experiment.

statistically significant
When a relationship between two or more variables is unlikely to be due to chance, given the assumptions made about the variables (for example, having the same mean). Statistical significance does not tell us whether there is a causal link between the variables.

If the observed differences are not likely to be due to chance, then we say the differences are statistically significant. To determine whether the difference in means is statistically significant or not, we need to consider the size of the difference relative to the standard deviation of both distributions (how spread out the data is).

The fact that statistical significance relies on a relative comparison is very important. The size of the difference alone cannot tell us whether something is statistically significant or not. In fact, even if the observed differences are large, it is not a guarantee that the differences are statistically significant. Figures 2.7 and 2.8 show the mean exam score of two groups (represented by the height of the columns, and reported in the boxes above the columns), with the dots representing the underlying data. Figure 2.7 shows that a relatively large difference in means is not statistically significant because the data is widely spread out (the standard deviation is large), while Figure 2.8 shows that a relatively small difference is statistically significant because the data is tightly clustered together (the standard deviation is very small). In Figure 2.7, the difference in means is likely to be due to chance, but in Figure 2.8, the difference in means is not likely to be due to chance.

Figure 2.7 An example of a large difference in means that is not statistically significant.

Figure 2.8 An example of a small difference in means that is statistically significant.
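The point made by these two figures can be reproduced with a few lines of R. The exam scores below are made up for illustration (they are not the data behind the figures): the first pair of groups has a difference in means of 10 but widely spread scores, while the second pair has a difference of only 2 but tightly clustered scores. The sketch uses the t.test function, which is introduced in R walk-through 2.8.

```r
# Hypothetical exam scores, chosen by hand for illustration
wide_a  <- c(30, 50, 70, 90)    # mean 60, widely spread
wide_b  <- c(40, 60, 80, 100)   # mean 70, widely spread
tight_a <- c(59, 60, 60, 61)    # mean 60, tightly clustered
tight_b <- c(61, 62, 62, 63)    # mean 62, tightly clustered

p_wide  <- t.test(wide_a, wide_b)$p.value    # large difference, large spread
p_tight <- t.test(tight_a, tight_b)$p.value  # small difference, small spread

p_wide    # well above 0.10: not statistically significant
p_tight   # below 0.05: statistically significant at the 5% level
```

The larger difference in means yields the larger p-value, because the difference is small relative to the spread of the data.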

hypothesis test
A test in which a null (default) and an alternative hypothesis are posed about some characteristic of the population. Sample data is then used to test how likely it is that these sample data would be seen if the null hypothesis was true.
p-value
The probability of observing the data collected, assuming that the two groups have the same mean. The p-value ranges from 0 to 1, where lower values indicate a higher probability that the underlying assumption (same means) is false. The lower the probability (the lower the p-value), the less likely it is to observe the given data, and therefore the more likely it is that the assumption is false (the means of both distributions are not the same).

To determine statistical significance, we conduct a hypothesis test, which uses the size of the difference and the standard deviation to calculate the probability (called a p-value) of seeing the data we observe, assuming that the means of both distributions are the same. Since the p-value is a probability, it ranges from 0 to 1 (inclusive). The smaller the probability (the smaller the p-value), the less likely it is that we will observe the given data, so the more likely it is that our assumption is false (in other words, the means of both distributions are not the same).

significance level
A cut-off probability that determines whether a p-value is considered statistically significant. If a p-value is smaller than the significance level, it is considered unlikely that the differences observed are due to chance, given the assumptions made about the variables (for example, having the same mean). Common significance levels are 1% (p-value of 0.01), 5% (p-value of 0.05), and 10% (p-value of 0.1). See also: statistically significant, p-value.

Our conclusions will depend on our definition of a ‘small’ probability. We define ‘small’ by choosing a cut-off (a percentage) also referred to as a significance level. Any probability smaller than that cut-off would be considered ‘small’. Some commonly used cut-offs are 1% (p-value of 0.01), 5% (p-value of 0.05), and 10% (p-value of 0.1).

Find out more: Hypothesis test

When we conduct a hypothesis test, we formulate a null hypothesis (in this case, that the two means are identical) and an alternative hypothesis (that the two means are different). At the end of the hypothesis test procedure, we will either reject the null hypothesis (sometimes called H0) or not reject the null hypothesis. Essentially we will reject the null hypothesis if the p-value (the probability of seeing data similar to the data observed if the null hypothesis was true) is smaller than a certain cut-off level. Some commonly used cut-offs are 1% (0.01), 5% (0.05) and 10% (0.10). If we reject the null hypothesis we also sometimes say that we have found a statistically significant difference at a (say) 5% significance level.

You may wonder how we should choose that cut-off level. Importantly this cut-off level describes the probability of a Type-I error. When we make decisions on sample data, as we do here, we may come to a conclusion that is erroneous. In particular we may actually reject the null hypothesis, while in reality the null hypothesis was true (which is what we call a Type-I error). The cut-off level above is equivalent to the probability of making a Type-I error. So if we chose a 5% cut-off (significance level) we implicitly accept that even if the null hypothesis was true, there is a 5% chance that we will reject it.
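A small simulation can make the meaning of a Type-I error more concrete. In the sketch below, all numbers (16 observations per group, a mean of 10, a standard deviation of 3, and 2,000 repetitions) are assumptions made for illustration. Both groups are drawn from the same distribution, so the null hypothesis is true by construction, and every rejection is a Type-I error.

```r
set.seed(123)     # make the simulation reproducible
n_sims <- 2000    # number of simulated experiments

# Each repetition draws two groups from the SAME distribution and
# stores the p-value of the t-test comparing their means
p_values <- replicate(n_sims, {
  group_a <- rnorm(16, mean = 10, sd = 3)
  group_b <- rnorm(16, mean = 10, sd = 3)
  t.test(group_a, group_b)$p.value
})

# Share of experiments in which a true null hypothesis is rejected at 5%
type1_rate <- mean(p_values < 0.05)
type1_rate    # should be close to 0.05
```

Replacing 0.05 with 0.01 in the last step should give a rejection rate of roughly 1%, which illustrates how the cut-off level controls the probability of a Type-I error.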

Now that we understand what this cut-off level represents (the probability of making a Type-I error if H0 is true), we can return to the initial question of what our cut-off (significance) level should be. There is not one ‘correct’ level, although 5% is a standard level people use without thinking (and that can cause problems). What matters is how costly a Type-I error is. If it is very costly, you should choose a small significance/cut-off level, perhaps even smaller than 1%. Would we want to set this significance level as low as possible to avoid such errors? The trade-off when setting a lower significance level is that it is more difficult to reject the null hypothesis, even if it is incorrect (which of course we don’t know).

An example of such a situation is testing a new medication that is known to have significant side-effects, but may be useful for a serious medical condition. We would start with the null hypothesis that the medication has no effect (H0), and would only want to reject the null hypothesis if there is significant evidence that the medication is very useful for the intended purpose. But given that there are known significant side-effects, we would want to keep the significance level low, so that we are not exposing patients to the side effects without any benefits.

We will calculate the p-value for our hypothesis test, testing whether the introduction of punishment has changed the contributions.

  2. Using the data for Figures 2A and 3:

R walk-through 2.8 Calculating a t-test for means

We need to extract the observations in Period 1 for the data with and without punishment, and then feed the observations into the t.test function. The t.test function is extremely flexible: if you feed in an x and a y variable as shown below, it will automatically test the null hypothesis that the means of both series are equal (in other words, that the difference in means is 0).

p1_N <- data_N[1,2:17]
p1_P <- data_P[1,2:17]
t.test(x=p1_N,y=p1_P)

Unfortunately, if you run this code, you are likely to get an error message like this:

Error: Unsupported use of matrix or array for column indexing.

The reason for this is that p1_N and p1_P are still tibbles (dataframes) with one observation and 16 variables. The t.test function requires one variable (with many observations), but we are giving it one observation for 16 variables, so we need to change this. One way to do this is to transpose (the same idea as a vector transpose) from one observation and many variables to many observations and one variable, which is done using the t function.

p1_N <- t(data_N[1,2:17])
p1_P <- t(data_P[1,2:17])
t.test(x=p1_N,y=p1_P)
##
## Welch Two Sample t-test
##
## data: p1_N and p1_P
## t = -0.063782, df = 25.288, p-value = 0.9496
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.011123 1.890231
## sample estimates:
## mean of x mean of y
## 10.57831 10.63876

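This result delivers a p-value of 0.9496. This means that data like those we observed would be very likely if H0 were true, so we cannot reject the null hypothesis.

The test above treats the two sets of observations as independent samples. Because both experiments cover the same 16 locations, we can instead compare each location with itself by running a paired t-test, which is done by setting the paired option to TRUE.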

p1_N <- t(data_N[1,2:17])
p1_P <- t(data_P[1,2:17])
t.test(x=p1_N,y=p1_P,paired = TRUE)
##
## Paired t-test
##
## data: p1_N and p1_P
## t = -0.14996, df = 15, p-value = 0.8828
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.9195942 0.7987027
## sample estimates:
## mean of the differences
## -0.06044576

As you can see, the p-value does change. It becomes smaller as we can attribute more of the differences to the ‘with punishment’ treatment, but the p-value is still very large (0.8828), so we still conclude that there is no evidence that the contributions in Period 1 differ.
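To see what the paired option does, it can help to note that a paired t-test is equivalent to a one-sample t-test on the differences between the two series. The sketch below checks this on made-up contributions for five locations (without_p and with_p are hypothetical values, not the experimental data). It also shows unlist as a possible alternative to t for turning a one-row dataframe into a vector.

```r
# Hypothetical contributions for the same five locations under two treatments
without_p <- c(10.2, 8.5, 12.1, 9.8, 11.4)
with_p    <- c(11.0, 9.1, 12.0, 10.9, 12.3)

paired   <- t.test(without_p, with_p, paired = TRUE)
one_samp <- t.test(without_p - with_p)  # one-sample test on the differences

# Both tests give the same p-value
all.equal(paired$p.value, one_samp$p.value)   # TRUE

# unlist() turns a one-row dataframe into a (named) numeric vector
row_df  <- data.frame(A = 10.2, B = 8.5, C = 12.1)
row_vec <- unlist(row_df[1, ])
row_vec
```

The paired test therefore looks only at how much each location changed between treatments, which removes the variation between locations from the comparison.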

  3. Using the data for Period 10:

spurious correlation
A strong linear association between two variables that does not result from any direct relationship, but instead may be due to coincidence or to another unseen factor.

An important point to note is that statistical significance cannot tell us anything about causation. In the example of house size and exam scores shown in Figure 2.8, there was a statistically significant relationship between the two variables (students living in a three-bedroom house had higher exam scores, on average, than students living in a two-bedroom house). However, we cannot say that the larger size of the house was the cause of higher exam scores, because it is unlikely that building an extra room would automatically make someone smarter. Statistical significance cannot help us detect these spurious correlations.

However, experiments can help us determine whether there is a causal link between two variables or not. If we conduct an experiment and find a statistically significant difference in outcomes, then we can conclude that one variable is the cause of the other.

  4. Refer to the results from the public goods games.

Experiments can be useful for identifying causal links. However, if people’s behaviour in experimental conditions were different from their behaviour in the real world, our results would not be applicable anywhere outside the experiment.

  5. Discuss some limitations of experiments, and suggest some ways to address (or partially address) them. (You may find pages 158–171 of the paper ‘What do laboratory experiments measuring social preferences reveal about the real world?’ helpful, as well as the discussion on free-riding and altruism in Section 2.6 of Economy, Society, and Public Policy.)

[1] Benedikt Herrmann, Christian Thöni, and Simon Gächter. 2008. Figure 3 in ‘Antisocial punishment across societies’. Science Magazine 319 (5868): p. 1365.