Empirical Project 11 Working in R
Getting started in R
For this project you will need the following packages:
tidyverse, to help with data manipulation
readxl, to import an Excel spreadsheet
knitr, to format tables
psych, to compute Cronbach’s alpha.
If you need to install any of these packages, run the following code:
install.packages(c("tidyverse", "readxl", "knitr", "psych"))
You can import the libraries now, or when they are used in the R walkthroughs below.
library(tidyverse)
library(readxl)
library(knitr)
library(psych)
Part 11.1 Summarizing the data
We will be using data collected from an internet survey sponsored by the German government.
First, download the survey data and documentation:
 Download the data. Open the file in Excel and read the Data dictionary tab and make sure you know what each variable represents. (Later we will discuss exactly how some of these variables were coded.)
 The paper ‘Data in Brief’ gives a summary of how the survey was conducted. You may find it helpful to read it before starting on the questions below.
 While contingent valuation methods can be useful, they also have shortcomings. Read Section 5 of the paper ‘Introduction to economic valuation methods’ (pages 16–19), and explain which limitations you think apply particularly to the survey we are looking at.
 Likert scale
 A numerical scale (usually ranging from 1–5 or 1–7) used to measure attitudes or opinions, with each number representing the individual’s level of agreement or disagreement with a particular statement.
Before comparing between question types, we will first compare the people assigned to each question type to see if they are similar in demographic characteristics and attitudes towards related topics (such as beliefs about climate change and need for government intervention). Attitudes were assessed using a 1–5 Likert scale, where 1 = strongly disagree, and 5 = strongly agree.
 Recode or create the variables as specified:
 Reverse-code the following variables (so that 1 is now 5, 2 is now 4, and so on): cog_2, cog_5, scepticism_6, scepticism_7.
 For the variables WTP_plmin and WTP_plmax, create new variables with the values replaced as shown in Figure 11.1 below (these are the actual amounts, in euros, that individuals selected in the survey, and will be useful for calculating summary measures later).
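| Original value | New value (euros) |
|---:|---:|
| 1 | 48 |
| 2 | 72 |
| 3 | 84 |
| 4 | 108 |
| 5 | 156 |
| 6 | 192 |
| 7 | 252 |
| 8 | 324 |
| 9 | 432 |
| 10 | 540 |
| 11 | 720 |
| 12 | 960 |
| 13 | 1,200 |
| 14 | 1,440 |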
WTP survey categories (original value) and euro amounts (new value).
R walkthrough 11.1 Importing data and recoding variables
Before importing data in Excel or csv format, check it in a spreadsheet program (such as Excel) to ensure you understand the structure of the data and whether any additional options are required for the read_excel function. In this case the data is in a worksheet called ‘Data’, there are no missing values to worry about, and the first row contains the variable names. We can therefore import the data using the read_excel function without any additional options.
library(tidyverse)
library(readxl)
library(knitr)

WTP <- read_excel("Project 11 datafile.xlsx", sheet = "Data")
Reverse-code variables
The first task is to recode variables related to the respondents’ views on certain aspects of government behaviour and attitudes about global warming. This coding makes the interpretation of high and low values consistent across all questions, since the survey questions do not have this consistency.
Note that even though the value of 3 for these variables will stay the same, for the recode function to work properly we have to include a mapping from every original value to its new value.
WTP <- WTP %>%
  mutate_at(c("cog_2", "cog_5", "scepticism_6", "scepticism_7"),
            ~ recode(., "1" = 5, "2" = 4, "3" = 3, "4" = 2, "5" = 1))
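The reverse-coding rule can be checked on a small example before applying it to the survey data. The sketch below uses a made-up response vector (not survey data) and base R only; it compares an explicit lookup with the arithmetic shortcut of subtracting each value from 6, which is an equivalent rule on a 1–5 scale.

```r
# Toy 1-5 Likert responses (illustrative values, not from the survey)
x <- c(1, 2, 3, 4, 5, 3, 1)

# Explicit lookup: position i of the vector holds the new value for old value i
lookup <- c(5, 4, 3, 2, 1)
x_lookup <- lookup[x]

# Arithmetic shortcut: (min + max) - x, which is 6 - x on a 1-5 scale
x_arith <- 6 - x

# Both approaches should give the same reverse-coded values
identical(x_lookup, x_arith)
```

The arithmetic version is shorter, but the explicit mapping makes the intended recoding visible, which is helpful when a scale does not run over consecutive integers.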
Create new variables containing WTP amounts
Although we could employ the same technique as above to recode the value for the minimum and maximum willingness to pay variables, an alternative is to use the merge function. This function allows us to combine two dataframes via values given in a particular variable.
We start by creating a new dataframe (category_amount) that has two variables: the original category value and the corresponding new euro amount. We then apply the merge function to the WTP dataframe and the new dataframe, specifying the variable in WTP using by.x and in the new dataframe using by.y as the variables that link the data in each dataframe together. We also use the all.x=TRUE option, otherwise the merge function will drop any observations with missing values for the WTP_plmin and WTP_plmax variables. Finally we have to rename the column of the merged new values to something more meaningful.
# Vector containing the euro amounts
wtp_euro_levels <- c(48, 72, 84, 108, 156, 192, 252, 324, 432, 540, 720, 960, 1200, 1440)

# Create mapping dataframe
category_amount <- data.frame(original = 1:14, new = wtp_euro_levels)

# Create a new variable for the minimum WTP
WTP <- merge(WTP, category_amount, by.x = "WTP_plmin", by.y = "original", all.x = TRUE) %>%
  rename(., "WTP_plmin_euro" = "new")

# Create a new variable for the maximum WTP
WTP <- merge(WTP, category_amount, by.x = "WTP_plmax", by.y = "original", all.x = TRUE) %>%
  rename(., "WTP_plmax_euro" = "new")
 Create the following indices, giving them an appropriate name in your spreadsheet (make sure to use the reverse-coded variable wherever relevant):
 Belief that climate change is a real phenomenon: Take the mean of scepticism_2, scepticism_6, and scepticism_7.
 Preferences for government intervention to solve problems in society: Take the mean of cog_1, cog_2, cog_3, cog_4, cog_5, and cog_6.
 Feeling of personal responsibility to act pro-environmentally: Take the mean of PN_1, PN_2, PN_3, PN_4, PN_6, and PN_7.
R walkthrough 11.2 Creating indices
We can create all of the required indices in three steps using the rowMeans function. In each step we use the cbind function to join the required variables (columns) together as a matrix. As the data is stored as a single observation per row, the index value is the average of the values in each row of this matrix, which we calculate using the rowMeans function.
WTP <- WTP %>%
  rowwise() %>%   # Ensure subsequent operations are applied by row
  mutate(., climate = rowMeans(cbind(scepticism_2, scepticism_6, scepticism_7))) %>%
  mutate(., gov_intervention = rowMeans(cbind(cog_1, cog_2, cog_3, cog_4, cog_5, cog_6))) %>%
  mutate(., pro_environment = rowMeans(cbind(PN_1, PN_2, PN_3, PN_4, PN_6, PN_7))) %>%
  ungroup()       # Return the dataframe to the original format
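Because rowMeans already works row by row on a matrix or dataframe, the same index can be sketched in base R without rowwise. The toy values below are made up, and the missing value in the third row is included only to show how na.rm changes the result; the survey items themselves have no missing values.

```r
# Toy item responses (illustrative values only)
toy <- data.frame(
  scepticism_2 = c(1, 4, 5),
  scepticism_6 = c(2, 4, 3),
  scepticism_7 = c(3, 4, NA)
)
items <- c("scepticism_2", "scepticism_6", "scepticism_7")

# Row-wise mean of the selected columns; a row with any NA gives NA
toy$climate <- rowMeans(toy[, items])

# With na.rm = TRUE the mean is taken over the items that were answered
toy$climate_narm <- rowMeans(toy[, items], na.rm = TRUE)

toy
```

Whether to average over the answered items or to leave the index missing is a modelling choice; an index based on fewer items is less reliable.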
 Cronbach’s alpha
 A measure used to assess the extent to which a set of items is a reliable or consistent measure of a concept. This measure ranges from 0–1, with 0 meaning that all of the items are independent of one another, and 1 meaning that all of the items are perfectly correlated with each other.
When creating indices, we may be interested to see if each item used in the index measures the same underlying concept of interest (known as reliability or consistency). There are two common ways to assess reliability: either look at the correlation between items in the index, or use a summary measure called Cronbach’s alpha (this measure is used in the social sciences).
Cronbach’s alpha is a way to summarize the correlations between many variables, and ranges from 0 to 1, with 0 meaning that all of the items are independent of one another, and 1 meaning that all of the items are perfectly correlated with each other. While higher values of this measure indicate that the items are closely related and therefore measure the same concept, with values that are very close to 1 (or 1), we could be concerned that our index contains redundant items (for example, two items that tell us the same information, so we would only want to use one or the other, but not both). You can read more about this in the paper ‘Using and interpreting Cronbach’s Alpha’.
 Calculate correlation coefficients and interpret Cronbach’s alpha:
 For one of the indices you created in Question 3, create a correlation table to show the correlation between each of the items in the index. Figure 11.2 shows an example for Question 3(a). (Remember that the correlation between A and B is the same as the correlation between B and A, so you only need to calculate the correlation for each pair of items once). Are the items in that index strongly correlated?
 Use the alpha function (part of the psych package) to compute the Cronbach’s alpha for these indices. Interpret these values in terms of index reliability.
| | scepticism_2 | scepticism_6 | scepticism_7 |
|---|---|---|---|
| scepticism_2 | 1 | – | – |
| scepticism_6 | | 1 | – |
| scepticism_7 | | | 1 |
Correlation table for items in Question 3(a).
R walkthrough 11.3 Calculating correlation coefficients
Calculate correlation coefficients
We covered calculating correlation coefficients in R walkthrough 10.1. In this case, since there are no missing values we can use the cor function without any additional options.
For the questions on climate change:
cor(cbind(WTP$scepticism_2,WTP$scepticism_6, WTP$scepticism_7))
##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.3904296 0.4167478
## [2,] 0.3904296 1.0000000 0.4624211
## [3,] 0.4167478 0.4624211 1.0000000
For the questions on government behaviour:
cor(cbind(WTP$cog_1, WTP$cog_2, WTP$cog_3, WTP$cog_4, WTP$cog_5, WTP$cog_6))
##           [,1]      [,2]       [,3]      [,4]       [,5]      [,6]
## [1,] 1.0000000 0.2509464 0.32358783 0.6823385 0.28925672 0.4141992
## [2,] 0.2509464 1.0000000 0.11761093 0.2771883 0.40794667 0.0828661
## [3,] 0.3235878 0.1176109 1.00000000 0.3347662 0.01818617 0.3128608
## [4,] 0.6823385 0.2771883 0.33476619 1.0000000 0.27424993 0.4597244
## [5,] 0.2892567 0.4079467 0.01818617 0.2742499 1.00000000 0.1045843
## [6,] 0.4141992 0.0828661 0.31286082 0.4597244 0.10458434 1.0000000
For the questions on personal behaviour:
cor(cbind(WTP$PN_1, WTP$PN_2, WTP$PN_3, WTP$PN_4, WTP$PN_6, WTP$PN_7))
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 1.0000000 0.4824823 0.4282149 0.4226534 0.4138090 0.4584007
## [2,] 0.4824823 1.0000000 0.6315015 0.4375971 0.4994126 0.6542377
## [3,] 0.4282149 0.6315015 1.0000000 0.4596711 0.5219712 0.5894731
## [4,] 0.4226534 0.4375971 0.4596711 1.0000000 0.5668642 0.3947270
## [5,] 0.4138090 0.4994126 0.5219712 0.5668642 1.0000000 0.4551294
## [6,] 0.4584007 0.6542377 0.5894731 0.3947270 0.4551294 1.0000000
Calculate Cronbach’s alpha
It is straightforward to compute the Cronbach’s alpha using the alpha function.
psych::alpha(WTP[c("scepticism_2", "scepticism_6", "scepticism_7")])$total$std.alpha
## [1] 0.6876079
psych::alpha(WTP[c("cog_1", "cog_2", "cog_3", "cog_4", "cog_5", "cog_6")])$total$std.alpha
## [1] 0.7102249
psych::alpha(WTP[c("PN_1", "PN_2", "PN_3", "PN_4", "PN_6", "PN_7")])$total$std.alpha
## [1] 0.8543827
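The std.alpha value is the standardized Cronbach’s alpha, which depends only on the number of items k and the mean inter-item correlation r_bar: alpha = k * r_bar / (1 + (k - 1) * r_bar). As a rough check, the sketch below applies this formula to the fifteen distinct correlations printed above for the personal behaviour items. The helper name std_alpha_from_r is an assumption for illustration, not part of the psych package.

```r
# Standardized alpha from a vector of distinct inter-item correlations.
# std_alpha_from_r is a hypothetical helper written for this sketch.
std_alpha_from_r <- function(r, k) {
  r_bar <- mean(r)                    # mean inter-item correlation
  k * r_bar / (1 + (k - 1) * r_bar)   # standardized alpha formula
}

# The 15 distinct correlations between PN_1, PN_2, PN_3, PN_4, PN_6 and PN_7,
# copied from the correlation matrix shown in this walkthrough
pn_r <- c(0.4824823, 0.4282149, 0.4226534, 0.4138090, 0.4584007,
          0.6315015, 0.4375971, 0.4994126, 0.6542377,
          0.4596711, 0.5219712, 0.5894731,
          0.5668642, 0.3947270,
          0.4551294)

alpha_pn <- std_alpha_from_r(pn_r, k = 6)
alpha_pn
```

The formula shows why alpha rises both when items are more strongly correlated and when more items are added, so a high alpha on a long index does not by itself mean the items are closely related.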
Now we will compare characteristics of people in the dichotomous choice (DC) group and two-way payment ladder (TWPL) group (the variable abst_format indicates which group an individual belongs to). Since the groups are of different sizes, we will use percentages instead of frequencies.

For each group, create separate tables to summarize the distribution of the following variables:
 gender (sex)
 age (age)
 number of children (kids_nr)
 household net income per month in euros (hhnetinc)
 membership in environmental organization (member)
 highest educational attainment (education).
Without doing formal statistical tests, do these two groups of individuals (DC and TWPL) look similar in demographic characteristics?
R walkthrough 11.4 Using loops to obtain summary statistics
The two different formats (DC and TWPL) are recorded in the abst_format variable, and take the values ref and ladder respectively.
variables <- list(quo(sex), quo(age), quo(kids_nr), quo(hhnetinc), quo(member), quo(education))

for (i in seq_along(variables)) {
  WTP %>%
    group_by(abst_format, !!variables[[i]]) %>%
    summarize(n = n()) %>%
    mutate(freq = n / sum(n)) %>%   # Share within each question format
    select(-n) %>%                  # Drop the counts, keep the shares
    spread(abst_format, freq) %>%
    print()
}
## # A tibble: 2 x 3
##   sex    ladder   ref
##   <chr>   <dbl> <dbl>
## 1 female  0.518 0.523
## 2 male    0.482 0.477
## # A tibble: 6 x 3
##   age     ladder    ref
##   <chr>    <dbl>  <dbl>
## 1 18 - 24 0.0949 0.0964
## 2 25 - 29 0.0830 0.0865
## 3 30 - 39 0.178  0.172
## 4 40 - 49 0.223  0.226
## 5 50 - 59 0.241  0.239
## 6 60 - 69 0.180  0.181
## # A tibble: 5 x 3
##   kids_nr                ladder     ref
##   <chr>                   <dbl>   <dbl>
## 1 four or more children 0.00988 0.00895
## 2 no children           0.646   0.657
## 3 one child             0.204   0.176
## 4 three children        0.0296  0.0348
## 5 two children          0.111   0.123
## # A tibble: 12 x 3
##    hhnetinc                  ladder     ref
##    <chr>                      <dbl>   <dbl>
##  1 1100 bis unter 1500 Euro 0.142   0.132
##  2 1500 bis unter 2000 Euro 0.150   0.146
##  3 2000 bis unter 2600 Euro 0.115   0.148
##  4 2600 bis unter 3200 Euro 0.107   0.107
##  5 3200 bis unter 4000 Euro 0.111   0.0815
##  6 4000 bis unter 5000 Euro 0.0514  0.0497
##  7 500 bis unter 1100 Euro  0.134   0.142
##  8 5000 bis unter 6000 Euro 0.0277  0.0169
##  9 6000 bis unter 7500 Euro 0.00791 0.00398
## 10 7500 und mehr            0.00395 0.00497
## 11 bis unter 500 Euro       0.0296  0.0417
## 12 do not want to answer    0.121   0.125
## # A tibble: 2 x 3
##   member ladder    ref
##   <chr>   <dbl>  <dbl>
## 1 no     0.923  0.914
## 2 yes    0.0771 0.0865
## # A tibble: 6 x 3
##   education ladder    ref
##       <dbl>  <dbl>  <dbl>
## 1         1 0.0119 0.0129
## 2         2 0.0198 0.0209
## 3         3 0.342  0.328
## 4         4 0.263  0.269
## 5         5 0.0692 0.0686
## 6         6 0.294  0.300
The output above gives the required tables, but is not easy to read. You may want to tidy up the results, for example by translating and reordering the options in the household net income variable (hhnetinc).
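For a single variable, the same within-group shares can be sketched in base R with table and prop.table. The six toy observations below are made up for illustration; only the variable names abst_format and member come from the survey data.

```r
# Toy data (hypothetical): question format and organization membership
toy <- data.frame(
  abst_format = c("ladder", "ladder", "ladder", "ladder", "ref", "ref"),
  member      = c("no", "no", "no", "yes", "no", "yes")
)

# Cross-tabulate counts: rows are answers, columns are question formats
counts <- table(toy$member, toy$abst_format)

# margin = 2 divides by column totals, so each format's shares sum to 1
shares <- prop.table(counts, margin = 2)

# Express as percentages rounded to one decimal place
round(100 * shares, 1)
```

Dividing by the column totals rather than the overall total is what makes the two groups comparable when they differ in size.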
 Create summary tables as shown in Figure 11.3 for each index you created in Question 3. Without doing formal statistical tests, do the two groups of individuals look similar in the attitudes specified?
| | Mean | Standard deviation | Min | Max |
|---|---|---|---|---|
| DC format | | | | |
| TWPL format | | | | |
Summary table for indices.
R walkthrough 11.5 Calculating summary statistics
The summarize_at function can provide multiple statistics on a number of variables in one command. Simply provide a list of the variables you want to summarize and then supply a named list of the summary statistics you need.
WTP %>%
  group_by(abst_format) %>%
  summarize_at(c("climate", "gov_intervention", "pro_environment"),
               list(mean = mean, sd = sd, min = min, max = max)) %>%
  gather(index, value, climate_mean:pro_environment_max) %>%   # Use gather and spread functions to reformat output
  spread(abst_format, value) %>%
  kable(., format = "markdown", digits = 2)
|index                 | ladder|  ref|
|:---------------------|------:|----:|
|climate_max           |   5.00| 5.00|
|climate_mean          |   2.29| 2.37|
|climate_min           |   1.00| 1.00|
|climate_sd            |   0.84| 0.85|
|gov_intervention_max  |   5.00| 5.00|
|gov_intervention_mean |   3.15| 3.19|
|gov_intervention_min  |   1.00| 1.00|
|gov_intervention_sd   |   0.70| 0.66|
|pro_environment_max   |   5.00| 5.00|
|pro_environment_mean  |   3.03| 3.01|
|pro_environment_min   |   1.00| 1.00|
|pro_environment_sd    |   0.79| 0.82|
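In more recent versions of dplyr (1.0.0 or later, which is an assumption about your installation), the same grouped summary is usually written with across inside summarise. The sketch below uses a small made-up dataframe to show how the output columns are named.

```r
library(dplyr)

# Toy index values (illustrative only)
toy <- data.frame(
  abst_format     = c("ladder", "ladder", "ref", "ref"),
  climate         = c(1, 3, 2, 4),
  pro_environment = c(2, 4, 3, 5)
)

# One row per group; columns are named <variable>_<statistic>
summary_tbl <- toy %>%
  group_by(abst_format) %>%
  summarise(across(c(climate, pro_environment),
                   list(mean = mean, sd = sd, min = min, max = max)))

summary_tbl
```

The named list determines the suffixes (climate_mean, climate_sd, and so on), so the result can be reshaped in the same way as the output of summarize_at.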
Part 11.2 Comparing willingness to pay across methods and individual characteristics
Before comparing WTP across question formats, we will summarize the distribution of WTP within each question format.
 For individuals who answered the TWPL question:
 Use the variables WTP_plmin and WTP_plmax to create column charts (one for each variable) with frequency on the vertical axis and category (the numbers 1–14) on the horizontal axis. Describe characteristics of the distributions shown on the charts.
 Using the variables you created in Question 2(b) in Part 11.1, make a new variable that contains the average of the two variables (the average of the minimum and maximum willingness to pay).
 Using the variables you created in Question 2(b) in Part 11.1 and Question 1(b) here, calculate the mean and median willingness to pay.
 Using the variable from Question 1(b), calculate the correlation between the average WTP and the demographic and attitudinal variables. Interpret the relationships implied by the coefficients.
R walkthrough 11.6 Summarizing willingness to pay variables
Create column charts for minimum and maximum WTP
Before we can plot a column chart, we need to compute frequencies (number of observations) for each value of the willingness to pay (1–14). We do this separately for the minimum and maximum willingness to pay.
In each case we select the relevant variable and remove any observations with missing values using the na.omit function. We can then separate the data by level of the WTP_plmin_euro or WTP_plmax_euro variables (using group_by), then obtain a frequency count using the summarize function. We also set this variable’s type to factor, for the horizontal axis labels in the column chart.
Once we have the frequency count stored as a dataframe, we can plot the column charts.
For the minimum willingness to pay:
df.plmin <- WTP %>%
  select(WTP_plmin_euro) %>%
  na.omit() %>%
  group_by(WTP_plmin_euro) %>%
  summarize(n = n()) %>%
  mutate(WTP_plmin_euro = factor(WTP_plmin_euro, levels = wtp_euro_levels))

ggplot(df.plmin, aes(WTP_plmin_euro, n)) +
  geom_bar(stat = "identity", position = "identity") +
  xlab("Minimum WTP (euros)") +
  ylab("Frequency") +
  theme_bw()
For the maximum willingness to pay:
df.plmax <- WTP %>%
  select(WTP_plmax_euro) %>%
  na.omit() %>%
  group_by(WTP_plmax_euro) %>%
  summarize(n = n()) %>%
  mutate(WTP_plmax_euro = factor(WTP_plmax_euro, levels = wtp_euro_levels))

ggplot(df.plmax, aes(WTP_plmax_euro, n)) +
  geom_bar(stat = "identity", position = "identity") +
  xlab("Maximum WTP (euros)") +
  ylab("Frequency") +
  theme_bw()
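The frequency counts behind these charts can also be sketched in base R. The toy responses below are made up; the point of the example is that counting a factor with all 14 levels keeps amounts that nobody chose (with a count of zero), whereas counting by group only returns amounts that appear in the data.

```r
# Euro amounts for categories 1-14, as listed in Figure 11.1
wtp_euro_levels <- c(48, 72, 84, 108, 156, 192, 252, 324, 432, 540, 720, 960, 1200, 1440)

# Toy minimum-WTP responses in euros (hypothetical), including one missing value
toy_min <- c(48, 48, 72, 1440, NA, 84, 48)

# Convert to a factor with every possible amount as a level
f <- factor(toy_min, levels = wtp_euro_levels)

# table() drops NA by default and reports unused levels with a count of 0
freq <- table(f)

# Convert to a dataframe with one row per amount, ready for plotting
freq_df <- as.data.frame(freq)
names(freq_df) <- c("amount", "n")
freq_df
```

Keeping the empty categories can matter when describing the shape of the distribution, since gaps in the ladder are then visible on the horizontal axis.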
Calculate average WTP for each individual
We can use the rowMeans function to obtain the average of the minimum and maximum willingness to pay.
WTP <- WTP %>%
  rowwise() %>%
  mutate(., WTP_average = rowMeans(cbind(WTP_plmin_euro, WTP_plmax_euro))) %>%
  ungroup()
Calculate mean and median WTP across individuals
The mean and median of this average value can be obtained using the mean and median functions, although we have to use the na.rm=TRUE option to handle missing values correctly.
mean(WTP$WTP_average, na.rm = TRUE)
## [1] 268.5345
median(WTP$WTP_average, na.rm = TRUE)
## [1] 132
Calculate correlation coefficients
We showed how to obtain a matrix of correlation coefficients for a number of variables in R walkthrough 8.8. We use the same process here.
WTP %>%
  mutate(gender = as.numeric(ifelse(sex == "female", 0, 1))) %>%   # Create the gender variable
  select(WTP_average, education, gender, climate, gov_intervention, pro_environment) %>%
  cor(., use = "pairwise.complete.obs") -> M

M[, "WTP_average"]
##      WTP_average        education           gender          climate
##       1.00000000       0.13817368       0.03694972       0.14462072
## gov_intervention  pro_environment
##       0.18845205       0.18750331
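The use = "pairwise.complete.obs" option matters here because WTP_average is missing for everyone who did not answer the payment ladder question. A minimal sketch on made-up data shows the difference from the default behaviour of cor.

```r
# Toy data (hypothetical): WTP_average is missing for two respondents
toy <- data.frame(
  WTP_average = c(60, NA, 120, 300, 96, NA),
  education   = c(1, 2, 3, 6, 2, 4),
  climate     = c(2, 3, 3, 5, 1, 4)
)

# Default: any correlation involving a variable with missing values is NA
M_default <- cor(toy)

# Pairwise: each correlation uses the rows where both variables are observed
M <- cor(toy, use = "pairwise.complete.obs")

M_default["WTP_average", "education"]
M[, "WTP_average"]
```

With pairwise deletion, different entries of the matrix can be based on different numbers of observations, which is worth remembering when comparing coefficients.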
 For individuals who answered the DC question:
 Each individual was given an amount and had to decide ‘yes’, ‘no’, or ‘no vote/abstain from deciding’. Make a table showing the frequency of DC_ref_outcome, with costs as the row variable and DC_ref_outcome as the column variable.
 Use this table to calculate the percentage of individuals who voted ‘no’ and ‘yes’ for each amount (in other words as a percentage of the row total, not the overall total). Count individuals who chose ‘abstain’ as voting ‘no’.
 Make a line chart showing the ‘demand curve’, with percentage of individuals who voted ‘yes’ as the vertical axis variable and amount (in euros) as the horizontal axis variable. Describe features of this ‘demand curve’ that you find interesting.
 Repeat Question 2(b), this time excluding individuals who chose ‘abstain’ from the calculations. Plot this new ‘demand curve’ on the chart created for Question 2(c). Do your results change qualitatively depending on how you count individuals who did not vote?
R walkthrough 11.7 Summarizing Dichotomous Choice (DC) variables
Create frequency table for DC_ref_outcome
We can group by costs and DC_ref_outcome to obtain a count for each combination of amount and vote response. We can also recode the voting options to ‘Yes’, ‘No’, and ‘Abstain’.
WTP_DC <- WTP %>%
  group_by(costs, DC_ref_outcome) %>%
  summarize(n = n()) %>%
  na.omit() %>%
  mutate_at("DC_ref_outcome",
            ~ recode(., "do not support referendum and no-pay" = "No",
                     "support referendum and pay" = "Yes",
                     "would not vote" = "Abstain")) %>%
  spread(DC_ref_outcome, n)

kable(WTP_DC, format = "markdown", digits = 2)
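| costs| Abstain| No| Yes|
|-----:|-------:|--:|---:|
|    48|      12| 21|  32|
|    72|      11| 30|  40|
|    84|      12| 24|  45|
|   108|       7| 35|  31|
|   156|      13| 31|  40|
|   192|      11| 25|  25|
|   252|       9| 32|  28|
|   324|      16| 41|  27|
|   432|      11| 35|  29|
|   540|       9| 31|  22|
|   720|      12| 39|  13|
|   960|      14| 28|  15|
|  1200|      11| 42|  21|
|  1440|      19| 42|  15|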
Add column showing proportion voting yes or no
We can extend the table from Question 2(a) to include the proportion voting yes or no (to obtain percentages, multiply the values by 100).
WTP_DC <- WTP_DC %>%
  mutate(total = Abstain + No + Yes,
         prop_no = (Abstain + No) / total,
         prop_yes = Yes / total) %>%
  mutate_if(is.numeric, ~ round(., 2))   # Round all numbers to 2 decimal places

kable(WTP_DC, format = "markdown", digits = 2)
| costs| Abstain| No| Yes| total| prop_no| prop_yes|
|-----:|-------:|--:|---:|-----:|-------:|--------:|
|    48|      12| 21|  32|    65|    0.51|     0.49|
|    72|      11| 30|  40|    81|    0.51|     0.49|
|    84|      12| 24|  45|    81|    0.44|     0.56|
|   108|       7| 35|  31|    73|    0.57|     0.42|
|   156|      13| 31|  40|    84|    0.52|     0.48|
|   192|      11| 25|  25|    61|    0.59|     0.41|
|   252|       9| 32|  28|    69|    0.59|     0.41|
|   324|      16| 41|  27|    84|    0.68|     0.32|
|   432|      11| 35|  29|    75|    0.61|     0.39|
|   540|       9| 31|  22|    62|    0.64|     0.36|
|   720|      12| 39|  13|    64|    0.80|     0.20|
|   960|      14| 28|  15|    57|    0.74|     0.26|
|  1200|      11| 42|  21|    74|    0.72|     0.28|
|  1440|      19| 42|  15|    76|    0.80|     0.20|
Make a line chart of WTP
Using the dataframe generated for Questions 2(a) and (b), we can plot the ‘demand curve’ as a scatterplot with connected points by using the geom_point and geom_line options for ggplot. Adding the extra option scale_x_continuous changes the default labeling on the horizontal axis to display ticks at every 100 euros.
p <- ggplot(WTP_DC, aes(y = prop_yes, x = costs)) +
  geom_point() +
  geom_line(size = 1) +
  ylab("% Voting 'Yes'") +
  xlab("Amount (euros)") +
  scale_x_continuous(breaks = seq(0, 1500, 100)) +
  theme_bw()

print(p)
Calculate new proportions and add them to the table and chart
It is straightforward to calculate the new proportions and add them to the existing dataframe. However, we will need to reshape the data to plot multiple lines on the same scatterplot.
WTP_DC <- WTP_DC %>%
  mutate(total_ex = No + Yes,
         prop_no_ex = No / total_ex,
         prop_yes_ex = Yes / total_ex) %>%
  mutate_if(is.numeric, ~ round(., 2))   # Round all numbers to 2 decimal places

kable(WTP_DC, format = "markdown", digits = 2)
| costs| Abstain| No| Yes| total| prop_no| prop_yes| total_ex| prop_no_ex| prop_yes_ex|
|-----:|-------:|--:|---:|-----:|-------:|--------:|--------:|----------:|-----------:|
|    48|      12| 21|  32|    65|    0.51|     0.49|       53|       0.40|        0.60|
|    72|      11| 30|  40|    81|    0.51|     0.49|       70|       0.43|        0.57|
|    84|      12| 24|  45|    81|    0.44|     0.56|       69|       0.35|        0.65|
|   108|       7| 35|  31|    73|    0.57|     0.42|       66|       0.53|        0.47|
|   156|      13| 31|  40|    84|    0.52|     0.48|       71|       0.44|        0.56|
|   192|      11| 25|  25|    61|    0.59|     0.41|       50|       0.50|        0.50|
|   252|       9| 32|  28|    69|    0.59|     0.41|       60|       0.53|        0.47|
|   324|      16| 41|  27|    84|    0.68|     0.32|       68|       0.60|        0.40|
|   432|      11| 35|  29|    75|    0.61|     0.39|       64|       0.55|        0.45|
|   540|       9| 31|  22|    62|    0.64|     0.36|       53|       0.58|        0.42|
|   720|      12| 39|  13|    64|    0.80|     0.20|       52|       0.75|        0.25|
|   960|      14| 28|  15|    57|    0.74|     0.26|       43|       0.65|        0.35|
|  1200|      11| 42|  21|    74|    0.72|     0.28|       63|       0.67|        0.33|
|  1440|      19| 42|  15|    76|    0.80|     0.20|       57|       0.74|        0.26|
WTP_DC %>%
  select(costs, prop_yes, prop_yes_ex) %>%
  gather(Vote, value, prop_yes:prop_yes_ex) %>%
  ggplot(., aes(y = value, x = costs, color = Vote)) +
  geom_line(size = 1) +
  geom_point() +
  ggtitle("'Demand curve' from DC respondents, under different treatments for 'Abstain' responses.") +
  scale_color_manual(values = c("blue", "red"), labels = c("counted as no", "excluded")) +
  ylab("% voting 'yes'") +
  xlab("Costs (euros)") +
  theme_bw()
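The two treatments of ‘abstain’ differ only in the denominator. As a small worked example, the sketch below recomputes both proportions in base R for the first three amounts, using the counts from the frequency table above.

```r
# Counts for the three lowest amounts, copied from the frequency table
dc <- data.frame(
  costs   = c(48, 72, 84),
  Abstain = c(12, 11, 12),
  No      = c(21, 30, 24),
  Yes     = c(32, 40, 45)
)

# Abstentions counted as 'no': divide by everyone offered that amount
dc$prop_yes <- dc$Yes / (dc$Abstain + dc$No + dc$Yes)

# Abstentions excluded: divide by those who voted yes or no only
dc$prop_yes_ex <- dc$Yes / (dc$No + dc$Yes)

round(dc[, c("costs", "prop_yes", "prop_yes_ex")], 2)
```

Since excluding abstainers can only shrink the denominator, the ‘excluded’ curve lies on or above the ‘counted as no’ curve at every amount; what can change is the shape of the curve, not the direction of the shift.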
 Compare the mean and median WTP under both question formats:
 Complete Figure 11.8 and use it to calculate the difference in means (DC minus TWPL), the standard deviation of these differences, and the number of observations. (The mean of DC is the mean of costs for individuals whose DC_ref_outcome is a ‘yes’ vote.)
 Obtain 95% confidence intervals for the difference of means for each question format. Discuss the statistical significance of your findings.
 Does the median WTP look different across question formats? (You do not need to do any formal statistical testing.)
 Using your answers to Questions 3(a)–(c), would you recommend that governments use the mean or median WTP in policy making decisions? (That is, which measure is more robust to changes in the question format?)
| Format | Mean | Standard deviation | Number of observations |
|---|---|---|---|
| DC | | | |
| TWPL | | | |
Summary table for WTP.
R walkthrough 11.8 Calculating confidence intervals for differences in means
Calculate the difference in means, standard deviations, and number of observations
We first create two vectors that will contain the WTP values for each of the two question methods. For the DC format, willingness to pay is recorded in the costs variable, so we select all observations where the DC_ref_outcome variable indicates the individual voted ‘yes’ and drop any missing observations. For the TWPL format we use the WTP_average variable that we created in R walkthrough 11.6.
DC_WTP <- WTP %>%
  subset(DC_ref_outcome == "support referendum and pay") %>%
  select(costs) %>%
  filter(!is.na(costs)) %>%
  as.matrix()

# Print out the mean, sd, and count
cat(sprintf("DC Format - mean: %.1f, standard deviation %.1f, count %d\n",
            mean(DC_WTP), sd(DC_WTP), length(DC_WTP)))
## DC Format - mean: 348.2, standard deviation 378.6, count 383
TWPL_WTP <- WTP %>%
  select(WTP_average) %>%
  filter(!is.na(WTP_average)) %>%
  as.matrix()

cat(sprintf("TWPL Format - mean: %.1f, standard deviation %.1f, count %d\n",
            mean(TWPL_WTP), sd(TWPL_WTP), length(TWPL_WTP)))
## TWPL Format - mean: 268.5, standard deviation 287.7, count 348
Calculate 95% confidence intervals
Using the t.test function to obtain 95% confidence intervals was covered in R walkthroughs 8.10 and 10.6. As we have already separated the data for the two different question formats in Question 3(a), we can obtain the confidence interval directly. Note that conf.level is the coverage of the interval, so a 95% confidence interval requires conf.level = 0.95.
t.test(DC_WTP, TWPL_WTP, conf.level = 0.95)$conf.int
## [1]  31.1 128.2
## attr(,"conf.level")
## [1] 0.95
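A 95% confidence interval for a difference in means can also be sketched by hand from the summary statistics reported in Question 3(a). The function below uses Welch's approximation for the degrees of freedom, which is what t.test applies by default when var.equal = FALSE; the name welch_ci and its arguments are assumptions made for this illustration. Because the inputs are rounded to one decimal place, the result will differ slightly from an interval computed on the raw data.

```r
# Confidence interval for a difference in means from summary statistics,
# using Welch's approximation for unequal variances.
# welch_ci is a hypothetical helper written for this sketch.
welch_ci <- function(m1, s1, n1, m2, s2, n2, conf_level = 0.95) {
  v1 <- s1^2 / n1                       # squared standard error, group 1
  v2 <- s2^2 / n2                       # squared standard error, group 2
  se <- sqrt(v1 + v2)                   # standard error of the difference
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  t_crit <- qt(1 - (1 - conf_level) / 2, df)
  (m1 - m2) + c(-1, 1) * t_crit * se    # lower and upper bounds
}

# Rounded summary statistics from the walkthrough output (DC first, then TWPL)
ci <- welch_ci(m1 = 348.2, s1 = 378.6, n1 = 383,
               m2 = 268.5, s2 = 287.7, n2 = 348)
ci
```

If the interval excludes zero, the difference in mean WTP between the two formats is statistically significant at the 5% level.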
Calculate median WTP for the DC format
In R walkthrough 11.6 we obtained the median WTP for the TWPL format (132). We now obtain the median WTP for the DC format.
median(DC_WTP)
## [1] 192