Contents

# Empirical Project 3 Measuring the effect of a sugar tax

Learning objectives

In this project you will:

• use the differences-in-differences method to measure the effects of a policy or program, and explain how this method works
• use line and column charts to visualize and compare multiple variables
• create summary tables to describe the data
• interpret the p-value in the context of a policy or program evaluation.

### Key concepts

• Concepts needed for this project: mean, frequency table, statistical significance, and p-value.
• Concepts introduced in this project: natural experiment, differences-in-differences, and conditional mean.

## Introduction

### CORE projects

This empirical project is related to material in:

• Unit 3 of Economy, Society, and Public Policy
• Unit 22 of The Economy.
natural experiment
An empirical study exploiting naturally occurring statistical controls in which researchers do not have the ability to assign participants to treatment and control groups, as is the case in conventional experiments. Instead, differences in law, policy, weather, or other events can offer the opportunity to analyse populations as if they had been part of an experiment. The validity of such studies depends on the premise that the assignment of subjects to the naturally occurring treatment and control groups can be plausibly argued to be random.

In Empirical Project 1, we mentioned that natural experiments can help us determine whether one variable causes another variable. A useful application of natural experiments is assessing the effects of a policy. To do so, we compare the outcomes of two groups both before and after the policy took effect:

• The treatment group: those who were affected by the policy.
• The control group: those who were not affected by the policy.

If other factors that could affect the outcomes remain constant over the period considered, then we can reasonably conclude that any observed differences between the groups are due to the policy. Natural experiments therefore allow us to make causal statements about policies and outcomes.

differences-in-differences
A method that applies an experimental research design to outcomes observed in a natural experiment. It involves comparing the difference in the average outcomes of two groups, a treatment and control group, both before and after the treatment took place.

The method of comparing the treatment and control groups, both before and after the treatment took place, is known in economics as differences-in-differences. Unlike lab experiments, in natural experiments we cannot choose exactly who receives the treatment. The reason why we need to compare the two groups before the treatment took place is to take account of any pre-existing differences between the two groups, for example differences in age (for people) or characteristics (for products).

We will use the 2014 sugar tax in the US to learn how before-and-after comparisons are done in practice. Sugar-sweetened beverages (SSBs) are considered unhealthy because of their link to conditions such as diabetes and obesity. In November 2014, the city of Berkeley in California became the first US jurisdiction to implement a tax on SSB distributors, with the aim of discouraging SSB consumption. The tax of one cent per fluid ounce meant that if retailers raised their prices to exactly counter the effects of the tax, a $1 can of soda (12 oz) would now cost$1.12. But did sellers actually respond this way? And what effects did the tax have on shoppers’ expenditure on sugary beverages?

A group of researchers did a differences-in-differences study of the effects of this SSB tax, which you can read about in this news article. Figure 3.1 summarizes the timeline of the tax, and the data that they collected and published in a 2017 Plos Medicine journal paper. We will make before-and-after comparisons using the data they collected, in order to learn about the effects of the sugar tax.

## Working in Excel

### Excel-specific learning objectives

In addition to the learning objectives for this project, in this section you will create summary tables using Excel’s PivotTable option.

### Part 3.1 The treatment group: before-and-after comparisons of retail prices

We will first look at price data from the treatment group (stores in Berkeley) to see what happened to the price of sugary and non-sugary beverages after the tax.

• Download the data from the Global Food Research Program’s website, and select the ‘Berkeley Store Price Survey’ Excel dataset.
• The first tab of the Excel file contains the data dictionary. Make sure you read the data description column carefully, and check that each variable is in the Data tab.
1. Read ‘S1 Text’, from the journal paper’s supporting information, which explains how the Store Price Survey data was collected.
• In your own words, explain how the product information was recorded, and the measures that researchers took to ensure that the data was accurate and representative of the treatment group. What were some of the data collection issues that they encountered?
• Instead of using the name of the store, each store was given a unique ID number (recorded as store_id on the spreadsheet). Using Excel’s filter function, verify that the number of stores in the dataset is the same as that stated in the ‘S1 Text’ (26). Similarly, each product was given a unique ID number (product_id). How many different products are in the dataset?

Following the approach described in ‘S1 Text’, we will compare the variable price per ounce in US\$ cents (price_per_oz_c). We will look at what happened to prices in the two treatment groups before the tax (time = DEC2014) and after the tax (time = JUN2015):

• treatment group one: large supermarkets (store_type = 1)
• treatment group two: pharmacies (store_type = 3)

Instead of calculating summary measures one by one (as we did in Empirical Project 2), we will use Excel’s PivotTable option to make frequency tables containing the summary measures that we are interested in. The tables should be in a different tab to the data (either all in the same tab, or in separate tabs).

1. Use Excel’s PivotTable option to create the following tables:
• A frequency table showing the number (count) of store observations (store type) in December 2014 and June 2015, with ‘store type’ as the row variable and ‘time period’ as the column variable. For each store type, is the number of observations similar in each time period?
• A frequency table showing the number of taxed and non-taxed beverages in December 2014 and June 2015, with ‘store type’ as the row variable and ‘taxed’ as the column variable. (‘Taxed’ equals 1 if the sugar tax applied to that product, and 0 if the tax did not apply). For each store type, is the number of taxed and non-taxed beverages similar?
• A frequency table showing the number of each product type (type), with ‘product type’ as the row variable and ‘time period’ as the column variables. Which product types have the highest number of observations and which have the lowest number of observations? Why might some products have more observations than others?

#### Walk-through 3.1 Making a frequency table using Excel’s PivotTable option

Figure 3.2 How to make a frequency table using Excel’s PivotTable option.

The data

The data will look like this. We will be making a PivotTable using Column D (store type) and Column K (time). It will show how many observations of each store type there are in 2 time periods (DEC2014 and JUN2015).

Figure 3.2a The data will look like this. We will be making a PivotTable using Column D (store type) and Column K (time). It will show how many observations of each store type there are in 2 time periods (DEC2014 and JUN2015).

Insert a blank pivot table

We will make and store the frequency table in a new tab in the spreadsheet.

Figure 3.2b We will make and store the frequency table in a new tab in the spreadsheet.

Insert a blank pivot table

Now we have to tell Excel which data to use to make the table.

Figure 3.2c Now we have to tell Excel which data to use to make the table.

Insert a blank pivot table

The PivotTable is currently blank. In order to create the table, we will use the variables listed in the menu to fill in the empty boxes shown at the bottom.

Figure 3.2d The PivotTable is currently blank. In order to create the table, we will use the variables listed in the menu to fill in the empty boxes shown at the bottom.

Choose the variables to put in the PivotTable

Excel will create a table that looks like the one above. By default it will use all the values of the variables selected (e.g. all time periods). We can remove unnecessary time periods and blank cells by filtering the data.

Figure 3.2e Excel will create a table that looks like the one above. By default it will use all the values of the variables selected (e.g. all time periods). We can remove unnecessary time periods and blank cells by filtering the data.

Filter the values of each variable

After completing step 9, you will have the required table.

Figure 3.2f After completing step 9, you will have the required table.

conditional mean
An average of a variable, taken over a subgroup of observations that satisfy certain conditions, rather than all observations.

Besides counting the number of observations in a particular group, we can also use the PivotTable option to calculate the mean by only using observations that satisfy certain conditions (known as the conditional mean). In this case, we are interested in comparing the mean price of taxed and untaxed beverages, before and after the tax.

1. Calculate and compare conditional means:
• Create a table similar to Figure 3.3, showing the average price per ounce (in cents) for taxed and untaxed beverages separately, with ‘store type’ as the row variable, and ‘taxed’ and ‘time’ as the column variables. To follow the methodology used in the journal paper, make sure to only include products that are present in all time periods, and non-supplementary products (supp = 0).
• Without doing any calculations, summarize any differences or general patterns between December 2014 and June 2015 that you find in the table.
• Would we be able to assess the effect of sugar taxes on product prices by comparing the average price of untaxed goods with that of taxed goods in any given period? Why or why not?
Non-Taxed Taxed
Store type Dec 2014 Jun 2015 Dec 2014 Jun 2015
1
3

The average price of taxed and non-taxed beverages, according to time period and store type.

Figure 3.3 The average price of taxed and non-taxed beverages, according to time period and store type.

#### Walk-through 3.2 Making a PivotTable with more than two variables

Figure 3.4 How to make a PivotTable with more than two variables.

The data

The data will look like this. We will be making a PivotTable using Columns D (store type), Column K (time), and Column I (taxed). It will show the average price of taxed and non-taxed beverages in DEC2014 and JUN2015, according to store type.

Figure 3.4a The data will look like this. We will be making a PivotTable using Columns D (store type), Column K (time), and Column I (taxed). It will show the average price of taxed and non-taxed beverages in DEC2014 and JUN2015, according to store type.

Count the number of times each product appears in the dataset

We only want to look at products that were present in all time periods, to ensure we are comparing the same group of products over time. We will create a new variable (called ‘Number’, shown in Column M) that shows how many periods of data are available for each product. Excel’s COUNTIFS function will help us count the number of observations that satisfy certain conditions.

Figure 3.4b We only want to look at products that were present in all time periods, to ensure we are comparing the same group of products over time. We will create a new variable (called ‘Number’, shown in Column M) that shows how many periods of data are available for each product. Excel’s COUNTIFS function will help us count the number of observations that satisfy certain conditions.

Insert a blank PivotTable

We will make and store the frequency table in a new tab in the spreadsheet.

Figure 3.4c We will make and store the frequency table in a new tab in the spreadsheet.

Insert a blank PivotTable

Now we have to tell Excel which data to use to make the table.

Figure 3.4d Now we have to tell Excel which data to use to make the table.

Fill in the PivotTable

After step 9, Excel will create a table that looks like the one above. By default, it uses all the available data and shows frequencies instead of means.

Figure 3.4e After step 9, Excel will create a table that looks like the one above. By default, it uses all the available data and shows frequencies instead of means.

Filter the values of each variable

We will filter the data so that only the store types we want (1 and 3) are visible.

Figure 3.4f We will filter the data so that only the store types we want (1 and 3) are visible.

Filter the values of each variable

We will filter the data so that only the time periods we want (DEC2014 and JUN2015) are visible.

Figure 3.4g We will filter the data so that only the time periods we want (DEC2014 and JUN2015) are visible.

Filter the values inside the table

Now we tell Excel to only include products that are available in all time periods (Number = 3), and non-supplementary products (supp = 0).

Figure 3.4h Now we tell Excel to only include products that are available in all time periods (Number = 3), and non-supplementary products (supp = 0).

Filter the values inside the table

Your table should now look like the one shown above. Notice that there are fewer observations (216 compared to 575) because now we are looking at a subgroup of the data.

Figure 3.4i Your table should now look like the one shown above. Notice that there are fewer observations (216 compared to 575) because now we are looking at a subgroup of the data.

Change the values inside the table to means

Now we change the values shown in the table from frequencies to means.

Figure 3.4j Now we change the values shown in the table from frequencies to means.

Change the values inside the table to means

In the box that pops up, you can see the different measures that the PivotTable can show (for example, the sum, max, or min). We would like the cells to show averages.

Figure 3.4k In the box that pops up, you can see the different measures that the PivotTable can show (for example, the sum, max, or min). We would like the cells to show averages.

Remove the ‘Grand Total’ rows and columns

By default, the PivotTable will have a column showing the mean for each row (for example, the mean for store type 1 in both DEC2014 and JUN2015. We will remove these columns to make our table easier to read.

Figure 3.4l By default, the PivotTable will have a column showing the mean for each row (for example, the mean for store type 1 in both DEC2014 and JUN2015. We will remove these columns to make our table easier to read.

Remove the ‘Grand Total’ rows and columns

After step 22, you can see that some columns and rows are removed from your table.

Figure 3.4m After step 22, you can see that some columns and rows are removed from your table.

Round cell values to 2 decimal places

To make the table even easier to read, we will round the cell values to 2 decimal places.

Figure 3.4n To make the table even easier to read, we will round the cell values to 2 decimal places.

In order to make a before-and-after comparison, we will make a chart similar to Figure 2 in the journal paper, to show the change in prices for each store type.

1. Using your table from Question 3:
• Calculate the change in the mean price after the tax (price in June 2015 minus price in December 2014) for taxed and untaxed beverages, by store type.
• Using the values you calculated in Question 4(a), plot a column chart to show this information (as done in Figure 2 of the journal paper) with store type on the horizontal axis and price change on the vertical axis. Label each axis and data series appropriately. You should get the same values as shown in Figure 2.

#### Walk-through 3.3 Making a column chart to compare two groups

Figure 3.5 How to make a column chart to compare two groups.

Create a table showing differences in means

We will be using the PivotTable above to create a column chart. (See Walk-through 3.2 for help on how to create the PivotTable.) First, we will calculate the difference in mean price for each type of good and store type.

Figure 3.5a We will be using the PivotTable above to create a column chart. (See Walk-through 3.2 for help on how to create the PivotTable.) First, we will calculate the difference in mean price for each type of good and store type.

Create a table showing differences in means

Fill in the table by using cell formula to calculate the differences required. After step 2, your table will look like the one shown above.

Figure 3.5b Fill in the table by using cell formula to calculate the differences required. After step 2, your table will look like the one shown above.

Draw a column chart

After step 6, the column chart will look like the one shown above. Notice that Excel has put the columns for taxed and non-taxed products in separate groups, but we want the columns to be grouped according to store type.

Figure 3.5c After step 6, the column chart will look like the one shown above. Notice that Excel has put the columns for taxed and non-taxed products in separate groups, but we want the columns to be grouped according to store type.

Switch the row and column variable; change the legend and horizontal axis labels

After step 8, the columns will now be grouped according to store type. The next step is to change the labels in the legend, so that ‘Series1’ is renamed as ‘Taxed’ and ‘Series2’ is renamed as ‘Non-taxed’.

Figure 3.5d After step 8, the columns will now be grouped according to store type. The next step is to change the labels in the legend, so that ‘Series1’ is renamed as ‘Taxed’ and ‘Series2’ is renamed as ‘Non-taxed’.

Switch the row and column variable; change the legend and horizontal axis labels

After step 10, ‘Series1’ will be renamed as ‘Taxed’, and this will also show up in the chart legend.

Figure 3.5e After step 10, ‘Series1’ will be renamed as ‘Taxed’, and this will also show up in the chart legend.

Switch the row and column variable; change the legend and horizontal axis labels

The next step is to change the horizontal axis labels from numbers to store names (‘1’ is ‘Large supermarkets’, and ‘2’ is ‘Pharmacies’).

Figure 3.5f The next step is to change the horizontal axis labels from numbers to store names (‘1’ is ‘Large supermarkets’, and ‘2’ is ‘Pharmacies’).

Switch the row and column variable; change the legend and horizontal axis labels

After step 13, you can see that the horizontal axis labels are store types.

Figure 3.5g After step 13, you can see that the horizontal axis labels are store types.

Add axis titles and a chart title

After step 17, your chart will look like the bottom chart of Figure 2 in the journal paper.

Figure 3.5h After step 17, your chart will look like the bottom chart of Figure 2 in the journal paper.

It is important to know whether the difference in means is statistically significant or not. According to the journal paper, the p-value is less than 0.05 for large supermarkets, but greater than 0.05 for pharmacies.

1. Based on a cut-off (significance level) of 5%, what can we conclude about the difference in means? (You may find the discussion in Part 2.3 helpful.)

### Part 3.2 The control group: before-and after comparisons with prices in other areas

When looking for any price patterns, it is possible that the observed changes are not due to the tax, but instead are due to other events that happened in Berkeley and the in neighbouring areas. If prices changed in a similar way in nearby areas, then what we observed in Berkeley may not be a result of the tax. To investigate whether this is the case, the researchers collected price data from stores in the surrounding areas and compared them with prices in Berkeley.

• The Excel file containing the price data they collected, including information on the date (year and month), location (Berkeley or Non-Berkeley), beverage group (soda, fruit drinks, milk substitutes, milk, and water), average price, and the consumer price index (CPI) for that month.
• ‘S5 Table’ comparing the neighbourhood characteristics of the Berkeley and non-Berkeley stores.
1. Based on ‘S5 Table’, do you think the researchers chose suitable comparison stores? Why or why not?

We will now plot a line chart similar to Figure 3 in the journal paper, to compare prices of similar goods in different locations and see how they have changed over time. To do this, we will need to summarize the data using Excel’s PivotTable option, so that there is one value (the mean price) for each location and type of good in each month.

1. Assess the effects of a tax on prices:
• Create a table similar to Figure 3.6 to show the average price in each month for taxed and non-taxed beverages, according to location. Use ‘year and month’ as the row variables, and ‘tax’ and ‘location’ as the column variables. (You may find Walk-through 3.2 helpful.)
• Plot the four columns of your table on the same line chart, with average price on the vertical axis and time (months) on the horizontal axis. Describe any differences you see between the prices of non-taxed goods in Berkeley and those outside Berkeley, both before the tax (January 2013 to December 2014) and after the tax (March 2015 onwards). Do the same for prices of taxed goods.
• Based on your chart, is it reasonable to conclude that the sugar tax had an effect on prices?
Non-taxed Taxed
Year/Month Berkeley Non-Berkeley Berkeley Non-Berkeley
January 2013
February 2013
March 2013
December 2013
January 2014
February 2016

The average price of taxed and non-taxed beverages, according to location and month.

Figure 3.6 The average price of taxed and non-taxed beverages, according to location and month.

It is important to know whether the observed differences between Berkeley and non-Berkeley prices are statistically significant or not. According to the journal paper, when comparing the mean Berkeley and non-Berkeley price of sugary beverages after the tax, the p-value is less than 0.01, but it is greater than 0.05 for non-sugary beverages after the tax.

1. Based on a cut-off (significance level) of 5%, what can we conclude about the difference in means and the effect of the sugar tax on the price of sugary beverages? (You may find the discussion in Part 2.3 helpful.)

The aim of the sugar tax was to decrease consumption of sugary beverages. Figure 3.7 shows the mean number of calories consumed and the mean volume consumed before and after the tax. The researchers tested whether the difference in means before and after the tax were statistically significant or not, and reported the p-values in the last column.

Usual intake Pre-tax (Nov–Dec 2014),
n = 623
Post-tax (Nov–Dec 2015),
n = 613
Pre-tax–post-tax difference
Mean 95% CI Mean 95% CI
Caloric intake (kilocalories/per capita/day)
Taxed beverages 45.1 29.4, 60.7 38.7 23.0, 54.4 −6.4, p = 0.56
Untaxed beverages 115.7 87.6, 142.5 116.3, 178.9 31.9*, p = 0.04
Volume of intake (grams/capita/day)
Taxed beverages 121.0 78.7, 163.3 97.0 56.5, 137.4 −24.0, p = 0.24
Untaxed beverages 1,839.4 1,692.7, 1,986.1 1,896.5 1,742.3, 2,050.8 57.1, p = 0.22
Models account for age, gender, race/ethnicity, income level, and educational attainment. n is the sample size at each round of the survey after excluding participants with missing values on self-reported race/ethnicity, age, education, income, or monthly intake of sugar-sweetened beverages.

* Statistically significant difference in mean per capita intake between pre-tax and post-tax values, p < 0.05.

Changes in prices, sales, consumer spending, and beverage consumption one year after a tax on sugar-sweetened beverages in Berkeley, California, US: A before-and-after study.

Figure 3.7 Changes in prices, sales, consumer spending, and beverage consumption one year after a tax on sugar-sweetened beverages in Berkeley, California, US: A before-and-after study.

Lynn D. Silver, Shu Wen Ng, Suzanne Ryan-Ibarra, Lindsey Smith Taillie, Marta Induni, Donna R. Miles Jennifer M. Poti, and Barry M. Popkin. 2017. Table 1 in ‘Changes in prices, sales, consumer spending, and beverage consumption one year after a tax on sugar-sweetened beverages in Berkeley, California, US: A before-and-after study’. PLoS Med 14 (4): e1002283.

1. Based on Figure 3.7, was there a statistically significant change in consumption behaviour in Berkeley after the tax (at a 5% significance level)? Suggest some reasons why or why not.
1. Read the ‘Limitations’ in the ‘Discussions’ section of the paper and discuss the strengths and limitations of this study. How could future studies on the sugar tax in Berkeley address these problems? (Some issues you may want to discuss are: the number of stores observed, number of people surveyed, and the reliability of the price data collected.)
1. Suppose that you have the authority to conduct your own sugar tax natural experiment in two neighbouring towns, Town A and Town B. Outline how you would conduct the experiment to ensure that any changes in outcomes (prices, consumption of sugary beverages) are due to the tax and not due to other factors. (Hint: think about what factors you need to hold constant.)

## Working in R

This section is under development and will be in the next release of Doing Economics: Empirical Projects