**Empirical Project 1** Measuring climate change

Learning objectives

In this project you will:

- use charts and summary measures to discuss the extent of climate change and its possible causes
- use line charts to describe the behaviour of real-world variables over time
- summarize data in a frequency table, and visualize distributions with column charts
- describe a distribution using mean and variance
- use scatterplots and the correlation coefficient to assess the degree of association between two variables
- explain what correlation measures and the limitations of correlation.

## Key concepts

Concepts needed for this project: mean, median, and decile.

Concepts introduced in this project: variance, frequency table, correlation and correlation coefficient, causation, and spurious correlation.

## Introduction


Climate change is one of the effects of the rapid economic growth that has occurred in most countries since the Industrial Revolution. It is an important issue for policymaking, since governments need to assess how serious the problem is and then decide how to mitigate it.

Suppose you are a policy advisor for a small island nation. The government would like to know more about the extent of climate change and its possible causes. They ask you the following questions:

- How can we tell whether climate change is actually happening or not?
- If it is real, how can we measure the extent of climate change and determine what is causing it?

To answer the first question, we look at the behaviour of environmental variables over time to see whether there are general patterns in environmental conditions that could be indicative of climate change. In this project, we focus on temperature-related variables.

To answer the second question, we examine the degree of association between temperature and another variable, CO₂ emissions, and consider whether there is a plausible relationship between the two, or whether there are other explanations for what we observe.

## Working in Excel

### Part 1.1 The behaviour of average surface temperature over time

In the questions below, we look at data from NASA about land-ocean temperature anomalies in the northern hemisphere from 1880 to 2016. Figure 1.1 is constructed using this data and shows temperatures in the northern hemisphere over the period 1880–2016, expressed as differences from the average temperature from 1951 to 1980. We start by creating charts similar to Figure 1.1, in order to visualize the data and spot patterns more easily.
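To make the idea of an anomaly concrete, here is a minimal Python sketch, offered as an illustration alongside the Excel steps rather than as NASA's own method. It assumes a hypothetical dictionary that maps each year to an absolute temperature. The function name `anomalies` and the toy values are invented for the example. Each anomaly is that year's value minus the mean over the 1951–1980 baseline.

```python
from statistics import mean


def anomalies(temps_by_year, base_start=1951, base_end=1980):
    """Return {year: temperature minus the baseline mean} for every year.

    temps_by_year is a hypothetical dict mapping year -> absolute temperature.
    The baseline is the mean over base_start..base_end, inclusive.
    """
    baseline_values = [t for year, t in temps_by_year.items()
                       if base_start <= year <= base_end]
    if not baseline_values:
        raise ValueError("no observations in the baseline period")
    baseline = mean(baseline_values)
    return {year: t - baseline for year, t in temps_by_year.items()}


# Toy values for illustration only (not NASA data).
toy = {1950: 14.0, 1951: 14.1, 1980: 14.3, 2000: 14.8}
result = anomalies(toy)
print(result)
```

Years outside the baseline period still get an anomaly. The baseline only fixes the reference level against which every year is compared.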

Before plotting any charts, download the data and make sure you understand how temperature is measured:

- Go to NASA’s Goddard Institute for Space Studies website.
- Under the subheading ‘Combined Land-Surface Air and Sea-Surface Water Temperature Anomalies’, select the CSV version of ‘Northern Hemisphere-mean monthly, seasonal, and annual means’.
- The default name of this file is NH.Ts+dSST.csv. Give it a suitable name and save it in an easily accessible location, such as a folder on your Desktop or in your personal folder.
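If you would like to inspect the downloaded file outside Excel, the sketch below parses it with Python's standard `csv` module. It rests on assumptions about the file layout that you should check against your own download:

- the first line is a title;
- the second line holds the column headers (such as `Year`, `Jan`, and `J-D`);
- missing values are written as `***`.

The function `read_gistemp` and the sample text are hypothetical, and the sample values are made up to show the format.

```python
import csv


def read_gistemp(text):
    """Parse the NASA CSV text into a list of dicts.

    Assumes (check your file): line 1 is a title, line 2 is the header row,
    and missing values are written as '***'. 'Year' becomes an int; every
    other column becomes a float, or None when the value is missing.
    """
    lines = text.splitlines()[1:]  # drop the title line
    rows = []
    for record in csv.DictReader(lines):
        row = {}
        for key, value in record.items():
            if key is None:  # extra fields beyond the header; ignore them
                continue
            value = (value or "").strip()
            if key == "Year":
                row[key] = int(value)
            else:
                row[key] = None if value in ("", "***") else float(value)
        rows.append(row)
    return rows


# Made-up values, only to show the assumed layout.
sample = """Northern Hemisphere-mean monthly, seasonal, and annual means
Year,Jan,Feb,J-D
1880,-.36,-.51,***
"""
rows = read_gistemp(sample)
print(rows)

# With the real file (use whatever name you saved it under):
# with open("NH.Ts+dSST.csv", newline="") as f:
#     rows = read_gistemp(f.read())
```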

- In this dataset, temperature is measured as ‘anomalies’ rather than absolute temperature. Using this source as a reference, explain in your own words what temperature ‘anomalies’ means. Why have researchers chosen this particular measure over other measures (such as absolute temperature)?

Now create some line charts using monthly, seasonal, and annual data, which help us look for general patterns over time.

- Choose one month and plot a line chart with average temperature anomaly on the vertical axis and time (1880–2016) on the horizontal axis. Label each axis appropriately and give your chart a suitable title. (Refer to Figure 1.1 as an example.)

Walk-through 1.1: Drawing a line chart of temperature and time

- The columns labelled DJF, MAM, JJA, and SON contain seasonal averages (means). For example, the MAM column contains the average of the March, April, and May columns for each year. Plot a separate line chart for each season, using average temperature anomaly for that season on the vertical axis and time (1880–2016) on the horizontal axis.

- The column labelled J–D contains the average temperature anomaly for each year.

  - (a) Plot a line chart with annual average temperature anomaly on the vertical axis and time (1880–2016) on the horizontal axis. Your chart should look like Figure 1.1.
  - (b) *Extension:* Add a horizontal line that intersects the vertical axis at 0, and label it ‘1951–1980 average’.

- What do your charts from Questions 2 to 4(a) suggest about the relationship between temperature and time?

Walk-through 1.2: Plotting a line chart and adding a horizontal line
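The seasonal columns described above (MAM, JJA, SON, and DJF) are simple means of three monthly columns. The minimal Python sketch below shows the calculation for MAM, JJA, and SON. It assumes that each year's row is a dictionary keyed by month abbreviation, which is a hypothetical layout. DJF is left out because it spans two calendar years, so check the NASA documentation for which December the file uses.

```python
from statistics import mean

# Same-year seasons (hypothetical helper table). DJF is omitted because it
# spans two calendar years.
SEASON_MONTHS = {
    "MAM": ("Mar", "Apr", "May"),
    "JJA": ("Jun", "Jul", "Aug"),
    "SON": ("Sep", "Oct", "Nov"),
}


def seasonal_mean(row, season):
    """Mean of the three monthly anomalies in `row` that make up `season`."""
    return mean(row[month] for month in SEASON_MONTHS[season])


# Made-up monthly anomalies for one year.
row = {"Mar": 0.1, "Apr": 0.2, "May": 0.3}
print(seasonal_mean(row, "MAM"))
```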

- You now have charts for three different time intervals: month, season, and year. For each time interval, discuss what we can learn about patterns in temperature over time that we might not be able to learn from the charts of other time intervals.

- Compare your chart from Question 4 to Figure 1.4, which also shows the behaviour of temperature over time, using data from the National Academy of Sciences.

  - Discuss the similarities and differences between the charts. (For example, are the variables on the horizontal and vertical axes the same, and do the lines have the same shape?)

  - Looking at the behaviour of temperature over time from 1000 to 1900 in Figure 1.4, are the observed patterns in your chart unusual?

- Based on your answers to Questions 4 and 5, do you think the government should be concerned about climate change?

### Part 1.2 Variation in temperature over time

Aside from changes in the mean temperature, the government is also worried that climate change will result in more frequent extreme weather events. The island has experienced a few major storms and severe heat waves in the past, both of which caused serious damage and disruption to economic activity.

Will weather become more extreme and vary more as a result of climate change? This *New York Times* article uses the same temperature dataset you have been using to investigate the distribution of temperatures and temperature variability over time. Read through this article, paying close attention to the descriptions of the temperature distributions.

We can use the mean and median to describe distributions, and we can use deciles to describe parts of distributions. To visualize distributions, we can use column charts in Excel. (For some practice on using these concepts and creating column charts in Excel, see Section 1.1 of *Economy, Society, and Public Policy*). We are now going to create similar charts of temperature distributions to the ones in the *New York Times* article, and look at different ways of summarizing distributions.

- **frequency table**: A record of how many observations in a dataset have a particular value, fall within a particular range of values, or belong to a particular category.

In order to create a column chart using the temperature data we have, we first need to summarize the data using a **frequency table**. Instead of using deciles to group the data, we use intervals of 0.05. Temperature anomalies with a value from −0.3 to −0.25 will be in one group, values greater than −0.25 up to −0.2 in another group, and so on. The frequency table shows us how many values belong to each group.
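The grouping rule above can be sketched in Python. The sketch assumes the same convention as Excel's FREQUENCY function:

- each group is labelled by its upper bound;
- each group counts values greater than the previous bound and less than or equal to its own bound;
- one extra count collects values above the last bound.

The helper names `make_bins` and `frequency_table` are hypothetical.

```python
from bisect import bisect_left


def make_bins(start=-0.3, stop=1.05, width=0.05):
    """Upper bounds start, start + width, ..., stop.

    Each bound is rounded to 2 decimal places so that floating-point drift
    does not produce values such as 0.30000000000000004.
    """
    n_steps = round((stop - start) / width)
    return [round(start + i * width, 2) for i in range(n_steps + 1)]


def frequency_table(values, bins):
    """Counts per bin, Excel FREQUENCY-style, plus one overflow count.

    counts[0]  : values <= bins[0]
    counts[i]  : bins[i-1] < value <= bins[i]
    counts[-1] : values > bins[-1]
    """
    counts = [0] * (len(bins) + 1)
    for v in values:
        # bisect_left returns the index of the first bound >= v.
        counts[bisect_left(bins, v)] += 1
    return counts


bins = make_bins()
counts = frequency_table([-0.35, -0.3, -0.27, 0.0, 2.0], bins)
for bound, count in zip(bins, counts):
    if count:
        print(bound, count)
print("above", bins[-1], ":", counts[-1])
```

Rounding the bounds matters. Adding 0.05 repeatedly in floating point drifts slightly, and a value sitting exactly on a bound could then land in the wrong group.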

- Using the monthly data for June, July, and August (columns G to I in your spreadsheet), create two frequency tables similar to Figure 1.5 below for the years 1951–1980 and 1981–2010, respectively. The values in the first column should range from −0.3 to 1.05, in intervals of 0.05.

| Range of temperature anomaly (T) | Frequency |
|---|---|
| −0.3 | |
| −0.25 | |
| … | |
| 1.00 | |
| 1.05 | |

Walk-through 1.3: Creating a frequency table in Excel

- Using the frequency tables from Question 1:

  - Plot two separate column charts for 1951–1980 and 1981–2010 to show the distribution of temperatures, with frequency on the vertical axis and the range of temperature anomaly on the horizontal axis. Your charts should look similar to those in the *New York Times* article.

  - Using your charts, describe the similarities and differences (if any) between the distributions of temperature anomalies in 1951–1980 and 1981–2010.

- **variance**: A measure of dispersion in a frequency distribution, equal to the mean of the squares of the deviations from the arithmetic mean of the distribution. The variance is used to indicate how ‘spread out’ the data is. A higher variance means that the data is more spread out. Example: The set of numbers 1, 1, 1 has zero variance (no variation), while the set of numbers 1, 1, 999 has a high variance of about 221,334 (large spread).

Now we will use our data to look at different aspects of distributions. First, we will learn how to use deciles to determine which observations are ‘normal’ and ‘abnormal’. Then we will learn how to use **variance** to describe how spread out a distribution is.
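The variance definition can be checked directly against its example sets 1, 1, 1 and 1, 1, 999. This Python sketch compares two standard-library functions:

- `statistics.pvariance` divides by the number of observations, so it matches the ‘mean of the squares of the deviations’ in the definition;
- `statistics.variance` divides by n − 1 (the sample variance).

In Excel the corresponding functions are VAR.P and VAR.S. Check which one your calculations use, since the two give noticeably different answers for small samples. The helper `population_variance` is hypothetical and is written out only to show the definition step by step.

```python
from statistics import pvariance, variance


def population_variance(xs):
    """Mean of the squared deviations from the mean (divides by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


for data in ([1, 1, 1], [1, 1, 999]):
    print(data, population_variance(data), pvariance(data), variance(data))
```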

- The *New York Times* article considers the bottom third (the lowest or coldest one-third) of temperature anomalies in 1951–1980 as ‘cold’ and the top third (the highest or hottest one-third) of anomalies as ‘hot’. In decile terms, rounding one-third to the nearest decile, anomalies at or below the 3rd decile are ‘cold’ and anomalies above the 7th decile are ‘hot’. Use Excel’s PERCENTILE.INC function to determine what values correspond to the 3rd and 7th deciles, across all months in 1951–1980.

Walk-through 1.4: Using Excel’s PERCENTILE.INC function
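For readers working outside Excel, the sketch below implements inclusive percentiles with linear interpolation, the rule PERCENTILE.INC is generally described as using. Sort the data, find position (n − 1) × p counting from zero, and interpolate between the two neighbouring values. The function name `percentile_inc` and the data are hypothetical. The last lines cross-check the result against Python's `statistics.quantiles` with `method="inclusive"`.

```python
import math
from statistics import quantiles


def percentile_inc(values, p):
    """Inclusive percentile with linear interpolation (0 <= p <= 1).

    Sort the data, take the 0-based position h = (n - 1) * p and interpolate
    between the values either side of h.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    xs = sorted(values)
    if not xs:
        raise ValueError("values must not be empty")
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # illustrative values only
cold_cut = percentile_inc(data, 0.3)     # 3rd decile
hot_cut = percentile_inc(data, 0.7)      # 7th decile
# Cross-check against the standard library's inclusive deciles.
deciles = quantiles(data, n=10, method="inclusive")
print(cold_cut, deciles[2], hot_cut, deciles[6])
```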

- Based on the values you found in Question 3, count the number of anomalies that are considered ‘hot’ in 1981–2010, and express this as a percentage of all the temperature observations in that period. Does your answer suggest that we are experiencing hotter weather more frequently in 1981–2010? (Remember that each decile represents 10% of observations, so 30% of temperatures were considered ‘hot’ in 1951–1980.)

Walk-through 1.5: Using Excel’s COUNTIF function
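A COUNTIF-style share can be sketched in a few lines of Python. The sketch assumes that `threshold` is the 7th-decile value from the previous question and that `values` holds all monthly anomalies for 1981–2010; both names are hypothetical. In Excel, a formula along the lines of `=COUNTIF(range, ">"&cell)/COUNT(range)` does the same job.

```python
def share_above(values, threshold):
    """Percentage of values strictly greater than threshold."""
    if not values:
        raise ValueError("values must not be empty")
    hot = sum(1 for v in values if v > threshold)
    return 100 * hot / len(values)


# Illustrative numbers only.
print(share_above([0.1, 0.5, 0.9, 1.2], threshold=0.5))
```

If anomalies were distributed the same way as in 1951–1980, this share would be about 30%. A value well above that points to hot months becoming more frequent.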

- The *New York Times* article discusses whether temperatures have become more variable over time. One way to measure temperature variability is by calculating the variance of the temperature distribution for each season (DJF, MAM, JJA, and SON).

  - Calculate the mean (average) and variance separately for the following time periods: 1921–1950, 1951–1980, and 1981–2010.

  - For each season, compare the variances in different periods, and explain whether or not temperature appears to be more variable in later periods.

Walk-through 1.6: Calculating and understanding the variance
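The period-by-period summary in the previous question can be sketched as follows. The sketch assumes that one season's column is stored as a dictionary from year to anomaly, which is a hypothetical layout. It uses `statistics.pvariance` (dividing by n); replace it with `statistics.variance` if you use the sample variance instead.

```python
from statistics import mean, pvariance

PERIODS = [(1921, 1950), (1951, 1980), (1981, 2010)]


def summarize_by_period(series, periods=PERIODS):
    """Mean and (population) variance of one season's anomalies per period.

    series is a hypothetical dict mapping year -> anomaly for one season,
    with None for missing values. Returns {(start, end): (mean, variance)}.
    Raises statistics.StatisticsError if a period has no data.
    """
    summary = {}
    for start, end in periods:
        vals = [v for year, v in series.items()
                if start <= year <= end and v is not None]
        summary[(start, end)] = (mean(vals), pvariance(vals))
    return summary


# Made-up JJA anomalies, two years per period, just to exercise the code.
toy_jja = {1921: 0.0, 1950: 0.2, 1951: 0.1, 1980: 0.3, 1981: 0.5, 2010: 0.9}
for period, (m, var) in summarize_by_period(toy_jja).items():
    print(period, round(m, 3), round(var, 4))
```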

- Using the findings of the *New York Times* article and your answers to Questions 1 to 5, discuss whether temperature appears to be more variable over time. Would you advise the government to spend more money on mitigating the effects of extreme weather events?

### Part 1.3 Carbon emissions and the environment

- **correlation**: A measure of how closely related two variables are. Two variables are correlated if knowing the value of one variable provides information on the likely value of the other, for example high values of one variable being commonly observed along with high values of the other variable. Correlation can be positive or negative. It is negative when high values of one variable are observed with low values of the other. Correlation does not mean that there is a causal relationship between the variables. Example: When the weather is hotter, purchases of ice cream are higher. Temperature and ice cream sales are positively correlated. On the other hand, if purchases of hot beverages decrease when the weather is hotter, we say that temperature and hot beverage sales are negatively correlated.

The government has heard that carbon emissions could be responsible for climate change, and has asked you to investigate whether this is the case. To do so, we are now going to look at carbon emissions over time, and use another type of chart (scatter charts) to show their relationship with temperature anomalies. One way to measure the relationship between two variables is **correlation**. Walk-through 1.7 explains what correlation is and how to calculate it in Excel.

In the questions below, we will make charts using the CO₂ data from the US National Oceanic and Atmospheric Administration. Download the Excel spreadsheet containing this data.

- The CO₂ data was recorded at a single observatory on Mauna Loa. Using this source as a reference, explain whether or not you think this data is a reliable representation of the global atmosphere.

- The variables ‘trend’ and ‘interpolated’ are similar, but not identical. In your own words, explain the difference between these two measures of CO₂ levels. Why might there be seasonal variation in CO₂ levels?
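One common way to see why a seasonally smoothed series differs from the raw one is a centred 12-month moving average, sketched below in Python. This is a swapped-in illustration of how removing a repeating annual cycle smooths a series. NOAA's ‘trend’ series is not necessarily computed this way, so rely on the NOAA documentation for the actual method. The function name `centred_12_month_average` and the synthetic series are hypothetical.

```python
import math


def centred_12_month_average(series):
    """2x12 centred moving average of a monthly series.

    Averages a 13-month window with half weight on the two end months, so a
    pattern that repeats every 12 months cancels out. The first and last six
    months have no complete window and are returned as None.
    """
    n = len(series)
    smoothed = [None] * n
    for t in range(6, n - 6):
        window = series[t - 6:t + 7]  # 13 months centred on month t
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        smoothed[t] = total / 12
    return smoothed


# Synthetic series: a flat level of 400 plus a made-up 12-month cycle.
monthly = [400 + 3 * math.sin(2 * math.pi * m / 12) for m in range(48)]
smooth = centred_12_month_average(monthly)
print([round(v, 3) for v in smooth[6:12]])
```

The half weights on the two end months make the window cover exactly one year. A purely seasonal pattern therefore sums to zero, while a steady upward trend passes through unchanged.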

Now we will use a line chart to look for general patterns over time.

- Plot a line chart with interpolated and trend CO₂ levels on the vertical axis and time (starting from January 1960) on the horizontal axis. Label the axes and the chart legend, and give your chart an appropriate title. What does this chart suggest about the relationship between CO₂ and time?

- **correlation coefficient**: A numerical measure of how closely associated two variables are and whether they tend to take similar or dissimilar values. It ranges from 1, indicating that the variables take similar values (positive correlation), to −1, indicating that the variables take dissimilar values (negative or inverse correlation). A value of 1 or −1 indicates that knowing the value of one of the variables would allow you to perfectly predict the value of the other using a straight-line relationship. A value of 0 indicates that there is no linear association between the two variables.

We will now combine the CO₂ data with the temperature data from Part 1.1, and then examine the relationship between these two variables visually, using scatterplots, and statistically, using the **correlation coefficient**.
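The Pearson correlation coefficient can be written out in a few lines of Python. This is a sketch of what Excel's CORREL function computes, and the function name `pearson` is hypothetical. One caveat is worth seeing in code: a coefficient of 0 rules out only a linear association, not every relationship. In the last example, y is completely determined by x, yet r is 0.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Undefined (raises ZeroDivisionError) if either sequence is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson([1, 2, 3], [2, 4, 6]))        # perfectly positive
print(pearson([1, 2, 3], [3, 2, 1]))        # perfectly negative
xs = [-2, -1, 0, 1, 2]
print(pearson(xs, [v ** 2 for v in xs]))    # y = x squared, yet r is 0
```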

- Choose one month and add the CO₂ trend data to the temperature dataset from Part 1.1, making sure that the data corresponds to the correct year.

  - Make a scatterplot of CO₂ level on the vertical axis and temperature anomaly on the horizontal axis.

  - Calculate and interpret the (Pearson) correlation coefficient between these two variables.

Walk-through 1.7: Scatterplots and the correlation coefficient
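Lining up the two datasets by year is the step where mistakes are easiest to make. The minimal Python sketch below keeps only the years present in both sources. It assumes two hypothetical layouts, and the NOAA spreadsheet's own column names may differ:

- the CO₂ data is available as (year, month, trend) tuples, with months numbered 1 to 12;
- the temperature data is a dictionary from year to the chosen month's anomaly.

The values in the usage lines are made up.

```python
def align_by_year(co2_rows, temp_by_year, month):
    """Pair each year's CO2 trend value for `month` with that year's anomaly.

    co2_rows: iterable of (year, month, trend) tuples, months numbered 1-12
              (a hypothetical layout; adapt it to the NOAA spreadsheet).
    temp_by_year: dict mapping year -> temperature anomaly for the same month.
    Returns (year, co2_trend, anomaly) tuples for years found in both sources.
    """
    co2_for_month = {year: trend for year, m, trend in co2_rows if m == month}
    return [(year, co2_for_month[year], temp_by_year[year])
            for year in sorted(co2_for_month) if year in temp_by_year]


# Made-up values, only to show the alignment (January = month 1).
co2_rows = [(1960, 1, 300.0), (1960, 2, 301.0), (1961, 1, 302.0)]
jan_anomaly = {1960: -0.1, 1961: 0.05, 1962: 0.2}
print(align_by_year(co2_rows, jan_anomaly, month=1))
```

Joining on the year, rather than pasting columns side by side, means that a missing year in either source cannot silently shift every later row out of line.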

- Choose two months and add the CO₂ trend data to the temperature dataset from Part 1.1, making sure that the data corresponds to the correct year.

  - Create a separate chart for each month. What do your charts and the correlation coefficients suggest about the relationship between CO₂ levels and temperature anomalies?

- Discuss the shortcomings of using this coefficient to summarize the relationship between variables.

- **causation**: A direction from cause to effect, establishing that a change in one variable produces a change in another. While a correlation gives an indication of whether two variables move together (either in the same or opposite directions), causation means that there is a mechanism that explains this association. Example: We know that higher levels of CO₂ in the atmosphere lead to a greenhouse effect, which warms the Earth’s surface. Therefore we can say that higher CO₂ levels are the cause of higher surface temperatures.
- **spurious correlation**: A strong linear association between two variables that does not result from any direct relationship, but instead may be due to coincidence or to another unseen factor.

Even if two variables are strongly correlated with each other, it is not necessarily the case that one variable’s behaviour is the result of the other (a relationship known as **causation**). The two variables could instead be spuriously correlated. The following example illustrates **spurious correlation**:

A child’s academic performance may be positively correlated with the number of rooms in their house or house size, but could we conclude that building an extra room would make a child smarter, or doing well at school would make your house bigger? It is more plausible that income or wealth, which determines the size of home that a family can afford and the resources available for studying, is the ‘unseen factor’ in this relationship. We could also determine whether income is the reason for this spurious correlation by comparing exam scores for children whose parents have similar income but different house sizes. If there is no correlation between exam scores and house size, then we can deduce that house size was not ‘causing’ exam scores (or vice versa).
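The house-size example can be mimicked with a small simulation in Python. In this hypothetical model, income is the only thing linking house size and exam scores, and neither of those two affects the other. All numbers are simulated, not real data, and the function names are invented for the example. Across all families the two variables come out strongly correlated. Among families in the same income band, the correlation is close to zero, which is the comparison the paragraph above describes.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, seed=0):
    """Simulated families: income drives both house size and exam score.

    In this model house size and score do not affect each other at all.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        income = rng.randint(1, 5)            # income band 1..5 (made up)
        house = income + rng.gauss(0, 0.5)    # bigger budget, bigger house
        score = income + rng.gauss(0, 0.5)    # more resources, higher score
        rows.append((income, house, score))
    return rows


rows = simulate()
overall = pearson([h for _, h, _ in rows], [s for _, _, s in rows])
same_income = [(h, s) for inc, h, s in rows if inc == 3]
within = pearson([h for h, _ in same_income], [s for _, s in same_income])
print(f"all families: r = {overall:.2f}")
print(f"income band 3 only: r = {within:.2f}")
```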

- Consider the example of spurious correlation above.

  - In your own words, explain spurious correlation and the difference between correlation and causation.

  - Give an example of spurious correlation, similar to the one above, for either CO₂ levels or temperature anomalies.

- Choose an example of spurious correlation from this website. Explain whether you think it is a coincidence, or whether this correlation could be due to one or more other variables.

- Find some other examples of spurious correlations in the news, and briefly explain why they are spurious.

This part shows that summary statistics, such as the correlation coefficient, can help identify possible patterns or relationships between variables, but we cannot make conclusions about causation from them alone. It is also important to think about other explanations for what we see in the data, and whether we would expect there to be a relationship between the two variables.

- **natural experiment**: An empirical study exploiting naturally occurring statistical controls in which researchers do not have the ability to assign participants to treatment and control groups, as is the case in conventional experiments. Instead, differences in law, policy, weather, or other events can offer the opportunity to analyse populations as if they had been part of an experiment. The validity of such studies depends on the premise that the assignment of subjects to the naturally occurring treatment and control groups can be plausibly argued to be random.

However, there are ways to determine whether there is a causal relationship between two variables, for example, by looking at the scientific processes that connect the variables (as with CO₂ and temperature anomalies), or by using a **natural experiment**. To read more about how natural experiments help evaluate whether one variable causes another, see Section 1.5 of *Economy, Society, and Public Policy*. In Empirical Project 3, we will take a closer look at natural experiments and how we can use them to identify causal links between variables.

## Working in R

This section is under development and will be in the next release of *Doing Economics: Empirical Projects*.