Empirical Project 7 Working in R

Getting started in R

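For this project you will need the following packages: readxl, to import the data from an Excel file, and tidyverse, a collection of packages for data manipulation and visualization.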

If you need to install either of these packages, run the following code:

install.packages(c("readxl","tidyverse"))

You can load the packages now, or when they are used in the R walk-through below.

library(readxl)  
library(tidyverse)

Part 7.1 Drawing supply and demand diagrams

logarithmic scale
A way of measuring a quantity based on the logarithm function, f(x) = log(x). The logarithm function converts a ratio to a difference: log (a/b) = log a – log b. This is very useful for working with growth rates. For instance, if national income doubles from 50 to 100 in a poor country and from 1,000 to 2,000 in a rich country, the absolute difference in the first case is 50 and in the second 1,000, but log(100) – log(50) = 0.693, and log(2,000) – log(1,000) = 0.693. The ratio in each case is 2 and log(2) = 0.693.
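In R, log() computes natural logarithms by default, and log10() gives base-10 logarithms. That means the arithmetic in this definition can be checked directly. A minimal sketch:

```r
# Differences of natural logs depend only on the ratio, not on the level
poor  <- log(100) - log(50)     # income doubles from 50 to 100
rich  <- log(2000) - log(1000)  # income doubles from 1,000 to 2,000
ratio <- log(2)                 # the common ratio

round(c(poor = poor, rich = rich, ratio = ratio), 3)  # all three values are 0.693
```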

The data is in logs rather than on the ordinary number scale you are used to working with (ordinary numbers are referred to as ‘base 10’ values where relevant). Before plotting supply and demand curves, we will first practise converting logarithms back to ordinary numbers. In Part 7.2 we will discuss why it is useful to express relationships between variables (for example, price and quantity) in natural logs.

  1. To make charts that look like those in Figure 1 in the paper, you need to convert the relevant variables to their actual values.

R walk-through 7.1 Importing data into R and creating tables and charts

First we import the data with the read_excel function, using the na = "NA" option to indicate how missing data is recorded.

library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------------------------------------ tidyverse 1.2.1 --
## v ggplot2 2.2.1     v purrr   0.2.4
## v tibble  1.4.2     v dplyr   0.7.4
## v tidyr   0.8.0     v stringr 1.3.0
## v readr   1.1.1     v forcats 0.3.0
## -- Conflicts --------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readxl)
wm_data <- read_excel("SuitsData.xlsx", na = "NA")
str(wm_data)
## Classes 'tbl_df', 'tbl' and 'data.frame':    22 obs. of  10 variables:
##  $ Year   : num  1930 1931 1932 1933 1934 ...
##  $ log.q  : num  1.93 1.89 1.83 1.75 1.78 ...
##  $ log.h  : num  1.9 1.88 1.76 1.74 1.78 ...
##  $ log.p  : num  2.07 2 1.9 1.97 2.02 ...
##  $ log.pc : num  0.976 0.753 0.814 1.007 1.092 ...
##  $ log.pv : num  0.367 1.184 1.124 0.993 0.641 ...
##  $ log.w  : num  1.46 1.36 1.23 1.2 1.27 ...
##  $ log.n  : num  2.09 2.09 2.1 2.1 2.1 ...
##  $ log.y_n: num  2.78 2.71 2.59 2.56 2.61 ...
##  $ log.pf : num  1.1 1.11 1.13 1.15 1.14 ...

Let’s create p and q from their log counterparts. Note that the variables in this file are base-10 (common) logarithms, not natural logarithms, so we convert them back with 10^x rather than exp(x). We also transform the harvest variable log.h in the same way. The harvest can be at most as large as the crop (q).

wm_data$p <- 10^wm_data$log.p # Price
wm_data$h <- 10^wm_data$log.h # Harvest quantity
wm_data$q <- 10^wm_data$log.q # Crop quantity
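The correct inversion depends on the log base, so it is worth seeing how much the choice matters. The sketch below uses the rounded 1930 values displayed by str(wm_data) above: 2.07 for log.p and 1.93 for log.q. It is an illustration only, not a replacement for the conversion on the full data.

```r
# 1930 values as displayed (rounded) by str(wm_data)
log_p_1930 <- 2.07
log_q_1930 <- 1.93

# Correct inversion for base-10 (common) logs
p_base10 <- 10^log_p_1930   # about 117.5
q_base10 <- 10^log_q_1930   # about 85.1

# What you would get by wrongly treating them as natural logs
p_natural <- exp(log_p_1930)  # about 7.9

# Round trip: log10() undoes 10^x
all.equal(log10(p_base10), log_p_1930)
```

Treating base-10 logs as natural logs would understate the 1930 price by a factor of about 15. Once the full data is loaded, all(wm_data$h <= wm_data$q, na.rm = TRUE) should return TRUE. This is a quick check that the converted harvest never exceeds the crop.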

Let’s produce the chart for the prices.

plot(wm_data$Year,wm_data$p,type = "o",    # type: "p" = points, "l" = lines, "o" = points and lines
     xlab = "Year", ylab = "Price")  

Figure 7.2 Line chart for prices for watermelons.

Now we create the line chart for harvest and crop quantities.

plot(wm_data$Year, wm_data$q, type = "o",    # type: "p" = points, "l" = lines, "o" = points and lines
     ylim = range(c(wm_data$q, wm_data$h), na.rm = TRUE),  # leave room for both series
     pch = 1, lty = "dashed", xlab = "Year", ylab = "Quantity")
# Add the harvest data
lines(wm_data$Year,wm_data$h, type = "o", pch = 16)
# Add a legend
legend(1947.5, 55, legend=c("Crop", "Harvest"),
       col=c("black", "black"),  pch = c(1,16), lty= c("dashed","solid"), cex=0.8)

Figure 7.3 Line chart for harvest and crop for watermelons.

Now we will plot supply and demand curves for a simplified version of the model given in the paper. We will assume that the supply curve is given by the following equation:

(Supply curve)

Technical note

In economics, log (or ln) almost always refers to the natural logarithm. Since this equation shows the price in terms of quantity (instead of quantity in terms of price), it is technically the inverse supply curve. However, we will use the terms ‘supply curve’ and ‘demand curve’ to refer to both the supply/demand curve and the inverse supply/demand curve.

We define P as the price of a single watermelon and Q as the quantity of watermelons (in thousands). Using the same notation, the following equation describes the demand curve:

(Demand curve)

To plot a curve, we need to generate a series of points (vertical-axis values that correspond to particular horizontal-axis values) and join them up. First we will work with the variables in natural log format, and then we will convert them to ordinary (‘base 10’) values so that our supply and demand curves are in familiar units.
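These steps can be sketched in R. The supply and demand equations for this question are given in the text. The intercepts and slopes below are hypothetical placeholders, chosen only so that the two curves cross within the plotted range, so replace them with the coefficients from those equations before using the output. The quantity grid assumes steps of 500 between 500 and 10,000, and the columns mirror the table in Question 2.

```r
# HYPOTHETICAL placeholder coefficients for inverse curves in natural logs:
#   supply: log P = supply_intercept + supply_slope * log Q
#   demand: log P = demand_intercept + demand_slope * log Q
# Replace these with the coefficients from the equations in the text.
supply_intercept <- -2
supply_slope     <-  0.5
demand_intercept <-  6
demand_slope     <- -0.5

# Quantities (thousands), assumed to run from 500 to 10,000 in steps of 500
curves <- data.frame(Q = seq(500, 10000, by = 500))
curves$log_Q       <- log(curves$Q)  # natural log
curves$supply_logP <- supply_intercept + supply_slope * curves$log_Q
curves$demand_logP <- demand_intercept + demand_slope * curves$log_Q
curves$supply_P    <- exp(curves$supply_logP)  # back to ordinary units
curves$demand_P    <- exp(curves$demand_logP)

# Draw both curves on one chart, with room for both series
plot(curves$Q, curves$supply_P, type = "l",
     ylim = range(c(curves$supply_P, curves$demand_P)),
     xlab = "Quantity of watermelons (thousands)", ylab = "Price")
lines(curves$Q, curves$demand_P, lty = "dashed")
legend("top", legend = c("Supply", "Demand"), lty = c("solid", "dashed"))
```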

  1. Plot supply and demand curves:
Q      | log Q | Supply (log P) | Demand (log P) | Supply (P) | Demand (P)
500    |       |                |                |            |
1,000  |       |                |                |            |
…      |       |                |                |            |
9,500  |       |                |                |            |
10,000 |       |                |                |            |

Figure 7.4 Calculating supply and demand.

exogenous
Coming from outside the model rather than being produced by the workings of the model itself. See also: endogenous.

During the time period considered (1930–1951), the market for watermelons experienced a negative supply shock due to the Second World War. Supply was limited because production inputs (land and labour) were being used for the war effort. This shock shifted the entire supply curve because its cause (the Second World War) was not part of the supply equation but came from outside it (also known as being exogenous). Before doing the next question, draw a supply and demand diagram to illustrate what you would expect to happen to price and quantity as a result of the shock (all other things being equal). To see how oil shocks in the 1970s caused by wars in the Middle East shifted the supply curve in the oil market, see Section 7.11 in Economy, Society, and Public Policy.

Now we will use equations to show the effects of a negative supply shock on your chart from Question 2. Suppose that the supply curve after the shock is:

  1. Add the new supply curve to your line chart and interpret the outcomes, as follows:

Consumer and producer surplus are explained in Sections 7.4 and 7.9 of Economy, Society, and Public Policy.
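In this log-linear setting, a negative supply shock raises the intercept of the inverse supply curve. The new equilibrium can be found by setting the two log equations equal and solving for log Q. The sketch below assumes inverse curves of the form log P = a + b log Q. The coefficient values are hypothetical, not the ones in the questions, and the helper equilibrium() is an illustrative name.

```r
# Equilibrium of inverse supply and demand curves that are linear in natural logs:
#   supply: log P = a_s + b_s * log Q
#   demand: log P = a_d + b_d * log Q
equilibrium <- function(a_s, b_s, a_d, b_d) {
  log_Q <- (a_d - a_s) / (b_s - b_d)  # log quantity where the curves meet
  log_P <- a_s + b_s * log_Q          # price read off the supply curve
  c(Q = exp(log_Q), P = exp(log_P))
}

# HYPOTHETICAL coefficients, not those given in the questions
before <- equilibrium(a_s = -2,   b_s = 0.5, a_d = 6, b_d = -0.5)
# Negative supply shock: a higher supply intercept means a higher price for every quantity
after  <- equilibrium(a_s = -1.5, b_s = 0.5, a_d = 6, b_d = -0.5)

rbind(before, after)  # quantity falls and price rises after the shock
```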

Part 7.2 Interpreting supply and demand curves

You may be wondering why it is useful to express relationships in natural log form rather than in ordinary (‘base 10’) units. In economics, we do this because the coefficients have a convenient interpretation: in the equation Y = a + bX, when both X and Y are in natural logs, the coefficient b represents the elasticity of Y with respect to X. That is, b is the estimated percentage change in Y for a 1 per cent change in X. To look at the concept of elasticity in more detail, see Section 7.8 of The Economy.
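The following is a quick numerical check of this interpretation. It assumes an illustrative log-log relationship with intercept 2 and slope -1.125. Raising X by 1 per cent changes Y by roughly b per cent, and the approximation improves for smaller changes. The second part shows that the fitted slope is the same whether both variables are in natural logs or in base-10 logs, because changing the base rescales both sides by the same factor.

```r
# Illustrative log-log relationship: log(Y) = a + b * log(X), i.e. Y = exp(a) * X^b
a <- 2
b <- -1.125
y_of_x <- function(x) exp(a + b * log(x))

# A 1 per cent increase in X changes Y by roughly b per cent
x0 <- 100
x1 <- x0 * 1.01
pct_change_y <- 100 * (y_of_x(x1) / y_of_x(x0) - 1)
pct_change_y  # about -1.11, close to b = -1.125

# The fitted slope is identical with natural logs or base-10 logs on both sides
x <- c(50, 100, 200, 400)
y <- y_of_x(x)
slope_ln    <- coef(lm(log(y) ~ log(x)))[[2]]
slope_log10 <- coef(lm(log10(y) ~ log10(x)))[[2]]
c(slope_ln, slope_log10)  # both equal b
```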

Supply curve:

Demand curve:

  1. Use the supply and demand equations from Part 7.1 which are shown here, and carry out the following:

Now we will use this information to take a closer look at the model of the watermelon market in the paper and interpret the equations.

The paper assumes that farmers decide how many watermelons to grow (supply) based on last season’s prices of watermelons and other crops they could grow instead, and the current political conditions that support or limit the amount grown. The reasoning for using last season’s prices is that watermelons take time to grow and are also perishable, so farmers cannot wait to see what prices will be in the next season before deciding how many watermelons to plant.

The supply equation for watermelons is shown below (this is equation (1) in the paper):

dummy variable (indicator variable)
A variable that takes the value 1 if a certain condition is met, and 0 otherwise.

Here, CP is a dummy variable that equals 1 if the government cotton-acreage-allotment program was in effect (1934–1951). This program was intended to prevent cotton prices from falling by limiting the supply of cotton, so farmers who reduced their cotton production were given government compensation according to the size of their reduction. WW2 is a dummy variable that equals 1 if the US was involved in the Second World War at the time (1943–1946).
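If you need to build these dummy variables yourself, here is a minimal sketch for the 1930–1951 sample. It uses the year ranges given above, and the names years, CP, WW2 and dummies are illustrative.

```r
years <- 1930:1951  # the 22 years in the sample

# 1 while the cotton-acreage-allotment program was in effect, 0 otherwise
CP  <- as.integer(years >= 1934 & years <= 1951)
# 1 in the Second World War years as defined in the text, 0 otherwise
WW2 <- as.integer(years >= 1943 & years <= 1946)

dummies <- data.frame(Year = years, CP = CP, WW2 = WW2)
dummies
```

To use them alongside the data, merge on the year column, for example with dplyr::left_join(wm_data, dummies, by = "Year").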

You can read more about the government farm programs for cotton during this time period on pages 67–69 of the report ‘The cotton industry in the United States’.

endogenous
Produced by the workings of a model rather than coming from outside the model. See also: exogenous.

The dummy variables are used to measure the effects of these exogenous shocks that affected the decisions of farmers growing watermelons. Looking at the supply curves in Figure 7.5, you can see that the dummy variables shift the entire curve by changing the intercept of the supply curve, whereas changes in variables that are part of the model (endogenous variables) would entail a movement along the supply curve.

Figure 7.5 Supply curve: Dummy variables shift the entire curve (left-hand panel) while changes in endogenous variables move along the curve (right-hand panel).

  1. For each variable in the supply equation, give an economic interpretation of the coefficient (for example, explain the effect on the farmers’ supply decision) and (where relevant) relate the coefficient to an elasticity. Also, with reference to the 95% confidence intervals in Figure 7.6 below, assess the statistical significance of the coefficient. For coefficients on price, assess whether supply is likely to be price elastic (absolute value of coefficient greater than 1) or not.
Variable                 | Coefficient | 95% confidence interval
P (price of watermelons) | 0.580       | [0.572, 0.586]
C (price of cotton)      | –0.321      | [–0.328, –0.314]
T (price of vegetables)  | –0.124      | [–0.126, –0.122]
CP (cotton program)      | 0.073       | [0.068, 0.077]
WW2 (Second World War)   | –0.360      | [–0.365, –0.355]

Figure 7.6 Supply equation coefficients and 95% confidence intervals.
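The significance check can be done mechanically: a coefficient is statistically significant at the 5% level when its 95% confidence interval excludes zero. The sketch below types in the values from Figure 7.6. The approximate standard errors assume a symmetric, normal-based interval (coefficient ± 1.96 standard errors), which the reported intervals only roughly satisfy.

```r
# Coefficients and 95% confidence intervals typed in from Figure 7.6
supply <- data.frame(
  variable = c("P", "C", "T", "CP", "WW2"),
  coef     = c(0.580, -0.321, -0.124, 0.073, -0.360),
  lower    = c(0.572, -0.328, -0.126, 0.068, -0.365),
  upper    = c(0.586, -0.314, -0.122, 0.077, -0.355)
)

# Significant at the 5% level if the interval does not contain 0
supply$significant <- supply$lower > 0 | supply$upper < 0

# Rough standard error, ASSUMING a symmetric interval of coef +/- 1.96 * SE
supply$approx_se <- (supply$upper - supply$lower) / (2 * 1.96)

# Supply is price elastic only if |price coefficient| > 1
price_elastic <- abs(supply$coef[supply$variable == "P"]) > 1

supply
price_elastic
```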

Now we will look at the demand curve (equation (3) in the paper). The paper specifies per capita demand in terms of price and other variables, plus a constant term that is the demand curve intercept:

  1. Using the demand equation and Figure 7.7 below, give an economic interpretation of each coefficient and (where relevant) relate the coefficient to an elasticity. Assess the statistical significance of each coefficient. For coefficients that can be interpreted as elasticities, assess whether the absolute value of the elasticity is significantly different from 1 or not.
Variable                   | Coefficient | 95% confidence interval
P (price of watermelons)   | –1.125      | [–1.738, –0.512]
Y/N (per capita income)    | 1.750       | [0.778, 2.722]
F (railway freight costs)  | –0.968      | [–1.674, –0.262]

Figure 7.7 Demand equation coefficients and 95% confidence intervals.
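A similar check works for Figure 7.7, assuming all three variables enter the demand equation in logs so that each coefficient is an elasticity. The absolute value of an elasticity is significantly different from 1 at the 5% level only if the 95% interval excludes 1 (for a positive coefficient) or -1 (for a negative one).

```r
# Coefficients and 95% confidence intervals typed in from Figure 7.7
demand <- data.frame(
  variable = c("P", "Y/N", "F"),
  coef     = c(-1.125, 1.750, -0.968),
  lower    = c(-1.738, 0.778, -1.674),
  upper    = c(-0.512, 2.722, -0.262)
)

# Significant at the 5% level if the interval does not contain 0
demand$significant <- demand$lower > 0 | demand$upper < 0

# Compare |elasticity| with 1: check +1 for positive and -1 for negative coefficients
unit_value <- sign(demand$coef)
demand$contains_unit <- demand$lower <= unit_value & unit_value <= demand$upper

demand
```

An interval that contains ±1 means we cannot reject a unit elasticity. It does not show that the elasticity equals 1.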

Earlier, we mentioned that exogenous supply/demand shocks shift the entire supply/demand curve, whereas endogenous changes (such as changes in price) result in movements along the supply or demand curve. Exogenous shocks that only shift supply or only shift demand come in handy when we try to estimate the shape of the supply and demand curves. Read the information on simultaneity below to understand why exogenous shocks are important for identifying the supply and demand curves.

The simultaneity problem: Why we need exogenous shocks that shift only supply or only demand

simultaneity
When the right-hand and left-hand variables in a model equation affect each other at the same time, so that the direction of causality runs both ways. For example, in supply and demand models, the market price affects the quantity supplied and demanded, but quantity supplied and demanded can in turn affect the market price.

In the model of supply and demand, the price and quantity we observe in the data are jointly determined by the supply and demand equations, meaning that they are determined simultaneously. In other words, the market price affects the quantity supplied and demanded, but the quantity supplied and demanded can in turn affect the market price. In economics we refer to this problem as simultaneity. We cannot estimate the supply and demand curves with price and quantity data alone, because the right-hand-side variable is not independent but instead depends on the left-hand-side variable.

Another way to think about this issue is that, in every period, the price and quantity we observe result from shifts of, and/or movements along, the supply and demand curves, and we cannot disentangle these shifts and movements without additional information. Figure 7.8 illustrates that many different combinations of supply and demand curve shifts can explain the same data.

Figure 7.8 Many possible supply and demand curves can explain the data.
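A small simulation makes the simultaneity problem concrete. It assumes hypothetical log-linear demand and supply curves written with quantity on the left-hand side, rather than the inverse curves used in Part 7.1. Both curves are hit with independent random shocks, and the simulation computes the market-clearing price and quantity. Regressing quantity on price then gives a slope that matches neither curve. With equal shock variances, its expected value is the average of the two slopes, -0.25 here.

```r
set.seed(7)
n <- 10000

# Hypothetical slopes, with log quantity on the left-hand side:
#   demand: log Q = 5 + b_d * log P + u_d
#   supply: log Q = 1 + b_s * log P + u_s
b_d <- -1.0
b_s <-  0.5

u_d <- rnorm(n)  # demand shocks
u_s <- rnorm(n)  # supply shocks: same variance, independent of u_d

# Market clearing: set demand equal to supply and solve for log P
log_P <- (5 - 1 + u_d - u_s) / (b_s - b_d)
log_Q <- 1 + b_s * log_P + u_s

# Regressing quantity on price recovers neither curve
naive_slope <- coef(lm(log_Q ~ log_P))[["log_P"]]
naive_slope  # near -0.25, not -1 (demand) or 0.5 (supply)
```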

To address this issue, we need to find an exogenous variable that shifts one curve but not the other. That way we can be sure that what we observe is due to a shift in one curve, holding the other curve fixed. In the watermelon market, we used the Second World War as an exogenous supply shock in Part 7.1. The war affected the amount of farmland dedicated to producing watermelons, but arguably did not affect demand for watermelons.

Figure 7.9 shows how we can use the exogenous supply shock to learn about the demand curve. The solid line shows the part of the demand curve revealed by the supply shock. Under the assumption that the demand curve is a straight line, we can infer what the rest of the curve looks like. If we had more information, for example if the size of the shock varied in each period, then we could use this information to learn more about the shape of the demand curve (for example, check whether it is actually linear). We use similar reasoning (exogenous demand shocks) to identify the supply curve.

Figure 7.9 Using exogenous supply shocks to identify the demand curve.
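A simulated market can also show how a supply-only shock identifies the demand curve. Here a hypothetical wartime dummy shifts supply to the left in a random 20% of periods and never affects demand. Dividing the covariance of quantity with the dummy by the covariance of price with the dummy gives a simple instrumental-variables ratio. This technique is swapped in here for illustration and is not necessarily the estimation method used in the paper. All slopes and shock sizes are assumptions.

```r
set.seed(7)
n <- 10000

# Hypothetical slopes, with log quantity on the left-hand side
b_d <- -1.0   # demand slope we want to recover
b_s <-  0.5   # supply slope

war <- rbinom(n, size = 1, prob = 0.2)  # exogenous supply shifter, 1 in "war" periods
u_d <- rnorm(n, sd = 0.5)               # demand shocks, unrelated to war
u_s <- rnorm(n, sd = 0.5)               # other supply shocks

#   demand: log Q = 5 + b_d * log P + u_d
#   supply: log Q = 1 + b_s * log P - war + u_s   (war shifts supply left)
log_P <- (5 - 1 + war + u_d - u_s) / (b_s - b_d)
log_Q <- 5 + b_d * log_P + u_d

# War moves supply but not demand, so the price and quantity changes it causes
# lie along the demand curve: slope = cov(log Q, war) / cov(log P, war)
iv_slope    <- cov(log_Q, war) / cov(log_P, war)
naive_slope <- coef(lm(log_Q ~ log_P))[["log_P"]]

c(iv = iv_slope, naive = naive_slope)
```

The ratio lands close to the demand slope of -1, while the naive regression of quantity on price does not. With the roles reversed, an exogenous variable that shifts only demand would identify the supply slope in the same way.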

  1. Given the supply and demand equations in the watermelon model, give two examples of an exogenous demand shock and explain why they are exogenous.