Extra Empirical Project: Female Labour Supply and the Macroeconomy Working in R

These code downloads have been constructed as supplements to the full Doing Economics projects. You’ll need to download the data before running the code that follows.

Download the code

To download the code chunks used in this project, right-click on the download link and select ‘Save Link As…’. You’ll need to save the code download to your working directory, and open it in RStudio.

Don’t forget to also download the data into your working directory by following the steps in this project.

Getting started in R

For this project, you will need the following packages:

  • tidyverse, to help with data manipulation
  • mFilter, to apply the Hodrick–Prescott (HP) filter

You will also use the ggplot2 package to produce accurate charts, but that comes as part of the tidyverse package.

If you need to install any of these packages, run the following code:

install.packages(c("tidyverse","mFilter"), repos="https://www.stats.bris.ac.uk/R/")

You can import these libraries now, or when they are used in the R walk-throughs below.

library(tidyverse)
library(mFilter)

Part 1: Collecting and preparing the data

Learning objectives for this part

  • download and clean customized excerpts from a database
  • merge different datasets
  • estimate recession dates from GDP data.

In the questions below, we will collect and merge various datasets. We will combine labour force statistics (hours worked and LFP) with macroeconomic statistics (GDP and productivity), so that we can assess the relationship between those variables. The goal is to end up with a single dataset containing all the necessary data for the analysis in Part 2.

Before we start, create a subfolder called ‘data’ in your working directory. We will store all datasets in that subfolder. That means that your working directory should contain the R markdown file with the code chunks and the (so far empty) ‘data’ subfolder.

  1. Download annual GDP data for the US and one other country of your choice from the World Bank statistical database:
  • By clicking the link, you have already selected the correct series, ‘GDP (constant 2015 US$)’. We want to keep prices constant, so that GDP is in real terms (taking out any variation due to changes in the price level). We do not need to worry about fixing exchange rates and purchasing power parity (PPP) here, because we will not compare GDP between countries.
  • On the left under ‘Country’, first unselect all countries (this can be done in one click on the ‘X’ button above the country list), and then select the United States and your other country.
  • Under ‘Time’, select all years by clicking on the checkbox icon (to the left of the ‘Unselect all’ button).
  • Then click on ‘Apply Changes’ in the centre of the screen.
  • Download the data as a CSV file by clicking on ‘Download options’ in the top right (above the data table) and then click on ‘Advanced options’.
  • Select ‘CSV’ in the header (the default is ‘Excel’) and change the ‘Data format’ from ‘Table’ to ‘List’.
  • Click ‘Download’, which will download a ZIP file. Extract the file and delete the CSV file ending in ‘Metadata’. Rename the CSV file ending in ‘Data’ as ‘GDP_annual’ and save it in your ‘data’ subfolder.
  1. Download the following data for the US and the country you chose in Question 1 from the OECD statistical database:
  1. Labour force participation (LFP) rates by sex: You can find the LFP rates under ‘Labour > Labour Force Statistics > LFS by sex and age - LFS by sex and age - indicators’, then click on ‘LFS by sex and age indicators’. You can also find the data series directly by searching for ‘LFS by sex and age indicators’ in the search bar on the top left.
    • Click on the ‘Country’ header in the first column, which should open the customization pop-up.
    • Unselect all countries (can be done in one click via ‘unselect all’ in the top-right corner) and select the United States and one country of your choice.
    • Click on ‘Time & Frequency’ and select from 1960 (leave the upper limit at ‘latest available data’).
    • Click on ‘Sex’ and select all three options.
    • Click on ‘Age’ and unselect all except ‘Total’.
    • Click on ‘Series’ and unselect all except ‘Labour force participation rate’.
    • Click ‘View Data’. The table should now only show the data you selected. To download the data, click on ‘Export > Text file (CSV) > Default format > Download’. After the file has been downloaded, rename it as ‘LFP_rates’, and move it to the ‘data’ subfolder in your working directory.

You can learn more about how GDP is measured and made comparable across countries and time in Section 1.2 of The Economy 2.0: Microeconomics.

  1. Average labour productivity (ALP): One way that the OECD measures labour productivity is by GDP per hour worked, which you can find under ‘Productivity > Productivity and ULC - Annual, Total Economy > Level of GDP per capita and productivity’, then click on ‘Level of GDP per capita and productivity’.
    • Download the data for the US and your other country, from 1970 to the latest available year, selecting ‘GDP per hour worked’ under ‘Subject’.
    • Under ‘Measure’, select only ‘USD, constant prices, 2015 PPPs’. This measure makes sure the productivity (GDP per hour worked) measures are comparable, regardless of country-dependent exchange and inflation rates.
    • Download the data as a CSV file in the same way as before and save it as ‘labour_productivities’ in your ‘data’ subfolder.
  1. Average hours worked per week by sex: The OECD only has hours worked by sex for the main job, so keep in mind that this data disregards hours worked outside of the main job. Also, as these are average hours per worker, they only capture the intensive margin of labour supply. You can find the data series under ‘Labour > Labour Force Statistics > Hours worked > Average usual weekly hours worked on the main job’.

    Do the following customizations for the US and your chosen country:
    • Select data from 1979 to the latest available year.
    • Select all three options for ‘Sex’.
    • Select only ‘Total’ for ‘Age’.
    • For ‘Employment’, select only ‘Dependent employment’. (The data for the US only covers those in dependent employment, so to keep things comparable we will restrict the data to them.)
    • Under ‘Job type’, select only ‘Total declared employment’ (includes both full and part-time workers).
    • Download the data as a CSV file and save it as ‘hours_worked’ in your ‘data’ subfolder.
  1. Quarterly GDP: This data can be found under ‘National Accounts > Quarterly National Accounts > Quarterly National Accounts’, then click on ‘Quarterly National Accounts’ (the first table).
    • Select your two countries, starting from 1947Q1 (you might have to switch from ‘select latest data’ to ‘select date range’ first).
    • Under ‘Subject’, select only ‘Gross domestic product - expenditure approach’.
    • Under ‘Measure’, select only ‘VPVOBARSA: US dollars, volume estimates, fixed PPPs, OECD reference year, annual levels, seasonally adjusted’ (you can find it under ‘VOL-VOLUMES’).
    • Download the data as a CSV file and save it as ‘GDP_quarterly’ in your ‘data’ subfolder.

Find out more Going from micro to macro data

Most macroeconomic researchers nowadays use micro data (for example, at the household or firm level) to construct aggregate variables like the ones you have downloaded from the OECD database. Albanesi, for example, obtains her data series from the Current Population Survey (CPS), a publicly available household survey from the US. You have already seen some limitations of using aggregate data like the OECD’s: you can only use the series they have made available. The OECD does not provide data on hours worked by sex outside of the main job, nor on labour productivity by sex. However, these could easily be computed from micro data. For simplicity, we will not be working with micro data in this Doing Economics project. However, the OECD’s data is based on country-specific micro datasets which have been harmonized and aggregated by the OECD, so by using their data we are indirectly working with micro data as well.

If you are interested in how the NBER defines and dates recessions and expansions, you can read their FAQs.

dummy variable (indicator variable)
A variable that takes the value 1 if a certain condition is met, and 0 otherwise.
  1. We want to link the variables obtained in Question 1 with data on business cycles, for which we will use the US National Bureau of Economic Research (NBER)-based US recession dates from FRED. (Clicking on the link will download the data as a CSV file called ‘USRECQ.csv’; move it into the ‘data’ subfolder in your working directory.) This file contains a quarterly dummy variable (that is, 1 for recession periods and 0 otherwise), which we will use to distinguish between expansion and recession periods in the analysis.
  1. Import the ‘USRECQ.csv’ file as ‘US_recessions’.
  1. We want to create an annual recession dummy because the other data is at the annual level (and we want to have consistent date variables when combining the various datasets). To do so, first create a ‘Year’ variable (of numeric type, not character) based on the ‘DATE’ variable (each year will be repeated four times, as the data is quarterly). Then create an annual dummy variable ‘Recession’ that takes the value 1 for all observations within a year if one or more quarters of that year featured a recession (‘USRECQ’ = 1). (For help, see R walk-through 1 on importing the recession data and creating a recession variable.)
  1. Remove all variables except ‘Year’ and ‘Recession’. Remove duplicate values in ‘Year’ (as we still have every year four times due to the initial data being on the quarterly level). Create a ‘Country’ variable that takes the value ‘United States’ for all observations.

R walk-through 1 Importing a .csv data file into R and doing basic data cleaning

We want to import the data files we have just downloaded into R, starting with the NBER US recession dates (‘USRECQ.csv’).

We start by setting our working directory using the setwd command. This command tells R where your code and data files are stored. In the code below, replace ‘YOURFILEPATH’ with the full filepath that indicates the folder in which you have saved the code chunks file (not the ‘data’ subfolder, but its parent folder). Note that you have to use forward slashes (‘/’) rather than backslashes (‘\’). If you don’t know how to find the path to your working folder, see the Technical Reference section.

setwd('YOURFILEPATH')

You should now see the contents of your working directory in the bottom-right panel of RStudio (click the ‘Files’ tab). It should contain the ‘data’ subfolder and this ‘.rmd’ file.
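Before importing anything, it can help to confirm that R is looking in the right place. The following is a minimal sketch rather than part of the code download: the helper name missing_data_files is hypothetical, and the file names are the ones this project asks you to save in the ‘data’ subfolder (some of them may not exist yet at this stage).

```r
# Hypothetical helper (not part of the project code): report which of the
# expected CSV files are not yet present in a given folder.
missing_data_files <- function(folder = "data") {
  expected <- c("USRECQ.csv", "GDP_annual.csv", "GDP_quarterly.csv",
                "labour_productivities.csv", "hours_worked.csv", "LFP_rates.csv")
  expected[!file.exists(file.path(folder, expected))]
}

# Show the current working directory and any files still to be downloaded
print(getwd())
print(missing_data_files("data"))
```

If the second command prints an empty character vector, all six files are where the walk-throughs expect them.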

Since our data is in CSV format, we use the read.csv function to import the data into R, specifying that we want R to look for the file in the ‘data’ subfolder. We will call our file ‘US_recessions’.

US_recessions <- read.csv("data/USRECQ.csv") 

To check that the data has been imported correctly, you can use the head function to view the first six rows of the dataset, and confirm that they correspond to the columns in the CSV file.

head(US_recessions)
##         DATE USRECQ
## 1 1854-10-01      1
## 2 1855-01-01      0
## 3 1855-04-01      0
## 4 1855-07-01      0
## 5 1855-10-01      0
## 6 1856-01-01      0

Before working with the data, we use the str function to check that the data is formatted correctly.

str(US_recessions)
## 'data.frame':    673 obs. of  2 variables:
##  $ DATE  : chr  "1854-10-01" "1855-01-01" "1855-04-01" "1855-07-01" ...
##  $ USRECQ: int  1 0 0 0 0 0 0 0 0 0 ...

You can see that the variable USRECQ containing the quarterly recession dummy is an integer (which is correct as it is a binary dummy variable), while the DATE variable containing the quarterly dates is formatted as character (chr).

We want to create an annual recession dummy because the other data is at the annual level (and we want to have consistent date variables when combining the various datasets). The first step is to create a numeric Year variable based on DATE, which can be done via the substr function (extracting the first four characters) inside as.numeric to convert the extracted character string to numeric (num) type.

We then create an annual dummy variable called Recession that takes the value 1 for all observations within a year if one or more quarters of that year featured a recession (USRECQ = 1).

US_recessions <- US_recessions %>% 
  mutate(Year = as.numeric(substr(DATE,1,4))) %>% 
  group_by(Year) %>% 
  mutate(Recession = case_when(any(USRECQ == 1) ~ 1,
                               TRUE ~ 0))

Piping is a very useful technique in R for data analysis. If we have to perform a sequence of commands on the same object, then we can simplify the code by writing the sequence as a single command. We use the punctuation %>% to link the commands together.

In the code above, we 1) created the numeric Year variable based on DATE, 2) grouped the data by Year, and 3) created the annual Recession dummy using the any function within case_when, which sets the dummy to 1 if USRECQ = 1 for any of the quarters within a year. Otherwise (indicated by TRUE), we set the dummy to 0.
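To see what these steps do, it may help to run them on a small made-up example first. The sketch below uses eight invented quarters (the values are illustrative, not taken from USRECQ.csv) and applies the same logic, with an ungroup call added so that later steps do not operate on grouped data.

```r
library(dplyr)

# Made-up quarterly data: no recession quarter in 2000, two in 2001
toy <- data.frame(
  DATE = c("2000-01-01", "2000-04-01", "2000-07-01", "2000-10-01",
           "2001-01-01", "2001-04-01", "2001-07-01", "2001-10-01"),
  USRECQ = c(0, 0, 0, 0, 0, 1, 1, 0)
)

toy_annual <- toy %>%
  # Extract the year from the first four characters of the date string
  mutate(Year = as.numeric(substr(DATE, 1, 4))) %>%
  group_by(Year) %>%
  # 1 for every quarter of a year that contains at least one recession quarter
  mutate(Recession = case_when(any(USRECQ == 1) ~ 1,
                               TRUE ~ 0)) %>%
  ungroup() %>%
  # Keep one row per year
  distinct(Year, Recession)

print(toy_annual)
```

The result has one row per year, with Recession equal to 0 for 2000 and 1 for 2001.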

Next, we want to remove all variables from US_recessions except Year and Recession. Using the distinct function, this can be done in one step together with removing the duplicate observations (as we still have every year four times due to the initial data being at the quarterly level).

We also want to create a new variable called Country indicating that this data is for the US (which will be helpful when merging it with the other datasets later).

US_recessions <- US_recessions %>% 
  distinct(Year, Recession) %>% 
  mutate(Country ="United States")

Recession timing data is not available for other (non-US) countries, so we have to create our own based on the NBER’s definition of a recession (for comparability). The NBER defines a recession as ‘a period when output is declining. It is over once the economy begins to grow again.’

‘A period’ of declining output is usually interpreted as at least two consecutive quarters of economic contraction (negative GDP growth).

  1. In all the steps below, replace ‘YOURCOUNTRYNAME’ with your chosen country’s name or a suitable abbreviation.
  1. Import the quarterly GDP dataset ‘GDP_quarterly.csv’ as ‘GDP_quarterly’. Delete all observations for the US (we will be using the NBER’s recession dummy for the US). Compute quarterly GDP growth rates in per cent for your non-US country only (for example, a value of 2 if GDP grew from $100 in one quarter to $102 in the next).
  1. Code a recession dummy named ‘Recession_YOURCOUNTRYNAME_quarterly’ that is 1 during recession periods (2 or more consecutive quarters of negative GDP growth) and 0 otherwise. For all such periods, set the dummy to 1 in all quarters (starting from the first quarter of negative GDP growth). A recession is over (the dummy takes the value of 0) as soon as GDP growth turns positive again. For periods of only one quarter of negative GDP growth, the dummy should be set to 0 as well.
  1. As for the US recession data above, create a numeric ‘Year’ based on the ‘TIME’ variable (each year will be repeated four times as the data is quarterly). Then create an annual dummy variable ‘Recession_YOURCOUNTRYNAME’ that takes the value 1 for all observations within a year if one or more quarters of that year featured a recession (‘Recession_YOURCOUNTRYNAME_quarterly’ = 1). Remove duplicate values in ‘Year’ (as we still have every year listed four times due to the initial data being on the quarterly level). (For help, see R walk-through 2 on recession dating based on quarterly GDP data.)
  1. Keep only the ‘Country’, ‘Year’, and ‘Recession_YOURCOUNTRYNAME’ variables (remove all others). Rename the ‘GDP_quarterly’ dataset as ‘YOURCOUNTRYNAME_recessions’.

R walk-through 2 Recession dating based on quarterly GDP data

We start by importing the quarterly GDP dataset and calling our file ‘GDP_quarterly’.

GDP_quarterly <- read.csv("data/GDP_quarterly.csv")

The quarterly GDP values are stored in the Value column. We want to obtain a recession dummy for the non-US country, similar to the NBER one we downloaded for the US. In this example, we have chosen Germany in addition to the US.

To compute the growth rate of GDP, we compute the percentage change from one quarter to the next. We only want to do this for the non-US country (Germany), so we filter for the German values only. We can use the lag function to take the previous row’s (that is, quarter’s) value.

GDP_quarterly <- GDP_quarterly %>% 
  filter(Country == "Germany") %>% 
  mutate(pct_change = (Value/lag(Value) - 1) * 100)

Following the interpretation of the NBER’s definition given above, we treat a recession as at least two consecutive quarters of negative GDP growth. We want to define a recession dummy variable that takes the value of 1 in all periods of all such instances, including the first quarter of negative growth. In all other periods (positive GDP growth or only one quarter of negative GDP growth), the dummy should be set to 0.

GDP_quarterly <- GDP_quarterly %>% 
  mutate(Recession_Germany_quarterly = case_when(pct_change < 0 & (lag(pct_change) < 0 | lead(pct_change) < 0) ~ 1,
                                       TRUE ~ 0))

We used the lag and lead functions to check whether either the preceding or the following value (quarter) features a negative GDP growth. If a period has negative GDP growth (pct_change < 0) and either the lagging or leading period does as well, we set the recession dummy to 1 using the case_when function; otherwise (indicated by TRUE), we set the dummy to 0.
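The dating rule can be checked on a short made-up series before applying it to real data. The sketch below uses invented GDP values and adds an arrange step on the assumption that the rows might not already be in chronological order (lag and lead rely on row order). The dummy name Recession_quarterly is a placeholder.

```r
library(dplyr)

# Made-up quarterly GDP levels: one isolated fall (2000-Q3)
# and one two-quarter fall (2001-Q1 and 2001-Q2)
toy <- data.frame(
  TIME = c("2000-Q1", "2000-Q2", "2000-Q3", "2000-Q4",
           "2001-Q1", "2001-Q2", "2001-Q3", "2001-Q4"),
  Value = c(100, 102, 101, 103, 102, 100, 101, 100)
)

toy <- toy %>%
  # Make sure quarters are in chronological order before using lag/lead
  arrange(TIME) %>%
  mutate(
    # Quarter-on-quarter growth in per cent (missing for the first quarter)
    pct_change = (Value / lag(Value) - 1) * 100,
    # 1 only if growth is negative in this quarter and in a neighbouring one;
    # comparisons involving missing values fall through to the TRUE ~ 0 case
    Recession_quarterly = case_when(
      pct_change < 0 & (lag(pct_change) < 0 | lead(pct_change) < 0) ~ 1,
      TRUE ~ 0
    )
  )

print(toy)
```

Only 2001-Q1 and 2001-Q2 are flagged. The isolated falls in 2000-Q3 and 2001-Q4 are not, the latter because there is no following quarter in the data to confirm a second decline.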

The TIME variable contains dates as character strings (chr) in the format YYYY-QQ (year-quarter). As for the US above, we define a numeric Year variable, as we need the German recession period data on the annual level. We then convert the quarterly recession dummy Recession_Germany_quarterly to annual frequency in the same way as for the US (an annual dummy that takes the value 1 for all observations within a year if one or more quarters of that year featured a recession). The annual recession dummy is a new variable called Recession_Germany.

We then keep only the Country, Year, and Recession_Germany variables and drop any duplicate values (as we have four observations per year) in one step by using the distinct function. We can also rename the dataset GDP_quarterly to Germany_recessions in this step by assigning the dataset in the first line (using the <- operator).

Germany_recessions <- GDP_quarterly %>%
  mutate(Year = as.numeric(substr(TIME,1,4))) %>%
  group_by(Year) %>%
  mutate(Recession_Germany = case_when(any(Recession_Germany_quarterly == 1) ~ 1,
                               TRUE ~ 0)) %>% 
  distinct(Country,Year,Recession_Germany)

To keep the workspace clean, we remove the GDP_quarterly dataset, which is no longer needed.

rm(GDP_quarterly)
  1. Clean the four remaining datasets (‘GDP_annual’, ‘hours_worked’, ‘labour_productivities’, and ‘LFP_rates’), following the steps described below.
  1. Import ‘GDP_annual’ and ‘labour_productivities’ into R, keeping their filenames.
  1. In ‘GDP_annual’, convert the ‘Value’ variable to numeric (ignore the warning message). Remove all observations with missing values for ‘Value’. Rename the ‘Value’ variable as ‘GDP’, the ‘Country.Name’ variable as ‘Country’, and the ‘Time’ variable as ‘Year’. Remove all columns except ‘Country’, ‘Year’, and ‘GDP’.
  1. In ‘labour_productivities’, rename the ‘Time’ variable as ‘Year’ and the ‘Value’ variable as ‘labour_productivity’. Remove all columns except ‘Country’, ‘Year’, and ‘labour_productivity’.
  1. Import ‘hours_worked’ and ‘LFP_rates’ into R, keeping their filenames as is. In each dataset, reshape the variables that are currently in long format to wide format. For example, in ‘hours_worked’, the hours worked by all persons, females, and males should be their own variables (columns), rather than sharing the ‘SEX’, ‘Sex’, and ‘Value’ columns. Remove the ‘SEX’ column in both datasets before reshaping. (For help, see R walk-through 3 on reshaping the data.)
  1. After reshaping ‘hours_worked’ and ‘LFP_rates’, name the new variables according to the name of the data series (for example, to ‘hours_worked_all’, ‘hours_worked_female’, or ‘LFP_rate_female’). Rename the ‘Time’ variable as ‘Year’. Remove all columns except ‘Country’, ‘Year’, and the newly created variables.

R walk-through 3 Reshaping data and further data cleaning

Two of our datasets (hours_worked and LFP_rates) are currently in long format (the single variable Value contains values for multiple economic variables) and need to be reshaped (reformatted) to wide format, where each economic variable has its own column.

For example, in hours_worked, the hours worked by all persons, females, and males should be their own variables (columns), rather than sharing the SEX, Sex, and Value columns.

We first clean the GDP_annual and labour_productivities datasets, as these are already in wide format and do not need to be reshaped. We import them into R.

GDP_annual <- read.csv("data/GDP_annual.csv")
labour_productivities <- read.csv("data/labour_productivities.csv")

In GDP_annual, convert the Value variable to numeric. Remove all observations with missing values for Value. Rename the Value variable to GDP, the Country.Name variable to Country, and the Time variable to Year. Remove all columns except Country, Year, and GDP.

GDP_annual <- GDP_annual %>% 
  mutate(Value = as.numeric(Value)) %>% 
  filter(!is.na(Value)) %>% 
  rename(GDP = Value,
         Country = Country.Name,
         Year = Time) %>% 
  select(Country,Year,GDP)
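The warning mentioned in the instructions arises when as.numeric meets an entry that is not a number. The sketch below illustrates this with made-up values, assuming that missing observations appear in the CSV file as a placeholder string such as ‘..’ (check your own download to see what it uses). suppressWarnings is optional and only hides the expected coercion warning.

```r
library(dplyr)

# Made-up data in the shape of the imported file; ".." stands in for a
# missing observation (an assumption about the downloaded CSV)
toy_gdp <- data.frame(
  Country.Name = c("United States", "United States", "Germany"),
  Time = c(1959, 1960, 1960),
  Value = c("..", "3500", "1200"),
  stringsAsFactors = FALSE
)

toy_gdp <- toy_gdp %>%
  # Non-numeric entries become NA; the warning is expected here
  mutate(Value = suppressWarnings(as.numeric(Value))) %>%
  # Drop the rows that could not be converted
  filter(!is.na(Value)) %>%
  rename(GDP = Value,
         Country = Country.Name,
         Year = Time) %>%
  select(Country, Year, GDP)

print(toy_gdp)
```

The row with the placeholder is dropped, leaving two rows with a numeric GDP column.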

In labour_productivities, rename the Time variable to Year and the Value variable to labour_productivity and remove all columns except Country, Year, and labour_productivity.

labour_productivities <- labour_productivities %>% 
  select(Country,Time,Value) %>% 
  rename(Year = Time,
         labour_productivity = Value)

We now import the two datasets (hours_worked and LFP_rates) in long format. These need to be reshaped into wide format.

hours_worked <- read.csv("data/hours_worked.csv")
LFP_rates <- read.csv("data/LFP_rates.csv")

We can reshape the datasets with the spread function. The key input takes the variable that will be used to name the newly created variables (in our case, the Sex column) while the value input contains the values that will be reshaped (Value in our data). Before we reshape, we need to remove the SEX column, which also differs between All persons, Women, and Men (spread can only take one key variable, which here is Sex). This can be done using a minus sign inside the select function.

hours_worked <- hours_worked %>% 
  select(-SEX) %>% 
  spread(key = Sex, value = Value)

LFP_rates <- LFP_rates %>% 
  select(-SEX) %>% 
  spread(key = Sex, value = Value)
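As an aside, newer versions of the tidyr package provide pivot_wider, which performs the same long-to-wide reshaping as spread. The sketch below shows the equivalent call on made-up data. The values and the codes in the SEX column are invented for illustration and need not match your download.

```r
library(dplyr)
library(tidyr)

# Made-up long-format data: one row per country, year, and sex
toy_long <- data.frame(
  Country = rep("Germany", 6),
  SEX = rep(c("MW", "WOMEN", "MEN"), times = 2),  # hypothetical codes
  Sex = rep(c("All persons", "Women", "Men"), times = 2),
  Time = rep(c(2000, 2001), each = 3),
  Value = c(36, 31, 40, 35, 30, 39)
)

toy_wide <- toy_long %>%
  # Drop the code column, which would otherwise keep the rows apart
  select(-SEX) %>%
  # names_from plays the role of key, values_from the role of value
  pivot_wider(names_from = Sex, values_from = Value)

print(toy_wide)
```

The result has one row per country and year, with separate columns named All persons, Women, and Men.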

The datasets are now in wide format. However, as spread takes the (character string) values in the key variable as variable names for the newly generated variables, the new All persons variables have spaces in their variable names. While R allows variable names to have spaces, it is best practice to avoid this. Moreover, the new Women and Men variable names are not very descriptive of the economic series those variables correspond to.

Therefore, we rename the reshaped variables to the name of the data series (for example, to hours_worked_female). Note that we have to put the ‘old’ variable names containing spaces (All persons) inside backticks when we rename them. We also rename the Time variable (containing the years as numeric values) to Year, consistent with the other datasets.

Finally, we remove all columns except Country, Year, and the newly created (and renamed) variables.

hours_worked <- hours_worked %>% 
  rename(hours_worked_all = `All persons`,
         hours_worked_female = Women,
         hours_worked_male = Men,
         Year = Time) %>% 
  select(Country,Year,hours_worked_all,hours_worked_female,hours_worked_male)

LFP_rates <- LFP_rates %>% 
  rename(LFP_rate_all = `All persons`,
         LFP_rate_female = Women,
         LFP_rate_male = Men,
         Year = Time) %>% 
  select(Country,Year,LFP_rate_all,LFP_rate_female,LFP_rate_male)
  1. Merge all six datasets by their ‘Country’ and ‘Year’ variables. The combined dataset should have 12 variables (11 once the country-specific recession dummy is removed in the next step). (For help, see R walk-through 4 on merging datasets.)
  1. Replace the (missing) values of the ‘Recession’ variable (containing the US recession dummy) with the values of the ‘Recession_YOURCOUNTRYNAME’ variable. Then delete the ‘Recession_YOURCOUNTRYNAME’ variable and sort the data by ‘Country’ and ‘Year’ (in that order).
  1. Delete all observations before 1960 (for both countries), as the OECD data is only available from 1960 onwards (or later), while the US recession period data goes back to 1854.
  1. Delete all other datasets, leaving only ‘dataset_main’ in your workspace. You will work with this dataset in Part 2.

R walk-through 4 Merging multiple datasets into a single dataset

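To merge (combine) all six datasets into one, we first combine them in a list and then pass full_join to the reduce function, which joins the datasets one after another. As we do not specify which variables to join by, full_join matches on the columns the datasets have in common (Country and Year) and prints a message saying so. We save the new (merged) dataset as dataset_main.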

dataset_main <- 
  list(GDP_annual,labour_productivities,hours_worked,LFP_rates,US_recessions,Germany_recessions) %>% 
  reduce(full_join)
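If you prefer to state the join variables explicitly, one option is to wrap full_join in a small function that sets its by argument. The sketch below does this for three made-up data frames (the names and values are invented for illustration). It behaves like the call above whenever Country and Year are the only shared columns.

```r
library(dplyr)
library(purrr)

# Made-up datasets that share only the Country and Year columns
toy_gdp <- data.frame(Country = c("United States", "United States"),
                      Year = c(2000, 2001),
                      GDP = c(10, 11))
toy_prod <- data.frame(Country = c("United States", "Germany"),
                       Year = c(2001, 2001),
                       labour_productivity = c(50, 45))
toy_rec <- data.frame(Country = c("United States", "United States"),
                      Year = c(2000, 2001),
                      Recession = c(0, 1))

toy_merged <- list(toy_gdp, toy_prod, toy_rec) %>%
  # Join the datasets one after another, naming the key columns explicitly
  reduce(function(x, y) full_join(x, y, by = c("Country", "Year")))

print(toy_merged)
```

A full join keeps every Country-Year combination that appears in any of the inputs, so cells with no matching observation are filled with NA.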

We replace the (missing) values of the Recession variable (containing the US recession dummy) with the values of the Recession_Germany variable, using the coalesce function which takes the first non-missing element of its two inputs. Then we delete the Recession_Germany variable. We also sort the data by Country and Year (in that order). We delete all pre-1960 data, as the OECD data series are not available before then.

dataset_main <- dataset_main %>% 
  mutate(Recession = coalesce(Recession,Recession_Germany)) %>% 
  select(-Recession_Germany) %>% 
  arrange(Country,Year) %>% 
  filter(Year >= 1960)
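To see how coalesce fills the gaps, consider the following small made-up example, in which each row has a value in exactly one of the two recession columns (as is the case after the merge, where the US rows have no Recession_Germany value and the German rows have no Recession value).

```r
library(dplyr)

# Made-up merged data: each country has its dummy in a different column
toy <- data.frame(
  Country = c("United States", "United States", "Germany", "Germany"),
  Recession = c(1, 0, NA, NA),
  Recession_Germany = c(NA, NA, 0, 1)
)

toy <- toy %>%
  # Take Recession where available, otherwise Recession_Germany
  mutate(Recession = coalesce(Recession, Recession_Germany)) %>%
  select(-Recession_Germany)

print(toy)
```

After this step, the Recession column holds 1, 0, 0, 1, with no missing values.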

Finally, we delete all datasets except dataset_main.

rm(GDP_annual,labour_productivities,hours_worked,LFP_rates,US_recessions,Germany_recessions)

Learning objectives for this part

  • use the Hodrick–Prescott (HP) filter to separate trend and cyclical components
  • compute business cycle properties of macroeconomic variables
  • interpret macroeconomic (aggregate) data
  • relate changes on the household level to macroeconomic phenomena.

To learn more about the great moderation, read Section 17.4 of The Economy 1.0.

jobless recovery
A macroeconomic phenomenon in which employment grows slowly (or not at all) during a post-recession economic recovery, thereby making the recovery ‘jobless’. One example is from the US, where recoveries have been jobless from the 1990s onwards.
great moderation
A period of low volatility in aggregate output in advanced economies between the 1980s and the 2008 financial crisis. The name was suggested by James Stock and Mark Watson, the economists, and popularized by Ben Bernanke, then chairman of the Federal Reserve.

In this part, we will study three puzzling macroeconomic phenomena in the US:

  • the great moderation, which describes the reduction in business cycle volatility of output and hours worked
  • the productivity slowdown in the 1970s and 1980s, during which the growth rate of average labour productivity was substantially lower than in previous decades
  • jobless recovery, which describes an economic recovery during which employment grows slowly (or not at all); this phenomenon contrasts with post-recession economic recoveries before the 1990s, when employment used to recover quickly.

Even better, we will show these phenomena empirically using the dataset obtained in Part 1. You will also be using the dataset to check whether these puzzles can also be found outside of the US, by contrasting the US data with data from a country of your choice. We will then explore whether these macro puzzles can be explained by changes at the micro (household) level happening at the same time.

  1. Start by plotting female and male labour force participation over time:
  1. Separately for each country, create a line chart of the labour force participation rate by sex (one line for females, one for males) over time. Add a legend to the chart. The horizontal axis labels should be years in 10-year intervals (1960, 1970, …, 2020).
  1. Add grey vertical shaded bars for recession periods to both countries, using the NBER recession dummy for the US and the one you computed yourself for the other country. Note that if the recession data for your other country starts in an earlier year than the LFP rate data, you must plot the recession period before the two line charts within the ‘ggplot’ function. (For help, see R walk-through 5 on creating line charts with shaded vertical bars.) Your chart for the US should be similar to Figure 1.
  1. How do both labour force participation rates evolve over time? How do these rates compare between both countries?

R walk-through 5 Creating line charts with ggplot and adding shaded vertical bars

We now want to plot the female and male labour force participation (LFP) rates over time together in a line chart (one chart per country), including a legend.

We use the ggplot function for this, which is part of the tidyverse package we loaded earlier. ggplot is one of the most popular ways to plot data in R and comes with a lot of different chart types and customisation options.

We start with the US:

dataset_main %>%
  filter(Country == "United States") %>%
  ggplot(aes(Year)) +
  geom_line(aes(y = LFP_rate_female, colour = "LFP_rate_female")) +
  geom_line(aes(y = LFP_rate_male, colour = "LFP_rate_male")) +
  theme_bw() +
  # Add the vertical axis label
  labs(y = "Percentage") +  
  # Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the title and legend labels
  ggtitle("Labour force participation rate by sex (United States)") +
  scale_colour_discrete(name = "", 
    labels = c("Female", "Male")) + 
  # Move the legend below the chart
  theme(legend.position="bottom") +
  # Remove the legend title
  theme(legend.title=element_blank())

Walk-through figure 1 Labour force participation rate by sex (United States)

Next, we plot the same chart for Germany:

dataset_main %>%
  filter(Country == "Germany") %>%
  ggplot(aes(Year)) +
  geom_line(aes(y = LFP_rate_female, colour = "LFP_rate_female")) +
  geom_line(aes(y = LFP_rate_male, colour = "LFP_rate_male")) +
  theme_bw() +
  # Add the vertical axis label
  labs(y = "Percentage") +  
  # Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the title and legend labels
  ggtitle("Labour force participation rate by sex (Germany)") +
  scale_colour_discrete(name = "", 
    labels = c("Female", "Male")) + 
  # Move the legend below the chart
  theme(legend.position="bottom") +
  # Remove the legend title
  theme(legend.title=element_blank())

Walk-through figure 2 Labour force participation rate by sex (Germany)

We now add the shaded recession areas, using the Recession dummy, to produce the chart shown in the instructions. We use the geom_rect function to add the shaded areas for the recession periods, and the subset function to only plot the recession areas (where the recession dummy Recession == 1) for the US (Country == "United States").

dataset_main %>%
  filter(Country == "United States") %>%
  ggplot(aes(Year)) +
  geom_line(aes(y = LFP_rate_female, colour = "LFP_rate_female")) +
  geom_line(aes(y = LFP_rate_male, colour = "LFP_rate_male")) +
  theme_bw() +
  # Add the vertical axis label
  labs(y = "Percentage") +  
  # Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the title and legend labels
  ggtitle("Labour force participation rate by sex (United States)") +
  scale_colour_discrete(name = "", 
    labels = c("Female", "Male")) + 
  # Move the legend below the chart
  theme(legend.position="bottom") +
  # Remove the legend title
  theme(legend.title=element_blank()) +
  # Add recession periods as shaded areas
  geom_rect(data = subset(dataset_main, Recession == 1 & Country == "United States"), 
            aes(ymin = -Inf, ymax = Inf, xmin = Year-0.5, xmax = Year+0.5), 
            alpha = 0.2)

Walk-through figure 3 Labour force participation rate by sex (United States)
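
To see what the geom_rect call is drawing, the following minimal sketch builds the band boundaries by hand. The data frame toy and its values are invented for illustration and are not part of the project data; only base R is used.

```r
# Hypothetical example data: one row per year with a 0/1 recession dummy
toy <- data.frame(Year = 2005:2012,
                  Recession = c(0, 0, 0, 1, 1, 0, 0, 0))

# Keep only the recession years, as subset() does in the chart code
bands <- subset(toy, Recession == 1)

# Centre a one-year-wide band on each recession year
bands$xmin <- bands$Year - 0.5
bands$xmax <- bands$Year + 0.5

print(bands[, c("Year", "xmin", "xmax")])
```

Each recession year becomes one band of width 1 centred on that year, so consecutive recession years (here 2008 and 2009) join up into a single shaded block on the chart.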

We also plot the chart with recession areas for Germany.

dataset_main %>%
  filter(Country == "Germany") %>%
  ggplot(aes(Year)) +
  geom_rect(data = subset(dataset_main, Recession == 1 & Country == "Germany"), 
            aes(ymin = -Inf, ymax = Inf, xmin = Year-0.5, xmax = Year+0.5), 
            alpha = 0.2) +
  geom_line(aes(y = LFP_rate_female, colour = "LFP_rate_female")) +
  geom_line(aes(y = LFP_rate_male, colour = "LFP_rate_male")) +
  theme_bw() +
	# Add the vertical axis label
  labs(y = "Percentage") +  
  # Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the title and legend labels
  ggtitle("Labour force participation rate by sex (Germany)") +
  scale_colour_discrete(name = "", 
    labels = c("Female", "Male")) + 
  # Move the legend below the chart
  theme(legend.position="bottom") +
  # Remove the legend title
  theme(legend.title=element_blank())

Walk-through figure 4 Labour force participation rate by sex (Germany)


Figure 1 Labour force participation rates in the US, by sex (1960-2021). Shaded bars indicate recession periods.

The first macro puzzle we will explore is the great moderation: the reduction in the business-cycle volatility of several aggregate variables, such as GDP and employment. In this question, we will measure employment by average hours worked per employed person (intensive margin only).

trend component
The long-run growth component of a macroeconomic time series. Commonly obtained using the Hodrick–Prescott (HP) filter.
cyclical component
The short-run (business cycle) fluctuations around the long-run trend component of a macroeconomic time series. Commonly obtained using the Hodrick–Prescott (HP) filter.
Hodrick–Prescott (HP) filter
A mathematical tool used by macroeconomists to estimate the cyclical and trend components of time series data. Its main purpose is to fit a smooth curve (the trend) through the time series, where the trend reacts more to long-term fluctuations than to short-term fluctuations (the latter will mostly affect the cyclical component). The HP filter uses a parameter λ (‘lambda’) to dictate how sensitive this trend is to short-term fluctuations. This lambda needs to be chosen depending on the frequency of the data; popular values are λ = 6.25 for annual and λ = 1,600 for quarterly data.

We will start by showing the great moderation in the data. To do so, we need to isolate the cyclical (that is, relating to the business cycle) components of GDP and hours worked. This is a standard exercise in business cycle macroeconomics, where we deal with time series variables (such as GDP) that have a trend component (for example, the long-run growth of GDP over time) and a cyclical component that fluctuates around that trend. This cyclical component indicates the business cycles, and it is this component we need to isolate to see the great moderation in the data.

Macroeconomists typically use the Hodrick–Prescott (HP) filter to estimate the cyclical and trend components. Its main idea is to fit a smooth curve (the trend) through the time series, where the trend reacts more to long-term fluctuations than to short-term fluctuations (the latter will mostly affect the cyclical component). The HP filter uses a parameter λ (‘lambda’) to dictate how sensitive this trend is to short-term fluctuations (greater values of λ mean less sensitivity; that is, a smoother trend). This lambda value needs to be chosen depending on the frequency of the data; a popular value for annual data (like ours) is λ = 6.25.
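
The idea behind the filter can be sketched in a few lines of base R. The HP trend is the series that minimises the sum of squared deviations from the data plus λ times the sum of squared second differences of the trend, and this problem has a closed-form solution. The function name hp_filter_sketch and the example series are assumptions made for illustration; the walk-throughs in this project use hpfilter from the mFilter package instead.

```r
# Hypothetical helper: HP filter via its closed-form solution
hp_filter_sketch <- function(y, lambda = 6.25) {
  n <- length(y)
  stopifnot(n >= 3)
  # (n - 2) x n second-difference matrix: each row holds (1, -2, 1)
  D <- diff(diag(n), differences = 2)
  # The trend solves (I + lambda * D'D) trend = y
  trend <- as.numeric(solve(diag(n) + lambda * crossprod(D), y))
  list(trend = trend, cycle = y - trend)
}

# Made-up annual series: 2% trend growth plus a small wave
years <- 0:29
y <- log(100 * 1.02^years) + 0.03 * sin(years)

smooth_low  <- hp_filter_sketch(y, lambda = 6.25)
smooth_high <- hp_filter_sketch(y, lambda = 100)

# Roughness of a trend: sum of its squared second differences
roughness <- function(trend) sum(diff(trend, differences = 2)^2)
roughness(smooth_low$trend)
roughness(smooth_high$trend)
```

A larger λ penalises bends in the trend more heavily, so the second roughness value should be smaller than the first. In both cases the trend and the cycle add back up to the original series.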

  1. We will first show the great moderation in the US data and then explore a potential explanation for it.
  1. Create new variables for the natural log of annual GDP and hours worked (female, male, and all persons). Read the ‘Find out more’ box in Doing Economics Project 4.2 on why economists take logs of variables and how to do it in R.
  1. For both countries, run the HP filter with lambda = 6.25 on 1) log annual GDP and 2) log hours worked per capita over time (female, male, and total separately) and save both the trend and cyclical components as their own variables. Make sure to group by country first (otherwise R would treat both countries’ log GDP as one data series) and filter out the missing values using the code ‘filter(!is.na(GDP_log))’. (For help, see R walk-through 6 on running the HP filter.)
  1. To show the great moderation in your data, plot line charts of the cyclical components of log GDP and of hours worked (of all persons) separately, by country (four charts in total). Add the recession periods as shaded vertical areas as before.
  1. Looking at the log GDP chart for the US, can you see the great moderation in your chart? When does it start (roughly)? Some economists argue that the great moderation in the US has ended; which year/event do you think they regard as its end? Does your other country also display a moderation period? If so, when does it start and end (roughly)?

R walk-through 6 Using the Hodrick–Prescott (HP) filter and plotting cyclical components

We use the log function to create variables containing the natural log of annual GDP and of hours worked (female, male, and all persons).

dataset_main <- dataset_main %>% 
  mutate(GDP_log = log(GDP),
         hours_worked_all_log = log(hours_worked_all),
         hours_worked_female_log = log(hours_worked_female),
         hours_worked_male_log = log(hours_worked_male))

To apply the HP filter to the four log variables, we use the hpfilter function, which is part of the mFilter package. We want to set the smoothing parameter lambda to 6.25 (a standard value for annual data like ours) and save both the trend and cyclical components. In hpfilter, lambda is passed through the freq argument (the function's default type is "lambda"). The function returns a list whose elements include trend, containing the trend component, and cycle, containing the cyclical component. To save a component as a new variable, we add $trend or $cycle after calling the hpfilter function.

Before applying the HP filter, we need to group by country (otherwise R would treat both countries’ log GDP as one data series) and filter out the missing values via filter(!is.na(GDP_log)). Note that this removes all observations that have missing values in GDP_log, so each filtered dataset may contain fewer rows than dataset_main. We therefore merge the datasets with the HP-filtered values back into the main dataset using full_join, which keeps every row. Finally, we remove all intermediate datasets (whose names all start with dataset_main_filter), leaving dataset_main in place, using rm(list=ls(pattern="dataset_main_filter")).

dataset_main_filter1 <- dataset_main %>% 
  group_by(Country) %>% 
  filter(!is.na(GDP_log)) %>% 
  mutate(GDP_log_trend = hpfilter(GDP_log, freq=6.25)$trend,
         GDP_log_cycle = hpfilter(GDP_log, freq=6.25)$cycle)

dataset_main_filter2 <- dataset_main %>% 
  group_by(Country) %>% 
  filter(!is.na(hours_worked_all_log)) %>% 
  mutate(hours_worked_all_log_trend = hpfilter(hours_worked_all_log, freq=6.25)$trend,
         hours_worked_all_log_cycle = hpfilter(hours_worked_all_log, freq=6.25)$cycle)

dataset_main_filter3 <- dataset_main %>% 
  group_by(Country) %>% 
  filter(!is.na(hours_worked_female_log)) %>% 
  mutate(hours_worked_female_log_trend = hpfilter(hours_worked_female_log, freq=6.25)$trend,
         hours_worked_female_log_cycle = hpfilter(hours_worked_female_log, freq=6.25)$cycle) 

dataset_main_filter4 <- dataset_main %>% 
  group_by(Country) %>% 
  filter(!is.na(hours_worked_male_log)) %>% 
  mutate(hours_worked_male_log_trend = hpfilter(hours_worked_male_log, freq=6.25)$trend, 
         hours_worked_male_log_cycle = hpfilter(hours_worked_male_log, freq=6.25)$cycle)

dataset_main <- 
  list(dataset_main,dataset_main_filter1,dataset_main_filter2,dataset_main_filter3,dataset_main_filter4) %>% 
  reduce(full_join)
rm(list=ls(pattern="dataset_main_filter"))
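
The filter-then-join pattern can be checked on a small example. In the sketch below, demeaning within each country stands in for the HP filter; the tibble toy, the column x_demeaned, and all values are made up for illustration.

```r
library(dplyr)

# Made-up panel with one missing value per country
toy <- tibble(
  Country = c("A", "A", "A", "B", "B", "B"),
  Year    = c(2000, 2001, 2002, 2000, 2001, 2002),
  x       = c(1, NA, 3, 10, 20, NA)
)

# Drop the missing rows, then transform x within each country
toy_filtered <- toy %>%
  group_by(Country) %>%
  filter(!is.na(x)) %>%
  mutate(x_demeaned = x - mean(x)) %>%
  ungroup()

# Joining on the shared columns restores the dropped rows,
# with NA in the new column
toy_full <- full_join(toy, toy_filtered, by = c("Country", "Year", "x"))

print(toy_full)
```

The joined table has the same six rows as toy. The rows that were filtered out come back with a missing value in x_demeaned, which is why the walk-through can keep working with a single dataset_main afterwards.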

We plot the cyclical component of log GDP and of hours worked (by all persons) for each country, again adding the recession areas.

dataset_main %>%
  filter(Country == "United States") %>%
  ggplot(aes(Year)) +
  geom_line(aes(y = GDP_log_cycle)) +
  theme_bw() +
	# Add the vertical axis label
  labs(y = "") +  
  # Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the title and legend labels
  ggtitle("Cyclical component of log GDP (United States)") +
  # Move the legend below the chart
  theme(legend.position="bottom") +
  # Remove the legend title
  theme(legend.title=element_blank()) +
  # Add recession periods as shaded areas
  geom_rect(data = subset(dataset_main, Recession == 1 & Country == "United States"), 
            aes(ymin = -Inf, ymax = Inf, xmin = Year-0.5, xmax = Year+0.5), 
            alpha = 0.2)

Walk-through figure 5 Cyclical component of log GDP (United States)

dataset_main %>%
  filter(Country == "United States") %>%
  ggplot(aes(Year)) +
  geom_line(aes(y = hours_worked_all_log_cycle)) +
  theme_bw() +
	# Add the vertical axis label
  labs(y = "") +  
  # Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the title and legend labels
  ggtitle("Cyclical component of average hours worked (United States)") +
  # Move the legend below the chart
  theme(legend.position="bottom") +
  # Remove the legend title
  theme(legend.title=element_blank()) +
  # Add recession periods as shaded areas
  geom_rect(data = subset(dataset_main, Recession == 1 & Country == "United States"), 
            aes(ymin = -Inf, ymax = Inf, xmin = Year-0.5, xmax = Year+0.5), 
            alpha = 0.2)

Walk-through figure 6 Cyclical component of average hours worked (United States)

dataset_main %>%
  filter(Country == "Germany") %>%
  ggplot(aes(Year)) +
  geom_line(aes(y = GDP_log_cycle)) +
  theme_bw() +
	# Add the vertical axis label
  labs(y = "") +  
  # Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the title and legend labels
  ggtitle("Cyclical component of log GDP (Germany)") +
  # Move the legend below the chart
  theme(legend.position="bottom") +
  # Remove the legend title
  theme(legend.title=element_blank()) +
  # Add recession periods as shaded areas
  geom_rect(data = subset(dataset_main, Recession == 1 & Country == "Germany"), 
            aes(ymin = -Inf, ymax = Inf, xmin = Year-0.5, xmax = Year+0.5), 
            alpha = 0.2)

Walk-through figure 7 Cyclical component of log GDP (Germany)

dataset_main %>%
  filter(Country == "Germany") %>%
  ggplot(aes(Year)) +
  geom_line(aes(y = hours_worked_all_log_cycle)) +
  theme_bw() +
	# Add the vertical axis label
  labs(y = "") +  
  # Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the title and legend labels
  ggtitle("Cyclical component of average hours worked (Germany)") +
  # Move the legend below the chart
  theme(legend.position="bottom") +
  # Remove the legend title
  theme(legend.title=element_blank()) +
  # Add recession periods as shaded areas
  geom_rect(data = subset(dataset_main, Recession == 1 & Country == "Germany"), 
            aes(ymin = -Inf, ymax = Inf, xmin = Year-0.5, xmax = Year+0.5), 
            alpha = 0.2)

Walk-through figure 8 Cyclical component of average hours worked (Germany)

As you can see in your charts, the cyclical components of GDP and employment seem to co-move over the business cycle. That is, in expansion periods, both output and employment grow (and in recessions, both shrink). Economists call a variable with this property ‘procyclical’. Unemployment behaves the other way around; it is therefore ‘countercyclical’. To measure the degree of procyclicality and the cyclical volatility of an economic variable, economists typically compare it with GDP, which we will do in the next tasks.

  1. To study the cyclicality of total employment, compute the correlation coefficient of the cyclical component of log hours worked of all persons with the cyclical component of log GDP (separately for each country). How can you tell that employment is a procyclical variable (at least in the US)? In which of your countries is employment more cyclical?
  1. To evaluate the cyclical volatility of employment, compute the standard deviation (s.d.) of each of the three cyclical components of log hours (all persons, female, and male) separately for each country. Then divide it by the standard deviation of the cyclical component of log GDP. Make sure to compute the standard deviations over the same time period within each country (the time periods can be different between countries, but must be the same between economic variables to compare them). Is employment more or less volatile than GDP? Which is more volatile—male or female employment? How does it differ between the US and your other country?
  1. Now we get to finding an explanation for the great moderation in the US:
    • Compute the relative standard deviations (that is, cyclical volatilities) in the same way as in Question 2e for the US, but only for female and male employment, splitting the sample into two periods: 1983 to 2007 (the last pre-global financial crisis year), and 2008 until your last year of data.
    • Create a grouped column chart with four columns, each column representing one relative standard deviation (all from US data): pre-global financial crisis female and male, and post-global financial crisis female and male. (For help, see R walk-through 7 on creating grouped column charts.)
    • Create the same column chart for your other country as well (use the same time periods as for the US).

R walk-through 7 Computing cyclical properties of economic variables and plotting a grouped bar chart

First, we study the cyclicality of average hours worked (of all persons). To do so, compute the correlation coefficient of the cyclical components of log hours worked (of all persons) with the cyclical component of log GDP (separately for each country). We use the cor function to compute the correlation, but need to add use="complete.obs" within the cor function to omit missing values in the computation.

cyclical_correlation <- dataset_main %>%
  group_by(Country) %>% 
  summarise(corr_hours_worked_GDP = cor(hours_worked_all_log_cycle, GDP_log_cycle, use="complete.obs"))

print(cyclical_correlation)
## # A tibble: 2 × 2
##   Country       corr_hours_worked_GDP
##   <chr>                         <dbl>
## 1 Germany                     -0.0492
## 2 United States                0.724
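
To see what use="complete.obs" does, here is a minimal base R sketch. The two vectors are made-up numbers, not project data.

```r
# Made-up cyclical components with one missing value each
gdp_cycle   <- c(0.02, -0.01, 0.03, NA, -0.02, 0.01)
hours_cycle <- c(0.01, -0.02, 0.02, 0.00, NA, 0.01)

# Without the use argument, any missing value makes the result NA
cor(gdp_cycle, hours_cycle)

# With use = "complete.obs", only pairs observed in both series are used
r <- cor(gdp_cycle, hours_cycle, use = "complete.obs")

# The same number, computed by dropping incomplete pairs by hand
keep <- complete.cases(gdp_cycle, hours_cycle)
r_manual <- cor(gdp_cycle[keep], hours_cycle[keep])

print(c(r, r_manual))
```

A positive coefficient means the two cyclical components tend to be above or below trend at the same time, which is what ‘procyclical’ refers to; a coefficient near zero suggests little co-movement.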

Next, we compute the standard deviation of the cyclical components of log hours worked (of all persons, female, and male) separately for each country. We also compute the standard deviation of the cyclical component of log GDP. Then to obtain the relative standard deviation (also known as ‘cyclical volatility’) of the three hours_worked variables, we divide their standard deviation by the standard deviation of GDP. By only considering observations which are non-missing in all four variables (using filter_at), we ensure that the standard deviations are computed over the same time periods within each country.

cyclical_volatility <- dataset_main %>%
  group_by(Country) %>% 
  filter_at(vars(GDP,hours_worked_all_log_cycle,hours_worked_female_log_cycle,hours_worked_male_log_cycle),all_vars(!is.na(.))) %>% 
  summarise(sd_total_hours_worked = sd(hours_worked_all_log_cycle),
            sd_female_hours_worked = sd(hours_worked_female_log_cycle),
            sd_male_hours_worked = sd(hours_worked_male_log_cycle),
            sd_GDP = sd(GDP_log_cycle)) %>% 
  mutate(relative_sd_total_hours_worked = sd_total_hours_worked/sd_GDP,
         relative_sd_female_hours_worked = sd_female_hours_worked/sd_GDP,
         relative_sd_male_hours_worked = sd_male_hours_worked/sd_GDP) %>%
  select(Country,relative_sd_total_hours_worked,relative_sd_female_hours_worked,relative_sd_male_hours_worked)

print(cyclical_volatility)
## # A tibble: 2 × 4
##   Country       relative_sd_total_hours_worked relative_sd_female_hour…¹ relat…²
##   <chr>                                  <dbl>                     <dbl>   <dbl>
## 1 Germany                                0.237                     0.435   0.167
## 2 United States                          0.233                     0.192   0.274
## # … with abbreviated variable names ¹​relative_sd_female_hours_worked,
## #   ²​relative_sd_male_hours_worked
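
The relative standard deviation itself is a simple ratio, as the following base R sketch shows. The numbers are invented so that hours are exactly one fifth as volatile as GDP on the years where both are observed.

```r
# Made-up cyclical components; each series has one missing year
gdp_cycle   <- c(0.020, -0.010, 0.030, -0.020, 0.010, NA)
hours_cycle <- c(0.004, -0.002, 0.006, -0.004, NA, 0.001)

# Use only the years observed in both series, so that the two
# standard deviations refer to the same sample
same_sample <- complete.cases(gdp_cycle, hours_cycle)

relative_sd <- sd(hours_cycle[same_sample]) / sd(gdp_cycle[same_sample])
print(relative_sd)
```

A value below 1 means the variable fluctuates less than GDP over the cycle. Computing both standard deviations on the same years matters because a series observed only in calm years would otherwise look artificially stable.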

We now split the sample into a pre-global financial crisis (1983–2007) and a post-global financial crisis (2008–latest available) period, repeating the calculation of relative standard deviations, but only for female and male hours worked in the US.

cyclical_volatility_pre_financial_crisis_US <- dataset_main %>%
  filter(Country == "United States" & Year >= 1983 & Year < 2008) %>% 
  filter_at(vars(GDP,hours_worked_female_log_cycle,hours_worked_male_log_cycle),all_vars(!is.na(.))) %>% 
  summarise(sd_female_hours_worked = sd(hours_worked_female_log_cycle),
            sd_male_hours_worked = sd(hours_worked_male_log_cycle),
            sd_GDP = sd(GDP_log_cycle)) %>% 
  mutate(relative_sd_female_hours_worked = sd_female_hours_worked/sd_GDP,
         relative_sd_male_hours_worked = sd_male_hours_worked/sd_GDP) %>%
  select(relative_sd_female_hours_worked,relative_sd_male_hours_worked) %>% 
  mutate(period = "1983-2007")

cyclical_volatility_post_financial_crisis_US <- dataset_main %>%
  filter(Country == "United States" & Year >= 2008) %>% 
  filter_at(vars(GDP,hours_worked_female_log_cycle,hours_worked_male_log_cycle),all_vars(!is.na(.))) %>% 
  summarise(sd_female_hours_worked = sd(hours_worked_female_log_cycle),
            sd_male_hours_worked = sd(hours_worked_male_log_cycle),
            sd_GDP = sd(GDP_log_cycle)) %>% 
  mutate(relative_sd_female_hours_worked = sd_female_hours_worked/sd_GDP,
         relative_sd_male_hours_worked = sd_male_hours_worked/sd_GDP) %>%
  select(relative_sd_female_hours_worked,relative_sd_male_hours_worked) %>% 
  mutate(period = "2008--today")

cyclical_volatility_comparison_US <- 
  bind_rows(cyclical_volatility_pre_financial_crisis_US,cyclical_volatility_post_financial_crisis_US) %>% 
  relocate(period)

print(cyclical_volatility_comparison_US)
##        period relative_sd_female_hours_worked relative_sd_male_hours_worked
## 1   1983-2007                       0.2309357                     0.3724551
## 2 2008--today                       0.1814919                     0.2144338
rm(cyclical_volatility_pre_financial_crisis_US,cyclical_volatility_post_financial_crisis_US)
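
The pre- and post-crisis chunks differ only in the country, the years, and the period label. As a sketch, the repeated steps could be wrapped in a function. The function relative_sd_by_period and the toy data below are hypothetical and not part of the original walk-through; the column names match those in dataset_main, and missing values are checked on GDP_log_cycle directly.

```r
library(dplyr)

# Hypothetical helper: relative s.d. of female and male hours for one
# country and one range of years
relative_sd_by_period <- function(data, country, first_year, last_year, label) {
  data %>%
    ungroup() %>%
    filter(Country == country, Year >= first_year, Year <= last_year) %>%
    filter(!is.na(GDP_log_cycle),
           !is.na(hours_worked_female_log_cycle),
           !is.na(hours_worked_male_log_cycle)) %>%
    summarise(
      relative_sd_female_hours_worked =
        sd(hours_worked_female_log_cycle) / sd(GDP_log_cycle),
      relative_sd_male_hours_worked =
        sd(hours_worked_male_log_cycle) / sd(GDP_log_cycle)
    ) %>%
    mutate(period = label) %>%
    relocate(period)
}

# Made-up data: female hours are 0.1 times, male hours 0.3 times, the GDP cycle
toy <- tibble(
  Country = "X",
  Year = 2000:2005,
  GDP_log_cycle = c(0.02, -0.01, 0.03, -0.02, 0.01, -0.03)
) %>%
  mutate(hours_worked_female_log_cycle = 0.1 * GDP_log_cycle,
         hours_worked_male_log_cycle   = 0.3 * GDP_log_cycle)

early <- relative_sd_by_period(toy, "X", 2000, 2002, "2000-2002")
late  <- relative_sd_by_period(toy, "X", 2003, Inf, "2003--today")

comparison <- bind_rows(early, late)
print(comparison)
```

Passing Inf as the last year keeps the second period open-ended, mirroring the Year >= 2008 filter used above.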

Now we plot a grouped bar chart based on the cyclical_volatility_comparison_US table. First, we need to reshape the data into long format (using gather), with an extra column to indicate sex.

cyclical_volatility_comparison_US <- cyclical_volatility_comparison_US %>% 
  gather(key = relative_sd_, value = relative_sd, relative_sd_female_hours_worked:relative_sd_male_hours_worked) %>% 
  mutate(sex = case_when(relative_sd_ == "relative_sd_female_hours_worked" ~ "female",
                            TRUE ~ "male")) %>% 
  select(-relative_sd_)
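
The gather function still works, but tidyr has superseded it with pivot_longer. As a sketch, the same reshape could be written as follows; the input values are the rounded figures from the table printed above, and the object names wide and long are illustrative.

```r
library(tidyr)

# Wide table: one row per period, one column per sex
wide <- data.frame(
  period = c("1983-2007", "2008--today"),
  relative_sd_female_hours_worked = c(0.2309, 0.1815),
  relative_sd_male_hours_worked   = c(0.3725, 0.2144)
)

# Long table: one row per period and sex; the sex is extracted
# from the middle of each column name
long <- pivot_longer(
  wide,
  cols = starts_with("relative_sd_"),
  names_to = "sex",
  names_pattern = "relative_sd_(.*)_hours_worked",
  values_to = "relative_sd"
)

print(long)
```

The result has four rows with the columns period, sex, and relative_sd, which is the layout that geom_bar with position = "dodge" expects.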

cyclical_volatility_comparison_US %>%
  ggplot() +
  aes(fill=sex, y=relative_sd, x=period) +
  geom_bar(position = "dodge", stat = "identity") +
  theme_bw() +
  # Add the vertical axis label
  labs(y = "Standard deviation of hours worked relative to GDP") +
  # Add the title and legend labels
  ggtitle("Cyclical volatility of hours worked by sex (United States)") +
  scale_fill_discrete(name = "", labels = c("Female", "Male")) + 
  # Move the legend below the chart
  theme(legend.position="bottom") +
  # Remove the legend title and the horizontal axis label
  theme(legend.title=element_blank(),
        axis.title.x=element_blank())

Walk-through figure 9 Cyclical volatility of hours worked by sex (United States)

Then we create the same bar chart for Germany, using the same sample split as for the US:

cyclical_volatility_pre_financial_crisis_Germany <- dataset_main %>%
  filter(Country == "Germany" & Year >= 1983 & Year < 2008) %>% 
  filter_at(vars(GDP,hours_worked_female_log_cycle,hours_worked_male_log_cycle),all_vars(!is.na(.))) %>% 
  summarise(sd_female_hours_worked = sd(hours_worked_female_log_cycle),
            sd_male_hours_worked = sd(hours_worked_male_log_cycle),
            sd_GDP = sd(GDP_log_cycle)) %>% 
  mutate(relative_sd_female_hours_worked = sd_female_hours_worked/sd_GDP,
         relative_sd_male_hours_worked = sd_male_hours_worked/sd_GDP) %>%
  select(relative_sd_female_hours_worked,relative_sd_male_hours_worked) %>% 
  mutate(period = "1983-2007")

cyclical_volatility_post_financial_crisis_Germany <- dataset_main %>%
  filter(Country == "Germany" & Year >= 2008) %>% 
  filter_at(vars(GDP,hours_worked_female_log_cycle,hours_worked_male_log_cycle),all_vars(!is.na(.))) %>% 
  summarise(sd_female_hours_worked = sd(hours_worked_female_log_cycle),
            sd_male_hours_worked = sd(hours_worked_male_log_cycle),
            sd_GDP = sd(GDP_log_cycle)) %>% 
  mutate(relative_sd_female_hours_worked = sd_female_hours_worked/sd_GDP,
         relative_sd_male_hours_worked = sd_male_hours_worked/sd_GDP) %>%
  select(relative_sd_female_hours_worked,relative_sd_male_hours_worked) %>% 
  mutate(period = "2008--today")

cyclical_volatility_comparison_Germany <- 
  bind_rows(cyclical_volatility_pre_financial_crisis_Germany,cyclical_volatility_post_financial_crisis_Germany) %>% 
  relocate(period)

rm(cyclical_volatility_pre_financial_crisis_Germany,cyclical_volatility_post_financial_crisis_Germany)

cyclical_volatility_comparison_Germany <- cyclical_volatility_comparison_Germany %>% 
  gather(key = relative_sd_, value = relative_sd, relative_sd_female_hours_worked:relative_sd_male_hours_worked) %>% 
  mutate(sex = case_when(relative_sd_ == "relative_sd_female_hours_worked" ~ "female",
                            TRUE ~ "male")) %>% 
  select(-relative_sd_)

cyclical_volatility_comparison_Germany %>%
  ggplot() +
  aes(fill=sex, y=relative_sd, x=period) +
  geom_bar(position = "dodge", stat = "identity") +
  theme_bw() +
	# Add the vertical axis label
  labs(y = "Standard deviation of hours worked relative to GDP") +
  # Add the title and legend labels
  ggtitle("Cyclical volatility of hours worked by sex (Germany)") +
  scale_fill_discrete(name = "", labels = c("Female", "Male")) + 
  # Move the legend below the chart
  theme(legend.position="bottom") +
  # Remove the legend title and the horizontal axis label
  theme(legend.title=element_blank(),
        axis.title.x=element_blank())

Walk-through figure 10 Cyclical volatility of hours worked by sex (Germany)

To keep the workspace clean, delete all datasets except for dataset_main.

rm(list=ls(pattern="cyclical"))
  1. Interpret the column chart for the US. Do your results on female vs male cyclical volatility from Question 2e hold for both time periods? Taking into account the changes in labour force participation shown in Question 1, come up with a hypothesis for how these changes caused GDP and employment to become less volatile (the defining features of the great moderation). (Hint: Think about the change in the composition of total employment.)
  1. Compare the column charts between the US and your chosen country. Do the same characteristics of female employment (less volatile than male) also apply to your other country? If you have found a great moderation for your other country in Question 2d, can it be explained in the same way as for the US? (Note that you might have to adjust the years of the sample split to those you found in Question 2d for your other country.)

In the next question, we will look at the second macro puzzle: the slowdown of US productivity growth in the 1970s and 1980s. We take the average labour productivity (ALP) series as a measure of productivity. To measure sex-specific employment, we will use the labour force participation (LFP) rates instead of hours worked in this question because the hours data for the US only starts in 1979 (while the LFP data starts in 1960). Note that LFP only captures the extensive margin of employment, not the intensive margin. This again highlights the limitations of using aggregated data compared to micro data, which would contain hours worked by sex for earlier years as well.

We will use the HP filter to isolate the trend (that is, growth) components of the log of ALP and of log female and male LFP rates, as we are interested in the long-run trend rather than the (business) cyclical variations around that trend.

  1. We will again show the productivity slowdown first and then try to find an explanation for it:
  1. First document the productivity slowdown in the US and check if a similar trend can be found for your other country:
    • Apply the HP filter to the log of ALP with lambda = 6.25 (remembering to group by country and filter out missing values of labour productivity first), saving only the trend component.
    • Compute the annual growth rate in percentage of that trend component.
    • Plot the growth rate in a line chart over time (separate charts by country), using all available years of data.
    • Add in the recession periods as grey shaded areas.
    • Verify that productivity growth for the US has slowed down during the 1970s and 1980s.
    • Repeat the steps above for your chosen country. Was there a similar slowdown in your chosen country?
  1. Now add employment (measured by LFP rate) to both line charts:
    • Apply the HP filter to the log of the female and male LFP rates with lambda = 6.25 (remembering to group by country and filter out missing values of the LFP rates first). In doing so, apply the filter only to observations from 1963 onwards to exclude the jumps in LFP rates due to changes in methodology. Save only the trend components.
    • Compute the annual growth rates in percentage of both trend components.
    • Add both growth rates to the two charts from Question 3a and include a legend. (For help, see R walk-through 8 on plotting the trend component from the HP filter.)
    • What is the relationship of these two variables (female and male LFP trends) to the ALP trend component?

R walk-through 8 Interpreting the trend component of the Hodrick–Prescott (HP) filter

We apply the HP filter (hpfilter) to the log of average labour productivity (ALP, the variable labour_productivity) using the smoothing parameter lambda = 6.25, saving only the trend component. Before doing so, we group by country and filter out missing values in labour_productivity. Then we compute the annual growth rate (in %) of that trend component.

dataset_main_filter <- dataset_main %>%
  group_by(Country) %>% 
  filter(!is.na(labour_productivity)) %>% 
  mutate(labour_productivity_trend = hpfilter(log(labour_productivity), freq=6.25)$trend,
         labour_productivity_trend_growth = (labour_productivity_trend/lag(labour_productivity_trend) - 1) * 100)

dataset_main <- 
  list(dataset_main,dataset_main_filter) %>% 
  reduce(full_join)
rm(dataset_main_filter)
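
The growth-rate formula used above, (x/lag(x) - 1) * 100, can be checked on a small made-up series. The base R sketch below also shows two related quantities so that the units are clear; the vector level and its values are invented for illustration.

```r
# Made-up productivity levels growing by exactly 2% a year
level <- 100 * 1.02^(0:4)
n <- length(level)

# 1) Percentage change of the level: (x_t / x_{t-1} - 1) * 100
growth_pct <- (level[-1] / level[-n] - 1) * 100

# 2) Log difference times 100, a close approximation for small changes
log_diff_pct <- diff(log(level)) * 100

# 3) Percentage change of the logged series itself
log_level <- log(level)
pct_change_of_log <- (log_level[-1] / log_level[-n] - 1) * 100

round(growth_pct, 3)
round(log_diff_pct, 3)
round(pct_change_of_log, 3)
```

The first two measures are close to 2 for this series. The third is much smaller and depends on the level of the log, so when the ratio formula is applied to a trend that is measured in logs, as in the chunk above, the result is best read as the percentage change of the log trend.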

Plot the growth rate of the trend component of productivity over time, separately for each country, adding in the recession periods as shaded areas. We start with the US.

dataset_main %>%
  filter(Country == "United States") %>%
  ggplot(aes(Year)) +
  geom_line(aes(y = labour_productivity_trend_growth)) +
  theme_bw() + 
	# Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the vertical axis label
  labs(y = "Growth rate in %") +
  # Add the title
  ggtitle("Trend productivity (United States)") +
  # Remove the horizontal axis label
  theme(axis.title.x=element_blank()) +
  # Add recession periods as shaded areas
  geom_rect(data = subset(dataset_main, Recession == 1 & Country == "United States"), 
            aes(ymin = -Inf, ymax = Inf, xmin = Year-0.5, xmax = Year+0.5), 
            alpha = 0.2)

Walk-through figure 11 Trend productivity (United States)

Create the same chart for Germany.

dataset_main %>%
  filter(Country == "Germany") %>%
  ggplot(aes(Year)) +
  geom_line(aes(y = labour_productivity_trend_growth)) +
  theme_bw() + 
	# Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the vertical axis label
  labs(y = "Growth rate in %") +
  # Add the title
  ggtitle("Trend productivity (Germany)") +
  # Remove the horizontal axis label
  theme(axis.title.x=element_blank()) +
  # Add recession periods as shaded areas
  geom_rect(data = subset(dataset_main, Recession == 1 & Country == "Germany"), 
            aes(ymin = -Inf, ymax = Inf, xmin = Year-0.5, xmax = Year+0.5), 
            alpha = 0.2)

Walk-through figure 12 Trend productivity (Germany)

Apply the HP filter (hpfilter) to the log of female and male LFP rates with lambda = 6.25. To exclude the jumps in LFP due to changes in methodology, apply the filter only to years 1963 and onwards. Save the trend components, then compute their annual growth rates in percent.

dataset_main_filter <- dataset_main %>%
  group_by(Country) %>% 
  filter(!is.na(LFP_rate_female) & !is.na(LFP_rate_male) & Year >= 1963) %>% 
  mutate(LFP_rate_female_trend = hpfilter(log(LFP_rate_female), freq=6.25)$trend,
         LFP_rate_male_trend = hpfilter(log(LFP_rate_male), freq=6.25)$trend,
         LFP_rate_female_trend_growth = (LFP_rate_female_trend/lag(LFP_rate_female_trend) - 1) * 100,
         LFP_rate_male_trend_growth = (LFP_rate_male_trend/lag(LFP_rate_male_trend) - 1) * 100)

dataset_main <-
  list(dataset_main,dataset_main_filter) %>% 
  reduce(full_join)
rm(dataset_main_filter)

Add the growth rates of the female and male LFP trend components to the productivity charts.

dataset_main %>%
  filter(Country == "United States") %>%
  ggplot(aes(Year)) +
  geom_line(aes(y = LFP_rate_female_trend_growth, colour = "LFP_rate_female_trend_growth")) +
  geom_line(aes(y = LFP_rate_male_trend_growth, colour = "LFP_rate_male_trend_growth")) +
  geom_line(aes(y = labour_productivity_trend_growth, colour = "labour_productivity_trend_growth")) +
  theme_bw() + 
  # Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the vertical axis label
  labs(y = "Growth rate in %") +
  # Add the title and legend labels (breaks pairs each colour key with its label)
  ggtitle("Trend productivity and labour force participation (United States)") +
  scale_colour_discrete(name = "", 
    breaks = c("labour_productivity_trend_growth", "LFP_rate_female_trend_growth", "LFP_rate_male_trend_growth"),
    labels = c("ALP trend", "Female LFP trend", "Male LFP trend")) +   
  # Move the legend below the chart
  theme(legend.position="bottom") +
  # Remove the horizontal axis label and the legend title
  theme(axis.title.x=element_blank(),
        legend.title=element_blank()) +
  # Add recession periods as shaded areas
  geom_rect(data = subset(dataset_main, Recession == 1 & Country == "United States"), 
            aes(ymin = -Inf, ymax = Inf, xmin = Year-0.5, xmax = Year+0.5), 
            alpha = 0.2)

Walk-through figure 13 Trend productivity and labour force participation (United States)

dataset_main %>%
  filter(Country == "Germany") %>%
  ggplot(aes(Year)) +
  geom_line(aes(y = LFP_rate_female_trend_growth, colour = "LFP_rate_female_trend_growth")) +
  geom_line(aes(y = LFP_rate_male_trend_growth, colour = "LFP_rate_male_trend_growth")) +
  geom_line(aes(y = labour_productivity_trend_growth, colour = "labour_productivity_trend_growth")) +
  theme_bw() + 
  # Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the vertical axis label
  labs(y = "Growth rate in %") +
  # Add the title and legend labels
  ggtitle("Trend productivity and labour force participation (Germany)") +
  scale_colour_discrete(name = "", 
    labels = c("ALP trend","Female LFP trend", "Male LFP trend")) +   
  # Move the legend below the chart
  theme(legend.position="bottom") +
  # Remove the horizontal axis label and the legend title
  theme(axis.title.x=element_blank(),
        legend.title=element_blank()) +
  # Add recession periods as shaded areas
  geom_rect(data = subset(dataset_main, Recession == 1 & Country == "Germany"), 
            aes(ymin = -Inf, ymax = Inf, xmin = Year-0.5, xmax = Year+0.5), 
            alpha = 0.2)

Walk-through figure 14 Trend productivity and labour force participation (Germany)
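
In the two charts above, the labels passed to scale_colour_discrete are matched to the colour keys by position, so the result depends on the order in which ggplot2 arranges the key strings. The following is a hedged sketch of one way to make that mapping explicit with the breaks argument. The toy data frame and the legend_labels vector are hypothetical and only for illustration.

```r
library(ggplot2)

# Hypothetical toy data (made-up values, not real data)
toy <- data.frame(
  Year = 2000:2002,
  labour_productivity_trend_growth = c(1.5, 1.4, 1.6),
  LFP_rate_female_trend_growth = c(0.3, 0.2, 0.2),
  LFP_rate_male_trend_growth = c(-0.1, -0.1, 0.0)
)

# Named vector: colour key (name) -> legend label (value)
legend_labels <- c(
  labour_productivity_trend_growth = "ALP trend",
  LFP_rate_female_trend_growth = "Female LFP trend",
  LFP_rate_male_trend_growth = "Male LFP trend"
)

# breaks fixes the order of the keys; labels are then matched one-to-one
colour_scale <- scale_colour_discrete(
  name = "",
  breaks = names(legend_labels),
  labels = unname(legend_labels)
)

p <- ggplot(toy, aes(Year)) +
  geom_line(aes(y = labour_productivity_trend_growth,
                colour = "labour_productivity_trend_growth")) +
  geom_line(aes(y = LFP_rate_female_trend_growth,
                colour = "LFP_rate_female_trend_growth")) +
  geom_line(aes(y = LFP_rate_male_trend_growth,
                colour = "LFP_rate_male_trend_growth")) +
  theme_bw() +
  colour_scale +
  theme(legend.position = "bottom")
```

Keeping each key and its label side by side in one named vector makes it harder to attach a label to the wrong line when a series is added or renamed.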

  1. Based on the results from Question 3b, come up with an explanation for the productivity slowdown, taking into account how labour force participation (and therefore the composition of the labour force) changed during the time of the productivity slowdown in the 1970s and 1980s. Why would this change in composition initially lower productivity growth? Why might this change in composition increase productivity growth in the longer term?
  1. If there was a productivity slowdown in the other country, could it be explained in the same way as for the US?
  1. The final macro puzzle we will investigate is that of jobless recoveries. As we want to consider the 1970s recessions, we again measure employment through the log of the LFP rate (extensive margin only).
  1. For the US data only, create a line chart of the log LFP rate (of all persons) over time, starting in 1963 (because of the change in methodology). Add the recession periods as shaded areas. (For help, see R walk-through 9 on creating line charts of the labour force participation rates.)
  1. Explain how the line chart shows ‘jobless recoveries’. Approximately when did recoveries start being jobless?
  1. Create two more line charts in the same way as in Question 4a: 1) log of female LFP rate; 2) log of male LFP rate.

R walk-through 9 Jobless recoveries

Plot the log of the LFP rate of all persons over time (starting in 1963), adding in the recession periods as shaded areas.

dataset_main %>%
  filter(Country == "United States" & Year >= 1963) %>%
  ggplot(aes(Year)) +
  geom_line(aes(y = log(LFP_rate_all))) +
  theme_bw() + 
  # Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the vertical axis label
  labs(y = "Log value") +
  # Add the title
  ggtitle("Labour force participation over time (United States)") +
  # Remove the horizontal axis label
  theme(axis.title.x=element_blank()) +
  # Add recession periods as shaded areas
  geom_rect(data = subset(dataset_main, Recession == 1 & Country == "United States"), 
            aes(ymin = -Inf, ymax = Inf, xmin = Year-0.5, xmax = Year+0.5), 
            alpha = 0.2)

Walk-through figure 15 Labour force participation over time (United States)
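
To complement the visual inspection of the chart above, one possible approach is to compute how much the log LFP rate changed in the years following the end of each recession. The sketch below assumes annual data sorted by year with no gaps. The data frame toy, its values, and the two-year horizon k are hypothetical and only for illustration.

```r
# Hypothetical toy data (made-up values, not real data)
toy <- data.frame(
  Year = 2000:2009,
  log_LFP = log(c(67.1, 66.8, 66.6, 66.2, 66.0, 66.0, 66.2, 66.0, 66.0, 65.4)),
  Recession = c(0, 1, 0, 0, 0, 0, 0, 0, 1, 1)
)

# A recession ends in year t if t is a recession year and t + 1 is not.
# The last year has no following observation, so it is never counted as an end.
next_recession <- c(toy$Recession[-1], NA)
end_rows <- which(toy$Recession == 1 & next_recession == 0)

# Change in log LFP (times 100, so roughly in percent) over the k years
# after each recession end; NA if the sample stops before year t + k
k <- 2
recovery <- data.frame(
  recession_end = toy$Year[end_rows],
  change_pct = 100 * (toy$log_LFP[end_rows + k] - toy$log_LFP[end_rows])
)
```

A value of change_pct close to zero or negative after a recession would be consistent with a jobless recovery on this measure, while a clearly positive value would indicate that participation picked up again.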

Now redo this chart, split by sex.

dataset_main %>%
  filter(Country == "United States" & Year >= 1963) %>%
  ggplot(aes(Year)) +
  geom_line(aes(y = log(LFP_rate_female))) +
  theme_bw() + 
  # Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the vertical axis label
  labs(y = "Log value") +
  # Add the title
  ggtitle("Female labour force participation over time (United States)") +
  # Remove the horizontal axis label
  theme(axis.title.x=element_blank()) +
  # Add recession periods as shaded areas
  geom_rect(data = subset(dataset_main, Recession == 1 & Country == "United States"), 
            aes(ymin = -Inf, ymax = Inf, xmin = Year-0.5, xmax = Year+0.5), 
            alpha = 0.2)

Walk-through figure 16 Female labour force participation over time (United States)

dataset_main %>%
  filter(Country == "United States" & Year >= 1963) %>%
  ggplot(aes(Year)) +
  geom_line(aes(y = log(LFP_rate_male))) +
  theme_bw() + 
  # Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the vertical axis label
  labs(y = "Log value") +
  # Add the title
  ggtitle("Male labour force participation over time (United States)") +
  # Remove the horizontal axis label
  theme(axis.title.x=element_blank()) +
  # Add recession periods as shaded areas
  geom_rect(data = subset(dataset_main, Recession == 1 & Country == "United States"), 
            aes(ymin = -Inf, ymax = Inf, xmin = Year-0.5, xmax = Year+0.5), 
            alpha = 0.2)

Walk-through figure 17 Male labour force participation over time (United States)

Redo all three charts for Germany.

dataset_main %>%
  filter(Country == "Germany") %>%
  ggplot(aes(Year)) +
  geom_line(aes(y = log(LFP_rate_all))) +
  theme_bw() + 
  # Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the vertical axis label
  labs(y = "Log value") +
  # Add the title
  ggtitle("Labour force participation over time (Germany)") +
  # Remove the horizontal axis label
  theme(axis.title.x=element_blank()) +
  # Add recession periods as shaded areas
  geom_rect(data = subset(dataset_main, Recession == 1 & Country == "Germany"), 
            aes(ymin = -Inf, ymax = Inf, xmin = Year-0.5, xmax = Year+0.5), 
            alpha = 0.2)

Walk-through figure 18 Labour force participation over time (Germany)

dataset_main %>%
  filter(Country == "Germany") %>%
  ggplot(aes(Year)) +
  geom_line(aes(y = log(LFP_rate_female))) +
  theme_bw() + 
  # Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the vertical axis label
  labs(y = "Log value") +
  # Add the title
  ggtitle("Female labour force participation over time (Germany)") +
  # Remove the horizontal axis label
  theme(axis.title.x=element_blank()) +
  # Add recession periods as shaded areas
  geom_rect(data = subset(dataset_main, Recession == 1 & Country == "Germany"), 
            aes(ymin = -Inf, ymax = Inf, xmin = Year-0.5, xmax = Year+0.5), 
            alpha = 0.2)

Walk-through figure 19 Female labour force participation over time (Germany)

dataset_main %>%
  filter(Country == "Germany") %>%
  ggplot(aes(Year)) +
  geom_line(aes(y = log(LFP_rate_male))) +
  theme_bw() + 
  # Scale horizontal axis tick marks in 10-year intervals
  scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
  # Add the vertical axis label
  labs(y = "Log value") +
  # Add the title
  ggtitle("Male labour force participation over time (Germany)") +
  # Remove the horizontal axis label
  theme(axis.title.x=element_blank()) +
  # Add recession periods as shaded areas
  geom_rect(data = subset(dataset_main, Recession == 1 & Country == "Germany"), 
            aes(ymin = -Inf, ymax = Inf, xmin = Year-0.5, xmax = Year+0.5), 
            alpha = 0.2)

Walk-through figure 20 Male labour force participation over time (Germany)
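
The six charts in this walk-through differ only in the country, the LFP variable, the title, and the starting year. As a hedged sketch, they could be produced with a small helper function. The function plot_log_lfp and the toy data are hypothetical and only for illustration; the column names Country, Year and Recession are assumed to match dataset_main. Unlike the code above, this sketch shades only recession years that fall inside the plotted period.

```r
library(ggplot2)

# Hypothetical helper: line chart of the log of one LFP column for one
# country, with recession years shaded
plot_log_lfp <- function(data, country, lfp_col, title, start_year = -Inf) {
  # Keep the chosen country and period (which() drops rows with NA conditions)
  country_data <- data[which(data$Country == country &
                               data$Year >= start_year), ]
  recessions <- country_data[which(country_data$Recession == 1), ]

  ggplot(country_data, aes(Year)) +
    # .data[[lfp_col]] looks up the column named by the string lfp_col
    geom_line(aes(y = log(.data[[lfp_col]]))) +
    theme_bw() +
    scale_x_continuous(breaks = seq(1960, 2020, by = 10)) +
    labs(y = "Log value") +
    ggtitle(title) +
    theme(axis.title.x = element_blank()) +
    # Each recession year becomes a band from Year - 0.5 to Year + 0.5
    geom_rect(data = recessions,
              aes(ymin = -Inf, ymax = Inf,
                  xmin = Year - 0.5, xmax = Year + 0.5),
              alpha = 0.2, inherit.aes = FALSE)
}

# Hypothetical toy data (made-up values, not real data)
toy <- data.frame(
  Country = rep(c("United States", "Germany"), each = 5),
  Year = rep(1961:1965, times = 2),
  LFP_rate_all = c(59.3, 58.8, 58.7, 58.7, 58.9, 60.1, 60.0, 59.8, 59.7, 59.5),
  Recession = c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)
)

p <- plot_log_lfp(toy, "United States", "LFP_rate_all",
                  "Labour force participation over time (United States)",
                  start_year = 1963)
```

Calling the helper with "LFP_rate_female" or "LFP_rate_male", or with "Germany" and no start_year, would give the remaining charts, provided those columns exist in the data.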

  1. How do recoveries after recessions differ between female and male LFP? Considering the changes in labour force participation and labour force composition, what could be an explanation for why recoveries in the US have become jobless in the 1990s, 2000s, and 2010s?
  1. Redo Questions 4a and 4c for your other country. Do you also find jobless recoveries? Does the US explanation for jobless recoveries apply to that country too?
  1. Read the introduction (up to ‘The second part of the paper…’ on p. 3) and conclusion of Stefania Albanesi’s research paper ‘Changing Business Cycles: The Role of Women’s Employment’, on which this project is based. Are your results for the US in line with her findings? What have you learned about the micro-to-macro link between growing female labour supply on the household side and the three macroeconomic puzzles?