Extra Empirical Project: Female labour supply and the macroeconomy Working in Google Sheets

Part 1: Collecting and preparing the data

Learning objectives for this part

  • Download and clean customized excerpts from a database.
  • Merge different datasets.
  • Estimate recession dates from GDP data.

In the questions below, we will collect and merge various datasets. We will combine labour force statistics (hours worked and LFP) with macroeconomic statistics (GDP and productivity) so we can assess the relationship between those variables. The goal is to end up with a single dataset containing all the necessary data for the analysis in Part 2.

Before we start, create a subfolder called ‘data’ in your working directory. We will store all datasets in that subfolder.

Choose one country from the following list: Chile, France, Italy, Mexico, New Zealand, Poland, Sweden, and the United Kingdom.

  1. Download annual GDP data for the US and your chosen country from the World Bank statistical database:
  • By clicking the link, you have already selected the correct series, ‘GDP (constant 2015 US$)’. We want to keep prices constant, so that GDP is in real terms (taking out any variation due to changes in the price level). We do not need to worry about fixing exchange rates and purchasing power parity (PPP) here, because we will not compare GDP between countries.
  • On the left under ‘Country’, first unselect all countries (this can be done in one click on the ‘X’ button above the country list), and then select the United States and your other country.
  • Under ‘Time’, select all years by clicking on the checkbox icon (to the left of the ‘Unselect all’ button).
  • Then click on ‘Apply Changes’ in the centre of the screen.
  • Download the data as a CSV file by clicking on ‘Download options’ in the top right (above the data table) and then click on ‘Advanced options’.
  • Select ‘CSV’ in the header (the default is ‘Excel’) and change the ‘Data format’ from ‘Table’ to ‘List’.
  • Click ‘Download’, which will download a ZIP file. Extract the file and delete the CSV file ending in ‘Metadata’. Rename the CSV file ending in ‘Data’ as ‘GDP_annual’ and save it in your ‘data’ subfolder.
  1. Download the following data for the US and the country you chose in Question 1 from the OECD statistical database:
  1. Labour force participation (LFP) rates by sex: You can find the LFP rates under ‘Labour > Labour Force Statistics > LFS by sex and age > LFS by sex and age - indicators’, then click on ‘LFS by sex and age indicators’. You can also find the data series directly by searching for ‘LFS by sex and age indicators’ in the search bar on the top left.
    • Click on the ‘Country’ header in the first column, which should open the customization pop-up.
    • Unselect all countries (can be done in one click via ‘unselect all’ in the top-right corner) and select the United States and one country of your choice.
    • Click on ‘Time & Frequency’ and select from 1960 (leave the upper limit at ‘latest available data’).
    • Click on ‘Sex’ and select all three options.
    • Click on ‘Age’ and unselect all except ‘Total’.
    • Click on ‘Series’ and unselect all except ‘Labour force participation rate’.
    • Click ‘View Data’. The table should now only show the data you selected. To download the data, click on ‘Export > Text file (CSV) > Default format > Download’. After the file has been downloaded, rename it as ‘LFP_rates’, and move it to the ‘data’ subfolder in your working directory.

You can learn more about how GDP is measured and made comparable across countries and time in Section 1.2 of The Economy 2.0: Microeconomics.

  1. Average labour productivity (ALP): One way that the OECD measures labour productivity is by GDP per hour worked, which you can find under ‘Productivity > Productivity and ULC – Annual, Total Economy > Level of GDP per capita and productivity’, then click on ‘Level of GDP per capita and productivity’.
    • Download the data for the US and your other country, from 1970 to the latest available year, selecting ‘GDP per hour worked’ under ‘Subject’.
    • Under ‘Measure’, select only ‘USD, constant prices, 2015 PPPs’. This measure makes sure the productivity (GDP per hour worked) measures are comparable, regardless of country-dependent exchange and inflation rates.
    • Download the data as a CSV file in the same way as before and save it as ‘labour_productivities’ in your ‘data’ subfolder.
  1. Average hours worked per week by sex: The OECD only has hours worked by sex for the main job, so keep in mind that this data disregards hours worked outside of the main job. Also, as these are average hours per worker, they only capture the intensive margin of labour supply. You can find the data series under ‘Labour > Labour Force Statistics > Hours worked > Average usual weekly hours worked on the main job’.

    Do the following customizations for the US and your chosen country:
    • Select data from 1979 to the latest available year.
    • Select all three options for ‘Sex’.
    • Select only ‘Total’ for ‘Age’.
    • For ‘Employment’, select only ‘Dependent employment’. (The data for the US only covers those in dependent employment, so to keep things comparable we will restrict the data to them.)
    • Under ‘Job type’, select only ‘Total declared employment’ (includes both full and part-time workers).
    • Download the data as a CSV file and save it as ‘hours_worked’ in your ‘data’ subfolder.
  1. Quarterly GDP: This data can be found under ‘National Accounts > Quarterly National Accounts > Quarterly National Accounts’, then click on ‘Quarterly National Accounts’ (the first table).
    • Select your two countries, starting from 1947Q1 (you might have to switch from ‘select latest data’ to ‘select date range’ first).
    • Under ‘Subject’, select only ‘Gross domestic product - expenditure approach’.
    • Under ‘Measure’, select only ‘VPVOBARSA: US dollars, volume estimates, fixed PPPs, OECD reference year, annual levels, seasonally adjusted’ (you can find it under ‘VOL – VOLUMES’).
    • Download the data as a CSV file and save it as ‘GDP_quarterly’ in your ‘data’ subfolder.

Find out more Going from micro to macro data

Most macroeconomic researchers nowadays use micro data (for example, at the household or firm level) to construct aggregate variables like the ones you have downloaded from the OECD database. Albanesi, for example, obtains her data series from the Current Population Survey (CPS), a publicly available household survey from the US. You have already seen some limitations of using aggregate data like the OECD’s: you can only use the series they have made available. The OECD does not provide data on hours worked by sex outside of the main job, nor on labour productivity by sex. However, these could easily be computed from micro data. For simplicity, we will not be working with micro data in this Doing Economics project. However, the OECD’s data is based on country-specific micro datasets which have been harmonized and aggregated by the OECD, so by using their data we are indirectly working with micro data as well.

If you are interested in how the NBER defines and dates recessions and expansions, you can read their FAQs.

dummy variable (indicator variable)
A variable that takes the value 1 if a certain condition is met, and 0 otherwise.
  1. We want to link the variables obtained in Question 1 with data on business cycles, for which we will use the US National Bureau of Economic Research (NBER)-based US recession dates from FRED. (Clicking on the link will download the data as a CSV file called ‘USRECQ.csv’; move it into the ‘data’ subfolder in your working directory.) This file contains a quarterly dummy variable (that is, 1 for recession periods and 0 otherwise), which we will use to distinguish between expansion and recession periods in the analysis.
  1. Import the ‘USRECQ.csv’ file into Google Sheets and save it in your Google Drive with the same filename.
  1. We want to create an annual recession dummy because the other data is at the annual level (and we want to have consistent date variables when combining the various datasets). To do so, first create a ‘Year’ variable (of numeric type, not character) based on the ‘DATE’ variable (each year will be repeated four times, as the data is quarterly). Then create an annual dummy variable ‘Recession’ that takes the value 1 for all observations within a year if one or more quarters of that year featured a recession (‘USRECQ’ = 1). (For help see Google Sheets walk-through 1 on creating consistent date variables and Google Sheets walk-through 2 on creating a recession dummy variable.)
  1. Remove (delete) all variables except ‘Year’ and ‘Recession’. Remove duplicate values in ‘Year’ (as we still have every year four times due to the initial data being on the quarterly level). (Hint: Use the ‘Paste special’ and ‘Values’ options to preserve the values of these variables even when the raw data they are based on is deleted.) Create a ‘Country’ variable that takes the value ‘United States’ for all observations. (For help see Google Sheets walk-through 3 on removing duplicate values.)

Google Sheets walk-through 1 Creating consistent date variables

The date variables are in yyyy-mm-dd format. We use the LEFT function to get the year from this variable. Create a new column headed ‘YEAR’ (column C in the sheet shown). In the first cell (C2) of a new column, enter the formula =LEFT(A2,4). This formula will extract the first four characters from the ‘DATE’ variable, which is the year. Click and drag (or double-click) the box in the bottom-right corner of the cell to apply the same formula to the remaining cells in this column.
Fullscreen

Walk-through figure 1 Extract the first four characters from the DATE variable.
The date variables are in yyyy-mm-dd format. We use the LEFT function to get the year from this variable. Create a new column headed ‘YEAR’ (column C in the sheet shown). In the first cell (C2) of a new column, enter the formula =LEFT(A2,4). This formula will extract the first four characters from the ‘DATE’ variable, which is the year. Click and drag (or double-click) the box in the bottom-right corner of the cell to apply the same formula to the remaining cells in this column.

Google Sheets walk-through 2 Creating a recession dummy variable

Creating a recession dummy variable
Fullscreen

Walk-through figure 2 Creating a recession dummy variable

Label the first year as a recession year
:
Fullscreen

Label the first year as a recession year

Create a new column called RECESSION (column D in the sheet shown). There is only one quarter recorded for 1854, so we will record that as a recession. In the first column, enter the formula =IF(B2=1,1,0).

Use the SUM function to identify recession years
: For all remaining years, we have four quarters of data. In cell D3, enter the formula =IF(SUM(B3:B6)>0,1,0). The SUM function calculates the sum of the values in the selected cells, and if this sum is greater than 1, in the case that there is at least one recession that year, it will return the value 1.
Fullscreen

Use the SUM function to identify recession years

For all remaining years, we have four quarters of data. In cell D3, enter the formula =IF(SUM(B3:B6)>0,1,0). The SUM function calculates the sum of the values in the selected cells, and if this sum is greater than 1, in the case that there is at least one recession that year, it will return the value 1.

Duplicate the same value for the next three cells
: Next, type in the function ‘=+D3’ in the next three cells. This formula ensures that the four cells corresponding to the same year all return the same value. For all remaining years, copy the functions in these four cells and paste them into the cells corresponding to the first quarter of that year.
Fullscreen

Duplicate the same value for the next three cells

Next, type in the function ‘=+D3’ in the next three cells. This formula ensures that the four cells corresponding to the same year all return the same value. For all remaining years, copy the functions in these four cells and paste them into the cells corresponding to the first quarter of that year.

Google Sheets walk-through 3 Removing duplicate values

Removing duplicate values
Fullscreen

Walk-through figure 3 Removing duplicate values

Use the ‘Remove duplicates’ feature in Excel
: Select the year and recession dummy columns. In the Data menu, select ‘Data clean-up’, then ‘Remove duplicates’. (Alternatively, you can search for ‘remove duplicates’ in the Help menu.)
Fullscreen

Use the ‘Remove duplicates’ feature in Excel

Select the year and recession dummy columns. In the Data menu, select ‘Data clean-up’, then ‘Remove duplicates’. (Alternatively, you can search for ‘remove duplicates’ in the Help menu.)

Remove duplicates in all columns
: In the pop-up window, make sure the ‘Data has header row’ option and both columns are selected. Click ‘Remove duplicates’, and then in the next window that appears, click ‘OK’.
Fullscreen

Remove duplicates in all columns

In the pop-up window, make sure the ‘Data has header row’ option and both columns are selected. Click ‘Remove duplicates’, and then in the next window that appears, click ‘OK’.

The data with duplicates removed
: There should now only be one row for each year.
Fullscreen

The data with duplicates removed

There should now only be one row for each year.

Recession timing data is not available for other (non-US) countries, so we have to create our own based on the NBER’s definition of a recession (for comparability). The NBER defines a recession as ‘a period when output is declining. It is over once the economy begins to grow again.’

‘A period’ of declining output is usually interpreted as at least two consecutive quarters of economic contraction (negative GDP growth).

  1. In all the steps below, replace ‘YOURCOUNTRYNAME’ with your chosen country’s name or a suitable abbreviation.
  1. Add the quarterly GDP dataset ‘GDP_quarterly.xls’ as a new sheet in your Google Sheets file from Question 3, calling it ‘GDP_quarterly’ (for help on adding datasets as separate sheets see Google Sheets walk-through 4). Use Google Sheets’ filter function to remove all observations for the US (we will be using the NBER’s recession dummy for the US). In a new column, compute quarterly GDP growth rates in percentage points for your non-US country only (for example, a value of 2 if GDP grew from $100 in one quarter to $102 in the next).
  1. Create a recession dummy named ‘Recession_YOURCOUNTRYNAME_quarterly’ that is 1 during recession periods (2 or more consecutive quarters of negative GDP growth) and 0 otherwise. For all such periods, set the dummy to 1 in all quarters (starting from the first quarter of negative GDP growth). A recession is over (the dummy takes the value of 0) as soon as GDP growth turns positive again. For periods of only one quarter of negative GDP growth, the dummy should be set to 0 as well. (Hint: Use Google Sheets’ IF and AND functions in the same formula to set the value of this variable equal to 1 only if both conditions are met; for help see Google Sheets walk-through 5.)
  1. As for the US recession data above, create a variable called ‘Year’ based on the ‘TIME’ variable (each year will be repeated four times as the data is quarterly). (Hint: Use Google Sheets’ ‘LEFT’ function.) Then create an annual dummy variable ‘Recession_YOURCOUNTRYNAME’ that takes the value 1 for all observations within a year if one or more quarters of that year featured a recession (‘Recession_YOURCOUNTRYNAME_quarterly’ = 1). Remove duplicate values in ‘Year’ (as we still have every year listed four times due to the initial data being on the quarterly level).
  1. Keep only the ‘Country’, ‘Year’, and ‘Recession_YOURCOUNTRYNAME’ variables (remove all others).
  1. After you have made these changes, rename this sheet as ‘YOURCOUNTRYNAME_recessions’.

Google Sheets walk-through 4 Adding datasets as separate sheets in the same Google Sheets file

Adding datasets as separate tabs in a spreadsheet
Fullscreen

Walk-through figure 4 Adding datasets as separate tabs in a spreadsheet

Use Google Sheets’ Import feature
: In a blank spreadsheet, select ‘File’, then ‘Import’.
Fullscreen

Use Google Sheets’ Import feature

In a blank spreadsheet, select ‘File’, then ‘Import’.

Find the file on your computer
: In the pop-up window that appears, select ‘Upload’. You can either select ‘Browse’ and find the file on your computer, or drag the file into the pop-up window.
Fullscreen

Find the file on your computer

In the pop-up window that appears, select ‘Upload’. You can either select ‘Browse’ and find the file on your computer, or drag the file into the pop-up window.

Select the correct import options
: Make sure the ‘Convert text to numbers, dates and formulas’ option is selected, then click ‘Import data’. For all subsequent data files you import, use the Import location option of ‘Insert new sheet(s)’ so that the data is added as a separate sheet in your existing file.
Fullscreen

Select the correct import options

Make sure the ‘Convert text to numbers, dates and formulas’ option is selected, then click ‘Import data’. For all subsequent data files you import, use the Import location option of ‘Insert new sheet(s)’ so that the data is added as a separate sheet in your existing file.

Google Sheets walk-through 5 Using IF and AND to make dummy variables that satisfy multiple conditions

Using IF and AND to make dummy variables that satisfy multiple conditions
Fullscreen

Walk-through figure 5 Using IF and AND to make dummy variables that satisfy multiple conditions

Calculate GDP growth across consecutive quarters
: In the spreadsheet shown, columns C–H, J–P, and R–S have been hidden. In the third row of a new column (T in this example), enter the formula ‘=((Q3-Q2)/Q2)*100’ to calculate GDP growth as the percentage change in GDP over consecutive quarters. Double-click the box at the bottom-right corner to apply this formula to all subsequent cells in this column.
Fullscreen

Calculate GDP growth across consecutive quarters

In the spreadsheet shown, columns C–H, J–P, and R–S have been hidden. In the third row of a new column (T in this example), enter the formula ‘=((Q3-Q2)/Q2)*100’ to calculate GDP growth as the percentage change in GDP over consecutive quarters. Double-click the box at the bottom-right corner to apply this formula to all subsequent cells in this column.

Identify recession periods
: The recession variable starts in the fourth row, because it depends on GDP growth in the current and previous quarter. In the fourth row of a new column (U in this example), enter the formula ‘=IF(AND(T3<0,T4<0),1,0)’. This formula sets the value in the cell equal to 1 if all conditions in the ‘AND’ brackets are met (negative values for cells T3 and T4), and 0 otherwise. Double-click the box at the bottom-right corner to apply this formula to all subsequent cells in this column.
Fullscreen

Identify recession periods

The recession variable starts in the fourth row, because it depends on GDP growth in the current and previous quarter. In the fourth row of a new column (U in this example), enter the formula ‘=IF(AND(T3<0,T4<0),1,0)’. This formula sets the value in the cell equal to 1 if all conditions in the ‘AND’ brackets are met (negative values for cells T3 and T4), and 0 otherwise. Double-click the box at the bottom-right corner to apply this formula to all subsequent cells in this column.

Some of the datasets we currently have (‘hours_worked’ and ‘LFP_rates’) are in ‘long’ format, meaning, for example, female, male, and total hours worked are in the same column. It is best practice to work with data in the ‘wide’ format, where each variable (such as female hours worked) is represented by its own column. This simplifies working with the data, such as plotting female hours worked or computing its correlation with male hours worked.

  1. Clean the four remaining datasets (‘GDP_annual’, ‘hours_worked’, ‘labour_productivities’, and ‘LFP_rates’), following the steps described below.
  1. Import ‘GDP_annual’ and ‘labour_productivities’ into your current Google Sheets file as new sheets, using their filenames as sheet names.
  1. In the ‘GDP_annual’ sheet, make sure the ‘Value’ variable is in ‘Number’ format (rather than ‘General’). Remove all observations with missing values for ‘Value’. Rename the ‘Value’ variable as ‘GDP’, the ‘Country.Name’ variable as ‘Country’, and the ‘Time’ variable as ‘Year’. Remove all columns except ‘Country’, ‘Year’, and ‘GDP’.
  1. In the ‘labour_productivities’ sheet, rename the ‘Time’ variable as ‘Year’ and the ‘Value’ variable as ‘labour_productivity’. Remove all columns except ‘Country’, ‘Year’, and ‘labour_productivity’.
  1. Import ‘hours_worked’ and ‘LFP_rates’ into your current Google Sheets file as new sheets, keeping their file names. In each dataset, remove all variables except ‘Country’, ‘Sex’, ‘Time’ (or ‘Year’), and ‘Values’. Then rearrange (‘reshape’) the data so that the values for ‘all persons’, ‘female’, and ‘male’ are in separate columns rather than a single column. For example, in ‘hours_worked’, the hours worked by all persons, females, and males should be their own variables (columns), rather than sharing the ‘SEX’, ‘Sex’, and ‘Value’ columns. You may want to create a new sheet in your Google Sheets file to store the rearranged data.
  1. Using the reshaped data on ‘hours_worked’ and ‘LFP_rates’, name the new variables according to the name of the data series (for example, to ‘hours_worked_all’, ‘hours_worked_female’, or ‘LFP_rate_female’). Rename the ‘Time’ variable as ‘Year’. Remove all columns except ‘Country’, ‘Year’, and the newly created variables.
  1. Copy the ‘GDP_annual’ sheet as a new sheet in the same Google Sheets file. In this sheet, merge all six datasets by their ‘Country’ and ‘Year’ variables. The combined dataset should have 11 variables. (For help on merging data see Google Sheets walk-through 6.) Once you have merged the data:
  1. Combine the recession variable (US data and data for your chosen country) into a single variable. Replace the (missing) values of the ‘Recession’ variable (containing the US recession dummy) with the values of the ‘Recession_YOURCOUNTRYNAME’ variable. Then delete the ‘Recession_YOURCOUNTRYNAME’ variable and sort the data by ‘Country’ and ‘Year’ (in that order).
  1. Delete all observations before 1960 (for both countries), as the OECD data is only available from 1960 onwards (or later), while the US recession period data goes back to 1854.
  1. Save the sheet you created in this question as a new Google Sheets file called ‘dataset_main’. You will work with this file in Part 2.

Google Sheets walk-through 6 Merging data by matching across columns

Merging data by matching across columns
Fullscreen

Walk-through figure 6 Merging data by matching across columns

Example of two data sheets to be merged
: In this example, we are merging labour force participation data (LFP) from the second tab (columns C to E) into the GDP data sheet (GDP_annual). Both sheets have a common variable, called ‘Year’, which we will use to merge the data. In this case, since the data are already sorted by year, you can simply copy and paste the data from one sheet to another, being careful to match the years correctly. Or you can try this more advanced method, which helps when you have larger spreadsheets and when the data is not necessarily sorted in the same order across the spreadsheets you want to merge.
Fullscreen

Example of two data sheets to be merged

In this example, we are merging labour force participation data (LFP) from the second tab (columns C to E) into the GDP data sheet (GDP_annual). Both sheets have a common variable, called ‘Year’, which we will use to merge the data. In this case, since the data are already sorted by year, you can simply copy and paste the data from one sheet to another, being careful to match the years correctly. Or you can try this more advanced method, which helps when you have larger spreadsheets and when the data is not necessarily sorted in the same order across the spreadsheets you want to merge.

Use VLOOKUP to match data across sheets
: For each variable you want to merge, type the variable name in the first row of a new column. Here, we want to add three new columns, containing the labour force participation for the whole population and by sex.To merge the data, type in ‘=VLOOKUP(’ and select the cell with the value you want to match across spreadsheets (A2 in this example, corresponding to the year 1960). Then type ‘,’ go to the tab containing the data you want to merge, and select all the columns starting from the ‘Year’ column. (It is important that ‘Year’ be the first column selected because Google Sheets will match cell A2 with the values in the first column only). Then type ‘,’ and the column number in your selection) corresponding to the variable you want to merge (in this case, column 2 corresponds to the LFP rate for the whole population). Then type ‘,’ and ‘FALSE)’ (‘FALSE’ requires the Year data to match exactly across both sheets). Then press ‘Enter’. You can check that the data was matched correctly. Repeat these steps for all other variables you want to merge in. In this example, to merge in data for female LFP, use the same formula as before but change the column number to 3 (‘=VLOOKUP(A2,LFP!B:E,3,FALSE)’).
Fullscreen

Use VLOOKUP to match data across sheets

For each variable you want to merge, type the variable name in the first row of a new column. Here, we want to add three new columns, containing the labour force participation for the whole population and by sex.
To merge the data, type in ‘=VLOOKUP(’ and select the cell with the value you want to match across spreadsheets (A2 in this example, corresponding to the year 1960). Then type ‘,’ go to the tab containing the data you want to merge, and select all the columns starting from the ‘Year’ column. (It is important that ‘Year’ be the first column selected because Google Sheets will match cell A2 with the values in the first column only). Then type ‘,’ and the column number in your selection) corresponding to the variable you want to merge (in this case, column 2 corresponds to the LFP rate for the whole population). Then type ‘,’ and ‘FALSE)’ (‘FALSE’ requires the Year data to match exactly across both sheets). Then press Enter. You can check that the data was matched correctly. Repeat these steps for all other variables you want to merge in. In this example, to merge in data for female LFP, use the same formula as before but change the column number to 3 (‘=VLOOKUP(A2,LFP!B:E,3,FALSE)’).

Apply the same formula to subsequent cells
: To apply the formula to all subsequent cells, select the cell(s) in the second row and double-click the bottom-right circle.
Fullscreen

Apply the same formula to subsequent cells

To apply the formula to all subsequent cells, select the cell(s) in the second row and double-click the bottom-right circle.

The merged data
: The merged data file looks like this.
Fullscreen

The merged data

The merged data file looks like this.

Learning objectives for this part

  • Understand how the Hodrick–Prescott (HP) filter can separate trend and cyclical components of a variable.
  • Compute business cycle properties of macroeconomic variables.
  • Interpret macroeconomic (aggregate) data.
  • Relate changes on the household level to macroeconomic phenomena.

In this part, we will study three puzzling macroeconomic phenomena in the US:

To learn more about the great moderation, read Section 17.4 of The Economy 1.0.

jobless recovery
A macroeconomic phenomenon in which employment grows slowly (or not at all) during a post-recession economic recovery, thereby making the recovery ‘jobless’. One example is from the US, where recoveries have been jobless from the 1990s onwards.
great moderation
A period of low volatility in aggregate output in advanced economies between the 1980s and the 2008 financial crisis. The name was suggested by James Stock and Mark Watson, the economists, and popularized by Ben Bernanke, then chairman of the Federal Reserve.
  • the great moderation, which describes the reduction in business cycle volatility of output and hours worked
  • the productivity slowdown in the 1970s and 1980s, during which the growth rate of average labour productivity was substantially lower than in previous decades
  • jobless recovery, which describes an economic recovery during which employment grows slowly (or not at all); this phenomenon contrasts with post-recession economic recoveries before the 1990s, when employment used to recover quickly.

Even better, we will show these phenomena empirically using the dataset obtained in Part 1. You will also be using the dataset to check whether these puzzles can also be found outside of the US, by contrasting the US data with data from a country of your choice. We will then explore whether these macro puzzles can be explained by changes at the micro (household) level happening at the same time.

  1. Start by plotting female and male labour force participation over time:
  1. Separately for each country, create a line chart of the labour force participation rate by sex (one line for females, one for males) over time. Add a legend to the chart. The horizontal axis labels should be years in 10-year intervals (1960, 1970, …, 2020).
  1. Add grey vertical shaded bars for recession periods to both countries, using the NBER recession dummy for the US and the one you computed yourself for the other country. (For help adding shaded areas to a line chart see Google Sheets walk-through 7.) Your chart for the US should be similar to Figure 1.
  1. How do both labour force participation rates evolve over time? How do these rates compare between both countries?

Google Sheets walk-through 7 Adding shaded areas to a line chart

Adding shaded areas to a line chart
Fullscreen

Walk-through figure 7 Adding shaded areas to a line chart

Create a new recession variable
: The labour force participation data we will use to make the line chart are in columns D and E. Use the ‘IF’ function to create a new variable (‘Recession (shaded)’ in this example) that equals 100 if the recession dummy equals 1, and 0 otherwise. We will use this variable to create the shaded areas.
Fullscreen

Create a new recession variable

The labour force participation data we will use to make the line chart are in columns D and E. Use the IF function to create a new variable (‘Recession (shaded)’ in this example) that equals 100 if the recession dummy equals 1, and 0 otherwise. We will use this variable to create the shaded areas.

Plot a line chart
: Select the data you want to plot (in this example, the two columns containing the male and female LFP, and the new recession variable). Remember to hold the ‘Command’ or ‘Ctrl’ button to select multiple columns. Click ‘Insert’, then ‘Chart’.
Fullscreen

Plot a line chart

Select the data you want to plot (in this example, the two columns containing the male and female LFP, and the new recession variable). Remember to hold the Command or Ctrl button to select multiple columns. Click ‘Insert’, then ‘Chart’.

Change the chart type to a custom chart
: Google Sheets will plot the data using their default settings. To convert the chart into the desired type (line chart with shaded areas), click on the chart. In the chart editor menu that pops up, select ‘Setup’, then under ‘Chart type’, click ‘Custom chart’.
Fullscreen

Change the chart type to a custom chart

Google Sheets will plot the data using their default settings. To convert the chart into the desired type (line chart with shaded areas), click on the chart. In the chart editor menu that pops up, select ‘Setup’, then under ‘Chart type’, click ‘Custom chart’.

Select the correct chart types for each variable
: Google Sheets will replot the data using their default settings. To change the data series into the correct type, click ‘Customise’. In the ‘Series’ tab, you can select individual data series and change their format. Make sure the LFP data are both formatted as ‘Line’, and the recession variable is formatted as ‘Columns’.
Fullscreen

Select the correct chart types for each variable

Google Sheets will replot the data using their default settings. To change the data series into the correct type, click ‘Customise’. In the ‘Series’ tab, you can select individual data series and change their format. Make sure the LFP data are both formatted as ‘Line’, and the recession variable is formatted as ‘Columns’.

Customise the appearance of the shaded bars
: You can also change the line or fill colour and opacity using the other settings in the ‘Series’ tab. In this example we used light grey to shade the recession years.
Fullscreen

Customise the appearance of the shaded bars

You can also change the line or fill colour and opacity using the other settings in the ‘Series’ tab. In this example, we used light grey to shade the recession years.

Customise the other chart features
: Use the other tabs in the ‘Customise’ menu to add chart and axis titles, customise the legend, and change the horizontal and vertical axis settings.
Fullscreen

Customise the other chart features

Use the other tabs in the ‘Customise’ menu to add chart and axis titles, customise the legend, and change the horizontal and vertical axis settings.

Labour force participation rates in the US, by sex (1960–2021). Shaded bars indicate recession periods.
Fullscreen

Figure 1 Labour force participation rates in the US, by sex (1960–2021). Shaded bars indicate recession periods.

The first macro puzzle we will explore is the great moderation, which describes the reduction in volatility over the business cycles of several aggregate variables, such as GDP and employment. We will measure employment by average hours worked per employed person (intensive margin only) in this question.

trend component
The long-run growth component of a macroeconomic time series. Commonly obtained using the Hodrick–Prescott (HP) filter.
cyclical component
The short-run (business cycle) fluctuations around the long-run trend component of a macroeconomic time series. Commonly obtained using the Hodrick–Prescott (HP) filter.
Hodrick–Prescott (HP) filter
A mathematical tool used by macroeconomists to estimate the cyclical and trend components of time series data. Its main purpose is to fit a smooth curve (the trend) through the time series, where the trend reacts more to long-term fluctuations than to short-term fluctuations (the latter will mostly affect the cyclical component). The HP filter uses a parameter λ (‘lambda’) to dictate how sensitive this trend is to short-term fluctuations. This lambda needs to be chosen depending on the frequency of the data; popular values are λ = 6.25 for annual and λ = 16,000 for quarterly data.

We will start by showing the great moderation in the data. To do so, we need to isolate the cyclical (that is, relating to the business cycle) components of GDP and hours worked. This is a standard exercise in business cycle macroeconomics, where we deal with time series variables (such as GDP) that have a trend component (for example, the long-run growth of GDP over time) and a cyclical component that fluctuates around that trend. This cyclical component indicates the business cycles, and it is this component we need to isolate to see the great moderation in the data.

Macroeconomists typically use the Hodrick–Prescott (HP) filter to estimate the cyclical and trend components. Its main idea is to fit a smooth curve (the trend) through the time series, where the trend reacts more to long-term fluctuations than to short-term fluctuations (the latter will mostly affect the cyclical component). The HP filter uses a parameter (‘lambda’) to dictate how sensitive this trend is to short-term fluctuations (greater values of mean less sensitivity; that is, a smoother trend). This lambda value needs to be chosen depending on the frequency of the data; a popular value for annual data (like ours) is = 6.25.

  1. We will first show the great moderation in the US data and then explore a potential explanation for it.
  1. Create new variables for the natural log of annual GDP and hours worked (female, male, and all persons). Read the ‘Find out more’ box in Doing Economics Project 4.2 on why economists take logs of variables and how to do it in Google Sheets.
  1. Download this spreadsheet, which contains data from running the HP filter with lambda = 6.25 on 1) log annual GDP and 2) log hours worked per capita over time (female, male, and total separately), for both the trend and cyclical components.
  1. To show the great moderation in your data, plot line charts of the cyclical components of log GDP and of hours worked (of all persons) separately, by country (four charts in total). Add the recession periods as shaded vertical areas as before.
  1. Looking at the log GDP chart for the US, can you see the great moderation in your chart? When does it start (roughly)? Some economists argue that the great moderation in the US has ended; which year/event do you think they regard as its end? Does your other country also display a moderation period? If so, when does it start and end (roughly)?

As you can see in your charts, the cyclical components of GDP and employment seem to co-move over the business cycle. That is, in expansion periods, both output and employment grow (and in recessions, both shrink). Economists call this property of employment ‘procyclical’. Unemployment behaves the other way around; therefore it is ‘countercyclical’. To measure the degree of procyclicality and of cyclical volatility of an economic variable, economists typically compare it relative to GDP, which we will do in the next tasks.

  1. To study the cyclicality of total employment, use Google Sheets’ CORREL function to compute the correlation coefficient of the cyclical component of log hours worked of all persons with the cyclical component of log GDP (separately for each country). (For help on using Google Sheets’ CORREL function, use this Google Sheets walk-through). How can you tell that employment is a procyclical variable (at least in the US)? In which of your countries is employment more cyclical?
  1. To evaluate the cyclical volatility of employment, compute the standard deviation (s.d.) of each of the three cyclical components of log hours (all persons, female, and male) separately for each country. Then divide it by the standard deviation of the cyclical component of log GDP. Make sure to compute the standard deviations over the same time period within each country (the time periods can be different between countries, but must be the same between economic variables to compare them). Is employment more or less volatile than GDP? Which is more volatile—male or female employment? How does it differ between the US and your other country?
  1. Now we get to finding an explanation for the great moderation in the US:
    • Compute the relative standard deviations (that is, cyclical volatilities) in the same way as in Question 2e for the US, but only for female and male employment, splitting the sample into two periods: 1983 to 2007 (the last pre-global financial crisis year), and 2008 until your last year of data.
    • Create a grouped column chart with four columns, each column representing one relative standard deviation (all from US data): pre-global financial crisis female and male, and post-global financial crisis female and male.
    • Create the same column chart for your other country as well (use the same time periods as for the US).
  1. Interpret the column chart for the US. Do your results on female vs male cyclical volatility from Question 2e hold for both time periods? Taking into account the changes in labour force participation shown in Question 1, come up with a hypothesis for how these changes caused GDP and employment to become less volatile (the defining features of the great moderation). (Hint: Think about the change in the composition of total employment.)
  1. Compare the column charts between the US and your chosen country. Do the same characteristics of female employment (less volatile than male) also apply to your other country? If you have found a great moderation for your other country in Question 2d, can it be explained in the same way as for the US? (Note that you might have to adjust the years of the sample split to those you found in Question 2d for your other country.)

In the next question, we will look at the second macro puzzle: the slowdown of US productivity growth in the 1970s and 1980s. We take the average labour productivity (ALP) series as a measure of productivity. To measure sex-specific employment, we will use the labour force participation (LFP) rates instead of hours worked in this question because the hours data for the US only starts in 1979 (while the LFP data starts in 1960). Note that LFP only captures the extensive margin of employment, not the intensive margin. This again highlights the limitations of using aggregated data compared to micro data, which would contain hours worked by sex for earlier years as well.

We will use data that has been run through the HP filter to isolate the trend (that is, growth) components of the log of ALP and of log female and male LFP rates, as we are interested in the long-run trend rather than the (business) cyclical variations around that trend.

  1. We will again show the productivity slowdown first and then try to find an explanation for it:
  1. First document the productivity slowdown in the US and check if a similar trend can be found for your other country:
    • The spreadsheet you downloaded in Question 2 contains results from applying the HP filter to the log of ALP with lambda = 6.25 to calculate the trend component only. Compute the annual growth rate in percentage of that trend component.
    • Compute the annual growth rate in percentage of that trend component.
    • Plot the growth rate in a line chart over time (separate charts by country), using all available years of data.
    • Add in the recession periods as grey shaded areas.
    • Verify that productivity growth for the US has slowed down during the 1970s and 1980s.
    • Repeat the steps above for your chosen country. Was there a similar slowdown in your chosen country?
  1. Now add employment (measured by LFP rate) to both line charts:
    • The spreadsheet you downloaded in Question 2 contains results from applying the HP filter to the log of the female and male LFP rates with lambda = 6.25. (The filter was applied only to observations from 1963 onwards to exclude the jumps in LFP rates due to changes in methodology.)
    • Compute the annual growth rates in percentage of both trend components.
    • Add both growth rates to the two charts from Question 3a and include a legend.
    • What is the relationship of these two variables (female and male LFP trends) to the ALP trend component?
  1. Based on the results from Question 3b, come up with an explanation for the productivity slowdown, taking into account how labour force participation (and therefore the composition of the labour force) changed during the time of the productivity slowdown in the 1970s and 1980s. Why would this change in composition initially lower productivity growth? Why might this change in composition increase productivity growth in the longer term?
  1. If there was a productivity slowdown in the other country, could it be explained in the same way as for the US?
  1. The final macro puzzle we will investigate is that of jobless recoveries. As we want to consider the 1970s recessions, we again measure employment through the log of the LFP rate (extensive margin only).
  1. For the US data only, create a line chart of the log LFP rate (of all persons) over time, starting in 1963 (because of the change in methodology). Add the recession periods as shaded areas.
  1. Explain how the line chart shows ‘jobless recoveries’. Approximately when did recoveries start being jobless?
  1. Create two more line charts in the same way as in Question 4a: 1) log of female LFP rate; 2) log of male LFP rate.
  1. How do recoveries after recessions differ between female and male LFP? Considering the changes in labour force participation and labour force composition, what could be an explanation for why recoveries in the US have become jobless in the 1990s, 2000s, and 2010s?
  1. Redo Questions 4a and 4c for your other country. Do you also find jobless recoveries? Does the US explanation for jobless recoveries apply to that country too?
  1. Read the introduction (up to ‘The second part of the paper…’ on p. 3) and conclusion of Stefania Albanesi’s research paper ‘Changing Business Cycles: The Role of Women’s Employment’, on which this project is based. Are your results for the US in line with her findings? What have you learned about the micro-to-macro link between growing female labour supply on the household side and the three macroeconomic puzzles?