Extra Empirical Project: Female labour supply and the macroeconomy Working in Excel

Part 1: Collecting and preparing the data

Learning objectives for this part

  • download and clean customized excerpts from a database
  • merge different datasets
  • estimate recession dates from GDP data.

In the questions below, we will collect and merge various datasets. We will combine labour force statistics (hours worked and LFP) with macroeconomic statistics (GDP and productivity), so that we can assess the relationship between those variables. The goal is to end up with a single dataset containing all the necessary data for the analysis in Part 2.

Before we start, create a subfolder called ‘data’ in your working directory. We will store all datasets in that subfolder.

  1. Download annual GDP data for the US and one other country of your choice from the World Bank statistical database:
  • By clicking the link, you have already selected the correct series, ‘GDP (constant 2015 US$)’. We want to keep prices constant, so that GDP is in real terms (taking out any variation due to changes in the price level). We do not need to worry about fixing exchange rates and purchasing power parity (PPP) here, because we will not compare GDP between countries.
  • On the left under ‘Country’, first unselect all countries (this can be done in one click on the ‘X’ button above the country list), and then select the United States and your other country.
  • Under ‘Time’, select all years by clicking on the checkbox icon (to the left of the ‘Unselect all’ button).
  • Then click on ‘Apply Changes’ in the centre of the screen.
  • Download the data as a CSV file by clicking on ‘Download options’ in the top right (above the data table) and then clicking on ‘Advanced options’.
  • Select ‘CSV’ in the header (the default is ‘Excel’) and change the ‘Data format’ from ‘Table’ to ‘List’.
  • Click ‘Download’, which will download a ZIP file. Extract the file and delete the CSV file ending in ‘Metadata’. Rename the CSV file ending in ‘Data’ as ‘GDP_annual’ and save it in your ‘data’ subfolder.
  1. Download the following data for the US and the country you chose in Question 1 from the OECD statistical database:
  1. Labour force participation (LFP) rates by sex: You can find the LFP rates under ‘Labour > Labour Force Statistics > LFS by sex and age > LFS by sex and age - indicators’, then click on ‘LFS by sex and age indicators’. You can also find the data series directly by searching for ‘LFS by sex and age indicators’ in the search bar on the top left.
    • Click on the ‘Country’ header in the first column, which should open the customization pop-up.
    • Unselect all countries (can be done in one click via ‘unselect all’ in the top-right corner) and select the United States and one country of your choice.
    • Click on ‘Time & Frequency’ and select from 1960 (leave the upper limit at ‘latest available data’).
    • Click on ‘Sex’ and make sure that all three options are selected.
    • Click on ‘Age’ and unselect all except ‘Total’.
    • Click on ‘Series’ and unselect all except ‘Labour force participation rate’.
    • Click ‘View Data’. The table should now only show the data you selected. To download the data, click on ‘Export > Text file (CSV) > Default format > Download’. After the file has been downloaded, rename it as ‘LFP_rates’, and move it to the ‘data’ subfolder in your working directory.

You can learn more about how GDP is measured and made comparable across countries and time in Section 1.2 of The Economy 2.0: Microeconomics.

  1. Average labour productivity (ALP): One way that the OECD measures labour productivity is by GDP per hour worked, which you can find under ‘Productivity > Productivity and ULC – Annual, Total Economy > Level of GDP per capita and productivity’, then click on ‘Level of GDP per capita and productivity’.
    • Download the data for the US and your other country, from 1970 to the latest available year, selecting ‘GDP per hour worked’ under ‘Subject’.
    • Under ‘Measure’, select only ‘USD, constant prices, 2015 PPPs’. This measure makes sure the productivity (GDP per hour worked) measures are comparable, regardless of country-dependent exchange and inflation rates.
    • Download the data as a CSV file in the same way as before, and save it as ‘labour_productivities’ in your ‘data’ subfolder.
  1. Average hours worked per week by sex: The OECD only has hours worked by sex for the main job, so keep in mind that this data disregards hours worked outside of the main job. Also, as these are average hours per worker, they only capture the intensive margin of labour supply. You can find the data series under ‘Labour > Labour Force Statistics > Hours worked > Average usual weekly hours worked on the main job’.

    Do the following customizations for the US and your chosen country:
    • Select data from 1979 to the latest available year.
    • Make sure that all three options for ‘Sex’ are selected.
    • Select only ‘Total’ for ‘Age’.
    • For ‘Employment’, select only ‘Dependent employment’. (The data for the US only covers those in dependent employment, so to keep things comparable we will restrict the data to them.)
    • Under ‘Job type’, select only ‘Total declared employment’ (includes both full and part-time workers).
    • Download the data as a CSV file and save it as ‘hours_worked’ in your ‘data’ subfolder.
  1. Quarterly GDP: This data can be found under ‘National Accounts > Quarterly National Accounts > Quarterly National Accounts’, then click on ‘Quarterly National Accounts’ (the first table).
    • Select your two countries, starting from 1947Q1 (you might have to switch from ‘select latest data’ to ‘select date range’ first).
    • Under ‘Subject’, select only ‘Gross domestic product - expenditure approach’.
    • Under ‘Measure’, select only ‘VPVOBARSA: US dollars, volume estimates, fixed PPPs, OECD reference year, annual levels, seasonally adjusted’ (you can find it under ‘VOL – VOLUMES’).
    • Download the data as a CSV file and save it as ‘GDP_quarterly’ in your ‘data’ subfolder.

Find out more Going from micro to macro data

Most macroeconomic researchers nowadays use micro data (for example, at the household or firm level) to construct aggregate variables like the ones you have downloaded from the OECD database. Albanesi, for example, obtains her data series from the Current Population Survey (CPS), a publicly available household survey from the US. You have already seen some limitations of using aggregate data like the OECD’s: you can only use the series they have made available. The OECD does not provide data on hours worked by sex outside of the main job, nor on labour productivity by sex. However, these could easily be computed from micro data. For simplicity, we will not be working with micro data in this Doing Economics project. However, the OECD’s data is based on country-specific micro datasets which have been harmonized and aggregated by the OECD, so by using their data we are indirectly working with micro data as well.

If you are interested in how the NBER defines and dates recessions and expansions, you can read their FAQs.

dummy variable (indicator variable)
A variable that takes the value 1 if a certain condition is met, and 0 otherwise.
  1. We want to link the variables obtained in Question 1 with data on business cycles, for which we will use the US National Bureau of Economic Research (NBER)-based US recession dates from FRED. (Clicking on the link will download the data as a CSV file called ‘USRECQ.csv’; move it into the ‘data’ subfolder in your working directory.) This file contains a quarterly dummy variable (that is, 1 for recession periods and 0 otherwise), which we will use to distinguish between expansion and recession periods in the analysis.
  1. Open the ‘USRECQ.csv’ file in Excel and save it as an Excel file with the same filename.
  1. We want to create an annual recession dummy because the other data is at the annual level (and we want to have consistent date variables when combining the various datasets). To do so, first create a ‘Year’ variable (of numeric type, not character) based on the ‘DATE’ variable (each year will be repeated four times, as the data is quarterly). Then create an annual dummy variable ‘Recession’ that takes the value 1 for all observations within a year if one or more quarters of that year featured a recession (‘USRECQ’ = 1). (For help, see Excel walk-through 1 on creating consistent date variables and Excel walk-through 2 on creating a recession dummy variable.)
  1. Remove (delete) all variables except ‘Year’ and ‘Recession’. Remove duplicate values in ‘Year’ (as we still have every year four times due to the initial data being on the quarterly level). (Hint: Use the ‘Paste Special’ and ‘Values’ options to preserve the values of these variables even when the raw data they are based on is deleted.) Create a ‘Country’ variable that takes the value ‘United States’ for all observations. (For help see Excel walk-through 3 on removing duplicate values.)
  1. After you have made these changes, save your data file.

Excel walk-through 1 Creating consistent date variables

Creating consistent date variables
Fullscreen

Walk-through figure 1 Creating consistent date variables

Create a new column for the year data
: Date formats in cells C2:C182 are in form yyyy-mm-dd and date formats in cells C183 (January 1900) onwards are in form dd/mm/yyyy. To create consistent date variables, first create a new column headed ‘YEAR’ (column C in the sheet shown).
Fullscreen

Create a new column for the year data

Date formats in cells C2:C182 are in form yyyy-mm-dd and date formats in cells C183 (January 1900) onwards are in form dd/mm/yyyy. To create consistent date variables, first create a new column headed ‘YEAR’ (column C in the sheet shown).

Use the LEFT function for dates in yyyy-mm-dd format
: When date variables are in yyyy-mm-dd format, to fill in the ‘year’ column, use the LEFT function. In the first cell (C2), enter in the formula =LEFT(A2,4). This formula will extract the first four characters from the ‘DATE’ variable, which is the year. Click and drag the box in the bottom-right corner of the cell to apply the same formula to cells C3 to C182.
Fullscreen

Use the LEFT function for dates in yyyy-mm-dd format

When date variables are in yyyy-mm-dd format, to fill in the ‘year’ column, use the LEFT function. In the first cell (C2), enter in the formula =LEFT(A2,4). This formula will extract the first four characters from the ‘DATE’ variable, which is the year. Click and drag the box in the bottom-right corner of the cell to apply the same formula to cells C3 to C182.

Use the TEXT function for dates in dd/mm/yyyy format
: When date is in format dd/mm/yyyy and formatted as dates, you can extract the year into a new column using the TEXT function. In cell C183, enter in the formula =TEXT(A183,“yyyy”). Click and drag (or double-click) the box in the bottom-right corner of the cell to apply the same formula to all remaining cells in column C.
Fullscreen

Use the TEXT function for dates in dd/mm/yyyy format

When date is in format dd/mm/yyyy and formatted as dates, you can extract the year into a new column using the TEXT function. In cell C183, enter in the formula =TEXT(A183,“yyyy”). Click and drag (or double-click) the box in the bottom-right corner of the cell to apply the same formula to all remaining cells in column C.

Excel walk-through 2 Creating a recession dummy variable

Creating a recession dummy variable
Fullscreen

Walk-through figure 2 Creating a recession dummy variable

Label the first year as a recession year
: Create a new column called RECESSION (column D in the sheet shown). There is only one quarter recorded for 1854, so we will record that as a recession. In the first column, enter in the formula =IF(B2=1,1,0).
Fullscreen

Label the first year as a recession year

Create a new column called RECESSION (column D in the sheet shown). There is only one quarter recorded for 1854, so we will record that as a recession. In the first column, enter in the formula =IF(B2=1,1,0).

Use the SUM function to identify recession years
: For all remaining years, we have 4 quarters of data. In cell D3 enter in the formula =IF(SUM(B3:B6)>0,1,0). The SUM function calculates the sum of the values in the selected cells, and if this sum is greater than 1, in the case that there is at least one recession that year, it will return the value 1.
Fullscreen

Use the SUM function to identify recession years

For all remaining years, we have four quarters of data. In cell D3, enter in the formula =IF(SUM(B3:B6)>0,1,0). The SUM function calculates the sum of the values in the selected cells, and if this sum is greater than 1, in the case that there is at least one recession that year, it will return the value 1.

Duplicate the same value for the next three cells
: Next, type in the function ‘=+D3’ in the next three cells. This formula ensures that the four cells corresponding to the same year all return the same value. For all remaining years, copy the functions in these four cells and paste them into the cells corresponding to the first quarter of that year.
Fullscreen

Duplicate the same value for the next three cells

Next, type in the function ‘=+D3’ in the next three cells. This formula ensures that the four cells corresponding to the same year all return the same value. For all remaining years, copy the functions in these four cells and paste them into the cells corresponding to the first quarter of that year.

Excel walk-through 3 Removing duplicate values

Removing duplicate values
Fullscreen

Walk-through figure 3 Removing duplicate values

Use the ‘Remove Duplicates’ feature in Excel
: In the sheet shown, column A contains years and column B contains the recession variable. Select the ‘YEAR’ and ‘RECESSION’ columns. Under the ‘Data’ tab, select ‘Remove Duplicates’ (this icon may appear on its own without a drop-down menu, depending on your version of Excel). Make sure that the option ‘My list has headers’ and the variable ‘YEAR’ are selected. Then click ‘OK’.
Fullscreen

Use the ‘Remove Duplicates’ feature in Excel

In the sheet shown, column A contains years and column B contains the recession variable. Select the ‘YEAR’ and ‘RECESSION’ columns. Under the ‘Data’ tab, select ‘Remove Duplicates’ (this icon may appear on its own without a drop-down menu, depending on your version of Excel). Make sure that the option ‘My list has headers’ and the variable ‘YEAR’ are selected. Then click ‘OK’.

The data with duplicate years removed
: An alert box may pop up to notify you of how many duplicates there are in the data. Click ‘OK’ to close the pop-up box. There should now only be one row for each year.
Fullscreen

The data with duplicate years removed

An alert box may pop up to notify you of how many duplicates there are in the data. Click ‘OK’ to close the pop-up box. There should now only be one row for each year.

Recession timing data is not available for other (non-US) countries, so we have to create our own based on the NBER’s definition of a recession (for comparability). The NBER defines a recession as ‘a period when output is declining. It is over once the economy begins to grow again.’

‘A period’ of declining output is usually interpreted as at least two consecutive quarters of economic contraction (negative GDP growth).

  1. In all the steps below, replace ‘YOURCOUNTRYNAME’ with your chosen country’s name or a suitable abbreviation.
  1. Add the quarterly GDP dataset ‘GDP_quarterly.xls’ as a new sheet in your Excel file from Question 3, calling it ‘GDP_quarterly’ (for help on adding datasets as separate sheets, see Excel walk-through 4). Use Excel’s filter function to remove all observations for the US (we will be using the NBER’s recession dummy for the US). In a new column, compute quarterly GDP growth rates in percentage points for your non-US country only (for example, a value of 2 if GDP grew from $100 in one quarter to $102 in the next).
  1. Create a recession dummy named ‘Recession_YOURCOUNTRYNAME_quarterly’ that is 1 during recession periods (2 or more consecutive quarters of negative GDP growth) and 0 otherwise. For all such periods, set the dummy to 1 in all quarters (starting from the first quarter of negative GDP growth). A recession is over (the dummy takes the value of 0) as soon as GDP growth turns positive again. For periods of only one quarter of negative GDP growth, the dummy should be set to 0 as well. (Hint: Use Excel’s IF and AND function in the same formula to set the value of this variable equal to 1 only if both conditions are met; for help see Excel walk-through 5.)
  1. As for the US recession data above, create a variable called ‘Year’ based on the ‘TIME’ variable (each year will be repeated four times as the data is quarterly). (Hint: Use Excel’s ‘LEFT’ function.) Then create an annual dummy variable ‘Recession_YOURCOUNTRYNAME’ that takes the value 1 for all observations within a year if one or more quarters of that year featured a recession (‘Recession_YOURCOUNTRYNAME_quarterly’ = 1). Remove duplicate values in ‘Year’ (as we still have every year listed four times due to the initial data being on the quarterly level).
  1. Keep only the ‘Country’, ‘Year’, and ‘Recession_YOURCOUNTRYNAME’ variables (remove all others).
  1. After you have made these changes, rename this sheet as ‘YOURCOUNTRYNAME_recessions’.

Excel walk-through 4 Adding datasets as separate sheets in the same Excel file

Adding datasets as separate tabs in a spreadsheet
Fullscreen

Walk-through figure 4 Adding datasets as separate tabs in a spreadsheet

Open Power Query in Excel
: Open a new sheet in Excel. Under the ‘Data’ tab, select ‘Get Data (Power Query)’. (Depending on your version of Excel, this option may appear as ‘Get Data’, but the icon image is the same.)
Fullscreen

Open Power Query in Excel

Open a new sheet in Excel. Under the ‘Data’ tab, select ‘Get Data (Power Query)’. (Depending on your version of Excel, this option may appear as ‘Get Data’, but the icon image is the same.)

Import data from a text or CSV file
: In the pop-up window that appears, select ‘Text/CSV’. (Depending on your version of Excel, the option ‘From Text/CSV’ may already appear in the ribbon beside the ‘Get Data’ icon).
Fullscreen

Import data from a text or CSV file

In the pop-up window that appears, select ‘Text/CSV’. (Depending on your version of Excel, the option ‘From Text/CSV’ may already appear in the ribbon beside the ‘Get Data’ icon).

Find the data to import
: Click ‘Browse’. In the pop-up window that appears, find the file you want to import and select it, then click the ‘Next’ button.
Fullscreen

Find the data to import

Click ‘Browse’. In the pop-up window that appears, find the file you want to import and select it, then click the ‘Next’ button.

Preview and load the data
: A preview of the file will appear in the window, so you can check that the data is organized correctly. Click the ‘Load’ button to import it into Excel.
Fullscreen

Preview and load the data

A preview of the file will appear in the window, so you can check that the data is organized correctly. Click the ‘Load’ button to import it into Excel.

The imported data
: This image shows how the imported file may appear in Excel.
Fullscreen

The imported data

This image shows how the imported file may appear in Excel.

Excel walk-through 5 Using IF and AND to make dummy variables that satisfy multiple conditions

Using IF and AND to make dummy variables that satisfy multiple conditions
Fullscreen

Walk-through figure 5 Using IF and AND to make dummy variables that satisfy multiple conditions

Calculate GDP growth across consecutive quarters
: In the spreadsheet shown, columns C–H, J–P, and R–S have been hidden. In the third row of a new column (T in this example), enter the formula ‘=((Q3-Q2)/Q2)*100’ to calculate GDP growth as the percentage change in GDP over consecutive quarters. Double-click the box at the bottom-right corner to apply this formula to all subsequent cells in this column.
Fullscreen

Calculate GDP growth across consecutive quarters

In the spreadsheet shown, columns C–H, J–P, and R–S have been hidden. In the third row of a new column (T in this example), enter the formula ‘=((Q3-Q2)/Q2)*100’ to calculate GDP growth as the percentage change in GDP over consecutive quarters. Double-click the box at the bottom-right corner to apply this formula to all subsequent cells in this column.

Identify recession periods
: The recession variable starts in the fourth row, because it depends on GDP growth in the current and previous quarter. In the fourth row of a new column (U in this example), enter the formula ‘=IF(AND(T3<0,T4<0),1,0)’. This formula sets the value in the cell equal to 1 if all conditions in the ‘AND’ brackets are met (negative values for cells T3 and T4), and 0 otherwise. Double-click the box at the bottom-right corner to apply this formula to all subsequent cells in this column.
Fullscreen

Identify recession periods

The recession variable starts in the fourth row, because it depends on GDP growth in the current and previous quarter. In the fourth row of a new column (U in this example), enter the formula ‘=IF(AND(T3<0,T4<0),1,0)’. This formula sets the value in the cell equal to 1 if all conditions in the ‘AND’ brackets are met (negative values for cells T3 and T4), and 0 otherwise. Double-click the box at the bottom-right corner to apply this formula to all subsequent cells in this column.

Some of the datasets we currently have (‘hours_worked’ and ‘LFP_rates’) are in ‘long’ format, meaning, for example, female, male, and total hours worked are in the same column. It is best practice to work with data in the ‘wide’ format, where each variable (such as female hours worked) is represented by its own column. This simplifies working with the data, such as plotting female hours worked or computing its correlation with male hours worked.

  1. Clean the four remaining datasets (‘GDP_annual’, ‘hours_worked’, ‘labour_productivities’, and ‘LFP_rates’), following the steps described below.
  1. Import ‘GDP_annual’ and ‘labour_productivities’ into your current Excel file as new sheets, using their filenames as sheet names.
  1. In the ‘GDP_annual’ sheet, make sure the ‘Value’ variable is in ‘Number’ format (rather than ‘General’). Remove all observations with missing values for ‘Value’. Rename the ‘Value’ variable as ‘GDP’, the ‘Country.Name’ variable as ‘Country’, and the ‘Time’ variable as ‘Year’. Remove all columns except ‘Country’, ‘Year’, and ‘GDP’.
  1. In the ‘labour_productivities’ sheet, rename the ‘Time’ variable as ‘Year’ and the ‘Value’ variable as ‘labour_productivity’. Remove all columns except ‘Country’, ‘Year’, and ‘labour_productivity’.
  1. Import ‘hours_worked’ and ‘LFP_rates’ into your current Excel file as new sheets, keeping their filenames. In each dataset, remove all variables except ‘Country’, ‘Sex’, ‘Time’ (or ‘Year’), and ‘Values’. Then rearrange (‘reshape’) the data so that the values for ‘all persons’, ‘female’, and ‘male’ are in separate columns, rather than a single column. For example, in ‘hours_worked’, the hours worked by all persons, females, and males should be their own variables (columns), rather than sharing the ‘SEX’, ‘Sex’, and ‘Value’ columns. You may want to create a new sheet in your Excel file to store the rearranged data.
  1. Using the reshaped data on ‘hours_worked’ and ‘LFP_rates’, name the new variables according to the name of the data series (for example, to ‘hours_worked_all’, ‘hours_worked_female’, or ‘LFP_rate_female’). Rename the ‘Time’ variable as ‘Year’. Remove all columns except ‘Country’, ‘Year’, and the newly created variables.
  1. Copy the ‘GDP_annual’ sheet as a new sheet in the same Excel file. In this sheet, merge all six datasets by their ‘Country’ and ‘Year’ variables. The combined dataset should have 11 variables. (For help on merging data, see Excel walk-through 6.) Once you have merged the data:
  1. Combine the recession variable (US data and data for your chosen country) into a single variable. Replace the (missing) values of the ‘Recession’ variable (containing the US recession dummy) with the values of the ‘Recession_YOURCOUNTRYNAME’ variable. Then delete the ‘Recession_YOURCOUNTRYNAME’ variable and sort the data by ‘Country’ and ‘Year’ (in that order).
  1. Delete all observations before 1960 (for both countries), as the OECD data is only available from 1960 onwards (or later), while the US recession period data goes back to 1854.
  1. Save the sheet you created in this question as a new Excel file called ‘dataset_main’. You will work with this file in Part 2.

Excel walk-through 6 Merging data by matching across columns

Merging data by matching across columns
Fullscreen

Walk-through figure 6 Merging data by matching across columns

Open Power Query in Excel
: Open a new file (worksheet) in Excel. In the Data tab, select the ‘Get Data’ icon, then ‘From file’, then ‘From Excel workbook’. (In some versions of Excel, this icon is called ‘Get Data (Power Query)’. Clicking the icon will open a pop-up window, where you can then select the ‘Excel workbook’ option.)
Fullscreen

Open Power Query in Excel

Open a new file (worksheet) in Excel. In the Data tab, select the ‘Get Data’ icon, then ‘From file’, then ‘From Excel workbook’. (In some versions of Excel, this icon is called ‘Get Data (Power Query)’. Clicking the icon will open a pop-up window, where you can then select the ‘Excel workbook’ option.)

Select the data files that need merging
: In the pop-up window that appears, click ‘Browse’, then find and select the previous Excel file that has all of the cleaned data in separate tabs. When the ‘Navigator’ pop-up appears, select ‘select multiple items’ (in the left-side menu) and select all of the relevant sheets from the original file that will be merged together. (In some versions of Excel, this pop-up is called ‘Choose data’.) Make sure to select the data sheets rather than the Table versions. Then click ‘Transform data’.
Fullscreen

Select the data files that need merging

In the pop-up window that appears, click ‘Browse’, then find and select the previous Excel file that has all of the cleaned data in separate tabs. When the ‘Navigator’ pop-up appears, select ‘select multiple items’ (in the left-side menu) and select all of the relevant sheets from the original file that will be merged together. (In some versions of Excel, this pop-up is called ‘Choose data’.) Make sure to select the data sheets rather than the Table versions. Then click ‘Transform data’.

Select the sheet that all others will be merged into
: The Power Query Editor will now open. In the left-side menu, select the sheet you would like to use as your ‘main sheet’ in which all the others will be merged (‘GDP_annual’ in this example). Then select ‘Merge queries’ from the menu at the top (in some versions of Excel, this option may appear in the top menu as ‘Combine’, then ‘Merge queries’, as shown).
Fullscreen

Select the sheet that all others will be merged into

The Power Query Editor will now open. In the left-side menu, select the sheet you would like to use as your ‘main sheet’ in which all the others will be merged (‘GDP_annual’ in this example). Then select ‘Merge queries’ from the menu at the top (in some versions of Excel, this option may appear in the top menu as ‘Combine’, then ‘Merge queries’, as shown).

Select the columns in each spreadsheet that Excel will use to merge the data
: In the drop-down menu, select the first sheet you would like to merge with your main sheet (in this example, we chose ‘GDP_quarterly’). Then, for both tables, select the column that you want to merge by (‘Year’ in this case), as shown in the example. Click ‘OK’ to exit the menu.
Fullscreen

Select the columns in each spreadsheet that Excel will use to merge the data

In the drop-down menu, select the first sheet you would like to merge with your main sheet (in this example, we chose ‘GDP_quarterly’). Then, for both tables, select the column that you want to merge by (‘Year’ in this case), as shown in the example. Click ‘OK’ to exit the menu.

Select the variables in the new sheet to merge into the base sheet
: You will be taken back to the main sheet, which will have an extra column added to the end. Select the drop-down menu in the column header corresponding to the sheet that you selected in Step 7, and select the variables from the new sheet you would like to add to the main sheet, then click ‘OK’. After you click ‘OK’, you should see your new updated main sheet with the new column(s) you have just merged in. Repeat this process for all other sheets you wish to merge into the base sheet. When completed, select ‘Close and Load’ in the top left and you will see the final merged spreadsheet, along with all the individual sheets you have merged, all as separate tabs. From this sheet, you can sort the data according to one or more variables by selecting the header of the variable you wish to sort by.
Fullscreen

Select the variables in the new sheet to merge into the base sheet

You will be taken back to the main sheet, which will have an extra column added to the end. Select the drop-down menu in the column header corresponding to the sheet that you selected in Step 7, and select the variables from the new sheet you would like to add to the main sheet, then click ‘OK’. After you click ‘OK’, you should see your new updated main sheet with the new column(s) you have just merged in. Repeat this process for all other sheets you wish to merge into the base sheet. When completed, select ‘Close and Load’ in the top left and you will see the final merged spreadsheet, along with all the individual sheets you have merged, all as separate tabs. From this sheet, you can sort the data according to one or more variables by selecting the header of the variable you wish to sort by.

Learning objectives for this part

  • use the Hodrick–Prescott (HP) filter to separate trend and cyclical components
  • compute business cycle properties of macroeconomic variables
  • interpret macroeconomic (aggregate) data
  • relate changes on the household level to macroeconomic phenomena.

To learn more about the great moderation, read Section 17.4 of The Economy 1.0.

jobless recovery
A macroeconomic phenomenon in which employment grows slowly (or not at all) during a post-recession economic recovery, thereby making the recovery ‘jobless’. One example is from the US, where recoveries have been jobless from the 1990s onwards.
great moderation
A period of low volatility in aggregate output in advanced economies between the 1980s and the 2008 financial crisis. The name was suggested by James Stock and Mark Watson, the economists, and popularized by Ben Bernanke, then chairman of the Federal Reserve.

In this part, we will study three puzzling macroeconomic phenomena in the US:

  • the great moderation, which describes the reduction in business cycle volatility of output and hours worked
  • the productivity slowdown in the 1970s and 1980s, during which the growth rate of average labour productivity was substantially lower than in previous decades
  • jobless recovery, which describes an economic recovery during which employment grows slowly (or not at all); this phenomenon contrasts with post-recession economic recoveries before the 1990s, when employment used to recover quickly.

Even better, we will show these phenomena empirically using the dataset obtained in Part 1. You will also be using the dataset to check whether these puzzles can also be found outside of the US, by contrasting the US data with data from a country of your choice. We will then explore whether these macro puzzles can be explained by changes at the micro (household) level happening at the same time.

  1. Start by plotting female and male labour force participation over time:
  1. Separately for each country, create a line chart of the labour force participation rate by sex (one line for females, one for males) over time. Add a legend to the chart. The horizontal axis labels should be years in 10-year intervals (1960, 1970, …, 2020).
  1. Add grey vertical shaded bars for recession periods to both countries, using the NBER recession dummy for the US and the one you computed yourself for the other country. (For help adding shaded areas to a line chart see Excel walk-through 7.) Your chart for the US should be similar to Figure 1.
  1. How do both labour force participation rates evolve over time? How do these rates compare between both countries?

Excel walk-through 7 Adding shaded areas to a line chart

Adding shaded areas to a line chart
Fullscreen

Walk-through figure 7 Adding shaded areas to a line chart

Create a new recession variable and plot a line chart: For Mac users only
: Use the IF function to create a new variable that equals 100 if the recession dummy equals 1, and 0 otherwise. We will use this variable to create the shaded areas.
Fullscreen

Create a new recession variable and plot a line chart: For Mac users only

Use the IF function to create a new variable that equals 100 if the recession dummy equals 1, and 0 otherwise. We will use this variable to create the shaded areas.

Add the recession data to the line chart: For all users
: Create the line chart by selecting the male and female labour participation variables, then Insert > Line chart > 2D line.
Fullscreen

Add the recession data to the line chart: For all users

Create the line chart by selecting the male and female labour participation variables, then Insert > Line chart > 2D line.

Add the recession data to the line chart: For Mac users
: Right-click the chart created, and select ‘Select Data’. In the pop-up window that appears, select ‘+’ and select the new recession variable you created. Then click ‘OK’.
Fullscreen

Add the recession data to the line chart: For Mac users

Right-click the chart created, and select ‘Select Data’. In the pop-up window that appears, select ‘+’ and select the new recession variable you created. Then click ‘OK’.

Add the recession data to the line chart: For Windows users
: Right-click the chart and select ‘Select Data’. In the pop-up window that appears, select ‘Add’ and in the dialog box, enter the desired data series for the shaded areas (column containing the dummy for recession years). Once you have correctly added in the new series, click ‘OK’.
Fullscreen

Add the recession data to the line chart: For Windows users

Right-click the chart and select ‘Select Data’. In the pop-up window that appears, select ‘Add’ and in the dialog box, enter the desired data series for the shaded areas (column containing the dummy for recession years). Once you have correctly added in the new series, click ‘OK’.

Convert the recession variable into a column chart: For Mac users
: The new variable you added will appear as a line. To convert it to shaded areas, select that line. In the ‘Insert’ tab, select the column chart icon and choose the ‘Stacked Column’ option.
Fullscreen

Convert the recession variable into a column chart: For Mac users

The new variable you added will appear as a line. To convert it to shaded areas, select that line. In the ‘Insert’ tab, select the column chart icon and choose the ‘Stacked Column’ option.

Convert the recession variable into a column chart: For Windows users
: To convert the recession years to shaded areas, right-click the chart, and select ‘Change Chart Type’. Select the ‘Combo’ chart type. A pop-up window will appear. By using the drop-down menus next to each series name, make sure that the data you want as a line has the ‘Line’ chart type, and the data you would like as your vertical ‘shaded areas’ has the ‘100% stacked column’ chart type. Click ‘OK’. Your line chart should now have vertical shaded areas.
Fullscreen

Convert the recession variable into a column chart: For Windows users

To convert the recession years to shaded areas, right-click the chart, and select ‘Change Chart Type’. Select the ‘Combo’ chart type. A pop-up window will appear. By using the drop-down menus next to each series name, make sure that the data you want as a line has the ‘Line’ chart type, and the data you would like as your vertical ‘shaded areas’ has the ‘100% stacked column’ chart type. Click ‘OK’. Your line chart should now have vertical shaded areas.

Customise the recession variable: Optional, for Mac users only
: You can adjust the vertical axis range so that the shaded bars cover the entire height of the chart. Double-click on the vertical axis and in the menu that appears, select the column chart icon. In the ‘Axis Options’ drop-down menu, under ‘Bounds’, change the ‘Maximum’ to 100, as shown.
Fullscreen

Customise the recession variable: Optional, for Mac users only

You can adjust the vertical axis range so that the shaded bars cover the entire height of the chart. Double-click on the vertical axis and in the menu that appears, select the column chart icon. In the ‘Axis Options’ drop-down menu, under ‘Bounds’, change the ‘Maximum’ to 100, as shown.

Labour force participation rates in the US, by sex (1960–2021). Shaded bars indicate recession periods.
Fullscreen

Figure 1 Labour force participation rates in the US, by sex (1960–2021). Shaded bars indicate recession periods.

The first macro puzzle we will explore is the great moderation, which describes the reduction in volatility over the business cycles of several aggregate variables, such as GDP and employment. We will measure employment by average hours worked per employed person (intensive margin only) in this question.

trend component
The long-run growth component of a macroeconomic time series. Commonly obtained using the Hodrick–Prescott (HP) filter.
cyclical component
The short-run (business cycle) fluctuations around the long-run trend component of a macroeconomic time series. Commonly obtained using the Hodrick–Prescott (HP) filter.
Hodrick–Prescott (HP) filter
A mathematical tool used by macroeconomists to estimate the cyclical and trend components of time series data. Its main purpose is to fit a smooth curve (the trend) through the time series, where the trend reacts more to long-term fluctuations than to short-term fluctuations (the latter will mostly affect the cyclical component). The HP filter uses a parameter λ (‘lambda’) to dictate how sensitive this trend is to short-term fluctuations. This lambda needs to be chosen depending on the frequency of the data; popular values are λ = 6.25 for annual and λ = 16,000 for quarterly data.

We will start by showing the great moderation in the data. To do so, we need to isolate the cyclical (that is, relating to the business cycle) components of GDP and hours worked. This is a standard exercise in business cycle macroeconomics, where we deal with time series variables (such as GDP) that have a trend component (for example, the long-run growth of GDP over time) and a cyclical component that fluctuates around that trend. This cyclical component indicates the business cycles, and it is this component we need to isolate to see the great moderation in the data.

Macroeconomists typically use the Hodrick–Prescott (HP) filter to estimate the cyclical and trend components. Its main idea is to fit a smooth curve (the trend) through the time series, where the trend reacts more to long-term fluctuations than to short-term fluctuations (the latter will mostly affect the cyclical component). The HP filter uses a parameter (‘lambda’) to dictate how sensitive this trend is to short-term fluctuations (greater values of mean less sensitivity; that is, a smoother trend). This lambda value needs to be chosen depending on the frequency of the data; a popular value for annual data (like ours) is = 6.25.

  1. We will first show the great moderation in the US data and then explore a potential explanation for it.
  1. Create new variables for the natural log of annual GDP and hours worked (female, male, and all persons). Read the ‘Find out more’ box in Doing Economics Project 4.2 on why economists take logs of variables and how to do it in Excel.
  1. For both countries, run the HP filter with lambda = 6.25 on 1) log annual GDP and 2) log hours worked per capita over time (female, male, and total, separately) to find the trend and cyclical components. Save each component as a separate variable. (For help downloading and using the HP filter, see Excel walk-through 8.)
  1. To show the great moderation in your data, plot line charts of the cyclical components of log GDP and of hours worked (of all persons) separately, by country (four charts in total). Add the recession periods as shaded vertical areas as before.
  1. Looking at the log GDP chart for the US, can you see the great moderation in your chart? When does it start (roughly)? Some economists argue that the great moderation in the US has ended; which year/event do you think they regard as its end? Does your other country also display a moderation period? If so, when does it start and end (roughly)?

Excel walk-through 8 Download and use the HP filter in Excel

Download and use the HP filter in Excel
Fullscreen

Walk-through figure 8 Download and use the HP filter in Excel

Download the HP filter add-in
: Go to the website https://web-reg.de/webreg-hodrick-prescott-filter/. Select the button ‘Download “[web:reg] HP Filter”’ to download the HP filter add-in for Excel as a ZIP file. Open the ZIP file and double-click on the file ‘HPFilter.xla’. If a pop-up window appears, select ‘Enable Macros’.
Fullscreen

Download the HP filter add-in

Go to the website https://web-reg.de/webreg-hodrick-prescott-filter/. Select the button ‘Download “[web:reg] HP Filter”’ to download the HP filter add-in for Excel as a ZIP file. Open the ZIP file and double-click on the file ‘HPFilter.xla’. If a pop-up window appears, select ‘Enable Macros’.

Install the HP filter add-in
: In the top menu bar, select ‘Tools’, then ‘Excel Add-ins’.
Fullscreen

Install the HP filter add-in

In the top menu bar, select ‘Tools’, then ‘Excel Add-ins’.

Add the HP filter to Excel
: This pop-up window should appear. If the ‘Hodrick-Prescott Filter (HP Filter)’ option does not appear in the pop-up window, click ‘Browse’ to find where the file ‘HPFilter.xla’ is saved on your computer. Select the file then click ‘Open’ to return to the pop-up window. Make sure the ‘Hodrick-Prescott Filter (HP Filter)’ option is selected, then click ‘OK’.
Fullscreen

Add the HP filter to Excel

This pop-up window should appear. If the ‘Hodrick-Prescott Filter (HP Filter)’ option does not appear in the pop-up window, click ‘Browse’ to find where the file ‘HPFilter.xla’ is saved on your computer. Select the file then click ‘Open’ to return to the pop-up window. Make sure the ‘Hodrick-Prescott Filter (HP Filter)’ option is selected, then click ‘OK’.

Use the HP filter to calculate the trend component of a data series
: To apply the HP filter, in a new column, select the same number of cells as there are years of GDP data. In the formula box, type ‘=HP(‘, then select all the cells containing log GDP data (L2 to L63 in this example), then type ‘,6.25)’ and press Ctrl + Shift + Enter (or Ctrl + Shift + Return on a Mac). This formula calculates the trend component.
Fullscreen

Use the HP filter to calculate the trend component of a data series

To apply the HP filter, in a new column, select the same number of cells as there are years of GDP data. In the formula box, type ‘=HP(‘, then select all the cells containing log GDP data (L2 to L63 in this example), then type ‘,6.25)’ and press Ctrl + Shift + Enter (or Ctrl + Shift + Return on a Mac). This formula calculates the trend component.

Calculate the cyclical component of a data series
: Calculate the cyclical component by subtracting the trend component from the actual log GDP values.
Fullscreen

Calculate the cyclical component of a data series

Calculate the cyclical component by subtracting the trend component from the actual log GDP values.

As you can see in your charts, the cyclical components of GDP and employment seem to co-move over the business cycle. That is, in expansion periods, both output and employment grow (and in recessions, both shrink). Economists call this property of employment ‘procyclical’. Unemployment behaves the other way around; therefore it is ‘countercyclical’. To measure the degree of procyclicality and of cyclical volatility of an economic variable, economists typically compare it relative to GDP, which we will do in the next tasks.

  1. To study the cyclicality of total employment, use Excel’s CORREL function to compute the correlation coefficient of the cyclical component of log hours worked of all persons with the cyclical component of log GDP (separately for each country). (For help on using Excel’s CORREL function, use this Excel walk-through.) How can you tell that employment is a procyclical variable (at least in the US)? In which of your countries is employment more cyclical?
  1. To evaluate the cyclical volatility of employment, compute the standard deviation (s.d.) of each of the three cyclical components of log hours (all persons, female, and male) separately for each country. Then divide it by the standard deviation of the cyclical component of log GDP. Make sure to compute the standard deviations over the same time period within each country (the time periods can be different between countries, but must be the same between economic variables to compare them). Is employment more or less volatile than GDP? Which is more volatile—male or female employment? How does it differ between the US and your other country?
  1. Now we get to finding an explanation for the great moderation in the US:
    • Compute the relative standard deviations (that is, cyclical volatilities) in the same way as in Question 2e for the US, but only for female and male employment, splitting the sample into two periods: 1983 to 2007 (the last pre-global financial crisis year), and 2008 until your last year of data.
    • Create a grouped column chart with four columns, each column representing one relative standard deviation (all from US data): pre-global financial crisis female and male, and post-global financial crisis female and male.
    • Create the same column chart for your other country as well (use the same time periods as for the US).
  1. Interpret the column chart for the US. Do your results on female vs male cyclical volatility from Question 2e hold for both time periods? Taking into account the changes in labour force participation shown in Question 1, come up with a hypothesis for how these changes caused GDP and employment to become less volatile (the defining features of the great moderation). (Hint: Think about the change in the composition of total employment.)
  1. Compare the column charts between the US and your chosen country. Do the same characteristics of female employment (less volatile than male) also apply to your other country? If you have found a great moderation for your other country in Question 2d, can it be explained in the same way as for the US? (Note that you might have to adjust the years of the sample split to those you found in Question 2d for your other country.)

In the next question, we will look at the second macro puzzle: the slowdown of US productivity growth in the 1970s and 1980s. We take the average labour productivity (ALP) series as a measure of productivity. To measure sex-specific employment, we will use the labour force participation (LFP) rates instead of hours worked in this question because the hours data for the US only starts in 1979 (while the LFP data starts in 1960). Note that LFP only captures the extensive margin of employment, not the intensive margin. This again highlights the limitations of using aggregated data compared to micro data, which would contain hours worked by sex for earlier years as well.

We will use the HP filter to isolate the trend (that is, growth) components of the log of ALP and of log female and male LFP rates, as we are interested in the long-run trend rather than the (business) cyclical variations around that trend.

  1. We will again show the productivity slowdown first and then try to find an explanation for it:
  1. First document the productivity slowdown in the US and check if a similar trend can be found for your other country:
    • Apply the HP filter to the log of ALP with lambda = 6.25 to calculate the trend component only.
    • Compute the annual growth rate in percentage of that trend component.
    • Plot the growth rate in a line chart over time (separate charts by country), using all available years of data.
    • Add in the recession periods as grey shaded areas.
    • Verify that productivity growth for the US has slowed down during the 1970s and 1980s.
    • Repeat the steps above for your chosen country. Was there a similar slowdown in your chosen country?
  1. Now add employment (measured by LFP rate) to both line charts:
    • Apply the HP filter to the log of the female and male LFP rates with lambda = 6.25. In doing so, apply the filter only to observations from 1963 onwards to exclude the jumps in LFP rates due to changes in methodology. Save only the trend components.
    • Compute the annual growth rates in percentage of both trend components.
    • Add both growth rates to the two charts from Question 3a and include a legend.
    • What is the relationship of these two variables (female and male LFP trends) to the ALP trend component?
  1. Based on the results from Question 3b, come up with an explanation for the productivity slowdown, taking into account how labour force participation (and therefore the composition of the labour force) changed during the time of the productivity slowdown in the 1970s and 1980s. Why would this change in composition initially lower productivity growth? Why might this change in composition increase productivity growth in the longer term?
  1. If there was a productivity slowdown in the other country, could it be explained in the same way as for the US?
  1. The final macro puzzle we will investigate is that of jobless recoveries. As we want to consider the 1970s recessions, we again measure employment through the log of the LFP rate (extensive margin only).
  1. For the US data only, create a line chart of the log LFP rate (of all persons) over time, starting in 1963 (because of the change in methodology). Add the recession periods as shaded areas.
  1. Explain how the line chart shows ‘jobless recoveries’. Approximately when did recoveries start being jobless?
  1. Create two more line charts in the same way as in Question 4a: 1) log of female LFP rate; 2) log of male LFP rate.
  1. How do recoveries after recessions differ between female and male LFP? Considering the changes in labour force participation and labour force composition, what could be an explanation for why recoveries in the US have become jobless in the 1990s, 2000s, and 2010s?
  1. Redo Questions 4a and 4c for your other country. Do you also find jobless recoveries? Does the US explanation for jobless recoveries apply to that country too?
  1. Read the introduction (up to ‘The second part of the paper…’ on p. 3) and conclusion of Stefania Albanesi’s research paper ‘Changing Business Cycles: The Role of Women’s Employment’, on which this project is based. Are your results for the US in line with her findings? What have you learned about the micro-to-macro link between growing female labour supply on the household side and the three macroeconomic puzzles?