# Empirical Project 4 Working in Excel

## Excel-specific learning objectives

In addition to the learning objectives for this project, in this section you will learn how to generate new variables using cell formulas.

## Part 4.1 GDP and its components as a measure of material wellbeing

### Learning objectives for this part

• check datasets for missing data
• sort data and assign ranks based on values
• distinguish between time series and cross sectional data, and plot appropriate charts for each type of data.

The GDP data we will look at is from the United Nations’ National Accounts Main Aggregates Database, which contains estimates of total GDP and its components for all countries over the period 1970–Present. We will look at how GDP and its components have changed over time, and investigate the usefulness of GDP per capita as a measure of wellbeing.

To answer the questions below, download the data and make sure you understand how the measure of total GDP is constructed.

• Go to the United Nations’ National Accounts Main Aggregates Database website. On the right-hand side of the page, under ‘Data Availability’, click ‘Downloads’.
• Under the subheading ‘GDP and its breakdown at constant 2010 prices in US Dollars’, select the Excel file ‘All countries for all years – sorted alphabetically’.
• Save it in an easily accessible location, such as a folder on your Desktop or in your personal folder.
1. You can see from the tab ‘Download-GDPconstant-USD-countr’ that some countries have missing data for some of the years. Data may be missing due to political reasons (for example, countries formed after 1970) or data availability issues.
• Make and fill a frequency table similar to Figure 4.1, showing the number of years that data is available for each country in the category ‘Final consumption expenditure’.
• How many countries have data for the entire period (1970 to the latest year available)? Do you think that missing data is a serious issue in this case?
Country Number of years of GDP data

Figure 4.1 Number of years of GDP data available for each country.

### Excel walk-through 4.1 Making a frequency table

Figure 4.2 How to make a frequency table.

The data

This is what the data looks like. Column B has country names, Column C has the different components of GDP, and Column D onwards has the values of the GDP components for a particular year.

Count the years of data available in each row

We will first count the number of years of data available using the COUNTA function, which tells us how many cells in a given selection are non-empty (i.e. have data). We will store this information in Column AY, as shown. Note: The numbers you get may be slightly different if you are using the latest data.

Filter the data

We only want to know how many years of ‘Final consumption expenditure’ are available for each country, so we need to filter out the rest of the data.

Filter the data

After step 7, the data will be filtered so there is only one number for each country. Your table is ready to copy and paste into a new tab.

Paste the data in a new tab

The table will now look like Figure 4.1. The final step is to count the number of countries with missing data.

Count the number of countries with missing data

To count the number of countries with missing data, use the COUNTIF function. Excel will count the number of cells in a given selection that meet the specified criteria. Note: If you are using the latest data, you should change the number in the COUNTIF function from ‘47’ to the maximum number of years available (the latest year in your data minus 1970 plus 1), so the numbers you get may be slightly different.

If you add up the data on the right-hand side of this equation, you may find that it does not add up to the reported GDP value. The UN notes this discrepancy in Section J, item 17 of the ‘Methodology for the national accounts’: ‘The sums of com­ponents in the tables may not necessarily add up to totals shown because of rounding’.

There are three different ways in which countries calculate GDP for their national accounts, but we will focus on the expenditure approach, which calculates gross domestic product (GDP) as:

Gross capital formation refers to the creation of fixed assets in the economy (such as the construction of buildings, roads, and new machinery) and changes in inventories (stocks of goods held by firms).

1. Rather than looking at exports and imports separately, we usually look at the difference between them (exports minus imports), also known as net exports. Choose three countries that have GDP data over the entire period (1970 to the latest year available). For each country, create a new row that shows the values of net exports in each year. Make sure to give each row an appropriate label.

Now, we will create charts to show the GDP components in order to look for general patterns over time and make comparisons between countries.

1. Evaluate the components over time, for two countries of your choice.
• Create a new row for each of the four components of GDP (Household consumption expenditure, General government final consumption expenditure, Gross capital formation, Net exports). To make the charts easier to read, convert the values into billions (for example, 4.38 billion instead of 4,378,772,008). Round your values to two decimal places.
• Plot a separate line chart for each country, showing the value of the four components of GDP on the vertical axis and time (the years 1970–Present) on the horizontal. (Use more than one line chart per country if necessary, to show the data more clearly). Name each component in the chart legend appropriately.
• Which of the components would you expect to move together (increasing or decreasing together) or move in opposite directions, and why? Using your charts from Question 3(b), describe any patterns you find in the relationship between the components. Does the data support your hypothesis about the behaviour of the components?
• For each country, describe any patterns you find in the movement of components over time. What factors could explain the patterns that you find within countries, and any differences between countries (for example, economic or political events)? You may find it helpful to research the history of the countries you have chosen.
• Extension: For one country, add data labels to your chart to indicate the relevant events that happened in that year.

### Excel walk-through 4.2 Adding data labels to a chart

Figure 4.3 How to add data labels to a chart.

The data

This example uses GDP data from Afghanistan. The data has been filtered so that only the components of interest are visible (you should filter your data so that the four components given in Question 3(a) are visible). Note: The numbers in your dataset may differ slightly if you are using the latest data.

Draw a line chart

After step 3, the line chart will look like the one shown above.

Add a data label to a particular point in the chart

Instead of adding data labels to every point in the chart, you can choose to label specific points only. After step 4, only the selected data point will be labelled.

Change the text in the label and reposition the label

By default, Excel will label data points with the horizontal axis and vertical axis values. You can change the label to text by clicking and typing into the box.

1. Another way to visualize the GDP data is to look at each component as a proportion of total GDP. Use the same countries that you chose for Question 3.
time series data
A time series is a set of time-ordered observations of a variable taken at successive, in most cases regular, periods or points of time. Example: The population of a particular country in the years 1990, 1991, 1992, … , 2015 is time series data.
cross-sectional data
Data that is collected from participants at one point in time or within a relatively short time frame. In contrast, time series data refers to data collected by following an individual (or firm, country, etc.) over a course of time. Example: Data on degree courses taken by all the students in a particular university in 2016 is considered cross-sectional data. In contrast, data on degree courses taken by all students in a particular university from 1990 to 2016 is considered time series data.
• For each country, create a new row in Excel to show the sum of all four components (remember that this total may not add up to the reported GDP). Next, create a new row for each component, showing its proportion of total GDP (as a value ranging from 0 to 1), rounded to two decimal places. (Hint: to calculate the proportion of a component, divide the value of that component by the sum of all four components.)
• Plot a separate line chart for each country, showing the proportion of the component of GDP on the vertical axis and time (the years 1970 to the latest year available) on the horizontal axis.
• Describe any patterns in the proportion of spending over time for each country, and compare these patterns across countries.
• Compared to the charts in Question 3, what are some advantages of this method for making comparisons over time and between countries?

So far, we have done comparisons of time series data, which is a collection of values for the same variables and subjects, taken at different points in time (for example, GDP of a particular country, measured each year). We will now make some charts using cross-sectional data, which is a collection of values for the same variables for different subjects, usually taken at the same time.

1. Choose three developed countries, three countries in economic transition, and three developing countries (for a list of these countries, see Tables A-C in the UN country classification document).
• For each country, calculate each component as a proportion of GDP for the year 2015 only. (You may find it helpful to copy and paste the relevant data into a new Excel sheet.)
• Now create a stacked bar chart that shows the composition of GDP in 2015 on the horizontal axis, and country on the vertical axis. Arrange the columns so that the countries in a particular category are grouped together. (See the walk-through in Figure 3.8 of Economy, Society, and Public Policy.)
• Describe the differences (if any) between the spending patterns of developed, economic transition, and developing countries.
1. GDP per capita is often used to indicate material wellbeing instead of GDP, because it accounts for differences in population across countries. Refer to the following articles to help you to answer the questions:

• Discuss the usefulness and limitations of GDP per capita as a measure of material wellbeing.
• Based on the arguments in the articles, do you think GDP per capita is an appropriate measure of both material wellbeing and overall wellbeing? Why or why not?

## Part 4.2 The HDI as a measure of wellbeing

### Learning objectives for this part

• sort data and assign ranks based on values
• distinguish between time series and cross sectional data, and plot appropriate charts for each type of data
• calculate the geometric mean and explain how it differs from the arithmetic mean
• construct indices using the geometric mean, and use index values to rank observations
• explain the difference between two measures of wellbeing (GDP per capita and the Human Development Index).

In Part 4.1 we looked at GDP per capita as a measure of material wellbeing. While income has a major influence on wellbeing because it allows us to buy the goods and services we need or enjoy, it is not the only determinant of wellbeing. Many aspects of our wellbeing cannot be bought, for example, good health or having more time to spend with friends and family.

We are now going to look at the Human Development Index (HDI), a measure of wellbeing that includes non-material aspects, and make comparisons with GDP per capita (a measure of material wellbeing). GDP per capita is a simple index calculated as the sum of its elements, whereas the HDI is more complex. Instead of using different types of expenditure or output to measure wellbeing or living standards, the HDI consists of three dimensions associated with wellbeing:

• a long and healthy life (health)
• knowledge (education)
• a decent standard of living (income).

We will first learn about how the HDI is constructed, and then use this method to construct indices of wellbeing according to criteria of our choice.

The HDI data we will look at is from the Human Development Report 2016 by the United Nations Development Programme (UNDP). To answer the questions below, download the data and technical notes from the report:

• Go to the UNDP’s website.
• Click ‘Table 1: Human Development Index and its components’ on the left-hand side of the page. Then right-click on the ‘Download Data’ box on the top right-hand side of the page and select ‘Save Link As…’.
• Save the file in an easily accessible location, and make sure to give it a suitable name.
• The ‘Technical notes’ give a diagrammatic presentation of how the HDI is constructed from four indicators.
1. Refer to the technical notes and the spreadsheet you have downloaded. For each indicator, explain whether you think it is a good measure of the dimension, and suggest alternative indicators, if any. (For example, is GNI per capita a good measure of the dimension ‘a decent standard of living’?)
1. Figure 4.4 shows the minimum and maximum values for each indicator. Discuss whether you think these are reasonable. (You can read the justification for these values in the technical notes.)
Dimension Indicator Minimum Maximum
Health Life expectancy (years) 20 85
Education Expected years of schooling (years) 0 18
Mean years of schooling (years) 0 15
Standard of living Gross national income per capita (2011 PPP $) 100 75,000 Figure 4.4 Maximum and minimum values for each indicator in the HDI. United Nations Development Programme. 2016. ‘Technical notes’ in Human Development Report 2016: p. 2. We are now going to apply the method for constructing the HDI, by recalculating the HDI from its indicators. We will use the formula below, and the minimum and maximum values in the table in Figure 4.4. These are taken from page 2 of the technical notes, which you can refer to for additional details. The HDI indicators are measured in different units and have different ranges, so in order to put them together into a meaningful index, we need to normalize the indicators using the following formula: Doing so will give a value in between 0 and 1 (inclusive), which will allow comparison between different indicators. 1. Refer to Figure 4.4 and calculate the dimension index for each of the dimensions in separate columns in the same spreadsheet tab as your data: • Using the HDI indicator data in Column E, calculate the dimension index for a long and healthy life (health). • Using the HDI indicator data in Columns G and I, calculate the dimension index for knowledge (education). Note that the knowledge dimension index is the average of the dimension index for expected years of schooling for those entering school, and mean years of schooling for adults aged 25 or older. • Using the HDI indicator data in Column K, calculate the dimension index for a decent standard of living (income). Note (from the technical notes) that you should calculate the GNI index using the natural log of the values. (See the ‘Find out more’ box below for an explanation of the natural log and how to calculate it in Excel.) ### Find out more The natural log: What it means, and how to calculate it in Excel The natural log turns a linear variable into a concave variable, as shown in Figure 4.5. For any value of income on the horizontal axis, the natural log of that value on the vertical axis is smaller. At first, the difference between income and log income is not that big (for example, an income of 2 corresponds to a natural log of 0.7), but the difference becomes bigger as we move rightwards along the horizontal axis (for example, when income is 100,000, the natural log is only 11.5). Figure 4.5 Comparing income with the natural logarithm of income. The reason why natural logs are useful in economics is because they can represent variables that have diminishing marginal returns: an additional unit of input results in a smaller increase in the total output than did the previous unit. (If you have studied production functions, then the shape of the natural log function might look familiar.) When applied to the concept of wellbeing, the ‘input’ is income, and the ‘output’ is material wellbeing. It makes intuitive sense that a$100 increase in per capita income will have a much greater effect on wellbeing for a poor country compared to a rich country. Using the natural log of income incorporates this notion into the index we create. Conversely, the notion of diminishing marginal returns (the larger the value of the input, the smaller the contribution of an additional unit of input) is not captured by GDP per capita, which uses actual income and not its natural log. Doing so makes the assumption that a $100 increase in per capita income has the same effect on wellbeing for rich and poor countries. Excel’s LN function calculates the natural log of a value for you. Simply type ‘=LN(’, then the number you would like to take the natural log of, then ‘)’ and press Enter. If you have a scientific calculator, you can check that the calculation is correct by using the ‘ln’ or ‘log’ key. geometric mean A summary measure calculated by multiplying N numbers together and then taking the Nth root of this product. The geometric mean is useful when the items being averaged have different scoring indices or scales, because it is not sensitive to these differences, unlike the arithmetic mean. For example, if education ranged from 0 to 20 years and life expectancy ranged from 0 to 85 years, life expectancy would have a bigger influence on the HDI than education if we used the arithmetic mean rather than the geometric mean. Conversely, the geometric mean treats each criteria equally. Example: Suppose we use life expectancy and mean years of schooling to construct an index of wellbeing. Country A has life expectancy of 40 years and a mean of 6 years of schooling. If we used the arithmetic mean to make an index, we would get (40 + 6)/2 = 23. If we used the geometric mean, we would get (40 × 6)1/2 = 15.5. Now suppose life expectancy doubled to 80 years. The arithmetic mean would be (80 + 6)/2 = 43, and the geometric mean would be (80 × 6)1/2 = 21.9. If, instead, mean years of schooling doubled to 12 years, the arithmetic mean would be (40 + 12)/2 = 26, and the geometric mean would be (40 × 12)1/2 = 21.9. This example shows that the arithmetic mean can be ‘unfair’ because proportional changes in one variable (life expectancy) have a larger influence over the index than changes in the other variable (years of schooling). The geometric mean gives each variable the same influence over the value of the index, so doubling the value of one variable would have the same effect on the index as doubling the value of another variable. Now that you know about the natural log, you might want to go back to Question 3(c) in Part 4.1, and create a new chart using the natural log scale. The natural log is used in economics because it approximates percentage changes i.e. log(x) – log(y) is a close approximation to the percentage change between x and y. So, using the natural log scale, you will be able to ‘read off’ the relative growth rates from the slopes of the different series you have plotted. For example, a 0.01 change in the vertical axis value corresponds to a 1% change in that variable. This will allow you to compare the growth rates of the different components of GDP. Now, we can combine these dimensional indices to give the HDI. The HDI is the geometric mean of the three dimension indices (IHealth = Life expectancy index, IEducation = Education index, and IIncome = GNI index): 1. Use the formula above and the data in the spreadsheet to calculate the HDI for all the countries excluding those in the ‘Other countries or territories’ category. You should get the same values as those in Column C, rounded to 3 decimal places. ### Excel walk-through 4.3 Calculating the HDI Figure 4.6 How to calculate the HDI. The data The HDI data looks like this. The HDI value for each country is in Column C, and its four components are in Columns E, G, I, and K. Note: If you are using the latest data, the numbers in your dataset may differ slightly, and your spreadsheet may only have one tab. Calculate dimension indices of each component. First, we will calculate the dimension indices of each component of the HDI, and store this information in Columns Q to T, as shown above. Calculate dimension indices of each component After step 4, you will have dimension indices for each component. Make sure to use the LN function to calculate the dimension index for income. Calculate the index value for education To calculate the HDI, we need the index value for education, which we get by taking the average of the two dimension indices for schooling. Calculate the HDI After step 6, you will have the correct HDI value for most countries. For the other countries, we need to recalculate the dimension index for expected years of schooling, in order to get the correct value. Amend the formula for expected years of schooling to give the correct HDI values The problem with some countries is that expected years of schooling is greater than the value chosen as the maximum (18). In these cases, replacing the dimension index with 1 solves the problem. Round the calculated values to 3 decimal places To make the HDI values easier to read, we will round them to 3 decimal places, as was done in Column C. Round the calculated values to 3 decimal places After step 10, your HDI values should look the same as those in Column C. The HDI is one way to measure wellbeing, but you may think that it does not use the most appropriate measures for the non-material aspects of wellbeing (health and education). Now we will use the same method to create our own index of non-material wellbeing (an ‘alternative HDI’), using different indicators. You can find alternative indicators to measure health and education on the UNDP website, in the file ‘Download 2018 Human Development Data Bank’ (left-hand side of the page). (The first column of the spreadsheet, ‘dimension’, tells you whether the indicators refer to ‘Health’, ‘Education’, or other categories). 1. Create an alternative index of wellbeing: • Choose two to three indicators to measure health, and two to three indicators to measure education. Explain why you have chosen these indicators. • Carefully copy and paste the column(s) containing these indicator values into new columns of your spreadsheet from Question 4 (containing your HDI calculations), making sure to match the data to the correct country. Choose a reasonable maximum and minimum value for each indicator and justify your choices. • Using the indicators and maximum and minimum values you have chosen, calculate the alternative HDI as done in Questions 3 and 4. Remember to include the existing income index from Question 2. Since you have chosen more than one indicator per dimension, make sure to average the dimension indices as done in Question 3(b). Also ensure that higher indicator values always represent better outcomes. • Create a new column showing each country’s rank according to your alternative HDI, where 1 is assigned to the country with the highest value. Compare your ranking to the HDI rank (Column A). Are the rankings generally similar, or very different? (See Excel walk-through 4.4 on how to do this.) ### Excel walk-through 4.4 Ranking data Follow the walk-through in the CORE video, or in Figure 4.7, to find out how to rank data in Excel. Figure 4.7 How to use Excel’s RANK function. The data In this example, the created index is in Column Q (titled ‘Own index’) and was calculated as the dimension index of Life expectancy at birth. Note: If you are using the latest data, the numbers in your dataset may differ slightly, and your spreadsheet may only have one tab. Rank the index values from largest to smallest We will rank the values of our own index from largest to smallest, and store this information in Column R. Compare the rankings with the HDI index rankings There are many ways to compare your ranking with the HDI ranking; two ways are shown here. 1. Compare your alternative index to the HDI: • The UN classifies countries into four groups depending on their HDI, as shown in Figure 4.8. Would the classification of any country change under your alternative HDI? • Based on your answers to Questions 5(d) and 6(a), do you think that the HDI is a robust measure of non-material wellbeing? (In other words, does changing the indicators used in the HDI give similar conclusions about the non-material wellbeing of countries?) Classification HDI Very high human development 0.800 and above High human development 0.700–0.799 Medium human development 0.550–0.699 Low human development Below 0.550 Figure 4.8 Classification of countries according to their HDI value. United Nations Development Programme. 2016. ‘Technical notes’ in Human Development Report 2016: p.3. We will now investigate whether HDI and GDP per capita give similar information about overall wellbeing, by comparing a country’s rank in both measures. The spreadsheet you downloaded for Question 5 (containing alternative indicators) also has information on GDP per capita, measured in 2011 constant prices in US dollars. This indicator is called ‘GDP per capita (2011 PPP$)’ in Column C of that spreadsheet. To answer Question 7, first filter the data so that only rows containing this indicator are visible, then copy and paste this data into a new column in the spreadsheet containing your HDI calculations, making sure to match the data to the correct country.

1. Evaluate GDP per capita and the HDI as measures of overall wellbeing:
• Create a new column showing each country’s rank according to GDP per capita, where 1 is assigned to the country with the highest value. Show this rank on a scatterplot, with GDP per capita rank on the vertical axis and HDI rank on the horizontal axis. (See Excel walk-through 1.7 for a step-by-step explanation of how to create scatterplots in Excel.)
• Does there appear to be a strong correlation between these two variables? Give an explanation for the observed pattern in your chart.
• Create a table similar to Figure 4.9. Using your answers to Question 7(a), fill each box with three countries. You can use the UN’s definition of ‘high’ for the HDI, as in Figure 4.8, and choose a reasonable definition of ‘high’ for GDP. Based on this table, which country or countries would you prefer to grow up in, and why?
• Explain the differences between HDI and GDP as measures of wellbeing. You may want to consider the way each measure includes income in its calculation (the actual value or a transformation), and the inclusion of other aspects of wellbeing.
HDI
Low High
GDP Low
High

Figure 4.9 Classification of countries according to their HDI and GDP values.

1. The HDI is one way to measure wellbeing, but there are many other ways to measure wellbeing.
• What are the strengths and limitations of the HDI as a measure of wellbeing?
• Find some alternative measures of non-material wellbeing that we could use alongside the HDI to provide a more comprehensive picture of wellbeing. For each measure, evaluate the elements used to construct the measure, and discuss the additional information we can learn from it. (You may find it helpful to read Our World in Data’s page on happiness and life satisfaction, particularly the section ‘Correlates, Determinants, and Consequences’.)