# Empirical Project 10 Working in Excel

## Part 10.1 Summarizing the data

We will be using the World Bank’s Global Financial Development Database.

The World Bank’s Database contains information about four categories:

• financial depth: the size of financial institutions and markets
• access: the degree to which individuals are able to use financial services
• stability of financial institutions and markets
• efficiency of financial intermediaries and markets in facilitating financial transactions.

We will be looking at the first three categories, focusing particularly on measures of stability before and after the 2008 global financial crisis. Each category is measured by a number of indicators. Figure 10.1 shows the indicators we will be using in this project.

Category Indicator name Indicator code
Depth Private credit by deposit money banks to GDP (%) GFDD.DI.01
Deposit money banks’ assets to GDP (%) GFDD.DI.02
Access Bank accounts per 1,000 adults GFDD.AI.01
Bank branches per 100,000 adults GFDD.AI.02
Firms with a bank loan or line of credit (%) GFDD.AI.03
Small firms with a bank loan or line of credit (%) GFDD.AI.04
Stability Bank Z-score GFDD.SI.01
Bank regulatory capital to risk-weighted assets (%) GFDD.SI.05

Indicators used in this project.

Figure 10.1 Indicators used in this project.

1. The ‘Definitions and Sources’ tab in your spreadsheet contains a description of all indicators in the Database. Use the information provided to explain briefly why each of the indicators listed in Figure 10.1 may be a good measure of that category, or may give misleading information about that category.

The ‘Data – June 2016’ tab contains the values of each indicator over time (1960–2014) for various countries around the world, though data may be missing for some countries and years.

1. For each category, calculate and interpret the correlation coefficient between the indicators. Remember that the correlation between A and B is the same as the correlation between B and A, so you only need to calculate the correlation for each pair of items once. (Hint: For Access, you may find it helpful to display the pairwise correlations in a table like Figure 10.2.)
GFDD.AI.01 GFDD.AI.02 GFDD.AI.03 GFDD.AI.04
GFDD.AI.01 1
GFDD.AI.02   1
GFDD.AI.03     1
GFDD.AI.04       1

Correlation table for indicators in Question 2.

Figure 10.2 Correlation table for indicators in Question 2.

Box and whisker plots are useful for looking at a single variable and checking if there are many extreme values (either very large or very small, relative to the rest of the values).

1. Make a separate box and whisker plot for each indicator, with the outliers displayed (see Excel Walk-through 6.3 for help on how to do this). From looking at your plots, do extreme values appear to be an issue? (In Question 6, we will look at one way to handle extreme values if there is a concern that one or a few very extreme values will significantly affect the average.)

Now we will use Excel’s PivotTable option to make summary tables of some indicators and look at how they have changed over time. Each country belongs to a particular region and income group.

1. Choose one indicator in Depth and one indicator in Access:
• Make tables showing the average of those indicators and number of observations (count), with ‘Region’ or ‘Income Group’ as the column variable(s) and ‘Year’ (2000–2014 only) as the row variable. (Make a separate table for region and income group.)
• For each indicator chosen, make a line chart with the average indicator value (either for region or income group) as the vertical axis variable, and year as the horizontal axis variable. Comment on any patterns you see across regions/income groups and time.

So far, we have been looking at simple averages where each observation is given the same weight (importance), so we simply add up all the numbers and divide by the number of observations. However, when we take averages across regions or income groups, we may want to account for the fact that countries differ in size (population or GDP). For example, if one country is far larger than another, we may want that country to have a larger influence on the average. See the box below for more about weighted averages.

weighted average
A type of average that assigns greater importance (weight) to some components than to others, in contrast with a simple average, which weights each component equally. Components with a larger weight can have a larger influence on the average.

### Weighted averages

An example of weighted averages that you have probably experienced is in calculating course grades. Usually, course grades are not calculated by simply summing up the scores in all components and dividing by the number of components. Instead, certain components such as the final exam are given more importance (influence over the overall grade) than the midterm exam or course assignments.

To calculate the weighted average, we first determine the weight of each component (these are fractions or proportions that sum to 1). Then we multiply each component by its respective weight, and then sum over all components. Using the course grade as an example, suppose the final exam is worth 70% of the final grade and the midterm exam is worth 30%, with both being scored out of 100. Then the weighted average would be:

In comparison, the simple average would give both components equal weight:

To develop your intuition for this concept, you can experiment by choosing values for the final exam score and midterm exam score and seeing how a change in one of the scores affects the weighted and simple averages.

The indicator ‘Bank regulatory capital to risk-weighted assets (%)’ in the Database also uses weights to account for the fact that some assets are riskier than others, and should therefore not be considered equally.

We will practice calculating weighted averages for the indicator ‘Bank accounts per 1,000 adults’, weighting according to total population in each region (so countries with a larger population will have a larger influence on the average). Since data is missing for some countries, we will calculate the total population in each region as the total population for countries with non-missing data.

1. For each region and for the years 2004–2014:
• In the ‘Data – June 2016’ tab, create a new variable for the weight, which is the ratio of ‘SP.POP.TOTL’ and the sum of ‘SP.POP.TOTL’ for each country within the relevant region and year. Only use observations that have non-missing values for the ‘GFDD.AI.01’ indicator variable. (Hint: Use Excel’s IF function to separate data with non-missing values, then use Excel’s SUMIFS function, conditioning on ‘Region’ and ‘Year’.)
• Check that your answer is correct by filtering the data for a particular region and year and verifying that the weights sum to 1.
• Now multiply the ‘Bank accounts per 1,000 adults’ indicator by the weights in Question 5(a) and sum up the resulting values according to region to get the weighted average. (Hint: Excel’s PivotTable option can help you sum the values and put them in a table.)
• Compare your answers to Question 5(c) with the corresponding simple averages in Question 4(a) and comment on any similarities or differences.

### Extension Using Winsorization to handle extreme values

If we are interested in combining indicators into a single index (as in Empirical Project 4), we may be concerned about extreme values, but still want to include these countries in the index (rather than excluding them from the calculations). On page 19 of the paper ‘Benchmarking financial systems around the world’, the authors discuss Winsorization (replacing extreme values with either the 95th or the 5th percentile value) as one way to handle these extreme values. Sometimes the extreme values are due to peculiar features of a single country, so we might want to adjust the data to make the values ‘less extreme’.

1. For an indicator you have used in Questions 4 and 5 and for the year 2010:
• Calculate the 95th and 5th percentile value of that indicator, across all countries. (Hint: Use Excel’s PERCENTILE.INC function.)
• Replace any value larger than the 95th percentile value with the 95th percentile value, and replace any value smaller than the 5th percentile value with the 5th percentile value. (Hint: Use Excel’s IF function.)
• Use your ‘Winsorized’ values from Question 6(b) to calculate the average values of the indicator, by region and income group (separately). Compare these values to the simple averages from Question 4(a).

## Part 10.2 Comparing financial stability before and after the 2008 global financial crisis

Now we will assess whether financial stability (measured by the two indicators in Figure 10.1) has changed since the 2008 global financial crisis.

1. For both indicators of stability in Figure 10.1, explain what effect the post-crisis banking regulations are likely to have on the value of the indicator (for example, would the value increase or decrease?), and why. You may find it helpful to research the regulations that were implemented as a result of the 2008 global financial crisis.
1. For the years 2007 and 2014:
• Use Excel’s PivotTable option to make tables showing the average of those indicators, with ‘Region’ or ‘Income Group’ as the row variable(s) and ‘year’ (2000–2014 only) as the column variable. (Make a separate table for region and income group.)
• Add a column showing the difference in means (2014 minus 2007).
• Add four extra columns containing the standard deviation and number of observations for each year. Calculate the standard deviation for the difference in means, and the number of observations in both years.
• Use Excel’s CONFIDENCE.T function to calculate the 95% confidence interval ‘width’ of the difference in means (the distance from one end of the interval to the mean). (See Part 8.3 of Empirical Project 8 for help on how to do this.)
1. For each indicator:
• Plot column charts (one for regions, one for income groups) showing the differences on the vertical axis and indicator on the horizontal axis. Add the confidence intervals from Question 2(d) to your charts.
leverage ratio (for banks or households)
The value of assets divided by the equity stake (capital contributed by owners and shareholders) in those assets.
• Interpret your findings. Is there evidence that stability has increased since the 2008 global financial crisis? (Note that ‘Bank regulatory capital to risk-weighted assets’ is inversely related to the leverage ratio. High leverage ratios were common in the lead-up to the 2008 crisis, and contributed to financial instability.)