Empirical Project 10 Working in Excel

Part 10.1 Summarizing the data

Learning objectives for this part

  • compare characteristics of banking systems around the world and across time
  • use box and whisker plots to summarize distributions and identify outliers
  • calculate weighted averages and explain the differences between weighted and simple averages.

We will be using the World Bank’s Global Financial Development Database.

Download the data and documentation:

The World Bank’s Global Financial Development Database contains information about four categories:

We will be looking at the first three categories, focusing particularly on measures of stability before and after the 2008 global financial crisis. Each category is measured by a number of indicators. Figure 10.1 shows the indicators we will be using in this project. (Note that in other versions of the dataset, the indicators may be in lowercase instead of uppercase.)

Category Indicator name Indicator code
Depth Private credit by deposit money banks to GDP (%) GFDD.DI.01
  Deposit money banks’ assets to GDP (%) GFDD.DI.02
Access Bank accounts per 1,000 adults GFDD.AI.01
  Bank branches per 100,000 adults GFDD.AI.02
  Firms with a bank loan or line of credit (%) GFDD.AI.03
  Small firms with a bank loan or line of credit (%) GFDD.AI.04
Stability Bank Z-score GFDD.SI.01
  Bank regulatory capital to risk-weighted assets (%) GFDD.SI.05

Figure 10.1 Indicators used in this project.

  1. The ‘Definitions and Sources’ tab in your spreadsheet contains a description of all indicators in the Database. Use the information provided in the ‘Short Description’ column to explain briefly why each of the indicators listed in Figure 10.1 may be a good measure of that category, or may give misleading information about that category. (You may find it helpful to conduct some research on these measures, especially if the explanation contains technical terms).

The ‘Data – June 2016’ tab contains the values of each indicator over time (1960–2014) for various countries around the world, though data may be missing for some countries and years.

box and whisker plot
A graphic display of the range and quartiles of a distribution, where the first and third quartile form the ‘box’ and the maximum and minimum values form the ‘whiskers’.

To get an idea of what the distribution of values for each variable looks like, we will use box and whisker plots. Box and whisker plots are useful for looking at the distribution of a single variable and checking if there are many extreme values (either very large or very small, relative to the rest of the values).

  1. Make a separate box and whisker plot for each indicator, with the outliers displayed (see Excel walk-through 6.3 for help on how to do this). Comment on the overall distribution, the number of outliers, and suggest why there may be many outliers. (In Question 5, we will look at one way to handle extreme values if there is a concern that one or a few very extreme values will significantly affect the average.)

Now we will use Excel’s PivotTable option to make summary tables of some indicators and look at how they have changed over time. Each country belongs to a particular region and income group.

  1. Choose one indicator in Depth and one indicator in Access:

So far, we have been looking at simple averages where each observation is given the same weight (importance), so we simply add up all the numbers and divide by the number of observations. However, when we take averages across regions or income groups, we may want to account for the fact that countries differ in size (population or GDP). For example, if one country is far larger than another, we may want that country to have a larger influence on the average. See the box below for more about weighted averages.

weighted average
A type of average that assigns greater importance (weight) to some components than to others, in contrast with a simple average, which weights each component equally. Components with a larger weight can have a larger influence on the average.

Weighted averages

An example of weighted averages that you have probably experienced is in calculating course grades. Usually, course grades are not calculated by simply summing up the scores in all components and dividing by the number of components. Instead, certain components such as the final exam are given more importance (influence over the overall grade) than the midterm exam or course assignments.

To calculate the weighted average, we first determine the weight of each component (these are fractions or proportions that sum to 1). Then we multiply each component by its respective weight, and then sum over all components. Using the course grade as an example, suppose the final exam is worth 70% of the final grade and the midterm exam is worth 30%, with both being scored out of 100. Then the weighted average would be:

In comparison, the simple average would give both components equal weight:

To develop your intuition for this concept, you can experiment by choosing values for the final exam score and midterm exam score and seeing how a change in one of the scores affects the weighted and simple averages.

The indicator ‘Bank regulatory capital to risk-weighted assets (%)’ in the Database also uses weights to account for the fact that some assets are riskier than others, and should therefore not be considered equally.

We will practice calculating weighted averages for the indicator ‘Bank accounts per 1,000 adults’, weighting according to total population in each region (so countries with a larger population will have a larger influence on the average). Since data is missing for some countries, we will calculate the total population in each region as the total population for countries with non-missing data.

  1. For each region and for the years 2004–2014:

Extension Using Winsorization to handle extreme values

If we are interested in combining indicators into a single index (as in Empirical Project 4), we may be concerned about extreme values, but still want to include these countries in the index (rather than excluding them from the calculations). When calculating summary statistics, we can deal with these extreme values by using the median instead of the mean.

On page 19 of the paper ‘Benchmarking financial systems around the world’, the authors discuss Winsorization (replacing extreme values with either the 95th or the 5th percentile value) as one way to handle these extreme values. Sometimes the extreme values are due to peculiar features of a single country, so we might want to adjust the data to make the values ‘less extreme’.

  1. For an indicator you have used in Questions 3 and 4 and for the year 2010:
  • Calculate the 95th and 5th percentile value of that indicator, across all countries. (Hint: Use Excel’s PERCENTILE.INC function.)
  • Replace any value larger than the 95th percentile value with the 95th percentile value, and replace any value smaller than the 5th percentile value with the 5th percentile value. (Hint: Use Excel’s IF function.)
  • Use your ‘Winsorized’ values from Question 5(b) to calculate the average values of the indicator, by region and income group (separately). Compare these values to the simple averages from Question 3(a).

Part 10.2 Comparing financial stability before and after the 2008 global financial crisis

Learning objectives for this part

  • use confidence intervals to assess changes in the stability of financial institutions before and after the 2008 global financial crisis.

Now we will assess whether financial stability (measured by the two indicators in Figure 10.1) has changed since the 2008 global financial crisis.

  1. For both indicators of stability in Figure 10.1, explain what effect the post-crisis banking regulations are likely to have on the value of the indicator (for example, would the value increase or decrease?), and why. You may find it helpful to research the regulations that were implemented as a result of the 2008 global financial crisis.
  1. For the years 2007 and 2014:
  1. For each indicator:
leverage ratio (for banks or households)
The value of assets divided by the equity stake (capital contributed by owners and shareholders) in those assets.