Empirical Project 10 Working in Google Sheets

Part 10.1 Summarizing the data

Learning objectives for this part

  • compare characteristics of banking systems around the world and across time
  • use box and whisker plots to summarize distributions and identify outliers
  • calculate weighted averages and explain the differences between weighted and simple averages.

We will be using the World Bank’s Global Financial Development Database.

Download the data and documentation:

  • Go to the Global Financial Development Database. At the bottom of the page, click the ‘June 2017 Version’ to download the Excel file. (You can download a later version of the data, though your results may be slightly different from those shown here.)
  • The paper ‘Benchmarking financial systems around the world’ gives an overview of the data. You may find it helpful to read Section 3 (pages 7–9) for a summary of the framework used to measure financial systems.

The World Bank’s Global Financial Development Database contains information about four categories:

  • financial depth: the size of financial institutions and markets
  • access: the degree to which individuals are able to use financial services
  • stability of financial institutions and markets
  • efficiency of financial intermediaries and markets in facilitating financial transactions.

We will be looking at the first three categories, focusing particularly on measures of stability before and after the 2008 global financial crisis. Each category is measured by a number of indicators. Figure 10.1 shows the indicators we will be using in this project. (Note that in other versions of the dataset, the indicators may be in lowercase instead of uppercase.)

Category   Indicator name                                       Indicator code
Depth      Private credit by deposit money banks to GDP (%)     GFDD.DI.01
           Deposit money banks’ assets to GDP (%)               GFDD.DI.02
Access     Bank accounts per 1,000 adults                       GFDD.AI.01
           Bank branches per 100,000 adults                     GFDD.AI.02
           Firms with a bank loan or line of credit (%)         GFDD.AI.03
           Small firms with a bank loan or line of credit (%)   GFDD.AI.04
Stability  Bank Z-score                                         GFDD.SI.01
           Bank regulatory capital to risk-weighted assets (%)  GFDD.SI.05

Figure 10.1 Indicators used in this project.

  1. The ‘Definitions and Sources’ tab in your spreadsheet contains a description of all indicators in the Database. Use the information provided in the ‘Short Description’ column to explain briefly why each of the indicators listed in Figure 10.1 may be a good measure of that category, or may give misleading information about that category. (You may find it helpful to conduct some research on these measures, especially if the explanation contains technical terms.)

The ‘Data – June 2016’ tab contains the values of each indicator over time (1960–2014) for various countries around the world, though data may be missing for some countries and years.

box and whisker plot
A graphic display of the range and quartiles of a distribution, where the first and third quartiles form the ‘box’ and the maximum and minimum values form the ‘whiskers’.

To get an idea of what the distribution of values for each variable looks like, we will use box and whisker plots. Box and whisker plots are useful for looking at the distribution of a single variable and checking if there are many extreme values (either very large or very small, relative to the rest of the values).

  2. Make a separate box and whisker plot for each indicator, with the outliers displayed (see Google Sheets walk-through 6.3 for help on how to do this). Comment on the overall distribution and the number of outliers, and suggest why there may be many outliers. (In Question 5, we will look at one way to handle extreme values if there is a concern that one or a few very extreme values will significantly affect the average.)
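To see which values a box and whisker plot would flag as outliers, the numbers behind the plot can be sketched in Python. This is a minimal illustration, not the method used by Google Sheets: the 1.5 × interquartile range rule for outliers is a common convention assumed here (the project text does not specify one), the function name `box_plot_summary` is hypothetical, and the example values are made up rather than taken from the Database. Under this rule the whiskers end at the most extreme values that are not outliers.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return quartiles, whisker ends and outliers (assumed 1.5 x IQR rule)."""
    data = sorted(values)
    # Quartiles with linear interpolation between data points.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # smallest value that is not an outlier
        "whisker_high": max(inside),   # largest value that is not an outlier
        "outliers": outliers,
    }


# Made-up indicator values with one very large observation.
example_values = [10, 12, 13, 15, 16, 18, 20, 95]
summary = box_plot_summary(example_values)
print(summary)
```

In this made-up example only the value 95 lies beyond the upper fence, so it would be drawn as a separate point rather than as part of the whisker.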

Now we will use Google Sheets’ PivotTable option to make summary tables of some indicators and look at how they have changed over time. Each country belongs to a particular region and income group.

  3. Choose one indicator in Depth and one indicator in Access:
  (a) Make tables showing the average of those indicators and number of observations (count), with ‘Region’ or ‘Income Group’ as the column variable(s) and ‘Year’ (2000–2014 only) as the row variable. (Make a separate table for region and income group.)
  (b) For each indicator chosen, make a line chart with the average indicator value (either for region or income group) as the vertical axis variable, and year as the horizontal axis variable. Comment on any patterns you see across regions/income groups and time.
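The summary table in Question 3(a) groups observations by year and region (or income group) and reports the average and the count for each cell. The same grouping logic can be sketched in Python as a cross-check on a PivotTable. The rows, region names and the function name `pivot_mean_and_count` below are hypothetical; `None` stands in for a missing value, which is excluded from both the average and the count.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (region, year, indicator value); None marks missing data.
rows = [
    ("Region A", 2000, 40.0),
    ("Region A", 2000, 60.0),
    ("Region A", 2001, None),
    ("Region A", 2001, 70.0),
    ("Region B", 2000, 20.0),
    ("Region B", 2001, 30.0),
    ("Region B", 2001, 50.0),
    ("Region B", 1999, 10.0),   # outside 2000-2014, so it is ignored
]


def pivot_mean_and_count(rows, first_year=2000, last_year=2014):
    """Group non-missing values by (year, region); return {key: (mean, count)}."""
    groups = defaultdict(list)
    for region, year, value in rows:
        if value is not None and first_year <= year <= last_year:
            groups[(year, region)].append(value)
    return {key: (mean(vals), len(vals)) for key, vals in groups.items()}


table = pivot_mean_and_count(rows)
for (year, region), (average, count) in sorted(table.items()):
    print(year, region, average, count)
```

Reporting the count alongside the average makes it easier to spot cells where the average rests on only a few countries.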

So far, we have been looking at simple averages where each observation is given the same weight (importance), so we simply add up all the numbers and divide by the number of observations. However, when we take averages across regions or income groups, we may want to account for the fact that countries differ in size (population or GDP). For example, if one country is far larger than another, we may want that country to have a larger influence on the average. See the box below for more about weighted averages.

weighted average
A type of average that assigns greater importance (weight) to some components than to others, in contrast with a simple average, which weights each component equally. Components with a larger weight can have a larger influence on the average.

Weighted averages

An example of weighted averages that you have probably experienced is in calculating course grades. Usually, course grades are not calculated by simply summing up the scores in all components and dividing by the number of components. Instead, certain components such as the final exam are given more importance (influence over the overall grade) than the midterm exam or course assignments.

To calculate the weighted average, we first determine the weight of each component (these are fractions or proportions that sum to 1). Then we multiply each component by its respective weight, and then sum over all components. Using the course grade as an example, suppose the final exam is worth 70% of the final grade and the midterm exam is worth 30%, with both being scored out of 100. Then the weighted average would be:

$$\text{weighted average} = 0.7 \times \text{final exam score} + 0.3 \times \text{midterm exam score}$$

In comparison, the simple average would give both components equal weight:

$$\text{simple average} = 0.5 \times \text{final exam score} + 0.5 \times \text{midterm exam score}$$

To develop your intuition for this concept, you can experiment by choosing values for the final exam score and midterm exam score and seeing how a change in one of the scores affects the weighted and simple averages.
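One way to run this experiment is with a few lines of Python. The sketch below uses the 70%/30% weights from the course grade example; the exam scores are made up and the function name `weighted_average` is chosen here for illustration.

```python
def weighted_average(scores, weights):
    """Multiply each score by its weight and sum; weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(score * weight for score, weight in zip(scores, weights))


final_exam, midterm = 80, 60   # made-up scores out of 100

# Final exam worth 70%, midterm worth 30%: 0.7 * 80 + 0.3 * 60
weighted = weighted_average([final_exam, midterm], [0.7, 0.3])

# Equal weights reproduce the simple average: 0.5 * 80 + 0.5 * 60
simple = weighted_average([final_exam, midterm], [0.5, 0.5])

print(round(weighted, 2), round(simple, 2))
```

Because the final exam carries the larger weight, raising the final exam score by one point moves the weighted average by 0.7 points but the simple average by only 0.5 points.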

The indicator ‘Bank regulatory capital to risk-weighted assets (%)’ in the Database also uses weights to account for the fact that some assets are riskier than others, and should therefore not be considered equally.

We will practice calculating weighted averages for the indicator ‘Bank accounts per 1,000 adults’, weighting according to total population in each region (so countries with a larger population will have a larger influence on the average). Since data is missing for some countries, we will calculate the total population in each region as the total population for countries with non-missing data.

  4. For each region and for the years 2004–2014:
  (a) In the ‘Data – June 2016’ tab, create a new variable for the weight, which is the ratio of a country’s ‘SP.POP.TOTL’ to the sum of ‘SP.POP.TOTL’ over all countries in the same region and year. Only use observations that have non-missing values for the ‘GFDD.AI.01’ indicator variable (Bank accounts per 1,000 adults). (Hint: Use Google Sheets’ IF function to separate data with non-missing values, then use Google Sheets’ SUMIFS function, conditioning on ‘Region’ and ‘Year’.)
  (b) Check that your answer is correct by filtering the data for a particular region and year and verifying that the weights sum to 1.
  (c) Now multiply the ‘Bank accounts per 1,000 adults’ indicator by the weights in Question 4(a) and sum up the resulting values by region and year to get the weighted average. (Hint: Google Sheets’ PivotTable option can help you sum the values and put them in a table.)
  (d) Compare your answers to Question 4(c) with the corresponding simple averages in Question 3(a) and comment on any similarities or differences.
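The steps in Question 4 can be sketched in Python to check the spreadsheet logic: drop rows with a missing indicator, total the population within each region and year, divide each country's population by that total to get its weight, then sum weight × indicator. The country names, populations and indicator values below are made up, and the function name `population_weighted_averages` is hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (country, region, year, population, accounts per 1,000 adults).
rows = [
    ("Country A", "Region X", 2010, 90_000_000, 400.0),
    ("Country B", "Region X", 2010, 10_000_000, 900.0),
    ("Country C", "Region X", 2010, 50_000_000, None),   # missing indicator
    ("Country D", "Region Y", 2010, 20_000_000, 500.0),
]


def population_weighted_averages(rows):
    """Return ({(region, year): weighted average}, {(region, year): sum of weights})."""
    # Only countries with a non-missing indicator count towards the totals.
    valid = [row for row in rows if row[4] is not None]

    totals = defaultdict(float)
    for _, region, year, population, _ in valid:
        totals[(region, year)] += population

    averages = defaultdict(float)
    weight_sums = defaultdict(float)
    for _, region, year, population, accounts in valid:
        weight = population / totals[(region, year)]
        weight_sums[(region, year)] += weight        # should end up at 1
        averages[(region, year)] += weight * accounts
    return dict(averages), dict(weight_sums)


averages, weight_sums = population_weighted_averages(rows)
print(averages)
print(weight_sums)
```

In this made-up Region X, the weights are 0.9 and 0.1, so the weighted average (0.9 × 400 + 0.1 × 900 = 450) sits much closer to the large country's value than the simple average of 650 does.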

Extension Using Winsorization to handle extreme values

If we are interested in combining indicators into a single index (as in Empirical Project 4), we may be concerned about extreme values, but still want to include these countries in the index (rather than excluding them from the calculations). When calculating summary statistics, we can deal with these extreme values by using the median instead of the mean.

On page 19 of the paper ‘Benchmarking financial systems around the world’, the authors discuss Winsorization (replacing extreme values with either the 95th or the 5th percentile value) as one way to handle these extreme values. Sometimes the extreme values are due to peculiar features of a single country, so we might want to adjust the data to make the values ‘less extreme’.

  5. For an indicator you have used in Questions 3 and 4 and for the year 2010:
  (a) Calculate the 95th and 5th percentile value of that indicator, across all countries. (Hint: Use Google Sheets’ PERCENTILE.INC function.)
  (b) Replace any value larger than the 95th percentile value with the 95th percentile value, and replace any value smaller than the 5th percentile value with the 5th percentile value. (Hint: Use Google Sheets’ IF function.)
  (c) Use your ‘Winsorized’ values from Question 5(b) to calculate the average values of the indicator, by region and income group (separately). Compare these values to the simple averages from Question 3(a).
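The Winsorization steps in Question 5 can be sketched in Python as follows. The helper `percentile_inc` is a hypothetical function that interpolates linearly between ranked values; it is meant to approximate what PERCENTILE.INC returns, but small differences from the spreadsheet result are possible. The example values are made up.

```python
from statistics import mean


def percentile_inc(values, p):
    """Percentile with linear interpolation between ranks (0 <= p <= 1)."""
    data = sorted(values)
    position = p * (len(data) - 1)          # fractional index into sorted data
    lower = int(position)
    upper = min(lower + 1, len(data) - 1)
    fraction = position - lower
    return data[lower] + fraction * (data[upper] - data[lower])


def winsorize(values, low=0.05, high=0.95):
    """Replace values outside the [low, high] percentiles with those percentiles."""
    floor = percentile_inc(values, low)
    ceiling = percentile_inc(values, high)
    return [min(max(v, floor), ceiling) for v in values]


# Made-up data: the values 1 to 20 plus one very extreme observation.
example = list(range(1, 21)) + [500]
winsorized = winsorize(example)

print(round(mean(example), 2), round(mean(winsorized), 2))
```

The extreme value is pulled in to the 95th percentile rather than dropped, so the country stays in the sample while its influence on the mean is reduced.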

Part 10.2 Comparing financial stability before and after the 2008 global financial crisis

Learning objectives for this part

  • use confidence intervals to assess changes in the stability of financial institutions before and after the 2008 global financial crisis.

Now we will assess whether financial stability (measured by the two indicators in Figure 10.1) has changed since the 2008 global financial crisis.

  1. For both indicators of stability in Figure 10.1, explain what effect the post-crisis banking regulations are likely to have on the value of the indicator (for example, would the value increase or decrease?), and why. You may find it helpful to research the regulations that were implemented as a result of the 2008 global financial crisis.
  2. For the years 2007 and 2014:
  (a) Use Google Sheets’ PivotTable option to make tables showing the average of the two stability indicators, with ‘Region’ or ‘Income Group’ as the row variable(s) and ‘Year’ (2000–2014 only) as the column variable. (Make a separate table for region and income group.)
  (b) Add a column showing the difference in means (2014 minus 2007).
  (c) Add four extra columns containing the standard deviation and number of observations for each year. Calculate the standard deviation for the difference in means, and the number of observations in both years.
  (d) Use Google Sheets’ CONFIDENCE.T function to calculate the 95% confidence interval ‘width’ of the difference in means (the distance from one end of the interval to the mean). (See Part 8.3 of Empirical Project 8 for help on how to do this.)
  3. For each indicator:
  (a) Plot column charts (one for regions, one for income groups) showing the differences on the vertical axis and indicator on the horizontal axis. Add the confidence intervals from Question 2(d) to your charts.
  (b) Interpret your findings. Is there evidence that stability has increased since the 2008 global financial crisis? (Note that ‘Bank regulatory capital to risk-weighted assets’ is inversely related to the leverage ratio. High leverage ratios were common in the lead-up to the 2008 crisis, and contributed to financial instability.)

leverage ratio (for banks or households)
The value of assets divided by the equity stake (capital contributed by owners and shareholders) in those assets.
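As a rough cross-check on the interval widths from Question 2(d), the calculation can be sketched in Python. Two assumptions are made here and should be checked against Part 8.3 of Empirical Project 8: the standard error of the difference is taken as the square root of the sum of each year's variance divided by its number of observations, and the critical value comes from the normal distribution rather than the t distribution that CONFIDENCE.T uses, so the widths will be slightly narrower than the spreadsheet's when there are few observations. The means, standard deviations and counts are made up, and the function name `diff_in_means_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_in_means_interval(mean_before, sd_before, n_before,
                           mean_after, sd_after, n_after, level=0.95):
    """Return (difference, width) using a normal approximation (assumption)."""
    difference = mean_after - mean_before
    # Assumed standard error of the difference between two group means.
    standard_error = sqrt(sd_before ** 2 / n_before + sd_after ** 2 / n_after)
    # Two-sided critical value, about 1.96 for a 95% interval.
    critical = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return difference, critical * standard_error


# Made-up summary statistics for one region in 2007 and 2014.
difference, width = diff_in_means_interval(15.0, 4.0, 40, 17.0, 3.0, 40)
lower, upper = difference - width, difference + width
print(round(difference, 2), round(lower, 2), round(upper, 2))
```

If the interval from `lower` to `upper` excludes zero, as it does for these made-up numbers, the data are not consistent with "no change" at the chosen confidence level; an interval that includes zero would give no clear evidence of a change.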