6. Measuring management practices Working in Python

Download the code

To download the code chunks used in this project, right-click on the download link and select ‘Save Link As…’. You’ll need to save the code download to your working directory, and open it in Python.

Don’t forget to also download the data into your working directory by following the steps in this project.

Getting started in Python

Head to the ‘Getting started in Python’ page for help and advice on setting up a Python session to work with. Remember, you can run any page from this book as a notebook by downloading the relevant file from this repository and running it on your own computer. Alternatively, you can run pages online in your browser over at Binder.

Preliminary settings

Let’s import the packages we’ll need and also configure the settings we want:

import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path
import pingouin as pg
from lets_plot import *

LetsPlot.setup_html(no_js=True)

### You don't need to use these settings yourself
### — they are just here to make the book look nicer!
# Set the plot style for prettier charts:
plt.style.use(
    "https://raw.githubusercontent.com/aeturrell/core_python/main/plot_style.txt"
)

Part 6.1 Looking for patterns in the survey data

Learning objectives for this part

  • Explain how survey data is collected, and describe measures that can increase the reliability and validity of survey data.
  • Use column charts and box and whisker plots to compare distributions.
  • Calculate conditional means for one or more conditions, and compare them on a bar chart.
  • Use line charts to describe the behaviour of real-world variables over time.

First download the data used in the paper to understand how this information was collected. The data is publicly available and free of charge.

  1. To learn about how Bloom et al. (2012) conducted their survey, read the sections ‘How Can Management Practices Be Measured?’ and ‘Validating the Management Data’ (pages 5–9) of their paper.
  • Briefly describe how the interviews with managers were conducted, and explain some of the methods that the researchers used to improve the reliability and validity of their data. (There are a few technical terms that you may not understand, but these are not necessary for answering this question.)
  • Three aspects of management practices were evaluated: monitoring, targets, and incentives. Do you think that these are the best criteria for assessing management practices? What (if any) important aspects of management are not included in this assessment? (You may also find it helpful to refer to the ‘Contingent Management’ section on pages 23–25 of the paper.)

Now we will create some charts to summarize the data and make comparisons across countries, industries (manufacturing, healthcare, retail, and education), and firm characteristics.

  1. In ‘Manufacturing: 2004–2010 combined survey data (AMP)’, open the file ‘AMP_graph_manufacturing.csv’. Use this data on manufacturing firms to do the following:
  • Create a table like Figure 6.2a, that shows the average management scores for all the firms in each of the twenty countries, and fill it in with the required values. The variables for the overall score and three individual criteria are ‘management’, ‘monitor’, ‘target’, and ‘people’. You may find it helpful to refer to Python walk-through 3.3 if you need guidance. For each criterion, rank countries from highest to lowest, then create and fill in a table like Figure 6.2b (see Python walk-through 4.8 for help on how to assign ranks). Do countries with a high overall rank also tend to rank highly on individual criteria?
Country Overall management (mean) Monitoring management (mean) Targets management (mean) Incentives management (mean)
         
         

Figure 6.2a Mean of management scores.

Country Overall management (rank) Monitoring management (rank) Targets management (rank) Incentives management (rank)
         
         

Figure 6.2b Rank according to management scores.

  • Create a bar chart that shows the average overall management score (the variable management) for each country, ordered from highest to lowest. (Hint: You will need to sort your data from highest to lowest, so that it appears correctly in the chart.) Your chart should look similar to Figure 6.1.
  • Compare your chart with Figure 1 in Bloom et al. (2012). Can you explain why your chart is slightly different? (Hint: See the note at the bottom of Figure 1.)

Python walk-through 6.1 Importing data into Python and creating tables and charts

Before opening an Excel or CSV file using Python, you can open the file in spreadsheet software (such as Excel) to understand how it’s structured. From looking at the file, we learn that:

  • The variable names are in the first row (no need to use the skiprows keyword argument).
  • Missing values are represented by empty cells.
  • The last variable is in Column S, with short variable descriptions in Column U: it is easier to import everything first and remove the unnecessary data afterwards.

We will call our imported data df.

df = pd.read_csv(Path("data/AMP_graph_manufacturing.csv"))
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9207 entries, 0 to 9206
Data columns (total 21 columns):
 #   Column                                    Non-Null Count  Dtype  
---  ------                                    --------------  -----  
 0   management                                9207 non-null   float64
 1   monitor                                   9206 non-null   float64
 2   target                                    9207 non-null   float64
 3   people                                    9207 non-null   float64
 4   lemp_firm                                 9207 non-null   float64
 5   export                                    3652 non-null   float64
 6   competition                               8248 non-null   float64
 7   ownership                                 8540 non-null   object 
 8   mne_country                               4279 non-null   object 
 9   mne_f                                     9207 non-null   int64  
 10  mne_d                                     9207 non-null   int64  
 11  degree_m                                  8150 non-null   float64
 12  degree_nm                                 7942 non-null   float64
 13  country                                   9207 non-null   object 
 14  competition2004                           732 non-null    float64
 15  year                                      8929 non-null   float64
 16  sic                                       9207 non-null   int64  
 17  lb_employindex                            9137 non-null   float64
 18  pppgdp                                    9207 non-null   float64
 19  Unnamed: 19                               0 non-null      float64
 20                storage  display     value  22 non-null     object 
dtypes: float64(14), int64(3), object(4)
memory usage: 1.5+ MB

The penultimate column with values, Column T, was imported as "Unnamed: 19" and only contains NaNs. The final column of values that has been imported into pandas comes from Column U in the spreadsheet and contains information about the variables (named "storage display value").

Let’s extract the information about the variables in a new pandas series called man_varinfo and then remove both of these columns from the dataset. To make it easier to see the man_varinfo variable labels, we’ll temporarily override pandas column width limits.

man_varinfo = df.iloc[:, -1].dropna()

with pd.option_context("display.max_colwidth", 80):
    print(man_varinfo)
0                        variable name   type   format      label      variable label
1     ----------------------------------------------------------------------------...
2     management      float  %9.0g                * Average of all management ques...
3             monitor         float  %9.0g                  Average of perf1 to perf5
4            target          float  %9.0g                  Average of perf6 to perf10
5         people          float  %9.0g                  Average of talent1 to talent6
6     lemp_firm       float  %9.0g                  Log of 'No. of firm employees ...
7              export          double %10.0g               * % of production exported
8                    competition     byte   %12.0g               * No. of competitors
9                    ownership       str33  %33s                 * Who owns the firm?
10    mne_country     str19  %19s                 * Country of origin of multinati...
11                   mne_f           byte   %9.0g                  = 1 if foreign MNE
12                  mne_d           byte   %9.0g                  = 1 if domestic MNE
13    degree_m        byte   %8.0g                * % of managers with a college d...
14    degree_nm       float  %8.0g                * % of non-managers with a colle...
15    country         str19  %19s                   Country in which plant is located
16    competition2004 byte   %9.0g                  1=No competitors, 2=A few comp...
17    year            int    %9.0g                * SENSITIVE: Accts: Year of Acco...
18    sic             int    %8.0g                * Three digit US SIC 1987 code (...
19    lb_employindex  byte   %10.0g               * WB: Rigidity of employment ind...
20    pppgdp          float  %9.0g                * IMF: GDP based on PPP valuatio...
21                                                                          Billions)
Name:               storage  display     value, dtype: object

Now that we’ve viewed the data, we can drop the last two columns from df using slicing. The syntax is .iloc[:, :-i] where i is the number of rows we wish to drop.

df = df.iloc[:, :-2]
df.head()
management monitor target people lemp_firm export competition ownership mne_country mne_f mne_d degree_m degree_nm country competition2004 year sic lb_employindex pppgdp
0 3.500000 3.60 3.6 3.500000 5.991465 NaN NaN NaN NaN 0 0 NaN NaN United States 3.0 2004.0 382 0.0 11,867.75
1 3.166667 3.80 2.6 2.500000 6.396930 NaN 2.0 Dispersed Shareholders NaN 0 0 100.0 75.0 United States NaN 2006.0 382 0.0 13,398.93
2 3.000000 2.80 3.6 3.000000 7.600903 70.0 4.0 Dispersed Shareholders US 0 1 100.0 5.0 United States NaN 2006.0 382 0.0 14,119.05
3 2.411765 2.75 2.4 2.666667 8.039157 NaN NaN NaN NaN 0 0 NaN NaN United States NaN 2002.0 308 NaN 10,642.30
4 4.444445 4.60 4.4 4.333333 5.241747 NaN NaN NaN NaN 0 0 NaN NaN United States 3.0 2004.0 281 0.0 11,867.75

A few of the variables that have been imported as numbers are actually categorical variables: "mne_f" , "mne_d", and "competition2004". pandas doesn’t automatically know what datatypes each variable should have. However, we can set the type of these variables as categorical and we can use labels to define what each of the numbers in the variables represents.

The first two (foreign/domestic multinational enterprise) have clear labels. For the third (number of competitors), we’ll use some string manipulation tools (split) to grab the labels directly from the man_varinfo variable.

lab_mne_f = ["No MNE_f", "MNE_f"]
lab_mne_d = ["No MNE_d", "MNE_d"]
lab_comp2004 = man_varinfo.iloc[16].split("  ")[-1].split(",")
print(lab_comp2004)
['1=No competitors', '2=A few competitors', '3=Many competitors']

The third line of code above is doing a lot of work here. When you do coding, you will often use someone else’s code as a starting point for your code, and trying to figure out what some code does is therefore a very important skill. Below, you can see a dissection of what each part of this line does.

explain_1 = man_varinfo.iloc[16]
print(explain_1)
print("\n")  # Prints new line
explain_2 = man_varinfo.iloc[16].split("  ")
print(explain_2)
print("\n")  # Prints new line
explain_3 = man_varinfo.iloc[16].split("  ")[-1]
print(explain_3)
print("\n")  # Prints new line
explain_4 = man_varinfo.iloc[16].split("  ")[-1].split(",")
print(explain_4)
competition2004 byte   %9.0g                  1=No competitors, 2=A few competitors, 3=Many competitors

['competition2004 byte', ' %9.0g', '', '', '', '', '', '', '', '', '1=No competitors, 2=A few competitors, 3=Many competitors']

1=No competitors, 2=A few competitors, 3=Many competitors

['1=No competitors', ' 2=A few competitors', ' 3=Many competitors']

Let’s now encode these variables as categorical variables, with suitable names.

for col in ["mne_f", "mne_d", "competition2004"]:
    df[col] = pd.Categorical(df[col])


df["mne_f"] = df["mne_f"].cat.rename_categories(lab_mne_f)
df["mne_d"] = df["mne_d"].cat.rename_categories(lab_mne_d)
df["competition2004"] = df["competition2004"].cat.rename_categories(lab_comp2004)

When you create new labels, check that they have been attached to the correct entries. Although we set them as a list here using, for example, ["No MNE_f", "MNE_f"], we could have also used a dictionary, for example {0: "No MNE_f", 1: "MNE_f"}, mapping old values into new.

To create the tables, we can use a technique called method chaining. Rather than have a series of separate assignment commands using =, method chaining ‘chains’ together a series of methods (these are preceded by .). This approach has pros and cons: it can be easier to read, but harder to debug in case of errors.

First, we will group data by country (using .groupby), then calculate the required aggregate statistics for each of these groups (using .agg), then order the countries according to their overall score (highest to lowest) (.sort_values).

.groupby groups the data according to a given column.

.agg aggregates data, and returns it with a different index. In combination with .groupby, it will return an index based on what column(s) was/were passed to the groupby operation. There are many ways to use .agg, including just setting .agg.mean() and other functions such as count(), median(), sum(), and std(). However, to change the column names of the returned series, you can either pass a dictionary, say {"columnname": "mean"}, or pass an object called a ‘tuple’ that species the new column name, old column name, and the aggregation function, for example newcolumnname = ("columnname", "count").

table_mean = (
    df.groupby("country")
    .agg(
        obs=("management", "count"),
        m_overall=("management", "mean"),
        m_monitor=("monitor", "mean"),
        m_target=("target", "mean"),
        m_incentives=("people", "mean"),
    )
    .sort_values(by="m_overall", ascending=False)
)

table_mean.round(2)
obs m_overall m_monitor m_target m_incentives
country
United States 1,225 3.35 3.58 3.26 3.25
Germany 646 3.23 3.57 3.22 2.98
Japan 176 3.23 3.50 3.34 2.92
Sweden 388 3.21 3.64 3.19 2.83
Canada 385 3.17 3.55 3.07 2.94
UK 1,242 3.03 3.34 2.98 2.86
France 613 3.03 3.43 2.97 2.74
Italy 289 3.03 3.26 3.10 2.76
Australia 392 3.02 3.29 3.02 2.74
New Zealand 106 2.93 3.18 2.96 2.63
Mexico 189 2.92 3.29 2.88 2.71
Poland 351 2.90 3.12 2.94 2.83
Republic of Ireland 106 2.89 3.14 2.81 2.79
Portugal 247 2.87 3.27 2.83 2.59
Chile 317 2.83 3.14 2.72 2.67
Argentina 249 2.76 3.08 2.68 2.56
Greece 251 2.73 2.97 2.66 2.58
China 746 2.71 2.90 2.63 2.69
Brazil 569 2.71 3.06 2.69 2.55
India 720 2.67 2.91 2.66 2.63

Let’s make a table showing the ranks. We can use the .agg function again, but in this case it keeps the same index as before because we’re not using groupby.

We will also drop the "obs" column, and rename all of the columns so that they begin with "r" (for rank) rather than "m".

table_rank = table_mean.agg("rank", ascending=False).drop("obs", axis=1)
table_rank.columns = [x.replace("m_", "r_") for x in table_rank.columns]
table_rank
r_overall r_monitor r_target r_incentives
country
United States 1.0 2.0 2.0 1.0
Germany 2.0 3.0 3.0 2.0
Japan 3.0 5.0 1.0 4.0
Sweden 4.0 1.0 4.0 6.0
Canada 5.0 4.0 6.0 3.0
UK 6.0 7.0 8.0 5.0
France 7.0 6.0 9.0 11.0
Italy 8.0 11.0 5.0 9.0
Australia 9.0 8.0 7.0 10.0
New Zealand 10.0 12.0 10.0 16.0
Mexico 11.0 9.0 12.0 12.0
Poland 12.0 15.0 11.0 7.0
Republic of Ireland 13.0 14.0 14.0 8.0
Portugal 14.0 10.0 13.0 17.0
Chile 15.0 13.0 15.0 14.0
Argentina 16.0 16.0 17.0 19.0
Greece 17.0 18.0 19.0 18.0
China 18.0 20.0 20.0 13.0
Brazil 19.0 17.0 16.0 20.0
India 20.0 19.0 18.0 15.0

Now we use lets_plot to create a bar chart of the "m_overall" value in table_mean. To present countries in order of their management score, we will first re-order the dataframe using sort_values.

table_mean = table_mean.sort_values(by="m_overall")
(
    ggplot(table_mean.reset_index(), aes(y="country", x="m_overall"))
    + geom_bar(stat="identity", orientation="y")
    + labs(
        x="Average management practice score",
        y="Country",
        title="Management practices in manufacturing firms",
    )
)
Management practices in manufacturing firms around the world.
Fullscreen

Figure 6.3 Management practices in manufacturing firms around the world.

If you want to switch the order of the bars (so the smallest value is the topmost bar instead of the bottommost bar), use table_mean.sort_values(by="m_overall", ascending=False) before creating the plot.

To look at how management quality varies within countries, instead of just looking at the mean, we can use column charts to visualize the entire distribution of scores (as in Empirical Project 1). To compare distributions, we have to use the same horizontal axis, so we will first need to make a frequency table for each distribution to be used. Also, since each country has a different number of observations, we will use percentages instead of frequencies as the vertical axis variable.

  1. For three countries of your choice and for the US, carry out the following:
  • Using the overall management score (variable ‘management’), create and fill in a frequency table similar to Figure 6.4 below for the US, and separately for each chosen country. The values in the first column should range from 1 to 5, in intervals of 0.2.
Range of management score Frequency Percentage of firms (%)
1.00
1.20
4.80
5.00

Figure 6.4 Frequency table for overall management score.

  • Plot a column chart for each country to show the distribution of management scores, with the percentage of firms on the vertical axis and the range of management scores on the horizontal axis. On each country’s chart, plot the distribution of the US on top of that country’s distribution, as shown in Python walk-through 6.2.
  • Describe any visual similarities and differences between the distributions of your chosen countries and that of the US. (Hint: For example, look at where the distribution is centred, the percentage of observations on the left tail or the right tail of the distribution, and how spread out the scores are.)

Python walk-through 6.2 Obtaining frequency counts and plotting overlapping histograms

To get frequency counts, we first use numpy’s arange function (np.arange) to specify a list of bins. This function accepts arguments of the form arange(start, stop, step) to create intervals. Then we use the pd.cut function to count the number of observations that fall within the intervals specified.

We store this information in the pandas series chile_intervals. This gives the appropriate interval for each entry in the series. To return the counts, we need to aggregate this information to the total counts per bin, which we can do with value_counts. To keep the order of the intervals, we’ll specify value_counts(sort=False) too.

chile_intervals = pd.cut(
    df.loc[df["country"] == "Chile", "management"], bins=np.arange(0, 5, 0.2)
)
chile_counts = chile_intervals.value_counts(sort=False)
chile_counts
management
(0.0, 0.2]     0
(0.2, 0.4]     0
(0.4, 0.6]     0
(0.6, 0.8]     0
(0.8, 1.0]     0
(1.0, 1.2]     0
(1.2, 1.4]     1
(1.4, 1.6]     3
(1.6, 1.8]     6
(1.8, 2.0]    24
(2.0, 2.2]    15
(2.2, 2.4]    30
(2.4, 2.6]    25
(2.6, 2.8]    49
(2.8, 3.0]    52
(3.0, 3.2]    28
(3.2, 3.4]    27
(3.4, 3.6]    21
(3.6, 3.8]    20
(3.8, 4.0]     9
(4.0, 4.2]     6
(4.2, 4.4]     1
(4.4, 4.6]     0
(4.6, 4.8]     0
Name: count, dtype: int64

We’ve obtained the information for the histogram in a data table format. If instead we want to plot a histogram, we have a few options.

If we just want a quick look at the data, we can use pandas built-in histogram function, .plot.hist(). The default chart will not contain a horizontal axis label.

country_to_hist = "Chile"
df.loc[df["country"] == country_to_hist, "management"].plot.hist();
Distribution of management scores, Chile (default chart).
Fullscreen

Figure 6.5 Distribution of management scores, Chile (default chart).

You can of course add plot.hist() to an existing (matplotlib) plot if you like, and turn it into a neater figure. Here’s an example of how to do that:

min_val, max_val, step = 0, 5, 0.2

fig, ax = plt.subplots()
df.loc[df["country"] == country_to_hist, "management"].hist(
    bins=np.arange(min_val, max_val, step)
).plot(ax=ax)
ax.set_xlabel("Management score")
ax.set_ylabel("Count")
ax.set_xlim(min_val, max_val)
ax.set_axisbelow(True)
ax.set_title(f"Histogram of management scores for {country_to_hist}")
plt.show()
Distribution of management scores, Chile (neater chart).
Fullscreen

Figure 6.6 Distribution of management scores, Chile (neater chart).

We created a figure and chart axes (called ax) first, then put the information on them by calling .plot(ax=ax). Then we added contextual information such as a title and axes labels.

An alternative way of generating histograms is using lets_plot. matplotlib is great for when you need flexibility and various customisation options, but as histograms are such common chart types, they’re covered by lets_plot.

(
    ggplot(df.loc[df["country"] == country_to_hist, :], aes(x="management"))
    + geom_histogram(color="black", alpha=0.5)
    + labs(
        x="Management score",
        y="Counts",
        title=f"Histogram of management scores for {country_to_hist}",
    )
)
Distribution of management scores, Chile.
Fullscreen

Figure 6.7 Distribution of management scores, Chile.

Using matplotlib, it is possible to add a second country to the same chart, but in this case we’re going to use lets_plot. If you did want to use matplotlib, then you could look up how to do it on the internet. Everyone, no matter how expert they are at coding, uses the internet to grab snippets that help them achieve what they want. In this case, you could try searching for ‘Plot two histograms on the same graph matplotlib’.

lets_plot will also allow us to have two histograms on the same chart.

We’re going to demonstrate this by using similar code to above to plot a second country too, the United States.

country_to_hist = "Chile"
scnd_country_hist = "United States"

df = df.rename(columns={'country': 'Country'}) # Chart legend title

(
    ggplot(
        df.loc[df["Country"].isin(["Chile", "United States"]), :],
        aes(x="management", fill="Country"),
    )
    + geom_histogram(alpha=0.6, color="black", position="identity")
    + labs(
        x="Management score",
	  y=Counts
        title=f"Management scores for {country_to_hist} and {scnd_country_hist}",
        caption="Source: World Management Survey",
    )
)
Comparing the distribution of management scores for the US and Chile.
Fullscreen

Figure 6.8 Comparing the distribution of management scores for the US and Chile.

Ideally we would want to put these two countries on the same chart but normalized to show proportions or percentages rather than frequencies, to make it easier to compare distributions across countries. Unfortunately, lets_plot doesn’t yet have a geom_histogram option that will normalize the data.

box and whisker plot
A graphic display of the range and quartiles of a distribution, where the first and third quartile form the ‘box’ and the maximum and minimum values form the ‘whiskers’.

Another way to visualize distributions is a box and whisker plot, which shows some parts of a distribution rather than the whole distribution. We can use box and whisker plots to compare particular aspects of distributions more easily than when looking at the entire distribution.

As shown in Figure 6.9, the ‘box’ consists of the first quartile (value corresponding to the bottom 25 per cent, or 25th percentile, of all values), the median, and the third quartile (75th percentile). The ‘whiskers’ are the minimum and maximum values. (In Python, the ‘whiskers’ may not be the actual maximum or minimum, since any values larger than 1.5 times the width of the box are considered outliers and are shown as separate points.)

Example of a box and whisker plot. (Note: The mean is not shown in Python’s default chart setting, but is denoted here by X. In general, the median may not be in the centre of the box, and can differ greatly from the mean. Using the data shown in Figure 6.9 for a variable from the dataset, the mean and median are very similar.)
Fullscreen

Figure 6.9 Example of a box and whisker plot.
(Note: The mean is not shown in Python’s default chart setting, but is denoted here by X. In general, the median may not be in the centre of the box, and can differ greatly from the mean. Using the data shown in Figure 6.9 for a variable from the dataset, the mean and median are very similar.)

  1. Using the same countries you chose in Question 3:
  • Make a box and whisker plot for each country and the US, showing the distribution of management scores. You can either make a separate chart for each country or show all countries in the same plot. To check that your plots make sense, compare your box and whisker plots to the distributions from Question 3.
  • Use your box and whisker plots to add to your comparisons from Question 3(c).

Python walk-through 6.3 Drawing box and whisker plots

We will use a similar code structure as we did for the overlapping histograms, this time plotting countries on the horizontal axis and management scores on the vertical axis. Instead of laboriously specifying a different box and whisker plot for each country independently, we will specify a list of countries to plot, then use df["country"].isin() to filter the dataframe for each country in the list and use boxplot(..., by="country", ...) to make the chart.

countries_to_include = ["Chile", "United States", "Brazil", "Germany", "UK"]

(
    ggplot(
        df.loc[df["Country"].isin(countries_to_include), :],
        aes(x="Country", y="management"),
    )
    + geom_boxplot()
    + labs(
        y="Management score", x="Country", title="Management scores grouped by country"
    )
)
Box and whisker plots for a selection of countries.
Fullscreen

Figure 6.10 Box and whisker plots for a selection of countries.

From the manufacturing data, firms in the US seem to be managed better (on average) than firms in other countries. To investigate whether this is the case in other sectors, we will use data gathered on hospitals and schools.

  1. Using the data for hospitals and schools (AMP_graph_public.csv):
  • Create a table for hospitals and schools, showing the mean management score and criteria score (monitoring, targets, incentives) for each country, as in Figure 6.2a.
  • Make separate bar charts for hospitals and schools showing the mean overall management score for each country, sorted from highest to lowest, as in Figure 6.1. Are the country rankings similar to those in manufacturing?
  • Using your average criteria scores from Question 5(a), suggest some explanations for the observed rankings in either hospitals or schools. (You may find it helpful to research healthcare or educational policies and reforms in those countries to support your explanations.)

Part 6.2 Do management practices differ between countries?

Learning objectives for this part

  • Calculate conditional means for one or more conditions, and compare them on a bar chart.
  • Construct confidence intervals and use them to assess differences between groups.

Using the management survey data collected by Bloom et al. (2012), we can compare average management scores across countries and industries. When we find differences between groups in the survey, we are interested in what that tells us about the true differences in management practices between the countries.

confidence interval
A range of values centred around the sample mean value, with a corresponding percentage (usually 90%, 95%, or 99%). When we use a sample to calculate a 95% confidence interval, there is a probability of 0.95 that we will get an interval containing the true value of interest.

In Empirical Project 2, we used p-values to assess differences between groups. A p-value tells us how unusual it would be to observe the differences between groups that we did, assuming our hypothesis is correct (and additional model assumptions about the data hold). If the p-value is small, we might then conclude that the data is not compatible with our assumptions (for example, that the two groups were drawn from populations with the same mean), and that there is a real difference between the underlying populations. Now we will use another method that helps us to allow for random variation when we interpret data, called a confidence interval.

When we work with data, we usually have only a small sample from the entire population of interest. For example, the World Management Survey collects information from a selection of all the firms in a particular country. If we calculate the average management score for the sample, we have an estimate of the average management score across all firms in the country (the ‘true value’), but it may not be a very accurate estimate⁠—especially if the sample is small and management scores vary a lot between firms.

A 95% confidence interval is a range of possible values within which the true value might lie. It is estimated from the mean and standard deviation of the data. As in the process of calculating p-values, we use a standard method that gives a good estimate provided that certain statistical conditions are satisfied. We cannot be certain that the true value lies in the range; we might have the bad luck to pick an atypical sample, in which case our estimate of the confidence interval would be atypical too. But we can say that when we use this method, there is a 95% probability that we will find an interval that contains the true value. One way to interpret this is to say that if we were able to repeat the process of sampling and calculating confidence intervals many times, roughly 95% of these confidence intervals would contain the true value.

As the name suggests, confidence intervals tell us how much confidence we can place in our estimates, or in other words, how precisely the sample mean is estimated. The confidence interval gives us a margin of error for our estimate of the true value. If the data varies a lot, the 95% confidence interval may be quite wide. If we have plenty of data, and the standard deviation is low, the estimate will be more precise and the 95% interval will be narrow.

Rule of thumb for comparing means

When comparing two distributions, if neither mean is in the 95% confidence interval for the other mean, the p-value for the difference in means is less than 5%.

This rule of thumb is handy when looking at charts. If two 95% confidence intervals don’t overlap, we can say immediately that the observed difference between the means for the two groups is unlikely to have arisen by chance alone (given that our hypothesis that there were no differences in the populations from which these groups were drawn and that our other assumptions about the data, such as random sampling, are correct). For a more definite conclusion, we can calculate the actual p-value (see Empirical Project 2) or construct a confidence interval for the difference in means. (This method involves more mathematics, so we will discuss that in Empirical Project 8.)

It is possible to calculate a confidence interval for any probability: however wide the 95% confidence interval, a 99% confidence interval would be wider, and an 80% one would be narrower. 95% is a common choice: it gives us quite a high degree of confidence, and to go higher tends to lead to very wide intervals. We will use 95% confidence intervals throughout this project.

To sum up: A confidence interval is a range of values centred around the sample mean value, with a corresponding percentage (usually 90%, 95%, or 99%). When we use a sample to calculate a 95% confidence interval, there is a probability of 0.95 that we will get an interval containing the true value of interest.

We will now build on the results from the Bloom et al. (2012) paper by using 95% confidence intervals to make comparisons between the mean overall management score for different countries and types of firms. The confidence interval for the population mean (mean management score for that country) is centred around the sample mean. To determine the width of the interval, we use the standard deviation and number of firms.

  1. First look at manufacturing firms in different countries. Using the manufacturing data (AMP_graph_manufacturing.csv) for three countries of your choice and for the US:
  • Create a summary table for the overall management score as shown in Figure 6.11 below, with one row for each country.
Country Mean Standard deviation Number of firms
       
       

Figure 6.11 Summary table for manufacturing firms.

  • Determine the width of the 95% confidence interval (this is the distance from the mean to one end of the interval). See Python walk-through 6.4 for help on how to do this. You should get a different number for each country.
  • Plot a bar chart that shows the mean management score and add the confidence intervals to your chart.
  • Use the width of the confidence intervals to describe how precisely each mean was estimated.
  • Using your chart from Question 1(c) and the rule of thumb, what can you say about the differences between the US management score, and the scores of other countries? How would your results change if you use a different specified probability (for example, 99%)?

Python walk-through 6.4 Creating confidence intervals and adding them to a chart

As in Python walk-through 6.1, we use method chaining to do this task. First, we take the management data dataframe, df, and extract the countries we need using df.loc. Then, we group the data by country (using group_by), and calculate some of the required aggregate measures (agg) mapping the management score into three new variables; the mean, the standard deviation, and the number of observations (using the len function).

The final step is to compute the standard error from the new columns and sort the data according to the values of "mean_m". We save the final result in table_stats. We will display this table with values rounded (.round) to 2 decimal places.

countries_to_include = ["Chile", "United States", "Brazil", "Germany", "UK"]

table_stats = (
    df.loc[df["country"].isin(countries_to_include), :]
    .groupby("country")
    .agg(
        mean_m=("management", "mean"),
        sd_m=("management", "std"),
        obs=("management", len),
    )
    .assign(m_err=lambda x: 1.96 * np.sqrt(x["sd_m"] ** 2 / (x["obs"] - 1)))
    .sort_values("mean_m", ascending=False)
)

table_stats.round(2)
mean_m sd_m obs m_err
Country
United States 3.35 0.64 1,225 0.04
Germany 3.23 0.57 646 0.04
UK 3.03 0.68 1,242 0.04
Chile 2.83 0.60 317 0.07
Brazil 2.71 0.68 569 0.06

We can use this table as the basis of a bar chart in matplotlib to quickly visualise the mean management scores.

fig, ax = plt.subplots()
table_stats.plot.bar(ax=ax, y="mean_m", yerr="m_err", rot=0, capsize=5)
ax.set_ylim(2, 4)
ax.legend([])  # Turns legend off
ax.set_title("Mean management score across selected countries")
plt.show()
Bar chart of mean management score in manufacturing firms for a selection of countries, with 95% confidence intervals.
Fullscreen

Figure 6.12 Bar chart of mean management score in manufacturing firms for a selection of countries, with 95% confidence intervals.

  1. Using the data for hospitals or schools (AMP_graph_public.csv), using all available countries:
  • Create a summary table like Figure 6.11 for the overall management score, with one row for each country. Add a column containing the widths of the confidence intervals for the country means.
  • Plot a column chart that shows the confidence intervals. Compare the management practices in the US with those in other countries. Are there any countries for which you can be confident that management practices are either better, or worse, on average than in the US? Explain your answer.
  1. Look at the width of your confidence intervals and the corresponding standard deviation and number of observations for each one. Explain whether or not the relationship between them is what you would expect.

Part 6.3 What factors affect the quality of management?

Learning objectives for this part

  • Calculate conditional means for one or more conditions, and compare them on a bar chart.
  • Construct confidence intervals and use them to assess differences between groups.
  • Evaluate the usefulness and limitations of survey data for determining causality.

Besides documenting and comparing management practices across industries and countries, another purpose of the World Management Survey was to investigate factors that affect management quality.

One possible factor affecting differences in management is firm ownership. To look at the data for this factor in the healthcare and education sectors, we will focus on broad groups (public versus privately-owned firms), and for manufacturing firms we will focus on different kinds of private ownership.

  1. Using the data for hospitals and schools (AMP_graph_public.csv):
  • Create a table for hospitals and schools, showing the average management score, standard deviation (StdDev), and number of observations, with country as the row variable, and ownership (public or private) and ind as the column variables.
  • Use your table from Question 1(a) to calculate the confidence interval widths for management in public and private hospitals. Then do the same for schools.
  • Plot a bar chart (one for hospitals and another for schools) showing the means and the confidence intervals for the management score. Describe the differences between public and private firms within countries and compare management scores for the same firm type across countries. (For example, is one type of firm generally better managed than the other? Are there similar patterns for hospitals and schools?) If you have done Question 5 in Part 6.1, you may want to discuss whether the rankings change after conditioning on ownership type.

Besides ownership type, management practices may vary depending on firm size, though it is difficult to predict what the relationship between these variables might be. Larger firms have more employees and could be more difficult to manage well, but may also attract more experienced managers. We will look at the conditional means for manufacturing firms, depending on whether they are above or below the median number of employees (calculated from the data), and see if there is a clear relationship.

  1. Using the data for manufacturing firms (AMP_graph_manufacturing.csv):
  • Create a new variable that equals ‘Smaller’ if a firm has less than the median number of employees (330), and ‘Larger’ otherwise. In natural log terms, ‘Smaller’ corresponds to log employment of less than 5.80.
  • For two countries of your choice and the US, create a table that shows the mean overall management score, standard deviation, and number of observations, with country and ownership as the row variables, and firm size as the column variable. (Note: When there is only one observation in a group, there is no standard deviation.)
  • Use your table to calculate the confidence interval width for each firm size and ownership type.
  • Plot a bar chart for each country, showing the means and the confidence intervals. Describe any patterns you observe across ownership types and firm size in each country.

Python walk-through 6.5 Calculating and adding conditional summary statistics and confidence intervals to a chart

To do this, we will use many techniques encountered previously, but first we have to create a new variable called size that indicates whether a firm is larger or smaller than a certain threshold. A firm with a value of "lemp_firm" greater than 5.8 is considered ‘larger’. We use a new categorical column (based on a Boolean check on this condition) to make this distinction, and rename the Falses as "smaller" and the Trues as "larger".

df["size"] = pd.Categorical(df["lemp_firm"] > 5.8).rename_categories(
    {False: "smaller", True: "larger"}
)
df.head()
management monitor target people lemp_firm export competition ownership mne_country mne_f mne_d degree_m degree_nm country competition2004 year sic lb_employindex pppgdp size
0 3.500000 3.60 3.6 3.500000 5.991465 NaN NaN NaN NaN No MNE_f No MNE_d NaN NaN United States 3=Many competitors 2004.0 382 0.0 11,867.75 larger
1 3.166667 3.80 2.6 2.500000 6.396930 NaN 2.0 Dispersed Shareholders NaN No MNE_f No MNE_d 100.0 75.0 United States NaN 2006.0 382 0.0 13,398.93 larger
2 3.000000 2.80 3.6 3.000000 7.600903 70.0 4.0 Dispersed Shareholders US No MNE_f MNE_d 100.0 5.0 United States NaN 2006.0 382 0.0 14,119.05 larger
3 2.411765 2.75 2.4 2.666667 8.039157 NaN NaN NaN NaN No MNE_f No MNE_d NaN NaN United States NaN 2002.0 308 NaN 10,642.30 larger
4 4.444445 4.60 4.4 4.333333 5.241747 NaN NaN NaN NaN No MNE_f No MNE_d NaN NaN United States 3=Many competitors 2004.0 281 0.0 11,867.75 smaller

Let’s look at Canada, Brazil, and the United States. Again, we use method chaining to make the table. In the groupby command, we group the variables by size and ownership, as we did previously.

countries_to_include = ["Canada", "United States", "Brazil"]

table_stats2 = (
    df.loc[df["Country"].isin(countries_to_include), :]
    .groupby(["Country", "ownership", "size"])
    .agg(
        mean_m=("management", "mean"),
        sd_m=("management", "std"),
        obs=("management", len),
    )
)

table_stats2.round(2)
mean_m sd_m obs
Country ownership size
Brazil Dispersed shareholders smaller 3.06 0.67 28.0
larger 3.48 0.73 45.0
Family owned, external CEO smaller 2.82 0.73 8.0
larger 2.99 0.69 10.0
Family owned, family CEO smaller 2.50 0.67 80.0
larger 2.70 0.64 41.0
Founder smaller 2.35 0.52 124.0
larger 2.66 0.59 72.0
Government smaller 4.00 NaN 1.0
larger 2.44 1.18 2.0
Managers smaller 2.64 0.57 23.0
larger 2.51 0.63 7.0
Other smaller 2.57 0.40 13.0
larger 3.01 0.54 29.0
Private equity smaller 3.23 0.59 5.0
larger NaN NaN NaN
Private individuals smaller 2.69 0.71 39.0
larger 2.94 0.52 42.0
Canada Dispersed shareholders smaller 3.43 0.60 53.0
larger 3.52 0.58 53.0
Family owned, external CEO smaller 2.90 0.47 6.0
larger 3.31 0.49 9.0
Family owned, family CEO smaller 2.75 0.55 25.0
larger 3.02 0.61 14.0
Founder smaller 2.86 0.56 37.0
larger 3.01 0.69 14.0
Government smaller 3.00 NaN 1.0
larger NaN NaN NaN
Managers smaller 3.17 0.49 5.0
larger 3.01 0.57 5.0
Other smaller 3.15 0.44 16.0
larger 3.33 0.40 12.0
Private equity smaller 3.34 0.67 11.0
larger 3.12 0.58 21.0
Private individuals smaller 2.90 0.60 66.0
larger 3.46 0.45 37.0
United States Dispersed shareholders smaller 3.45 0.56 158.0
larger 3.50 0.56 295.0
Family owned, external CEO smaller 2.86 0.63 6.0
larger 3.45 0.54 22.0
Family owned, family CEO smaller 2.96 0.68 73.0
larger 3.44 0.58 42.0
Founder smaller 3.14 0.61 60.0
larger 3.14 0.51 28.0
Government smaller NaN NaN NaN
larger 4.06 NaN 1.0
Managers smaller 3.57 0.66 6.0
larger 3.80 0.73 6.0
Other smaller 3.06 0.74 21.0
larger 3.48 0.48 31.0
Private equity smaller 3.34 0.48 27.0
larger 3.50 0.43 27.0
Private individuals smaller 3.07 0.61 93.0
larger 3.40 0.68 68.0

Now we use the variable "size" as a column variable, so that we can see the summary statistics in two blocks of columns (separately for larger and smaller firms). Because "size" is the innermost index column of our three index columns (they are the three columns we passed to the groupby variable), we can use the unstack method to split the main table by the unique values in the "size" index. We also round the values to two decimal places using .round(2). The ‘smaller’ and ‘larger’ categories now appear under the three headings mean_m, sd_m, and obs separately.

table_stats2.unstack().round(2)
mean_m sd_m obs
size smaller larger smaller larger smaller larger
Country ownership
Brazil Dispersed shareholders 3.06 3.48 0.67 0.73 28.0 45.0
Family owned, external CEO 2.82 2.99 0.73 0.69 8.0 10.0
Family owned, family CEO 2.50 2.70 0.67 0.64 80.0 41.0
Founder 2.35 2.66 0.52 0.59 124.0 72.0
Government 4.00 2.44 NaN 1.18 1.0 2.0
Managers 2.64 2.51 0.57 0.63 23.0 7.0
Other 2.57 3.01 0.40 0.54 13.0 29.0
Private equity 3.23 NaN 0.59 NaN 5.0 NaN
Private individuals 2.69 2.94 0.71 0.52 39.0 42.0
Canada Dispersed shareholders 3.43 3.52 0.60 0.58 53.0 53.0
Family owned, external CEO 2.90 3.31 0.47 0.49 6.0 9.0
Family owned, family CEO 2.75 3.02 0.55 0.61 25.0 14.0
Founder 2.86 3.01 0.56 0.69 37.0 14.0
Government 3.00 NaN NaN NaN 1.0 NaN
Managers 3.17 3.01 0.49 0.57 5.0 5.0
Other 3.15 3.33 0.44 0.40 16.0 12.0
Private equity 3.34 3.12 0.67 0.58 11.0 21.0
Private individuals 2.90 3.46 0.60 0.45 66.0 37.0
United States Dispersed shareholders 3.45 3.50 0.56 0.56 158.0 295.0
Family owned, external CEO 2.86 3.45 0.63 0.54 6.0 22.0
Family owned, family CEO 2.96 3.44 0.68 0.58 73.0 42.0
Founder 3.14 3.14 0.61 0.51 60.0 28.0
Government NaN 4.06 NaN NaN NaN 1.0
Managers 3.57 3.80 0.66 0.73 6.0 6.0
Other 3.06 3.48 0.74 0.48 21.0 31.0
Private equity 3.34 3.50 0.48 0.43 27.0 27.0
Private individuals 3.07 3.40 0.61 0.68 93.0 68.0

So far, we have looked at associations between firm characteristics and management practices, but have not made any causal statements. We will now discuss the difficulties with making causal statements using this data and examine how we might determine the direction of causation.

  1. For each of the following variables, explain how it could affect management practices, and then explain how management practices could affect it:
  • education level of managers (percentage with a college degree)
  • number of competitors
  • firm size (number of employees).
  1. One way to establish the direction of causation is through a randomized field experiment. Read the discussion on pages 22–23 of the Bloom et al. paper (the section ‘Experimental Evidence on Management Quality and Firm Performance’) about one such experiment that was conducted in Indian textile factories.
  • Briefly describe the idea behind a randomized field experiment, and explain, with reference to the results of the experiment in India, whether we can use it to determine the direction of causation between management practice and firm performance. The paper ‘Does Management Matter? Evidence from India’ provides more details about the experiment (pages 9–10 are particularly useful).
  • Figure 12 in the paper shows productivity in treatment and control firms over time, with 95% confidence intervals. Use the information in the chart to describe the effect of the treatment on firm productivity.