Empirical Project 6 Working in Excel

Part 6.1 Looking for patterns in the survey data

First download the data used in the paper to understand how this information was collected. The data is publicly available and free of charge, but you will need to create a user account in order to access it.

• Go to the World Management Survey main page.
• In the top right corner, click ‘Register’, and then click the ‘Register’ button on the next page.
• Fill in the form with the required details, then click ‘Register’.
• An account activation link will be sent to the email you provided. Click on it to activate your account.
• Unzip the files in the downloaded zip folder into your working directory (the folder you will be working from).
• You may also find it helpful to download the Bloom et al. paper ‘Management practices across firms and countries’.
1. To learn about how Bloom et al. (2012) conducted their survey, read the sections ‘How Can Management Practices Be Measured?’ and ‘Validating the Management Data’ (pages 5–9) of their paper.
• Briefly describe how the interviews with managers were conducted, and explain some measures the researchers took to improve the reliability and validity of their data. (There are a few technical terms that you may not understand, but these are not necessary for answering this question.)
• Three aspects of management practices were evaluated: monitoring, targets, and incentives. Do you think that these are the best criteria for assessing management practices? What (if any) important aspects of management are not included in this assessment? (You may also find it helpful to refer to the ‘Contingent Management’ section on pages 23–25 of the paper.)

Now we will create some charts to summarize the data and make comparisons across countries, industries (manufacturing, healthcare, retail, and education), and firm characteristics.

1. In ‘Manufacturing: 2004–2010 combined survey data (AMP)’, open the Excel document ‘AMP_graph_manufacturing.csv’. Use this data on manufacturing firms to do the following:
• In a new tab, create two tables, as shown in Figures 6.2a and 6.2b, and calculate the required values. The variables for individual criteria are ‘monitor’, ‘target’, and ‘people’. You may find it helpful to use Excel’s PivotTable option—see Excel walk-through 3.1 if you need guidance. For each criterion, rank countries from highest to lowest (see Excel walk-through 4.4 for help on how to assign ranks). Do countries with a high overall rank also tend to rank highly on individual criteria?
Country Overall management (mean) Monitoring management (mean) Targets management (mean) Incentives management (mean)

Mean of management scores.

Figure 6.2a Mean of management scores.

Country Overall management (rank) Monitoring management (rank) Targets management (rank) Incentives management (rank)

Rank according to management scores.

Figure 6.2b Rank according to management scores.

• Create a bar chart showing the average overall management score (the variable ‘management’) for each country, ordered from highest to lowest. (Hint: You will need to sort your data from highest to lowest so it appears correctly in the chart.) Your chart should look similar to Figure 6.1.
• Compare your chart with Figure 1 in Bloom et al. (2012). Can you explain why your chart is slightly different? (Hint: See the note at the bottom of Figure 1.)

To look at how management quality varies within countries, instead of just looking at the mean we can use column charts to visualize the entire distribution of scores (as in Empirical Project 1). To compare distributions, we have to use the same horizontal axis, so we will first need to make a frequency table for each distribution to be used. Also, since each country has a different number of observations, we will use percentages instead of frequencies as the vertical axis variable.

1. For three countries of your choice and for the US, carry out the following:
• Using the overall management score (variable ‘management’), create a frequency table similar to Figure 6.3 below for the US, and separately for each chosen country. The values in the first column should range from 1 to 5, in intervals of 0.2. (Hint: To count observations for a specific country only, you will need to use the IF function and FREQUENCY function together, as shown in Excel walk-through 6.1).
Range of management score Frequency Percentage of firms (%)
1.00
1.20
4.80
5.00

Frequency table for overall management score.

Figure 6.3 Frequency table for overall management score.

• Plot a column chart for each country to show the distribution of management scores, with the percentage of firms on the vertical axis and the range of management scores on the horizontal axis. On each country’s chart, plot the distribution of the US on top of that country’s distribution, as shown in Excel walk-through 6.2.
• Describe any visual similarities and differences between the distributions of your chosen countries and that of the US. (Hint: For example look at where the distribution is centred, the percentages of observations on the left tail or the right tail of the distribution, and how spread out the scores are.)

Excel walk-through 6.1 Using Excel’s IF function

Follow the walk-through in the CORE video, or in Figure 6.4, to find out how to use Excel’s IF function.

How to use Excel’s IF function within another function.

Figure 6.4 How to use Excel’s IF function within another function.

The data

In this example, we will make a frequency table for the US data in Columns A to C. It’s a good idea to put all the tables in a separate place from the data.

Figure 6.4a In this example, we will make a frequency table for the US data in Columns A to C. It’s a good idea to put all the tables in a separate place from the data.

Calculating frequencies for a particular country

We want to calculate the frequency but only for firms in the US. To do this, add an IF condition within the FREQUENCY function to tell Excel which data to use.

Figure 6.4b We want to calculate the frequency but only for firms in the US. To do this, add an IF condition within the FREQUENCY function to tell Excel which data to use.

Calculating frequencies for a particular country

By putting the IF function inside the FREQUENCY function, Excel will only use the data that satisfies the condition we specified (firms in the US).

Figure 6.4c By putting the IF function inside the FREQUENCY function, Excel will only use the data that satisfies the condition we specified (firms in the US).

Calculating frequencies for a particular country

Afer step 4, all the frequency values will be filled in.

Figure 6.4d Afer step 4, all the frequency values will be filled in.

Using frequencies to calculate percentages

The $symbol in the formula tells Excel to keep these row or column numbers the same when copying the formula to other cells. We used it here because we are dividing the frequency value by the total number of observations (cells B2 to B22). Figure 6.4e The$ symbol in the formula tells Excel to keep these row or column numbers the same when copying the formula to other cells. We used it here because we are dividing the frequency value by the total number of observations (cells B2 to B22).

Excel walk-through 6.2 Overlaying one column chart over another

How to overlay one column chart over another.

Figure 6.5 How to overlay one column chart over another.

The data

In this example, we use data for the US (Columns A to C) and Chile (Columns F to H), and plot a column chart of the percentages in Columns C and H (see the steps in Figure 6.4 for how to calculate these).

Figure 6.5a In this example, we use data for the US (Columns A to C) and Chile (Columns F to H), and plot a column chart of the percentages in Columns C and H (see the steps in Figure 6.4 for how to calculate these).

Plot a column chart

After step 3, the column chart will look like the one shown above.

Figure 6.5b After step 3, the column chart will look like the one shown above.

Change the appearance of the columns

First, we will remove the gaps between the columns, and make the columns overlap (rather than being plotted side-by-side). After step 7, there will be a vertical axis on the left and right side of the chart.

Figure 6.5c First, we will remove the gaps between the columns, and make the columns overlap (rather than being plotted side-by-side). After step 7, there will be a vertical axis on the left and right side of the chart.

Change the vertical axis values

In order to compare the distributions, make sure the left and right vertical axis have the same labels.

Figure 6.5d In order to compare the distributions, make sure the left and right vertical axis have the same labels.

Change the horizontal axis values and series names

Now, change the horizontal axis values to match the data in our frequency tables.

Figure 6.5e Now, change the horizontal axis values to match the data in our frequency tables.

Change the horizontal axis values and series names

After step 12, the horizontal axis value for the selected columns will change.

Figure 6.5f After step 12, the horizontal axis value for the selected columns will change.

Change the horizontal axis values and series names

Now, we will change the series names to country names.

Figure 6.5g Now, we will change the series names to country names.

Change the horizontal axis values and series names

After step 14, the legend entry for that country will now be the country name.

Figure 6.5h After step 14, the legend entry for that country will now be the country name.

Change the horizontal axis values and series names.

Since we have two vertical axes on our chart (one for each data series), we need to change the horizontal axis labels for both axes before Excel will update the chart.

Figure 6.5i Since we have two vertical axes on our chart (one for each data series), we need to change the horizontal axis labels for both axes before Excel will update the chart.

Change the horizontal axis values and series names.

Finally, change the shading of the columns so we can see the distributions of both countries. After step 19, the distributions of both countries should be clearly visible.

Figure 6.5j Finally, change the shading of the columns so we can see the distributions of both countries. After step 19, the distributions of both countries should be clearly visible.

Example of two overlapping distributions on the same column chart.

Figure 6.6 Example of two overlapping distributions on the same column chart.

box and whisker plot
A graphic display of the range and quartiles of a distribution, where the first and third quartile form the ‘box’ and the maximum and minimum values form the ‘whiskers’.

Another way to visualize distributions is a box and whisker plot, which shows some parts of a distribution rather than the whole distribution. We can use box and whisker plots to compare particular aspects of distributions more easily than when looking at the entire distribution.

As shown in Figure 6.7, the ‘box’ consists of the first quartile (value corresponding to the bottom 25 per cent, or 25th percentile, of all values), the median, and the third quartile (75th percentile). The ‘whiskers’ are the minimum and maximum values. (In Excel, the ‘whiskers’ may not be the actual maximum or minimum, since any values larger than 1.5 times the width of the box are considered outliers and are shown as separate points.)

Example of a box and whisker plot. (Note: In Excel, the mean value is shown by X. In general, the median may not be in the centre of the box, and can differ greatly from the mean. Using the data shown in Figure 6.7 for a variable from the dataset, the mean and median are very similar.)

Figure 6.7 Example of a box and whisker plot. (Note: In Excel, the mean value is shown by X. In general, the median may not be in the centre of the box, and can differ greatly from the mean. Using the data shown in Figure 6.7 for a variable from the dataset, the mean and median are very similar.)

1. Using the same countries you chose in Question 3:
• Make a box and whisker plot for each country and the US, showing the distribution of management scores. You can either make a separate chart for each country or show all countries in the same plot. To check that your plots make sense, compare your box and whisker plots to the distributions from Question 3.

Excel walk-through 6.3 Drawing box and whisker plots

Follow the walk-through in the CORE video, or in Figure 6.8, to find out how to draw a box and whiskers plot in Excel.

How to create box and whisker plots.

Figure 6.8 How to create box and whisker plots.

The data

In this example, we will use data for the US (Column A) and Chile (Column B). To create a box and whiskers plot of more than one variable, each variable needs to be in a separate column. You will need to filter, then copy and paste the required data into a new tab in Excel.

Figure 6.8a In this example, we will use data for the US (Column A) and Chile (Column B). To create a box and whiskers plot of more than one variable, each variable needs to be in a separate column. You will need to filter, then copy and paste the required data into a new tab in Excel.

Create box and whisker plots

After step 3, your box and whisker plot will look like the one above. The plots are ordered according to the columns in Excel, so the first plot corresponds to Column A, the second plot corresponds to Column B, and so on.

Figure 6.8b After step 3, your box and whisker plot will look like the one above. The plots are ordered according to the columns in Excel, so the first plot corresponds to Column A, the second plot corresponds to Column B, and so on.

First, we will add a legend to indicate to which country each plot corresponds.

Figure 6.8c First, we will add a legend to indicate to which country each plot corresponds.

Change the legend entries

Now, we will change the labels in the legend. If your legend is already correctly labelled, skip Steps 6–9.

Figure 6.8d Now, we will change the labels in the legend. If your legend is already correctly labelled, skip Steps 6–9.

Change the legend entries

After changing the name of the data series, the label in the legend will change.

Figure 6.8e After changing the name of the data series, the label in the legend will change.

Change the legend entries

After step 9, your data should now be correctly labelled in the chart legend.

Figure 6.8f After step 9, your data should now be correctly labelled in the chart legend.

Remove the outliers (optional)

Excel shows any observations that are greater than 1.5 times the box width (in absolute value) separately as outliers. If your chart is too cluttered with outliers, you can remove them to make your chart more readable.

Figure 6.8g Excel shows any observations that are greater than 1.5 times the box width (in absolute value) separately as outliers. If your chart is too cluttered with outliers, you can remove them to make your chart more readable.

From the manufacturing data, firms in the US seem to be managed better (on average) than firms in other countries. To investigate whether this is the case in other sectors, we will use data gathered on hospitals and schools.

1. Using the data for hospitals and schools (AMP_graph_public.csv):
• Create a table for hospitals and schools, showing the mean management score and criteria score (monitoring, targets, incentives) for each country, as in Figure 6.2a. (Hint: You may find it helpful to use Excel’s PivotTable option—see Excel walk-through 3.1.)
• Make separate bar charts for hospitals and schools showing the mean overall management score for each country, sorted from highest to lowest, as in Figure 6.1. Are the country rankings similar to those in manufacturing?
• Using your average criteria scores from Question 5(a), suggest some explanations for the observed rankings in either hospitals or schools. (You may find it helpful to research healthcare or educational policies and reforms in those countries to support your explanations.)

Part 6.2 Are differences in management practices statistically significant?

Using the management survey data collected by Bloom et al. (2012), we can compare average management scores across countries and industries. Rather than simply identifying differences between groups, we are also interested in whether these differences are statistically significant.

confidence interval
A range of values that is centred around the sample value, and is defined so that there is a specified probability (usually 95%) that it contains the ‘true value’ of interest.

In Empirical Project 2, we assessed statistical significance using p-values. Now we will assess statistical significance using another method called confidence intervals. These two methods are equivalent, meaning that we would get the same conclusions about statistical significance whichever method we use.

A 95% confidence interval is calculated from the data we observed and is designed so that the true value (for example, the mean of a population) will fall into the interval 95% of the time. Other common confidence intervals used in research studies are 90% and 99% confidence intervals, which are similarly defined. We will use 95% confidence intervals throughout this project.

What do we mean by the ‘true value’? Remember that we usually work with data that is a small sample from the entire population of interest. For example, the World Management Survey collects information from a selection of all the firms in a particular country. Since we don’t have data on all the firms in every country, we cannot say with certainty that the average management score across all firms in Country A (the ‘true value’) is higher than that of Country B. However, based on the sample of firms we have from Country A and Country B, we can say whether any observed difference in means is likely to be due to chance, and assess how precisely we have estimated the ‘true value’ of the difference and individual means.

To understand the principle behind confidence intervals, think about play­ing ring toss while blindfolded. You try to throw a ring (a confidence interval) so that it lands around a peg (the true value, which depends on the entire population of interest). The peg is fixed in place, so depending on how wide your ring is and where you throw it, the ring may not land around the peg.

To decide, you are given information (your data sample) about the possible location of the peg. The sample mean tells you where to stand. The width of the ring depends on how spread out your data is (the standard deviation), and how many observations you have. If your data has a large standard deviation, you know that the range of possible locations of the peg is wider so you would make your ring wider. The more obser­vations you have, the more information you have about the population of interest and thus the location of the peg, so (ceteris paribus) you can make your ring narrower.

Since you are blindfolded, you will never know if the ring actually landed around the peg. However, you know where to stand (at the sample mean) and how wide to make the ring so that for every 100 tosses, it lands around the peg roughly 95 times (in other words, 95% of the time).

As the name suggests, confidence intervals tell us how much confidence we can place in our estimates, in other words how precisely the sample mean is estimated. Wider confidence intervals suggest that our sample mean is estimated less precisely (the data is more spread out). Using the ring toss analogy, we are less sure that we are standing in the right place, so we need a wider ring to have the same chance (95%) of landing it around the peg.

To sum up: A confidence interval is a range of values centred around the sample mean value and is defined so that there is a specified probability (usually 95%) that it contains the true value of interest.

Rule of thumb for statistical significance

When comparing two distributions, if neither mean is in the confidence interval for the other mean, the difference in means is statistically significant.

This rule of thumb is handy when looking at charts. For a more definite conclusion, we can calculate the p-value (see Empirical Project 2) or construct a confidence interval for the difference in means. (This method involves more mathematics so we will discuss that in Empirical Project 8.)

We will now build on the results from the Bloom et al. (2012) paper by using 95% confidence intervals to make comparisons between the mean overall management score for different countries and types of firms. The confidence interval for the population mean (mean management score for that country) is centred around the sample mean. To determine the width of the interval, we use the standard deviation and number of firms.

1. First look at manufacturing firms in different countries. Using the manufacturing data (AMP_graph_manufacturing.csv) for three countries of your choice and for the US:
• Create a summary table for the overall management score as shown in Figure 6.9, with one row for each country. (Hint: Use Excel’s PivotTable option.)
Country Mean Standard deviation Number of firms

Summary table for manufacturing firms.

Figure 6.9 Summary table for manufacturing firms.

• Use Excel’s CONFIDENCE.T function to determine the width of the 95% confidence interval (this is the distance from the mean to one end of the interval). You should get a different number for each country.
• Plot a column chart showing the mean management score and add the confidence intervals to your chart (see Excel walk-through 6.4 for help on how to do this).
• Using your chart from Question 1(c) and this rule of thumb, determine which means (if any) are significantly different from the US mean. Use the width of the confidence intervals to describe how precisely each mean was estimated. Do your results change if you use a different specified probability (for example, 99%)?

Excel walk-through 6.4 Creating confidence intervals and adding them to a chart

How to create confidence intervals and add them to a chart.

Figure 6.10 How to create confidence intervals and add them to a chart.

Calculate the width of the confidence interval

In this example we will use data for the US and Chile (shown in Columns A and B). To calculate the width of the 95% confidence interval (distance from the mean to one end of the interval), we first need to calculate the standard deviation and number of observations for each country. The 0.05 in the CONFIDENCE.T function is 1 − 0.95 (our specified probability).

Figure 6.10a In this example we will use data for the US and Chile (shown in Columns A and B). To calculate the width of the 95% confidence interval (distance from the mean to one end of the interval), we first need to calculate the standard deviation and number of observations for each country. The 0.05 in the CONFIDENCE.T function is 1 − 0.95 (our specified probability).

Plot a bar chart

After completing step 4, your bar chart will look similar to the one above.

Figure 6.10b After completing step 4, your bar chart will look similar to the one above.

Add error bars to the chart

Confidence intervals are in ‘error bars’ option in Excel. We will use the calculated width values from step 1 to determine the size of the error bars.

Figure 6.10c Confidence intervals are in ‘error bars’ option in Excel. We will use the calculated width values from step 1 to determine the size of the error bars.

Add error bars to the chart

The positive error value determines the right end of the interval (the value above the mean). The negative value determines the left end of the interval (the value below the mean). Confidence intervals are symmetric, so we use the same values for both.

Figure 6.10d The positive error value determines the right end of the interval (the value above the mean). The negative value determines the left end of the interval (the value below the mean). Confidence intervals are symmetric, so we use the same values for both.

The finished chart

After completing step 9, your chart will look similar to the one above. You can also add horizontal and vertical axis titles and a chart title.

Figure 6.10e After completing step 9, your chart will look similar to the one above. You can also add horizontal and vertical axis titles and a chart title.

1. Using the data for hospitals or schools (AMP_graph_public.csv), using all available countries:
• Create a summary table for the overall management score, with one row for each country. (Hint: Use Excel’s PivotTable option.)
• Plot separate column charts for hospitals and schools. Which (if any) countries have significantly better or significantly worse management practices (on average) than the US?
1. Look at the width of your confidence intervals and relate this to the standard deviation and number of observations. Are confidence intervals generally wider/narrower if the standard deviation is larger? How about if the number of observations is larger? With reference to the ring toss example, explain why we would expect there to be a relationship between the confidence interval width, standard deviation, and number of observations.

Part 6.3 What factors affect the quality of management?

Besides documenting and comparing management practices across industries and countries, another purpose of the World Management Survey was to investigate factors that affect management quality.

One possible factor affecting differences in management is firm ownership. To look at the data for this factor in the healthcare and education sectors, we will focus on broad groups (public vs privately-owned firms), and for manufacturing firms we will focus on different kinds of private ownership.

1. Using the data for hospitals and schools (AMP_graph_public.csv):
• Create a separate PivotTable for hospitals and schools, showing the average management score, standard deviation (StdDev), and number of observations, with ‘country’ as the row variable, and ‘ownership’ (public or private) and ‘ind’ as the column variables.
• Use your PivotTable from Question 1(a) to calculate the confidence interval width for each country and ownership type.
• Plot a bar chart (one for hospitals and another for schools) showing the means from Question 1(a) and the confidence intervals from Question 1(b). Describe the differences between public and private firms within countries and compare management scores for the same firm type across countries. (For example, is one type of firm generally better managed than the other? Are there similar patterns for hospitals and schools? If you have done Question 5 in Part 6.1, you may want to discuss whether the rankings change after conditioning on ownership type.

Besides ownership type, management practices may vary depending on firm size, though it is difficult to predict what the relationship between these variables might be. Larger firms have more employees and could be more difficult to manage well, but may also attract more experienced managers. We will look at the conditional means for manufacturing firms, depending on whether they are above or below the median number of employees (calculated from the data), and see if there is a clear relationship.

1. Using the data for manufacturing firms (AMP_graph_manufacturing.csv):
• In a new column in the original spreadsheet, use Excel’s IF function to create a variable that equals ‘Smaller’ if a firm has less than the median number of employees (330). In natural log terms, this corresponds to log employment of less than 5.80.
• For two countries of your choice and the US, create a PivotTable showing the mean overall management score, standard deviation, and number of observations, with ‘country’ and ‘ownership’ as the row variables, and firm size (from Question 2(a)) as the column variable. (Note: When there is only one observation in a group, there is no standard deviation.)
• Use your PivotTable from Question 2(b) to calculate the confidence interval width for each firm size and ownership type.
• Plot a column chart for each country, showing the means from Question 2(b) and the confidence intervals from Question 2(c). Describe any patterns you observe across ownership types and firm size in each country.

Excel walk-through 6.5 Using Excel’s IF function

How to use Excel’s IF function.

Figure 6.11 How to use Excel’s IF function.

The data

This is what the manufacturing data looks like. We will create a new variable in Column T, according to the log employment values in Column E. We have labelled our new column ‘size’ (cell T1).

Figure 6.11a This is what the manufacturing data looks like. We will create a new variable in Column T, according to the log employment values in Column E. We have labelled our new column ‘size’ (cell T1).

Create a new variable

After completing step 2, you will have a variable for firm size. Excel’s IF function fills the cells in Column T with the word ‘Smaller’ if log employment is smaller than 5.8, otherwise the cell is filled with the word ‘Larger’.

Figure 6.11b After completing step 2, you will have a variable for firm size. Excel’s IF function fills the cells in Column T with the word ‘Smaller’ if log employment is smaller than 5.8, otherwise the cell is filled with the word ‘Larger’.

So far we have looked at correlations between firm characteristics and management practices, but have not made any causal statements. We will now discuss the difficulties with making causal statements using this data and examine how we might determine the direction of causation.

1. For each of the following variables, explain how it could affect management practices, and then explain how management practices could affect it:
• education level of managers (percentage with a college degree)
• number of competitors
• firm size (number of employees).
1. One way to establish the direction of causation is through a randomized field experiment. Read the discussion on pages 22–23 of the Bloom et al. paper (the section ‘Experimental Evidence on Management Quality and Firm Performance’) about one such experiment that was conducted in Indian textile factories.
• Briefly describe the idea behind a randomized field experiment, and explain, with reference to the results of the experiment in India, whether we can use it to determine the direction of causation between management practice and firm performance. The paper ‘Does Management Matter? Evidence from India’ provides more details about the experiment (pages 9–10 are particularly useful).
• Figure 12 in the paper shows productivity in treatment and control firms over time, with 95% confidence intervals. Use the information in the chart to describe the effect of the treatment on firm productivity.