# 12. Government policies and popularity: Hong Kong cash handout Working in Python

Don’t forget to also download the data into your working directory by following the steps in this project.

## Getting started in Python

Visit the ‘Getting Started in Python’ page for help and advice on setting up a Python session to work with. Remember, you can run any page from this book as a notebook by downloading the relevant file from this repository and running it on your own computer. Alternatively, you can run pages online in your browser over at Binder.

### Preliminary settings

Let’s import the packages we’ll need and also configure the settings we want:

import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path
import pingouin as pg
from lets_plot import *
from lets_plot.mapping import as_discrete

LetsPlot.setup_html(no_js=True)


## Part 12.1 Inequality

### Learning objectives for this part

• Draw Lorenz curves and calculate Gini coefficients.
• Assess the effect of a policy on income inequality.
• Convert nominal values to real values (extension).

One reason cited for Scheme $6,000 was to share the gains from economic growth among everyone in the society. We will be using household income data collected by the Hong Kong government to assess the potential effects of this scheme. Download the data. The spreadsheet contains information about household incomes for certain percentiles of the population. These incomes are ‘pre-intervention’, meaning that they do not include the effects of handouts or policy interventions from the government. (If you are curious about the difference between the two tables in this spreadsheet, see the extension section ‘Nominal and real values’ at the end of Part 12.1.) 1. Using the table ‘Monthly real household income (pre-tax,$HKD)’, plot a separate line chart for each percentile, with year on the horizontal axis and income on the vertical axis. Describe any patterns you see over time.

### Python walk-through 12.1 Importing a specified range of data from a spreadsheet

We start by importing the data; you will need to download it and put it in a subfolder of your working directory called ‘data’. The data provided in the Excel spreadsheet is not in the usual format of one variable per column (known as ‘long’ format). Instead, the first tab contains two separate tables, and we need the second table. We can therefore use the skiprows= keyword argument of pandaspd.read_excel function to specify the cells in the spreadsheet to import. (Note that variable headers for the years are included.)

income = pd.read_excel(
Path("data/doing-economics-project-12-datafile.xlsx"), skiprows=12
)

income

Unnamed: 0 2009 2010 2011 2012 2013 2014 2015 2016
0 85th 43,300.0 43,945.31250 44,515.669516 44,544.177789 45,270.209450 45,162.212854 47,659.574562 47,085.549738
1 75th 31,000.0 31,250.00000 32,273.860399 32,517.249786 34,166.195812 33,298.950420 34,791.489430 34,906.914962
2 50th 17,400.0 17,578.12500 17,806.267806 17,817.671116 18,620.576717 18,490.326277 19,063.829825 19,392.730534
3 25th 8,000.0 8,203.12500 8,346.688034 8,819.747202 8,541.548953 8,590.638315 8,737.588670 8,532.801435
4 15th 4,500.0 4,394.53125 4,637.048908 4,454.417779 4,356.189966 4,090.780150 3,971.631213 3,878.546107
5 NaN NaN NaN NaN NaN NaN NaN NaN NaN
6 Note: Nominal values rounded to the nearest hu... NaN NaN NaN NaN NaN NaN NaN NaN

This dataframe still has some rows that we don’t need for our analysis (those that are entirely filled with NaNs), so we need to get rid of them. There are multiple ways to do this depending on the context.

In this case, the two rows we don’t want contain a large number of invalid entries, so we can make use of the row-wise .dropna() method to clean up the data. While doing this task, we will also rename the first column (Unnamed: 0) to something more relevant.

income = income.dropna(axis=0).rename(columns={"Unnamed: 0": "Percentile"})
income

Percentile 2009 2010 2011 2012 2013 2014 2015 2016
0 85th 43,300.0 43,945.31250 44,515.669516 44,544.177789 45,270.209450 45,162.212854 47,659.574562 47,085.549738
1 75th 31,000.0 31,250.00000 32,273.860399 32,517.249786 34,166.195812 33,298.950420 34,791.489430 34,906.914962
2 50th 17,400.0 17,578.12500 17,806.267806 17,817.671116 18,620.576717 18,490.326277 19,063.829825 19,392.730534
3 25th 8,000.0 8,203.12500 8,346.688034 8,819.747202 8,541.548953 8,590.638315 8,737.588670 8,532.801435
4 15th 4,500.0 4,394.53125 4,637.048908 4,454.417779 4,356.189966 4,090.780150 3,971.631213 3,878.546107

Let’s make a quick plot of this data. We’ll need to re-orient the data if we want to use the lets_plot plotting package (because it expects long-format data). We’ll use pd.melt to do this re-orientation.

income_melted = pd.melt(
income, id_vars="Percentile", value_vars=range(2009, 2016), var_name="Year"
)

(
ggplot(income_melted, aes(x=as_discrete("Year"), y="value", color="Percentile"))
+ geom_line(size=2)
+ labs(y="Monthly real household income (HKD)")
+ scale_x_continuous(format="d")
)


Figure 12.1 Monthly real household income over time.

However, plotting all percentile groups using the same scale hides much of the variation within each percentile group. So, we should create a separate chart for each percentile group. As an example, we will plot the chart for the 15th percentile.

First, we use the pd.melt function to reshape the data into the format that lets_plot uses to plot charts. Then we use the .loc function to select data for the 15th percentile only. Finally, we make the line chart as before.

perct_to_use = "15th"

(
ggplot(
income_melted.loc[income_melted["Percentile"] == perct_to_use, :],
aes(x=as_discrete("Year"), y="value"),
)
+ geom_line(size=1)
+ geom_point(size=3)
+ labs(
title=f"Monthly real household income for {perct_to_use} percentile",
y="Household income (HKD)",
)
+ scale_x_continuous(format="d")
)


Figure 12.2 Monthly real household income for the 15th percentile.

Now we will use this data to draw Lorenz curves and compare changes in the income distribution for 2011–2012. One way to do this is to make the following simplifying assumptions:

• There are 100 households in the economy (so we can think of each percentile as corresponding to one household).
• Households between the 15th and 25th percentile have the same income as the household in the 15th percentile, households between the 25th and 50th percentile have the same income as the household in the 25th percentile, and so on. (Households below the 15th percentile earn nothing.)
1. Draw Lorenz curves for 2011 and 2012 by carrying out the following:
• Create a new column for 2012 only, showing annual incomes following the $6,000 handout. Assume that nothing else has changed except for the cash handout. (Remember to convert from monthly to annual data, and remember that the cash handout was given to all households, including those with no income.) • Calculate the annual economy-wide earnings in 2011 and 2012 (with the$6,000 cash handout included). (Hint: Multiply the annual income of a given percentile by the number of households assumed to earn that amount.)
• Use your answers to Question 2(b) to complete the table in Figure 12.3 below. (The second row also shows zeros in 2011 because the bottom 15% of households earned nothing.) For help on how to calculate cumulative shares, see Python walk-through 5.1.
Cumulative share of the population (%) Perfect equality line Cumulative share of income in 2011 (%) Cumulative share of income in 2012 (%)
0 0 0 0
15 15 0
25 25
50 50
75 75
85 85
100 100 100 100

Figure 12.3 Cumulative share of income, for some percentiles of the population.

• Draw the Lorenz curves for 2011 and 2012 in the same chart, with cumulative share of population on the horizontal axis and cumulative share of income on the vertical axis. Add the perfect equality line to your chart, and add a chart legend.

### Python walk-through 12.2 Calculating cumulative income shares and plotting a Lorenz curve

Questions 2(a)–(c) can be completed in a few steps. To do them, we’re going to use several techniques.

First, we use the .loc method to get the relevant variables. Remember that the syntax is .loc[rows, columns], so we can create a list of just the columns we want to work with and use : to get all rows. Then we’re going to use the assign function. This function allows us to change dataframes in-line, that is, without having to disrupt the flow, and within a chained set of methods. As it’s an important point, let’s take a quick look at this idea in more detail.

Say we had a dataframe, df, with a column number that just comprised the numbers from 1 to 10. Imagine we wanted to create a second column called number_add_one, which comprises the numbers 2 to 11. There are two ways to do this task using df. The first is to explicitly create the new column:

df["number_add_one"] = df["number"] + 1


The second is to use assign:

df = (
df.assign(
)
)


Neither of these ways is inherently better than the other, they’re just different. However, if you’re doing many steps in a row, multiple assign statements can make the code more readable. As a reminder, lambda functions are functions that you don’t have to give a name to. They work by defining a variable, by convention x, but you could call it anything. In the case of a dataframe assign statement, writing lambda x: means that x stands in for the dataframe itself, so that writing lambda x: x["number"] refers to the ‘number’ column in the dataframe x.

Returning to the task at hand: we’re first going to remove the ‘th’ suffix in the percentile column values using .str.split("th", expand=True)[0] to split all the strings, expand the resulting list into two columns and, with [0], only take the first part (before the ‘th’). The variable containing percentiles is then cast as an integer type.

cols_of_interest = ["Percentile", 2011, 2012]

percentiles = income.loc[:, cols_of_interest].assign(
Percentile=lambda x: x["Percentile"].str.split("th", expand=True)[0].astype("int")
)
percentiles

Percentile 2011 2012
0 85 44,515.669516 44,544.177789
1 75 32,273.860399 32,517.249786
2 50 17,806.267806 17,817.671116
3 25 8,346.688034 8,819.747202
4 15 4,637.048908 4,454.417779

In the next step, we’re going to concatenate a dataframe with three zeros to form the zeroth percentile entries for 2011 and 2012: we need this step to ensure our data includes 0.

percentiles = pd.concat(
[
percentiles,
pd.DataFrame([[0, 0, 0]], columns=cols_of_interest, index=[5], dtype="float"),
],
axis=0,
).sort_values(by="Percentile")
percentiles

Percentile 2011 2012
5 0.0 0.000000 0.000000
4 15.0 4,637.048908 4,454.417779
3 25.0 8,346.688034 8,819.747202
2 50.0 17,806.267806 17,817.671116
1 75.0 32,273.860399 32,517.249786
0 85.0 44,515.669516 44,544.177789

Next, we’re going to do a number of operations on the columns using .assign, including:

• adding a column representing the number of households in each percentile group (assuming 100 households in the economy)
• creating a new variable called handout_2012 that adds $6,000 to each value for the year 2012 • converting monthly to annual data (multiplying the 2011 and 2012 values by 12) • adding the economy-wide income for each percentile group, derived from multiplying the income values by the number of households, and storing these in the variables income_2011 and income_2012 • creating the normalized cumulative income (as a percentage) for each income group. percentiles = percentiles.assign( households=[15, 10, 25, 25, 10, 15], handout_2012=lambda x: x[2012]*12 + 6000, handout_2011=lambda x: x[2011]*12, income_2011=lambda x: x["handout_2011"] * x["households"], income_2012=lambda x: x["handout_2012"] * x["households"], rel_share_2011=lambda x: 100 * x["handout_2011"].cumsum() / x["handout_2011"].sum(), rel_share_2012=lambda x: 100 * x["handout_2012"].cumsum() / x["handout_2012"].sum(), Percentile=lambda x: x["households"] + x["Percentile"], ).loc[:, ["Percentile", "rel_share_2011", "rel_share_2012"]] percentiles  Percentile rel_share_2011 rel_share_2012 5 15.0 0.000000 0.427639 4 25.0 2.742732 3.252577 3 50.0 15.085025 16.537542 2 75.0 41.415250 42.648724 1 85.0 60.504663 61.474686 0 100.0 100.000000 100.000000 Finally, we tidy up the data by ensuring that the 0th percentile has no share of the income and the 100th percentile has 100% of the cumulative income. We do this by adding a row of zeros by concatenating another empty dataframe to percentiles. Finally, we sort by the values again. income_shares = pd.concat( [ percentiles, pd.DataFrame( [[0, 0, 0]], columns=["Percentile", "rel_share_2011", "rel_share_2012"], index=[6], ), ], axis=0, ).sort_values(by="Percentile") income_shares.round(2)  Percentile rel_share_2011 rel_share_2012 6 0.0 0.00 0.00 5 15.0 0.00 0.43 4 25.0 2.74 3.25 3 50.0 15.09 16.54 2 75.0 41.42 42.65 1 85.0 60.50 61.47 0 100.0 100.00 100.00 With the data from Questions 2(a)–(c), we can plot the Lorenz curve using the lets_plot package. Note that we use the Percentile variable to draw the line of perfect equality. We need to have data in long format for this, so we begin by doing a melt: long_incomes_shares = pd.melt( income_shares, id_vars="Percentile", var_name="year", value_name="Cumulative share of income (%)", )  # rename the legend labels long_incomes_shares = long_incomes_shares.replace({"rel_share_2011": "2011", "rel_share_2012": "2012"}) ( ggplot( long_incomes_shares, aes(x="Percentile", y="Cumulative share of income (%)", color="Year"), ) + geom_abline(slope=1, color="black", linetype=2, alpha=0.7) + geom_line(size=1) + geom_point(size=3) + labs(x="Cumulative share of the population (%)") )  Figure 12.4 Lorenz curves (2011 and 2012). Now we will compare the Gini coefficients in 2011, 2012, and 2013 (a year after the policy came into effect). 1. Calculate the Gini coefficient by carrying out the following: • Create a table as in Figure 12.5, and fill in the remaining values (some values for 2011 have been filled in for you). The first column should contain the numbers 1 to 100, in intervals of 1, and the remaining columns should contain the annual incomes earned by a household at that percentile (using the same assumption as in Question 2). Remember that the 2012 data should include the cash handout. To obtain more accurate answers, use two decimal places instead of rounding incomes to the nearest dollar (where relevant). Percentile 2011 2012 2013 1 0.00 2 0.00 3 0.00 98 534,188 99 534,188 100 534,188 Figure 12.5 Incomes earned by each percentile of the population. • Using the ineq function, calculate the Gini coefficient for each year. • Based on the Gini coefficients from Question 3(b), what effect did the$6,000 handout appear to have on income inequality in the short run, and in the long run? Suggest some explanations for what you observe.

### Python walk-through 12.3 Generating Gini coefficients

#### Create a table containing percentiles

To create the 2011, 2012, and 2013 percentiles for every percentile in our 100-household economy, we need to take the income for each percentile group and expand that for every household in the respective percentile group. For example, there are 15 households in the bottom percentile group that have zero income for 2011 and 2013, and $6,000 in 2012. For the 15th percentile group, there are 15 households that will share the same income value, and so on for the other percentile groups. To achieve this expansion, there are a few steps. First, we’re going to select the years we’re interested in and add in a row of zeros (for the zero percentiles). Then we’ll convert monthly to annual incomes by multiplying all values by 12. Note that the values in the dataframe are incomes. years_of_interest = [2011, 2012, 2013] raw_percentiles = income.loc[:, years_of_interest].sort_values(2011) raw_percentiles[2011] = raw_percentiles[2011]*12 raw_percentiles[2012] = raw_percentiles[2012]*12 raw_percentiles[2013] = raw_percentiles[2013]*12 raw_percentiles = pd.concat( [ pd.DataFrame( [[0, 0, 0]], columns=years_of_interest, index=[6], ), raw_percentiles, ], ).reset_index(drop=True) raw_percentiles  2011 2012 2013 0 0.000000 0.000000 0.000000 1 55,644.586895 53,453.013347 52,274.279592 2 100,160.256410 105,836.966428 102,498.587435 3 213,675.213675 213,812.053389 223,446.920608 4 387,286.324786 390,206.997436 409,994.349740 5 534,188.034188 534,530.133473 543,242.513405 Next, we create a list of the numbers of households in each group (households = [15, 10, 25, 25, 10, 15]) that will be matched with the six different percentiles we have available in the raw_percentiles dataframe. Then we use two for loops: • An inner for loop goes over each number in the households list above (the variable is called num_hhds), takes the corresponding income value (num_income) and repeats it num_hhds times. We achieve this repetition using np.repeat from the numpy package. Each one of these repetitions is concatenated until we have a dataframe with an income value for all 100 percentiles for a specific year. • An outer for loop that goes over the years. All the results from both dataframes are combined together. We do a couple of other useful things too: • hh_percentiles_by_year.index % 100 + 1 creates the percentiles from 1 to 100. The % symbol represents the modulo operator which gives the remainder after division. • For 2012, we add in the 6000 HKD extra payment. Note that pandas uses 0 as its default column name, which is why it appears below. households = [15, 10, 25, 25, 10, 15] hh_percentiles = pd.DataFrame() for year in raw_percentiles.columns: hh_percentiles_by_year = pd.DataFrame() for i, num_hhds in enumerate(households): num_income = raw_percentiles.loc[i, year] hh_percentiles_by_year = pd.concat( [hh_percentiles_by_year, pd.DataFrame(np.repeat(num_income, num_hhds))], axis=0, ) hh_percentiles_by_year["year"] = year hh_percentiles_by_year = hh_percentiles_by_year.reset_index(drop=True) hh_percentiles_by_year["percentile"] = hh_percentiles_by_year.index % 100 + 1 if year == 2012: hh_percentiles_by_year[0] = hh_percentiles_by_year[0] + 6000 hh_percentiles = pd.concat([hh_percentiles, hh_percentiles_by_year], axis=0) hh_percentiles.tail()  0 year percentile 95 543,242.513405 2013 96 96 543,242.513405 2013 97 97 543,242.513405 2013 98 98 543,242.513405 2013 99 99 543,242.513405 2013 100 We can now pivot the table we created to have columns as years and percentiles as the index: hh_percentiles = hh_percentiles.pivot(columns="year", index=["percentile"], values=0) hh_percentiles  year 2011 2012 2013 percentile 1 0.000000 6,000.000000 0.000000 2 0.000000 6,000.000000 0.000000 3 0.000000 6,000.000000 0.000000 4 0.000000 6,000.000000 0.000000 5 0.000000 6,000.000000 0.000000 ... ... ... ... 96 534,188.034188 540,530.133473 543,242.513405 97 534,188.034188 540,530.133473 543,242.513405 98 534,188.034188 540,530.133473 543,242.513405 99 534,188.034188 540,530.133473 543,242.513405 100 534,188.034188 540,530.133473 543,242.513405 Now we can compute the Gini coefficients for the different years. First we define a function that computes a Gini, given a set of values. (Here we use the exact formula for the Gini coefficient.) def gini_coefficient(x): """Compute Gini coefficient of array of values""" x = np.double(x.values) x = x / x.sum() # Mean absolute difference mad = np.abs(np.subtract.outer(x, x)).mean() # Relative mean absolute difference rmad = mad / np.mean(x) # Gini coefficient g = 0.5 * rmad return g  Second, we apply this function to each column separately (using axis=0) to get our Gini values. hh_percentiles.apply(lambda x: gini_coefficient(x), axis=0)  year 2011 0.468760 2012 0.451902 2013 0.469819 dtype: float64  1. In our analysis we assumed that the$6,000 handout was the only policy that affected households in 2012. In reality a household’s disposable income will also depend on taxes and transfers. Without doing additional calculations, explain what would happen to the shape of the Lorenz curve and inequality in 2012 if:
• households in and below the 15th percentile received cash transfers from the government
• households in and above the 75th percentile had to pay income tax.

### Extension Nominal and real values

In this extension section, we will discuss how the table ‘Monthly nominal household income (pre-tax, $HKD)’ taken from the Hong Kong Poverty Situation Report 2016 was used to create the table ‘Monthly real household income (pre-tax,$HKD)’, and why we needed to make this conversion.

inflation
An increase in the general price level in the economy. Usually measured over a year. See also: deflation, disinflation.

The difference between real and nominal income is that real income takes inflation into account. You may be familiar with the concept of inflation, which is an increase in the general price level in the economy. Usually inflation is measured by taking a fixed bundle of goods and services and looking at how much it would cost to buy that bundle, compared to a reference year. (For more details about real versus nominal variables, see the Einstein ‘Comparing income at different times, and across different countries’ in Section 1.2 of The Economy.) If the bundle has become more expensive, then we conclude that the price level in the economy has increased.

In this case, the values from 2010 onwards have been adjusted to account for the fact that prices have increased since 2009, so the same income would be able to purchase fewer goods and services. Without making this adjustment, we would conclude that households in the 15th percentile had the same purchasing power in 2009 and 2010, when in fact they do not as they can buy fewer goods in 2010 because of the overall price increase.

1. Convert nominal values to real values, using 2009 as the reference year:
• To understand what happens to a given nominal value over time due to inflation, create a table as in Figure 12.6, and fill it in according to the percentage increases shown. (These percentages were taken from the Monthly Digest of Statistics.) (For example, $1 in 2009 would be$1 × (1 + (2.40/100)) = $1.024 in 2010.) For greater accuracy, round your answers to three decimal places. With a starting value of$1 in 2009, what would the value be in 2016?
Year Percentage increase (from previous year) Inflation index
2009   1.000
2010 2.40 1.024
2011 5.30
2012 4.10
2013 4.30
2014 4.40
2015 3.00
2016 2.40

Figure 12.6 Creating an index-based series from percentage increases.

• Use this table to convert nominal incomes to real incomes by dividing the nominal income by the corresponding value in the third column (for example, divide nominal incomes in 2010 by 1.024 to get the value in 2009 terms). You should get the same values as in the ‘real household income’ table.

### Extension Python walk-through 12.4 Converting nominal incomes to real incomes

To obtain the real income values, we need to divide the income for each percentile group by the inflation index created in Question 5(a). Recall that we have imported only the real income data from the Excel spreadsheet and not the nominal income data, so we will first import the nominal income data (nom_income). Note that, in the code below, the inflation index is entered as a vector (list of numbers) called inflation (with the same number of elements as the number of years in the data) and this is multiplied (element-wise) by each row of the income data using the todo method.

# Import the nominal income data, drop everything beyond first table,
# transpose, and set Percentile as the index
nom_income = (
Path("data/doing-economics-project-12-datafile.xlsx"),
skiprows=2,
)
.loc[:4, :]
.set_index("Percentile")
.T
)

nom_income

Percentile 85th 75th 50th 25th 15th
2009 43,300.0 31,000.0 17,400.0 8,000.0 4,500.0
2010 45,000.0 32,000.0 18,000.0 8,400.0 4,500.0
2011 48,000.0 34,800.0 19,200.0 9,000.0 5,000.0
2012 50,000.0 36,500.0 20,000.0 9,900.0 5,000.0
2013 53,000.0 40,000.0 21,800.0 10,000.0 5,100.0
2014 55,200.0 40,700.0 22,600.0 10,500.0 5,000.0
2015 60,000.0 43,800.0 24,000.0 11,000.0 5,000.0
2016 60,700.0 45,000.0 25,000.0 11,000.0 5,000.0

Now, we multiply this data with the inflation data, taking the first element of the inflation data for the first row (excluding the year), the second for the second, and so on. Note that we actually need to divide by the inflation numbers, rather than multiply, which we do with a list comprehension. Finally, we transpose back to the original data shape.

nom_income is a dataframe with the nominal income observations (years in rows and percentiles in columns). We wish to divide all 2009 data by the first price index number (1.0), all 2010 observations by the 2nd (1.024), and so forth. If your dataframe consists of numerical data only, then this can be achieved using the .div method. As we want to divide each of the eight rows by a different number, we need to feed in eight different numbers. This is done by feeding in the inflation list and specifying that we want to apply these one for each row (axis = 0).

inflation = [1, 1.024, 1.078, 1.122, 1.171, 1.222, 1.259, 1.289]
nom_income = nom_income.div(inflation, axis=0).T
nom_income.round(2)

2009 2010 2011 2012 2013 2014 2015 2016
Percentile
85th 43,300.0 43,945.31 44,526.90 44,563.28 45,260.46 45,171.85 47,656.87 47,090.77
75th 31,000.0 31,250.00 32,282.00 32,531.19 34,158.84 33,306.06 34,789.52 34,910.78
50th 17,400.0 17,578.12 17,810.76 17,825.31 18,616.57 18,494.27 19,062.75 19,394.88
25th 8,000.0 8,203.12 8,348.79 8,823.53 8,539.71 8,592.47 8,737.09 8,533.75
15th 4,500.0 4,394.53 4,638.22 4,456.33 4,355.25 4,091.65 3,971.41 3,878.98

## Part 12.2 Government popularity

### Learning objectives for this part

• Assess the effect of a policy on government popularity.

One possible reason why the government implemented Scheme $6,000 was to gain public approval, since there was some pressure on the government to spend the surplus on alleviating current social issues rather than reinvesting it (for example, in pension schemes). We will use a public opinion poll conducted by the University of Hong Kong to assess whether this scheme could have improved public satisfaction with the government. 1. Think about the groups who would be affected by this scheme (for example, the government or members of the public). Who would be likely to support or oppose this scheme, and why? Download the data: ### Python walk-through 12.5 Importing data from Excel We have downloaded and saved the data with the filename ‘datatables.xlsx’. From inspecting the Excel file, the data starts in row 2. We will use pd.read_excel to import the rows we need, and store this table in a dataframe called overall. overall = pd.read_excel( Path("data/datatables.xlsx"), skiprows=1 ) overall  調查開始日期Survey Start Date 調查結束日期Survey End Date 成功樣本Successful cases 次樣本人數Subsample 回應率Response rate 好滿意Very satisfied 幾滿意Quite satisfied 滿意Satisfied 一半半Half‐half 幾唔滿意Quite dissatisfied 好唔滿意Very dissatisfied 唔滿意Dissatisfied 唔知/難講DK/HS 總數Total 淨值Netvalue 平均量值Mean value 基數Base 樣本平均標準誤差Standard Error of Mean 0 2023‐12‑01 00:00:00 2023‑12‑07 502.0 502.0 0.487 0.126 0.356 0.482 0.175 0.131 0.195 0.327 0.017 1.0 0.156 3.1 492.0 0.1 1 2023‑11‑01 00:00:00 2023‑11‑15 1,004.0 518.0 0.514 0.109 0.316 0.425 0.139 0.180 0.245 0.425 0.011 1.0 0.000 2.9 510.0 0.1 2 2023‑10‑03 00:00:00 2023‑10‑25 1,003.0 493.0 0.536 0.103 0.334 0.438 0.139 0.177 0.237 0.414 0.009 1.0 0.023 2.9 482.0 0.1 3 2023‑09‑07 00:00:00 2023‑09‑19 1,001.0 512.0 0.530 0.138 0.306 0.444 0.158 0.170 0.208 0.378 0.020 1.0 0.066 3.0 502.0 0.1 4 2023‑08‑01 00:00:00 2023‑08‑10 1,005.0 519.0 0.545 0.107 0.315 0.422 0.164 0.183 0.220 0.403 0.011 1.0 0.019 2.9 513.0 0.1 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 397 1997‑10‑28 00:00:00 1997‐10‐29 536.0 536.0 0.445 0.019 0.351 0.370 0.401 0.160 0.019 0.179 0.050 1.0 0.191 3.2 509.0 0.1 398 1997‑09‑23 00:00:00 1997‑09‑24 512.0 512.0 0.416 0.014 0.475 0.489 0.304 0.123 0.008 0.131 0.076 1.0 0.358 3.4 471.0 0.1 399 1997‑08‑26 00:00:00 1997‑08‑27 532.0 532.0 0.424 0.030 0.413 0.443 0.287 0.119 0.011 0.130 0.140 1.0 0.313 3.4 456.0 0.1 400 1997‑07‑29 00:00:00 1997‑07‑30 564.0 564.0 0.397 0.016 0.380 0.396 0.245 0.153 0.018 0.171 0.189 1.0 0.225 3.3 458.0 0.1 401 #2017年9月前以「整體回應比率」顯示，之後以「實效回應比率」顯示。#“Overall r... NaT NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN The data has been imported correctly: Python recognized the start and end dates as dates (they are even formatted with a timestamp), and all other variables as numeric (num). However, the variable names are quite long and include Chinese characters, so let’s rename the variables we need (survey start date and net value) to English-only variables. We’ll also remove the last row of the dataframe as it only contains missing values. overall = overall.dropna(axis=0).rename(columns={"調查結束日期Survey End Date": "Survey date", "淨值Netvalue": "Net value"}) overall.info()  <class 'pandas.core.frame.DataFrame'> Index: 401 entries, 0 to 400 Data columns (total 18 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 調查開始日期Survey Start Date 401 non-null object 1 Survey date 401 non-null datetime64[ns] 2 成功樣本Successful cases 401 non-null float64 3 次樣本人數Subsample 401 non-null float64 4 回應率Response rate 401 non-null float64 5 好滿意Very satisfied 401 non-null float64 6 幾滿意Quite satisfied 401 non-null float64 7 滿意Satisfied 401 non-null float64 8 一半半Half-half 401 non-null float64 9 幾唔滿意Quite dissatisfied 401 non-null float64 10 好唔滿意Very dissatisfied 401 non-null float64 11 唔滿意Dissatisfied 401 non-null float64 12 唔知/難講DK/HS 401 non-null float64 13 總數Total 401 non-null float64 14 Net value 401 non-null float64 15 平均量值Mean value 401 non-null float64 16 基數Base 401 non-null float64 17 樣本平均標準誤差Standard Error of Mean 401 non-null float64 dtypes: datetime64[ns](1), float64(16), object(1) memory usage: 59.5+ KB  1. Read the HKU POP survey methods page for a description of how the survey data was collected. Explain whether you think the sample is representative of the target population, and discuss some limitations of the survey method. ### Python walk-through 12.6 Plotting line charts The data is already in the correct format, but the variable we want to plot (Net value) is expressed in proportions instead of percentages. We will convert this variable to percentages and then use ggplot to make the line chart. overall["Net value (%)"] = overall["Net value"]*100 ( ggplot(overall[(overall["Survey date"] > "2006-1-20")], aes(x="Survey date", y="Net value (%)")) + geom_line(size=1) + labs(x="Survey date", y="Net satisfaction (% positive minus % negative)", title="Overall satisfication with HKSAR Government") + scale_x_datetime() )  Figure 12.7 Net public satisfaction with the government’s performance over time. For Question 3(b), in this example we use the variable ‘Appraisal of degree of prosperity’. We have downloaded and saved this data with the filename ‘datatables_part2.xlsx’. We will import it into Python as the dataframe prosperity. prosperity = pd.read_excel( Path("data/datatables_part2.xlsx"), skiprows=1 ) prosperity  調查開始日期Survey Start Date 調查結束日期Survey End Date 樣本人數Total Sample 次樣本人數Sub Sample 回應率Response Rate 繁榮Prosperity 標準誤差Standard Error 評分人數Number of Raters 認知率Recognition Rate 備註Remark 0 2023‑08‑01 00:00:00 2023‑08‑10 1,005.0 492.0 0.545 5.42 0.11 487.0 0.990 NaN 1 2023‑05‑03 00:00:00 2023‑05‑18 1,003.0 503.0 0.524 5.87 0.13 500.0 0.995 NaN 2 2023‑02‑01 00:00:00 2023‑02‑09 1,017.0 528.0 0.580 5.83 0.10 519.0 0.983 NaN 3 2022‑11‑02 00:00:00 2022‑11‑10 1,001.0 519.0 0.489 5.09 0.10 515.0 0.991 NaN 4 2022‑08‑01 00:00:00 2022‑08‑09 1,004.0 510.0 0.586 5.52 0.09 501.0 0.983 NaN ... ... ... ... ... ... ... ... ... ... ... 99 1997‑09‑23 00:00:00 1997‑09‑24 511.0 511.0 0.416 7.89 0.06 497.0 0.973 NaN 100 1997‑08‑26 00:00:00 1997‑08‑27 532.0 532.0 0.424 7.86 0.06 513.0 0.964 NaN 101 1997‑07‑29 00:00:00 1997‑07‑30 564.0 564.0 0.397 7.86 0.07 540.0 0.957 NaN 102 1997‑06‑25 00:00:00 1997‑06‑25 548.0 548.0 0.467 8.05 0.06 526.0 0.960 NaN 103 * 該調查結果公佈時尚未包括手機樣本。上表結果已更新為固網樣本及手機樣本的合併統計數字。* ... NaT NaN NaN NaN NaN NaN NaN NaN NaN prosperity = prosperity.rename(columns={"調查結束日期Survey End Date": "Survey date", "繁榮Prosperity": "Prosperity"}) prosperity.info()  <class 'pandas.core.frame.DataFrame'> RangeIndex: 104 entries, 0 to 103 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 調查開始日期Survey Start Date 104 non-null object 1 Survey date 103 non-null datetime64[ns] 2 樣本人數Total Sample 103 non-null float64 3 次樣本人數Sub Sample 103 non-null float64 4 回應率Response Rate 103 non-null float64 5 Prosperity 103 non-null float64 6 標準誤差Standard Error 103 non-null float64 7 評分人數Number of Raters 103 non-null float64 8 認知率Recognition Rate 103 non-null float64 9 備註Remark 2 non-null object dtypes: datetime64[ns](1), float64(7), object(2) memory usage: 8.3+ KB  For Question 3(c), we repeat the steps taken for 3(a). The Prosperity variable is measured on a 0–10 scale, where higher numbers indicate stronger beliefs that Hong Kong is a prosperous society. Figure 12.8 Belief that Hong Kong is a prosperous society (0–10 scale). 1. Assess public satisfaction with the government by carrying out the following: • Make a line chart with overall public satisfaction (net value, which is the difference between percentage of positive and negative responses) on the vertical axis, and time (Jan 2006 to the latest month available) on the horizontal axis. Comment on any trends in overall public satisfaction over this time period. • Go to the POP polls Social Indicators data page and choose one or two other indicators that are directly related to the policy (for example, improving people’s Degree of prosperity or Degree of equality). Click the ‘Download Excel’ button above the data table to download the data in Excel format. Import the data into Python, and reformat the variable of interest as in Question 2. • For each of your chosen indicators, make a separate line chart as in Question 3(a), with the dates ranging from the first entry of 2006 to the latest period available, and comment on any similarities to or differences from the chart in Question 3(a). (Since some indicators may be measured on a different scale, focus on changes over time.) • Do you think the scheme had the intended effect on government popularity? Besides the scheme, what other factors or events could explain the observed patterns? 1. In 2018, the government decided to do another cash handout. Read the article ‘Hong Kong cash handout scheme will cost government HK$330 million to administer’ and discuss how this scheme differs from the 2011 scheme. Explain whether you think this policy is an improvement over the 2011 scheme.
1. Suppose you are a policymaker in a developed country with a large budget surplus, and one of the government’s aims is to reduce income inequality. Would you recommend that the government implement a scheme similar to either the 2011 or 2018 scheme? If you recommend a cash handout, suggest some modifications that could make the scheme more effective. If not, suggest other policies that may be more effective in reducing inequality. You may find it helpful to research policies aimed at reducing income inequality, for example Universal Basic Income, which some countries have tried.