Research Domain and Question¶
Region: State of North Carolina, USA Domain: Economic Statistics
Main Research Question: When adjusted for inflation using the BLS CPI, did the per capita income of residents of Buncome County, North Carolina increase or decrease between 1970 and 2020?
Secondary Research Question: How does the inlfation adjusted per capita income of Buncombe County residents compare to the average per capita income of residents in all counties of North Carolina, in the same period?
Income in North Carolina¶
The data I have elected to use for income statistics in North Carolina comes from the website of the North Carolina Office of State Management and Budget. Specifically, I have examined data from the following dataset:
Consumer Price Index (CPI) Data¶
I found a table with historical CPI data, used for calculating inflation effects, on the US Inflation Calculator website. To retrieve the table, I first saved the HTML, then imported that into Excel, cleaned up the sheet outside of the table, and then imported the Excel sheet into Pandas.
Following best practices in economics, I use the Average Annual CPI to compute inflation, calculating a multiplier which I used to convert historical income amounts into their equivalent in "2020 dollars".
import os
def prindex(alist):
for i, x in enumerate(alist):
print(f"{i}: {x}")
files = os.listdir()
prindex(files)
0: education.csv 1: educational-attainment-for-the-population-b15003.csv 2: USBLS_Annual_Inflation_Rates_1970-2020.xlsx 3: Consumer Price Index Data from 1913 to 2024.xlsx 4: Consumer Price Index Data from 1913 to 2024.html 5: Assignment_04_Income_and_Education_in_NC.ipynb 6: North_Carolina_Income_Change_1970_to_2020.png 7: employment-and-income-linc.csv 8: .ipynb_checkpoints
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
emp_inc_df = pd.read_csv(files[7], sep=";")
cpi_df = pd.read_excel(files[3])
Calculating Historical Inflation Multipliers based on CPI Data¶
years = [1970, 1980, 1990, 2000, 2010, 2020]
CPI_census_years = cpi_df[(cpi_df["Year"].isin(years))].sort_values(by="Year", ascending=True).reset_index(drop=True)
CPI_census_years
Year | Jan | Feb | Mar | Apr | May | June | July | Aug | Sep | Oct | Nov | Dec | Annual Avg | Dec-Dec | Avg-Avg | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1970 | 37.800 | 38.000 | 38.200 | 38.500 | 38.600 | 38.800 | 39.000 | 39.000 | 39.200 | 39.400 | 39.600 | 39.800 | 38.800 | 5.6 | 5.7 |
1 | 1980 | 77.800 | 78.900 | 80.100 | 81.000 | 81.800 | 82.700 | 82.700 | 83.300 | 84.000 | 84.800 | 85.500 | 86.300 | 82.400 | 12.5 | 13.5 |
2 | 1990 | 127.400 | 128.000 | 128.700 | 128.900 | 129.200 | 129.900 | 130.400 | 131.600 | 132.700 | 133.500 | 133.800 | 133.800 | 130.700 | 6.1 | 5.4 |
3 | 2000 | 168.800 | 169.800 | 171.200 | 171.300 | 171.500 | 172.400 | 172.800 | 172.800 | 173.700 | 174.000 | 174.100 | 174.000 | 172.200 | 3.4 | 3.4 |
4 | 2010 | 216.687 | 216.741 | 217.631 | 218.009 | 218.178 | 217.965 | 218.011 | 218.312 | 218.439 | 218.711 | 218.803 | 219.179 | 218.056 | 1.5 | 1.6 |
5 | 2020 | 257.971 | 258.678 | 258.115 | 256.389 | 256.394 | 257.797 | 259.101 | 259.918 | 260.280 | 260.388 | 260.229 | 260.474 | 258.811 | 1.4 | 1.2 |
CPI = CPI_census_years["Annual Avg"]
CPI = list(CPI)
CPI_Multipliers = [CPI[5] / multiplier for multiplier in CPI]
print(CPI, CPI_Multipliers)
[38.8, 82.4, 130.7, 172.2, 218.056, 258.811] [6.670386597938144, 3.1409101941747566, 1.9801912777352717, 1.5029674796747967, 1.1869015298822319, 1.0]
print(f"""
CPI: {CPI}
CPI_Multipliers: {CPI_Multipliers}
""")
CPI: [38.8, 82.4, 130.7, 172.2, 218.056, 258.811] CPI_Multipliers: [6.670386597938144, 3.1409101941747566, 1.9801912777352717, 1.5029674796747967, 1.1869015298822319, 1.0]
Employment and Income Data¶
The employment and income LINC database contains 1,177,506 rows, 904,374 of which provide data for North Carolina counties ("Area Type": "County").
It contains data spanning 1969 to 2023. It combines US census data, economic statistics and much more. The dataset provides data for an astonishing 439 different variable types, of which I am interested in only one:
"Per Capita Money Income (Census)"
First I isolate County data on Per Capita Income for the years 1970, 1980, 1990, 2000, 2010, and 2020 (census years) and then I further isolate Buncome County data. I create a separate dataframe for each of these datasets. I discard data for 2015 and 2022.
years = [1970, 1980, 1990, 2000, 2010, 2020]
Per Capita Income Data¶
county_per_capita_income = emp_inc_df[(emp_inc_df["Area Type"] == "County")
& (emp_inc_df["Variable"] == "Per Capita Money Income (Census)")
& (emp_inc_df["Year"].isin(years))]
The resulting dataframe contains 600 rows and five columns, giving per capita money income in the census years of 1970-2020 for 100 counties in North Carolina.
print(f"There are {len(set(county_per_capita_income["Area Name"]))} North Carolina counties represented in the DataFrame.")
There are 100 North Carolina counties represented in the DataFrame.
To make best use of the dataframe, pivot it
NC_per_capita_Income_1970_to_2020 = county_per_capita_income.pivot(index="Area Name", columns="Year", values="Value")
NC_per_capita_Income_1970_to_2020
Year | 1970 | 1980 | 1990 | 2000 | 2010 | 2020 |
---|---|---|---|---|---|---|
Area Name | ||||||
Alamance County | 2936.0 | 6534.0 | 13290.0 | 19391.0 | 22819.0 | 27944.0 |
Alexander County | 2391.0 | 5731.0 | 11624.0 | 18507.0 | 20716.0 | 27446.0 |
Alleghany County | 2012.0 | 4934.0 | 10237.0 | 17691.0 | 18919.0 | 26026.0 |
Anson County | 1744.0 | 4826.0 | 9402.0 | 14853.0 | 16856.0 | 22483.0 |
Ashe County | 1717.0 | 4652.0 | 9545.0 | 16429.0 | 20350.0 | 25282.0 |
... | ... | ... | ... | ... | ... | ... |
Wayne County | 1998.0 | 5320.0 | 10843.0 | 17010.0 | 20446.0 | 26362.0 |
Wilkes County | 2093.0 | 5767.0 | 10816.0 | 17516.0 | 19406.0 | 25634.0 |
Wilson County | 2054.0 | 5754.0 | 11641.0 | 17102.0 | 20691.0 | 24656.0 |
Yadkin County | 2303.0 | 5645.0 | 11843.0 | 18576.0 | 20379.0 | 26907.0 |
Yancey County | 1624.0 | 4530.0 | 9462.0 | 16335.0 | 18576.0 | 25767.0 |
100 rows × 6 columns
# make a copy for experimentation
df = NC_per_capita_Income_1970_to_2020.copy()
df.index.name = "County"
df.index.name
'County'
# testing out .loc on it
df.loc["Buncombe County", 1970]
np.float64(2671.0)
# what does the mean of each column produce?
columns = df.columns
for col in columns:
print(df[col].mean())
2161.2 5504.49 11306.92 17812.76 21620.08 27696.34
bunc_per_cap_income = df.loc["Buncombe County"]
NC_avg_per_cap_income = df.mean()
# is the mean of each column = to the loop above (spoiler: yes)
print(bunc_per_cap_income, NC_avg_per_cap_income)
Year 1970 2671.0 1980 6360.0 1990 13211.0 2000 20384.0 2010 25665.0 2020 33835.0 Name: Buncombe County, dtype: float64 Year 1970 2161.20 1980 5504.49 1990 11306.92 2000 17812.76 2010 21620.08 2020 27696.34 dtype: float64
Let's Look at Mean Family Income as Well¶
The variable "Mean Fanily Income" is another measure which will allow a similar comparison. Let's add it to our dataframe. Here I'll do all the steps in one motion, as it were.
county_mean_family_income = emp_inc_df[(emp_inc_df["Area Type"] == "County")
& (emp_inc_df["Variable"] == "Mean Family Income")
& (emp_inc_df["Year"].isin(years))]
county_mean_family_income = county_mean_family_income.pivot(index="Area Name", columns="Year", values="Value")
buncombe_mean_family_income = county_mean_family_income.loc["Buncombe County"]
NC_mean_family_income = county_mean_family_income.mean()
# Put together a finished comparison DF
compare_Bunc_and_NC_per_cap_income = pd.DataFrame({"Buncombe County Per Cap Income": bunc_per_cap_income,
"Buncombe County Mean Family Income": buncombe_mean_family_income,
"Avg NC County Per Cap Income": NC_avg_per_cap_income,
"Avg NC Mean Family Income": NC_mean_family_income,
"CPI": CPI,
"CPI 2020 Multiplier": CPI_Multipliers},
index=NC_avg_per_cap_income.index)
compare_Bunc_and_NC_per_cap_income["Bunc Co PCI (2020 Dollars)"] = (
compare_Bunc_and_NC_per_cap_income["Buncombe County Per Cap Income"]
* compare_Bunc_and_NC_per_cap_income["CPI 2020 Multiplier"])
compare_Bunc_and_NC_per_cap_income["Bunc Co Mean Family Income (2020 Dollars)"] = (
compare_Bunc_and_NC_per_cap_income["Buncombe County Mean Family Income"]
* compare_Bunc_and_NC_per_cap_income["CPI 2020 Multiplier"])
compare_Bunc_and_NC_per_cap_income["NC Avg PCI (2020 Dollars)"] = (
compare_Bunc_and_NC_per_cap_income["Avg NC County Per Cap Income"]
* compare_Bunc_and_NC_per_cap_income["CPI 2020 Multiplier"])
compare_Bunc_and_NC_per_cap_income["NC Mean Family Income (2020 Dollars)"] = (
compare_Bunc_and_NC_per_cap_income["Avg NC Mean Family Income"]
* compare_Bunc_and_NC_per_cap_income["CPI 2020 Multiplier"])
compare_Bunc_and_NC_per_cap_income
Buncombe County Per Cap Income | Buncombe County Mean Family Income | Avg NC County Per Cap Income | Avg NC Mean Family Income | CPI | CPI 2020 Multiplier | Bunc Co PCI (2020 Dollars) | Bunc Co Mean Family Income (2020 Dollars) | NC Avg PCI (2020 Dollars) | NC Mean Family Income (2020 Dollars) | |
---|---|---|---|---|---|---|---|---|---|---|
Year | ||||||||||
1970 | 2671.0 | 8781.0 | 2161.20 | 7769.05 | 38.800 | 6.670387 | 17816.602603 | 58572.664716 | 14416.039515 | 51822.566999 |
1980 | 6360.0 | 19279.0 | 5504.49 | 17708.04 | 82.400 | 3.140910 | 19976.188835 | 60553.607633 | 17289.108755 | 55619.363355 |
1990 | 13211.0 | 37183.0 | 11306.92 | 33517.70 | 130.700 | 1.980191 | 26160.306970 | 73629.452280 | 22389.864362 | 66371.457190 |
2000 | 20384.0 | 56354.0 | 17812.76 | 51308.68 | 172.200 | 1.502967 | 30636.489106 | 84698.229350 | 26771.999003 | 77115.277465 |
2010 | 25665.0 | 71669.0 | 21620.08 | 62891.14 | 218.056 | 1.186902 | 30461.827764 | 85064.045745 | 25660.906028 | 74645.590282 |
2020 | 33835.0 | 96530.0 | 27696.34 | 80073.13 | 258.811 | 1.000000 | 33835.000000 | 96530.000000 | 27696.340000 | 80073.130000 |
Generate the Visualization¶
buncombe_PCI = compare_Bunc_and_NC_per_cap_income["Bunc Co PCI (2020 Dollars)"]
buncombe_MFI = compare_Bunc_and_NC_per_cap_income["Bunc Co Mean Family Income (2020 Dollars)"]
NC_PCI = compare_Bunc_and_NC_per_cap_income["NC Avg PCI (2020 Dollars)"]
NC_MFI = compare_Bunc_and_NC_per_cap_income["NC Mean Family Income (2020 Dollars)"]
years = list(compare_Bunc_and_NC_per_cap_income.index)
fig, ax = plt.subplots(1, 1, figsize=(14, 6))
plt.plot(years, buncombe_MFI, color="blue", label="Bunc. Co. Mean Family Income (2020 Dollars)")
plt.plot(years, NC_MFI, color="red", label="NC Mean Family Income (2020 Dollars)")
plt.plot(years, buncombe_PCI, color="skyblue", label="Bunc. Co. Per Capita Income (2020 Dollars)")
plt.plot(years, NC_PCI, color="pink", label="N.C. Mean Per Capita Income (2020 Dollars)")
plt.title("Inflation Adjusted Per Capita and Mean Family Income in Buncombe County vs North Carolina Average, 1970-2020")
plt.xlabel("Census Year")
plt.ylabel("Inflation Adjusted 2020 Dollars (Based on CPI)")
plt.legend()
plt.savefig("North_Carolina_Income_Change_1970_to_2020.png")
plt.show(fig);