This repository contains all data used for the VITA mapping project, along with the Jupyter notebooks used to clean and prepare it.
Maps are on Tableau Public:
- VITA sites
- Counties with MyFreeTaxes e-filed returns
- VITA sites vs counties where MyFreeTaxes returns were filed
- VITA sites vs income eligibility
First, get a Census API key. The notebooks read the key from a .env file.
Rename the .env-sample file to .env:
mv .env-sample .env
Then replace the text <YOUR API KEY HERE> in .env with the API key you got from the Census.
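A minimal sketch of how a notebook might load and sanity-check the key. It assumes the variable in .env is named CENSUS_API_KEY (hypothetical; use whatever name .env-sample actually defines) and uses python-dotenv's load_dotenv when it is installed.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # fall back to whatever is already in the environment
    load_dotenv = None

PLACEHOLDER = "<YOUR API KEY HERE>"


def get_census_key(var_name="CENSUS_API_KEY", env_path=".env"):
    """Return the Census API key; fail loudly if it is missing or still the placeholder.

    CENSUS_API_KEY is a hypothetical variable name used for illustration.
    """
    if load_dotenv is not None:
        # Copies KEY=value pairs from env_path into os.environ.
        # Variables that are already set are not overridden.
        load_dotenv(env_path)
    key = os.environ.get(var_name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"{var_name} is not set; edit {env_path} and replace {PLACEHOLDER}")
    return key
```

Failing early here gives a clearer error than a rejected request from the Census API later in the notebook.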
You'll need Python 3 and Jupyter installed. Git is optional but handy for downloading the repo.
With Homebrew:
brew install python jupyter
brew install git #optional
Then install the required Python libraries:
pip install python-dotenv
pip install pandas
pip install geopandas Fiona folium Shapely rtree
pip install census
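To confirm the installs worked, a small check like the following can be run from Python or a notebook cell. It is a sketch; note that some import names differ from the pip package names (python-dotenv is imported as dotenv, Fiona as fiona, Shapely as shapely).

```python
import importlib.util

# Import names for the packages installed with pip above.
REQUIRED = ["dotenv", "pandas", "geopandas", "fiona", "folium", "shapely", "rtree", "census"]


def missing_modules(names=REQUIRED):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All set!" if not missing else f"Missing: {', '.join(missing)}")
```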
To run the Jupyter notebooks, you'll need a local copy of the repo. Either clone it with git:
git clone https://github.com/rcackerman/vita-mapping.git
or download the zip file and unpack it.
Then start Jupyter Notebook from the repo's folder:
jupyter notebook
You should be ready to run!
The repo is split into two workflows, based mainly on which map the data feed. For the VITA sites vs income eligibility map, run these notebooks:
1. get ACS data.ipynb*
2. VITA sites matched to county.ipynb
This creates two files: ./data/output/acs_household_income_county.csv and vita_sites_by_county.csv.
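The two outputs of this workflow are county-level tables that can be joined on a county identifier. The sketch below shows one way to count VITA sites per county and attach the counts to the ACS table. The join column name "fips" is a hypothetical assumption; the actual column names in the notebooks may differ.

```python
import pandas as pd


def build_eligibility_table(acs: pd.DataFrame, sites: pd.DataFrame, key: str = "fips") -> pd.DataFrame:
    """Attach a per-county VITA site count to the ACS county table.

    `key` is a hypothetical county FIPS column present in both tables.
    Counties with no VITA sites get a count of 0 rather than NaN.
    """
    counts = sites.groupby(key).size().rename("vita_sites").reset_index()
    merged = acs.merge(counts, on=key, how="left")
    merged["vita_sites"] = merged["vita_sites"].fillna(0).astype(int)
    return merged
```

When loading the CSVs, reading the FIPS column as text (for example `pd.read_csv(path, dtype={"fips": str})`) keeps leading zeros such as "01001" from being dropped, which would otherwise break the join.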
The other workflow builds the VITA sites vs MyFreeTaxes (MFT) map:
1. get ACS data.ipynb*
2. MFT Processing.ipynb
3. Merging MFT to ACS relational file.ipynb
4. MFT at the county level.ipynb and VITA + MFT.ipynb
This creates the file ./data/output/mft_vita_merged.csv.
* You only need to run get ACS data.ipynb once, even if you're rebuilding both maps.
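The county-level comparison in this workflow can be pictured as an outer join of MFT return counts and VITA site counts, labelling each county by which sources it appears in. This is a minimal sketch, not the notebooks' actual code: it assumes one row per return in the MFT table and one row per site in the VITA table, each with a hypothetical "fips" county column.

```python
import pandas as pd


def compare_mft_vita(mft: pd.DataFrame, sites: pd.DataFrame, key: str = "fips") -> pd.DataFrame:
    """Count MFT returns and VITA sites per county and label each county's coverage.

    Assumes one row per return in `mft` and one row per site in `sites`;
    `key` is a hypothetical county FIPS column.
    """
    mft_counts = mft.groupby(key).size().rename("mft_returns").reset_index()
    site_counts = sites.groupby(key).size().rename("vita_sites").reset_index()
    # An outer join keeps counties that appear in only one source;
    # indicator=True adds a "_merge" column recording where each row came from.
    merged = mft_counts.merge(site_counts, on=key, how="outer", indicator=True)
    for col in ("mft_returns", "vita_sites"):
        merged[col] = merged[col].fillna(0).astype(int)
    labels = {"both": "MFT and VITA", "left_only": "MFT only", "right_only": "VITA only"}
    merged["coverage"] = merged["_merge"].astype(str).map(labels)
    return merged.drop(columns="_merge")
```

Counties labelled "MFT only" are places where people e-filed through MyFreeTaxes but no VITA site was listed.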
Data sources:
- 2019 county shapefiles: https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.2019.html
- 2019 ACS detailed datasets: https://www.census.gov/data/developers/data-sets/acs-5year.2019.html
  - Specifically, "Household Income in the Past 12 Months (In 2019 Inflation-Adjusted Dollars)" (group B19001), 5-year estimates at the county summary level.
- 2022 VITA sites from the IRS, obtained via Code for America's scraper. Note that we only included sites where the archived flag is false.
- 2019 MyFreeTaxes data, provided by United Way Worldwide. This data is not public because it contains personally identifiable information (PII). A version with the PII columns removed is included in this repo: ./data/mft_returns_2019.csv.
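Group B19001 reports household counts in fixed income brackets (B19001_001E is the total, B19001_002E is "Less than $10,000", up through B19001_017E, "$200,000 or more"). One way to relate these counts to an income-eligibility cutoff is to sum the brackets that lie entirely below it. This is a sketch: the cutoff is a parameter you supply, since this README doesn't state which threshold the map uses, and a cutoff that falls inside a bracket excludes that bracket, so the result is a lower bound.

```python
import pandas as pd

# Exclusive upper bound (in dollars) of each B19001 income bracket.
# B19001_001E is the total; B19001_017E ($200,000 or more) has no upper bound.
BRACKET_UPPER = {
    "B19001_002E": 10_000, "B19001_003E": 15_000, "B19001_004E": 20_000,
    "B19001_005E": 25_000, "B19001_006E": 30_000, "B19001_007E": 35_000,
    "B19001_008E": 40_000, "B19001_009E": 45_000, "B19001_010E": 50_000,
    "B19001_011E": 60_000, "B19001_012E": 75_000, "B19001_013E": 100_000,
    "B19001_014E": 125_000, "B19001_015E": 150_000, "B19001_016E": 200_000,
}


def share_below(df: pd.DataFrame, cutoff: int) -> pd.Series:
    """Share of households in brackets lying entirely below `cutoff` dollars.

    Brackets that straddle the cutoff are excluded, so this is a lower bound.
    """
    cols = [col for col, upper in BRACKET_UPPER.items() if upper <= cutoff]
    return df[cols].sum(axis=1) / df["B19001_001E"]
```

For example, `share_below(acs, 60_000)` would sum the brackets from "Less than $10,000" through "$50,000 to $59,999" for each county.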
While doing this project, we also found income tax return data from the IRS, including VITA returns.