Sorting Schools:

A Computational Analysis of Charter School Identities and Stratification

Suggested citation: Haber, Jaren. 2021. “Sorting Schools: A Computational Analysis of Charter School Identities and Stratification.” Sociology of Education 94(1):43–64. doi: 10.1177/0038040720953218.

Paper Abstract

Research shows charter schools are more segregated by race and class than traditional public schools. I investigate an under-examined mechanism for this segregation: Charter schools project identities corresponding to parents’ race- and class-specific parenting styles and educational values. I use computational text analysis to detect the emphasis on inquiry-based learning in the websites of all charter schools operating in 2015-16. I then estimate mixed linear regression models to test the relationships between ideological emphasis and school- and district-level poverty and ethnicity. I thereby transcend methodological problems in scholarship on charter school identities by collecting contemporary, population-wide data, and by blending text analysis with hypothesis testing. Findings suggest charter school identities are both race- and class-specific, outlining a new mechanism by which school choice may consolidate parents by race and class—and paving the way for behavioral and longitudinal studies. This project contributes to literatures on school choice and educational stratification.

Public Data Sources

School directory and demographics: Common Core of Data, Public School Universe Survey (CCD PSUS), 2015-16
School district demographics: American Community Survey (ACS), 2012-16
School academic performance: EdFacts Assessment Proficiency, 2015-16
Additional school information (not used in analysis): Civil Rights Data Collection (CRDC), 2015-16
Charter Management Organization (CMO) directory (supplemented by author): Stanford Center for Research on Education Outcomes
School district child poverty information (not used in analysis): Small Area Income and Poverty Estimates (SAIPE) Program

Notes

See codebook.csv for detailed information on all variables in data files. For comprehensiveness, variables not used in final analysis (see mixed_models.do) are retained.

Data files (.dta and .csv), other than the CMO and URL lists, are post-processed (see data_preparation.do).

Web-crawling speeds were throttled to prevent server overload, and web-crawled site data is kept private pursuant to school website copyrights.

Guide to inspecting robustness checks via log files

Alternative Measures I: Lagged academic proficiency rates (2013-14 and 2015-16 instead of 2014-15):
logs/robust_laggedscores_mi5_linear_101019.smcl
Alternative Measures II: Narrow dictionaries of IBL rather than 50-term full dictionary: seed (5 terms), narrow (20 terms), and full w/o “hands-on” term (49 terms)
logs/robustness_check_dictionary_size.pdf
Alternative Measures III: Race/class differentials between district and school
logs/robustness_check_district_differentials.pdf
Fully nested mixed-effects linear models (100 imputations)

pt. 1: Race & poverty -> IBL
logs/robustness_check_fully_nested_models_1.pdf
pt. 2: IBL, academic proficiency -> Poverty
logs/robustness_check_fully_nested_models_2.pdf
pt. 3: IBL, academic proficiency -> Race
logs/robustness_check_fully_nested_models_3.pdf
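
The dictionary variants listed under Alternative Measures II differ only in which terms are counted. The following is a minimal sketch of dictionary-based term counting, not the code used for the paper: the function name, the sample sentence, and every term except "hands-on" are hypothetical stand-ins, and the real 50-term dictionary is not reproduced here.

```python
import re

# Hypothetical stand-in terms; only "hands-on" is named in this README.
FULL_TERMS = ["hands-on", "inquiry-based", "project-based"]
# Variant that drops "hands-on", mirroring the "full w/o hands-on" check.
NO_HANDS_ON = [t for t in FULL_TERMS if t != "hands-on"]


def count_terms(text, terms):
    """Count case-insensitive, whole-term matches of any dictionary term."""
    lowered = text.lower()
    total = 0
    for term in terms:
        # Lookarounds stop a term from matching inside a longer token.
        pattern = r"(?<![\w-])" + re.escape(term.lower()) + r"(?![\w-])"
        total += len(re.findall(pattern, lowered))
    return total


# Invented sample text, for illustration only.
page = "Our hands-on, inquiry-based program values Hands-On science."
full_count = count_terms(page, FULL_TERMS)
reduced_count = count_terms(page, NO_HANDS_ON)
```

Comparing `full_count` with `reduced_count` shows how much a single frequent term contributes to a school's score, which is the kind of sensitivity the dictionary-size check is meant to probe.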

Filter Dataset: Restrict sample to only those schools with:

precise academic data: readlevel14 & mathlevel14 == 1
logs/robustness_check_precise_scores.pdf
above-average district poverty
logs/robustness_check_high_poverty_districts.pdf
above-average district POC
logs/robustness_check_high_POC_districts.pdf
above-average district population density
logs/robustness_check_high_density_districts.pdf
inquiry_full_count < 10000
logs/robustness_check_ibl_outliers.pdf
numpages < 100
logs/robustness_check_large_websites.pdf
students > 10
logs/robustness_check_small_schools.pdf

Acknowledgments

I am especially grateful to Heather Haveman for her constructive criticism, which after many drafts has greatly improved this paper. I also give thanks to the UC Berkeley Data-Intensive Social Science Lab (D-Lab) community for teaching me to code and to embrace not knowing. I acknowledge also Sam Lucas, Calvin Morrill, Bruce Fuller, David Bamman, Ben Gebre-Medhin, and Caroline Le Pennec-Caldichoury for their feedback and insightful comments; Aaron Culich, Carl Mason, and the Cloud Working Group for help with web data collection and computing infrastructure; and my family and wife for their encouragement and support. This complex project wouldn’t have been possible without the contributions of 38 research assistants from the Undergraduate Research Apprentice Program and Data Science Discovery Program: Kanika Ahluwalia, Brad Afzali, Akcan Balkir, Muying Chen, Siyuan Chen, Yitong Chen, Kaan Dogusoy, Saabhir Gill, Harshayu Girase, Akshat Gokhale, Yoon Sung Hong, Jennifer Huang, Elaine Huynh, Krutika Ingale, Jiyoon Jeong, James Jung, Inderpal Kaur, Francis Kumar, Yong Jin Kweon, Ariel Langer, Brian Yimin Lei, Xueyong Liu, Haley Miller, Anna Nguyen, Thao Nguyen, Tina Nguyen, Madeleine Peng, Emily Qian, Samyukta Raman, Ji Shi, Sarah Solieman, Arjun Srinivasan, Prianka Subrahmanyam, Frank Wang, Violet Yao, George Wu, Max Yuan, and Jiahua Zou. Previous versions of this paper and its methods were presented at the Berkeley Institute for Data Science’s 2018 Text Across Domains (TextXD) symposium; the D-Lab’s Computational Text Analysis Working Group in 2017-18; the 2018 Making Text Research-Ready symposium; the 2018 Graduate School of Education Research Day; and the American Sociological Association’s Sociology of Education Section in 2017 and 2018. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562, as well as the Berkeley Demography Lab cloud computing facility.
Financial support was provided by the UC Berkeley Dissertation Completion Fellowship and the Bridge Lowenthal Fellowship. I declare no conflict of interest in doing this research.

Contact

jhaber@berkeley.edu
