Impute-NAs-Better

Impute missing values while minimizing distortion of overall variable distributions by:

Using the columns available in each row to fit a bagged model.
Applying that model to the non-NA rows to obtain the distribution of its residuals.
Adding a randomly drawn residual to each model prediction, so the imputed values keep realistic variation.
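
The three steps above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the code in this repository: the function name `impute_with_residuals`, the bootstrap count `n_bags`, and the choice of ordinary least squares as the base model are all hypothetical, and the predictors are assumed to be fully observed.

```python
import numpy as np

def impute_with_residuals(X, y, n_bags=25, seed=0):
    """Fill NaNs in y from X using a bagged linear model plus a random residual.

    X: (n, p) array of fully observed numeric predictors (assumption).
    y: (n,) array whose missing entries are NaN.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    observed = ~np.isnan(y)
    A_obs, y_obs = A[observed], y[observed]

    # Step 1: bagging. Fit least squares on bootstrap resamples of the
    # observed rows and average the coefficients (for a linear model this
    # is the same as averaging the individual predictions).
    coefs = []
    for _ in range(n_bags):
        idx = rng.integers(0, len(y_obs), size=len(y_obs))
        beta, *_ = np.linalg.lstsq(A_obs[idx], y_obs[idx], rcond=None)
        coefs.append(beta)
    beta_bagged = np.mean(coefs, axis=0)

    # Step 2: residuals of the bagged model on the non-NA rows.
    residuals = y_obs - A_obs @ beta_bagged

    # Step 3: predict the missing rows and add one randomly drawn
    # residual to each prediction.
    filled = y.copy()
    n_missing = int((~observed).sum())
    noise = rng.choice(residuals, size=n_missing, replace=True)
    filled[~observed] = A[~observed] @ beta_bagged + noise
    return filled
```

Drawing the noise from the empirical residuals, rather than from a fitted normal distribution, keeps whatever skew or heavy tails the observed data shows. Observed entries of `y` are returned unchanged.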

As designed, this imputer takes a dataframe whose categorical variables are encoded as strings and imputes every missing value. It starts with the columns that have the fewest NAs, then uses each newly NA-free column as a predictor in the subsequent imputations.
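
The column ordering described above might look like the following pandas sketch. The names `imputation_order`, `impute_sequentially`, and the `impute_column` callback are hypothetical, introduced only for illustration. As a simplification, the sketch uses only fully complete columns as predictors, which may differ from how the repository selects available columns per row.

```python
import pandas as pd

def imputation_order(df):
    """Return the columns that contain NAs, fewest missing values first."""
    na_counts = df.isna().sum()
    na_counts = na_counts[na_counts > 0]
    # mergesort is stable, so ties keep their original column order
    return list(na_counts.sort_values(kind="mergesort").index)

def impute_sequentially(df, impute_column):
    """Impute one column at a time so later columns can use earlier results.

    impute_column is a hypothetical callable:
        (predictors: DataFrame, target: Series) -> Series without NAs
    """
    result = df.copy()
    for col in imputation_order(df):
        # Predictors are all other columns that currently have no NAs,
        # including columns filled in earlier iterations of this loop.
        complete = [
            c for c in result.columns
            if c != col and not result[c].isna().any()
        ]
        result[col] = impute_column(result[complete], result[col])
    return result
```

Starting with the sparsest gaps means each model is trained on as many observed rows as possible, and every completed column enlarges the predictor set for the harder columns that follow.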

The regression estimator is linear regression, and the classifier is a random forest.
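
One way to choose between the two estimators is by the dtype of the column being imputed. The sketch below uses scikit-learn's `LinearRegression` and `RandomForestClassifier`. The helper names `make_estimator` and `encode_predictors`, the `n_estimators=100` setting, and the one-hot encoding of string predictors are assumptions made for the example, not details confirmed by this repository.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression

def make_estimator(target, seed=0):
    """Pick a model for one target column based on its dtype."""
    if is_numeric_dtype(target):
        # Numeric target: regression
        return LinearRegression()
    # String-encoded categorical target: classification
    return RandomForestClassifier(n_estimators=100, random_state=seed)

def encode_predictors(predictors):
    """One-hot encode string columns so both estimators get numeric input."""
    return pd.get_dummies(predictors, dtype=float)
```

A call such as `make_estimator(df["some_column"]).fit(encode_predictors(df[predictor_columns]), df["some_column"])` would then train the appropriate model on the rows where the target is observed; `some_column` and `predictor_columns` are placeholders.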

This imputer is an implementation of a technique described in the following paper:

Joseph L. Schafer & Maren K. Olsen (1998) Multiple Imputation for Multivariate Missing-Data Problems: A Data Analyst's Perspective, Multivariate Behavioral Research, 33:4, 545-571, DOI: 10.1207/s15327906mbr3304_5

juanfrcaliz/Impute-NAs-Better

Folders and files

Latest commit

History

Repository files navigation

Impute-NAs-Better

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages