In this project, supervised learning techniques are applied to data collected for the U.S. census to help CharityML (a fictitious charity organization) identify the people most likely to donate to its cause. First, the data is explored to learn how the census data is recorded. Next, a series of transformations and preprocessing techniques is applied to bring the data into a workable format. Then several supervised learners are implemented on the data and compared to determine which is best suited to the problem.
Afterwards, the selected model is optimized and presented as the solution to CharityML. Finally, the chosen model and its predictions are examined under the hood to see just how well it performs.
This project is designed to build familiarity with the many supervised learning algorithms available in sklearn, and to provide a method for evaluating how each model works and performs on a certain type of data. It is important in machine learning to understand exactly when and where a certain algorithm should be used, and when one should be avoided.
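The workflow described above (train several supervised learners, compare them, then optimize the selected one) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the synthetic data from `make_classification` stands in for the preprocessed census features, and the three candidate models, the F-beta (beta = 0.5) metric, and the parameter grids are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, fbeta_score, make_scorer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic, imbalanced binary data standing in for the
# preprocessed census features and the donor / non-donor label.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8,
    weights=[0.75, 0.25], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical set of candidate learners to compare.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}


def evaluate(model):
    """Fit on the training split and score on the held-out split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "f0.5": fbeta_score(y_test, pred, beta=0.5),
    }


results = {name: evaluate(model) for name, model in candidates.items()}
best_name = max(results, key=lambda name: results[name]["f0.5"])

# Hypothetical, deliberately small grids for tuning the selected model.
param_grids = {
    "logistic_regression": {"C": [0.1, 1.0, 10.0]},
    "decision_tree": {"max_depth": [3, 5, 8]},
    "random_forest": {"max_depth": [5, 10, None]},
}
search = GridSearchCV(
    candidates[best_name],
    param_grids[best_name],
    scoring=make_scorer(fbeta_score, beta=0.5),
    cv=3,
)
search.fit(X_train, y_train)

for name, scores in results.items():
    print(name, scores)
print("selected:", best_name, "tuned params:", search.best_params_)
```

Scoring with an F-beta value below 1 weights precision more heavily than recall, which is one reasonable choice when contacting a non-donor is costly; a different metric may suit the real data better.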
This project uses the following software and Python libraries:
This project contains three files:
finding_donors.ipynb: The main file, where all of the work is described.
census.csv: The project dataset. This dataset is loaded in the notebook.
visuals.py: A Python file containing visualization code that is run behind the scenes. Do not modify it.