This project, carried out in Jupyter Notebook, aims to explore the main Data Analysis techniques with Python tools.
- Pandas, NumPy, Seaborn, Matplotlib, Plotly and scikit-learn (sklearn) are used. The work is divided into three notebooks, which separate the data cleaning, data analysis and machine learning parts.
Author: Lucas Lobianco De Matheo
Title: Kaggle Titanic DataSet
This dataset was one of the first I worked on, and today I feel able to explore it more thoroughly and with more techniques.
Extension: .csv
Source: https://www.kaggle.com/azeembootwala/titanic
Date: 01-02-2022
Main skills of this project:
- Data Preparation
- Data Cleansing
- Data Wrangling
- Data Pre-processing
- Exploratory Data Analysis (EDA)
- Data Visualization
This project includes the raw .csv file used as its starting point (titanic.csv) and two preprocessed output files (titanic_preprocessed.csv and titanic_preprocessed_2.csv).
The first notebook in the workflow is "Titanic DataSet - Data Cleansing", which generates titanic_preprocessed.csv for use in the analysis part.
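Below is a minimal sketch of the kind of cleaning this first step performs. It assumes the common Kaggle Titanic columns (PassengerId, Survived, Pclass, Sex, Age, Fare, Embarked). The real columns of titanic.csv, the fill strategies and the helper name clean_titanic are illustrative assumptions, not the notebook's actual code. A small inline sample stands in for the real file so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical raw sample: column names follow the common Kaggle Titanic
# schema and may differ from the titanic.csv used in this project.
RAW_CSV = """PassengerId,Survived,Pclass,Sex,Age,Fare,Embarked
1,0,3,male,22,7.25,S
2,1,1,female,38,71.28,C
3,1,3,female,,7.92,S
4,1,1,female,35,53.10,
4,1,1,female,35,53.10,
5,0,3,male,35,8.05,S
"""


def clean_titanic(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a cleaned copy of the raw passenger table."""
    out = df.drop_duplicates().copy()
    # Assumed strategy: fill missing ages with the median age...
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # ...and missing embarkation ports with the most frequent port.
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode().iloc[0])
    # Encode Sex as 0/1 so later models receive numerical input.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    return out


raw = pd.read_csv(io.StringIO(RAW_CSV))
clean = clean_titanic(raw)
print(clean)
```

In a notebook, the cleaned frame would then be saved with clean.to_csv("titanic_preprocessed.csv", index=False).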
The notebook "Titanic DataSet - Data Analysis" takes titanic_preprocessed.csv as input and generates titanic_preprocessed_2.csv, which can later be imported into BI platforms such as Power BI as a supplementary data source.
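This README does not say what titanic_preprocessed_2.csv adds to its input. The sketch below is purely hypothetical. It computes a few survival rates, a typical EDA step, and appends human-readable label columns, the kind of change that makes a file easier to use in a BI tool. The sample data, the label mappings and the age bins are all assumptions.

```python
import pandas as pd

# Hypothetical numerically encoded sample standing in for titanic_preprocessed.csv.
df = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 1, 0, 1],
        "Pclass": [3, 1, 3, 1, 3, 2],
        "Sex": [0, 1, 1, 1, 0, 0],
        "Age": [22.0, 38.0, 26.0, 35.0, 35.0, 54.0],
    }
)

# EDA: share of survivors per sex (0 = male, 1 = female) and per class.
survival_by_sex = df.groupby("Sex")["Survived"].mean()
survival_by_class = df.groupby("Pclass")["Survived"].mean()

# Assumed BI-oriented step: readable labels so dashboards need no decoding.
bi = df.assign(
    SexLabel=df["Sex"].map({0: "male", 1: "female"}),
    SurvivedLabel=df["Survived"].map({0: "No", 1: "Yes"}),
    AgeGroup=pd.cut(df["Age"], bins=[0, 18, 40, 120], labels=["0-18", "18-40", "40+"]),
)
print(survival_by_sex, survival_by_class, bi, sep="\n\n")
```

Exporting the result would then be a matter of bi.to_csv("titanic_preprocessed_2.csv", index=False).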
The "Titanic DataSet - Machine Learning" notebook also uses titanic_preprocessed.csv as input, because the machine learning algorithms it applies operate on numerical variables.
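This README does not name the algorithms used. The sketch below uses scikit-learn's LogisticRegression only as an example. It shows one way to keep just the numerical columns with select_dtypes before fitting a model. The sample data, including the text column Name that gets filtered out, is hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical sample standing in for titanic_preprocessed.csv; the text
# column "Name" is included only to show that it is filtered out.
df = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 1, 0, 0, 1, 0, 1, 0],
        "Pclass": [3, 1, 3, 1, 3, 3, 2, 3, 1, 2],
        "Sex": [0, 1, 1, 1, 0, 0, 1, 0, 1, 0],
        "Age": [22.0, 38.0, 26.0, 35.0, 35.0, 28.0, 14.0, 2.0, 58.0, 30.0],
        "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 8.46, 30.07, 21.07, 26.55, 13.00],
        "Name": list("ABCDEFGHIJ"),
    }
)

# Keep only numerical feature columns, since the model expects numeric input.
X = df.drop(columns="Survived").select_dtypes(include="number")
y = df["Survived"]

# Stratified split keeps both classes present in train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)  # example model choice, an assumption
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(list(X.columns), accuracy)
```

With only ten rows the accuracy figure carries no meaning. The point is the column selection and the train/test split pattern.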