Data preparation and transformation exercise

The objective of this exercise is to practice various steps of data preprocessing and feature engineering.

The scenario is the preparation of data for a machine learning (ML) multiple linear regression model.

The dataset used is "Climate Weather Surface of Brazil - Hourly", which is available on Kaggle.

It contains hourly climate data recorded at weather stations in Brazil between 2000 and 2021.

This exercise is broken down as follows:

Part I

  1. Load data
  2. Inspect data

Part II

  1. Format features
  2. Clean messy data
  3. Remove duplicate values
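
The three steps of Part II could look roughly like the following pandas sketch. The column names, the decimal-comma number format, and the `-9999` sentinel for missing readings are assumptions made for illustration; check the actual dataset before relying on any of them.

```python
import pandas as pd

# Hypothetical raw sample; column names and values are illustrative only.
raw = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02"],
    "hour": ["00:00", "00:00", "01:00", "00:00"],
    "station": [" brasilia", " brasilia", "BRASILIA ", "Brasilia"],
    "temp_c": ["21,5", "21,5", "20,9", "-9999"],
})

df = raw.copy()

# 1. Format features: build a single timestamp and parse decimal-comma numbers.
df["timestamp"] = pd.to_datetime(df["date"] + " " + df["hour"], format="%Y-%m-%d %H:%M")
df["temp_c"] = pd.to_numeric(df["temp_c"].str.replace(",", ".", regex=False))

# 2. Clean messy data: normalize text labels and turn the assumed
#    sentinel value (-9999) into a proper missing value.
df["station"] = df["station"].str.strip().str.title()
df["temp_c"] = df["temp_c"].mask(df["temp_c"] == -9999)

# 3. Remove duplicate values: drop rows that are identical after formatting.
df = df.drop(columns=["date", "hour"]).drop_duplicates().reset_index(drop=True)
```

Cleaning before deduplicating matters here: rows that differ only in whitespace or letter case would otherwise survive as distinct records.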

Part III

  1. Treat missing values
  2. Imputation
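
As a minimal sketch of Part III, the snippet below measures missingness, drops a mostly empty column, and imputes the rest. The column names, the 50% threshold, and the choice of linear interpolation and median filling are assumptions for illustration, not the method prescribed by the exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings with gaps.
df = pd.DataFrame({
    "temp_c": [20.0, np.nan, 22.0, 23.0, np.nan, 25.0],
    "rain_mm": [0.0, 0.0, np.nan, 1.2, 0.0, 0.0],
    "gust_ms": [np.nan, np.nan, np.nan, np.nan, 3.1, np.nan],
})

# 1. Treat missing values: compute the share of missing entries per column
#    and drop columns that are more than half empty (assumed threshold).
missing_share = df.isna().mean()
df = df.loc[:, missing_share <= 0.5].copy()

# 2. Imputation: interpolate temperature linearly between neighbouring
#    readings and fill rainfall gaps with the column median.
df["temp_c"] = df["temp_c"].interpolate()
df["rain_mm"] = df["rain_mm"].fillna(df["rain_mm"].median())
```

Interpolation suits slowly varying series such as temperature, while a median fill is a simpler fallback for skewed variables such as rainfall.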

Part IV

  1. Remove strongly correlated features
  2. Remove outliers
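
One possible way to approach Part IV is sketched below on synthetic data. The 0.9 correlation threshold and the 1.5 × IQR outlier rule are common conventions chosen here as assumptions; the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic data: dew_point_c is built to be almost collinear with temp_c.
rng = np.random.default_rng(0)
n = 200
temp = rng.normal(25, 3, n)
df = pd.DataFrame({
    "temp_c": temp,
    "dew_point_c": temp - 5 + rng.normal(0, 0.1, n),
    "wind_ms": rng.normal(3, 1, n),
})
df.loc[0, "wind_ms"] = 60.0  # an implausible reading to be filtered out

# 1. Remove strongly correlated features: look at the upper triangle of the
#    absolute correlation matrix and drop one column of each correlated pair.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
df = df.drop(columns=to_drop)

# 2. Remove outliers: keep rows whose wind speed lies within 1.5 * IQR
#    of the first and third quartiles.
q1, q3 = df["wind_ms"].quantile([0.25, 0.75])
iqr = q3 - q1
keep = df["wind_ms"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[keep]
```

Dropping near-duplicate predictors is particularly relevant for linear regression, where collinear features make the coefficient estimates unstable.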

Part V

  1. Aggregate features
  2. Encode categorical features
  3. Feature scaling
  4. Dimensionality reduction and feature decomposition
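
The four steps of Part V can be illustrated with pandas and NumPy alone, as in the sketch below. The data and column names are hypothetical, and principal component analysis via SVD is used here as one assumed example of dimensionality reduction; the exercise itself may use a different technique or library.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings for two regions over two days.
hourly = pd.DataFrame({
    "region": ["N", "N", "N", "N", "S", "S", "S", "S"],
    "date": ["2020-01-01"] * 2 + ["2020-01-02"] * 2
            + ["2020-01-01"] * 2 + ["2020-01-02"] * 2,
    "temp_c": [26.0, 30.0, 27.0, 31.0, 18.0, 22.0, 17.0, 21.0],
    "humidity": [80.0, 70.0, 82.0, 68.0, 60.0, 50.0, 64.0, 52.0],
})

# 1. Aggregate features: hourly readings to daily means per region.
daily = hourly.groupby(["region", "date"], as_index=False)[["temp_c", "humidity"]].mean()

# 2. Encode categorical features: one-hot encode the region label.
encoded = pd.get_dummies(daily, columns=["region"], dtype=float)

# 3. Feature scaling: standardize numeric columns to zero mean, unit variance.
numeric = ["temp_c", "humidity"]
scaled = (encoded[numeric] - encoded[numeric].mean()) / encoded[numeric].std(ddof=0)

# 4. Dimensionality reduction: PCA through the SVD of the standardized matrix.
u, s, vt = np.linalg.svd(scaled.to_numpy(), full_matrices=False)
explained = s**2 / np.sum(s**2)                  # variance share per component
first_component = scaled.to_numpy() @ vt[0]      # projection onto the first axis
```

For a linear regression with an intercept, passing `drop_first=True` to `pd.get_dummies` avoids the redundant dummy column that would otherwise be perfectly collinear with the others.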

Part VI

  1. Sample and balance
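
A minimal sketch of sampling and balancing follows. It assumes a hypothetical `region` column whose categories are unevenly represented, and balances by downsampling every category to the size of the smallest one; other strategies (such as oversampling) are equally possible.

```python
import pandas as pd

# Hypothetical data in which one region dominates.
df = pd.DataFrame({
    "region": ["SE"] * 80 + ["N"] * 20,
    "temp_c": [float(i % 15 + 15) for i in range(100)],
})

# Sample: draw a reproducible random subset to speed up experimentation.
sample = df.sample(frac=0.5, random_state=42)

# Balance: downsample each region to the size of the smallest region.
smallest = df["region"].value_counts().min()
balanced = pd.concat(
    [group.sample(n=smallest, random_state=42) for _, group in df.groupby("region")]
)
```

Fixing `random_state` keeps the subset identical across runs, which makes later modelling results comparable.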