Machine-learning

Machine learning coding proposed by Apurv verma.

Data preprocessing : Day 1

Data preprocessing:First step to get started with machhine learning.

Step1: Importing the required libraries

1.Numpy as np

2.Pandas

3.Matplotlib

These three are the essential libraries and need to be imported each and every time.

Pandas is the library used to import datasets and managing them.

Numpy is the library which contains all the Mathematical functions.

Step 2: Importing the data set

Data sets are generally availabe in .csv format. It contains data in tabular form.

We use read_csv.md method of pandas library to read the csv file as a dataframe.

Then we make seperate vectors and matrix of dependent variable and independent variables from the dataframe.

Step3: Handling the missing data:

Data is rarely clean and often you can have corrupt or missing values.

Missing data in the column is generally represented by NaN.

Data can be missing due to various reasons and it is very important to take care of that as it may affect the performance of the model.

We can replace the missing data by the median or mean of the entire column.

We use imputer class of sklearn.preprocessing module for this task.

Step4: Encoding the categorical data

Categorical data are the variables that contain label values rather than numeric values.

String values such as yes or no can not be used in the mathematical equations of the model so we need to encode these variables into numbers.To achieve this we use LabelEncoder class from sklearn.preprocessing library.

Step4: Splitting the dataset into test set and training set

We usually split dataset into two parts,One for the training the model called training set and other for testing the performance of the trained model called test set. The ratio of split is generally 80% for the training and 20% for the testing.

For this purpose we import train_test_split() class from sklearn.model_selection module.

Step5: Feature scaling

Most of the machine learning algorithm use the Euclidean distance between two data points in their computation.

We have features in the datasets having different range and it can pose problem ,so it is required to bring all the feature in some definite range so that the performance of the model can be improved.

For this StandardScaler class of sklearn.preprocessing module is used.

Generally,Scaling is preferred after splitting the data into test set and training set.

Name		Name	Last commit message	Last commit date
Latest commit History 98 Commits
Code		Code
KNN.ipynb		KNN.ipynb
README.md		README.md
apriori.ipynb		apriori.ipynb
apriori.py		apriori.py
data_preprocessing_template.ipynb		data_preprocessing_template.ipynb
data_preprocessing_template.py		data_preprocessing_template.py
decision_tree_classification.ipynb		decision_tree_classification.ipynb
decision_tree_classification.py		decision_tree_classification.py
decision_tree_regression.ipynb		decision_tree_regression.ipynb
decision_tree_regression.py		decision_tree_regression.py
eclat.py		eclat.py
hierarchical_clustering.ipynb		hierarchical_clustering.ipynb
hierarchical_clustering.py		hierarchical_clustering.py
k_means_clustering.ipynb		k_means_clustering.ipynb
k_means_clustering.py		k_means_clustering.py
linear1.py		linear1.py
logistic_regression.ipynb		logistic_regression.ipynb
logistic_regression.py		logistic_regression.py
multiple_linear_regression.ipynb		multiple_linear_regression.ipynb
multiple_linear_regression.py		multiple_linear_regression.py
naive_bayes.ipynb		naive_bayes.ipynb
polynomial_regression.py		polynomial_regression.py
random_forest_classification.ipynb		random_forest_classification.ipynb
random_forest_classification.py		random_forest_classification.py
random_forest_regression.ipynb		random_forest_regression.ipynb
simple_linear_regression.ipynb		simple_linear_regression.ipynb
support_vector_machine.ipynb		support_vector_machine.ipynb
support_vector_machine.py		support_vector_machine.py
support_vector_regression.ipynb		support_vector_regression.ipynb
upper_confidence_bound.ipynb		upper_confidence_bound.ipynb
upper_confidence_bound.py		upper_confidence_bound.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Machine-learning

About

Releases

Packages

Languages

vapurv843/Machine-learning

Folders and files

Latest commit

History

Repository files navigation

Machine-learning

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages