Data-Scientist-Salary-Prediction

Table of Contents

LinkedIn Profile
Project Overview
How will this project help?
Resources Used
Exploratory Data Analysis (EDA) and Data Cleaning
Feature Engineering
Model Building and Evaluation
Model Prediction

LinkedIn Profile

For any queries about this project, contact me on LinkedIn.

Link: https://www.linkedin.com/in/anil-l-b023631b6/

Project Overview

• Created a machine learning model that estimates a data scientist's salary from features such as rating, company_founded, etc.
• Engineered features from the text of each job description to quantify the value companies place on Python, Excel, Tableau and SQL
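
One way to derive such keyword features is sketched below. It is an illustration only, not the notebook's code: the column name "Job Description", the `_job` suffix and the helper `add_keyword_flags` are assumptions made for this example.

```python
import re

import pandas as pd

# Keywords taken from the tools named in the overview.
KEYWORDS = ["python", "excel", "tableau", "sql"]


def add_keyword_flags(df: pd.DataFrame, text_col: str = "Job Description") -> pd.DataFrame:
    """Add one 0/1 column per keyword, matched as a whole word, ignoring case."""
    out = df.copy()
    for kw in KEYWORDS:
        pattern = rf"\b{re.escape(kw)}\b"
        out[f"{kw}_job"] = (
            out[text_col]
            .str.contains(pattern, case=False, regex=True, na=False)
            .astype(int)
        )
    return out


# Toy rows standing in for real job descriptions.
jobs = pd.DataFrame({"Job Description": [
    "Strong Python and SQL skills required.",
    "Build dashboards in Tableau; Excel is a plus.",
    "We excel at teamwork.",
]})
flags = add_keyword_flags(jobs)
print(flags[["python_job", "excel_job", "tableau_job", "sql_job"]])
```

Whole-word matching avoids counting "NoSQL" as "sql", but it cannot tell the verb "excel" from the spreadsheet tool, so the third toy row is flagged as an Excel job.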

How will this project help?

• This project helps data scientists and analysts negotiate their pay for an existing or a new job

Resources Used

• Packages: pandas, numpy, sklearn, matplotlib, seaborn.
• Dataset by Ken Jee: https://github.com/PlayingNumbers/ds_salary_proj

Exploratory Data Analysis (EDA) and Data Cleaning

• Removed the unwanted column 'Unnamed: 0'
• Plotted bar graphs and count plots for numerical and categorical features respectively for EDA
• Numerical features (Rating, Founded): replaced NaN or -1 values with the mean or median, depending on their distribution
• Categorical features: replaced NaN or -1 values with an 'Other'/'Unknown' category
• Removed unwanted letters and special characters from the Salary feature
• Converted the Salary column to a single scale, i.e. from (per hour, per annum, employer provided salary) to (per annum)
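
The imputation and salary-conversion steps above might look roughly like the sketch below. Several things here are assumptions rather than details from the notebook: the raw salary strings are assumed to look like "$53K-$91K (Glassdoor est.)" or "$17-$24 Per Hour", a working year is assumed to be 2,080 hours, the midpoint of each range is used as the salary, and both helper names are hypothetical.

```python
import re

import numpy as np
import pandas as pd

HOURS_PER_YEAR = 2080  # assumption: 40 hours/week * 52 weeks


def impute_numeric(series: pd.Series, strategy: str = "median") -> pd.Series:
    """Treat -1 as missing, then fill gaps with the median (or mean)."""
    cleaned = series.replace(-1, np.nan)
    fill = cleaned.median() if strategy == "median" else cleaned.mean()
    return cleaned.fillna(fill)


def salary_to_annual_k(raw: str) -> float:
    """Return the midpoint of a salary range in thousands of dollars per year."""
    text = raw.strip().lower()
    if text == "-1":  # placeholder for a missing salary
        return float("nan")
    # Pull out the numbers; "$", "K" and labels such as
    # "Employer Provided Salary:" are ignored by the pattern.
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)][:2]
    if not numbers:
        return float("nan")
    midpoint = sum(numbers) / len(numbers)
    if "per hour" in text:
        # Hourly figures are in dollars, so scale to thousands per year.
        return midpoint * HOURS_PER_YEAR / 1000
    return midpoint


ratings = impute_numeric(pd.Series([3.0, -1, 4.0, 5.0]))
print(ratings.tolist())
for raw in ["$53K-$91K (Glassdoor est.)", "$17-$24 Per Hour"]:
    print(raw, "->", salary_to_annual_k(raw))
```

The median is used as the default fill because it is less sensitive to outliers; for a roughly symmetric column, passing `strategy="mean"` would be the other option the list describes.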

Feature Engineering

• Created new features from existing features, e.g. job_in_headquaters from (job_location, headquarters), etc.
• Trimmed columns, i.e. reduced features having more than 10 categories to lower the dimensionality
• Handled ordinal and nominal categorical features
• Selected features using information gain (mutual_info_regression) and a correlation matrix
• Scaled features using StandardScaler
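
A minimal sketch of these steps on toy data follows. It assumes that "trimming" means keeping the most frequent categories and grouping the rest under one label, which is only one possible reading. The helper `trim_categories`, the toy values and the synthetic data used for the mutual information scores are all made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression
from sklearn.preprocessing import StandardScaler


def trim_categories(series: pd.Series, max_categories: int = 10,
                    other_label: str = "Other") -> pd.Series:
    """Keep the most frequent categories and group the rest under one label."""
    top = series.value_counts().nlargest(max_categories).index
    return series.where(series.isin(top), other_label)


jobs = pd.DataFrame({
    "job_location": ["New York, NY", "Austin, TX", "Boston, MA", "Austin, TX"],
    "headquarters": ["New York, NY", "Seattle, WA", "Boston, MA", "Austin, TX"],
    "sector": ["Finance", "Tech", "Tech", "Retail"],
    "rating": [3.5, 4.1, 3.9, 4.4],
    "founded": [1990, 2010, 1985, 2001],
})

# New feature: 1 when the job is located at the company headquarters.
jobs["job_in_headquaters"] = (jobs["job_location"] == jobs["headquarters"]).astype(int)

# Trim a categorical column (max_categories=1 only because the toy data is tiny).
jobs["sector"] = trim_categories(jobs["sector"], max_categories=1)

# Standardize numerical columns to zero mean and unit variance.
scaled = StandardScaler().fit_transform(jobs[["rating", "founded"]])

# Mutual information on synthetic data: x0 drives the target, x1 is noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=200)
mi_scores = mutual_info_regression(X, y, random_state=0)
print(dict(zip(["x0", "x1"], mi_scores.round(3))))
```

A feature with a mutual information score near zero, like `x1` here, carries little information about the target and is a candidate for removal.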

Model Building and Evaluation

Metric: Negative Root Mean Squared Error (NRMSE). The RMSE is negated so that higher scores (closer to zero) are better.
• Multiple Linear Regression: -27.523
• Lasso Regression: -27.993
• Random Forest: -17.637
• Gradient Boosting: -24.429
• Voting (Random Forest + Gradient Boosting): -19.136
Note: Evaluation scores were obtained using cross-validation.
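
For reference, a comparison of this kind can be run with scikit-learn's `cross_val_score` and the `neg_root_mean_squared_error` scorer, as in the sketch below. The data is synthetic (`make_regression`), and the hyperparameters and the 5-fold setting are assumptions, so the printed numbers will not match the scores listed above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the prepared feature matrix and salary target.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

models = {
    "Multiple Linear Regression": LinearRegression(),
    "Lasso Regression": Lasso(alpha=1.0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "Voting (Random Forest + Gradient Boosting)": VotingRegressor([
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingRegressor(random_state=0)),
    ]),
}

# Mean negative RMSE over 5 folds; values closer to zero are better.
scores = {
    name: cross_val_score(
        model, X, y, cv=5, scoring="neg_root_mean_squared_error"
    ).mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

The voting ensemble averages the predictions of its two members, so its score usually lands between or near theirs, as it does in the results listed above.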
