This repository contains my first data analysis project, built with the Python libraries NumPy, Pandas, Matplotlib, and Seaborn. The project involves extracting, cleaning, and visualizing data from a CSV file downloaded from the Stack Overflow website.
In this project, I used these libraries to turn raw survey data into readable summaries and charts. The main steps were extracting the data, cleaning it to remove anomalies, and visualizing it to highlight key trends.
- Downloaded the original dataset from Stack Overflow in CSV format. The data is available from the Stack Overflow Survey Data page.
- Loaded the data into a Jupyter Notebook using the Pandas library.
- Identified and corrected anomalies in the data, such as incorrect ages and implausible coding start ages.
- Used Pandas and NumPy to clean the data and make it easier to read and analyze.
- Created various plots and graphs using Matplotlib and Seaborn.
- The visualizations made the data easier to understand and highlighted important trends.
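
The age-related cleaning step described above could look roughly like the sketch below. It is not the notebook's actual code: the column names `Age` and `Age1stCode`, the plausible age range of 10 to 100, and the sample values are all assumptions made for illustration, and they should be adjusted to match the survey file in use.

```python
import pandas as pd

# Hypothetical column names and bounds; adjust them to the survey file in use.
AGE_COL = "Age"
FIRST_CODE_COL = "Age1stCode"
MIN_AGE, MAX_AGE = 10, 100


def clean_ages(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with implausible ages replaced by NaN."""
    out = df.copy()

    # Answers that are not plain numbers (e.g. free text) become NaN.
    age = pd.to_numeric(out[AGE_COL], errors="coerce")
    start = pd.to_numeric(out[FIRST_CODE_COL], errors="coerce")

    # Keep only ages inside the assumed plausible range.
    age = age.where(age.between(MIN_AGE, MAX_AGE))

    # A coding start age cannot be negative or greater than the current age.
    start = start.mask((start < 0) | (start > age))

    out[AGE_COL] = age
    out[FIRST_CODE_COL] = start
    return out


# Small made-up sample to show the effect; these are not survey values.
sample = pd.DataFrame(
    {AGE_COL: [25, 279, "n/a", 40], FIRST_CODE_COL: [12, 15, 10, 55]}
)
cleaned = clean_ages(sample)
print(cleaned)
```

Replacing implausible values with NaN, rather than dropping the rows, keeps the other answers in those rows available for later plots. Rows can still be removed afterwards with `dropna` if a particular analysis needs complete ages.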
To run this project, you'll need to have Python installed along with the following libraries:
pip install numpy pandas matplotlib seaborn jupyter
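
To confirm that the libraries are visible to the Python interpreter you plan to use, a small check like the following may help. It is only a convenience sketch: the `missing_packages` helper is a hypothetical name introduced here, not part of the project, and it only tests whether each package can be found, not whether its version is suitable.

```python
from importlib.util import find_spec

# The analysis libraries used by the notebook.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```
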
- Clone this repository to your local machine:
git clone https://github.com/yourusername/data-analysis-project.git
- Navigate to the project directory:
cd data-analysis-project
- Open the Jupyter Notebook:
jupyter notebook
- Run the notebook to see the data extraction, cleaning, and visualization steps.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.