This repository contains experiments on data wrangling techniques, focusing on methods for handling missing values, filtering, aggregation, and more.
Python is a high-level, interpreted programming language widely used in data science for data manipulation, analysis, and visualization. Libraries such as Pandas and NumPy provide powerful tools for data wrangling, including handling missing values, filtering, and reshaping datasets.
```
Data-Wrangling/
│
├── Experiment 1 - Handling Missing Values/
│   ├── Handling_Missing_Values.ipynb
│
├── Experiment 2 - Data Filtering/
│   ├── Data_Filtering.ipynb
│   ├── Experiment 2 Document.docx
│
├── Experiment 3 - Data Aggregation/
│   ├── Data_Aggregation.ipynb
│   ├── Experiment 3 Document.docx
│
├── Experiment 4 - Data Concatenation/
│   ├── Data_Concatenation.ipynb
│
├── Experiment 5 - Data Reshaping/
│   ├── Data_Reshaping.ipynb
│
├── Experiment 6 - Data Sampling/
│   ├── Data_Sampling.ipynb
│
├── Experiment 7 - Data Conversion/
│   ├── Data_Conversion.ipynb
│
└── README.md
```
1. Handling Missing Values
Identify and fill missing values in a dataset using methods such as mean imputation or forward/backward filling to ensure data completeness and accuracy.
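A minimal pandas sketch of these strategies; the column names are illustrative, not taken from the notebooks:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"temperature": [21.0, np.nan, 23.5, np.nan, 25.1]})

# Mean imputation: replace NaN with the column mean
df["mean_filled"] = df["temperature"].fillna(df["temperature"].mean())

# Forward fill: propagate the last observed value downward
df["ffilled"] = df["temperature"].ffill()

# Backward fill: propagate the next observed value upward
df["bfilled"] = df["temperature"].bfill()

print(df)
```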
2. Data Filtering
Filter rows or columns based on specified criteria, such as removing outliers or selecting data within a certain range, to refine datasets for analysis.
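A short pandas sketch of boolean-mask and query-based filtering; the DataFrame and thresholds are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"score": [55, 72, 98, 12, 85],
                   "group": ["A", "B", "A", "B", "A"]})

# Boolean-mask filtering: keep rows whose score falls within a range
in_range = df[(df["score"] >= 50) & (df["score"] <= 90)]

# query() expresses the same kind of filter as a readable string
group_a = df.query("group == 'A' and score > 60")

print(in_range)
print(group_a)
```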
3. Data Aggregation
Aggregate data by grouping rows based on specific attributes and computing summary statistics, such as mean, median, count, or sum. This helps to summarize large datasets for easier analysis.
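A quick groupby sketch in pandas; the city/sales data is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Pune", "Pune", "Mumbai", "Mumbai"],
    "sales": [100, 150, 200, 250],
})

# Group rows by city and compute several summary statistics at once
summary = df.groupby("city")["sales"].agg(["mean", "median", "count", "sum"])
print(summary)
```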
4. Data Concatenation
Concatenate multiple datasets either along rows or columns to create a unified dataset. This method is useful when merging datasets from different sources or appending new data to an existing dataset.
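A minimal sketch of row-wise and column-wise concatenation with `pd.concat`; the quarterly DataFrames are assumptions for illustration:

```python
import pandas as pd

q1 = pd.DataFrame({"id": [1, 2], "sales": [10, 20]})
q2 = pd.DataFrame({"id": [3, 4], "sales": [30, 40]})

# Row-wise concatenation (axis=0): append new records
combined_rows = pd.concat([q1, q2], ignore_index=True)

# Column-wise concatenation (axis=1): place datasets side by side
extra = pd.DataFrame({"region": ["N", "S"]})
combined_cols = pd.concat([q1, extra], axis=1)

print(combined_rows)
print(combined_cols)
```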
5. Data Reshaping
Reshape data by pivoting, stacking, or unstacking to convert between wide and long formats. This technique allows for better organization and analysis of data with multiple variables.
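A short sketch of converting long data to wide with `pivot` and back with `melt`; the column names are illustrative:

```python
import pandas as pd

long_df = pd.DataFrame({
    "date": ["2024-01", "2024-01", "2024-02", "2024-02"],
    "metric": ["sales", "cost", "sales", "cost"],
    "value": [100, 60, 120, 70],
})

# Long -> wide: pivot each metric into its own column
wide = long_df.pivot(index="date", columns="metric", values="value")

# Wide -> long: melt column names back into a variable column
back_to_long = wide.reset_index().melt(id_vars="date",
                                       var_name="metric",
                                       value_name="value")

print(wide)
print(back_to_long)
```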
6. Data Sampling
Randomly sample rows or columns from a dataset to create a smaller subset for analysis. Sampling is useful for exploratory data analysis, testing models, or handling large datasets efficiently.
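A minimal sampling sketch with `DataFrame.sample`; `random_state` is set only to make the draw reproducible:

```python
import pandas as pd

df = pd.DataFrame({"value": range(100)})

# Sample a fixed number of random rows
subset = df.sample(n=10, random_state=42)

# Or sample a fraction of the data (here 5%)
frac_subset = df.sample(frac=0.05, random_state=42)

print(subset.shape, frac_subset.shape)
```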
7. Data Conversion
Convert data types of columns, such as changing categorical variables to numerical representations or converting numerical values into categories, enabling better processing and analysis of the data.
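A short sketch of both directions of conversion in pandas; the column names and bin edges are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({"grade": ["low", "high", "medium", "low"],
                   "age": [23, 35, 41, 52]})

# Categorical -> numerical: encode categories as integer codes
df["grade_code"] = df["grade"].astype("category").cat.codes

# Numerical -> categorical: bin ages into labeled ranges
df["age_band"] = pd.cut(df["age"], bins=[0, 30, 45, 100],
                        labels=["young", "mid", "senior"])

print(df.dtypes)
print(df)
```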
8. Text Cleaning
Clean and preprocess text data by removing punctuation and stopwords and performing tokenization. This standardizes the text, making it ready for further analysis such as natural language processing (NLP) or text mining. Tokenization splits text into words or phrases, which can then be analyzed or converted into numerical representations for machine learning models.
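A minimal, dependency-free sketch of these steps; the stopword list here is a tiny illustrative set, whereas real pipelines typically use NLTK's or spaCy's lists:

```python
import re
import string

# Tiny illustrative stopword set (an assumption, not a standard list)
STOPWORDS = {"the", "is", "a", "and", "for", "of"}

def clean_and_tokenize(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = re.split(r"\s+", text.strip())
    return [t for t in tokens if t and t not in STOPWORDS]

print(clean_and_tokenize("The quick, brown fox is ready for NLP!"))
# ['quick', 'brown', 'fox', 'ready', 'nlp']
```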
9. Datetime Operations
Extract date or time components from datetime columns and perform operations such as calculating time differences or aggregating data by time intervals. This allows for efficient analysis of time series data and helps in understanding trends over different time periods. Techniques include extracting year, month, and day, and calculating durations between timestamps.
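A short pandas sketch using the `.dt` accessor; the timestamps and the monthly grouping are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-05 08:00",
                                 "2024-01-20 14:30",
                                 "2024-02-03 09:15"]),
    "orders": [3, 5, 2],
})

# Extract components via the .dt accessor
df["year"] = df["timestamp"].dt.year
df["month"] = df["timestamp"].dt.month
df["day"] = df["timestamp"].dt.day

# Duration between consecutive timestamps (a timedelta column)
df["gap"] = df["timestamp"].diff()

# Aggregate by month by grouping on a monthly period
monthly = df.groupby(df["timestamp"].dt.to_period("M"))["orders"].sum()

print(df)
print(monthly)
```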
10. Data Merging
Merge two or more datasets based on common keys or indices to combine information from different sources. This process is essential for creating comprehensive datasets that capture all relevant data points across different tables. Techniques include inner joins, outer joins, left joins, and right joins to ensure that data relationships are properly maintained during the merging process.
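A minimal sketch of the common join types with `DataFrame.merge`; the key column `cust_id` and both tables are made up for illustration:

```python
import pandas as pd

customers = pd.DataFrame({"cust_id": [1, 2, 3],
                          "name": ["Ana", "Ben", "Cal"]})
orders = pd.DataFrame({"cust_id": [2, 3, 4],
                       "amount": [250, 100, 75]})

# Inner join: keep only keys present in both tables
inner = customers.merge(orders, on="cust_id", how="inner")

# Left join: keep all customers; missing order info becomes NaN
left = customers.merge(orders, on="cust_id", how="left")

# Outer join: union of keys from both tables
outer = customers.merge(orders, on="cust_id", how="outer")

print(inner, left, outer, sep="\n\n")
```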
- Drop a 🌟 if you find this repository useful.
- If you have any doubts or suggestions, feel free to reach out.
- 📫 Contribute and discuss: feel free to open issues 🐛, submit pull requests 🛠️, or start discussions 💬 to help improve this repository!