KianoushAmirpour/Exploratory-data-analysis: Performing Exploratory Data Analysis (EDA) using PySpark, pandas, matplotlib, and scikit-learn for data manipulation, visualization, and pattern identification.

Gain domain knowledge
Check for missing values
Check for duplicates
Distribution of categorical features
Association between categorical features (chi-square test)
Distribution of numerical features (histograms, box plots, violin plots)
Correlation between numerical features
Transformation of numerical features and normality tests (log transformation, QuantileTransformer, Box-Cox transformation, Kolmogorov-Smirnov test, Q-Q plots)
Encoding categorical values (ordinal_encoder, label_encoder, one_hot_encoding)
Correlation between all features
PCA (explained variance, cumulative explained variance, and loadings)
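The missing-value and duplicate checks can be sketched in pandas as follows. The toy DataFrame and its values are hypothetical, and the notebooks may implement these checks differently.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the notebooks load their own dataset.
df = pd.DataFrame({
    "Category": ["A", "B", "B", None, "A"],
    "Price": [10.0, 20.0, 20.0, 15.0, np.nan],
})

# Count and percentage of missing values per column.
missing = df.isna().sum()
missing_pct = df.isna().mean() * 100

# duplicated() marks every repeat of an earlier identical row as True.
dup_mask = df.duplicated()
n_duplicates = int(dup_mask.sum())

# Keep only the first occurrence of each duplicated row.
deduped = df.drop_duplicates()
```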
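For the chi-square step, here is a minimal sketch using `scipy.stats.chi2_contingency` on a hypothetical Category vs Shipping Method table. The column pairing and the counts are made up for illustration. The null hypothesis is that the two features are independent, so a p-value below the chosen alpha (0.05 here, an assumption) suggests an association. For a 2x2 table SciPy applies Yates' continuity correction by default.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical data: 40 Electronics and 40 Clothing sales with opposite shipping preferences.
df = pd.DataFrame({
    "Category": ["Electronics"] * 40 + ["Clothing"] * 40,
    "Shipping Method": (["Express"] * 30 + ["Standard"] * 10
                        + ["Express"] * 10 + ["Standard"] * 30),
})

# Contingency table of observed counts (rows: Category, columns: Shipping Method).
table = pd.crosstab(df["Category"], df["Shipping Method"])

# Returns the statistic, p-value, degrees of freedom and expected frequencies.
chi2, p_value, dof, expected = chi2_contingency(table)

alpha = 0.05  # assumed significance level
associated = p_value < alpha
```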
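A sketch of the transformation and normality-test step, assuming a hypothetical right-skewed, strictly positive feature. It compares a log transform, Box-Cox (`scipy.stats.boxcox`) and scikit-learn's `QuantileTransformer` with a Kolmogorov-Smirnov test against a standard normal, and computes the data behind a Q-Q plot with `scipy.stats.probplot`. Because each series is standardized with its own mean and standard deviation before the KS test, the p-values are only approximate (no Lilliefors correction is applied).

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
# Hypothetical skewed, strictly positive feature (e.g. a price).
x = rng.lognormal(mean=3.0, sigma=0.8, size=1000)

def ks_normal_pvalue(values):
    """Approximate KS p-value of standardized values against N(0, 1)."""
    z = (values - values.mean()) / values.std(ddof=1)
    return stats.kstest(z, "norm").pvalue

# Log transform; log1p also tolerates zeros.
x_log = np.log1p(x)

# Box-Cox needs strictly positive input; lambda is fit by maximum likelihood.
x_boxcox, fitted_lambda = stats.boxcox(x)

# QuantileTransformer maps values onto a normal distribution through their ranks.
qt = QuantileTransformer(output_distribution="normal", n_quantiles=1000, random_state=0)
x_quantile = qt.fit_transform(x.reshape(-1, 1)).ravel()

pvalues = {
    "raw": ks_normal_pvalue(x),
    "log": ks_normal_pvalue(x_log),
    "boxcox": ks_normal_pvalue(x_boxcox),
    "quantile": ks_normal_pvalue(x_quantile),
}

# Q-Q plot data: theoretical vs ordered sample quantiles; r close to 1 means near-normal.
(osm, osr), (slope, intercept, r) = stats.probplot(x_boxcox, dist="norm")
```

Passing `osm` and `osr` to a matplotlib scatter plot reproduces the Q-Q plot.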
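The three encodings can be illustrated on hypothetical columns: `OrdinalEncoder` with an explicit category order for an ordered feature, `LabelEncoder` (which scikit-learn intends for target labels and which assigns codes in sorted order), and one-hot encoding with `pandas.get_dummies`. The column values and age-group bins are assumptions, not taken from the dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical rows and category values.
df = pd.DataFrame({
    "Customer Age Group": ["18-25", "26-35", "56+", "36-45", "18-25"],
    "Shipping Method": ["Standard", "Express", "Standard", "Overnight", "Express"],
    "Category": ["Books", "Toys", "Books", "Garden", "Toys"],
})

# Ordinal encoding with an explicit, meaningful order (assumed bins).
ord_enc = OrdinalEncoder(categories=[["18-25", "26-35", "36-45", "46-55", "56+"]])
df["age_group_ord"] = ord_enc.fit_transform(df[["Customer Age Group"]]).ravel()

# Label encoding: codes follow the sorted class names (Express=0, Overnight=1, Standard=2).
lab_enc = LabelEncoder()
df["shipping_label"] = lab_enc.fit_transform(df["Shipping Method"])

# One-hot encoding: one 0/1 column per Category value.
one_hot = pd.get_dummies(df["Category"], prefix="Category", dtype=int)
```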
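A sketch of the correlation and PCA steps on hypothetical numeric data in which Shipping Cost is deliberately correlated with Price. Features are standardized first so that PCA works on the correlation structure. Loadings are computed here as component weights scaled by the square root of each component's variance, which is one common convention among several.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 500
price = rng.normal(50, 10, n)
# Hypothetical numeric features.
df = pd.DataFrame({
    "Price": price,
    "Shipping Cost": 0.2 * price + rng.normal(0, 1, n),  # correlated with Price by construction
    "Popularity Index": rng.uniform(0, 100, n),
})

# Pearson correlation matrix of the numeric features.
corr = df.corr()

# Standardize, then fit PCA with all components.
X = StandardScaler().fit_transform(df)
pca = PCA().fit(X)

explained = pca.explained_variance_ratio_  # share of variance per component
cumulative = np.cumsum(explained)          # cumulative explained variance

# Loadings: rows are original features, columns are principal components.
loadings = pd.DataFrame(
    pca.components_.T * np.sqrt(pca.explained_variance_),
    index=df.columns,
    columns=[f"PC{i + 1}" for i in range(pca.n_components_)],
)
```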

Which five products have the largest difference between their Popularity Index and Return Rate?
Which Supplier ID has the highest percentage of sales shipped with the Standard Shipping Method?
Compare the average Shipping Cost across the Category values.
Which Category has the highest number of sold products with a Popularity Index between 50 and 70?
Report the ten Supplier ID values with the highest net sales amount (after applying Discount and Tax Rate), along with their net sales amounts.
In which city have customers under 35 years old paid the highest average total cost for purchasing and receiving products?
Which continent reported the highest Stock Level, and which reported the lowest?
Sort cities by their average Shipping Cost.
Which Category shows the most variation in the Price of sold products?
What percentage of products have a Popularity Index below 80?
In which Category have suppliers (Supplier ID) applied the lowest Discount percentage?
Which three products have the highest Popularity Index in the Customer Age Group above 55 years?
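The popularity questions (the first, tenth and last above) reduce to a few pandas group-bys. The repository also uses PySpark; this sketch uses pandas for brevity. The "Product Name" column and a numeric "Customer Age" column are hypothetical, and the difference is taken as an absolute value, which is one possible reading of the first question.

```python
import pandas as pd

# Hypothetical rows; "Product Name" and "Customer Age" are assumed column names.
df = pd.DataFrame({
    "Product Name": ["P1", "P2", "P3", "P4", "P5", "P6", "P7"],
    "Popularity Index": [90, 75, 60, 85, 40, 95, 55],
    "Return Rate": [5, 10, 30, 2, 20, 50, 1],
    "Customer Age": [60, 30, 58, 70, 45, 22, 57],
})

# One row per product (averaging in case a product appears in several sales).
per_product = df.groupby("Product Name")[["Popularity Index", "Return Rate"]].mean()

# Five products with the largest |Popularity Index - Return Rate|.
per_product["Difference"] = (per_product["Popularity Index"] - per_product["Return Rate"]).abs()
top5_difference = per_product["Difference"].nlargest(5)

# Percentage of products with a Popularity Index below 80.
pct_below_80 = (per_product["Popularity Index"] < 80).mean() * 100

# Three most popular products among customers older than 55.
over_55 = df[df["Customer Age"] > 55]
top3_over_55 = over_55.groupby("Product Name")["Popularity Index"].mean().nlargest(3)
```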
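The shipping questions (the second, third and eighth) can be sketched in pandas on hypothetical rows. "Percentage of sales" is read here as the share of each supplier's own sales that used Standard shipping; the share of all Standard shipments per supplier would be a different, equally plausible reading.

```python
import pandas as pd

# Hypothetical rows; values are made up for illustration.
df = pd.DataFrame({
    "Supplier ID": ["S1", "S1", "S1", "S2", "S2", "S3", "S3", "S3", "S3"],
    "Shipping Method": ["Standard", "Express", "Standard", "Standard", "Standard",
                        "Express", "Express", "Standard", "Express"],
    "Category": ["Books", "Toys", "Books", "Toys", "Books", "Toys", "Books", "Toys", "Books"],
    "City": ["Paris", "Rome", "Paris", "Oslo", "Rome", "Oslo", "Paris", "Rome", "Oslo"],
    "Shipping Cost": [5.0, 10.0, 7.0, 12.0, 6.0, 8.0, 4.0, 9.0, 3.0],
})

# Percentage of each supplier's sales shipped with the Standard method.
standard_pct = df["Shipping Method"].eq("Standard").groupby(df["Supplier ID"]).mean() * 100
top_supplier = standard_pct.idxmax()

# Average Shipping Cost per Category.
avg_cost_by_category = df.groupby("Category")["Shipping Cost"].mean()

# Cities sorted by average Shipping Cost, cheapest first.
cities_by_cost = df.groupby("City")["Shipping Cost"].mean().sort_values()
```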
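The two Category questions (the fourth and ninth) are sketched below, assuming each row is one sold product. Variation is measured with the sample standard deviation of Price; the coefficient of variation (standard deviation divided by the mean) would be an alternative if categories have very different price levels. The data is hypothetical.

```python
import pandas as pd

# Hypothetical rows, one per sold product.
df = pd.DataFrame({
    "Category": ["Books", "Books", "Toys", "Toys", "Toys", "Garden", "Garden"],
    "Popularity Index": [55, 72, 50, 70, 65, 60, 40],
    "Price": [10.0, 90.0, 5.0, 50.0, 30.0, 20.0, 22.0],
})

# Sold products per Category with 50 <= Popularity Index <= 70 (between() is inclusive by default).
in_range = df[df["Popularity Index"].between(50, 70)]
counts = in_range.groupby("Category").size()
top_category = counts.idxmax()

# Spread of Price per Category (sample standard deviation, ddof=1).
price_std = df.groupby("Category")["Price"].std()
most_varied_category = price_std.idxmax()
```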
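The net-sales and discount questions (the fifth and eleventh) depend on how Discount and Tax Rate are stored. This sketch assumes both are fractions (0.10 means 10%) and computes each sale's net amount as Price × (1 − Discount) × (1 + Tax Rate); that formula is an assumption, not confirmed by the source, and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows; Discount and Tax Rate assumed to be fractions (divide by 100 if stored as %).
df = pd.DataFrame({
    "Supplier ID": ["S1", "S2", "S1", "S3", "S2"],
    "Category": ["Books", "Toys", "Toys", "Books", "Books"],
    "Price": [100.0, 200.0, 50.0, 80.0, 120.0],
    "Discount": [0.10, 0.0, 0.20, 0.05, 0.25],
    "Tax Rate": [0.10, 0.10, 0.0, 0.20, 0.10],
})

# Assumed net amount per sale: discount first, then tax on the discounted price.
df["Net Sales"] = df["Price"] * (1 - df["Discount"]) * (1 + df["Tax Rate"])

# Top ten suppliers by total net sales (fewer are returned if there are fewer suppliers).
top10_suppliers = df.groupby("Supplier ID")["Net Sales"].sum().nlargest(10)

# Mean Discount per Category, as a percentage, and the Category with the lowest value.
discount_pct_by_category = df.groupby("Category")["Discount"].mean() * 100
lowest_discount_category = discount_pct_by_category.idxmin()
```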
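The city and continent questions (the sixth and seventh) are sketched below on hypothetical rows. Two readings are assumptions: "total cost for purchasing and receiving" is taken as Price + Shipping Cost, and the Stock Level question compares the single highest and lowest reported values per continent (an average per continent would be another reading). The numeric "Customer Age" and "Continent" column names are hypothetical.

```python
import pandas as pd

# Hypothetical rows; "Customer Age" and "Continent" are assumed column names.
df = pd.DataFrame({
    "City": ["Tokyo", "Tokyo", "Lima", "Lima", "Cairo", "Cairo"],
    "Continent": ["Asia", "Asia", "South America", "South America", "Africa", "Africa"],
    "Customer Age": [25, 40, 30, 33, 22, 50],
    "Price": [100.0, 500.0, 80.0, 90.0, 150.0, 20.0],
    "Shipping Cost": [10.0, 20.0, 5.0, 15.0, 30.0, 5.0],
    "Stock Level": [300, 50, 120, 80, 500, 60],
})

# City where customers under 35 paid the highest average total cost (assumed Price + Shipping Cost).
under_35 = df[df["Customer Age"] < 35]
total_cost = under_35["Price"] + under_35["Shipping Cost"]
avg_total_cost = total_cost.groupby(under_35["City"]).mean()
top_city = avg_total_cost.idxmax()

# Highest and lowest single Stock Level reported in each continent.
stock_by_continent = df.groupby("Continent")["Stock Level"].agg(["max", "min"])
highest_continent = stock_by_continent["max"].idxmax()
lowest_continent = stock_by_continent["min"].idxmin()
```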

.gitignore
Multi_class_prediction_analysis_pandas.ipynb
README.md
purchase_analysis_pyspark.ipynb
