- Gain domain knowledge
- Check for missing values
- Check for duplicates
- Categorical features distribution
- Association between categorical features (Chi-square test)
- Numerical features distribution (histograms, boxplots, violin plots)
- Correlation between numerical features
- Transformation of numerical features and normality tests (log transform, QuantileTransformer, Box-Cox transformation, Kolmogorov-Smirnov test, Q-Q plots)
- Encoding categorical values (ordinal_encoder, label_encoder, one_hot_encoding)
- Correlation between all features
- PCA (explained variance, cumulative variance, loadings)
- Which five products have the largest difference between their Popularity Index and Return Rate?
- Which Supplier ID has the highest percentage of sales using the Shipping Method: Standard?
- Compare the average Shipping Cost across different Category values.
- Which Category has the highest number of sold products with a Popularity Index between 50 and 70?
- Report the top ten Supplier ID values with the highest net sales amount (after applying Discount and Tax Rate) along with their net sales amount.
- In which city have customers under 35 years old paid the highest average total cost for purchasing and receiving products?
- Which continent reports the highest Stock Level, and which reports the lowest?
- Sort cities based on their average Shipping Cost.
- Which Category shows the greatest variation in the Price of sold products?
- What percentage of products have a Popularity Index of less than 80?
- In which Category have suppliers (Supplier ID values) applied the lowest Discount percentage?
- Which three products have the highest Popularity Index among the Customer Age Group above 55 years?
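
A few of the items above can be sketched with pandas. This is only a minimal illustration: the column names are taken from the questions, but the rows are invented sample data, and dropping duplicates before answering the questions is an assumption rather than a stated requirement.

```python
import pandas as pd

# Hypothetical sample rows. Column names follow the questions above;
# the values are invented purely for illustration.
df = pd.DataFrame({
    "Category": ["Toys", "Toys", "Books", "Books", "Books"],
    "Shipping Method": ["Standard", "Express", "Standard", "Standard", "Standard"],
    "Shipping Cost": [5.0, 12.0, 4.0, None, 4.0],
    "Popularity Index": [65, 90, 55, 68, 55],
})

# Check for missing values: count of NaN entries per column.
missing_per_column = df.isna().sum()

# Check for duplicates: rows identical to an earlier row in every column.
duplicate_count = int(df.duplicated().sum())

# Assumption: exact duplicate rows are dropped before any aggregation.
clean = df.drop_duplicates()

# Compare the average Shipping Cost across Category values
# (mean() skips missing values by default), sorted ascending.
avg_shipping_by_category = (
    clean.groupby("Category")["Shipping Cost"].mean().sort_values()
)

# Percentage of products with a Popularity Index of less than 80.
pct_below_80 = (clean["Popularity Index"] < 80).mean() * 100

# Category with the most rows whose Popularity Index is between 50 and 70.
# Series.between is inclusive on both ends by default.
in_range = clean[clean["Popularity Index"].between(50, 70)]
top_category = in_range["Category"].value_counts().idxmax()

print(missing_per_column)
print(duplicate_count)
print(avg_shipping_by_category)
print(pct_below_80)
print(top_category)
```

On a real dataset, `df` would instead be loaded from the source file, and whether the 50 to 70 range should include its endpoints would need to be confirmed against the question's intent.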