- The task chosen is to classify product data into its respective "browse node".
- Each product comes with fields such as "Product name", "Product description", "Brand Name", and "Brand Description".
- As on e-commerce websites, the number of products in each category varies widely, and this imbalance makes machine-learning categorization both challenging and interesting.
- Hence, the performance of transformer models is explored on this imbalanced data.

1. The dataset is explored and pre-processed. [EDA and preprocessing]
2. The pre-processed dataset is tokenized using Rust-based batch-wise tokenization and saved as pickle files. [tokenization]
3. Each transformer is trained on its respective tokenized dataset; the metrics are logged and then visualized as shown below. [training]
4. The top performer is retrained using the focal loss, and the improvement is shown below.
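
To make the focal-loss step more concrete, here is a minimal numeric sketch in plain Python. It assumes the standard form FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t); the values `gamma=2.0` and `alpha=1.0`, the function names, and the toy logits are illustrative assumptions, not the settings used in this project. In actual training the same formula would be applied to the model's logits inside the deep-learning framework.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example (gamma and alpha are assumed values).

    p_t is the predicted probability of the true class. The factor
    (1 - p_t) ** gamma shrinks the loss of well-classified examples,
    so hard examples (often from rare classes) dominate the total.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(logits, target):
    """Plain cross-entropy, which is focal loss with gamma = 0."""
    return focal_loss(logits, target, gamma=0.0)


# Toy logits for a 3-class problem; the true class is 0 in both cases.
easy = [4.0, 0.0, 0.0]  # confident and correct
hard = [0.0, 1.0, 0.0]  # wrong class gets the highest score

for name, logits in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(logits, 0)
    fl = focal_loss(logits, 0)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f} ratio={fl / ce:.4f}")
```

The printed ratio shows how much of each example's cross-entropy survives the focal weighting: it is far smaller for the easy example than for the hard one, which is why the loss can help on imbalanced categories.
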
[1] Surana S, "Amazon Product Browse Node Classification Data", Kaggle Datasets. [link]