
Benchmarking different transformers and demonstrating the benefits of focal loss on imbalanced classification data


kumar-selvakumaran/Transformers_for_imbalanced_classification


Transformers compete on imbalanced product classification data

Context:

  • The chosen task is to classify each product into its respective "browse node" (product category).

  • Product fields such as "Product name", "Product description", "Brand Name", and "Brand Description" are given as input.

  • As on real e-commerce websites, the number of products in each category varies widely, and this imbalance makes machine-learning-based categorization both challenging and interesting.

  • Hence, this project explores how well transformer models perform on this imbalanced data.
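The imbalance described above can be quantified during exploration by counting products per browse node. A minimal sketch, assuming the labels are available as a plain Python list; the toy labels are hypothetical and not drawn from the dataset:

```python
from collections import Counter


def class_distribution(labels):
    """Count examples per class and compute the imbalance ratio.

    The imbalance ratio is (size of the largest class) / (size of the
    smallest class); 1.0 means perfectly balanced.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    return counts, max(counts.values()) / min(counts.values())


# Hypothetical toy labels standing in for browse-node ids.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 2]
counts, ratio = class_distribution(labels)
print(counts.most_common())  # [(0, 6), (1, 2), (2, 1)]
print(ratio)                 # 6.0
```

A large ratio signals that a plain cross-entropy objective will be dominated by the most frequent categories.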

Details:

  • The dataset [1] can be found at this link. [link]

  • The resulting publication can be found at this link. [link]

Steps taken in this project:

  1. The dataset is explored and pre-processed. [EDA and preprocessing]

  2. The pre-processed dataset is tokenized using Rust-based, batch-wise tokenization and saved as pickle files. [tokenization]

  3. Each transformer is trained on its respective tokenized dataset; the metrics are logged and then visualized as shown below. [training]

[Figure: logged training metrics of each transformer]


  4. The top performer is retrained using focal loss, and the resulting improvement is shown below.


[Figure: improvement of the top performer after retraining with focal loss]
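Step 2 mentions Rust-based, batch-wise tokenization with the results saved as pickle files. The sketch below shows only the batching and pickling pattern, assuming a hypothetical `tokenize_batch` callable. In practice this would wrap a real fast tokenizer (for example, the Rust-backed Hugging Face `tokenizers` library), but the repository's exact tokenizer and settings are not stated here. `make_toy_tokenizer` is a hypothetical whitespace tokenizer used only so the example runs without downloads.

```python
import pickle
import tempfile
from pathlib import Path


def tokenize_in_batches(texts, tokenize_batch, batch_size=1000):
    """Tokenize `texts` one batch at a time and concatenate the results.

    `tokenize_batch` is any callable that maps a list of strings to a list of
    token-id lists (e.g. a thin wrapper around a Rust-backed fast tokenizer).
    """
    token_ids = []
    for start in range(0, len(texts), batch_size):
        token_ids.extend(tokenize_batch(texts[start:start + batch_size]))
    return token_ids


def save_pickle(obj, path):
    with Path(path).open("wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    with Path(path).open("rb") as f:
        return pickle.load(f)


def make_toy_tokenizer():
    """Hypothetical stand-in tokenizer: lower-cases, splits on whitespace,
    and gives each new word the next free id (vocabulary shared across batches)."""
    vocab = {}

    def tokenize_batch(batch):
        return [[vocab.setdefault(w, len(vocab)) for w in text.lower().split()]
                for text in batch]

    return tokenize_batch


texts = ["Red cotton shirt", "Blue cotton jeans", "Red mug"]
ids = tokenize_in_batches(texts, make_toy_tokenizer(), batch_size=2)
print(ids)  # [[0, 1, 2], [3, 1, 4], [0, 5]]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tokenized.pkl"
    save_pickle(ids, path)
    reloaded = load_pickle(path)  # identical to `ids`
```

Fixed-size batches keep memory bounded and let a fast tokenizer work on many texts per call; pickling the token ids lets each model's training run reload them instead of re-tokenizing.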
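Step 4 retrains the top performer with focal loss. Focal loss, introduced by Lin et al. for dense object detection, scales each example's cross-entropy by (1 - p_t)^γ, where p_t is the predicted probability of the true class: FL(p_t) = -α_t (1 - p_t)^γ log(p_t). Well-classified examples (p_t near 1) then contribute little, so training concentrates on hard examples, which on imbalanced data often come from minority classes. The pure-Python sketch below shows the standard multi-class form for illustration only; the repository's framework and its values of γ and α are not stated here, so `gamma=2.0` (a common choice) and the optional `alpha` list are assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw class scores."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one example: -alpha_t * (1 - p_t) ** gamma * log(p_t).

    logits: raw scores, one per class.
    target: index of the true class.
    gamma:  focusing parameter; gamma = 0 recovers (weighted) cross-entropy.
    alpha:  optional list of per-class weights; None means all weights are 1.
    """
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * log_p_t


def mean_focal_loss(batch_logits, targets, gamma=2.0, alpha=None):
    """Average focal loss over a batch of examples."""
    losses = [focal_loss(z, t, gamma, alpha) for z, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


easy = [4.0, 0.0, 0.0]  # confident, correct prediction for class 0
hard = [0.0, 0.0, 0.0]  # uncertain prediction for class 0
for name, z in [("easy", easy), ("hard", hard)]:
    ce = focal_loss(z, 0, gamma=0.0)  # plain cross-entropy
    fl = focal_loss(z, 0, gamma=2.0)
    # fl / ce equals (1 - p_t) ** 2, so it is far smaller for the easy example.
    print(name, ce, fl, fl / ce)
```

Setting `gamma=0.0` and `alpha=None` reduces the function to ordinary cross-entropy, which makes it easy to compare both objectives on the same model.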


References

[1] Surana S, "Amazon Product Browse Node Classification Data", Kaggle Datasets. [link]
