NITA Fashion is one of the biggest e-commerce companies distributing over 400 local and international brands. The products are divided into four main categories: apparel, footwear, accessories, and personal care. Customer experience is the company's core value. We consistently strive to identify problems and devise effective solutions, placing utmost importance on ensuring our customers' satisfaction.
As an e-commerce data scientist, I have recently been working on a project proposal to enhance the customer experience during product searches. This initiative is based on insights from a survey conducted in 2022, which gathered feedback from 2,000 customers on the challenges they encounter while searching for products using keywords. The survey revealed several valuable points:
- 35% of the customers reported difficulties in locating their desired products amidst the vast array of offerings on the website.
- 42% expressed concerns about spending excessive time on the website without being able to find their target items effectively.
- 22% of the customers expressed demand for an image search feature.
How to improve customer experience during product searches?
I would like to highlight visual search (image search and video search) and how much potential it has, especially in the fashion e-commerce market.
- Visual search is a new search type with rising demand:
  - 62% of Millennial and Generation Z customers desire visual search over any other new technology (2019).
  - 54% of US internet users were excited to have this technology in their shopping experience.
  - Fashion has always been at the top of visual search.
  - The visual search market is projected to grow substantially, with a compound annual growth rate (CAGR) of approximately 18% anticipated until 2028.
- Considering the vast potential of the global visual search market, I believe that adopting visual search is a critical and timely move for our company. Being an early adopter will give us a competitive advantage and allow us to leverage all of its benefits.
Given our company's current capacity and available resources, I propose taking a phased approach, starting with phase 1, which will primarily focus on building an image search feature with 2 functions: image classification and image similarity. This will allow us to build a strong foundation and gradually expand into other areas of visual search in the future.
The proposal will be introduced to the Chief Executive Officer and all Department Heads to get feedback and approval for its implementation.
Our dataset consists of more than 44 thousand images from 143 product types. For the purposes of this proposal, however, I will focus on the top 10 product types, which account for more than 24 thousand images.
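The selection of the top 10 product types can be sketched as a simple frequency count. This is a minimal illustration only: the function name `top_k_product_types` and the toy labels are hypothetical, and the real dataset and its label column are not shown here.

```python
from collections import Counter

def top_k_product_types(labels, k=10):
    """Return the k most frequent product types and the indices of the images that belong to them."""
    counts = Counter(labels)
    top_types = [name for name, _ in counts.most_common(k)]
    keep = set(top_types)
    kept_indices = [i for i, label in enumerate(labels) if label in keep]
    return top_types, kept_indices

# Hypothetical toy labels standing in for the real product-type column.
labels = ["Tshirts", "Watches", "Tshirts", "Casual Shoes", "Tshirts", "Watches", "Belts"]
top_types, kept = top_k_product_types(labels, k=2)
print(top_types)  # ['Tshirts', 'Watches']
print(len(kept))  # 5
```

Only the images whose indices are returned would then be passed on to model training.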
We will utilize both classic convolutional neural networks and transfer learning networks to build up the first foundational model for image classification tasks, then compare their performance and select the best one.
We then build models with transfer learning networks to see which one performs best:
- The best ResNet50 model has an accuracy of 75%.
- The best MobileNetV2 model improves the test accuracy to 79%.
- VGG16 has an impressive test accuracy of 95%.
We will build a Siamese model trained with contrastive loss to predict the Euclidean distance between the two images in a pair. Shorter distances indicate stronger similarity.
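To make the idea concrete, here is a minimal sketch of a contrastive loss on a single image pair. It uses one common formulation (squared distance for similar pairs, squared margin shortfall for dissimilar pairs). The margin value and the convention that `is_similar=True` marks a matching pair are assumptions, since the exact settings used in the project are not stated here.

```python
import math

def euclidean_distance(embedding_a, embedding_b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.dist(embedding_a, embedding_b)

def contrastive_loss(distance, is_similar, margin=1.0):
    """Contrastive loss for one pair (assumed convention: is_similar=True for matching pairs).

    Similar pairs are penalized by their squared distance.
    Dissimilar pairs are penalized only when they are closer than the margin.
    """
    if is_similar:
        return distance ** 2
    return max(margin - distance, 0.0) ** 2

# Hypothetical embeddings that a Siamese network might output for two images.
d = euclidean_distance([0.0, 0.0], [3.0, 4.0])
print(d)                                       # 5.0
print(contrastive_loss(0.5, is_similar=True))   # 0.25
print(contrastive_loss(1.5, is_similar=False))  # 0.0
```

With this loss, training pulls matching pairs together and pushes non-matching pairs apart until they are at least `margin` away from each other.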
Test accuracy initially reached 50% and, despite various improvement steps, remained at only 50%. Due to time constraints and computational limitations, we will treat this model as our best model for the moment.
The model's performance will be assessed based on accuracy, that is, the ratio of correct predictions to the total number of predictions. In addition, the F1-score will be used to analyze the performance of individual classes in the image classification model.
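Both metrics can be sketched in a few lines of plain Python. The labels below are made up for illustration and are not results from the project. The per-class F1-score is computed here as 2·TP / (2·TP + FP + FN), which is equivalent to the harmonic mean of precision and recall.

```python
def accuracy(y_true, y_pred):
    """Ratio of correct predictions to the total number of predictions."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def f1_per_class(y_true, y_pred):
    """F1-score for each class, treating that class as positive and all others as negative."""
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores

# Hypothetical labels: one casual shoe is mistaken for a sports shoe.
y_true = ["casual", "casual", "sports", "sports", "watch", "watch"]
y_pred = ["casual", "sports", "sports", "sports", "watch", "watch"]
print(accuracy(y_true, y_pred))
print(f1_per_class(y_true, y_pred))
```

In this toy example a single confusion lowers the F1-score of both shoe classes while the unrelated class stays at 1.0, which is why per-class F1 is useful alongside overall accuracy.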
All image classification models are evaluated over the same 50 training epochs.
Comparing all models, the best CNN model comes out on top in terms of both accuracy and computational time within 50 epochs.
When examining the performance of individual classes, 8 out of 10 of them exhibit impressive F1-scores ranging from 96% to 100%. However, casual shoes and sports shoes were exceptions, showing lower scores of 82% and 85%, respectively.
The explanation is visual ambiguity: the model cannot reliably distinguish casual shoes from sports shoes, which limits its ability to learn and results in lower F1-scores. Visual ambiguity can be attributed to both subjective and objective causes. Subjective causes might arise from incorrect master data, while objective causes may be rooted in the nature of the design, where products share a similar appearance but serve different functions.
Therefore, to mitigate the part of this issue that arises from subjective factors, it is crucial to ensure the integrity of the input data.
The image similarity model will be trained with epochs ranging from 5 to 20 to explore various scenarios.
The best model so far is the one with 50% accuracy, as no superior alternative has been identified.
- Check and clean master data:
  - As a standard practice, before data is used as model input, it is crucial to engage the relevant departments and obtain their confirmation of the integrity of their respective working data. This verification helps ensure both model accuracy and the delivery of precise outputs to customers.
- Market the new image search feature as a competitive advantage:
  - Generate media buzz so that every user knows our company has this new feature, in order to drive traffic to our website and ultimately increase sales.
  - Additionally, the more customers we encourage to use the new feature, the more data we will collect for further analysis.
- Utilize image data for buying references:
  - The buying team can use image search analysis as a buying reference to boost sales, especially from trend-driven opportunities.
Here is what our new feature will look like: after the customer uploads their target image, the website will classify which product type they are looking for and return the top 5 similar products.
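The flow described above can be sketched as a two-step lookup: keep only the catalogue items of the predicted product type, then rank them by Euclidean distance to the uploaded image. This is an illustrative sketch, not the deployed code. The function `top_k_similar`, the catalogue structure, and the embeddings are all hypothetical, and in practice the predicted type and the embeddings would come from the classification and similarity models.

```python
import math

def top_k_similar(query_embedding, catalogue, predicted_type, k=5):
    """Return the ids of the k catalogue items of the predicted type closest to the query."""
    candidates = [item for item in catalogue if item["type"] == predicted_type]
    ranked = sorted(
        candidates,
        key=lambda item: math.dist(query_embedding, item["embedding"]),
    )
    return [item["id"] for item in ranked[:k]]

# Hypothetical catalogue with made-up two-dimensional embeddings.
catalogue = [
    {"id": "p1", "type": "Watches", "embedding": [0.0, 0.1]},
    {"id": "p2", "type": "Watches", "embedding": [0.9, 0.9]},
    {"id": "p3", "type": "Tshirts", "embedding": [0.0, 0.0]},
    {"id": "p4", "type": "Watches", "embedding": [0.2, 0.2]},
]

# Assume the classifier predicted "Watches" for the uploaded image.
print(top_k_similar([0.0, 0.0], catalogue, "Watches", k=2))  # ['p1', 'p4']
```

Filtering by predicted type first keeps the ranking within one product category, so a T-shirt that happens to be close in embedding space is never returned for a watch query.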
This is the video link for the deployment: https://www.loom.com/share/08c684695d564f699cb4f975c815f8b2?sid=727db1e3-32b1-41cb-a2c4-bcf0a82a62cb
Also, below is the snapshot of the deployment.
- Due to the limited timeline and computational capacity, my models ran for a maximum of 50 epochs. Even so, training times were long, with the slowest model taking more than 3 days to train.
- The data has class imbalance.
- The Siamese model's accuracy reached only 50%.
- Improve the Siamese model's accuracy.
- Apply image classification and image similarity to all product types.