Fung Group - HACK THE RACK 2018 (22-24 Jun 2018)
Hack The Rack 2018 was an open competition that brought undergraduates together with industry experts such as business professionals, talented designers, data scientists and software developers. It was unquestionably one of the most challenging hackathons we have taken part in. We thank Fung Group very much for providing a marvelous platform that allowed us to meet specialists from different backgrounds. To be honest, we enjoyed the event a lot and made many great friends.

The story behind the challenge:
Li & Fung has a huge catalog of products and an even larger repository of product images, but different APIs are used to tag the images, and the tagging is done in different ways. The major need is to humanize the image catalog search process in order to improve product search capabilities.
The task is to process the tag text data of 25,000+ images provided by Li & Fung so that their products can be searched more easily and effectively, in line with the goals of Speed, Innovation and Digitalization.
Objectives: To improve product search capabilities by
- Consolidating a unique tagging list
- Enabling synonyms search for tags
- Allowing merchandisers to add or correct tags on images
Remarks:
We first considered building an instant auto-tagging system that tags pictures in real time as they are taken. However, after talking with the staff of the relevant Li & Fung department and listening to their needs, we shifted our focus from image tagging to natural language processing of the tag labels. Image recognition and labeling are not the actual pain point; the main concern is handling the differences between the tag labels collected from the various APIs. As Li & Fung already has real-time tagging APIs with acceptable accuracy, there was no reason for us to duplicate the work and spend time tagging the images again. The more important part is to utilize the existing tags and build a search engine that can make use of all of them. Thus, our main focus is on text mining of the tag labels.
Analysis & Solutions:
In order to address the "real" challenge, we drilled down into two main areas.
- Part 1: To clean the input datasets from two different APIs and consolidate into an Image_Tag master dataset
- Part 2: To leverage the pre-trained neural network model to enhance the customer search experience
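To illustrate the idea behind Part 2, here is a minimal sketch of a synonym lookup based on vector similarity. The post does not name the pre-trained model, so this assumes it behaves like a word-embedding model in which similar tags have similar vectors. The vectors, the `similar_tags` helper and the 0.9 threshold are all made up for illustration and are not our actual implementation.

```python
import math

# Toy 3-dimensional vectors standing in for pre-trained embeddings.
# The values are invented for illustration only.
TOY_VECTORS = {
    "dress": (0.9, 0.1, 0.0),
    "gown": (0.85, 0.15, 0.05),
    "shoe": (0.1, 0.9, 0.0),
    "sneaker": (0.15, 0.85, 0.1),
    "lamp": (0.0, 0.1, 0.95),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_tags(query, vectors=TOY_VECTORS, threshold=0.9):
    """Return known tags whose vectors are close to the query tag's vector."""
    if query not in vectors:
        return []  # unknown word: no synonyms to offer
    q = vectors[query]
    scored = [(tag, cosine(q, v)) for tag, v in vectors.items() if tag != query]
    # Most similar first, keeping only tags above the (assumed) threshold
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [tag for tag, score in scored if score >= threshold]


print(similar_tags("dress"))
```

With a lookup like this, a search for one tag can also return images labelled with closely related tags.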
Data Exploration
We received data in two formats from external companies: JSON and CSV. However, they are merged into the same CSV file, as shown above. Extracting features from the CSV-based data and analyzing it further is straightforward. The JSON data, however, is stored as a string inside a single cell, which is hard to deal with. To keep the demonstration simple, we focus only on how we handled the JSON-based data.
Text Preprocessing
We explored and tried different text cleansing techniques, such as tokenizing the phrases into unigrams or bigrams and padding the sequences. Nonetheless, these ran into a critical problem: many irrelevant or useless tag labels were derived.
We finally came up with a simple but highly effective idea consisting of a few steps:
- Create a specific stopword list including the JSON attributes beforehand
- Convert all words to lowercase
- Remove the stopwords
- Observe the remaining string pattern and output the tag labels
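The steps above can be sketched roughly as follows. This is a simplified illustration rather than our exact code (see the repository linked below for that): the attribute names in `JSON_STOPWORDS`, the sample cell and the "quoted strings are tag candidates" pattern are all assumptions, since the real API fields are not shown in this post.

```python
import re

# Hypothetical JSON attribute names to filter out; the real list depends
# on the fields returned by each tagging API.
JSON_STOPWORDS = {"tags", "name", "confidence", "description", "score"}


def extract_tags(raw_cell):
    """Pull tag labels out of a JSON string stored in a CSV cell."""
    # Convert everything to lowercase
    text = raw_cell.lower()
    # Assumed pattern: every double-quoted string is a candidate label
    candidates = re.findall(r'"([^"]+)"', text)
    tags = []
    for candidate in candidates:
        candidate = candidate.strip()
        # Remove the stopwords (JSON attribute names) and duplicates
        if candidate and candidate not in JSON_STOPWORDS and candidate not in tags:
            tags.append(candidate)
    return tags


sample = (
    '{"tags": [{"name": "Dress", "confidence": 0.98}, '
    '{"name": "Floral Print", "confidence": 0.87}]}'
)
print(extract_tags(sample))  # ['dress', 'floral print']
```

Treating the attribute names as stopwords avoids having to parse every API's JSON layout separately, at the cost of maintaining the stopword list by hand.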
Remarks:
We did not apply any further text cleansing to the output tag labels, for the following two reasons:
- We want to keep the original tag format that Li & Fung uses
- We have leveraged a synonym model that handles this problem
A master dataset was then generated in the format we wanted, as mentioned in the challenge description.
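As a rough sketch of the consolidation step, the per-image tags from each API can be merged into one de-duplicated list. The `build_master` helper, the image IDs and the sample tags below are hypothetical and only show the shape of the idea; they do not reflect the real Image_Tag schema.

```python
from collections import defaultdict


def build_master(*sources):
    """Merge {image_id: [tags]} mappings into one de-duplicated mapping."""
    master = defaultdict(list)
    for source in sources:
        for image_id, tags in source.items():
            for tag in tags:
                tag = tag.strip().lower()
                # Keep the first occurrence of each tag, in original order
                if tag and tag not in master[image_id]:
                    master[image_id].append(tag)
    return dict(master)


# Made-up outputs of two tagging APIs for illustration
api_a = {"img_001": ["Dress", "floral print"], "img_002": ["Shoe"]}
api_b = {"img_001": ["dress", "Summer"], "img_003": ["lamp"]}

master = build_master(api_a, api_b)
unique_tags = sorted({tag for tags in master.values() for tag in tags})
print(master)
print(unique_tags)
```

Here `unique_tags` corresponds to the consolidated unique tagging list named in the objectives.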
For more details: https://github.com/cydalytics/Image_Tag_Processing
All in all, we have built a search platform that helps retailers look for suitable merchandisers, inspires designers with the latest fashion trends as well as famous fashion items from the past, and provides instant access for users across different devices. All of this speeds up the decision-making process and, at the same time, reduces the cost of searching and improves the customer search experience by offering a more humanized search engine.
To make all this happen, we tried our whole bag of tricks, and here is some of the software we leveraged.
Prototype Design:
Lastly, we would like to take this chance to praise our perfect teammates. They worked really hard and did a wonderful job. We hope to meet more new friends and learn more in the next competition. =)