Sohu Chinese Image-Text Matching Competition 2017. We won the third prize.
Please clone the Chinese word-segmentation tools into ./NLP, and clone a Caffe repository into ./caffe.
Download a Caffe VGG-16 model pre-trained on ImageNet.
Requirement: TensorFlow 1.0.
Image branch: image -> VGG-16 -> 4096-d feature -> fc1 -> fc2
Text branch: TF-IDF or GMM feature -> fc1 -> fc2
The outputs of the two branches are compared with distance(image, text).
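A minimal NumPy sketch of the two-branch forward pass, assuming each branch is fc1 -> ReLU -> fc2 followed by L2 normalization (following the embedding network described in the first reference). The layer sizes (2048, 512), the 3000-d TF-IDF input, and all function names are hypothetical; the repository itself uses TensorFlow 1.0, the weights here are random and untrained, and the training loss is omitted.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def init_branch(rng, in_dim, hidden_dim, out_dim):
    """Random, untrained fc1/fc2 weights for one branch (illustration only)."""
    return {
        "w1": rng.normal(0.0, 0.01, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(0.0, 0.01, (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }


def embed(x, params):
    """fc1 -> ReLU -> fc2 -> L2 normalization, applied row-wise."""
    hidden = np.maximum(0.0, x @ params["w1"] + params["b1"])
    return l2_normalize(hidden @ params["w2"] + params["b2"])


def distance(img_emb, txt_emb):
    """Pairwise squared Euclidean distance between unit vectors: 2 - 2 * cosine."""
    return 2.0 - 2.0 * (img_emb @ txt_emb.T)


rng = np.random.default_rng(0)
img_branch = init_branch(rng, 4096, 2048, 512)  # input: 4096-d VGG-16 feature
txt_branch = init_branch(rng, 3000, 2048, 512)  # input: hypothetical 3000-d TF-IDF vector

img_feats = rng.random((4, 4096))  # stand-ins for 4 images' VGG-16 features
txt_feats = rng.random((6, 3000))  # stand-ins for 6 texts' TF-IDF vectors
d = distance(embed(img_feats, img_branch), embed(txt_feats, txt_branch))
print(d.shape)  # (4, 6)
```

Because both embeddings are unit length, ranking by this distance is equivalent to ranking by cosine similarity.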
References:
- "Learning Two-Branch Neural Networks for Image-Text Matching Tasks"
- "Fisher Vectors Derived from Hybrid Gaussian-Laplacian Mixture Models for Image Annotation"
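The "GMM feature" in the text branch relates to the second reference, which encodes a sentence as a Fisher vector over its word2vec vectors. That paper uses a hybrid Gaussian-Laplacian mixture; the sketch below substitutes a plain diagonal-covariance GMM with fixed, randomly chosen parameters (in practice they would be fitted to the word vectors, e.g. with scikit-learn's `GaussianMixture(covariance_type="diag")`). It is an illustration of the standard Fisher-vector encoding, not necessarily how this repository computes its GMM feature; the component count, vector size, and function names are hypothetical.

```python
import numpy as np


def gmm_posteriors(x, weights, means, sigmas):
    """Soft assignments gamma[t, k] of T descriptors to K diagonal Gaussians."""
    # z[t, k, d] = (x[t, d] - means[k, d]) / sigmas[k, d]
    z = (x[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    log_pdf = -0.5 * np.sum(z ** 2 + np.log(2.0 * np.pi * sigmas[None] ** 2), axis=2)
    log_joint = np.log(weights)[None, :] + log_pdf
    log_joint -= log_joint.max(axis=1, keepdims=True)  # stabilize before exp
    gamma = np.exp(log_joint)
    return gamma / gamma.sum(axis=1, keepdims=True), z


def fisher_vector(x, weights, means, sigmas):
    """Fisher vector (mean and variance gradients) of descriptors x, shape (T, D)."""
    t = x.shape[0]
    gamma, z = gmm_posteriors(x, weights, means, sigmas)
    g_mu = np.einsum("tk,tkd->kd", gamma, z) / (t * np.sqrt(weights)[:, None])
    g_sigma = np.einsum("tk,tkd->kd", gamma, z ** 2 - 1.0) / (t * np.sqrt(2.0 * weights)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))  # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)  # L2 normalization


rng = np.random.default_rng(0)
K, D = 4, 50  # hypothetical: 4 mixture components, 50-d word vectors
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))
sentence = rng.normal(size=(7, D))  # stand-in for the 7 word vectors of one sentence
fv = fisher_vector(sentence, weights, means, sigmas)
print(fv.shape)  # (400,)
```

The output has 2 * K * D dimensions regardless of sentence length, which is what lets variable-length text feed a fixed-size fc1 layer.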
- extract image features using VGG-16
- Tensorflow/
  - train using the TF-IDF features
  - train an LSTM
  - generate the matching result (top-10) using the image/text embeddings
- NLP/
  - word segmentation and vocabulary generation
  - filter nouns and verbs
  - generate one .txt file from the word-segmented files and a vocabulary
  - compute TF-IDF
  - cluster based on word2vec (fastText word2vec is recommended; search for the 'fasttext' repo)
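A minimal sketch of the TF-IDF step, assuming the text has already been word-segmented and filtered to nouns and verbs, so each document is a list of tokens. The weighting (raw term count times a smoothed IDF of log((1 + N) / (1 + df)) + 1, then L2 normalization) is an assumed choice, since the exact formula used here is not stated; `build_vocabulary` and `tfidf_vectors` are hypothetical names.

```python
import math
from collections import Counter


def build_vocabulary(docs, min_count=1):
    """Map each token seen at least min_count times to a column index."""
    counts = Counter(tok for doc in docs for tok in doc)
    kept = sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: i for i, tok in enumerate(kept)}


def tfidf_vectors(docs, vocab):
    """Dense, L2-normalized TF-IDF vectors (one list per document)."""
    n_docs = len(docs)
    # document frequency: number of documents containing each vocabulary token
    df = Counter(tok for doc in docs for tok in set(doc) if tok in vocab)
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(t for t in doc if t in vocab).items():
            vec[vocab[tok]] = count * idf[tok]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
        vectors.append([v / norm for v in vec])
    return vectors


# Three already-segmented, noun/verb-filtered captions (toy data)
docs = [["小狗", "草地", "奔跑"], ["小猫", "草地", "睡觉"], ["小狗", "追", "小猫"]]
vocab = build_vocabulary(docs)
vecs = tfidf_vectors(docs, vocab)
print(len(vocab))  # 6
```

Tokens that occur in fewer documents receive a larger IDF, so in the first caption "奔跑" outweighs the more common "草地".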
Other components: model ensembling, and OCR (which we did not use).
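One simple way to combine several models and produce the top-10 result is to average their distance matrices and keep the 10 nearest candidates for each query. Equal-weight averaging, the query/candidate orientation, and the names `ensemble_distances` and `top_k` are assumptions for illustration; the ensembling method actually used is not described here.

```python
import numpy as np


def ensemble_distances(dist_matrices, weights=None):
    """Weighted average of several (n_queries, n_candidates) distance matrices."""
    stacked = np.stack(dist_matrices)  # (n_models, n_queries, n_candidates)
    if weights is None:
        weights = np.ones(len(dist_matrices))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the result stays an average
    return np.tensordot(weights, stacked, axes=1)


def top_k(dist, k=10):
    """Indices of the k nearest candidates for each query, nearest first."""
    k = min(k, dist.shape[1])
    idx = np.argpartition(dist, k - 1, axis=1)[:, :k]  # k smallest, unordered
    order = np.argsort(np.take_along_axis(dist, idx, axis=1), axis=1)
    return np.take_along_axis(idx, order, axis=1)


rng = np.random.default_rng(0)
dist_tfidf = rng.random((3, 20))  # stand-in: 3 queries x 20 candidates, model A
dist_lstm = rng.random((3, 20))   # stand-in: same shape, model B
ranked = top_k(ensemble_distances([dist_tfidf, dist_lstm]), k=10)
print(ranked.shape)  # (3, 10)
```

`np.argpartition` avoids fully sorting every row; only the k selected candidates are sorted afterwards.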