
Muennighoff/vilio

🥶VILIO🥶


State-of-the-art Visio-Linguistic Models 🥶

Updates

06/2021 - Hateful Memes CSV Files

  • The CSV files used for the scores in the Vilio paper are now available here
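The Hateful Memes challenge ranks submissions by AUROC. As a minimal stdlib-only sketch, this snippet reads a predictions CSV and computes AUROC by comparing every (positive, negative) pair. The column layout (id, proba, label) is an assumption and is not taken from the linked files. The helpers `read_predictions` and `roc_auc` are hypothetical and are not part of Vilio.

```python
import csv
import io


def read_predictions(csv_text):
    """Parse a predictions CSV into {id: probability}.

    Assumes (hypothetically) columns named id, proba and label.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["id"]: float(row["proba"]) for row in reader}


def roc_auc(scores, gold):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair, which matches the Mann-Whitney U
    statistic divided by (#positives * #negatives).
    """
    pos = [s for s, y in zip(scores, gold) if y == 1]
    neg = [s for s, y in zip(scores, gold) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example with made-up ids, probabilities and gold labels.
csv_text = "id,proba,label\n01,0.9,1\n02,0.2,0\n03,0.6,1\n04,0.4,0\n"
preds = read_predictions(csv_text)
gold = {"01": 1, "02": 0, "03": 1, "04": 0}
ids = sorted(gold)
print(roc_auc([preds[i] for i in ids], [gold[i] for i in ids]))  # 1.0
```

The pairwise loop is quadratic but easy to verify by hand. For a full dev or test set, a library implementation such as scikit-learn's `roc_auc_score` is the more practical choice.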

06/2021 - Inference on any meme

Ordering

Vilio aims to replicate the organization of Hugging Face's transformers repo: https://github.com/huggingface/transformers

  • /bash: Shell files to reproduce the Hateful Memes results

  • /data: Default directory for loading data and saving checkpoints

  • /ernie-vil: ERNIE-ViL sub-repository written in PaddlePaddle

  • /fts_lmdb: Scripts for handling extracted features stored as .lmdb

  • /fts_tsv: Scripts for handling extracted features stored as .tsv

  • /notebooks: Jupyter notebooks for demonstration and reproducibility

  • /py-bottm-up-attention: Sub-repository for .tsv feature extraction, forked and adapted from here

  • src/vilio: All implemented models (see below for a quick overview of the models)

  • /utils: pandas and ensembling scripts for data handling

  • entry.py files: Scripts that give access to the models and apply model-specific data preparation

  • pretrain.py files: Same purpose as the entry files, but for pre-training; point of entry for pre-training

  • hm.py: Training code for the Hateful Memes challenge; main point of entry

  • param.py: Arguments for running hm.py
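The /utils folder includes ensembling scripts. The simplest form of ensembling averages each model's predicted probability per meme. The sketch below does exactly that, assuming each model's output is a dict of id to probability. It is not necessarily the scheme the /utils scripts implement. `mean_ensemble` and `to_labels` are hypothetical names.

```python
def mean_ensemble(predictions):
    """Average per-id probabilities across several models.

    predictions: list of dicts mapping meme id -> predicted probability.
    Only ids that every model scored are kept.
    """
    if not predictions:
        raise ValueError("need at least one set of predictions")
    common = set(predictions[0])
    for p in predictions[1:]:
        common &= set(p)
    return {i: sum(p[i] for p in predictions) / len(predictions) for i in sorted(common)}


def to_labels(probas, cutoff=0.5):
    """Turn probabilities into hard 0/1 labels with a fixed cutoff."""
    return {i: int(p >= cutoff) for i, p in probas.items()}


# Two hypothetical models' outputs; "c" is missing from the first, so it is dropped.
model_a = {"a": 0.8, "b": 0.2}
model_b = {"a": 0.6, "b": 0.4, "c": 0.9}
ensembled = mean_ensemble([model_a, model_b])
print(to_labels(ensembled))  # {'a': 1, 'b': 0}
```

Averaging probabilities leaves AUROC-relevant ordering to the combined scores. The fixed 0.5 cutoff only matters for the hard label column.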

Usage

Follow SCORE_REPRO.md to reproduce performance on the Hateful Memes task.
Follow GETTING_STARTED.md to use the framework for your own task.
See the paper at: https://arxiv.org/abs/2012.07788

Architectures

🥶 Vilio currently provides the following architectures with the outlined language transformers:

  1. E - ERNIE-ViL: "ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph"
  2. D - DeVLBERT: "DeVLBert: Learning Deconfounded Visio-Linguistic Representations"
  3. O - OSCAR: "Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks"
  4. U - UNITER: "UNITER: UNiversal Image-TExt Representation Learning"
  5. V - VisualBERT: "VisualBERT: A Simple and Performant Baseline for Vision and Language"
  6. X - LXMERT: "LXMERT: Learning Cross-Modality Encoder Representations from Transformers"
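The one-letter codes give a compact way to refer to a combination of architectures, for example when naming an ensemble. As a hypothetical illustration that is not code from the repo, a lookup table built from the list above can expand such a string into model names. `MODEL_CODES` and `expand_codes` are invented names.

```python
# Hypothetical lookup table built from the one-letter codes listed above.
MODEL_CODES = {
    "E": "ERNIE-ViL",
    "D": "DeVLBERT",
    "O": "OSCAR",
    "U": "UNITER",
    "V": "VisualBERT",
    "X": "LXMERT",
}


def expand_codes(codes):
    """Expand a string of model codes such as "OUV" into architecture names."""
    codes = codes.upper()
    unknown = sorted({c for c in codes if c not in MODEL_CODES})
    if unknown:
        raise ValueError(f"unknown model code(s): {unknown}")
    return [MODEL_CODES[c] for c in codes]


print(expand_codes("ouv"))  # ['OSCAR', 'UNITER', 'VisualBERT']
```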

To-dos

  • Clean up import statements and Python paths, and find a better way to integrate transformers (right now, all import statements only work from the main folder)
  • Enable loading and running models via import statements alone (without having to clone the repo)
  • Find a better way to include ERNIE-ViL in this repo (port from PaddlePaddle to PyTorch?)
  • Move tokenization in the entry files to model-specific tokenization, similar to transformers

Attributions

The code borrows heavily from the following repositories; thanks for their great work:

Citation

@article{muennighoff2020vilio,
  title={Vilio: State-of-the-art visio-linguistic models applied to hateful memes},
  author={Muennighoff, Niklas},
  journal={arXiv preprint arXiv:2012.07788},
  year={2020}
}