Evaluation codes for MS COCO caption generation.
- java 1.8.0
- python (tested 2.7/3.6)
./
- cocoEvalCapDemo.py (demo script)
./annotation
- captions_val2014.json (MS COCO 2014 caption validation set)
- Visit MS COCO download page for more details.
./results
- captions_val2014_fakecap_results.json (an example of fake results for running demo)
- Visit MS COCO format page for more details.
./pycocoevalcap: The folder where all evaluation codes are stored.
- evals.py: The file includes the COCOEvalCap class, which can be used to evaluate results on COCO.
- tokenizer: Python wrapper of Stanford CoreNLP PTBTokenizer
- bleu: Bleu evaluation codes
- meteor: Meteor evaluation codes
- rouge: Rouge-L evaluation codes
- cider: CIDEr evaluation codes
- spice: SPICE evaluation codes
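The results file listed under ./results is a JSON array with one entry per image, each holding an image_id and a caption, as described on the MS COCO format page. The following is a minimal sketch of producing such a file. The helper name write_caption_results and the sample IDs and captions are hypothetical and used only for illustration; they are not part of pycocoevalcap.

```python
import json


def write_caption_results(captions_by_image, path):
    """Write {image_id: caption} pairs as a COCO-style caption results file.

    Hypothetical helper for illustration; not part of pycocoevalcap.
    """
    # One entry per image, sorted by image ID so the output is deterministic.
    results = [
        {"image_id": int(image_id), "caption": str(caption)}
        for image_id, caption in sorted(captions_by_image.items())
    ]
    with open(path, "w") as f:
        json.dump(results, f)
    return results
```

For example, calling write_caption_results({1: "a man riding a horse"}, "my_results.json") writes a one-entry array. The image IDs in a real results file should match IDs present in the annotation file.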
- You will first need to download the Stanford CoreNLP 3.6.0 code and models for use by SPICE. To do this, run: ./get_stanford_models.sh
Alternatively, consider installing with pip, which automatically fetches the Stanford CoreNLP code and models:
pip install git+https://github.com/flauted/coco-caption.git@python23
- SPICE will try to create a cache of parsed sentences in ./pycocoevalcap/spice/cache/. This dramatically speeds up repeated evaluations.
- Without altering this code, use the environment variables SPICE_CACHE_DIR and SPICE_TEMP_DIR to set the cache directory.
- The cache should NOT be on an NFS mount.
- Caching can be disabled by editing the pycocoevalcap/spice/spice.py file. Remove the -cache argument to spice_cmd.
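The SPICE_CACHE_DIR and SPICE_TEMP_DIR variables can also be set from Python before the evaluation starts. The sketch below assumes the variables are read when SPICE runs, so it sets them before anything from pycocoevalcap is imported. The helper name configure_spice_dirs and the directory paths are hypothetical; only the two variable names come from this README.

```python
import os
import tempfile


def configure_spice_dirs(cache_dir, temp_dir):
    """Create both directories and export them for SPICE.

    Hypothetical helper; only the two variable names come from this README.
    """
    for name, directory in (("SPICE_CACHE_DIR", cache_dir),
                            ("SPICE_TEMP_DIR", temp_dir)):
        directory = os.path.abspath(directory)
        if not os.path.isdir(directory):
            os.makedirs(directory)
        os.environ[name] = directory
    return os.environ["SPICE_CACHE_DIR"], os.environ["SPICE_TEMP_DIR"]


# Placeholder paths under the system temp directory, assumed here to be
# on local disk rather than an NFS mount.
base = tempfile.mkdtemp(prefix="spice_")
configure_spice_dirs(os.path.join(base, "cache"), os.path.join(base, "tmp"))
```

Pointing the cache at a stable local path, rather than a fresh temporary directory, lets repeated evaluations reuse the parsed sentences.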
- Microsoft COCO Captions: Data Collection and Evaluation Server
- PTBTokenizer: We use the Stanford Tokenizer which is included in Stanford CoreNLP 3.4.1.
- BLEU: BLEU: a Method for Automatic Evaluation of Machine Translation
- Meteor: Project page with related publications. We use the latest version (1.5) of the code. Changes have been made to the source code to properly aggregate the statistics for the entire corpus.
- Rouge-L: ROUGE: A Package for Automatic Evaluation of Summaries
- CIDEr: CIDEr: Consensus-based Image Description Evaluation
- SPICE: SPICE: Semantic Propositional Image Caption Evaluation
- Xinlei Chen (CMU)
- Hao Fang (University of Washington)
- Tsung-Yi Lin (Cornell)
- Ramakrishna Vedantam (Virginia Tech)
- David Chiang (University of Notre Dame)
- Michael Denkowski (CMU)
- Alexander Rush (Harvard University)