scarecrow1123/allennlp-distributed-training

allennlp-distributed-training

This repo holds a few example AllenNLP experiments modified to run with DistributedDataParallel support. The training_config directory has two versions of the same set of experiments. The versions in the distributed_data_parallel directory differ from the others mostly in their dataset readers, which are copies of the original AllenNLP readers with a minor modification to support distributed sampling.
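As a rough illustration of what "distributed sampling" in a dataset reader can mean, the sketch below gives each worker every `world_size`-th instance, starting at its own rank. This is an assumption about the general technique, not a copy of the readers in this repo; `shard_instances` is a hypothetical helper name.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def shard_instances(instances: Iterable[T], rank: int, world_size: int) -> Iterator[T]:
    """Yield only the instances that belong to the worker with this rank.

    Hypothetical helper: worker `rank` keeps items rank, rank + world_size,
    rank + 2 * world_size, ... so the workers see disjoint slices of the data.
    """
    if not 0 <= rank < world_size:
        raise ValueError(f"rank must be in [0, {world_size}), got {rank}")
    return islice(instances, rank, None, world_size)


if __name__ == "__main__":
    # Ten toy "instances" split across four workers.
    for rank in range(4):
        print(rank, list(shard_instances(range(10), rank, 4)))
```

In a real run, `rank` and `world_size` would typically come from `torch.distributed.get_rank()` and `torch.distributed.get_world_size()` once the process group has been initialized.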

To run the distributed experiments, install AllenNLP from source:

conda create -n allennlp_distributed python=3.7
conda activate allennlp_distributed
git clone https://github.com/allenai/allennlp
cd allennlp
pip install .

Then, from the root of this repo, run:

allennlp train training_config/distributed_data_parallel/esim.jsonnet --include-package distributed-training -s output/

To run without the distributed setup, do the usual AllenNLP installation and use the experiments in training_config/data_parallel/.

Speed Comparison: Time taken to train one epoch (averaged over 3 epochs)

GPU - 2080 Ti

NOTE: The reported time is not the training_duration metric. It is the time taken by the Trainer._train_epoch method.
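A minimal sketch of how a per-epoch time averaged over a few epochs could be measured. The helper `average_epoch_time` and its `run_epoch` argument are hypothetical and are not part of AllenNLP or this repo; any callable that runs one epoch can be passed in.

```python
import time
from typing import Callable


def average_epoch_time(run_epoch: Callable[[int], object], num_epochs: int = 3) -> float:
    """Return the mean wall-clock time in seconds of `run_epoch` over `num_epochs` calls."""
    durations = []
    for epoch in range(num_epochs):
        start = time.perf_counter()
        run_epoch(epoch)  # stand-in for one training epoch
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


if __name__ == "__main__":
    # Toy epoch that just sleeps briefly.
    print(f"{average_epoch_time(lambda epoch: time.sleep(0.01)):.3f}s per epoch")
```

With an AllenNLP trainer, `run_epoch` might wrap `Trainer._train_epoch`, but that method is private and its signature may differ between versions.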

| Experiment | Single GPU | 2x Data Parallel | 2x Distributed | 4x Data Parallel | 4x Distributed |
| --- | --- | --- | --- | --- | --- |
| esim.jsonnet (400K SNLI samples) | 4m 15s | NA | NA | 4m 30s | 2m 13s |
| bidaf.jsonnet | 5m 44s | NA | NA | 4m 10s | 2m 5s |

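To make the comparison easier to read, the timings above can be converted to seconds and expressed as speedups over the single-GPU run. The sketch below only reuses the numbers from the table; the helper names `to_seconds` and `speedup` are made up for illustration.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a string such as '4m 15s' into a number of seconds."""
    match = re.fullmatch(r"(\d+)m\s+(\d+)s", duration.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = map(int, match.groups())
    return 60 * minutes + seconds


def speedup(baseline: str, other: str) -> float:
    """How many times faster `other` is than `baseline` (values above 1 mean faster)."""
    return to_seconds(baseline) / to_seconds(other)


# Per-epoch timings copied from the table: (single GPU, 4x data parallel, 4x distributed).
TIMINGS = {
    "esim.jsonnet": ("4m 15s", "4m 30s", "2m 13s"),
    "bidaf.jsonnet": ("5m 44s", "4m 10s", "2m 5s"),
}

if __name__ == "__main__":
    for name, (single, data_parallel, distributed) in TIMINGS.items():
        print(
            f"{name}: 4x data parallel {speedup(single, data_parallel):.2f}x, "
            f"4x distributed {speedup(single, distributed):.2f}x"
        )
```

On these numbers, the 4x distributed runs come out at roughly 1.9x (esim) and 2.75x (bidaf) the single-GPU speed, while 4x data parallel is slightly slower than a single GPU for esim and about 1.4x faster for bidaf.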
Experiments to benchmark AllenNLP distributed training.