This repo holds a few example AllenNLP experiments modified to run with DistributedDataParallel
support.
The `training_config`
directory has two versions of the same set of experiments. The ones in the `distributed_data_parallel`
directory differ mainly in their dataset readers: the readers are replicas of the original ones in AllenNLP,
with a minor modification to support distributed sampling (a sketch of that pattern is shown below).
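The following is a minimal sketch, not the repo's actual reader code, of the kind of modification involved: each worker keeps only its own slice of the data, selected round-robin by rank via `torch.distributed`. The helper name `shard_for_this_worker` is made up for illustration.

```python
# Sketch of rank-based sharding, as a dataset reader's _read method might use it.
from typing import Iterable, Iterator

import torch.distributed as dist


def shard_for_this_worker(items: Iterable[str]) -> Iterator[str]:
    """Yield only the items assigned to the current worker (round-robin by rank)."""
    if dist.is_available() and dist.is_initialized():
        rank, world_size = dist.get_rank(), dist.get_world_size()
    else:
        # Not running in a distributed job: keep everything.
        rank, world_size = 0, 1

    for index, item in enumerate(items):
        if index % world_size == rank:
            yield item


# Inside a DatasetReader subclass, _read could then look roughly like:
#
#     def _read(self, file_path):
#         with open(file_path) as data_file:
#             for line in shard_for_this_worker(data_file):
#                 yield self.text_to_instance(line)
```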
To run the distributed experiments, install AllenNLP from source:

```
conda create -n allennlp_distributed python=3.7
conda activate allennlp_distributed
git clone https://github.com/allenai/allennlp
cd allennlp
pip install .
```
Then run:

```
allennlp train training_config/distributed_data_parallel/esim.jsonnet --include-package distributed-training -s output/
```
To run without the distributed setup, do the usual AllenNLP installation and use the experiments in `training_config/data_parallel/`.
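For example (a guess at the analogous command; `--include-package` should not be needed since these configs use the stock readers):

```
allennlp train training_config/data_parallel/esim.jsonnet -s output/
```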
GPU: NVIDIA RTX 2080 Ti
NOTE: The time reported does not correspond to the `training_duration` metric. It is the time taken by the `Trainer._train_epoch` method.
Experiment | Single GPU | 2x Data Parallel | 2x Distributed | 4x Data Parallel | 4x Distributed |
---|---|---|---|---|---|
esim.jsonnet (400K SNLI samples) | 4m 15s | NA | NA | 4m 30s | 2m 13s |
bidaf.jsonnet | 5m 44s | NA | NA | 4m 10s | 2m 5s |