A pre-trained language model for sequence-to-sequence learning with a novel self-supervised objective called future n-gram prediction.
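In other words, at each decoding position the model is trained to predict the next n tokens simultaneously, through n prediction streams, rather than only the single next token. A minimal PyTorch sketch of how such an n-stream loss could be combined (the function name, tensor layout, and weights are illustrative assumptions, not the actual ProphetNet or FastSeq code):

```python
import torch.nn.functional as F

def future_ngram_loss(stream_logits, targets, stream_weights):
    """Illustrative future n-gram prediction loss.

    stream_logits: list of n tensors, each (batch, seq_len, vocab); stream i
        is trained to predict the token i steps ahead, so stream 0 is the
        usual next-token prediction and streams 1..n-1 cover future tokens.
    targets: (batch, seq_len) gold token ids, aligned so that position t of
        stream 0 should predict targets[:, t].
    stream_weights: per-stream weights for combining the n losses.
    """
    total = 0.0
    seq_len = targets.size(1)
    for i, (logits, weight) in enumerate(zip(stream_logits, stream_weights)):
        # Stream i at position t predicts targets[:, t + i]; the last i
        # positions have no future target and are dropped.
        valid = seq_len - i
        total = total + weight * F.cross_entropy(
            logits[:, :valid].reshape(-1, logits.size(-1)),
            targets[:, i:].reshape(-1),
        )
    return total
```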
- CNN daily mail validation data, NVIDIA-V100-16GB

  | BatchSize                  |      32       |      64       |      128       |
  |:---------------------------|:-------------:|:-------------:|:--------------:|
  | prophetnet (fairseq 0.9.0) | 2.4 samples/s | 2.8 samples/s |      OOM       |
  | above + fastseq            | 6.1 samples/s | 9.1 samples/s | 11.9 samples/s |
ProphetNet-large-160GB (fine-tuned on CNN/Daily Mail with 9 epochs) link
CNN/DM validation data
```bash
$ fastseq-generate-for-fairseq \
    cnn_dm_bert/len-512.bin \
    --path prophetnet/model.pt \
    --fp16 \
    --task translation_prophetnet \
    --batch-size BATCH_SIZE \
    --beam 4 \
    --num-workers 4 \
    --min-len 55 \
    --max-len-b 140 \
    --no-repeat-ngram-size 3 \
    --lenpen 2.0 \
    --remove-bpe \
    --gen-subset valid
```
To get the baseline speed numbers without the FastSeq optimizations, replace `fastseq-generate-for-fairseq` with `fairseq-generate`.
To prepare the binary input data, refer to the file:
```bash
bash generate_binary_data_for_prophetnet.sh INPUT_DATA_DIR
```
- CNN daily mail validation data, NVIDIA-V100-16GB

  | BatchSize           |      32       |      64       |      128      |
  |:--------------------|:-------------:|:-------------:|:-------------:|
  | transformers-4.12.0 | 2.8 samples/s | 3.2 samples/s | 3.4 samples/s |
  | above + fastseq     | 4.4 samples/s | 5.6 samples/s | 6.3 samples/s |
`microsoft/prophetnet-large-uncased` from the Hugging Face model hub.
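The checkpoint can also be loaded directly with the standard Transformers classes; a minimal loading sketch (these are the stock `transformers` APIs, not FastSeq-specific ones):

```python
from transformers import ProphetNetForConditionalGeneration, ProphetNetTokenizer

# Downloads the checkpoint from the Hugging Face model hub on first use.
tokenizer = ProphetNetTokenizer.from_pretrained("microsoft/prophetnet-large-uncased")
model = ProphetNetForConditionalGeneration.from_pretrained("microsoft/prophetnet-large-uncased")
```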
CNN/DM validation data
```bash
$ fastseq-generate-for-transformers \
    microsoft/prophetnet-large-uncased \
    cnn_dm_bert/raw/val.source \
    out.summary \
    --reference_path cnn_dm_bert/raw/val.target \
    --device cuda \
    --bs BATCH_SIZE \
    --fp16 \
    --score_path out.score \
    --task summarization \
    --no_repeat_ngram_size 3
```
The baseline speed numbers are obtained by running the plain Transformers v4.12.0 generation code.
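As a rough illustration, a baseline throughput measurement with plain Transformers could look like the sketch below; the article list, batch size, and decoding parameters (chosen to roughly match the CNN/DM settings used in this section) are assumptions, not the actual benchmark script:

```python
import time

import torch
from transformers import ProphetNetForConditionalGeneration, ProphetNetTokenizer

model_name = "microsoft/prophetnet-large-uncased"
tokenizer = ProphetNetTokenizer.from_pretrained(model_name)
model = ProphetNetForConditionalGeneration.from_pretrained(model_name).half().cuda().eval()

# Placeholder batch; in the benchmark these would be source documents
# taken from cnn_dm_bert/raw/val.source.
articles = ["(a CNN/DM validation article)"] * 32

inputs = tokenizer(articles, return_tensors="pt", padding=True,
                   truncation=True, max_length=512)

start = time.time()
with torch.no_grad():
    model.generate(
        input_ids=inputs["input_ids"].cuda(),
        attention_mask=inputs["attention_mask"].cuda(),
        # Assumed decoding settings, mirroring the CNN/DM configuration above.
        num_beams=4,
        no_repeat_ngram_size=3,
        min_length=55,
        max_length=140,
        length_penalty=2.0,
        early_stopping=True,
    )
elapsed = time.time() - start
print(f"{len(articles) / elapsed:.1f} samples/s")
```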
Refer to the file for more details.