Use pre-trained CTC model for conformer training. #174

csukuangfj · 2021-04-22T09:54:21Z

It speeds up convergence.

This pull-request is just a refactoring of danpovey's work.

The following is a screenshot comparing the objf with and without ali_model.

You can see that the objf with ali_model (using this pull-request) is lower.

csukuangfj · 2021-04-22T10:29:56Z

The pre-trained CTC models can be found at
https://drive.google.com/drive/folders/1MnqRzD_OU_aLKlkg44kOgkczK4xOkJ_M?usp=sharing

They can be used for reproducibility.

danpovey · 2021-04-22T10:53:06Z

LGTM!

pzelasko · 2021-04-23T00:35:02Z

How does it affect the speed of convergence and WER?

csukuangfj · 2021-04-23T01:52:27Z

> How does it affect the speed of convergence and WER?

I am running the experiments with and without ali_model and will post the results here once they are available.

csukuangfj · 2021-04-23T07:45:03Z

Here are the results with ali_model (DDP training with 2 GPUs):

$ ./mmi_att_transformer_train.py --full-libri=0 --max-duration=300 --ali-model-epoch=7 --world-size=2

$ ./mmi_att_transformer_decode.py
(This uses the whole lattice for LM rescoring with an output beam size of 8 and averages over the last 5 epochs.)
2021-04-23 15:27:14,472 INFO [common.py:373] [test-clean] %WER 5.78% [3038 / 52576, 542 ins, 179 del, 2317 sub ]
2021-04-23 15:33:31,694 INFO [common.py:373] [test-other] %WER 15.88% [8310 / 52343, 1285 ins, 579 del, 6446 sub ]
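The bracketed counts in the two log lines above are consistent with the usual WER definition: (insertions + deletions + substitutions) divided by the number of reference words. The following is a minimal sketch for checking that arithmetic. The regular expression and the `parse_wer_line` helper are hypothetical illustrations written for this log format; they are not part of the training or decoding scripts.

```python
import re

# Hypothetical pattern for summary lines such as
# "[test-clean] %WER 5.78% [3038 / 52576, 542 ins, 179 del, 2317 sub ]"
WER_LINE = re.compile(
    r"\[(?P<name>[\w-]+)\] %WER (?P<wer>[\d.]+)% "
    r"\[(?P<errs>\d+) / (?P<ref>\d+), (?P<ins>\d+) ins, "
    r"(?P<dels>\d+) del, (?P<sub>\d+) sub \]"
)


def parse_wer_line(line):
    """Parse one summary line and recompute the WER from its error counts."""
    m = WER_LINE.search(line)
    if m is None:
        raise ValueError("not a WER summary line: " + line)
    ins, dels, sub = int(m["ins"]), int(m["dels"]), int(m["sub"])
    ref_words = int(m["ref"])
    errors = ins + dels + sub
    return {
        "name": m["name"],
        "reported_errors": int(m["errs"]),
        "errors": errors,
        "ref_words": ref_words,
        "reported_wer": float(m["wer"]),
        # WER in percent, rounded to two decimals like the log output
        "wer": round(100.0 * errors / ref_words, 2),
    }


lines = [
    "2021-04-23 15:27:14,472 INFO [common.py:373] [test-clean] %WER 5.78% "
    "[3038 / 52576, 542 ins, 179 del, 2317 sub ]",
    "2021-04-23 15:33:31,694 INFO [common.py:373] [test-other] %WER 15.88% "
    "[8310 / 52343, 1285 ins, 579 del, 6446 sub ]",
]
results = [parse_wer_line(line) for line in lines]
for r in results:
    print(r["name"], r["errors"], r["ref_words"], r["wer"])
```

For both test sets the recomputed error totals and percentages match the values reported in the log.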

The tensorboard log is available at
https://tensorboard.dev/experiment/7hyD4bZLRwuzv2yr7DB3mA/#scalars&_smoothingWeight=0&runSelectionState=eyIuIjp0cnVlfQ%3D%3D

You can compare its global objf with the one from the third try of #145 (which also used 2 GPUs, but without ali_model):
https://tensorboard.dev/experiment/LdD8mokDTaWACV8Mbvh5CQ/#scalars&runSelectionState=eyIuIjp0cnVlfQ%3D%3D&_smoothingWeight=0

(Screenshot: global objf, this pull-request)

(Note it uses a lower max-duration, so its number of batches is larger)
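To make the note above concrete: assuming `--max-duration` caps the seconds of audio per batch on each GPU, the number of batches per epoch scales roughly inversely with it. The sketch below is a rough estimate only. The 100-hour training set size and the max-duration of 500 used for comparison are hypothetical values chosen for illustration; neither is taken from this pull request or from #145.

```python
import math


def approx_batches_per_epoch(total_seconds, max_duration, world_size):
    """Rough estimate that assumes every batch is filled up to max_duration."""
    seconds_per_step = max_duration * world_size  # audio consumed per step across all GPUs
    return math.ceil(total_seconds / seconds_per_step)


total_seconds = 100 * 3600  # hypothetical: 100 hours of training audio

# max-duration=300 with 2 GPUs, as in the command above
batches_300 = approx_batches_per_epoch(total_seconds, max_duration=300, world_size=2)

# hypothetical larger max-duration, for comparison only
batches_500 = approx_batches_per_epoch(total_seconds, max_duration=500, world_size=2)

print(batches_300, batches_500)
```

Under these assumptions the lower max-duration gives more batches per epoch, so curves plotted against the batch index are not aligned epoch for epoch.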

You can see that it converges faster.

(Screenshot: global objf, from #145)

The valid objf of this pull-request is, however, somewhat higher.

(Screenshot: valid objf, this pull-request)

(Screenshot: valid objf, from #145)

Use pre-trained CTC model for conformer training.

10c116d

It improves converging speed.

csukuangfj mentioned this pull request Apr 22, 2021

Convergence & decoding problem #170

Open

csukuangfj merged commit 7e02d92 into k2-fsa:master Apr 22, 2021

csukuangfj deleted the ali branch April 22, 2021 11:02
