This repository has been archived by the owner on Oct 13, 2022. It is now read-only.

Use pre-trained CTC model for conformer training. #174

Merged
csukuangfj merged 1 commit into k2-fsa:master from the ali branch
Apr 22, 2021

Conversation

csukuangfj
Collaborator

It improves convergence speed.

This pull-request is just a refactoring of danpovey's work.

The following is a screenshot comparing the objf with and without ali_model.

You can see that the objf with ali_model (using this pull-request) is lower.

Screen Shot 2021-04-22 at 5 34 01 PM

@csukuangfj
Collaborator Author

The pre-trained CTC models can be found at
https://drive.google.com/drive/folders/1MnqRzD_OU_aLKlkg44kOgkczK4xOkJ_M?usp=sharing

They can be used for reproducibility.

@danpovey
Contributor

LGTM!

@csukuangfj csukuangfj merged commit 7e02d92 into k2-fsa:master Apr 22, 2021
@csukuangfj csukuangfj deleted the ali branch April 22, 2021 11:02
@pzelasko
Collaborator

How does it affect the speed of convergence and WER?

@csukuangfj
Collaborator Author

> How does it affect the speed of convergence and WER?

I am running the experiments w/ and w/o ali_model. Will post results here once they are available.

@csukuangfj
Collaborator Author

Here is the result with ali_model.
(DDP training with 2 GPUs)

$ ./mmi_att_transformer_train.py --full-libri=0 --max-duration=300 --ali-model-epoch=7 --world-size=2
$ ./mmi_att_transformer_decode.py
(which uses the whole lattice for LM rescoring with an output beam size of 8, and averages over the last 5 epochs)
2021-04-23 15:27:14,472 INFO [common.py:373] [test-clean] %WER 5.78% [3038 / 52576, 542 ins, 179 del, 2317 sub ]
2021-04-23 15:33:31,694 INFO [common.py:373] [test-other] %WER 15.88% [8310 / 52343, 1285 ins, 579 del, 6446 sub ]
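
For readers who want to sanity-check these numbers, the WER in each log line is simply (ins + del + sub) divided by the number of reference words. The following is a minimal sketch, not code from this repository: the regular expression and the helper name `parse_wer_line` are assumptions made for illustration, written against the log format shown above.

```python
import re

# Hypothetical pattern matching the "%WER" log lines shown above.
WER_LINE = re.compile(
    r"\[(?P<name>[\w-]+)\] %WER (?P<wer>[\d.]+)% "
    r"\[(?P<errs>\d+) / (?P<total>\d+), "
    r"(?P<ins>\d+) ins, (?P<dels>\d+) del, (?P<subs>\d+) sub \]"
)


def parse_wer_line(line):
    """Parse one %WER log line and recompute the WER from its error counts."""
    m = WER_LINE.search(line)
    if m is None:
        raise ValueError(f"not a %WER line: {line!r}")
    ins, dels, subs = int(m["ins"]), int(m["dels"]), int(m["subs"])
    errs, total = int(m["errs"]), int(m["total"])
    # The reported error count should equal the sum of the three error types.
    if ins + dels + subs != errs:
        raise ValueError("error counts do not add up")
    return {
        "name": m["name"],
        "reported_wer": float(m["wer"]),
        "recomputed_wer": round(100.0 * errs / total, 2),
    }


lines = [
    "2021-04-23 15:27:14,472 INFO [common.py:373] [test-clean] %WER 5.78% "
    "[3038 / 52576, 542 ins, 179 del, 2317 sub ]",
    "2021-04-23 15:33:31,694 INFO [common.py:373] [test-other] %WER 15.88% "
    "[8310 / 52343, 1285 ins, 579 del, 6446 sub ]",
]
results = [parse_wer_line(line) for line in lines]
for r in results:
    print(r["name"], r["reported_wer"], r["recomputed_wer"])
```

Both recomputed values agree with the reported ones after rounding to two decimals.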

The tensorboard log is available at
https://tensorboard.dev/experiment/7hyD4bZLRwuzv2yr7DB3mA/#scalars&_smoothingWeight=0&runSelectionState=eyIuIjp0cnVlfQ%3D%3D


You can compare its global objf with the one from the third try of #145 (which also used 2 GPUs, but without ali_model)
https://tensorboard.dev/experiment/LdD8mokDTaWACV8Mbvh5CQ/#scalars&runSelectionState=eyIuIjp0cnVlfQ%3D%3D&_smoothingWeight=0

this pull-request

(Note it uses a lower max-duration, so its number of batches is larger)

You can see that it converges faster.

Screen Shot 2021-04-23 at 3 38 26 PM

from #145

Screen Shot 2021-04-23 at 3 38 38 PM


The valid objf of this pull-request is, however, somewhat higher.

this pull-request

Screen Shot 2021-04-23 at 3 41 24 PM

from #145

Screen Shot 2021-04-23 at 3 41 36 PM
