This repository contains code for our paper on analyzing speech representations in end-to-end automatic speech recognition models:
"Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems", Yonatan Belinkov and James Glass, NIPS 2017.
- First, prepare a dataset in LMDB format according to the instructions in deepspeech.torch. We provide a custom `MakeLMDBTimes.lua` file to process a dataset with time segmentation, such as TIMIT.
- Run `train.lua` with the following arguments:
  - `loadPath`: DeepSpeech-2 model trained with deepspeech.torch
  - `trainingSetLMDBPath`, `validationSetLMDBPath`, `testSetLMDBPath`: top folders for the LMDB training/validation/test sets
  - `reprLayer`: representation layer name (input, cnn1, cnn2, rnn1, rnn2, etc.)
  - `predFile`: file to save predictions

See `train.lua` for more options, such as controlling convolution strides, using a window of features around the frame, or predicting phone classes.
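
Because the analysis is run once per representation layer, it can help to generate one `train.lua` command per layer. The following is a minimal sketch, not part of this repository. It assumes the script is launched with Torch's `th` command and that the arguments are passed as single-dash flags; check `train.lua` for the exact syntax. The model path, LMDB folder layout (`train`, `valid`, `test`), and prediction file names are hypothetical placeholders.

```python
import shlex

# Layer names taken from the examples listed above; the full set depends on the model.
LAYERS = ["input", "cnn1", "cnn2", "rnn1", "rnn2"]


def build_command(layer, model_path, lmdb_root, pred_dir):
    """Return the argument list for one train.lua run on the given layer."""
    return [
        "th", "train.lua",        # assumption: Torch's `th` launcher
        "-loadPath", model_path,  # assumption: single-dash flag style
        "-trainingSetLMDBPath", f"{lmdb_root}/train",    # hypothetical layout
        "-validationSetLMDBPath", f"{lmdb_root}/valid",  # hypothetical layout
        "-testSetLMDBPath", f"{lmdb_root}/test",         # hypothetical layout
        "-reprLayer", layer,
        "-predFile", f"{pred_dir}/pred.{layer}.txt",     # hypothetical file name
    ]


if __name__ == "__main__":
    for layer in LAYERS:
        cmd = build_command(layer, "models/deepspeech.t7", "data/timit_lmdb", "preds")
        # Print only; pass `cmd` to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

The script only prints the commands, so they can be inspected before anything is run.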
If you use this code, please consider citing our paper:
@InProceedings{belinkov:2017:nips,
author = {Belinkov, Yonatan and Glass, James},
title = {Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems},
booktitle = {Advances in Neural Information Processing Systems (NIPS)},
month = {December},
year = {2017}
}
This project uses code from deepspeech.torch.