Add Silero STT models #378
@vinitra @abhinavs95 @autoih My name is Alexander. I am with Silero; we are a small, independent, self-financed company making speech-related products. Note that I took some liberty with your submission template for a number of reasons:
Also, please note that although speech-to-text has a long history of over-fitting to LibriSpeech, we follow a radically different approach: we track real-life metrics by benchmarking our models on a wide variety of domains. This has some consequences for including validation datasets in the model package.
Signed-off-by: snakers41 <aveysov@gmail.com>
@snakers4, thanks for sharing these nice speech models with the community. Is it possible to check these models into the model zoo instead of hosting them elsewhere?
Hi, do you mean uploading to git-lfs in this repo? The reason why I opted for such versioning / hosting is threefold:
+1 for what @wenbingl said. We recently moved to a centralized way of hosting models, and it is best that we keep it that way. Also, the CIs assume the models are uploaded to git-lfs and therefore won't run any checks for your models. You can always remove older versions of the models from the zoo when you update them.
I see.
@snakers4 do you have test projects? - as provided by ASR/TTS ONNX models
Hi @GeorgeS2019
@snakers4 Do you have test projects as in the case of Nvidia ASR ONNX listed above?
We have this - https://github.com/snakers4/silero-models
@snakers4 Thanks for sharing. Please check and follow the links I shared. Eventually this will bring your models to one of the largest 3D game communities, which is looking for STT and TTS solutions.
Silero Speech To Text
Description
Silero Speech-To-Text models provide enterprise-grade STT in a compact form factor for several commonly spoken languages. Unlike conventional ASR models, our models are robust to a variety of dialects, codecs, domains, noises, and lower sampling rates (for simplicity, audio should be resampled to 16 kHz). The models consume normalized audio in the form of samples (i.e. without any pre-processing except for normalization to -1 … 1) and output frames with token probabilities. We provide a decoder utility for simplicity (we could include it in the model itself, but that is hard to do with ONNX, for example).
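To make the input and output contract above concrete, here is a minimal sketch in plain Python. It assumes 16-bit PCM input and a CTC-style greedy decoding scheme (argmax per frame, collapse repeats, drop a blank token). Both are assumptions for illustration: the function names, the label set, and the blank index are hypothetical, and the decoder utility shipped with the models may work differently.

```python
from typing import List, Sequence


def normalize_int16(samples: Sequence[int]) -> List[float]:
    """Scale 16-bit PCM integers into the -1 ... 1 float range."""
    return [s / 32768.0 for s in samples]


def greedy_decode(frames: Sequence[Sequence[float]],
                  labels: Sequence[str],
                  blank: int = 0) -> str:
    """Assumed CTC-style greedy decoding over per-frame token probabilities.

    For every frame take the most probable token, collapse consecutive
    repeats, and drop the blank token (index `blank`, hypothetical).
    """
    out = []
    prev = None
    for probs in frames:
        idx = max(range(len(probs)), key=probs.__getitem__)
        if idx != prev and idx != blank:
            out.append(labels[idx])
        prev = idx
    return "".join(out)


if __name__ == "__main__":
    # Toy example with a hypothetical three-token vocabulary.
    labels = ["_", "h", "i"]
    frames = [
        [0.1, 0.8, 0.1],    # "h"
        [0.1, 0.8, 0.1],    # repeated "h", collapsed
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.1, 0.8],    # "i"
    ]
    print(normalize_int16([0, 16384, -32768]))
    print(greedy_decode(frames, labels))
```

The normalization step is the only pre-processing the description asks for; resampling to 16 kHz is left to the audio loading library.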
We hope that our efforts with Open-STT and Silero Models will bring the ImageNet moment in speech closer.
Use Cases
Transcribing speech into text. Please see detailed benchmarks for various domains here.
Model
Please note that models are downloaded automatically with the utils provided below.
Source
Original implementation in PyTorch => simplification => TorchScript => ONNX.
Inference
We try to simplify starter scripts as much as possible using handy torch.hub utilities.
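A starter script along these lines might look like the sketch below. It assumes the `silero_stt` entry point and the helper tuple exposed by the snakers4/silero-models repository (linked earlier in this thread) at the time of writing; those names may change between versions. The wrapper function `transcribe_files` is hypothetical. Running it requires PyTorch, torchaudio, and network access to download the model.

```python
from typing import Iterable, List


def transcribe_files(paths: Iterable[str],
                     language: str = "en",
                     batch_size: int = 10) -> List[str]:
    """Hypothetical helper: transcribe audio files with a Silero STT model."""
    # Imported lazily so this module can be loaded without PyTorch installed.
    import torch

    device = torch.device("cpu")

    # torch.hub.load fetches the repository and calls its `silero_stt`
    # entry point (assumed name, taken from the silero-models README).
    model, decoder, utils = torch.hub.load(
        repo_or_dir="snakers4/silero-models",
        model="silero_stt",
        language=language,
        device=device,
    )
    # Helper utilities returned alongside the model (assumed order).
    read_batch, split_into_batches, read_audio, prepare_model_input = utils

    texts = []
    for batch in split_into_batches(list(paths), batch_size=batch_size):
        model_input = prepare_model_input(read_batch(batch), device=device)
        # The model returns per-frame token probabilities for each example;
        # the provided decoder turns them into text.
        for example in model(model_input):
            texts.append(decoder(example.cpu()))
    return texts
```

Calling `transcribe_files(["speech.wav"])` with a local audio file (the file name here is a placeholder) would return one transcript per input file.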
Dataset (Train)
Not disclosed by model authors.
Validation
We have run a wide variety of benchmarks on different publicly available validation datasets. Please see benchmarks here. For legal reasons, we neither own these datasets nor provide mirrors or re-uploads of them.
It is customary for English STT models to report metrics on LibriSpeech. Please beware, though, that these metrics have very little in common with real-life / production metrics and with model generalization (see here, and here section "Sample Inefficient Overparameterized Networks Trained on "Small" Academic Datasets"). Hence we report metrics compared to a premium Google STT API (heavily abridged).
EN V1
Please see benchmarks here for more details.
References
Contributors
Alexander Veysov together with Silero AI Team.
License
AGPL-3.0 License