Changelog #20

Open
snakers4 opened this issue Oct 3, 2020 · 35 comments

@snakers4
Owner

snakers4 commented Oct 3, 2020

Mirroring changelog
Some important changes, too small for a release

@snakers4 snakers4 added the documentation Improvements or additions to documentation label Oct 3, 2020
@snakers4 snakers4 self-assigned this Oct 3, 2020
Repository owner locked and limited conversation to collaborators Oct 3, 2020
@snakers4
Owner Author

snakers4 commented Oct 3, 2020

2020-10-03 Batched ONNX and TF Models

  • Extensively clean up and simplify ONNX and TF model code
  • Add batch support to TF and ONNX models
  • Update examples
  • (pending) Submit new models to TF Hub and update examples there

@snakers4 snakers4 pinned this issue Oct 3, 2020
@Islanna
Collaborator

Islanna commented Oct 19, 2020

2020-10-19 Update wiki

@snakers4
Owner Author

2020-10-28 Minor PyTorch 1.7 fix

  • torch.hub.load signature was changed

@snakers4
Owner Author

snakers4 commented Nov 3, 2020

2020-11-03 English Model V2 Released

  • A minor release, i.e. other models not affected
  • English model was made much more robust to certain dialects
  • Performance metrics coming soon

PS - the model should generalize much better overall

@snakers4
Owner Author

snakers4 commented Nov 6, 2020

2020-11-03 [Experimental] Ukrainian Model V1 Released

  • An experimental model
  • Trained from a small community contributed corpus
  • New: full model size reduced to 85 MB
  • New: quantized model is only 25 MB
  • No TF or ONNX models
  • Will be re-released as a model fine-tuned from a larger Russian corpus upon V3 release

@snakers4
Owner Author

2020-11-26 Fix TensorFlow Examples

Nasty Google makes their tf.hub utils locked ...

@snakers4
Owner Author

snakers4 commented Dec 4, 2020

2020-12-04 Add EE Distro Sizing and New Speed Metrics

@snakers4
Owner Author

snakers4 commented Dec 9, 2020

Moved some issues with useful answers to discussions and marked some answers as "solved"

@snakers4 snakers4 changed the title from "Changelog Mirror" to "Changelog" Feb 19, 2021
@snakers4
Owner Author

Replaced CDN links with the ordinary links

@snakers4
Owner Author

Migrated to our own file hosting in preparation for new releases

@snakers4
Owner Author

snakers4 commented Mar 3, 2021

Ukrainian Model V3 Released

  • Trained on a larger corpus (~1000 hours)
  • Fine-tuned from a commercial production Russian model
  • Model flavors: jit (CPU or GPU), jit_q (quantized, CPU only), and onnx (ONNX)
  • Huge model speed improvements for CPU inference (roughly 2-3x) compared to the previous one, comparable with new best from here
  • Will be dropping TF support altogether
  • No proper quality benchmarks for an experimental model though

@snakers4
Owner Author

snakers4 commented Mar 3, 2021

Added current state into changelog
Added more updates regarding the new ua model

@snakers4
Owner Author

2fb61a1

TTS models pre-release
Some doc improvements
Working on V3 model release

@snakers4
Owner Author

2021-04-20 Add v3 STT English Models

Huge update for English!

  • Default model (jit or onnx) size is reduced almost by 50% without sacrificing quality (!);
  • New model flavours: jit_q (smaller quantized model), jit_skip (with exposed skip connections), jit_large (higher quality model), onnx_large (!);
  • New smallest model jit_q is only 40M in size (!);
  • TensorFlow checkpoints discontinued;
  • New performance benchmarks - default models are on par with previous models and Google, large models mostly outperform Google (!);
  • Even more quality improvements coming soon (!);
  • CE benchmarks coming soon;
  • An xsmall model was created (2x smaller than the default), but I could not quantize it. I am looking into creating an xxsmall model;
  • Still working on making EE models fully JIT-traceable;

@snakers4
Owner Author

2021-04-21 Add v3 xsmall STT English Models

  • Polish docs;
  • Add xsmall and xsmall_q model flavours for en_v3;
  • Polish performance benchmarks page a bit;

@snakers4
Owner Author

snakers4 commented Jun 3, 2021

Added minimal standalone TTS example

@snakers4
Owner Author

snakers4 commented Jun 3, 2021

Added v4_0 large English model, metrics coming soon

@snakers4
Owner Author

snakers4 commented Jun 7, 2021

Added v4_0 large English model metrics

@snakers4
Owner Author

2021-06-18 Large V2 TTS release, v4_0 Large English STT Model

  • Added v4_0 large English model with metrics;
  • V2 TTS models with a 4x faster vocoder;
  • Russian models now feature automatic placement of stress and ё; homonyms are not handled yet;
  • A multi-language multi-speaker model;

@snakers4
Owner Author

snakers4 commented Jul 7, 2021

From now on, we will also repost our EE solution changelogs here

Silero Models EE, First Numbered Version v1.1 🚀 (Mar 23, 2021)

Bug Fixes 🐛

New Fields ➕

  • New field transcript_denorm - transcribed text without normalization / post-processing;

Distributions 💽

Sizing ⚡

New Environment Variables 🎛️

@snakers4
Owner Author

snakers4 commented Jul 7, 2021

Silero Models EE, v1.2 STT Quality Improvements, TTS Release, gRPC, Packaging Improvements

Bug Fixes 🐛

  • Minor post-processing bugs fixed;
  • Collected edge cases were used for quality control;
  • Performance degradation related to batches with audios of very different lengths partially fixed (50-70%);
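A common general-purpose way to limit the cost of mixed-length batches is to group audios of similar duration before batching, so that short clips are not padded to the length of the longest one. The sketch below only illustrates that idea; it is not the fix used in Silero Models EE, and `bucket_by_length`, `padding_waste` and the sample lengths are hypothetical.

```python
from typing import List, Sequence


def bucket_by_length(lengths: Sequence[int], batch_size: int) -> List[List[int]]:
    """Return batches of item indices, grouping items of similar length."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    # Sort indices by audio length, then cut the sorted order into batches.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def padding_waste(lengths: Sequence[int], batches: List[List[int]]) -> int:
    """Total padded samples if every batch is padded to its longest item."""
    waste = 0
    for batch in batches:
        longest = max(lengths[i] for i in batch)
        waste += sum(longest - lengths[i] for i in batch)
    return waste


# Hypothetical audio lengths in samples (a mix of short and long clips).
lengths = [16000, 480000, 24000, 320000, 8000, 400000]
naive = [[0, 1, 2], [3, 4, 5]]          # batches in arrival order
bucketed = bucket_by_length(lengths, 3)  # batches of similar length

print(padding_waste(lengths, naive), padding_waste(lengths, bucketed))
```

With these made-up lengths, the bucketed batches need far less padding than the arrival-order batches; the trade-off is that results have to be mapped back to the original order.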

STT Model Improvements and Simplifications 🚀

  • Model naming simplification;
  • Several internal releases and internal model simplifications;
  • New higher quality STT models - ru_xlarge_v012.model for GPU only and ru_small_v012_q.model for CPU only;
  • The CPU model is quantized (the non-quantized version is not provided to avoid confusion). The quality gap between quantized and original for this new model is negligible;
  • Quality of the new xlarge model is in line with the bleeding edge model from this article;
  • Major library version facelift, PyTorch images based off v1.9 now;
  • Model freeze during initial model warmup and loading, small additional speed boost;
  • LM startup fixed for a large number of LM workers. Now the LM file is locked and LMs are launched consecutively instead of after a random delay;

Deprecations 🚫

  • xsmall and large models deprecated for simplicity;
  • Legacy post-processing pipeline deprecated in favor of the new one entirely, now there will be only one decoder.py;
  • Because of LM file locking, using old license files with new images may result in slower LM loading for large installations;
  • See this change. To avoid confusion in the future, it is advised to use pytransform.so => pytransform.so mounts (please also make sure to consult the compatibility table);

STT Model Metrics 💎

All of these metrics are calculated following this article on 1 hour subsets (hence metrics can be a bit different from the historical ones):

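| Domain                   | API (ru_xlarge_v1_postv2) | Bleeding Edge | xlarge_v012 | small_v012_q |
|--------------------------|---------------------------|---------------|-------------|--------------|
| Reading                  | 7                         | 6             | 5.8         | 8.7          |
| Helpline                 | 16                        | 11            | 10.9        | 14.6         |
| Taxi                     | 13                        | 12            | 11.6        | 16.7         |
| Public speaking          | 14                        | 12            | 12.3        | 17.4         |
| Radio                    | 18                        | 15            | 15.7        | 21.3         |
| Court                    | 20                        | 20            | 17.7        | 22.9         |
| Audiobooks               | 24                        | 20            | 20          | 25.2         |
| Helpline                 | 25                        | 20            | 21          | 26.7         |
| Airport                  | 21                        | 22            | 21.5        | 27.1         |
| Finance (operator)       | 25                        | 24            | 21.8        | 27.5         |
| YouTube                  | 28                        | 25            | 23.6        | 30.6         |
| Smart speaker            | 30                        | 27            | 25.3        | 31.9         |
| Smart speaker (far away) | 41                        | 27            | 27.2        | 35.3         |
| E-commerce               | 29                        | 29            | 28          | 35.5         |
| Yellow pages             | 32                        | 29            | 30          | 35.9         |
| Dispatch                 | 41                        | 32            | 32.2        | 39.2         |
| Medical terms            | 35                        | 33            | 32.7        | 39.7         |
| Bank                     | 39                        | 35            | 36.3        | 40.9         |
| Pranks                   | 41                        | 35            | 36.4        | 43.8         |
| Poems, rap               | 43                        | 41            | 46.2        | 53.1         |
| Average                  | 27.1                      | 23.75         | 23.81       | 29.7         |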
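
The Average row is consistent with a plain, unweighted mean over the 20 domain rows. A quick check, with the per-domain values copied from the table in row order (the variable names are only illustrative):

```python
from statistics import mean

# Per-domain metrics, one list per model column, in table row order.
columns = {
    "API (ru_xlarge_v1_postv2)": [7, 16, 13, 14, 18, 20, 24, 25, 21, 25,
                                  28, 30, 41, 29, 32, 41, 35, 39, 41, 43],
    "Bleeding Edge": [6, 11, 12, 12, 15, 20, 20, 20, 22, 24,
                      25, 27, 27, 29, 29, 32, 33, 35, 35, 41],
    "xlarge_v012": [5.8, 10.9, 11.6, 12.3, 15.7, 17.7, 20, 21, 21.5, 21.8,
                    23.6, 25.3, 27.2, 28, 30, 32.2, 32.7, 36.3, 36.4, 46.2],
    "small_v012_q": [8.7, 14.6, 16.7, 17.4, 21.3, 22.9, 25.2, 26.7, 27.1, 27.5,
                     30.6, 31.9, 35.3, 35.5, 35.9, 39.2, 39.7, 40.9, 43.8, 53.1],
}

# Unweighted mean per column, rounded to two decimals as in the table.
averages = {name: round(mean(values), 2) for name, values in columns.items()}

for name, value in averages.items():
    print(f"{name}: {value}")
```

Because the mean is unweighted, every domain counts equally regardless of how much audio it contains.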

TTS Release 🎙️

  • TTS release following these articles - 1, 2;
  • Commercial speaker models available: tts_aidar_v012.pt, tts_baya_v012.pt, tts_kseniya_v012.pt;
  • Automated stress and ё for ~97% of all cases in Russian language;

New Features ➕

  • TTS release;
  • New STT models;
  • New experimental gRPC interface - still requires some testing and polish (WIP - at this moment one VAD param probably should be tuned per-installation and hence added to the Environment);

Distributions and Packaging 💽

  • Docker image security improvements;
  • Several sizes of VAD provided for gRPC;
  • New images v1.2, migration to PyTorch 1.9, library version updates and compatibility testing;

Sizing ⚡

  • New sizing for TTS models;
  • New sizing for gRPC interface;
  • Updated sizing for STT models;

New Environment Variables 🎛️

Please see the respective docs for more detailed information:

  • gRPC:
    • GRPC_PROC_NUM;
    • GRPC_THREAD_NUM;
    • TIME_STOP_WO_SPEECH;
    • TIME_STOP_WO_CHUNKS;
    • GRPC_PORT;
    • VAD_MODEL
  • TTS:
    • HOST_TTS_DISTRO;
    • TTS_BATCH_SIZE;
    • TTS_BATCH_DELAY;
    • CPUSET_TTS;
    • MAX_TTS_TIME;

@snakers4
Owner Author

snakers4 commented Aug 9, 2021

2021-08-09 German V3 Large Model

  • German V3 Large jit model trained on more data - large quality improvement;
  • Metrics coming soon;

@snakers4
Owner Author

snakers4 commented Sep 3, 2021

2021-09-03 German V4 and English V5 Models

  • German V4 large jit and onnx models;
  • English V5 small (jit and onnx), small_q (only jit) and xlarge (jit and onnx) models;
  • Vast quality improvements (metrics to be added shortly) on the majority of domains;
  • English xsmall models coming soon (jit

@snakers4
Owner Author

snakers4 commented Sep 7, 2021

Better progress visualization for English EE models


@snakers4
Owner Author

snakers4 commented Oct 1, 2021

Quick update - added English V5 quantized ONNX model

@snakers4
Owner Author

snakers4 commented Oct 6, 2021

2021-10-06 Text Recapitalization and Repunctuation Model for 4 Languages

  • Inserts capital letters and basic punctuation marks (dot, comma, hyphen, question mark, exclamation mark, dash for Russian);
  • Works for 4 languages (Russian, English, German, Spanish) and can be extended;
  • By design is domain agnostic and is not based on any hard-coded rules;
  • Has non-trivial metrics and succeeds in the task of improving text readability;

@snakers4
Owner Author

Quick update - updated list of articles - https://github.com/snakers4/silero-models#further-reading

@snakers4
Owner Author

snakers4 commented Dec 9, 2021

2021-12-09 Improved Text Recapitalization and Repunctuation Model for 4 Languages

  • The model now can work with long inputs, 512 tokens or ca. 150 words;
  • Inputs longer than 150 words are automatically processed in chunks;
  • The bugs with newer PyTorch versions have been fixed;
  • Model was trained longer with larger batches;
  • Model size slightly reduced to 85 MB;
  • The rest of model optimizations were deemed too high maintenance;
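
The chunked handling of long inputs can be pictured as splitting the text into pieces of at most about 150 words and processing each piece separately. The sketch below only illustrates that idea; how the model actually splits its inputs is not described here, and `split_into_chunks` and `max_words` are hypothetical names.

```python
from typing import List


def split_into_chunks(text: str, max_words: int = 150) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be >= 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


# A 320-word dummy input is split into chunks of 150, 150 and 20 words.
long_text = " ".join(f"word{i}" for i in range(320))
chunks = split_into_chunks(long_text)
print([len(chunk.split()) for chunk in chunks])
```

A real pipeline would probably prefer to cut at sentence boundaries, since punctuation restored at an arbitrary cut point has less context to rely on.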

@snakers4
Owner Author

snakers4 commented Jun 6, 2022

2022-02-24 English V6 Release

  • New en_v6 models;
  • Quality improvements for English models;

@snakers4
Owner Author

snakers4 commented Jun 6, 2022

2022-02-28 Experimental Pip Package

  • Models are downloaded on demand both by pip and PyTorch Hub;
  • If you need caching, do it manually or by invoking the required model once (it will be downloaded to a cache folder);
  • Please see these docs for more information;
  • PyTorch Hub and the pip package are based on the same code. Hence all examples historically based on torch.hub.load can also be used with the pip package;
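
Manual caching can be as simple as downloading a model file once and reusing the local copy afterwards. The sketch below is a generic standard-library illustration, not part of the pip package; `cached_download`, the `fetch` parameter, the cache directory and the URL are all hypothetical placeholders.

```python
import os
import urllib.request
from pathlib import Path


def cached_download(url, cache_dir, fetch=urllib.request.urlretrieve):
    """Download url into cache_dir once and return the local file path.

    fetch(url, filename) performs the actual download; it is a parameter
    so that it can be replaced (for example in tests).
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        # Download under a temporary name and rename afterwards, so an
        # interrupted download is never mistaken for a complete file.
        partial = target.with_name(target.name + ".part")
        fetch(url, str(partial))
        os.replace(partial, target)
    return target


# Usage (placeholder URL, not a real model location):
# path = cached_download("https://example.com/models/model.pt", "./model_cache")
```

The second and later calls with the same URL return the cached path without touching the network.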

@snakers4
Owner Author

snakers4 commented Jun 6, 2022

2022-04-12 Silero TTS in High Resolution, 10x Faster and More Stable

  • Huge release - Russian only for now;
  • Model size reduced 2x;
  • New models are 10x faster;
  • We added flags to control stress;
  • Now the models can make proper pauses;
  • High quality voice added (and unlimited "random" voices);
  • All speakers squeezed into the same model;
  • Input length limitations lifted, now models can work with paragraphs of text;
  • Pauses, speed and pitch can be controlled via SSML;
  • Sampling rates of 8, 24 or 48 kHz are supported;
  • Models are much more stable — they do not omit words anymore;
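
To make the SSML point more concrete, the sketch below assembles an SSML string with pauses, speed and pitch using the standard `<speak>`, `<break>` and `<prosody>` elements. Which tags and attribute values the models accept is an assumption here and should be checked against the Silero documentation; `build_ssml` and `check_sample_rate` are hypothetical helpers, not part of the library.

```python
from xml.sax.saxutils import escape, quoteattr

# Sampling rates listed in the release notes above, in Hz.
SUPPORTED_SAMPLE_RATES = (8000, 24000, 48000)


def build_ssml(sentences, pause_ms=500, rate="medium", pitch="medium"):
    """Wrap sentences in <prosody> tags, separated by <break> pauses."""
    parts = [
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(sentence)}</prosody>"
        for sentence in sentences
    ]
    pause = f'<break time="{int(pause_ms)}ms"/>'
    return "<speak>" + pause.join(parts) + "</speak>"


def check_sample_rate(sample_rate):
    """Return sample_rate if it is one of the listed rates, else raise."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    return sample_rate


ssml = build_ssml(["Hello", "A & B"], pause_ms=300, rate="slow")
print(ssml)
```

Escaping the sentence text keeps characters such as `&` from breaking the markup before it reaches the model.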

@snakers4
Owner Author

snakers4 commented Jun 6, 2022

2022-06-06 Silero TTS in 20 Languages With 174 Speakers

  • Huge release - 20 languages, 173 voices;
  • 1 new high quality Russian voice (eugene);
  • The CIS languages: Kalmyk, Russian, Tatar, Uzbek and Ukrainian;
  • Romance and Germanic languages: English, Indic English, Spanish, German, French;
  • 10 Indic languages;
  • Russian automated stress model vastly improved (please see this link for details);
  • All models inherit all of the previous SSML perks;

@snakers4
Owner Author

snakers4 commented Jun 6, 2022

Forgot about maintaining this

@snakers4
Owner Author

snakers4 commented Jun 6, 2022

Updated the pip package

@snakers4
Owner Author

2023-08-17 New Cyrillic Model, 4x Faster for v4 Models

  • New v4 models uploaded: v4_ru, v4_uz, v4_ua, v4_indic, v4_cyrillic;
  • All v4 models are 3-4x faster than v3 models;
  • All v4 models support SSML;
  • 8 kHz audio quality boost;
  • New Cyrillic models for 22 languages with 31 voices;
  • Tatar and Kalmyk models merged into the Cyrillic model;
