Rewrite generate_package.py in C++ to avoid training dependencies #3113

Merged: 9 commits merged on Jul 2, 2020
data/README.rst: 2 changes (1 addition, 1 deletion)
@@ -3,7 +3,7 @@ Language-Specific Data

This directory contains language-specific data files. Most importantly, you will find here:

- 1. A list of unique characters for the target language (e.g. English) in ``data/alphabet.txt``
+ 1. A list of unique characters for the target language (e.g. English) in ``data/alphabet.txt``. After installing the training code, you can check ``python -m deepspeech_training.util.check_characters --help`` for a tool that creates an alphabet file from a list of training CSV files.
Collaborator review comment: 👍

2. A scorer package (``data/lm/kenlm.scorer``) generated with ``generate_scorer_package`` (``native_client/generate_scorer_package.cpp``). The scorer package includes a binary n-gram language model generated with ``data/lm/generate_lm.py``.
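The ``check_characters`` tool mentioned above builds an alphabet from training CSVs. The C++ sketch below only illustrates the idea and is not that tool's implementation. It assumes DeepSpeech-style CSVs with a ``transcript`` column (e.g. a ``wav_filename,wav_filesize,transcript`` header), valid UTF-8 text, no quoted commas, and the one-character-per-line layout of ``data/alphabet.txt``; all function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one CSV line on commas.
// Simplification: quoted fields containing commas are NOT handled.
std::vector<std::string> split_csv_line(const std::string& line) {
  std::vector<std::string> fields;
  std::istringstream in(line);
  std::string field;
  while (std::getline(in, field, ',')) fields.push_back(field);
  return fields;
}

// Hypothetical helper: split a UTF-8 string into code points (each kept as a
// std::string) so multi-byte characters such as "ö" stay intact.
// Assumes valid UTF-8; the lead byte determines the sequence length.
std::vector<std::string> utf8_chars(const std::string& s) {
  std::vector<std::string> out;
  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    const std::size_t len = lead < 0x80 ? 1
                            : (lead >> 5) == 0x6 ? 2
                            : (lead >> 4) == 0xE ? 3
                                                 : 4;
    out.push_back(s.substr(i, len));
    i += len;
  }
  return out;
}

// Hypothetical helper: collect the unique characters of the `transcript`
// column from one CSV file's contents. The first line is the header.
std::set<std::string> collect_alphabet(const std::string& csv_text) {
  std::set<std::string> alphabet;
  std::istringstream lines(csv_text);
  std::string line;
  bool header_seen = false;
  std::size_t col = 0;
  while (std::getline(lines, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
    const std::vector<std::string> fields = split_csv_line(line);
    if (!header_seen) {
      const auto it = std::find(fields.begin(), fields.end(), "transcript");
      if (it == fields.end()) return alphabet;  // no transcript column
      col = static_cast<std::size_t>(it - fields.begin());
      header_seen = true;
      continue;
    }
    if (col < fields.size())
      for (const std::string& ch : utf8_chars(fields[col])) alphabet.insert(ch);
  }
  return alphabet;
}

// Hypothetical helper: render an alphabet file, one character per line.
std::string alphabet_file(const std::set<std::string>& chars) {
  std::string out;
  for (const std::string& ch : chars) out += ch + "\n";
  return out;
}
```

For example, ``alphabet_file(collect_alphabet(csv_text))`` yields one line per distinct transcript character, including a line holding a single space.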

doc/Scorer.rst: 2 changes (1 addition, 1 deletion)
@@ -44,7 +44,7 @@ Afterwards you can use ``generate_scorer_package`` to generate the scorer packag

cd data/lm
# Download and extract appropriate native_client package:
- curl -LO ...
+ curl -LO http://github.com/mozilla/DeepSpeech/releases/...
tar xvf native_client.*.tar.xz
./generate_scorer_package --alphabet ../alphabet.txt --lm lm.binary --vocab vocab-500000.txt \
--package kenlm.scorer --default_alpha 0.931289039105002 --default_beta 1.1834137581510284
taskcluster/.shared.yml: 36 changes (18 additions, 18 deletions)
@@ -142,32 +142,32 @@ system:
namespace: "project.deepspeech.swig.win.amd64.b5fea54d39832d1d132d7dd921b69c0c2c9d5118"
tensorflow:
linux_amd64_cpu:
- url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.cpu/artifacts/public/home.tar.xz"
- namespace: "project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.cpu"
+ url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.cpu/artifacts/public/home.tar.xz"
+ namespace: "project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.cpu"
linux_amd64_cuda:
- url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.cuda/artifacts/public/home.tar.xz"
- namespace: "project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.cuda"
+ url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.cuda/artifacts/public/home.tar.xz"
+ namespace: "project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.cuda"
linux_armv7:
- url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.arm/artifacts/public/home.tar.xz"
- namespace: "project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.arm"
+ url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.arm/artifacts/public/home.tar.xz"
+ namespace: "project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.arm"
linux_arm64:
- url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.arm64/artifacts/public/home.tar.xz"
- namespace: "project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.arm64"
+ url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.arm64/artifacts/public/home.tar.xz"
+ namespace: "project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.arm64"
darwin_amd64:
- url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.osx/artifacts/public/home.tar.xz"
- namespace: "project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.osx"
+ url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.osx/artifacts/public/home.tar.xz"
+ namespace: "project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.osx"
android_arm64:
- url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.android-arm64/artifacts/public/home.tar.xz"
- namespace: "project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.android-arm64"
+ url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.android-arm64/artifacts/public/home.tar.xz"
+ namespace: "project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.android-arm64"
android_armv7:
- url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.android-armv7/artifacts/public/home.tar.xz"
- namespace: "project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.android-armv7"
+ url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.android-armv7/artifacts/public/home.tar.xz"
+ namespace: "project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.android-armv7"
win_amd64_cpu:
- url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.win/artifacts/public/home.tar.xz"
- namespace: "project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.win"
+ url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.win/artifacts/public/home.tar.xz"
+ namespace: "project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.win"
win_amd64_cuda:
- url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.win-cuda/artifacts/public/home.tar.xz"
- namespace: "project.deepspeech.tensorflow.pip.r2.2-submodule.347767452d19a45a6aeb3694e54adce4d945634a.1.win-cuda"
+ url: "https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.win-cuda/artifacts/public/home.tar.xz"
+ namespace: "project.deepspeech.tensorflow.pip.r2.2.518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d.0.win-cuda"
username: 'build-user'
homedir:
linux: '/home/build-user'
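Every changed line in this hunk follows one pattern: each ``url`` is the Taskcluster index prefix, then the ``namespace``, then ``/artifacts/public/home.tar.xz``; each namespace is ``project.deepspeech.tensorflow.pip.`` followed by a branch (``r2.2-submodule`` before, ``r2.2`` after), a commit SHA, a numeric counter (``1`` before, ``0`` after) and a per-platform suffix. The C++ sketch below regenerates the block from those parts, which shows why bumping the TensorFlow revision touches every ``url`` and ``namespace`` line. The helper names are hypothetical, the diff does not say what the numeric counter means, and the two-space indentation relative to ``tensorflow:`` is an assumption.

```cpp
#include <string>
#include <utility>
#include <vector>

// Shared prefix and suffix of every `url` value in the hunk above.
const std::string kIndexPrefix = "https://community-tc.services.mozilla.com/api/index/v1/task/";
const std::string kArtifactSuffix = "/artifacts/public/home.tar.xz";

// YAML key -> namespace suffix, as they appear in the hunk above.
const std::vector<std::pair<std::string, std::string>> kPlatforms = {
    {"linux_amd64_cpu", "cpu"},         {"linux_amd64_cuda", "cuda"},
    {"linux_armv7", "arm"},             {"linux_arm64", "arm64"},
    {"darwin_amd64", "osx"},            {"android_arm64", "android-arm64"},
    {"android_armv7", "android-armv7"}, {"win_amd64_cpu", "win"},
    {"win_amd64_cuda", "win-cuda"}};

// Hypothetical helper: build a namespace from its four varying parts.
// The meaning of `counter` (1 before this PR, 0 after) is not stated in the diff.
std::string tf_namespace(const std::string& branch, const std::string& sha,
                         int counter, const std::string& platform) {
  return "project.deepspeech.tensorflow.pip." + branch + "." + sha + "." +
         std::to_string(counter) + "." + platform;
}

// Hypothetical helper: the artifact URL wraps the namespace in a fixed prefix/suffix.
std::string artifact_url(const std::string& ns) {
  return kIndexPrefix + ns + kArtifactSuffix;
}

// Regenerate the `tensorflow:` block for one branch/commit/counter.
// Indentation (two spaces per level, relative to `tensorflow:`) is assumed.
std::string render_tensorflow_block(const std::string& branch, const std::string& sha,
                                    int counter) {
  std::string out = "tensorflow:\n";
  for (const auto& platform : kPlatforms) {
    const std::string ns = tf_namespace(branch, sha, counter, platform.second);
    out += "  " + platform.first + ":\n";
    out += "    url: \"" + artifact_url(ns) + "\"\n";
    out += "    namespace: \"" + ns + "\"\n";
  }
  return out;
}
```

For example, ``render_tensorflow_block("r2.2", "518c1d04bf55d362bb11e973b8f5d0aa3e5bf44d", 0)`` reproduces the new ``url`` and ``namespace`` values listed above.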