Feature request: streaming decoder (fast DS_IntermediateDecode calls) #1837
Comments
Is the vad_transcriber insufficient? If so, why?
@kdavis-mozilla I think @Chidhambararajan is referring to the ability to get the transcription as soon as we have accumulated enough logits to do so. Related to #1757
I think it all boils down to the fact that the decoding step is not yet streamable
@Chidhambararajan That being said, we already have streaming for the audio feeding, and on a desktop with a decent CPU or a GPU it should be faster than realtime, as it should be on mid-range Android smartphones with the TFLite quantized model. So you can already build realtime transcription, just not perfectly yet, and it will get better once we have a streaming decoder (soon).
Currently, the decoder we use has an initialization step (DeepSpeech/native_client/ctcdecode/ctc_beam_search_decoder.cpp, lines 26 to 46 in a4b35d2), which returns a decoder state struct containing all of the variables needed for the main loop. Then you eventually feed a batch of probabilities into the decoder with a feeding step (same file, lines 48 to 121 in a4b35d2). And then finally you'd have a finalization step (same file, lines 124 to 175 in a4b35d2). This step could then be called from
In the end, here's how the API would be used:
I refactored all of this and got it working. I just need to tidy it up a bit and then I'll post a PR.
See PR #2121.
This has now been merged into master, so I'll close this issue.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Can a streaming recognition service be added to the DeepSpeech client? Currently an audio file is recorded and only later transcribed by the engine. However, most of the big STT services offer streaming realtime audio from the mic and returning results simultaneously. That feature would give a real boost to the applications of this project for realtime recognition.