[feature request] Support for batch transcription #26

Open
G-Thor opened this issue Aug 23, 2022 · 2 comments

Comments

@G-Thor
Contributor

G-Thor commented Aug 23, 2022

The package currently does not offer batch transcription; sentences are passed through one at a time.
That would be fine on its own, but some dependencies, such as IceNLP, seem to be initialized on every call.
This severely slows down transcription and makes batch jobs (in the range of thousands of utterances) very time-consuming.

It would be nice to be able to pass in a list of strings and get a list of lists of Tokens, with matching indices, for example.
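
To illustrate the shape of the interface being asked for, here is a minimal sketch. The name `transcribe_batch` and the `transcribe_one` callable are hypothetical and not part of the package; `fake_transcribe` is only a stand-in for the real per-sentence call, so the sketch shows the index-matching contract rather than any speed-up.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")  # stands in for the package's Token type


def transcribe_batch(
    sentences: Sequence[str],
    transcribe_one: Callable[[str], List[T]],
) -> List[List[T]]:
    """Return one token list per sentence; result[i] belongs to sentences[i]."""
    # transcribe_one is assumed to reuse already-initialized resources,
    # so setup cost is paid once rather than once per sentence.
    return [transcribe_one(sentence) for sentence in sentences]


def fake_transcribe(sentence: str) -> List[str]:
    """Hypothetical stand-in for the real per-sentence transcription call."""
    return sentence.split()


batch_result = transcribe_batch(["halló heimur", "góðan daginn"], fake_transcribe)
```

An empty input list simply yields an empty result list, which keeps the index correspondence trivial for callers.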

@bnika
Contributor

bnika commented Aug 23, 2022

Daemonizing IceNLP is on our issue list; the current implementation does indeed slow the process down. As a temporary workaround, you can set phrasing=False in textprocessing_manager.transcribe() if you can do without it, since IceNLP is only used in the phrasing step.
The LSTM model is also too slow in general, so we need to do some research on that part as well.
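
As a rough illustration of the "initialize once, reuse on every call" idea, the sketch below caches a slow-to-start object inside the process with `functools.lru_cache`. This is plain in-process caching, not a daemon, and `ExpensiveTagger`, `get_tagger` and `phrase` are hypothetical names that do not correspond to IceNLP or to this package's API.

```python
from functools import lru_cache
from typing import List, Tuple


class ExpensiveTagger:
    """Hypothetical stand-in for a dependency that is slow to start."""

    instances_created = 0  # counts how often the costly setup runs

    def __init__(self) -> None:
        ExpensiveTagger.instances_created += 1

    def tag(self, sentence: str) -> List[Tuple[str, str]]:
        # Dummy tagging: pair every word with a placeholder tag.
        return [(word, "TAG") for word in sentence.split()]


@lru_cache(maxsize=1)
def get_tagger() -> ExpensiveTagger:
    """Create the tagger on first use and return the same object afterwards."""
    return ExpensiveTagger()


def phrase(sentence: str) -> List[Tuple[str, str]]:
    return get_tagger().tag(sentence)


results = [phrase(s) for s in ["a b", "c d", "e"]]
```

With this pattern the setup cost is paid on the first call only, which is the property that matters when processing thousands of utterances in one run.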

@G-Thor
Contributor Author

G-Thor commented Aug 24, 2022

Unfortunately, it's the phrasing part I was most interested in applying, since my data is already normalized and I have other options for G2P.
Perhaps I'll just use a more naïve phrasing approach while the issues with this dependency get worked out.
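
For reference, a naive phrasing pass could look something like the sketch below. It assumes that phrase boundaries can be approximated by punctuation followed by whitespace, which is a simplification and not what the package's phrasing step does; the name `naive_phrases` is hypothetical.

```python
import re
from typing import List

# Split on whitespace that directly follows one of , ; : . ! ?
_PHRASE_BOUNDARY = re.compile(r"(?<=[,;:.!?])\s+")


def naive_phrases(text: str) -> List[str]:
    """Break already-normalized text into rough phrases at punctuation."""
    return [chunk for chunk in _PHRASE_BOUNDARY.split(text.strip()) if chunk]


phrases = naive_phrases("Góðan daginn, hvað segir þú? Allt gott.")
```

The punctuation stays attached to the phrase it closes, so a later step can still distinguish a comma pause from a sentence end.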
