Replies: 1 comment
-
I'm not looking for exactly this, just a local transcription using something like https://github.com/ggerganov/whisper.cpp/tree/master/examples/whisper.android. The transcription could then be synchronized with the audiobook. I created a separate issue for lyrics support at #2275.
-
This is a more involved idea than most others, but I think it'd be worth it, and on the surface it doesn't seem too insane. It'd work like this:
You're listening to a book. You pause at the 12h 37m 21s position in the track and click on "show in book".
Voice looks at the audiobook's directory and finds the only epub file in there.
It takes your current timestamp and cuts a segment of +/- 10 seconds around it out of the audio file. FFmpeg can seek and cut this quickly, even on a 5+ year old Android phone.
It then transcribes that segment, either using the native API of your phone (e.g. on a Pixel) or through a call to a self-hosted Whisper API or something similar. A defined response schema would let the server be swapped or upgraded, as long as it keeps responding as specified.
Voice then normalizes the transcription by removing all the whitespace.
We do the same to the epub's content, so a simple string lookup finds the index of the transcription in the whitespace-free book text. We then add the whitespace back to map that index to a position in the original book text. A whitespace-agnostic search could possibly replace both this step and the previous one.
Voice then opens the epub to that character index, allowing the user to highlight the text in an epub-native way.
We can then extract our highlights from the epub file later.
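To make the lookup step more concrete, here is a minimal sketch of the whitespace-agnostic search in Python. It is only an illustration, not how Voice does or would do it: `find_in_book` is a hypothetical helper, and it assumes the transcription matches the book text exactly apart from whitespace. Real transcripts will differ in punctuation, casing and spelling, so some fuzzy matching would likely be needed on top.

```python
from typing import Optional, Tuple


def find_in_book(book_text: str, transcript: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) indices into book_text of the passage that matches
    transcript when all whitespace is ignored, or None if nothing matches."""
    # Strip whitespace from the book, remembering for every kept character
    # where it sat in the original text.
    kept_chars = []
    original_index = []
    for i, ch in enumerate(book_text):
        if not ch.isspace():
            kept_chars.append(ch)
            original_index.append(i)
    stripped_book = "".join(kept_chars)

    # Normalize the transcription the same way.
    stripped_query = "".join(ch for ch in transcript if not ch.isspace())
    if not stripped_query:
        return None

    # Plain substring search in the whitespace-free text.
    pos = stripped_book.find(stripped_query)
    if pos == -1:
        return None

    # Map the match back to positions in the original text ("adding the
    # whitespace back"). The end index is exclusive.
    start = original_index[pos]
    end = original_index[pos + len(stripped_query) - 1] + 1
    return start, end


book = "It was the best of times,\n  it was the worst of times."
match = find_in_book(book, "best  of times, it was the worst")
if match is not None:
    start, end = match
    print(repr(book[start:end]))
```

Keeping the index map alongside the stripped text means the original whitespace never has to be reconstructed. The returned range can be handed straight to whatever opens the epub at a character offset.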
This is probably way out of scope for your vision of the app but it would solve a major issue with audiobooks that hasn't been solved on Android yet!