Online segmentation and transcription for live streams #18

Open
lmmx opened this issue Apr 6, 2021 · 0 comments
Labels
enhancement New feature or request

Comments

@lmmx
Owner

lmmx commented Apr 6, 2021

[Motivated by a feature request to handle live streams in beeb. These are placeholder notes for when I'm ready to implement transcription from live streams on top of that.]

Speaker segmentation is the only part that complicates the workflow here.

Essentially, segmentation is currently "offline" (i.e. done after the show has ended), but we could switch to an "online" segmentation:

  • If the show has already begun, 'rewind' by stepping back to the first M4S stream segment and download every segment published so far
    • (beeb will handle this)
  • While the show is on air, continue to download each new M4S stream segment until the last (at which time the show will go off air)
    • (again, beeb will handle this)
  • As soon as possible, merge all the M4S stream segments downloaded so far, and label the "low energy" (quiet) time points
    • Now (ignoring any stream segment downloads that are still in progress) split the merged stream at these points to produce segmented audio clips
    • Don't use the last of these clips (the one that runs to the end of the merged audio stream)! It may cut someone off mid-sentence simply because that is where the downloaded audio currently stops. Hold that one back for the next iteration
    • ...

[TBC]
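
The iteration above can be sketched as follows. This is only a rough illustration under several assumptions: the merged audio has already been decoded into a plain list of float samples (downloading and decoding the M4S segments is out of scope here), "low energy" is approximated by a fixed threshold on per-frame RMS, and the names `OnlineSegmenter`, `frame_len` and `threshold` are hypothetical. Rather than re-merging every segment each time, the sketch keeps only the held-back tail and appends new audio to it, which should give the same clips.

```python
from math import sqrt


def frame_rms(samples, frame_len):
    """RMS energy of each complete, non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        sqrt(sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len)
        for i in range(n_frames)
    ]


def low_energy_points(samples, frame_len, threshold):
    """Sample index at the midpoint of each closed run of quiet frames."""
    points = []
    run_start = None
    for i, energy in enumerate(frame_rms(samples, frame_len)):
        if energy < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # Skip a quiet run at the very start of the buffer: it is the
            # leftover half of a pause that was already used as a split point.
            if run_start > 0:
                points.append((run_start + i) * frame_len // 2)
            run_start = None
    # A quiet run still open at the end of the buffer is ignored for now,
    # since the pause may continue in audio that has not arrived yet.
    return points


class OnlineSegmenter:
    """Hypothetical helper: emits finished clips, holds back the last one."""

    def __init__(self, frame_len=4, threshold=0.1):
        self.frame_len = frame_len
        self.threshold = threshold
        self.pending = []  # audio not yet emitted as a finished clip

    def feed(self, new_samples):
        """Append newly downloaded audio and return the clips safe to use."""
        self.pending.extend(new_samples)
        clips = []
        start = 0
        for point in low_energy_points(self.pending, self.frame_len, self.threshold):
            clips.append(self.pending[start:point])
            start = point
        # Everything after the last split point is the clip that runs to the
        # end of the stream so far, so keep it for the next iteration.
        self.pending = self.pending[start:]
        return clips

    def flush(self):
        """Call once the show is off air: the held-back tail is now complete."""
        tail, self.pending = self.pending, []
        return [tail] if tail else []
```

Each call to `feed` corresponds to one iteration of the loop, and `flush` would be called once the last stream segment has been downloaded. A real implementation would likely want a smarter pause detector and a minimum pause length, but the hold-back logic would stay the same.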

Due to a limitation in the models I'm using (maximum token sequence length) I can't actually feed an entire programme into these steps at once. In that sense, nothing is gained by waiting until a programme finishes to build the MP4.

If I implemented live transcription, I'd get the end result much sooner (as I could begin processing the audio while the show was still on air), so I'd be interested in this too.

lmmx added the enhancement label Apr 6, 2021