[Motivated by a feature request to handle live streams in beeb, placeholder notes for when I'm ready to implement transcription from live streams following this]
Speaker segmentation is the only part that complicates the workflow here.
Essentially the current segmentation is "offline" (i.e. done after the show has ended), but we could switch to an "online" segmentation:
- If the show has already begun, 'rewind' by stepping back to the first M4S stream segment and download them all
  - (beeb will handle this)
- While the show is on air, continue to download each new M4S stream segment until the last (at which point the show goes off air)
  - (again, beeb will handle this)
- As soon as possible, merge all the downloaded M4S stream segments and label the "low energy" time points
- Now (ignoring any still-ongoing stream segment downloads) split the merged stream at these points to produce segmented audio clips
- Don't use the last of these clips (the one that runs to the end of the merged audio stream)! Its end may be cut off mid-utterance simply because it falls at the end of what has been downloaded so far. Keep that one for the next iteration
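The split-and-hold-back step above could be sketched roughly as follows, assuming the merged stream has been decoded to a mono PCM numpy array; the window length and energy quantile are illustrative placeholders, not tuned values:

```python
import numpy as np

def low_energy_split_points(audio, sr, win_s=0.5, quantile=0.1):
    """Return sample indices of windows whose RMS energy falls in the
    lowest `quantile` of all windows (candidate "low energy" time points)."""
    win = int(sr * win_s)
    n = len(audio) // win
    frames = audio[: n * win].reshape(n, win)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    threshold = np.quantile(rms, quantile)
    return [i * win for i in range(n) if rms[i] <= threshold]

def split_completed_clips(audio, points):
    """Split at the low-energy points, but hold back the final clip
    (it may end mid-utterance) to carry over into the next iteration."""
    bounds = [0] + sorted(points) + [len(audio)]
    clips = [audio[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
    return clips[:-1], clips[-1]  # (ready clips, carry-over clip)
```

In a real loop you would prepend the carry-over clip to the next batch of merged segments before recomputing the split points.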
...
[TBC]
Due to a limitation in the models I'm using (maximum token sequence length) I can't actually input an entire programme to these steps. In that sense there's no benefit gained from waiting until a programme finishes to build the MP4.
If I implemented live transcription, I'd get the end result much sooner (as I could begin processing the audio while the show was still on air), so I'd be interested in this too.
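Since the model's maximum sequence length forces chunked input either way, the windowing could look something like this; the 30 s window and 2 s overlap are hypothetical numbers, not the real model limits:

```python
def chunk_bounds(n_samples, sr, max_s=30.0, overlap_s=2.0):
    """Yield (start, end) sample bounds covering the audio in
    model-sized windows, with a small overlap so that speech cut
    at a window edge is seen whole in the next window."""
    step = int((max_s - overlap_s) * sr)
    win = int(max_s * sr)
    start = 0
    while start < n_samples:
        yield (start, min(start + win, n_samples))
        if start + win >= n_samples:
            break
        start += step
```

Because each chunk is independent, chunks can be fed to the model as soon as enough stream segments have arrived, rather than after the programme ends.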