-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calculate duration of input files in order to decide which engine to use #136
Conversation
return parseFloat(stdout); | ||
} catch (error) { | ||
logger.error(`Error during ffprobe file duration detection`, error); | ||
throw error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will we return a 500 if we can't find file duration here? might be better to default to whisperx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good shout, implemented in 5d7709a
packages/common/src/types.ts
Outdated
@@ -69,6 +69,8 @@ export const TranscriptionJob = z.object({ | |||
engine: z.nativeEnum(TranscriptionEngine), | |||
// we can get rid of this when we switch to using a zip | |||
translationOutputBucketUrls: z.optional(OutputBucketUrls), | |||
// this is optional because giant currently doesn't know file duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do ffmpeg or the whisper models output file duration? Or would it be worth having the worker calculate duration with ffprobe? That would unify the way we're finding duration and make it so Giant isn't a special case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I was more just shoving this in in case useful for logging - the only place it was used was in setting a 'duration' field in the transcription failure message, which isn't really necessary. So I've removed this field in 244a6d2
Deploy build 801 of
|
244a6d2
to
1833874
Compare
What does this change?
Currently we have two available 'engines' for transcription:
It seems a shame to wait to spin up whisperx for a file that is only a few minutes long. This PR gets the API to make a decision on which engine to use based on the duration of the file - any files under 10 minutes long where the user hasn't requested diarization will be sent to whisper.cpp, everything else goes to whisperX
To get ffprobe running on lambda I had to create a lambda layer - this is basically just a zip file that gets added to the disk of the lambda vm.
How to test
Tested on CODE