Whisper encoder + No 30 second padding #5

Merged
merged 9 commits into from
Jun 4, 2024
formatting
farzadab committed Jun 4, 2024
commit 1a52c9ec2508e3cab2b82f8eae591097eaac4f48
2 changes: 1 addition & 1 deletion ultravox/model/ultravox_model.py
@@ -10,8 +10,8 @@
 import transformers.modeling_outputs
 import transformers.models

-from ultravox.model import ultravox_config
 from ultravox.model import modified_whisper
+from ultravox.model import ultravox_config


 class UltravoxModel(
1 change: 1 addition & 0 deletions ultravox/model/ultravox_processing.py
@@ -112,6 +112,7 @@ def __call__(
         if audio is not None and len(audio) > 0:
             if self.audio_padding == "max_length":
                 # 30 seconds is the expected length for Whisper
+                assert sampling_rate is not None, "Sampling rate must be provided."
                 audio_len = 30 * sampling_rate
             else:
                 audio_len = audio.shape[-1]
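
The length selection in the hunk above can be read as a small standalone function. This is only a sketch under stated assumptions: `target_audio_len` and `WHISPER_WINDOW_SECONDS` are hypothetical names that do not appear in the PR, and the function takes the sample count directly instead of reading `audio.shape[-1]` from an array.

```python
from typing import Optional

# Assumption: the 30-second window comes from the comment in the diff.
WHISPER_WINDOW_SECONDS = 30


def target_audio_len(
    num_samples: int,
    audio_padding: str,
    sampling_rate: Optional[int] = None,
) -> int:
    """Return the number of samples the audio is expected to occupy.

    Hypothetical helper mirroring the branch in the diff: with
    "max_length" padding the target is a full 30-second window,
    otherwise it is the clip's own length.
    """
    if audio_padding == "max_length":
        # Without a sampling rate, 30 seconds cannot be converted to samples.
        assert sampling_rate is not None, "Sampling rate must be provided."
        return WHISPER_WINDOW_SECONDS * sampling_rate
    return num_samples


if __name__ == "__main__":
    # A 2-second clip at 16 kHz.
    print(target_audio_len(32_000, "max_length", 16_000))
    print(target_audio_len(32_000, "longest"))
```

The assert also narrows `sampling_rate` from `Optional[int]` to `int` before the multiplication, which is likely why a type checker would want it there.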