Word Timestamps Generation Fails with Access Violation #2

Open
zh-plus opened this issue Dec 4, 2024 · 0 comments
Environment

Model: kotoba-tech/kotoba-whisper-v2.0-faster
Device: CUDA
Compute Type: float16
OS: Windows
faster-whisper version: commit 8327d8cc647266ed66f6cd878cf97eccface7351

Description

When generating word timestamps (word_timestamps=True) with this model in faster-whisper, the process crashes with an access violation (exit code -1073741819 / 0xC0000005).
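The two exit-code forms quoted above are the same 32-bit value, once read as a signed integer and once as unsigned hex. A minimal sketch of the conversion (the helper name `to_ntstatus` is made up for illustration):

```python
def to_ntstatus(exit_code: int) -> str:
    """Format a process exit code as an unsigned 32-bit hex value."""
    # Masking with 0xFFFFFFFF reinterprets a negative (signed) code as unsigned.
    return "0x%08X" % (exit_code & 0xFFFFFFFF)


print(to_ntstatus(-1073741819))  # 0xC0000005
```

0xC0000005 is the Windows status code STATUS_ACCESS_VIOLATION.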

Reproduction

from faster_whisper import WhisperModel
model = WhisperModel("kotoba-tech/kotoba-whisper-v2.0-faster", device='cuda', compute_type='float16', num_workers=1)
segments, info = model.transcribe(
    "output.mp3",
    language="ja", chunk_length=15, condition_on_previous_text=False,
    vad_filter=True, word_timestamps=True,
)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

Error Details

Exit Code: -1073741819 (0xC0000005)
Error Location: The crash occurs in the alignment phase at transcribe.py#L1654

results = self.model.align(
    encoder_output,
    tokenizer.sot_sequence,
    text_tokens,
    num_frames,
    median_filter_width=median_filter_width,
)

Additional Context

  • The error code (0xC0000005) indicates an access violation, which typically occurs when a program tries to read or write to memory that it doesn't have permission to access.
  • The crash happens during alignment, so the issue appears to be related to either the model itself or the model conversion step.
  • With the officially provided distilled model distil-large-v3, generating word timestamps works.
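Because the access violation kills the whole interpreter, the comparison between the two models is easier to repeat when each run happens in its own child process. The following is only a sketch of how that could be done, not something run for this report: `run_child`, `compare`, and `CHILD_SCRIPT` are hypothetical names, and the child script simply mirrors the reproduction above.

```python
import subprocess
import sys

# Script run in a child interpreter; it mirrors the reproduction in this issue.
# Arguments: model name, audio path, "1" or "0" for word_timestamps.
CHILD_SCRIPT = """
import sys
from faster_whisper import WhisperModel

model_name, audio_path, word_timestamps = sys.argv[1], sys.argv[2], sys.argv[3] == "1"
model = WhisperModel(model_name, device="cuda", compute_type="float16", num_workers=1)
segments, info = model.transcribe(
    audio_path, language="ja", chunk_length=15, condition_on_previous_text=False,
    vad_filter=True, word_timestamps=word_timestamps,
)
for segment in segments:  # segments is a generator; iterating runs the transcription
    pass
"""


def run_child(script, *args):
    """Run `script` in a separate Python process and return its exit code."""
    completed = subprocess.run([sys.executable, "-c", script, *args])
    return completed.returncode


def compare(audio_path, model_names):
    """Return {(model_name, word_timestamps): exit_code} for every combination."""
    results = {}
    for name in model_names:
        for flag in ("0", "1"):
            results[(name, flag == "1")] = run_child(CHILD_SCRIPT, name, audio_path, flag)
    return results
```

Calling `compare("output.mp3", ["kotoba-tech/kotoba-whisper-v2.0-faster", "distil-large-v3"])` would then show which combinations exit with a non-zero code while the parent process keeps running.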