Performance metrics for evaluating time segmentation of Whisper models #182

mafaisalpg · 2024-04-04T21:05:45Z

Hello everyone,

I would like to compare different Whisper models on my own datasets while changing some parameters, such as detect_disfluencies. My output looks like

start, end, text

instead of word-level timestamps.

Questions

Which metrics (MSE, MAE, etc.) are the best fit for this purpose?
Is there any Python implementation that I can reuse?
Is there any paper or article that has done such a comparison?

Thanks

Jeronymous · 2024-04-05T01:43:29Z

Maintainer

Yes, we studied how to evaluate timestamp quality.
It's described in this paper: "Transcribing And Aligning Conversational Speech: A Hybrid Pipeline Applied To French Conversations"
https://hal.science/hal-04404777 (part "V. EVALUATION METRICS")

Unfortunately, there is no open-source code to compute those metrics.
But if you have any questions about that, I can answer them.
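
For readers who want a starting point in the meantime, here is a minimal sketch of two generic segment-level measures: MAE/MSE over start and end boundaries, and temporal intersection-over-union. These are not the metrics from the paper above. The sketch assumes segments are (start, end, text) tuples in seconds and that reference and hypothesis segments are already paired one-to-one. The function names `boundary_errors` and `temporal_iou` are made up for illustration.

```python
from statistics import mean

# Assumed layout: (start_seconds, end_seconds, text); reference[i] pairs with hypothesis[i].
Segment = tuple[float, float, str]


def boundary_errors(reference: list[Segment], hypothesis: list[Segment]) -> dict[str, float]:
    """MAE and MSE (in seconds and seconds squared) over all start and end boundaries."""
    if len(reference) != len(hypothesis):
        raise ValueError("segments must be paired one-to-one")
    if not reference:
        raise ValueError("at least one segment pair is required")
    diffs = []
    for (ref_start, ref_end, _), (hyp_start, hyp_end, _) in zip(reference, hypothesis):
        diffs.append(hyp_start - ref_start)
        diffs.append(hyp_end - ref_end)
    return {
        "mae": mean([abs(d) for d in diffs]),
        "mse": mean([d * d for d in diffs]),
    }


def temporal_iou(ref: Segment, hyp: Segment) -> float:
    """Overlap duration divided by the combined duration of the two segments."""
    intersection = max(0.0, min(ref[1], hyp[1]) - max(ref[0], hyp[0]))
    union = (ref[1] - ref[0]) + (hyp[1] - hyp[0]) - intersection
    return intersection / union if union > 0 else 0.0
```

MSE penalizes large boundary errors more heavily than MAE, so reporting both can help separate a few big misses from many small ones. If two models split the audio into different numbers of segments, the one-to-one pairing assumption breaks, and segments would first need to be matched (for example by text or by best overlap).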

1 reply

mafaisalpg Apr 5, 2024
Author

Thanks. Does whisper-timestamped have any implementation of these metrics, which were used for the comparison with the original Whisper?
