Add whisper masking #146
Conversation
Doesn't Whisper expect us to pad to 30 seconds? Do we have any concern about masking the padding here? Other than that, the code for the pad masking looks good.
Yes, Whisper is trained with padding to 30 seconds, so the way we've used it may cause some mismatch. However, it does not seem to degrade end-to-end performance in our (limited) comparative studies.
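
For readers following along, here is a minimal sketch of the kind of padding mask being discussed. The function and tensor names (`make_encoder_padding_mask`, `audio_lens`) are illustrative, not the PR's actual code; the only assumptions are Whisper's 30-second / 3000-mel-frame input and its stride-2 convolutional downsampling:

```python
import torch

def make_encoder_padding_mask(
    audio_lens: torch.Tensor, num_mel_frames: int = 3000
) -> torch.Tensor:
    """Boolean mask over Whisper encoder states, True where audio is real.

    Whisper pads/trims inputs to 30 s (3000 mel frames), and its encoder
    downsamples by 2x via a stride-2 conv, yielding 1500 hidden states.
    audio_lens holds the number of unpadded mel frames per example.
    """
    enc_len = num_mel_frames // 2            # encoder states after downsampling
    valid = (audio_lens + 1) // 2            # valid encoder states per example
    pos = torch.arange(enc_len, device=audio_lens.device)
    return pos[None, :] < valid[:, None]     # shape (batch, enc_len)
```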
- Added masking in the Whisper encoder to ensure consistency between training and inference.
- Simplified release_config.yaml to serve as an example configuration.
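
As a hypothetical usage under the same assumptions, the helper sketched above could zero out encoder states for padded frames (the model name and shapes are for illustration only; the PR may apply its mask differently, e.g., inside attention):

```python
import torch
from transformers import WhisperModel

encoder = WhisperModel.from_pretrained("openai/whisper-base").encoder

# Two clips padded to 30 s of log-mel features (80 mels x 3000 frames);
# the second clip has only 1000 real frames before padding.
input_features = torch.randn(2, 80, 3000)
audio_lens = torch.tensor([3000, 1000])

hidden = encoder(input_features).last_hidden_state   # (2, 1500, d_model)
mask = make_encoder_padding_mask(audio_lens)         # (2, 1500), bool
hidden = hidden * mask.unsqueeze(-1)                 # zero out padded states
```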
* Fix typo in README.md (#128)
* [bugfix] Missing enable_fsdp in 70b config (#132)
* Update load warnings (#126)
* Generic datasets with inheritance (#135)
* Switch InterleaveDataset to use weights (e.g., 2.0, 0.5, etc) (#140)
* Break up datasets.py (#141)
* Update registry with more languages commonvoice (#143)
* Split dataset definitions into individual files (#145)
* Add whisper masking (#146)
* Defining block size in UltravoxConfig, and solving assertions (#157)