Add whisper masking #146
Conversation
Doesn't Whisper expect us to pad to 30 seconds? Do we have any concern about masking the padding here? Other than that, the code for the pad masking looks good.
Yes, Whisper is trained with padding to 30 seconds, so the way we've used it may cause some mismatch. However, it does not seem to degrade end-to-end performance in our (limited) comparative studies.
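
For readers following along, here is a minimal sketch of the kind of padding mask being discussed. The function and tensor names (`make_encoder_padding_mask`, `audio_lens`) are illustrative, not the PR's actual code; the only assumptions are Whisper's 30-second / 3000-mel-frame input and its stride-2 convolutional downsampling:

```python
import torch

def make_encoder_padding_mask(
    audio_lens: torch.Tensor, num_mel_frames: int = 3000
) -> torch.Tensor:
    """Boolean mask over Whisper encoder states, True where audio is real.

    Whisper pads/trims inputs to 30 s (3000 mel frames), and its encoder
    downsamples by 2x via a stride-2 conv, yielding 1500 hidden states.
    audio_lens holds the number of unpadded mel frames per example.
    """
    enc_len = num_mel_frames // 2            # encoder states after downsampling
    valid = (audio_lens + 1) // 2            # valid encoder states per example
    pos = torch.arange(enc_len, device=audio_lens.device)
    return pos[None, :] < valid[:, None]     # shape (batch, enc_len)
```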
- Added masking in the Whisper encoder to ensure consistency between training and inference.
- Simplified release_config.yaml to serve as an example configuration.
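
As a hypothetical usage under the same assumptions, the helper sketched above could zero out encoder states for padded frames (the model name and shapes are for illustration only; the PR may apply its mask differently, e.g., inside attention):

```python
import torch
from transformers import WhisperModel

encoder = WhisperModel.from_pretrained("openai/whisper-base").encoder

# Two clips padded to 30 s of log-mel features (80 mels x 3000 frames);
# the second clip has only 1000 real frames before padding.
input_features = torch.randn(2, 80, 3000)
audio_lens = torch.tensor([3000, 1000])

hidden = encoder(input_features).last_hidden_state   # (2, 1500, d_model)
mask = make_encoder_padding_mask(audio_lens)         # (2, 1500), bool
hidden = hidden * mask.unsqueeze(-1)                 # zero out padded states
```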
* Fix typo in README.md (#128)
* [bugfix] Missing enable_fsdp in 70b config (#132)
* Update load warnings (#126)
* Generic datasets with inheritance (#135)
* Switch InterleaveDataset to use weights (e.g., 2.0, 0.5, etc) (#140)
* Break up datasets.py (#141)
* Update registry with more languages commonvoice (#143)
* Split dataset definitions into individual files (#145)
* Add whisper masking (#146)
* Defining block size in UltravoxConfig, and solving assertions (#157)