…change (#2297)
Summary:
# Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
- [ ] Did you read the [contributor guideline](https://github.com/pytorch/fairseq/blob/master/CONTRIBUTING.md)?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
Fixes # (issue).
## PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
## Did you have fun?
Make sure you had fun coding!
Pull Request resolved: fairinternal/fairseq-py#2297
Reviewed By: alexeib
Differential Revision: D30906090
Pulled By: dianaml0
fbshipit-source-id: 941d30db7f766c9077a1b5bb2a04680f57e2e070
sorenmulli pushed a commit to sorenmulli/fairseq that referenced this issue on Oct 4, 2021.
## 🐛 Bug

`add_insertion_noise` in `DenoisingDataset` does not respect `--max-source-positions`. This can become an issue when specifying, e.g., `--mask 0.3 --mask-length span-poisson --poisson-lambda 3.5` for the denoising task. In the call to `add_whole_word_mask`, if the input source tokens are already at the maximum size allowed by the model, the resulting call to `add_insertion_noise` can yield a result that is longer than `--max-source-positions`.
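A minimal sketch of the failure mode, using a simplified stand-in for insertion noise (this is a hypothetical model, not fairseq's actual implementation): inserting roughly `ceil(len(tokens) * p)` extra tokens into a sequence that is already at the size limit necessarily pushes it past the limit.

```python
import math
import random

def add_insertion_noise_sketch(tokens, p, mask_idx=3):
    """Simplified model of insertion noise: insert about
    ceil(len(tokens) * p) mask tokens at random positions.
    (Hypothetical stand-in for illustration only.)"""
    n = int(math.ceil(len(tokens) * p))
    result = list(tokens)
    for _ in range(n):
        pos = random.randrange(len(result) + 1)
        result.insert(pos, mask_idx)
    return result

max_source_positions = 1024
src = list(range(max_source_positions))  # input already at the size limit
noised = add_insertion_noise_sketch(src, p=0.1)
print(len(noised))  # 1127, i.e. greater than max_source_positions
```

The insertion count depends only on the input length and the noise ratio, so whenever the input is already at the cap and `p > 0`, the output must exceed the cap.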
## To Reproduce

1. Have a dataset with inputs of size equal to `--max-source-positions`.
2. Run a denoising task with `--mask 0.3 --mask-length span-poisson --poisson-lambda 3.5`.
3. Eventually, the dataset will return a batch with a token count greater than `--max-source-positions`.
## Expected behavior

The call to `add_whole_word_mask` (or really, `add_insertion_noise`) should not return a set of tokens with a length longer than `--max-source-positions`.
## Environment

- How you installed fairseq (`pip`, source): `pip install -e .`