When I translate the sentence "Nokkuð mun minna en 50% skal gisti tær.", I get different results depending on which other sentences are translated together with it in the same batch; nothing else changes. I've prepared a minimal example (sentences1.txt and sentences2.txt).
The two files contain very similar sentences, but the only sentence they have in common is the one above. It should get the same translation in both cases, but it does not: "Anything less than 50% should be edible." vs "Anything less than 50% should be ed.".
I think this might be expected if the execution were on GPU, since the order of the instructions might be different, but I think this should not happen on CPU.
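As a small illustration of the ordering effect mentioned above: floating-point addition is not associative, so the same values summed in a different order can give slightly different results. This is generic floating-point behaviour shown with awk, not Marian code, and the function name `fp_order_demo` is made up for this sketch. Whether anything like this actually happens in Marian's CPU path is an assumption I have not verified.

```shell
#!/bin/sh
# Hypothetical demo (not Marian code): sum the same three doubles
# in two different orders and print both results with full precision.
fp_order_demo() {
  awk 'BEGIN {
    printf "%.17g\n", (0.1 + 0.2) + 0.3   # left-to-right grouping
    printf "%.17g\n", 0.1 + (0.2 + 0.3)   # right-to-left grouping
  }'
}

fp_order_demo
```

The two printed values differ in the last digits. In a decoder, a difference that small could in principle flip a close decision between two candidate tokens, but that is only a guess at a possible mechanism here.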
I used the isen.student.base model from browsermt/students#74 (I am not sure it is exactly the same version as in the PR, since I see differences between the PR's files and mine, but it should be the same).
cat sentences1.txt | marian-dev/scripts/marian-translate-is2en.sh 5
# The HCA percent is very important, make sure that these ranges in between 50 and 60%.
# Anything less than 50% should be edible.
# You need to keep an eye out for uncountable items, binders and fillers.
cat sentences2.txt | marian-dev/scripts/marian-translate-is2en.sh 5
# The HCA percent is very important, make sure that this limit in between 50 and 60%.
# Anything less than 50% should be ed.
# You need to look out for synthetic components, binders as well as fillers.
If I translate both files together (cat sentences{1,2}.txt or cat sentences{2,1}.txt), the result for that sentence is "Anything less than 50% should be edible.".
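To make the comparison easier to repeat, here is a minimal sketch of a helper that checks whether a given line is identical in two saved translation outputs. The function name `compare_line` and the output file names `out1.txt` and `out2.txt` are hypothetical, and the line number 2 assumes the shared sentence is the second line of both outputs, as in the excerpts above.

```shell
#!/bin/sh
# Hypothetical helper: compare line N of two translation output files.
# Prints the line if both files agree; otherwise prints both versions
# and returns a non-zero status.
compare_line() {
  cl_n="$1"
  cl_a="$2"
  cl_b="$3"
  cl_la=$(sed -n "${cl_n}p" "$cl_a")
  cl_lb=$(sed -n "${cl_n}p" "$cl_b")
  if [ "$cl_la" = "$cl_lb" ]; then
    echo "same: $cl_la"
  else
    echo "differ:"
    echo "  $cl_a: $cl_la"
    echo "  $cl_b: $cl_lb"
    return 1
  fi
}

# Usage (commented out because it needs the model and the script from this report):
# cat sentences1.txt | marian-dev/scripts/marian-translate-is2en.sh 5 > out1.txt
# cat sentences2.txt | marian-dev/scripts/marian-translate-is2en.sh 5 > out2.txt
# compare_line 2 out1.txt out2.txt
```

With the outputs shown above, this would report a difference for line 2 ("edible." vs "ed.").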
Context
Marian version: v1.9.56 a1a82ff 2021-10-18 18:17:11 +0200
Maybe this is related to huggingface/transformers#25921 (?)