Recreating WMT14 EN-DE transformer results #1862
Comments
What do you mean by outdated commands in #637? I don't think many of the parameters have changed since then.
@francoishernandez there are some parameters, such as "-epochs" and "-report_every", that no longer work in the current version, but yes, you're right, it shouldn't be that big of a concern. That said, #637 mostly covers preprocessing and tokenization, which I believe I'm following correctly. Do you think there's something wrong with my current preprocessing/tokenizing steps? As for https://forum.opennmt.net/t/reproducing-pre-trained-transformer-model/3591/6, I tried the other perl file mentioned there (multi-bleu-detok.perl), and it did give an improvement, but only a very slight one (around 0.1 BLEU higher).
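For reference, the detokenized evaluation I tried looks roughly like this (the script path and my output filename are assumptions; multi-bleu-detok.perl ships with the Moses scripts):

```shell
# Detokenized BLEU, as suggested in the forum thread.
# Paths are examples: the reference here is the raw (untokenized) test set,
# and the hypothesis must be detokenized before scoring.
perl mosesdecoder/scripts/generic/multi-bleu-detok.perl wmt14/test.de \
    < results/wmt14.test.pred.detok
```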
Don't take data from Stanford.
I'm currently trying to recreate the results from the paper "Attention is All You Need". According to the paper, the transformer model achieved a BLEU score of 27.3 on the WMT14 EN-DE dataset (with test set newstest2014).
I cannot seem to achieve that score. The best score I achieved was around 25.8.
Below are the commands I'm using (mostly copied exactly from the OpenNMT website):
1. Download the data
Here, I'm using newstest2013 for validation and newstest2014 for testing, as described in the Attention is All You Need paper.
After this, I also renamed the newstest files to valid.en, valid.de, test.en, test.de.
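The renaming step described above amounts to something like this (assuming the newstest files have already been downloaded into the working directory):

```shell
# Rename the WMT newstest sets to the valid/test names used in later steps:
# newstest2013 becomes the validation set, newstest2014 the test set.
for lang in en de; do
    mv "newstest2013.$lang" "valid.$lang"
    mv "newstest2014.$lang" "test.$lang"
done
```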
2. Preprocess the data
This is taken from https://opennmt.net/OpenNMT-py/extended.html
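Roughly, the preprocessing command I ran (paths are mine; the .atok files come from running the Moses tokenizer first, and the flags follow the linked tutorial as best I can reconstruct them):

```shell
# Build the OpenNMT-py data shards from the tokenized corpora.
# -lower lowercases the data, matching the extended tutorial.
onmt_preprocess -train_src wmt14/train.en.atok -train_tgt wmt14/train.de.atok \
    -valid_src wmt14/valid.en.atok -valid_tgt wmt14/valid.de.atok \
    -save_data wmt14/wmt14.atok.low -lower
```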
3. Train
Training code taken from https://opennmt.net/OpenNMT-py/FAQ.html
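For completeness, the training command I used, adapted from the FAQ's transformer recipe with my own data and save paths substituted in (the hyperparameter flags are my reconstruction of that page and may not match it exactly):

```shell
# Transformer-base settings from the OpenNMT-py FAQ (reconstructed):
# 6 layers, 512 model dim, 8 heads, 2048 FFN, Noam schedule, label smoothing.
onmt_train -data wmt14/wmt14.atok.low -save_model results/wmt14_model \
    -layers 6 -rnn_size 512 -word_vec_size 512 -transformer_ff 2048 -heads 8 \
    -encoder_type transformer -decoder_type transformer -position_encoding \
    -train_steps 200000 -max_generator_batches 2 -dropout 0.1 \
    -batch_size 4096 -batch_type tokens -normalization tokens -accum_count 2 \
    -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 8000 \
    -learning_rate 2 -max_grad_norm 0 -param_init 0 -param_init_glorot \
    -label_smoothing 0.1 -valid_steps 10000 -save_checkpoint_steps 10000 \
    -world_size 1 -gpu_ranks 0
```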
4. Translate
Code taken from https://opennmt.net/OpenNMT-py/extended.html
onmt_translate -gpu 0 -model results/wmt14_model_step_200000.pt -src wmt14/test.en.atok -tgt wmt14/test.de.atok -replace_unk -verbose -output results/wmt14.test.pred.atok
5. Evaluate
Code taken from https://opennmt.net/OpenNMT-py/extended.html
perl tools/multi-bleu.perl wmt14/test.de.atok < results/wmt14.test.pred.atok
According to the FAQ page, the training command should be able to reproduce the WMT14 results. Am I doing something wrong in my steps? I tried looking at the related issue #637, but it seems like some of the commands there are outdated.