[XLNet] Parameters to reproduce SQuAD scores #947
Maybe we can use the same issue so the people following #822 can learn from your experiments as well?
I'm using xlnet-large-cased. I took a look at the XLNet repo and found the current preprocessing in transformers is a little off. The XLNet repo uses the order P SEP Q SEP CLS, but the preprocessing code in this repo has CLS Q SEP P SEP. I tried to follow the XLNet repo's preprocessing code and the hyperparameters in the paper, and my results improved. Here is my preprocessing code with the changes. Sorry it's a bit messy. I will create a PR next week.
I haven't had a chance to try SQuAD 2.0. My guess is that since the CLS token is needed in SQuAD 2.0 to predict unanswerable questions, when the CLS token is misplaced, the impact on model performance is bigger.
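The layout difference discussed above can be sketched as follows. This is a simplified illustration with placeholder special-token strings, not the actual tokenizer code; `passage` and `question` stand for already-tokenized lists:

```python
# Simplified illustration of the two input layouts discussed above.
# SEP and CLS are placeholder strings standing in for XLNet's
# special token ids.
SEP, CLS = "<sep>", "<cls>"

def xlnet_repo_layout(passage, question):
    # Official XLNet repo order: P SEP Q SEP CLS (CLS comes last).
    return passage + [SEP] + question + [SEP, CLS]

def old_transformers_layout(passage, question):
    # Earlier preprocessing in this repo: CLS Q SEP P SEP.
    return [CLS] + question + [SEP] + passage + [SEP]

# For SQuAD 2.0 the CLS position matters, because the CLS token's
# representation is what the model uses to predict unanswerability.
```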
This is great @hlums! Looking forward to a PR updating the example if you have time.
Updating after I read the comments in #1405 carefully.
@hlums hey you can just fork this repo, make your changes in your version of the repo, and then do a pull request - that should work |
My change is completely independent of data input and preprocessing — it just adjusts a few gemm and batchedGemm calls in the XLNetLayer to be more efficient. I referenced the related issues to give context to the exact f1 scores I was making sure I got on each version of the code. So I believe your PR is very much necessary and important :)
Edit: Original context of the email I replied to as I don't see it here anymore:
@slayton58 , is your change in the modelling code equivalent to changing the order of the tokens in the preprocessing code?
Thanks for the clarification @slayton58! I figured it out after reading the comments in your PR more carefully. :)
Thank you guys! I solved the permission denied issue by git cloning over SSH instead of HTTPS. Not sure why I never had this issue with my company's repos. Here is what I did.
The content of my bash script is the following:
@hlums The reason I ask is that with 2 x 1080Ti NVIDIA GPUs running run_squad.py on XLNet & BERT models, I experience data-parallel and distributed-performance-reporting (KeyError) failures. Perhaps you have the solution to either/both?
@ahotrod I'm using PyTorch 1.2.0. I have 4 NVIDIA V100s.
@ahotrod I've been seeing key errors only when running eval in distributed -- training is fine (and I've run quite a few full 8xV100 distributed finetunings in the last few weeks), but I have to drop back to
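A common work-around for the distributed-eval key errors mentioned above is to run evaluation only on the main process. This is a hedged sketch, not the thread's actual scripts; it relies only on the fact that `torch.distributed.launch` sets a `RANK` environment variable for each worker:

```python
import os

def is_main_process():
    # torch.distributed.launch sets RANK for each worker process;
    # a plain single-process run has no RANK, so treat it as rank 0.
    return int(os.environ.get("RANK", "0")) == 0

# Training runs on all ranks; evaluation is guarded so only the
# main process executes it, avoiding distributed-eval crashes.
if is_main_process():
    pass  # run evaluation here; other ranks skip it
```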
@hlums @slayton58 Thank you both for the informative, helpful replies. (Updated; I hope I adequately explain my work-around.) I prefer distributed processing for the training speed-up; plus, my latest data-parallel runs have been loading one of <parameters & buffers> on cuda:1 and shutting down. As recommended, I dropped distributed processing for evaluation. Here's my distributed fine-tuning script:
which maxes-out my 2 x 1080Ti GPUs (0: hybrid, 1: open-frame cooling):
After 3 epochs & ~21 hours, here are the results, similar to @colanim's:
generated from my single GPU
This model performs well in my Q&A application, but I'm looking forward to @hlums's preprocessing code, the imminent RoBERTa-large-SQuAD2.0, and perhaps one day ALBERT, for the low-resource user that I am.
OK. Figured out the relative module import issue. Code is running now and should have the PR tomorrow if nothing else goes wrong. |
PR is here: #1549. My current result is still a few points lower than what's reported in the XLNet paper, but we made some progress. :)
How to convert
I'm trying to reproduce the results of XLNet-base on SQuAD 2.0.
From the README of XLNet:
I ran the example with the following hyperparameters, on a single P100 GPU:
And I got these results :
It's 8 points lower than the official results.
What parameters are needed to reach the same score as the official implementation?
I opened a separate issue from #822 because my results are not as far off.
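For context on the scores being compared: SQuAD F1 is the token-level overlap between the predicted and gold answer strings. Below is a minimal sketch; the official evaluation script additionally lowercases, strips punctuation and articles, and takes the max over multiple gold answers:

```python
from collections import Counter

def squad_f1(prediction, ground_truth):
    """Token-overlap F1 between a predicted and a gold answer string.

    Simplified sketch of the SQuAD metric: no punctuation/article
    normalization, and a single gold answer rather than a max over all.
    """
    pred_tokens = prediction.lower().split()
    gt_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)
```

A gap of "8 points F1" on SQuAD thus means the predicted spans overlap the gold answers noticeably less, averaged over the whole dev set.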