Question about the GenerationConfig in commonsense_evaluate.py #6
Comments
By the way, when using
Thanks for your kind reminder. The hyperparameters are not intentionally chosen; they follow https://github.com/AGI-Edgerunners/LLM-Adapters. I understand that such a setting could affect the results, but it is kept the same across all baselines for a fair comparison.
Thanks for your reply.
I have not tried this. I think there may be something deeper in the transformer's decoding process. BTW, we must set the batch size to 1 when decoding (refer to huggingface/transformers#25921).
You can try that.
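The batching caveat above can be illustrated with a minimal, self-contained sketch (plain Python, not the repository's code; the pad id of 0 is assumed for illustration): when prompts of unequal length are batched, they must be padded, and generation must be told to ignore the pad positions via an attention mask. Decoding with batch size 1 sidesteps padding entirely.

```python
# Minimal sketch of why batched decoding needs an attention_mask:
# padded positions must be masked out, or the model attends to pad
# tokens and the generated outputs can change.

PAD_TOKEN_ID = 0  # hypothetical pad id, for illustration only

def pad_batch(sequences, pad_token_id=PAD_TOKEN_ID):
    """Right-pad variable-length token-id lists into a rectangular batch."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_token_id] * (max_len - len(seq)) for seq in sequences]
    # 1 = real token, 0 = padding; this is what `attention_mask` encodes.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return input_ids, attention_mask

prompts = [[5, 6, 7, 8], [9, 10]]  # two prompts of unequal length
ids, mask = pad_batch(prompts)
print(ids)   # [[5, 6, 7, 8], [9, 10, 0, 0]]
print(mask)  # [[1, 1, 1, 1], [1, 1, 0, 0]]

# With batch size 1 there is nothing to pad, so no mask is strictly needed:
single_ids, single_mask = pad_batch([prompts[1]])
print(single_mask)  # [[1, 1]]
```

If the mask is omitted for the padded batch above, the pad positions are indistinguishable from real tokens, which is exactly the source of the warning (and potential score drift) discussed in the linked transformers issue.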
Yes, the results are not stable. I guess the reasons are two-fold:
Thanks, and I will try more experiments.
One related response:
Thanks for the fast open-source release.

I find that in commonsense_evaluate.py, lines 52~58, the parameter `do_sample` of `GenerationConfig` is not set, and the default value of `do_sample` is `False`. Then, with `do_sample=False` and `num_beams=4`, the model will generate using beam-search decoding.

Besides, I also find that lines 60~66 may not pass the related `attention_mask`, which can cause a warning in the transformers library.

I don't know whether this behavior is intended, and what the right settings (hyper-parameters in `generate`) are to reproduce the results in Table 4 of this paper.