How to reproduce LISA-Llama2-13B? #82
Comments
Hi, I think your setting is the same as mine. It may be due to the high variance on the validation set. We will soon release the test set (which is larger than the val set), and then you can evaluate on it.
I noticed that in your code all 200 images in the validation set are sampled for evaluation, so I don't understand the "high variance" you mention here. How does it lead to the failure to reproduce the results? (I trained the 7B-v0-ft model and got gIoU 0.408, cIoU 0.435, which is lower than the reported gIoU 0.529, cIoU 0.54 in your paper.)
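For context on why a small validation set can move gIoU more than cIoU: the two metrics are typically aggregated differently, with gIoU averaging per-image IoUs (so one bad sample has an outsized effect on a 200-image set) and cIoU pooling intersections and unions over the whole set. A minimal sketch under that assumption, using binary NumPy masks; the helper names are illustrative, not LISA's actual evaluation code:

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU of two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union > 0 else 1.0

def giou_ciou(preds, gts):
    """gIoU: mean of per-image IoUs. cIoU: cumulative intersection / cumulative union."""
    per_image = [iou(p, g) for p, g in zip(preds, gts)]
    total_inter = sum(np.logical_and(p, g).sum() for p, g in zip(preds, gts))
    total_union = sum(np.logical_or(p, g).sum() for p, g in zip(preds, gts))
    return float(np.mean(per_image)), float(total_inter / total_union)
```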
@baoxiaoyi Hi, some issues (#41) have reported that the results of the 7B-v0-ft model can be successfully reproduced. Have you strictly followed the instructions in the README file?
The only difference lies in the following:
LLaVA-1.5 uses a 336px image resolution, so you should change the CLIP model and adjust the max context length accordingly. Also, the image token length is set to 256 by default, but when the resolution is changed to 336, it should be set to 576. Overall, some implementation details need further consideration to adapt to LLaVA-1.5; you should check them carefully. The use of flash-attn should not affect the final performance.
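The 256 vs. 576 figures follow from the CLIP ViT-L/14 patch size: the encoder produces (resolution / 14)² patch tokens. A minimal sketch of the kind of adjustment described above; the argument names here are illustrative placeholders, not necessarily LISA's actual flags, and only the CLIP checkpoint name is a real Hugging Face model ID:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Illustrative placeholders: swap the 224px CLIP encoder for the 336px one
    # used by LLaVA-1.5, and derive the image token budget from the resolution.
    parser.add_argument("--vision_tower", default="openai/clip-vit-large-patch14-336")
    parser.add_argument("--image_size", default=336, type=int)
    parser.add_argument("--patch_size", default=14, type=int)
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    # (336 // 14) ** 2 = 576 image tokens, vs. (224 // 14) ** 2 = 256 at 224px.
    image_token_len = (args.image_size // args.patch_size) ** 2
    print(f"image_token_len = {image_token_len}")
```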
Do I understand your reply correctly?
Excellent work! When will you release the test set?
Have you successfully run the code for LLaVA-1.5?
I used these settings in LLaVA-1.5, but there are still some errors in certain parts of the configuration. May I ask for some guidance?
I tried changing 255 to 575, and it runs successfully.
Do you mean this one: parser.add_argument("--model_max_length", default=575, type=int)?
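If the 255 and 575 here are the image token count minus one, that would match the usual LLaVA-style handling, where the single image placeholder token in the prompt is replaced by the patch embeddings, so the sequence grows by (num_patches - 1). This is an assumption about the code path, not something stated in the thread; a quick arithmetic check:

```python
# Assumption: the sequence grows by (num_patches - 1) because one image
# placeholder token is swapped out for the patch embeddings.
for image_size in (224, 336):
    num_patches = (image_size // 14) ** 2   # 256 for 224px, 576 for 336px
    net_growth = num_patches - 1            # 255 for 224px, 575 for 336px
    print(image_size, num_patches, net_growth)
```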
|
I tried training LISA-Llama2-13B on 4x 80GB A100 GPUs with the following commands:
and got the following results:
which is a far cry from the results in the paper:
Does LISA-Llama2-13B use different hyper-parameters? What am I doing wrong? How can I reproduce the LISA-Llama2-13B results?
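One thing worth double-checking when the GPU count differs from the reference setup is the effective batch size (per-device batch × gradient-accumulation steps × number of GPUs); with fewer GPUs, gradient accumulation usually has to be scaled up to match. The numbers below are illustrative assumptions, not LISA's published hyper-parameters:

```python
# Hypothetical numbers for illustration only -- check the README / paper for
# the actual per-device batch size and gradient accumulation steps.
ref_gpus, ref_batch, ref_grad_accum = 8, 2, 10               # assumed reference setup
my_gpus = 4

ref_effective = ref_gpus * ref_batch * ref_grad_accum        # 160
needed_grad_accum = ref_effective // (my_gpus * ref_batch)   # 20
print(f"reference effective batch size: {ref_effective}")
print(f"grad accumulation needed on {my_gpus} GPUs: {needed_grad_accum}")
```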