reproduce training dd #4

Open
hppy139 opened this issue Dec 16, 2024 · 3 comments

hppy139 commented Dec 16, 2024

To reproduce training, could you share the "tensor/" directory used for training?
I am running into an overfitting problem: the training loss is close to zero.
[Image]

hppy139 changed the title from "reproduce training" to "reproduce training dd" on Dec 16, 2024
Owner

ddehun commented Dec 16, 2024

Hi. Thank you for your interest in our work.

Unfortunately, I recently graduated and left my lab, so I can no longer access the tensor/ directory.

I have a few questions to help narrow down the problem.

  1. Which hyperparameters did you use to train your reranker? I think the script at https://github.com/ddehun/DEnsity/blob/master/scripts/train.sh can be used to obtain results similar to those in the paper.
  2. It would also be good to evaluate your model on the downstream task (i.e., a meta-evaluation dataset for dialogue).
  3. If your target corpus is either ConvAI2 or DailyDialog, you can also use the pre-released checkpoints linked in the README (https://drive.google.com/drive/folders/1IUUg6xsmEr28oed2yPqIA2m6xsQ9yNRd).

Please ping me if you have further issues. Thank you!
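
For the downstream check suggested in point 2, meta-evaluation of a dialogue metric is commonly reported as a rank correlation between the metric's scores and human judgments. The following is a minimal, dependency-free sketch of Spearman's correlation; the function names and the toy scores are hypothetical and are not taken from the DEnsity repository.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(model_scores, human_scores):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(model_scores), average_ranks(human_scores))


# Hypothetical toy data: human ratings and metric scores for five responses.
human = [1, 2, 3, 4, 5]
model = [0.1, 0.4, 0.35, 0.8, 0.9]
rho = spearman(model, human)
print(round(rho, 3))
```

A retrained model whose correlation on such a dataset is close to that of the released checkpoint is a stronger signal of a successful reproduction than the training loss alone.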

Author

hppy139 commented Dec 16, 2024

Actually, I already have your checkpoint models. I am currently working on training with my own data (single-round dialogues, i.e., one turn from speaker A plus one turn from the other speaker), so the first step for me is to reproduce your training.
As for "1": yes, I am using the same parameters as the script: "DATASET=dd; temp=0.1; weight=1; lr=5e-5".
It's hard for me to tell which part went wrong...
(Sorry to hear that "Unfortunately, I recently left my lab by graduation, so I cannot access the tensor/ directory." o(╥﹏╥)o)
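
One possible reading of the near-zero training loss, assuming `temp=0.1` is the temperature of a softmax-style contrastive objective (the thread does not confirm this): with a small temperature, the loss can approach zero as soon as the positive is clearly separated from the negatives, which by itself does not show that something is broken. The sketch below uses a hypothetical `contrastive_loss` helper and made-up similarity values to illustrate the effect; it is not the loss implemented in the repository.

```python
import math


def contrastive_loss(pos_sim, neg_sims, temperature=0.1):
    """Softmax cross-entropy over similarities, with the positive as target."""
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# Hypothetical similarities: negatives far from the positive ("easy" case)
# versus negatives close to the positive ("hard" case).
easy = contrastive_loss(0.9, [0.0, -0.1, 0.1])
hard = contrastive_loss(0.9, [0.8, 0.85, 0.7])
print(f"easy negatives: {easy:.4f}")
print(f"hard negatives: {hard:.4f}")
```

Under this assumption, comparing the training loss against the loss on a held-out split, or against downstream results, would help separate genuine overfitting from a task that is simply easy on the training negatives.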

@hppy139 hppy139 closed this as completed Dec 16, 2024
@hppy139 hppy139 reopened this Dec 16, 2024
Author

hppy139 commented Dec 16, 2024

Also, I'm wondering how long it took you to train the dd model. One epoch takes me 9.5 hours on 4 Tesla K40c GPUs.
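
To put the reported figure in terms that are easier to compare across setups, the arithmetic can be sketched as below. The 9.5 hours and 4 GPUs come from the comment above; the epoch count passed to the helper is a hypothetical value, since the thread does not state how many epochs are run.

```python
hours_per_epoch = 9.5  # reported wall-clock time for one epoch
num_gpus = 4           # reported number of Tesla K40c GPUs

# GPU-hours consumed by a single epoch.
gpu_hours_per_epoch = hours_per_epoch * num_gpus


def total_wall_clock_hours(num_epochs):
    """Wall-clock hours for a run, assuming every epoch takes equally long."""
    return hours_per_epoch * num_epochs


print(gpu_hours_per_epoch)        # GPU-hours per epoch
print(total_wall_clock_hours(3))  # hypothetical 3-epoch run
```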
