Integrating OAI evals post-training #85
Conversation
Here's an example task that does barely any training, but runs the full evaluation (assuming this PR is merged): https://wandb.ai/fixie/ultravox/runs/nxgfzg3v
Overall this LGTM, nice work.
Not required for this PR, but it looks like wandb has some basic support for handling oaieval outputs, including sample-by-sample visualization. If we can log in the right format, maybe we can get this visualization too: https://wandb.ai/wandb_fc/openai-evals/reports/OpenAI-Evals-Demo-Using-W-B-Prompts-to-Run-Evaluations--Vmlldzo0MTI4ODA3 (edit: it looks like it just consumes the log files natively, so perhaps we can just log them to wandb as artifacts).
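A minimal sketch of that artifact idea, assuming the oaieval run has already written its record file to disk; the project name and the `logs/oaieval/eval_log.jsonl` path are hypothetical, not something this PR specifies:

```python
import wandb

# Hypothetical path to the record file produced by an oaieval run.
EVAL_LOG_PATH = "logs/oaieval/eval_log.jsonl"

run = wandb.init(project="ultravox", job_type="evaluation")

# Upload the raw oaieval log as a W&B artifact, so tooling that
# consumes the log files natively can pick it up later.
artifact = wandb.Artifact(name="oaieval-logs", type="evaluation")
artifact.add_file(EVAL_LOG_PATH)
run.log_artifact(artifact)

run.finish()
```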
Example wandb oaieval run: https://wandb.ai/wandb/jobs/runs/ugqqpjff?nw=nwuser_scott
* allow lower python version for lambda cloud and adding ultravox-vllm
* integrate oaievals
* evaluations using oaievalset
* make sure pipeline can be loaded correctly
* force 1 GPU and set max_num_samples
* logging eval Table to w&b + make text-only eval optional (sketched below)
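Since one commit mentions logging an eval Table to W&B, here is a hedged sketch of what that might look like. The record path and the per-record field names (`type`, `sample_id`, `data.prompt`, `data.sampled`) are assumptions about the oaieval log schema, not confirmed by this PR:

```python
import json
import wandb

run = wandb.init(project="ultravox", job_type="evaluation")

# Build a W&B Table of per-sample results; column names are illustrative.
table = wandb.Table(columns=["sample_id", "prompt", "completion"])

with open("logs/oaieval/eval_log.jsonl") as f:  # assumed record path
    for line in f:
        record = json.loads(line)
        # Assumed schema: sampling events carry the prompt and completion.
        if record.get("type") == "sampling":
            data = record.get("data", {})
            table.add_data(record.get("sample_id"), data.get("prompt"), data.get("sampled"))

# Log the table so individual samples can be inspected in the W&B UI.
wandb.log({"eval_samples": table})
run.finish()
```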
This PR switches our end-of-training evaluations to use the OpenAI evals framework.
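For reference, a sketch of how an end-of-training hook might shell out to the evals framework's `oaieval` CLI. The completion-function name (`ultravox`) and eval name are assumptions for illustration, not names taken from this PR:

```python
import subprocess

# Run a single eval against a registered completion function.
# "ultravox" and "my-audio-eval" are hypothetical registry names.
subprocess.run(
    [
        "oaieval",
        "ultravox",       # completion function (assumed name)
        "my-audio-eval",  # eval to run (assumed name)
        "--max_samples", "64",
        "--record_path", "logs/oaieval/eval_log.jsonl",
    ],
    check=True,
)
```

A whole suite can be run the same way via `oaievalset`, which the commit list above alludes to.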