
Integrating OAI evals post training #85

Merged · 22 commits into main · Aug 27, 2024
Conversation

@farzadab (Contributor) commented Aug 15, 2024

This PR switches our end of training evaluations to use the OpenAI evals framework.

@farzadab farzadab changed the title Integration OAI evals post training Integrating OAI evals post training Aug 15, 2024
@farzadab farzadab marked this pull request as ready for review August 20, 2024 21:16
Review threads on pyproject.toml and setup.sh were resolved.
@farzadab farzadab requested review from juberti and zqhuang211 August 20, 2024 22:14
@farzadab farzadab marked this pull request as draft August 20, 2024 22:54
@farzadab farzadab marked this pull request as ready for review August 22, 2024 04:23
@farzadab farzadab requested review from juberti, petersalas and zqhuang211 and removed request for petersalas August 22, 2024 04:24
@farzadab (Author) commented Aug 22, 2024

Here's an example task which barely does any training, but does the full evaluation (assuming this PR is merged): https://wandb.ai/fixie/ultravox/runs/nxgfzg3v

@juberti (Contributor) reviewed and left a comment

Overall this LGTM, nice work.

A review thread on ultravox/training/train.py (outdated) was resolved.
@juberti (Contributor) commented Aug 22, 2024

Not required for this PR, but it looks like wandb has some basic support for handling oaieval outputs, including sample-by-sample visualization. If we can log in the right format, we might get this visualization too: https://wandb.ai/wandb_fc/openai-evals/reports/OpenAI-Evals-Demo-Using-W-B-Prompts-to-Run-Evaluations--Vmlldzo0MTI4ODA3 (Edit: it looks like it just consumes the log files natively, so perhaps we can simply log them to wandb as artifacts.)
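A minimal sketch of the "log them as artifacts" idea. The directory layout and artifact name here are assumptions, not what the PR actually does; it assumes oaieval writes JSONL log files somewhere under a local directory:

```python
import pathlib


def collect_oaieval_logs(log_dir: str) -> list[pathlib.Path]:
    """Find oaieval JSONL log files anywhere under log_dir (hypothetical layout)."""
    return sorted(pathlib.Path(log_dir).glob("**/*.jsonl"))


def log_eval_artifacts(log_dir: str) -> None:
    """Attach each oaieval log file to the current wandb run as one artifact."""
    try:
        import wandb  # optional dependency; skip artifact logging if absent
    except ImportError:
        return
    artifact = wandb.Artifact("oaieval-logs", type="evaluation")
    for path in collect_oaieval_logs(log_dir):
        artifact.add_file(str(path))
    wandb.log_artifact(artifact)  # requires an active wandb run
```

Because the files are logged unmodified, any tooling that consumes the native oaieval log format should still be able to read them after downloading the artifact.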

@juberti (Contributor) commented Aug 22, 2024

Example wandb oaieval run: https://wandb.ai/wandb/jobs/runs/ugqqpjff?nw=nwuser_scott

@farzadab (Author) commented:
Justin, using wandb.Table was a great idea. It shows all metrics in one table and also lets us automatically create charts from them. Charts from multiple experiments can also be overlaid, even when their metric sets only partially match.
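A rough sketch of what logging metrics through a wandb.Table can look like. The column names and the `eval/metrics` key are illustrative assumptions, not the PR's actual schema:

```python
def metrics_rows(results: dict[str, float]) -> list[list]:
    """Flatten a {metric_name: value} dict into [metric, value] rows, sorted by name."""
    return [[name, float(value)] for name, value in sorted(results.items())]


# Logging the rows (requires wandb installed and an active run):
#   import wandb
#   table = wandb.Table(columns=["metric", "value"], data=metrics_rows(results))
#   wandb.log({"eval/metrics": table})
```

Keeping every metric as a row (rather than a separate logged scalar) is what makes it easy to render all of them in one table and to build charts across runs that only share some metrics.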

[Screenshots: wandb metrics table and charts]

@farzadab farzadab merged commit 638a7a6 into main Aug 27, 2024
1 check passed
@farzadab farzadab deleted the farzad-integrate-oaieval branch August 27, 2024 22:31
akshat0311 pushed a commit to jiviai/audio-llm that referenced this pull request Jan 30, 2025
* allow lower python version for lambda cloud and adding ultravox-vllm

* integrate oaievals

* evaluations using oaievalset

* make sure pipeline can be loaded correctly

* force 1 GPU and set max_num_samples

* logging eval Table to w&b + make text-only eval optional