Enable API cost tracking in Synthesizer #1406

Merged: 3 commits into confident-ai:main on Mar 3, 2025

chuqingG (Contributor) commented Mar 2, 2025

Adds a new parameter, cost_tracking (False by default), that enables complete API cost tracking for a synthesis run. An example use case:

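```python
from deepeval.dataset import EvaluationDataset
from deepeval.synthesizer import Synthesizer
doc_paths = ["example.pdf"]  # hypothetical path; replace with your own documents
synthesizer = Synthesizer(model="gpt-4o-mini", cost_tracking=True)
goldens = synthesizer.generate_goldens_from_docs(document_paths=doc_paths)
dataset = EvaluationDataset(goldens=goldens)
```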

The output looks like:
[Screenshot: Synthesizer output showing the tracked API cost]

vercel bot commented Mar 2, 2025

The latest updates on your projects:

| Name | Status | Updated (UTC) |
| --- | --- | --- |
| evals-docs | ✅ Ready | Mar 2, 2025 5:06am |

penguine-ip (Contributor) commented:

@chuqingG this looks great, and you reformatted too! will go over it in the next day and merge it asap, thanks!

penguine-ip (Contributor) commented:

@chuqingG one question: what is reset_cost for? I think the cumulative cost will scare a lot of people.

chuqingG (Contributor, Author) commented Mar 3, 2025

@penguine-ip I added reset_cost because generate_goldens_from_context() can be called either directly by the user or internally by generate_goldens_from_docs(). The intent is for the reported API cost to cover exactly one end-to-end synthesis call.

In other words, when generate_goldens_from_docs() calls generate_goldens_from_context(), the API cost is accumulated but not printed inside generate_goldens_from_context(). At the end of generate_goldens_from_docs(), the accumulated cost of all stages (generate_contexts, generate_goldens_from_context, and the quality check) is printed once.
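The accumulate-then-report-once behavior can be sketched with a toy class. This is a minimal illustration, not the code merged in this PR: ToySynthesizer, the _internal flag, the reports list, and the per-stage dollar amounts are all hypothetical; only the names reset_cost, generate_goldens_from_context, and generate_goldens_from_docs come from the discussion.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToySynthesizer:
    """Hypothetical toy model of the accumulate-then-report-once pattern.

    Not deepeval's code: per-stage costs are made-up constants and the
    `_internal` flag is an assumed way of telling nested calls apart.
    """

    cost_tracking: bool = False
    synthesis_cost: float = 0.0
    reports: List[str] = field(default_factory=list)

    def reset_cost(self) -> None:
        # Start a fresh tally at the beginning of an end-to-end call.
        self.synthesis_cost = 0.0

    def _charge(self, usd: float) -> None:
        # Add the (simulated) cost of one stage to the running tally.
        if self.cost_tracking:
            self.synthesis_cost += usd

    def _report(self) -> None:
        # Print the tally and keep a copy so the behavior is testable.
        if self.cost_tracking:
            msg = f"API cost: ${self.synthesis_cost:.4f}"
            self.reports.append(msg)
            print(msg)

    def generate_goldens_from_context(self, contexts, _internal=False):
        if not _internal:
            self.reset_cost()  # called directly: this call is the whole run
        goldens = []
        for ctx in contexts:
            self._charge(0.001)  # simulated generation cost per context
            goldens.append(f"golden for {ctx}")
        if not _internal:
            self._report()  # nested calls stay silent
        return goldens

    def generate_goldens_from_docs(self, docs):
        self.reset_cost()
        self._charge(0.002 * len(docs))  # simulated context generation
        contexts = [f"context of {d}" for d in docs]
        goldens = self.generate_goldens_from_context(contexts, _internal=True)
        self._charge(0.0005 * len(goldens))  # simulated quality check
        self._report()  # one report covering every stage
        return goldens


synth = ToySynthesizer(cost_tracking=True)
synth.generate_goldens_from_docs(["a.pdf", "b.pdf"])  # prints one total
synth.generate_goldens_from_context(["extra context"])  # fresh tally, prints once
```

With tracking enabled, the docs call prints a single total for context generation, golden generation, and the quality check, while a later direct call starts from zero instead of adding to the earlier total; in this sketch that is the role reset_cost plays.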

P.S. On the worry about scaring people: in my tests, processing 40 PDF files (1 to 12 pages each) with gpt-4o-mini cost about $0.20, so I'd expect the same run to cost roughly $2 with o1-mini and $30 with gpt-4o. For price-sensitive use cases, I've also modified the quality-check step in my personal branch so that small, cheap models work smoothly with it. Let me know if you think it's okay to merge that feature here too!
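Cost estimates like these are a linear scaling of a measured run by the ratio of per-token prices. A minimal sketch of that arithmetic, assuming both models consume the same number of input and output tokens; the helper names blended_price and scale_cost and the placeholder prices are hypothetical, so substitute current rates from the provider's pricing page.

```python
def blended_price(input_price: float, output_price: float, input_share: float) -> float:
    """Average price per 1M tokens when `input_share` of the tokens are input tokens."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_price + (1.0 - input_share) * output_price


def scale_cost(measured_usd: float, measured_price: float, target_price: float) -> float:
    """Estimate the cost of the same workload on another model.

    Assumes both models consume the same number of tokens, so cost scales
    linearly with the (blended) per-token price.
    """
    if measured_price <= 0:
        raise ValueError("measured_price must be positive")
    return measured_usd * target_price / measured_price


# Placeholder prices per 1M tokens (input, output); NOT real quotes.
cheap = blended_price(1.0, 4.0, input_share=0.8)
pricey = blended_price(10.0, 40.0, input_share=0.8)
estimate = scale_cost(0.20, cheap, pricey)  # roughly 10x the measured $0.20
```

Different models can emit different numbers of tokens for the same prompts, so this kind of estimate is a rough order-of-magnitude figure rather than a quote.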

Thanks!

penguine-ip (Contributor) commented:

@chuqingG thanks for the detailed explanation, definitely ok and love the money bag emoji! Thanks :)

penguine-ip merged commit 96bb890 into confident-ai:main on Mar 3, 2025 (8 of 11 checks passed).
penguine-ip (Contributor) commented:

@chuqingG for this "I also modified the quality check part to let small cheap models work smoothly in my personal branch."

What was it you had to modify? Any chance it can be used for deepeval?

chuqingG (Contributor, Author) commented Mar 3, 2025

@penguine-ip Thanks for the response! I'll tidy it up and open another PR within the next day.


2 participants