Currently, chatllama supports synthetic data generation only through OpenAI’s davinci-003, both for conversations and for scores.
To avoid huge costs while generating data, we should support other API models (such as the cheaper gpt-3.5-turbo), other API providers, and local models (Flan T5 seems a good candidate).
Furthermore, to generate more diverse data, it would be beneficial to support multiple prompt templates during generation.
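As a rough illustration of the multi-template idea, here is a minimal sketch that picks one template per sample. The template strings, the `{topic}` placeholder, and the `build_prompts` helper are hypothetical and not part of chatllama; the actual placeholders would depend on how the generation script is structured.

```python
import random

# Hypothetical example templates; "{topic}" is an assumed placeholder name.
EXAMPLE_TEMPLATES = [
    "Write a conversation between a user and an assistant about {topic}.",
    "Simulate a support chat in which the user asks about {topic}.",
    "Generate a short Q&A dialogue on {topic} with a helpful assistant.",
]


def build_prompts(topics, templates, seed=None):
    """Return one prompt per topic, each built from a randomly chosen template."""
    if not templates:
        raise ValueError("at least one template is required")
    rng = random.Random(seed)  # a local RNG keeps runs reproducible for a given seed
    return [rng.choice(templates).format(topic=topic) for topic in topics]
```

Passing a fixed `seed` makes the template assignment reproducible, which helps when comparing datasets generated with different models.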
TODO
Add support for gpt-3.5-turbo, implemented outside the LangChain models.
Add preview of the costs associated with the API models (i.e. n_words / 0.75 * API_cost_per_token) before proceeding with the labelling.
Modify the LangChain-based script to support multiple API models and providers.
Add support for HF models to perform the generation task.
Allow users to specify multiple templates, customisable to their needs, when generating synthetic data.
Provide multiple template examples for dataset generation.
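The cost preview item could be sketched as follows, using the n_words / 0.75 * API_cost_per_token heuristic from the list above. The function name and the `words_per_token` parameter are illustrative assumptions, and the per-token price has to be supplied by the caller from the provider's current pricing; no real prices are hard-coded here.

```python
def estimate_api_cost(texts, cost_per_token, words_per_token=0.75):
    """Estimate the API cost of processing `texts`.

    Applies the heuristic from the issue: n_words / 0.75 * API_cost_per_token.
    Whitespace splitting only approximates the word count, and the 0.75
    words-per-token ratio is itself a rule of thumb, so treat the result
    as a preview rather than an exact bill.
    """
    n_words = sum(len(text.split()) for text in texts)
    n_tokens = n_words / words_per_token
    return n_tokens * cost_per_token


if __name__ == "__main__":
    # Hypothetical price, for illustration only.
    samples = ["How do I reset my password?", "Explain gradient descent briefly."]
    print(f"Estimated cost: {estimate_api_cost(samples, cost_per_token=2e-6):.6f}")
```

The estimate only covers the text passed in; generated completions also consume tokens, so the real cost of a generation run would be higher.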
Hi, did you add support for HF models in dataset generation? It seems only OpenAI’s davinci-003 is used, at line 21 of generate_rewards.py.
PierpaoloSorbellini changed the title from "Add multiple sources for generating synthetic data" to "[Chatllama] Add multiple sources for generating synthetic data" on Mar 31, 2023