Add context-free grammar constrained decoding (EBNF interface) to the research project directory #28210

Closed

Saibo-creator
Contributor

This PR is a follow-up to PR #27557, where @gante suggested adding this feature as a research project.
It adds a new feature: context-free grammar (CFG) constrained decoding, similar to what llama-cpp offers.
It provides (almost) the same interface as llama-cpp.
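
For readers unfamiliar with the technique, here is a minimal sketch of the general idea behind grammar-constrained decoding: at each step, the scores of tokens that the grammar does not allow next are masked out before the next token is chosen. This is not the code from this PR, nor the llama-cpp or transformers_cfg interface. The toy grammar (balanced parentheses), the three-token vocabulary, and every function name below are hypothetical and chosen only for illustration. A real implementation would parse an EBNF grammar and work over the tokenizer's subword vocabulary.

```python
import math

# Toy grammar (an assumption for illustration): S ::= "(" S ")" S | ""
# The vocabulary is three whole tokens rather than real subword pieces.
VOCAB = ["(", ")", "<eos>"]


def allowed_next_tokens(prefix):
    """Return the set of tokens the toy grammar accepts after `prefix`."""
    depth = 0  # number of currently unclosed "("
    for tok in prefix:
        depth += 1 if tok == "(" else -1
    allowed = {"("}           # opening a new group is always legal
    if depth > 0:
        allowed.add(")")      # may only close a group that is open
    else:
        allowed.add("<eos>")  # may only stop when everything is closed
    return allowed


def mask_scores(scores, allowed):
    """Set the score of every disallowed token to -inf."""
    return {tok: (s if tok in allowed else -math.inf) for tok, s in scores.items()}


def toy_scores(prefix):
    """Stand-in for a language model: it always likes ")" best and
    starts to prefer stopping once four tokens have been generated."""
    return {
        "(": 1.0,
        ")": 2.0,
        "<eos>": 1.5 if len(prefix) >= 4 else 0.5,
    }


def constrained_greedy_decode(score_fn, max_new_tokens=8):
    """Greedy decoding where grammar-violating tokens are masked each step."""
    prefix = []
    for _ in range(max_new_tokens):
        scores = mask_scores(score_fn(prefix), allowed_next_tokens(prefix))
        best = max(scores, key=scores.get)
        if best == "<eos>":
            break
        prefix.append(best)
    return "".join(prefix)


print(constrained_greedy_decode(toy_scores))
```

Without the mask, the stand-in model would pick ")" as its very first token, which the grammar rejects. Note that this sketch does not force the groups to be closed when `max_new_tokens` runs out, so a truncated output can still be incomplete.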

Fixes #25778

Before submitting

Who can review?

@gante

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Member

gante left a comment

It mostly looks good to me; there is only one final decision to make:

In the README, the user is asked to install transformers_cfg, i.e. this repo. The diff here and the repo seem to be near copies of each other, which means that at best they are redundant, and it is quite likely that one of them will become stale over time.

As such, one of two things should happen:
1 - You prefer to keep and maintain your repo 👉 we are more than happy to amplify your work on social media, but this PR should be closed
2 - You prefer to host the code here 👉 remove installation references regarding the transformers_cfg repo

# limitations under the License.


# import torch
Member

Suggested change
# import torch

@Saibo-creator
Contributor Author

Hello @gante,
Thank you for your feedback. I think the first option is better. I will close this PR and keep my repo.
My reasons are:
Grammar-constrained generation is not yet complete. It does not support all tokenizers or Unicode. I will continue to improve it.
As a standalone repo, it is easier to maintain and update.

Thank you for offering to amplify my work on social media.
I will let you know when I have a more complete version of grammar-constrained generation. It works now, but it has not been thoroughly tested yet.
I know that it may behave unexpectedly in some cases, though users may not even notice it.

@gante
Member

gante commented Jan 18, 2024

@Saibo-creator perfect!

Let me know when you think it's ready, so we can post on social media about it 🙌

Development

Successfully merging this pull request may close these issues.

Support for context-free-grammars (CFG) to constrain model output
2 participants