[WIP] Add Support for masking output using a Context-Free-Grammar #26520
Conversation
Thanks for your PR! I'm prompting @gante for review once he's back from leave :) I appreciate your effort and patience @jvhoffbauer!
Hey @LysandreJik, did you already have time to review?
My 2 cents: it would be desirable to have compatibility with the BNF grammars in the llama.cpp repository: https://github.com/ggerganov/llama.cpp/tree/master/grammars
Existing work in this direction (using custom …):
https://github.com/Shopify/torch-grammar
https://github.com/im-not-tom/text-generation-webui-output-template-extension/
The original llama.cpp PR:
Hey @jvhoffbauer, I will let @gante, who just came back from holiday, review as soon as he has a spare cycle; he's the owner of `generate`. Thanks for your contribution!
(it's not forgotten, it's in my queue -- I should be able to review it over the next few days)
Awesome! It's just a draft. Please let me know your thoughts and I will focus on wrapping it up into a complete PR, potentially already over the coming week.
Thank you for opening the PR @jvhoffbauer! And apologies for my delayed review 🤗
This is a very challenging PR to review. On one hand, I see the value and the possibilities it unlocks. On the other hand, it is hard to incorporate inside `generate` -- first allow me to explain the problem, to then propose an alternative avenue!
Problem: this operation depends on the tokenizer. We really want to avoid having tokenizer-related operations inside `generate`, as a) it would introduce a new degree of freedom and complexity to an already complex function; and b) tokenization happens on the CPU, which means this would be an (unexpected) cause of slowdowns for most GPU users AND would hinder/restrict our hardware optimization goals.
Proposed alternative: In technical terms, we can work around the problem described above if we iteratively call `generate` one token at a time, passing `past_key_values` around for inference speed, and performing this additional logic outside the `generate` call. My suggestion would be to add an advanced tutorial to our `transformers` documentation, using `lark` and an example like the one in your notebook (FYI, we intend to do the same with RAG). In other words, this wouldn't be a logits processor, but the code to add this feature would be just as easy to find, and our team wouldn't suffer from the complexity expansion 🤗
WDYT?
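A minimal sketch of the loop described above, assuming a causal LM and greedy decoding: it reuses `past_key_values` across steps via the model's forward pass and applies the constraint outside `generate`. The `allowed_token_ids` hook is hypothetical and stands in for the grammar-based filtering; nothing here is code from this PR.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM; chosen here only for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()


def allowed_token_ids(generated_text: str) -> list[int]:
    """Hypothetical hook: return the token ids the grammar allows next.

    In the tutorial this would query a Lark-based parser for the regex of
    the current terminal and match it against the vocabulary.
    """
    return list(range(len(tokenizer)))  # placeholder: allow everything


prompt = "JSON: "
generated = tokenizer(prompt, return_tensors="pt").input_ids
past_key_values = None

with torch.no_grad():
    for _ in range(20):  # generate up to 20 constrained tokens
        outputs = model(
            # After the first step, only the newest token needs to be fed in,
            # because the cache already covers the rest of the sequence.
            input_ids=generated if past_key_values is None else generated[:, -1:],
            past_key_values=past_key_values,
            use_cache=True,
        )
        past_key_values = outputs.past_key_values
        next_logits = outputs.logits[:, -1, :]

        # Mask every token the grammar does not allow, then pick greedily.
        mask = torch.full_like(next_logits, float("-inf"))
        allowed = torch.tensor(allowed_token_ids(tokenizer.decode(generated[0])))
        mask[:, allowed] = 0.0
        next_token = torch.argmax(next_logits + mask, dim=-1, keepdim=True)
        generated = torch.cat([generated, next_token], dim=-1)

print(tokenizer.decode(generated[0]))
```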
@gante no worries, I am still here! I understand you do not want to make the `generate` function dependent on a CPU-bound tokenizer. The way you explain the approach makes total sense, and I trust it is the best way of integrating this functionality into `transformers`. Does it make sense if I start drafting out the code that could be used for such an article?
@jvhoffbauer absolutely! You can start by drafting a stand-alone example. After this is done, I would love to invite you to write a community blog post explaining the virtues of context-free grammars and to share a Space with this technique! 💛
Sounds great! I will prepare a draft and we can iterate.
Is it a bit overkill to introduce a dependency on Lark to implement the grammar-constrained decoding feature?
@Saibo-creator It's okay, since
For this feature, I think there are several aspects to take into account while implementing:
See #27557 for an implementation compatible with llama.cpp.
Is anyone still working on this? I need this functionality to test an idea. I can either do a quick fork for my own use or spend a little longer and make something worth sharing.
Hey @RadixSeven, due to the complexity of this feature, the HF team has decided it would be best to transition it to a separate project instead of integrating it. Feel free to check out https://github.com/epfl-dlab/transformers-CFG, which was created from this PR.
What does this PR do?
Fixes #25778
Review
@gante @LysandreJik as discussed. Also, @oobabooga @ArthurZucker @jorgemcgomes feel free to look at this.
This is in large part inspired by rellm and parserllm by @r2d4, so I am tagging him here too for visibility.
Discussion
This is still totally WIP. I added a notebook showcasing how it might work when using the generation API from a low-level perspective. Please let me know if that approach is OK. Long-term, I want to add tests.
Generally, we use Lark to parse the grammar. The grammar itself would most likely come from a text file or a string, but Lark also has a grammar object that one can use. I built a class CfgParsingStepper that we can use to get the state of the parser for any input string. It gives us the terminal symbol we are currently processing and the regex for this symbol, which we can use to find valid tokens. The CfgLogitsProcessor fetches this state on every generation step for every beam. We could consider persisting the state during generation instead of recalculating it, which might be faster.
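For illustration, here is a rough sketch of how a Lark-backed logits processor along these lines could look. It is not the implementation in this PR: the class name and constructor arguments are illustrative, the prefix check is deliberately naive (it assumes the already-generated text ends on a terminal boundary and loops over the full vocabulary), and error handling is omitted.

```python
import math
import re

import torch
from lark import Lark
from transformers import LogitsProcessor


class CfgLogitsProcessor(LogitsProcessor):
    """Masks logits so that only tokens matching an accepted terminal survive."""

    def __init__(self, grammar: str, tokenizer, prompt_length: int):
        self.parser = Lark(grammar, parser="lalr")  # interactive parsing needs LALR
        self.tokenizer = tokenizer
        self.prompt_length = prompt_length  # prompt tokens to skip when re-parsing

    def _allowed_regexes(self, generated_text: str) -> list[str]:
        """Return the regexes of the terminals the grammar accepts next."""
        interactive = self.parser.parse_interactive(generated_text)
        interactive.exhaust_lexer()  # assumes the text ends on a terminal boundary
        accepted = interactive.accepts()  # set of accepted terminal names
        by_name = {t.name: t.pattern.to_regexp() for t in self.parser.terminals}
        return [by_name[name] for name in accepted if name in by_name]

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        for beam, ids in enumerate(input_ids):
            text = self.tokenizer.decode(ids[self.prompt_length:])
            patterns = [re.compile(rx) for rx in self._allowed_regexes(text)]
            mask = torch.full_like(scores[beam], -math.inf)
            for token_id in range(scores.shape[-1]):
                piece = self.tokenizer.decode([token_id])
                # Very rough check: could this piece start one of the accepted terminals?
                if any(p.match(piece) for p in patterns):
                    mask[token_id] = 0.0
            scores[beam] = scores[beam] + mask
        return scores
```

A processor like this would be passed to `generate` through the standard `logits_processor=LogitsProcessorList([...])` argument.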
The whole codebase is still very much WIP. Most importantly, we still need to settle on a user-facing API along the lines of
`model.generate(..., grammar=Grammar(...))`
Anyhow, I wanted to put this out there for discussion. Let me know if it makes sense, whether you would like me to continue in this direction, or if I should change anything!