
Segfault when using simple self-referential grammar on M2 Mac #3801

Closed
jonastemplestein opened this issue Oct 26, 2023 · 8 comments
Labels: bug (Something isn't working), stale

Comments

@jonastemplestein

Hey folks, first of all massive thanks for this amazing project and in particular grammar support - it honestly feels like magic <3

I was inspired by this tweet that @ggerganov retweeted to try converting tree sitter grammars to GBNF form. After some hiccups I had a script that created grammars that could be parsed by llama.cpp, but unfortunately I'm now getting segmentation faults.

I've isolated it to this simple example grammar:

root ::=  expression
expression ::= (integer | binary-operator)
integer ::=  [0-9]+
binary-operator ::=  expression  ("+"  |  "-")  expression

Here's the segfault. I've included the full arguments and shortened output of llama.cpp:

% ./main --grammar-file elixir.gbnf --model mistral-7b-instruct-v0.1.Q6_K.gguf -t 7 -b 24 -n -1 --temp 0 -ngl 1 -ins

Log start
main: build = 1428 (6961c4b)
main: built with Apple clang version 14.0.3 (clang-1403.0.22.14.1) for arm64-apple-darwin22.6.0
main: seed  = 1698347229

[... the usual loading of the model output...]

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

zsh: segmentation fault  ./main --grammar-file  --color --model  -t 7 -b 24 -n -1 --temp 0 -ngl 1 -ins

A few other observations:

  • The shell hangs for a couple of seconds before the segfault, suggesting there might be an infinite loop
  • This doesn't happen if the grammar isn't self-referential (e.g. by replacing the last line with binary-operator ::= integer ("+" | "-") integer)
  • This happens irrespective of which model I use and doesn't happen without using a grammar
  • It's not a grammar parse error. Those would happen before model loading and print specific error messages

Any help would be much appreciated. It'd be really cool if we could force llama.cpp to output any programming language that tree sitter supports. But most of them contain these sorts of self-referential rules.

jonastemplestein added the bug label Oct 26, 2023
@ggerganov
Owner

I haven't tried self-referential grammars yet - probably this is not supported atm. Pinging @ejones

@jonastemplestein
Author

In case it's interesting, I've written up a bit more about the context here: https://github.com/jonastemplestein/tree_sitter_grammar_to_gbnf

Mostly so I can pick up where I left off if/when support for such grammars is added to llama.cpp

Thanks so much for looking into this <3

@ejones
Collaborator

ejones commented Oct 26, 2023

Yeah, the issue here is left recursion on binary-operator -> expression -> binary-operator. The grammar implementation uses a simple top-down parser, which doesn't support this. It does indeed cause an infinite loop.

In other words, there can be recursion, just not in the first term of a rule body. So this grammar works, for instance:

root ::= expression
expression ::= integer binary-operator?
integer ::= [0-9]+
binary-operator ::= ("+" | "-") expression

In this case, by moving integer out of binary-operator (reformulating it as a suffix), we avoided the left recursion.
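
More generally (a sketch of the standard rewrite, not something llama.cpp automates for you): a left-recursive rule of the shape A ::= A tail | base can be turned into the base followed by a repeated tail, using the * operator that GBNF supports:

# left-recursive form, loops forever in a top-down parser:
# expression ::= expression ("+" | "-") integer | integer
# equivalent right-folded form:
expression ::= integer (("+" | "-") integer)*
integer ::= [0-9]+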

Now, I haven't looked into tree sitter and I don't know how easy it is to produce or convert grammars in this way automatically.

@jonastemplestein
Author

Oh amazing, thank you so much. I can just insert a whitespace symbol on the left and the toy grammar works:

root ::=  expression
expression ::= (integer | binary-operator)
integer ::=  [0-9]+
binary-operator ::=  ws expression  ("+"  |  "-")  expression
ws ::= [ \n\t]+

I've prepended whitespace symbols to various definitions in my Elixir grammar and I can now get to the input prompt! Exciting!

Unfortunately llama.cpp still doesn't generate Elixir code. It just keeps giving either single-word answers or printing an infinite series of newlines.

And each subsequent token that llama.cpp generates takes longer than the one before, suggesting to me that the whole output string is parsed in the grammar each time, as opposed to tracking the parse state as we go along and storing it. Is that true?

In that case we are probably some way off from using formal grammars to force llama.cpp to output only valid code in high level programming languages.

In case you're interested, you can find the full Elixir grammar that is causing this behaviour in this repo (elixir_no_left_recursion.gbnf), alongside some more notes.

@ejones
Collaborator

ejones commented Oct 27, 2023

suggesting to me that the whole output string is parsed in the grammar each time,

It is not. We do indeed just store the parse state as we go along. I'd have to look into this. One thing that comes to mind: we store all possible parse states, which is reasonable for deterministic or mostly deterministic grammars. If the grammar is ambiguous or nondeterministic in certain ways, this could cause a compounding effect; I don't know off the top of my head if that is the case here.
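
To illustrate the kind of nondeterminism meant here (a made-up toy rule, not taken from the Elixir grammar): when two alternatives match the same prefix, both candidate parse stacks stay alive until a later character disambiguates them, and if several such overlapping alternatives are live at once, the number of stacks advanced per sampled token grows accordingly:

# both alternatives accept the same digits, so two parse stacks
# remain live until a "!" does or does not appear
item ::= [0-9]+ | [0-9]+ "!"
root ::= item ("," item)*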

Another thing is, there are definitely certain pathological cases that cause slow token sampling, which I haven't had a chance to look into. But when I've seen that, it's more of a sudden thing than a slow accumulation.

printing an infinite series of newlines.

This is something we've observed with whitespace in grammars and generally we have to limit the amount of whitespace the model is allowed to add.
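
For example (a sketch of that kind of limit, not a rule from this issue's grammar): constraining whitespace to a single character per use, instead of an unbounded run, stops the model from padding its output with newlines forever, while still consuming at least one character so it can still be used to break left recursion as above:

# unbounded: ws ::= [ \n\t]+   lets the model emit newlines indefinitely
# bounded alternative: exactly one whitespace character per use
ws ::= [ \n\t]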

still doesn't generate Elixir code

Note that the model isn't aware of the grammar constraints, so it will get confused if the grammar diverges greatly from what it's intending to do. Looking at your sample repo, it looks like this instruct model is trying to write explanations, which are sort of being forced into Elixir code. Maybe you need to incorporate a code block syntax (``` etc) into your grammar, or let the model talk first. In general, I've found a good approach is to see what the model does without constraints and build a grammar around that. But that may be at odds with the goal of a one-size-fits-all translation of tree sitter grammars.
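
As a sketch of that idea (the rule names here are hypothetical, and code stands in for the entry rule of the converted Elixir grammar): let the root rule accept some free text before a fenced block, and only constrain what is inside the fence:

# let the model "talk first", then force a fenced Elixir block
root ::= preamble "```elixir\n" code "\n```"
# any text that doesn't contain a backtick (hypothetical helper rule)
preamble ::= [^`]*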

Furthermore, once you commit to grammar constraints, your grammar basically needs to describe the full language. Or you need to explain to the model what subset you're targeting. Again, the model will think it can do any Elixir-y thing and if the grammar doesn't recognize that, the generation could go off the rails.

This tension between the constraints and completeness of the grammar is more what I see as a barrier to applying them to high-level languages. The latest models are also pretty good at generating code I think? Depending on the use case, it's not clear how much value is added by specifying the programming language grammar. It may make more sense to adopt an approach like, "generate a code block and I will take whatever's inside".

@jonastemplestein
Author

Thank you so much for the thoughtful response, this gives me some avenues to explore.

I can remove the whitespace from the grammar and will play around with code blocks and different prompts.

If I get that working, I'll try to debug the increasing slowdown over time with some better test cases.

This tension between the constraints and completeness of the grammar is more what I see as a barrier to applying them to high-level languages. The latest models are also pretty good at generating code I think? Depending on the use case, it's not clear how much value is added by specifying the programming language grammar. It may make more sense to adopt an approach like, "generate a code block and I will take whatever's inside".

Regarding the need for such a grammar, I agree that for large models that have seen a lot of Elixir in training, you can just ask them to output a code block. But for very small models, in particular ones that haven't been fine-tuned on Elixir (because the base models often haven't been exposed to Elixir nearly as much as Python, JavaScript, etc.), grammar constraints could add a lot more value.

@jonastemplestein
Author

One last thought on this - I've heard mention of the idea of custom WASM samplers. Is this being worked on?

If so, the easiest way to accomplish this for many programming languages at once might actually be to build a kind of tree-sitter WASM module that exposes an interface llama.cpp can query to limit generated tokens to only grammatically legal ones.

(apologies for hijacking the thread from the original infinite loop - let me know if I should drop this idea elsewhere 🙏)

github-actions bot added the stale label Mar 19, 2024
Contributor

github-actions bot commented Apr 2, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed Apr 2, 2024