Removes segfault when parsing grammar with missing symbols. Adds friendly error message. #5950

HanClinto · 2024-03-09T04:51:49Z

This adds a friendly error message to help debug GBNF parsing errors. When developing a new grammar recently, I didn't realize that I had some typos in my identifier names, and it caused me some amount of headache until I figured it out.

This addition will raise an exception when parsing grammars that have undefined symbols.

Example: Create "chess_bad.gbnf" and comment out the definition of nonpawn.

Then run it with your favorite model:

Before:

./main -m models/mistral-7b-v0.1.Q4_K_M.gguf --grammar-file grammars/chess_bad.gbnf -p 'The following chess match is a rousing game of blitz played at the 2002 Ohio Valley Regional Championship: '
Log start
main: build = 2366 (c2101a2e)
main: built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin23.2.0
<SNIP>
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 1


[1]    64973 segmentation fault  ./main -m models/mistral-7b-v0.1.Q4_K_M.gguf --grammar-file  -p

After this fix:

./main -m models/mistral-7b-v0.1.Q4_K_M.gguf --grammar-file grammars/chess_bad.gbnf -p 'The following chess match is a rousing game of blitz played at the 2002 Ohio Valley Regional Championship: '
Log start
main: build = 2367 (92ee6311)
main: built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin23.2.0
<SNIP>
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 1


libc++abi: terminating due to uncaught exception of type std::runtime_error: Undefined rule identifier 'nonpawn'
[1]    68828 abort      ./main -m models/mistral-7b-v0.1.Q4_K_M.gguf --grammar-file  -p

It's not much, but in my case it telling me the name of the missing identifier was invaluable, and hopefully this can be of help to others!

This also nicely removes crashing due to segfault, which may be of help to things like #4066 or #3878.

ggerganov

Should we move this validation logic into grammar_parser::parse so that it is performed during initialization?

… function.

HanClinto · 2024-03-10T03:42:36Z

Should we move this validation logic into grammar_parser::parse so that it is performed during initialization?

That's a good call. I think I moved it into this function because that's where the segfault call stack was originating from, but you're right -- I think that's the better place to put it. Just amended the PR with a new commit that moves it.

Definitely a few more lines of code to put it here (the c_rules location is already iterating through all the rules, so it was a simpler change), but it feels better to put it at the end of the parse.

With where it's at currently, the segfault is back, but it still gives a friendly error message. This is the current behavior:

system_info: n_threads = 8 / 10 | AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 |
sampling:
	repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
	top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 1


parse: error parsing grammar: Undefined rule identifier 'nonpawn'
llama_sampling_init: failed to parse grammar
[1]    21252 segmentation fault  ./main -m models/mistral-7b-v0.1.Q4_K_M.gguf --grammar-file  -p

Any other suggestions for this? I definitely think we'll need to address a more graceful handling of bad parse errors (especially if we're going to address the way that it crashes the server when a bad grammar is passed in an API call) -- happy to address that in this PR, or in another one. I haven't fully mapped that strategy out in my head yet, so that's the main reason why I didn't tackle it here.

ggerganov

Great!

Regarding error handling - now that parse() returns an empty state we can check for that and handle it more gracefully

HanClinto · 2024-03-11T21:33:57Z

I was able to confirm that this fixes #4492.

ggerganov reviewed Mar 9, 2024

View reviewed changes

Moving logic to validate GBNF rule references to the end of the parse…

d39dc65

… function.

HanClinto force-pushed the grammar_undefined_symbol_fix branch from 92ee631 to d39dc65 Compare March 10, 2024 03:39

ggerganov approved these changes Mar 10, 2024

View reviewed changes

ggerganov merged commit 2960eae into ggerganov:master Mar 10, 2024
60 checks passed

HanClinto mentioned this pull request Mar 11, 2024

Invalid grammar crashes the entire server instead of returning Null or ignoring it #4492

Closed

NeoZhangJianyu pushed a commit to NeoZhangJianyu/llama.cpp that referenced this pull request Mar 12, 2024

grammar : verify parsed state (ggerganov#5950)

58ec963

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024

grammar : verify parsed state (ggerganov#5950)

fbdcb67

HanClinto mentioned this pull request Mar 19, 2024

GBNF Validator Program #5948

Merged

hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024

grammar : verify parsed state (ggerganov#5950)

fb0a1a6

HanClinto mentioned this pull request Jun 6, 2024

Llama.cpp server doesn't return grammar error messages when in streaming mode #7391

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Removes segfault when parsing grammar with missing symbols. Adds friendly error message. #5950

Removes segfault when parsing grammar with missing symbols. Adds friendly error message. #5950

HanClinto commented Mar 9, 2024

ggerganov left a comment

HanClinto commented Mar 10, 2024

ggerganov left a comment

HanClinto commented Mar 11, 2024

Removes segfault when parsing grammar with missing symbols. Adds friendly error message. #5950

Removes segfault when parsing grammar with missing symbols. Adds friendly error message. #5950

Conversation

HanClinto commented Mar 9, 2024

ggerganov left a comment

Choose a reason for hiding this comment

HanClinto commented Mar 10, 2024

ggerganov left a comment

Choose a reason for hiding this comment

HanClinto commented Mar 11, 2024