Blacklist control characters in JSON string grammars #1259

Closed
jndiogo opened this issue Mar 6, 2024 · 0 comments
jndiogo commented Mar 6, 2024

This llama.cpp PR fixed an intermittent problem that I've observed in the past but was never able to reproduce:

ggerganov/llama.cpp#5888

The problem happens because the string rule in the JSON grammar doesn't exclude raw control characters such as \n and \r. A model can therefore emit a string that spans multiple lines, which is not valid JSON.
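To make the failure concrete, here is a minimal sketch using Python's standard json module, which in its default strict mode rejects raw control characters inside string literals. The helper name is_valid_json and the sample payloads are only for illustration.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses under Python's default (strict) JSON decoder."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# A raw line feed inside the string literal, as a model might emit it.
raw_newline = '{"msg": "line one\nline two"}'
# The same content with the line feed written as the two-character escape \n.
escaped_newline = '{"msg": "line one\\nline two"}'

print(is_valid_json(raw_newline))      # False
print(is_valid_json(escaped_newline))  # True
```

Only the escaped form parses, so output that a permissive grammar accepts can still fail in any strict JSON parser downstream.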

The string rule in llama_cpp/llama_grammar.py needs to be fixed in several places:

  • In the JSON_GBNF definition and any others that contain a string rule, such as JSON_ARR_GBNF.
  • In the string definition inside PRIMITIVE_RULES.

It's a single-line change in the above PR.
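As a rough illustration of what such a change does, the sketch below uses Python regular expressions as a stand-in for the character class in the grammar's string rule. Both patterns are assumptions, not the actual GBNF text: the permissive one excludes only the double quote and backslash, and the stricter one also excludes U+0000 to U+001F and U+007F. The exact rule should be taken from the linked PR.

```python
import re

# Hypothetical stand-ins for the grammar's string rule, not the real GBNF text.
# An escape is a backslash followed by a simple escape char or \uXXXX.
ESCAPE = r'\\(?:["\\/bfnrt]|u[0-9a-fA-F]{4})'

# Permissive: any character except a double quote or backslash may appear raw.
PERMISSIVE_STRING = re.compile(r'"(?:[^"\\]|' + ESCAPE + r')*"')

# Stricter: also exclude control characters U+0000-U+001F and U+007F.
STRICT_STRING = re.compile(r'"(?:[^"\\\x00-\x1f\x7f]|' + ESCAPE + r')*"')

multi_line = '"line one\nline two"'  # contains a raw line feed
escaped = '"line one\\nline two"'    # line feed written as an escape sequence

print(bool(PERMISSIVE_STRING.fullmatch(multi_line)))  # True
print(bool(STRICT_STRING.fullmatch(multi_line)))      # False
print(bool(STRICT_STRING.fullmatch(escaped)))         # True
```

With the stricter class, the only way to express a line break inside a string is the escape sequence, which is what JSON requires.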

Regards

@abetlen abetlen closed this as completed in d02a9cf Mar 9, 2024