Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

json: document schema conversion in GBNF readme, align manual grammar examples & converters #7841

Merged
merged 4 commits into from
Jun 11, 2024

Conversation

ochafik
Copy link
Collaborator

@ochafik ochafik commented Jun 10, 2024

cc/ @HanClinto @ExtReMLapin

@github-actions github-actions bot added testing Everything test related examples python python script changes server labels Jun 10, 2024
@@ -57,7 +57,7 @@ std::unordered_map<std::string, BuiltinRule> PRIMITIVE_RULES = {
{"object", {"\"{\" space ( string \":\" space value (\",\" space string \":\" space value)* )? \"}\" space", {"string", "value"}}},
{"array", {"\"[\" space ( value (\",\" space value)* )? \"]\" space", {"value"}}},
{"uuid", {"\"\\\"\" [0-9a-fA-F]{8} \"-\" [0-9a-fA-F]{4} \"-\" [0-9a-fA-F]{4} \"-\" [0-9a-fA-F]{4} \"-\" [0-9a-fA-F]{12} \"\\\"\" space", {}}},
{"char", {"[^\"\\\\] | \"\\\\\" ([\"\\\\/bfnrt] | \"u\" [0-9a-fA-F]{4})", {}}},
{"char", {"[^\"\\\\\\x7F\\x00-\\x1F] | [\\\\] ([\"\\\\bfnrt] | \"u\" [0-9a-fA-F]{4})", {}}},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these possibly be updated to use the new . operator, or is now not the time for that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, we can't do that, because we need to exclude backslashes from this list. Nevermind, carry on! :)


# Optional space: by convention, applied in this grammar after literal chars when allowed
ws ::= ([ \t\n] ws)?
ws ::= [ \t\n]{,20}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a good change -- I like constraining the output in this way. Could even consider limiting it to something more restrictive like {,4} or {,8}, but this is a good start.

Copy link
Collaborator Author

@ochafik ochafik Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've drafted an updated space rule in #7866. No matter what the bound is with this current syntax, models like Llama-3-8B & Phi-3-mini seem keen to misuse it. But given near-unlimited indent space only (and only 1 newline at a time), they're very sensible.

Copy link
Collaborator

@HanClinto HanClinto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! 👍

@ochafik ochafik marked this pull request as ready for review June 10, 2024 23:59
@ochafik ochafik changed the title [WIP] json: document schema conversion in GBNF readme, align manual grammar examples & converters json: document schema conversion in GBNF readme, align manual grammar examples & converters Jun 10, 2024
@ochafik ochafik merged commit 396b18d into ggerganov:master Jun 11, 2024
38 of 55 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
examples python python script changes server testing Everything test related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants