Added support for . (any character) token in grammar engine. #6467

HanClinto · 2024-04-04T02:51:53Z

Low priority feature. Consider this more of a suggestion than a request. :)

As came up in the discussion of #6441, I wanted a way to create a grammar that ensures a minimum response length. It seemed prudent to add support for a "." symbol that matches any generated character. Without it, I used the string [^\x00], which matches any non-null character (very nearly the same thing).
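To make the minimum-length idea concrete, here is a minimal sketch of a helper that builds such a grammar string. The function name `min_length_grammar` and its parameters are hypothetical and not part of llama.cpp. It assumes that repetition operators such as `*` apply to "." the same way they apply to other grammar elements, as the `.+` in the usage example suggests, and that the length is counted in characters rather than model tokens.

```python
def min_length_grammar(min_chars: int, any_char: str = ".") -> str:
    """Build a GBNF-style rule that requires at least `min_chars` characters.

    `any_char` is the element used to match a single character. Pass
    r"[^\x00]" to fall back to the pre-"." workaround described above.
    """
    if min_chars < 1:
        raise ValueError("min_chars must be at least 1")
    # One mandatory element per required character, then any number more.
    required = " ".join([any_char] * min_chars)
    return f"root ::= {required} {any_char}*"


if __name__ == "__main__":
    print(min_length_grammar(3))
    print(min_length_grammar(2, r"[^\x00]"))
```

The resulting string could then be passed to `--grammar`. For large minimums a repetition-count operator would be more compact, but this sketch only relies on the syntax shown in this PR.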

I don't know whether this character would be useful in many other situations, so feel free to leave this one out if you don't think it's worthy.

I didn't add this token to the grammar tests, because frankly I haven't really been able to wrap my head around them. I would still like to eventually get around to writing some end-to-end / integration tests for the grammar engine that are a bit easier to grok and extend, but unless otherwise requested, I'll leave that exercise for another PR.

Example usage:

./main -m ./models/llama-2-7b.Q4_0.gguf -e -r "\n" --grammar "root ::= [^\n].+" -p "My favorite flavor is "

Example output:

 My favorite flavor is 🍌. surely you know that I love to eat meat, I'm a real carnivore, I like to eat pork, chicken and beef. I can not imagine my life without meat, I also like to eat seafood.
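For readers unfamiliar with the grammar syntax, the following sketch spells out which strings the example rule `root ::= [^\n].+` accepts. It is a plain Python approximation, not the llama.cpp grammar engine: the function name is hypothetical, characters are Python code points, and tokenization and end-of-sequence handling are ignored.

```python
def matches_example_grammar(text: str) -> bool:
    """Approximate check for the rule `root ::= [^\\n].+`."""
    # [^\n] -> the first character may be anything except a newline
    # .+    -> at least one more character of any kind must follow
    return len(text) >= 2 and text[0] != "\n"


if __name__ == "__main__":
    for sample in ["ok", "x", "\nab", "a\n"]:
        print(repr(sample), matches_example_grammar(sample))
```

Under this reading, the rule forces the completion to be at least two characters long and to start with something other than a line break, while newlines remain allowed later in the text.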

github-actions · 2024-04-04T03:06:14Z

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3: 489 iterations 🚀

Concurrent users: 8, duration: 10m
HTTP request : avg=9604.19ms p(90)=26264.61ms fails=0, finish reason: stop=489 truncated=0
Prompt processing (pp): avg=245.14tk/s p(90)=742.6tk/s total=195.52tk/s
Token generation (tg): avg=101.83tk/s p(90)=292.3tk/s total=130.03tk/s
ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=feature_grammar_char_any commit=9a3acbba9afa314a57acd546943fe91565a65d19

Time series charts (omitted), plotted over 1712213110 --> 1712213738 for the 10m run (489 iterations): llamacpp:prompt_tokens_seconds, llamacpp:predicted_tokens_seconds, llamacpp:kv_cache_usage_ratio, llamacpp:requests_processing.

ggerganov · 2024-04-04T13:54:20Z

> I didn't add this token to the grammar tests, because frankly I haven't really been able to wrap my head around them. I would still like to eventually get around to writing some end-to-end / integration tests for the grammar engine that are a bit easier to grok and extend, but unless otherwise requested, I'll leave that exercise for another PR.

Matching any character seems like it could be a useful addition, but I agree it would be better to first focus on improving the grammar tests (and potentially performance). We can revisit this addition a bit later.

HanClinto · 2024-04-04T15:01:25Z

> but I agree it would be better to first focus on improving grammar tests (and potentially performance). We can revisit this addition at a bit later point

Sounds great! That's where I'll turn my attention next -- I've done some profiling, and have some ideas in the works for how to improve the grammar sampler. Next step will be to add profiling to the grammar engine (I need to investigate the current state of benchmarks that include grammars). Meanwhile, this PR can stay here and we can return to it whenever we feel like it.

Thank you!

ggerganov · 2024-04-04T15:46:30Z

Awesome! The grammar functionality is a great feature and would be nice to get some extra attention. I sent you a collaborator invite, if you feel like helping out (no pressure if you don't have time / resources, this is mainly a token of appreciation at this point)

HanClinto · 2024-04-04T16:49:37Z

> Awesome! The grammar functionality is a great feature and would be nice to get some extra attention. I sent you a collaborator invite, if you feel like helping out (no pressure if you don't have time / resources, this is mainly a token of appreciation at this point)

Wow, I am honored -- thank you very much!! I will do my best to not abuse the privilege.

I agree about the grammar functionality being very powerful. I think that verifiable correctness is going to be one of the big ways that local LLMs can really gain some usefulness. I started doing some experiments a couple of months ago with text-to-SQL generation, and I think that using grammars to ensure syntactically correct SQL queries (even tuned for a person's specific database schema) offers a lot of potential to expand the usefulness of LLMs. That's what started me down this whole road of digging into grammars in llama.cpp, and I'm excited to see what the future holds for it.

Thanks for everything!

ggerganov

@HanClinto Feel free to merge this if it is ready

HanClinto · 2024-05-10T16:06:12Z

> @HanClinto Feel free to merge this if it is ready

Thank you! I'm much more familiar with the grammar engine now than I was when I first wrote this, so I'd like to try to look it all over again with fresh eyes.

Overall I'm feeling much better about making changes like this to the grammar engine now that we have integration test coverage.

@ochafik not sure what your availability is these days, but wouldn't mind your critique at some point as well.

common/grammar-parser.cpp

HanClinto · 2024-05-10T17:35:30Z

TODO before merge:

Add "." symbol to integration tests

github-actions · 2024-05-11T00:40:37Z

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 545 iterations 🚀

Expand details for performance related PR only

Concurrent users: 8, duration: 10m
HTTP request : avg=8604.06ms p(95)=21214.22ms fails=, finish reason: stop=488 truncated=57
Prompt processing (pp): avg=99.82tk/s p(95)=411.45tk/s
Token generation (tg): avg=32.22tk/s p(95)=46.44tk/s
ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=feature_grammar_char_any commit=c1b89b83815248a986b9ec906a8d30dfc013b3e6

Time series charts (omitted), plotted over 1717680788 --> 1717681422 for the 10m run (545 iterations): llamacpp:prompt_tokens_seconds, llamacpp:predicted_tokens_seconds, llamacpp:kv_cache_usage_ratio, llamacpp:requests_processing.

mofosyne · 2024-05-22T07:22:08Z

Noting that there appears to be general agreement to merge this PR; it is just waiting on someone to add the "." symbol to the integration tests. (Monitoring via this filter.)

HanClinto · 2024-06-05T23:00:04Z

Rebased on master, integration tests added in 9e30513 -- ready for final review and merge!

ochafik

Looks good!

tests/test-grammar-integration.cpp

HanClinto force-pushed the feature_grammar_char_any branch from da3dc77 to 9a3acbb Compare April 4, 2024 06:40

HanClinto changed the title ~~Added support for . (any characer) token in grammar engine.~~ Added support for . (any character) token in grammar engine. Apr 4, 2024

ochafik mentioned this pull request Apr 12, 2024

grammars: x{min,max} repetition operator #6640

Merged

5 tasks

mofosyne added the Review Complexity : Low and enhancement labels May 10, 2024

ggerganov approved these changes May 10, 2024

mofosyne requested a review from ochafik May 11, 2024 01:57

mofosyne added the help wanted Extra attention is needed label May 22, 2024

HanClinto force-pushed the feature_grammar_char_any branch from e56761d to 774e9f5 Compare June 5, 2024 22:43

github-actions bot added the testing Everything test related label Jun 5, 2024

HanClinto mentioned this pull request Jun 6, 2024

Llama.cpp server doesn't return grammar error messages when in streaming mode #7391

Closed

ochafik approved these changes Jun 6, 2024

HanClinto added 2 commits June 6, 2024 06:01

Added support for . (any characer) token in grammar engine.

d0c0083

Add integration tests for any-character symbol.

c1b89b8

HanClinto force-pushed the feature_grammar_char_any branch from 9e30513 to c1b89b8 Compare June 6, 2024 13:02

HanClinto merged commit ad675e1 into ggerganov:master Jun 6, 2024
61 of 68 checks passed

HanClinto mentioned this pull request Jun 10, 2024

json: document schema conversion in GBNF readme, align manual grammar examples & converters #7841

Merged

ExtReMLapin mentioned this pull request Jul 29, 2024

Ported back new grammar changes from C++ to Python implementation abetlen/llama-cpp-python#1637

Merged
