Chat format: Recognize specified language and offloaded lexguessing to every newline #81

SinanAkkoyun · 2023-10-01T23:18:19Z

It also looks like calling guess_lexer on every token is a little slow, so maybe it makes more sense to call it at the end of every line instead?

#71 (comment)

Sometimes the model specifies the language, use that (and/or incorporate a small prompt to ensure lang specification)

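I found some time :) The chat now detects the language specified after the opening code fence (I like to finetune my models to always do that). If none is specified, it falls back to lexguessing, which now runs only once per newline instead of on every token.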
The PR is nothing too big, but I thought this might help nonetheless.
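For readers skimming the thread, here is a rough sketch of the idea, not the actual code in this PR. `CodeBlockTracker` and the `guess_language` callback are hypothetical names invented for illustration. The callback stands in for whatever guesser is used, so the snippet runs without extra dependencies.

```python
import re
from typing import Callable, Optional

# Matches a fence line with an optional language tag, e.g. "```python".
FENCE_RE = re.compile(r"^```\s*([A-Za-z0-9_+#.-]*)\s*$")


class CodeBlockTracker:
    """Tracks code-block state in streamed text and picks a language per block.

    Hypothetical helper: `guess_language` is any callable that maps the code
    seen so far to a language name (or None).
    """

    def __init__(self, guess_language: Callable[[str], Optional[str]]) -> None:
        self.guess_language = guess_language
        self.in_block = False
        self.language: Optional[str] = None
        self.specified = False  # True if the model named the language itself
        self.code = ""          # completed lines of the current block
        self._line = ""         # current, still unfinished line

    def feed(self, chunk: str) -> None:
        """Consume one streamed chunk; work happens only when a line ends."""
        for ch in chunk:
            if ch == "\n":
                self._end_line(self._line)
                self._line = ""
            else:
                self._line += ch

    def _end_line(self, line: str) -> None:
        match = FENCE_RE.match(line.strip())
        if match and not self.in_block:
            # Opening fence: prefer the language the model specified.
            self.in_block = True
            self.code = ""
            self.language = match.group(1).lower() or None
            self.specified = self.language is not None
        elif match and self.in_block and not match.group(1):
            # Bare fence inside a block closes it.
            self.in_block = False
        elif self.in_block:
            self.code += line + "\n"
            if not self.specified:
                # Fallback: guess once per completed line, not once per token.
                self.language = self.guess_language(self.code)
```

With Pygments installed, the callback could be a thin wrapper around `guess_lexer` that returns the guessed lexer's name. That wiring is an assumption on my side and is left out so the sketch stays stdlib-only.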

Here is the prompt I found most effective at getting Llama2-7B-chat 4.0bpw to specify the language:
-sp "You are a helpful coding assistant. Always answer as helpfully as possible. Specify the language after starting a codeblock like: ```python\nprint('hello')\n"

#71 (comment)

So that won't really work for code snippets. Also it supports a lot of obscure formats so maybe that could be narrowed down a bit for more accurate results. I'm looking into it.

You said you wanted to improve the lexguesser, but still, if I can somehow help or if you want to spend your time on other problems, please let me know and I'll try to take care of it.

SinanAkkoyun · 2023-10-02T18:39:02Z

I just tested Mistral 7B's coding skills and noticed that the current chat code only recognizes ``` when it arrives as a single chunk/token. Mistral emits it in pieces, something like `` followed by `. A fix commit is on the way.
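One way to handle a split delimiter, sketched under my own assumptions and not taken from the fix commit: hold back a trailing run of one or two backticks until the next chunk shows whether they complete a fence. `FenceJoiner` is a hypothetical name.

```python
class FenceJoiner:
    """Re-joins a ``` delimiter that arrives split across streamed chunks."""

    def __init__(self) -> None:
        self._held = ""  # trailing backticks not yet passed on

    def feed(self, chunk: str) -> str:
        """Return the text that is safe to process now (may be empty)."""
        text = self._held + chunk
        body = text.rstrip("`")
        trailing = len(text) - len(body)
        if 0 < trailing < 3:
            # One or two trailing backticks might be the start of a fence:
            # keep them until the next chunk arrives.
            self._held = text[len(body):]
            return body
        self._held = ""
        return text

    def flush(self) -> str:
        """Call at end of stream to release any held backticks."""
        held, self._held = self._held, ""
        return held
```

Inline code still passes through unchanged. A lone trailing backtick is just delayed by one chunk, and `flush()` releases it if the stream ends there.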

SinanAkkoyun · 2023-10-02T21:32:52Z

@turboderp The Mistral 7B chunking has been fixed! If you find any bugs with it, please let me know.

SinanAkkoyun added 5 commits October 2, 2023 00:41

Implemented lang detection by specified block lang

8b2f6bb

removed debug line

f7072fa

fixed line-deletion bug with lang replacement

f58000e

Offloaded lexguessing to every new line :)

c5283cb

Removed debug code

8d5e02c

Fixed Mistral 7B codeblock delim chunking (`` + `)

2c9b122

Merge branch 'turboderp:master' into code-chat

433c1fa

SinanAkkoyun mentioned this pull request Oct 4, 2023

Added ChatML format to chat.py #86

Closed

Merge branch 'turboderp:master' into code-chat

fe047c4

turboderp merged commit a9f3f17 into turboderp:master Oct 7, 2023

anchortense pushed a commit to anchortense/exllamav2-logit-threshold-samplers that referenced this pull request Oct 21, 2024

Merge pull request turboderp#81 from SinanAkkoyun/code-chat

fb1b800
