[WIP] Add Fill-In-Middle example #2934
Conversation
id special_prefix_id = 32007;
id special_middle_id = 32009;
id special_suffix_id = 32008;
id special_eot_id = 32010;
Curious, does Code Llama 34B have these special tokens?
If it does not, then how would FIM work with it?
It does, yeah. These are new, and I think only in codellama. I don't think they're in llama2. To get the token ids themselves, the codellama people run the tokenizer, and these are the values that came out.
It should work. But I've been busy with my day job, and haven't gotten a chance to test it yet. Definitely not going to suggest merging until I'm certain.
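For context, this is roughly how a fill-in-middle prompt is laid out with these tokens, following the Code Llama prefix-suffix-middle format. This is a minimal, self-contained sketch with hypothetical helper names and hard-coded ids for illustration; it is not the exact code in this PR.

#include <cstdint>
#include <cstdio>
#include <vector>

using id = int32_t; // token id, matching the `id` typedef used above

// Illustrative: assemble the token sequence for one FIM request.
// prefix_toks / suffix_toks are assumed to already be tokenized
// (without BOS/EOS) by the model's tokenizer.
static std::vector<id> build_fim_prompt(const std::vector<id> & prefix_toks,
                                        const std::vector<id> & suffix_toks) {
    const id special_prefix_id = 32007; // <PRE>
    const id special_suffix_id = 32008; // <SUF>
    const id special_middle_id = 32009; // <MID>

    std::vector<id> out;
    out.push_back(special_prefix_id);
    out.insert(out.end(), prefix_toks.begin(), prefix_toks.end());
    out.push_back(special_suffix_id);
    out.insert(out.end(), suffix_toks.begin(), suffix_toks.end());
    out.push_back(special_middle_id);
    // the model then generates the "middle" until it emits special_eot_id (32010)
    return out;
}

int main() {
    // dummy token ids standing in for tokenized prefix/suffix text
    const std::vector<id> prefix = {1234, 5678};
    const std::vector<id> suffix = {4321};
    for (id t : build_fim_prompt(prefix, suffix)) {
        printf("%d ", (int) t);
    }
    printf("\n");
    return 0;
}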
I think this is a good example. Would be nice to add a README.md with instructions to run a simple test. Maybe include sample prefix and suffix files to use as input.
@ggerganov Those things are also forthcoming; I should be able to do all my testing and polishing over the weekend. What I'm particularly worried about as far as code review is concerned is whether I'm calling the llama.cpp API correctly. Also, I'm confused about the tokenizer. Is the tokenizer included with gguf models? Facebook's code reads three or four files, but creating a context here only seems to need the one model file.
Looking at the code, it seems OK.
Yes, the tokenizer is included in the gguf model file. The usage seems correct, though I would double-check everything with extra logging. It's easy to make a mistake when concatenating tokenizer results.
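As one concrete way to do that kind of logging (a hypothetical snippet, not code from this PR): dump every token id in the concatenated sequence and flag the special ones, so a mis-ordered or duplicated boundary token is easy to spot.

#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical debug helper: print the assembled FIM token sequence,
// marking the special boundary tokens so concatenation mistakes stand out.
static void log_fim_tokens(const std::vector<int32_t> & toks) {
    for (size_t i = 0; i < toks.size(); ++i) {
        const int32_t t = toks[i];
        const char * tag = "";
        switch (t) {
            case 32007: tag = " <PRE>"; break;
            case 32008: tag = " <SUF>"; break;
            case 32009: tag = " <MID>"; break;
            case 32010: tag = " <EOT>"; break;
        }
        fprintf(stderr, "tok[%zu] = %d%s\n", i, (int) t, tag);
    }
}

int main() {
    // example sequence: <PRE> prefix <SUF> suffix <MID>
    log_fim_tokens({32007, 1234, 32008, 4321, 32009});
    return 0;
}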
@ggerganov I'm getting a segfault on a null pointer from deep inside ggml. Do you have any idea what this means? It's crashing on the very first token, which is the prefix token. I've been debugging for a while, but I'm at a loss. Is it an unrelated bug with Q2 models, or is this what happens when you try to eval an invalid token? Here is the test script that I ran; the latest commit is the commit that I'm testing.

#!/bin/bash
# run.sh
# Execute with `sudo ./run.sh` so that `ulimit` works and `mlock()` doesn't fail.
ulimit -l 2000000
./fill-in-middle \
models/CodeLlama-34B-GGUF/codellama-34b.Q2_K.gguf \
$'def add(a, b):\n' \
$'\n' \
40 \
1
I thought only the 7B and 13B Code Llama models have the special tokens. 34B's vocab is only 32000, so I assume no FIM support.
@ggerganov You may be right. But in that case I would expect it just to give garbage output rather than segfault. When I use CodeLlama-7B-Python-GGUF/codellama-7b-python.Q2_K.gguf, the result is the same.
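One possible explanation for a crash rather than garbage output (an assumption, not a confirmed diagnosis): if the vocabulary really is only 32000 entries, a hard-coded id like 32007 indexes past the end of the embedding table during eval, which can surface as a null or invalid pointer dereference deep inside ggml instead of a clean error. A cheap guard along these lines would catch that before eval; in real code n_vocab would come from the loaded model, e.g. via llama_n_vocab, but the sketch below is self-contained.

#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical pre-eval sanity check: reject token ids that fall outside
// the model's vocabulary instead of letting them reach the compute graph.
static bool tokens_in_range(const std::vector<int32_t> & toks, int32_t n_vocab) {
    for (size_t i = 0; i < toks.size(); ++i) {
        if (toks[i] < 0 || toks[i] >= n_vocab) {
            fprintf(stderr, "invalid token id %d at position %zu (n_vocab = %d)\n",
                    (int) toks[i], i, (int) n_vocab);
            return false;
        }
    }
    return true;
}

int main() {
    // With a 32000-token vocab, the hard-coded FIM ids are out of range.
    const std::vector<int32_t> toks = {32007, 1234, 32008, 4321, 32009};
    const int32_t n_vocab = 32000;
    printf("ok = %d\n", tokens_in_range(toks, n_vocab) ? 1 : 0);
    return 0;
}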
Can it be added to the flake output?
@alitariq4589 I got an email that you commented on this PR; did it somehow get deleted? Why do you need this PR merged? It's just an example.

@ggerganov Sorry I let the PR go stale; I'm going to have more time to work on it over the weekend. Could you clone this branch and tell me what you see? I'm still having trouble debugging the segfault, since it happens deep inside ggml. Presumably something at a higher level is not right, but I'm not sure what. Still using the weights from CodeLlama-7B-Python-GGUF/codellama-7b-python.Q2_K.gguf, same as above.
@apaz-cli Sorry about that message. It was generated by CI and was sent to all the PRs accidentally.
@alitariq4589 Gotcha, no problem. It reminded me this still exists :)
@apaz-cli No worries - this example is still on my radar, but haven't had time to come back to it. Will do so eventually
Sounds good, thanks @ggerganov ❤️
We have recently introduced the infill example. There is still some work to make the tokenization correct (#3503).
Fixes #2818
I'm done implementing the FIM example, looking for code review. Currently untested.