
[WIP] Add Fill-In-Middle example #2934

Closed · wants to merge 11 commits
Conversation

apaz-cli (Contributor)

Fixes #2818

I'm done implementing the FIM example, looking for code review. Currently untested.

// CodeLlama FIM special token IDs, hard-coded from the reference tokenizer
llama_token special_prefix_id = 32007;
llama_token special_middle_id = 32009;
llama_token special_suffix_id = 32008;
llama_token special_eot_id    = 32010;
ggerganov (Owner)

Curious, does Code Llama 34B have these special tokens?
If it does not, then how would FIM work with it?

apaz-cli (Contributor, Author)

It does, yeah. These are new, and I think only in codellama. I don't think they're in llama2. To get the token IDs themselves, the codellama people ran the tokenizer, and these are the values that came out.

https://github.com/facebookresearch/codellama/blob/cb51c14ec761370ba2e2bc351374a79265d0465e/llama/tokenizer.py#L28-L31

It should work. But I've been busy with my day job, and haven't gotten a chance to test it yet. Definitely not going to suggest merging until I'm certain.

ggerganov (Owner) left a comment

I think this is a good example. Would be nice to add a README.md with instructions to run a simple test. Maybe include sample prefix and suffix files to use as input.

apaz-cli (Contributor, Author) commented Sep 1, 2023

@ggerganov Those things are also forthcoming; I should be able to do all my testing and polishing over the weekend.

What I'm particularly worried about as far as code review is concerned is whether I'm calling llama_eval() right, and sampling right.

Also, I'm confused about the tokenizer. Is the tokenizer included with gguf models? Facebook's code reads three or four files, but creating a struct llama_context* seems to only require one.

ggerganov (Owner)

> What I'm particularly worried about as far as code review is concerned is whether I'm calling llama_eval() right, and sampling right.

Looking at the code it seems OK.

> Also, I'm confused about the tokenizer. Is the tokenizer included with gguf models? Facebook's code reads three or four files, but creating a struct llama_context* seems to only require one.

Yes, the llama.cpp lib provides built-in tokenization functionality. The vocab of the model is embedded inside the .gguf model file and is automatically loaded.
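
For reference, a minimal sketch of what loading from a single .gguf looks like, assuming the llama.cpp API of that time (llama_load_model_from_file() still took llama_context_params; later versions split model and context params). No separate tokenizer files are involved:

// Sketch only, based on the September 2023 llama.cpp API; signatures have changed since.
#include <stdbool.h>
#include <stdio.h>
#include "llama.h"

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }

    llama_backend_init(false); // numa = false

    struct llama_context_params params = llama_context_default_params();

    // the single .gguf file carries both the weights and the vocab/tokenizer data
    struct llama_model * model = llama_load_model_from_file(argv[1], params);
    if (model == NULL) {
        fprintf(stderr, "failed to load %s\n", argv[1]);
        return 1;
    }

    struct llama_context * ctx = llama_new_context_with_model(model, params);
    if (ctx == NULL) {
        fprintf(stderr, "failed to create context\n");
        return 1;
    }

    fprintf(stderr, "vocab size: %d\n", llama_n_vocab(ctx));

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}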

The usage seems correct, though I would double-check everything with extra logging. It's easy to make a mistake when concatenating tokenizer results.
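
As a concrete illustration of that concatenation, here is a sketch under the same API assumptions, not the PR's actual code. It assumes the <PRE> prefix <SUF> suffix <MID> ordering from the CodeLlama tokenizer linked above, hard-coded special IDs, and greedy sampling until <EOT>; the logging shows exactly which token sequence gets evaluated:

// Sketch only, not the PR's code: fixed buffers, no error handling, greedy sampling.
#include <stdbool.h>
#include <stdio.h>
#include "llama.h"

static void fim_generate(struct llama_context * ctx, const char * prefix, const char * suffix) {
    const llama_token PRE = 32007, SUF = 32008, MID = 32009, EOT = 32010; // CodeLlama FIM IDs
    const int n_threads = 4;

    llama_token prompt[2048];
    int n = 0;

    // assemble <BOS> <PRE> prefix-tokens <SUF> suffix-tokens <MID>
    prompt[n++] = llama_token_bos(ctx);
    prompt[n++] = PRE;
    n += llama_tokenize(ctx, prefix, prompt + n, 512, false); // add_bos = false
    prompt[n++] = SUF;
    n += llama_tokenize(ctx, suffix, prompt + n, 512, false);
    prompt[n++] = MID;

    // log the assembled sequence to catch concatenation mistakes
    fprintf(stderr, "prompt tokens:");
    for (int i = 0; i < n; i++) fprintf(stderr, " %d", prompt[i]);
    fprintf(stderr, "\n");

    // evaluate the prompt, then greedily pick the most likely token until <EOT>
    llama_eval(ctx, prompt, n, 0, n_threads);
    int n_past = n;

    const int n_vocab = llama_n_vocab(ctx);
    for (int i = 0; i < 256; i++) {
        const float * logits = llama_get_logits(ctx);
        llama_token best = 0;
        for (llama_token t = 1; t < n_vocab; t++) {
            if (logits[t] > logits[best]) {
                best = t;
            }
        }
        if (best == EOT) {
            break;
        }
        fprintf(stderr, "sampled token %d\n", best); // detokenize here for readable output
        llama_eval(ctx, &best, 1, n_past++, n_threads);
    }
}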

ggerganov added the "need feedback" (Testing and feedback with results are needed) label on Sep 4, 2023
apaz-cli (Contributor, Author) commented Sep 5, 2023

@ggerganov I'm getting a segfault on a null pointer from deep inside ggml. Do you have any idea what this means? It's crashing on the very first token, which is the prefix token. I've been debugging for a while, but I'm at a loss. Is it an unrelated bug with Q2 models, or is this what happens when you try to eval an invalid token?

Here is the test script that I ran; the latest commit is the one I'm testing.

#!/bin/bash

# run.sh
# Execute with `sudo ./run.sh` so that `ulimit` works and `mlock()` doesn't fail.

ulimit -l 2000000
./fill-in-middle \
    models/CodeLlama-34B-GGUF/codellama-34b.Q2_K.gguf \
    $'def add(a, b):\n' \
    $'\n' \
    40 \
    1
=================================================================
==99666==ERROR: AddressSanitizer: SEGV on unknown address 0x7c50a91cb2d0 (pc 0x564cd24ad03c bp 0x000000000000 sp 0x7fffd3d85aa0 T0)
==99666==The signal is caused by a READ memory access.
    #0 0x564cd24ad03c in dequantize_row_q2_K /home/apaz/git/llama.cpp/k_quants.c:418
    #1 0x564cd2319765 in ggml_compute_forward_get_rows_q /home/apaz/git/llama.cpp/ggml.c:11796
    #2 0x564cd2319765 in ggml_compute_forward_get_rows /home/apaz/git/llama.cpp/ggml.c:11875
    #3 0x564cd234087a in ggml_compute_forward /home/apaz/git/llama.cpp/ggml.c:15805
    #4 0x564cd23444d4 in ggml_graph_compute_thread /home/apaz/git/llama.cpp/ggml.c:17241
    #5 0x564cd236d3e1 in ggml_graph_compute /home/apaz/git/llama.cpp/ggml.c:17751
    #6 0x564cd23a2ee0 in ggml_graph_compute_helper /home/apaz/git/llama.cpp/llama.cpp:402
    #7 0x564cd23a9408 in llama_eval_internal /home/apaz/git/llama.cpp/llama.cpp:2935
    #8 0x564cd23a9ba9 in llama_eval /home/apaz/git/llama.cpp/llama.cpp:6060
    #9 0x564cd22f2040 in codellama_fill_in_middle examples/fill-in-middle/FIM.c:92
    #10 0x564cd22f2040 in main examples/fill-in-middle/FIM.c:181
    #11 0x7f01284456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #12 0x7f0128445784 in __libc_start_main_impl ../csu/libc-start.c:360
    #13 0x564cd22f4f90 in _start (/home/apaz/git/llama.cpp/fill-in-middle+0x32f90) (BuildId: 1d0a09041fd8fc40f04d4c50dbd084bfffe605f9)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/apaz/git/llama.cpp/k_quants.c:418 in dequantize_row_q2_K
==99666==ABORTING

ggerganov (Owner)

I thought only 7B and 13B Code Llama have the special tokens. 34B's vocab is only 32000 so I assume no FIM support.
I'll take a detailed look and run some tests a bit later

apaz-cli (Contributor, Author) commented Sep 5, 2023

@ggerganov You may be right. But, I would expect it in that case just to give garbage output, rather than segfault. When I use CodeLlama-7B-Python-GGUF/codellama-7b-python.Q2_K.gguf, the result is the same.
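
A plausible explanation for a segfault rather than garbage output: ggml's get_rows indexes the token-embedding tensor directly by token ID, so an ID at or above a 32000-entry vocab reads past the end of the quantized data, which is exactly where the ASan trace points (ggml_compute_forward_get_rows_q, then dequantize_row_q2_K). If the 7B Python variant also lacks the extra FIM entries in its vocab, the same out-of-range read would explain the identical crash. A hypothetical guard, not part of the PR, that the example could run before the first llama_eval() call (llama_n_vocab() existed in the API of the time):

// Hypothetical guard, not from the PR: refuse to run FIM if the model's vocab
// cannot contain the CodeLlama special tokens (the highest ID is 32010).
#include <stdbool.h>
#include <stdio.h>
#include "llama.h"

static bool model_has_fim_tokens(struct llama_context * ctx) {
    const llama_token special_eot_id = 32010;
    return llama_n_vocab(ctx) > special_eot_id;
}

// usage, before evaluating any FIM special token:
//     if (!model_has_fim_tokens(ctx)) {
//         fprintf(stderr, "error: this model's vocab has no FIM special tokens\n");
//         return 1;
//     }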

kurnevsky (Contributor)

Can it be added to the flake output?

apaz-cli (Contributor, Author)

@alitariq4589 I got an email that you commented on this PR, did it somehow get deleted? Why do you need this PR merged? It's just an example.

@ggerganov Sorry I let the PR go stale, going to have more time to work on it over the weekend. Could you clone this branch and tell me what you see? Still having trouble debugging the segfault, since it happens deep inside ggml. Presumably, something at a higher level is not right, but I'm not sure what.

Still using the weights from CodeLlama-7B-Python-GGUF/codellama-7b-python.Q2_K.gguf, same as above.

alitariq4589 (Contributor)

@apaz-cli Sorry about that message. It was generated by CI and was sent to all the PRs accidentally.

apaz-cli (Contributor, Author)

@alitariq4589 Gotcha, no problem. It reminded me this still exists :)

ggerganov (Owner)

@apaz-cli No worries - this example is still on my radar, but haven't had time to come back to it. Will do so eventually

apaz-cli (Contributor, Author)

Sounds good, thanks @ggerganov ❤️

ggerganov (Owner)

We have recently introduced the infill example to demonstrate this functionality. It should cover what has been proposed here.

There is still some work left to make the tokenization correct: #3503

ggerganov closed this on Oct 8, 2023
Labels: need feedback (Testing and feedback with results are needed)
Projects: none yet
Successfully merging this pull request may close: Enhancement: Codellama FIM tokenization
4 participants