Draft: feat: Support Llama 3 model (#478) #479

Merged · 10 commits into carlrobertoh:master · Apr 20, 2024

Conversation

@reneleonhardt (Contributor) commented on Apr 19, 2024:

Wow, fresh off the press, and it already seems to be one of the best models out there!! 🚀

Waiting for llama.cpp support... 😅
ggml-org/llama.cpp#6747
ggml-org/llama.cpp#6751

@reneleonhardt changed the title from "feat: Support Llama 3 model (#478)" to "Draft: feat: Support Llama 3 model (#478)" on Apr 19, 2024
@carlrobertoh (Owner) commented:

Is there any more work to be expected here, or can we perhaps remove the "Draft" prefix?

@reneleonhardt (Contributor, Author) commented:

As far as I can see it's finished (can you double-check the InfillPrompt?); I'm just waiting for the llama.cpp server support 😅
Or can you check it locally? In LM Studio I didn't see any big problems; the model is amazing, I just had to ask it to tell the truth when it hallucinated non-existent Kotlin operators at me 😉
7B Q8 is super fast on an M1 Max 🚀

@carlrobertoh (Owner) commented:

Nice! It looks like the model doesn't support infilling, or at least I couldn't find anything. Maybe we can just remove the new infill template for now.

@reneleonhardt (Contributor, Author) commented on Apr 20, 2024:

> Nice! It looks like the model doesn't support infilling, or at least I couldn't find anything. Maybe we can just remove the new infill template for now.

Really? All the main calls I saw contained --in-prefix and --in-suffix (ggml-org/llama.cpp#6747 (comment)); that's why I added it 😅
But if you think it's a problem, I can remove it for now 🙂
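For context, llama.cpp's --in-prefix/--in-suffix flags wrap each interactive input in the main example; on their own they don't imply fill-in-the-middle infill support. A minimal Kotlin sketch of that wrapping, using the Llama 3 header tokens discussed in the linked thread (the wrapChatTurn helper is hypothetical):

```kotlin
// Sketch only: mimics how llama.cpp's main example wraps interactive input
// when started with --in-prefix / --in-suffix set to Llama 3 header tokens.
fun wrapChatTurn(userInput: String): String {
    val inPrefix = "<|start_header_id|>user<|end_header_id|>\n\n"
    val inSuffix = "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    return inPrefix + userInput + inSuffix
}
```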

@carlrobertoh (Owner) commented:

Hmm, their model card doesn't mention anything about infilling - https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3
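The model card does document the Instruct chat format, which wraps every turn in header tokens and ends with an open assistant header for generation. A rough Kotlin sketch of assembling such a prompt (buildLlama3Prompt is a hypothetical helper, not CodeGPT's actual template code):

```kotlin
// Sketch of the Llama 3 Instruct chat format from the model card above.
// A blank system prompt is skipped, matching the behaviour tested in this PR.
fun buildLlama3Prompt(systemPrompt: String?, userMessage: String): String = buildString {
    append("<|begin_of_text|>")
    if (!systemPrompt.isNullOrBlank()) {
        append("<|start_header_id|>system<|end_header_id|>\n\n")
        append(systemPrompt).append("<|eot_id|>")
    }
    append("<|start_header_id|>user<|end_header_id|>\n\n")
    append(userMessage).append("<|eot_id|>")
    // Leave the assistant header open so the model generates the reply.
    append("<|start_header_id|>assistant<|end_header_id|>\n\n")
}
```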

@reneleonhardt (Contributor, Author) commented on Apr 20, 2024:

All I can see is that there is a lot of confusion about the model: some people have many problems, some have none... I guess it depends on how you use it, and some of the HF models also seem to have problems.
phymbert mentioned a special infill endpoint (ggml-org/llama.cpp#6747 (comment)) - is CodeGPT using that?
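llama.cpp's server does expose an /infill endpoint that accepts input_prefix and input_suffix fields (it only works for models whose vocabulary defines infill/FIM tokens). A Kotlin sketch of calling it; the host, port, and n_predict value are assumptions, not CodeGPT configuration:

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Sketch of a call to llama.cpp server's /infill endpoint.
// localhost:8080 and n_predict=64 are assumptions for illustration.
fun requestInfill(prefix: String, suffix: String): String {
    val body = """{"input_prefix": ${json(prefix)}, "input_suffix": ${json(suffix)}, "n_predict": 64}"""
    val request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:8080/infill"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build()
    return HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body() // JSON response; the completion is in its "content" field
}

// Minimal JSON string escaping, enough for this sketch.
private fun json(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\""
```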

@carlrobertoh (Owner) commented:

No, but we did use it for a while, I think. However, I can't remember why we switched back.

@carlrobertoh merged commit 6e6a499 into carlrobertoh:master on Apr 20, 2024
2 checks passed
carlrobertoh pushed a commit that referenced this pull request on Apr 21, 2024:
* feat: Support Llama 3 model (#478)

* Use new InfillPrompt

* Switch to lmstudio-community

* Use new Prompt

* llama.cpp removed the BOS token (ggml-org/llama.cpp@a55d8a9)

* Add tests

* I would prefer a stream based solution

* Add 70B models

* Add tests for skipping blank system prompt

* Remove InfillPrompt for now