Introduce llama-run #10291
Conversation
Some of these builds seem hardcoded to C++11, while we use a feature from C++14. Any reason we aren't using, say, C++17? Any reasonable platform should be up to date with C++17, I think.
Converted to C++11 only.
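For context, one likely example of such a C++14 dependency (an assumption on my part, the exact feature isn't named in the thread) is std::make_unique, which C++11 lacks; the C++11-compatible form constructs the unique_ptr directly:

#include <memory>
#include <string>

// Hypothetical illustration only: std::make_unique requires C++14.
std::unique_ptr<std::string> greeting_cxx14() {
    return std::make_unique<std::string>("hello");
}

// Equivalent code that builds with plain C++11.
std::unique_ptr<std::string> greeting_cxx11() {
    return std::unique_ptr<std::string>(new std::string("hello"));
}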
@slaren @ggerganov PTAL, I'm hoping to add other features to this example, such as reading the prompt from a '-p' arg and reading the prompt from stdin.
It would be good to have a more elaborate chat example, but the goal of this example is to show in the simplest way possible how to use the llama.cpp API. I don't think that these changes achieve that; I think that users will have a harder time understanding the llama.cpp API with all the extra boilerplate that is being added here. If you want to use this as the base of a new example that adds more features, that would be great, but I think we should keep this example as simple as possible.
Cool, sounds good @slaren. Mind if I call the new example ramalama-core?
llama-inferencer? I really don't care too much about the name, I just want to agree on one to unblock this PR. @rhatdan @slp any suggestions on names? I plan on using this as the main program during "ramalama run", but I'm happy for anyone to use it or make changes to it to suit their needs. It's like a drastically simplified version of llama-cli, with one or two additional features: read from stdin, read from the -p flag. But it also seems stabler and less error-prone than llama-cli, and the verbose info is all cleaned up to only print errors. It was based on llama-simple-chat initially.
Maybe something like
SGTM |
It's also tempting to call this something like run and use a RamaLama/LocalAI/Ollama-style CLI interface to interact with models. Kinda like a daemonless Ollama:
We could even add https://, http://, ollama://, hf:// as valid syntaxes to pull models, since they are all just an HTTP pull in the end.
That might be implemented as llama-pull that llama-run can fork/exec (or they share a common library) |
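A rough sketch of what that prefix handling could look like (entirely hypothetical: the llama-pull binary and the helper names here only illustrate the idea, they are not code from this PR):

#include <cstdlib>
#include <string>

// Hypothetical: decide whether a model reference needs to be pulled first,
// based on the schemes discussed above.
static bool needs_pull(const std::string & model_ref) {
    const char * schemes[] = { "https://", "http://", "ollama://", "hf://" };
    for (const char * scheme : schemes) {
        if (model_ref.rfind(scheme, 0) == 0) { // starts-with check
            return true;
        }
    }
    return false;
}

// Hypothetical: llama-run could fork/exec a separate llama-pull binary
// (or call into a shared library) and then load the downloaded file.
static int pull_model(const std::string & model_ref) {
    const std::string cmd = "llama-pull " + model_ref;
    return std::system(cmd.c_str());
}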
I will be afk for 3 weeks, so expect inactivity in this PR. I did the rename in case we want to merge as is and not let this go stale, although the syntax will completely change to:
llama-run [file://]somedir/somefile.gguf [prompt] [flags]
The file:// prefix will be optional, but it sets up the possibility of adding the pullers discussed above.
This drives the compiler crazy FWIW:
seems like it only wants to build C code in that file or something. |
You are probably doing something wrong. Put it at the end of the file, include |
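For reference, a minimal sketch of what such a C++-only companion header could look like (the deleter names match the typedefs quoted below, and the free functions are from the llama.cpp C API, but treat this as an illustration rather than the exact committed header):

// llama-cpp.h (sketch): C++ helpers layered on top of the C API.
#pragma once

#ifndef __cplusplus
#error "This header is for C++ only"
#endif

#include <memory>

#include "llama.h"

struct llama_model_deleter {
    void operator()(llama_model * model) { llama_free_model(model); }
};

struct llama_context_deleter {
    void operator()(llama_context * context) { llama_free(context); }
};

struct llama_sampler_deleter {
    void operator()(llama_sampler * sampler) { llama_sampler_free(sampler); }
};

typedef std::unique_ptr<llama_model, llama_model_deleter> llama_model_ptr;
typedef std::unique_ptr<llama_context, llama_context_deleter> llama_context_ptr;
typedef std::unique_ptr<llama_sampler, llama_sampler_deleter> llama_sampler_ptr;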
Added the smart pointer typedef stuff @slaren.
include/llama-cpp.h (outdated)
typedef std::unique_ptr<llama_model, llama_model_deleter> llama_model_ptr;
typedef std::unique_ptr<llama_context, llama_context_deleter> llama_context_ptr;
typedef std::unique_ptr<llama_sampler, llama_sampler_deleter> llama_sampler_ptr;
typedef std::unique_ptr<char[]> char_array_ptr;
The rest look good, but this one (char_array_ptr) does not belong here.
I also need to pass more parameters as references; I notice I still pass raw pointers in places where a reference to a smart pointer would do.
I removed char_array_ptr
Looks like a good start for an improved chat example.
llama-cpp.h also needs to be added to the list of public headers in CMakeLists.txt.
Added as a public header.
It's like simple-chat but it uses smart pointers to avoid manual memory cleanup, so there are fewer memory leaks in the code now. Avoids printing multiple dots, splits the code into smaller functions, and uses no exception handling. Signed-off-by: Eric Curtin <[email protected]>
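To illustrate the smart-pointer approach the commit message describes (a sketch only, using the llama_model_ptr/llama_context_ptr/llama_sampler_ptr aliases from llama-cpp.h together with llama.cpp C API calls; error handling and parameters are simplified):

#include "llama-cpp.h"

// Sketch: the model, context and sampler are released automatically when
// the *_ptr objects go out of scope, so no manual free calls are needed.
static int load_everything(const char * model_path) {
    llama_model_params mparams = llama_model_default_params();
    llama_model_ptr model(llama_load_model_from_file(model_path, mparams));
    if (!model) {
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    llama_context_ptr ctx(llama_new_context_with_model(model.get(), cparams));
    if (!ctx) {
        return 1;
    }

    llama_sampler_ptr sampler(llama_sampler_chain_init(llama_sampler_chain_default_params()));
    llama_sampler_chain_add(sampler.get(), llama_sampler_init_greedy());

    // ... tokenize, decode and sample here ...
    return 0;
}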
Happy to merge this @slaren @ggerganov, although there will be breaking changes in a follow-on PR; I want to change the syntax to this (ramalama/ollama style):
Eventually I want to add optional prefixes for the model too (like file://, hf://; maybe we implement http://, https://, ollama:// eventually as well, it's not that hard), but with no prefix we will just assume the string is a file path, I guess.
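A tiny sketch of the "strip the optional prefix, otherwise treat it as a path" idea (a hypothetical helper, not code from this PR):

#include <string>

// Hypothetical: return a plain filesystem path for the model argument.
// A leading "file://" is stripped; anything without a scheme is assumed
// to already be a local path.
static std::string resolve_model_path(const std::string & model_ref) {
    const std::string file_scheme = "file://";
    if (model_ref.compare(0, file_scheme.size(), file_scheme) == 0) {
        return model_ref.substr(file_scheme.size());
    }
    return model_ref; // no prefix: assume it is a file path
}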
Sounds good. Btw, to support dynamic loading of backends (#10469), you should add a call to |