High performance API #321
I don't agree that HTTP is overkill; I think it is a perfect fit. All the large LLM providers, like OpenAI, expose RESTful APIs: you send your inputs and generation parameters, and the service sends back the generated response string. Here's an example of serving llama.cpp over HTTP, as an emulated KoboldAI server.
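To make the "inputs plus generation params over HTTP" shape concrete, here is a minimal sketch in Python using only the standard library. The endpoint path and parameter names (`/v1/completions`, `max_tokens`, `temperature`) are hypothetical, chosen to mimic the general shape of OpenAI-style completion APIs; this is not llama.cpp's actual interface.

```python
import json
import urllib.request

# Hypothetical endpoint and parameter names, for illustration only;
# they mimic the general shape of OpenAI-style completion APIs.
def build_generation_request(base_url, prompt, max_tokens=128, temperature=0.8):
    """Build (but do not send) an HTTP POST carrying the prompt and
    generation parameters as a JSON body."""
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + "/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_generation_request("http://localhost:8080", "Hello, llama!")
print(req.full_url)                     # http://localhost:8080/v1/completions
print(json.loads(req.data)["prompt"])   # Hello, llama!
```

Sending it is then a one-liner with `urllib.request.urlopen(req)`, which is the point being made: the client side of a REST API costs almost nothing.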
Sure, but none of these are performance-oriented like this project, and none of them run on the local PC.
If done correctly, building a REST API on top of THAT would be trivial, even from Python, since performance isn't much of a concern at that layer anyway. Supporting HTTP in the core itself, on the other hand, would require a lot of code or even external dependencies.
Yes: keep the core lean, portable, fast, and free of dependencies, while having the option of building things on top of it as modules. This could be achieved with a C API that simply uses file streams in place of stdin/stdout. A communication layer could then be built on top of those streams to achieve whatever functionality is necessary. The tcp_server branch is actually almost like that, but not quite.
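The stream-based layering idea can be sketched as follows. This is a Python mock-up, not the proposed C API: the `generate` callback stands in for the model, and the point is that a loop written against a pair of file streams works identically over stdin/stdout, pipes, or a socket's `makefile()` wrappers.

```python
import io

# Sketch of a communication layer that talks only to a pair of file
# streams. `generate` is a placeholder for the model call, not
# llama.cpp's actual API.
def serve(in_stream, out_stream, generate):
    """Read one prompt per line, write one generated line back."""
    for line in in_stream:
        prompt = line.rstrip("\n")
        if prompt == "/quit":          # simple in-band shutdown command
            break
        out_stream.write(generate(prompt) + "\n")
        out_stream.flush()

# Exercising the loop with in-memory streams instead of stdin/stdout:
inp = io.StringIO("hello\n/quit\n")
out = io.StringIO()
serve(inp, out, lambda p: "echo: " + p)
print(out.getvalue())   # echo: hello
```

A TCP or HTTP front-end would then just hand the same `serve` loop a different pair of streams, which is what keeps the core free of any networking dependencies.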
My distributable is a 700kb zip containing literally 3 files, one of which is optional. It requires nothing except a stock Python install to run: https://github.com/LostRuins/llamacpp-for-kobold/releases/tag/v1.0.3 And if you don't want that, we have a literal 1-file pyinstaller solution with a built-in server: https://github.com/henk717/llamacpp-for-kobold/releases/tag/v1.0.2-2048
@tarruda This is great; is it going to be merged?
No idea what your specialization is, but in my part of the programming world TCP is already considered too heavy ^^ From my point of view, a TCP-based API would be very welcome over RESTful APIs.
Hey!
I'd love to see this project usable through a TCP socket with a very optimized protocol. It might make use of something like protobuf, or even gRPC.
I think everyone agrees HTTP would be complete overkill, especially for a project focused on high performance. 😆
Thanks
Niansa
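For reference, the kind of lean binary protocol discussed in this thread can be as small as length-prefixed framing over a raw socket. The sketch below is a hypothetical illustration in Python, not llama.cpp's actual API: each message is a 4-byte big-endian length followed by the payload, and protobuf or gRPC would simply replace the raw payload bytes.

```python
import socket
import struct
import threading

# Minimal length-prefixed framing: 4-byte big-endian length, then payload.
def send_msg(sock, payload: bytes):
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_exact(sock, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf += chunk
    return buf

def recv_msg(sock) -> bytes:
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)

# A toy server that echoes one framed message back (stands in for
# a real generation backend).
def echo_server(listener):
    conn, _ = listener.accept()
    with conn:
        send_msg(conn, b"echo: " + recv_msg(conn))

listener = socket.create_server(("127.0.0.1", 0))  # OS-assigned port
port = listener.getsockname()[1]
threading.Thread(target=echo_server, args=(listener,), daemon=True).start()

with socket.create_connection(("127.0.0.1", port)) as client:
    send_msg(client, b"generate 16 tokens")
    print(recv_msg(client))   # b'echo: generate 16 tokens'
listener.close()
```

The framing is the whole protocol: no headers, no parsing, no external dependencies, which is the contrast with HTTP being drawn above.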