
High performance API #321

Closed
niansa opened this issue Mar 20, 2023 · 8 comments
Labels: duplicate (This issue or pull request already exists), enhancement (New feature or request)

Comments

@niansa (Contributor) commented Mar 20, 2023

Hey!

I'd love to see this project become usable over a TCP socket with a highly optimized protocol, one that may make use of something like protobuf, or even gRPC.
I think everyone agrees HTTP would be complete overkill, especially for a project focused on high performance. 😆

Thanks
Niansa
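
For illustration, a minimal sketch of what such a socket protocol could look like: a 4-byte length prefix followed by a serialized request. The framing, port, and payload fields below are assumptions made for the sake of the example, not an existing llama.cpp interface; a protobuf/gRPC variant would serialize a message instead of the ad-hoc string.

```cpp
// Hypothetical length-prefixed request over TCP (POSIX sockets).
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <string>

// Send one framed message: 4-byte big-endian length, then the payload.
static bool send_frame(int fd, const std::string &payload) {
    uint32_t len = htonl(static_cast<uint32_t>(payload.size()));
    if (send(fd, &len, sizeof(len), 0) != (ssize_t)sizeof(len)) return false;
    return send(fd, payload.data(), payload.size(), 0) ==
           (ssize_t)payload.size();
}

int main() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(8090);                   // hypothetical port
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    if (connect(fd, (sockaddr *)&addr, sizeof(addr)) != 0) return 1;

    // A protobuf-based variant would serialize a message here instead.
    send_frame(fd, "prompt=Hello;n_predict=64;temp=0.8");
    close(fd);
    return 0;
}
```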

@LostRuins (Collaborator)

I don't agree that HTTP is overkill. I think it is perfect. You have RESTful APIs for all the large LLM providers like OpenAI. You send your inputs and generation params, the service sends you back the generated response string.

Here's an example of serving llama.cpp over HTTP, as an emulated KoboldAI server.
https://github.com/LostRuins/llamacpp-for-kobold
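
For a sense of scale, here is a hedged sketch of the request/response shape being described. The endpoint path and JSON fields are assumptions loosely modeled on a KoboldAI-style generate endpoint, not a documented interface:

```cpp
// Build the kind of HTTP request the comment describes: generation
// params in a JSON body, a few text header lines on top of TCP.
#include <iostream>
#include <string>

int main() {
    // Inputs and generation params travel in the body; the service
    // replies with the generated response string.
    std::string body =
        R"({"prompt": "Hello", "max_length": 64, "temperature": 0.8})";

    std::string request =
        "POST /api/v1/generate HTTP/1.1\r\n"     // hypothetical endpoint
        "Host: 127.0.0.1:5001\r\n"
        "Content-Type: application/json\r\n"
        "Content-Length: " + std::to_string(body.size()) + "\r\n"
        "\r\n" + body;
    std::cout << request;
    return 0;
}
```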

@tarruda commented Mar 20, 2023

Try out the tcp_server branch; more details on how it works are in #278.

@gjmulder added the duplicate and enhancement labels on Mar 20, 2023
@niansa closed this as completed on Mar 20, 2023
@niansa (Contributor, Author) commented Mar 20, 2023

> You have RESTful APIs for all the large LLM providers like OpenAI. You send your inputs and generation params, the service sends you back the generated response string.

Sure, but none of those are performance-oriented like this project, and none of them run on the local PC.

> Here's an example of serving llama.cpp over HTTP, as an emulated KoboldAI server.

If done correctly, building a REST API on top of THAT would be trivial, even from Python, if performance isn't much of a concern anyway.

Also, supporting HTTP would require lots of code or even external dependencies.

@anzz1 (Contributor) commented Mar 20, 2023

Yes, keep the core lean, portable, fast, and free of dependencies, while having the option of building things on top of it as modules. This could be achieved with a C API which simply uses file streams in place of stdin/stdout. A communication layer based on those file streams could then be built on top to achieve whatever functionality is necessary. The tcp_server branch is actually almost like that, but not quite.
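
A minimal sketch of that idea, assuming a hypothetical entry point (llama_stream_run is not part of llama.cpp): the core reads from and writes to caller-supplied streams, so stdin/stdout, pipes, or a socket wrapped with fdopen() all work unchanged.

```cpp
#include <cstdio>

// Core entry point takes arbitrary file streams instead of touching
// stdin/stdout directly. Hypothetical API, for illustration only.
extern "C" int llama_stream_run(FILE *in, FILE *out) {
    char line[4096];
    while (fgets(line, sizeof(line), in)) {
        // Real code would tokenize `line`, run inference, and stream
        // generated tokens to `out`; this placeholder just echoes.
        fputs(line, out);
        fflush(out);
    }
    return 0;
}

int main() {
    // Default build: plain stdin/stdout. A TCP module could instead pass
    // fdopen(client_fd, "r") and fdopen(client_fd, "w").
    return llama_stream_run(stdin, stdout);
}
```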

@LostRuins (Collaborator)

> Also supporting HTTP would require lots of code or even external dependencies.

My distributable is a 700 kB zip containing literally 3 files, one of which is optional. It requires nothing else except a stock Python install to run.

https://github.com/LostRuins/llamacpp-for-kobold/releases/tag/v1.0.3

And if you don't want that, there is a literal one-file PyInstaller solution with a built-in server.

https://github.com/henk717/llamacpp-for-kobold/releases/tag/v1.0.2-2048

@0x090909 commented Mar 27, 2023

@tarruda This is great, is it going to be merged?

@tarruda commented Mar 27, 2023

> @tarruda This is great, is it going to be merged?

I don't think so; check the discussion in #278 for more context.

@Calandiel

> I don't agree that HTTP is overkill. I think it is perfect. You have RESTful APIs for all the large LLM providers like OpenAI. You send your inputs and generation params, the service sends you back the generated response string.
>
> Here's an example of serving llama.cpp over HTTP, as an emulated KoboldAI server. https://github.com/LostRuins/llamacpp-for-kobold

No idea what your specialization is, but in my part of the programming world TCP is already considered too heavy ^^

A TCP-based API would be very welcome over RESTful APIs, from my point of view.
