
High performance API #321

Closed
niansa opened this issue Mar 20, 2023 · 8 comments
Labels: duplicate (This issue or pull request already exists), enhancement (New feature or request)

Comments

@niansa (Contributor) commented Mar 20, 2023

Hey!

I'd love to see this project become usable over a TCP socket with a highly optimized protocol, one that may make use of something like protobuf, or even gRPC.
I think everyone agrees HTTP would be complete overkill, especially for a project focused on high performance. 😆

Thanks
Niansa
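
For illustration, a minimal sketch of what such a socket protocol could look like: a 4-byte length prefix followed by a serialized request. The framing, port, and payload fields below are assumptions made for the sake of the example, not an existing llama.cpp interface; a protobuf/gRPC variant would serialize a message instead of the ad-hoc string.

```cpp
// Hypothetical length-prefixed request over TCP (POSIX sockets).
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <string>

// Send one framed message: 4-byte big-endian length, then the payload.
static bool send_frame(int fd, const std::string &payload) {
    uint32_t len = htonl(static_cast<uint32_t>(payload.size()));
    if (send(fd, &len, sizeof(len), 0) != (ssize_t)sizeof(len)) return false;
    return send(fd, payload.data(), payload.size(), 0) ==
           (ssize_t)payload.size();
}

int main() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(8090);                   // hypothetical port
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    if (connect(fd, (sockaddr *)&addr, sizeof(addr)) != 0) return 1;

    // A protobuf-based variant would serialize a message here instead.
    send_frame(fd, "prompt=Hello;n_predict=64;temp=0.8");
    close(fd);
    return 0;
}
```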

@LostRuins (Collaborator)

I don't agree that HTTP is overkill. I think it is perfect. You have RESTful APIs for all the large LLM providers like OpenAI. You send your inputs and generation params, the service sends you back the generated response string.

Here's an example of serving llama.cpp over HTTP, as an emulated KoboldAI server.
https://github.com/LostRuins/llamacpp-for-kobold
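
For a sense of scale, here is a hedged sketch of the request/response shape being described. The endpoint path and JSON fields are assumptions loosely modeled on a KoboldAI-style generate endpoint, not a documented interface:

```cpp
// Build the kind of HTTP request the comment describes: generation
// params in a JSON body, a few text header lines on top of TCP.
#include <iostream>
#include <string>

int main() {
    // Inputs and generation params travel in the body; the service
    // replies with the generated response string.
    std::string body =
        R"({"prompt": "Hello", "max_length": 64, "temperature": 0.8})";

    std::string request =
        "POST /api/v1/generate HTTP/1.1\r\n"     // hypothetical endpoint
        "Host: 127.0.0.1:5001\r\n"
        "Content-Type: application/json\r\n"
        "Content-Length: " + std::to_string(body.size()) + "\r\n"
        "\r\n" + body;
    std::cout << request;
    return 0;
}
```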

@tarruda commented Mar 20, 2023

Try out the tcp_server branch; more details on how it works are in #278.

@gjmulder added the duplicate and enhancement labels on Mar 20, 2023
@niansa closed this as completed on Mar 20, 2023
@niansa (Contributor, Author) commented Mar 20, 2023

> You have RESTful APIs for all the large LLM providers like OpenAI. You send your inputs and generation params, the service sends you back the generated response string.

Sure, but none of those are performance-oriented like this project, and none of them run on the local PC.

> Here's an example of serving llama.cpp over HTTP, as an emulated KoboldAI server.

If done correctly, building a REST API on top of THAT would be trivial, even from Python, if performance isn't much of a concern anyway.

Also, supporting HTTP would require lots of code or even external dependencies.

@anzz1 (Contributor) commented Mar 20, 2023

Yes, keep the core lean, portable, fast, and free of dependencies, while having the option of building things on top of it as modules. This could be achieved with a C API which simply uses file streams in place of stdin/stdout. A communication layer based on those file streams could then be built on top to achieve whatever functionality is necessary. The tcp_server branch is actually almost like that, but not quite.
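
A minimal sketch of that idea, assuming a hypothetical entry point (llama_stream_run is not part of llama.cpp): the core reads from and writes to caller-supplied streams, so stdin/stdout, pipes, or a socket wrapped with fdopen() all work unchanged.

```cpp
#include <cstdio>

// Core entry point takes arbitrary file streams instead of touching
// stdin/stdout directly. Hypothetical API, for illustration only.
extern "C" int llama_stream_run(FILE *in, FILE *out) {
    char line[4096];
    while (fgets(line, sizeof(line), in)) {
        // Real code would tokenize `line`, run inference, and stream
        // generated tokens to `out`; this placeholder just echoes.
        fputs(line, out);
        fflush(out);
    }
    return 0;
}

int main() {
    // Default build: plain stdin/stdout. A TCP module could instead pass
    // fdopen(client_fd, "r") and fdopen(client_fd, "w").
    return llama_stream_run(stdin, stdout);
}
```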

@LostRuins (Collaborator)

> Also supporting HTTP would require lots of code or even external dependencies.

My distributable is a 700 kB zip containing literally 3 files, one of which is optional. It requires nothing else except a stock Python install to run.

https://github.com/LostRuins/llamacpp-for-kobold/releases/tag/v1.0.3

And if you don't want that, there is a literal one-file PyInstaller solution with a built-in server.

https://github.com/henk717/llamacpp-for-kobold/releases/tag/v1.0.2-2048

@0x090909 commented Mar 27, 2023

@tarruda This is great, is it going to be merged?

@tarruda commented Mar 27, 2023

> @tarruda This is great, is it going to be merged?

I don't think so; check the discussion in #278 for more context.

@Calandiel

> I don't agree that HTTP is overkill. I think it is perfect. You have RESTful APIs for all the large LLM providers like OpenAI. You send your inputs and generation params, the service sends you back the generated response string.
>
> Here's an example of serving llama.cpp over HTTP, as an emulated KoboldAI server. https://github.com/LostRuins/llamacpp-for-kobold

No idea what your specialization is, but in my part of the programming world TCP is already considered too heavy ^^

A TCP-based API would be very welcome over RESTful APIs, from my point of view.
