Stream api #392
Looked in the code a bit. One approach is to have streaming versions of the provider calls, which makes it clear we should be streaming all the way through. Separate code paths for streaming in `complex` and/or `track` may help too; there's a clear spot to branch on `track` vs `track_streaming`, for example, if that helps us.

How should the API surface at the user level? E.g. a `Stream` returned as the value of an lmp when `stream` is true, or as a property of whatever result type… (non-streaming returns `list[Message]`; streaming might look like `for c in result.stream():`). Other approaches?
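To make the options above concrete, here is a minimal sketch of the "stream as a value / property of the result type" idea. All names here (`Stream`, `result`) are illustrative assumptions, not the library's actual API:

```python
from typing import Iterator, List


class Stream:
    """Hypothetical wrapper for chunked output: iterate to consume
    chunks as they arrive, then read the accumulated result."""

    def __init__(self, chunks: Iterator[str]):
        self._chunks = chunks
        self._parts: List[str] = []

    def __iter__(self) -> Iterator[str]:
        for chunk in self._chunks:
            self._parts.append(chunk)  # accumulate while streaming through
            yield chunk

    @property
    def result(self) -> str:
        # Meaningful once the stream has been fully consumed.
        return "".join(self._parts)


# Usage: a provider would yield real chunks; faked here with a list.
stream = Stream(iter(["Hel", "lo, ", "world"]))
for c in stream:  # analogous to `for c in result.stream():`
    pass
print(stream.result)  # → Hello, world
```

The design question is whether `Stream` replaces the non-streaming return value entirely when `stream=True`, or hangs off the existing result type as a property.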
Regarding the ergonomics of programming libraries, I liked the approach taken by rust-genai (example: https://github.com/jeremychone/rust-genai/blob/d79f0f03c68c3af1c5638c3589469653eefd5edb/examples/c04-chat-options.rs#L41). We often want to receive an object/functor/iterator capable of iterating over chunks, but writing the same display loop every time while prototyping in the terminal is against DRY. So I find that libraries should also provide some ergonomic routine to print the stream to stdout/stderr.
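A sketch of what such a convenience routine could look like, assuming a Python-side helper (the name `print_stream` is hypothetical):

```python
import sys
from typing import Iterable, TextIO


def print_stream(chunks: Iterable[str], file: TextIO = sys.stdout) -> str:
    """Hypothetical helper: print each chunk as it arrives and return the
    full concatenated text, so prototyping code doesn't re-implement the
    same display loop every time."""
    parts = []
    for chunk in chunks:
        file.write(chunk)
        file.flush()  # show partial output immediately
        parts.append(chunk)
    file.write("\n")
    return "".join(parts)


# Usage while prototyping in a terminal:
text = print_stream(iter(["str", "eam", "ing"]))
```

Returning the accumulated text means the helper is useful beyond demos: you get the terminal display for free and still keep the final result.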
A couple of users have asked about a streaming API: specifically, a way to receive chunked output for text blocks and structured output.
How to make this work with tracking?
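One way tracking could coexist with streaming is a generator that passes chunks through untouched and only hands the completed text to the tracker once the stream is exhausted. This is a sketch under assumed names (`tracked_stream`, `on_complete`), not the library's actual tracking hook:

```python
from typing import Callable, Iterator, List


def tracked_stream(chunks: Iterator[str],
                   on_complete: Callable[[str], None]) -> Iterator[str]:
    """Hypothetical streaming-aware tracker: yield each chunk to the
    caller unchanged, then report the full text to `on_complete` when
    the stream ends. No buffering ahead of the consumer."""
    parts: List[str] = []
    for chunk in chunks:
        parts.append(chunk)
        yield chunk
    on_complete("".join(parts))


# Usage: the tracker records the final message after streaming finishes.
recorded: List[str] = []
for c in tracked_stream(iter(["a", "b", "c"]), recorded.append):
    pass
assert recorded == ["abc"]
```

This shape would fit the `track` vs `track_streaming` branch mentioned above: the non-streaming path records immediately, the streaming path records on exhaustion.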