Adding term to indicate a stream of data #1044
It might be interesting for the Discovery TF, since a TD / JSON-LD can be streamed: see here.
From today's TD call:
We should try to use subscription subprotocols whenever possible, because we can semantically describe the payload. Geopose may benefit from a subscription model even if it doesn't use the SSE subprotocol. In other cases, like pagination, multipart messages, or event logging, the stream may consist of text lines or arbitrary boundaries, and in these cases a hint is needed to set up the connection with chunked encoding and perhaps keepalive. With respect to this, I wonder whether the hint belongs in the protocol binding.
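As an illustration of the subscription approach mentioned above, a TD event form using the SSE subprotocol might look like the following sketch (the affordance name and href are made up for illustration):

```json
{
  "events": {
    "temperatureChange": {
      "data": { "type": "number" },
      "forms": [{
        "op": "subscribeevent",
        "href": "https://device.example/events/temperature",
        "subprotocol": "sse",
        "contentType": "text/event-stream"
      }]
    }
  }
}
```

Here the streaming nature of the connection is already conveyed by the subprotocol and contentType, so no extra hint would be needed.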
I think we have four scenarios:
Cases 1 and 4 do not need any hint. In case 1 the client does all the work, and applications can choose to either buffer everything or receive the chunks as they arrive. For point 3, I think it might be sufficient to look at the data schema of the response to understand that the data is chunked. On the other hand, we definitely need a hint for point 2, since the consumer would expect fully contained data in one single protocol message, but the server would send it chunked over a single connection. About point 3, I feel that the streaming tag would be nice to have, so that the client can promptly read the data in streaming mode. Notice that sometimes the streaming nature of the data is implicit in the protocol or subprotocol (consider, for example, an HTTP form that specifies HTTP Live Streaming as subprotocol).
Use case: JSON Streaming. See https://github.com/w3c/wot-testing/tree/main/events/2024.11.Munich/Documentation/Intel and the Ollama API, which streams multiple JSON objects in a response. Note, however, that in this particular API the use of streams is optional and indicated by a "stream" flag in the request payload, and termination is also given by a "done" flag in the response objects. However... if we have a "stream" flag in the TD, that would probably still work, since a stream with one element is still a stream, and termination can be indicated by the connection closing. So the stream flag in the request object, if used, just tells the server to stream; the proposed stream flag in the TD just tells the client (TD consumer) that the server MAY send multiple JSON objects before terminating the connection.
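To make the Ollama-style case concrete: the streamed application/json response discussed above would arrive as a sequence of newline-delimited JSON objects over one connection, roughly like this (field names follow the description above; the exact payloads are illustrative):

```json
{"response": "Hel", "done": false}
{"response": "lo!", "done": false}
{"response": "", "done": true}
```

A consumer that only sees the contentType application/json would expect a single document here, which is exactly why a hint is needed.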
I also assume we are only concerned with streaming responses FROM a Thing (acting as a server). The other direction might be interesting (i.e. streaming requests from clients...), but I propose we declare it out of scope for this issue. In this case the "stream" flag would only apply to responses, which is a bit asymmetric, but... We could however use the term "outputStream" for the flag (instead of just "stream") and reserve "inputStream" for possible future use. Also: just on actions? I could also see streaming reads from properties, which would send a new object periodically or when the property changes. Another thought: maybe we could use an "op" for this, like "readstream". But limiting the scope to actions would be ok and probably the right thing to do for simplicity. Although... rather than a "stream" flag on actions, we could also have an op like "invokestream".
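Under the naming floated above, a hypothetical action affordance could carry the flag like this (note that outputStream and invokestream are proposals from this issue, not standardized TD vocabulary; the affordance name and href are made up):

```json
{
  "actions": {
    "generateReport": {
      "input": { "type": "object" },
      "output": { "type": "object" },
      "outputStream": true,
      "forms": [{
        "op": "invokeaction",
        "href": "https://device.example/actions/generateReport"
      }]
    }
  }
}
```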
Other than a video stream, I'm not sure I fully understand the distinction between something that is a stream and something that isn't for the purposes of this new term. E.g. would Server-Sent Events connections (unidirectional) and WebSocket connections (bidirectional) count as streams, or are those cases where the hint is not necessary? Another use case I came across this week is Smart Core OS exposing an API using gRPC over HTTP/2, which notably supports bidirectional streaming. Regarding restricting the scope of streams to action affordances, I would actually have thought they were more common for observed properties and events?
A video stream is just a special kind of […]. The same could be argued for the Ollama API. Once we express relationships across affordances, properties can be used to control its explicit state (playing, paused).
From the F2F discussion I remember, as well as a use case in the WoT CG from @RobWin, it is more relevant for actions. You invoke an action, but the response is not data you get in one response; it is a stream of responses that doesn't have to be like a video stream but can be sent sporadically by the Thing. The current way to model it would be invoking an action and observing it.
I don't have a strong preference, but I think a property with […]
@egekorkan But an output stream with multiple messages is also needed. Like a chatbot which does not only return a single response, but multiple intermediate responses, which explain what it is doing, and then the final response. A single […] Right now we don't have a use case for streaming properties. But let's assume a property is something like a conversation history; then it might make sense to be able to stream it back to a consumer. But this could also be implemented with an action.
So, like in OpenAPI v3, we would need to be able to use multipart messages, but supported by multiple protocol bindings.
@lu-zero wrote:
@relu91 wrote:
For the record, in WebThings Gateway we model a video stream as a property with a […].
@lu-zero wrote:
See the discussion of a potential […].
@RobWin wrote:
This makes me very cautious about scope creep because my understanding is that Thing Descriptions are designed to describe physical devices with physical affordances, not to describe any piece of abstract software. I think we should be very cautious about extending the WoT information model in an attempt to describe affordances of software applications, unless there are also use cases in physical devices. Otherwise the problem space is completely unbounded and we will end up with an unusable mess. I would suggest that describing an AI software agent in general is out of scope for a Thing Description. That said, streaming audio to and from an IoT device is a perfectly valid use case.
If it were just a continuous audio stream in each direction, I think that could be modelled as two properties, one readOnly and one writeOnly. If the request and response were discrete audio files, then it could be modelled as an action input and output. If you want to stream a back-and-forth conversation over a single action affordance, that's definitely trickier. I don't know what protocol and content type you're using, but could there be an initial input stream which ends, then an output stream which stays open (broadcasting silence during pauses) until the full response is complete? Or does the input stream need to stay open as well? If so, could a new input be a new action request on the same action affordance? In general I think action affordances are built on the assumption that there's a discrete input and then a discrete output. If you want a bidirectional stream which is kept open for a back-and-forth conversation, I think that probably needs to be modelled a different way, and if this isn't already supported by an existing protocol, your best bet might actually be to define a new protocol or subprotocol. In some ways it's similar to what we're doing with the Web Thing Protocol WebSocket sub-protocol.
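The two-property modelling suggested above might be sketched as follows (property names, hrefs, and content type are illustrative assumptions, not taken from any spec):

```json
{
  "properties": {
    "audioIn": {
      "writeOnly": true,
      "forms": [{
        "href": "https://device.example/audio/in",
        "contentType": "audio/ogg"
      }]
    },
    "audioOut": {
      "readOnly": true,
      "forms": [{
        "href": "https://device.example/audio/out",
        "contentType": "audio/ogg"
      }]
    }
  }
}
```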
Then it's a device which you can talk to and it answers you. The potential of WoT (Web of Things) extends far beyond simply abstracting devices. At DT, we’ve successfully demonstrated this by abstracting both Weather Stations and Weather Services using WoT. While the end customer might not perceive much difference in the functionality, the impact on the system architecture was profound. By leveraging WoT, the system treats everything—whether a physical device or a cloud-based Web Service — as a WoT Thing. This abstraction eliminated the distinction between hardware and software services. You might not like it, but I opened another discussion: https://github.com/orgs/modelcontextprotocol/discussions/56 |
@RobWin wrote:
I'm afraid we're going to have to agree to disagree on this one. The scope of the Web of Things is already impossibly large, without extending it to not-things. General web services can easily be used in conjunction with the Web of Things without needing to model every service as a Thing. "When you have a hammer, every problem looks like a nail." |
Yes, unfortunately we have to disagree on this. Web Services are fragmented: different protocols, different communication patterns. WoT was perfectly suitable for us to harmonize access to Web Services as well, based on the properties/actions/events abstraction and forms. Much better than anything OpenAPI or AsyncAPI did. As said, we have experience integrating hundreds of device vendors, device types, and different web services, and I can tell you that WoT helped us a lot. It is sad to tell people they are doing it wrong when it works perfectly. And I'm really happy that @mmccool is looking into a similar direction with his Thing Description examples of AI services. The Web of Agents or Internet of Agents will come, and I would prefer it to be based on WoT rather than the API hell from OpenAI right now. But I don't want to distract from this valuable discussion about streaming. I gave my input because @egekorkan asked.
@RobWin wrote:
I'm not saying it doesn't work. There are plenty of examples of add-ons for WebThings Gateway which model not-things as Things, and I get why that's attractive when the tools are already available. I'm saying that it would be bad for the Web of Things to start adding features to specifications solely for use cases that fall outside of the scope of the Internet of Things. "The Web of Things (WoT) seeks to counter the fragmentation of the IoT", not the whole Internet. Scope creep is dangerous for any standard. But again, I'm also not saying that bi-directional streaming of audio should be out of scope either, just that we should frame it in terms of concrete IoT use cases. So circling back to the topic at hand... It seems like the […]
Yup, exactly, with the only difference that I would not bother adding […]
That's my understanding; see also #1044 (comment)
@relu91 wrote:
I agree; that may be leftover from some time in the distant past when the […]
@benfrancis wrote:
Does that property provide an unbounded resource?
There are many ways to model a playback system. You may use actions for the control, since the actual change of state could be observed via a connected monitor (the playback process as a whole is a hidden state). Or you may actually use the same control while the system is broadcasting, and consumers may subscribe to the specific stream.
+1, and I would add on top that actions are good for modelling something that has a hidden state that you cannot touch directly and that may evolve over time.
@RobWin wrote:
What you are doing is sending a discontinuous amount of information, so you can feed it in using either a very odd write-only property or an action of kind […]. That it is a "stream" is just a protocol detail, IMHO. The data schema would always be an Array of something.
Problem
The new Scripting API allows processing data that is sent as a stream, instead of as a bulk payload that can be parsed instantly. The stream is sometimes obvious from the protocol or contentType (e.g. a video stream), but it may not be obvious, since HTTP also allows streaming and it is possible to send an application/json payload as a stream.

Proposal
Thus, the consumer needs to know whether values will arrive as a stream or not. Adding a term like stream that can have a boolean value would solve the problem and remove the need for out-of-band information. It would be better to find a widely used term to describe such cases.

Analogy
JSON can be sent in different encoding formats, and this is currently indicated by the contentCoding term in the TD spec. Not having this term would result in either saying that […]

Note: This use case is not limited to the Scripting API.
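For reference, the contentCoding analogy above refers to an existing TD form field; a form using it might look like this (the href is illustrative):

```json
{
  "forms": [{
    "href": "https://device.example/properties/log",
    "contentType": "application/json",
    "contentCoding": "gzip"
  }]
}
```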
Note 2: Should this be opened in the use cases repo first, @mlagally?
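A minimal sketch of the proposal, assuming a boolean stream term at the form level (this term is hypothetical, as are the affordance name and href):

```json
{
  "actions": {
    "chat": {
      "input": { "type": "string" },
      "output": { "type": "object" },
      "forms": [{
        "op": "invokeaction",
        "href": "https://device.example/actions/chat",
        "contentType": "application/json",
        "stream": true
      }]
    }
  }
}
```

A consumer seeing stream: true would know to read the response incrementally rather than waiting for a single complete payload.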