
Review ABI vNEXT. #1

Closed
wants to merge 4 commits

Conversation

PiotrSikora (Member)

Signed-off-by: Piotr Sikora [email protected]

@mattklein123 (Collaborator) left a comment

Thanks for this. Very well done and I appreciate the work you have put into the docs. I left some comments below. Happy to hop on a quick call about some of the more contentious points. I think my high level comments are:

  1. We should document the key enum values in this doc with full API docs.
  2. I still find the overloaded context concept to be incredibly confusing (vs. having discrete context types). Assuming we stick with that (which I assume we will), can we have a section in the doc that talks about context types and especially how it all flows together with proxy_on_context_create?

Thank you, very excited to see this land upstream!

@mattklein123 (Collaborator)

cc @htuch to take a look at this also.

@jmarantz left a comment

One other meta-point from discussions with @htuch -- this ABI has the potential to be impactful beyond WASM. At some point (this may take a while) it would be great to have dynamically loadable Envoy filters (see Apache HTTPD) and for that we need a stable ABI, possibly accessed via a C API. No one wants to tackle that right now as Envoy is moving too fast. But it's useful to think about how to make it possible in the future.

Perhaps this puts too much of a burden on this as a pure WASM solution, and that should be tackled in a layer between WASM and native Envoy. But I wanted to put that on the radar, in addition to asking what the future of this ABI would be in a non-Envoy proxy (which appears to be a goal).

jplevyak previously approved these changes Apr 7, 2020
@jplevyak

@PiotrSikora can you give an update?

@jplevyak

With no action in 4 months, this PR is abandoned. Closing it.

@jplevyak closed this Jul 16, 2020
@PiotrSikora reopened this Jul 17, 2020
@PiotrSikora (Member Author)

I believe most (all?) comments were addressed, but some things might have been lost in the rewrite, so please take a look. Thanks!

@jplevyak self-requested a review July 17, 2020 17:59
@jplevyak dismissed their stale review July 17, 2020 18:00

New commit has invalidated my approval.

@PiotrSikora (Member Author)

@mattklein123 @htuch @yskopets @yuval-k @gbrail @jplevyak @kyessenov could you please review the latest iteration of the ABI spec? Thanks!

Returns `InvalidMemoryAccess` when `message_data` and/or `message_size` point to
an invalid memory address.

### Callbacks optionally exposed by the Wasm module


I think it's worth clarifying to what extent proxy_on_memory_allocate is optional.

Without this function, many of the essential host APIs won't work, e.g. proxy_get_buffer, proxy_get_map_values, etc.

Member Author

Fair enough, this one isn't really optional.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I would have a required section and be clear that a module will fail to load unless it implements all required exports.
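
For illustration only (not normative text for the spec), a required allocator export in a C-toolchain module could look like the sketch below; the export attribute is a toolchain detail, and the signature is inferred from the proxy_on_memory_allocate description quoted later in this review:

```c
#include <stdlib.h>

/* Module-side allocator the host calls whenever it needs to copy data into
 * the VM's linear memory (e.g. for proxy_get_buffer results). */
__attribute__((export_name("proxy_on_memory_allocate")))
void *proxy_on_memory_allocate(size_t memory_size) {
  /* Returning 0 (NULL) indicates allocation failure to the host. */
  return malloc(memory_size);
}
```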

that will be used by the host environment in subsequent calls. Returning `0`
retains the original context identifier.

#### `proxy_on_context_finalize`


Is this function a replacement for the former proxy_on_log?

At what moment will proxy_on_context_finalize be called in the context of an Http Filter, Network Filter, or Access Logger? (e.g., before / after access loggers)

With the Access Logger extension in mind, does this function imply there must be a separate "Wasm context" for every HTTP request/TCP connection to be able to log them?


Right, where is on_log?

Member Author

There was a strong pushback against having a generic proxy_on_log in the previous round of reviews, so it was merged together with proxy_on_done and proxy_on_delete into proxy_on_context_finalize.

One alternative could be to switch to specialized context functions (see: #9), and we could add proxy_on_http_log then... but there is no reason not to log in the "finalize" function.


I understand the motivation to exclude proxy_on_log from the lifecycle of Http Filter / Network Filter.

But why not have this callback be implemented by Access Logger extensions?

#### `proxy_on_vm_start`

* params:
- `i32 (uint32_t) vm_id`


How is the value of vm_id supposed to be used? Or is it the context_id of a VM?

Member Author

Yes, it's the context identifier, just named more accurately to make it clear what it represents, instead of having a generic context_id in each function.

#### `proxy_on_plugin_start`

* params:
- `i32 (uint32_t) plugin_id`


How is the value of plugin_id supposed to be used? Or is it the context_id of a Plugin?

Member Author

Yes, it's the context identifier, just named more accurately to make it clear what it represents, instead of having a generic context_id in each function.

should be forwarded upstream and that this extension shouldn't be called again
for this HTTP request.
- `Pause` instructs the host environment to pause processing until
`proxy_resume` is called.


There is no proxy_resume function in this spec.

Member Author

This should be proxy_http_resume.

* returns:
- `i32 (proxy_result_t) call_result`

Removes an item from the tail of the queue (`queue_id`). The returned value is written


What will be returned if the queue is empty?

Member Author

Oops, it looks like I accidentally removed it during the rewrite. It should return Empty.

- `i32 (uint32_t) new_context_id`

Called when the host environment creates a new context (`context_id`) of type
`context_type` (e.g. `VmContext`, `PluginContext`, `StreamContext`,


Is this spec supposed to cover Access Logger extensions?

Do Access Logger extensions have a dedicated context type?

Member Author

No, since logging can happen in the "finalize" calls.

Handling access logs typically involves buffering, timeouts, and HTTP/gRPC calls. All of these actions make less sense in the context of an individual HTTP request / TCP connection than in the context of a dedicated Access Logger extension.


### Callbacks optionally exposed by the Wasm module

#### `proxy_on_context_create`


Is it possible to use the proxy_on_context_create callback to achieve a behaviour where an extension can have multiple instances within a VM, each with a different configuration?

E.g.,

  • 1 Envoy Listener with HttpConnectionManager and 2 instances of the same Http Filter extension but different configs
  • 2 different Envoy Listeners, each with HttpConnectionManager and 1 instance of the same Http Filter extension but different configs

Member Author

Yes.
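
To illustrate how a module might support that, here is a hedged sketch: configuration is kept per plugin_id so each instance of the extension carries its own settings. The plugin_configuration_size parameter and the boolean return are assumptions made by analogy with proxy_on_vm_start; only the plugin_id parameter appears in the excerpt above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-instance settings, parsed from the plugin configuration. */
struct plugin_cfg {
  bool log_bodies;
};

#define MAX_PLUGINS 16
static struct plugin_cfg g_cfg[MAX_PLUGINS]; /* toy storage keyed by plugin_id */

/* Assumed signature; not confirmed by the excerpts in this review. */
bool proxy_on_plugin_start(uint32_t plugin_id, size_t plugin_configuration_size) {
  (void)plugin_configuration_size;
  if (plugin_id >= MAX_PLUGINS) {
    return false; /* reject configuration we cannot track in this toy example */
  }
  /* A real module would fetch the configuration bytes (e.g. via proxy_get_buffer)
   * and parse them here; each plugin_id keeps its own parsed result. */
  g_cfg[plugin_id].log_bodies = true; /* placeholder for real parsing */
  return true;
}
```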

connection is closed.


## Custom extension points


Would it be possible to describe an example use case?

Collaborator

+1 not totally clear how this would be used. An example would be great.


### Callbacks optionally exposed by the Wasm module

#### `proxy_on_custom_callback`
@yskopets Aug 19, 2020

Is it possible to call back into a Plugin context / Http Stream context / TCP stream context?

How can an extension understand for what context this callback is intended?

Member Author

It is, but the list of parameters to custom callbacks is not part of this spec.


### Callbacks optionally exposed by the Wasm module

#### `proxy_on_new_connection`


Can you specify the state machine of the callbacks? E.g.

new_connection (downstream_data | upstream_data)* (downstream_close upstream_data* upstream_close | upstream_close downstream_data* downstream_close)

Also, specify the interleavings with configuration callbacks.

@gbrail left a comment

I thought I just had a few comments but I was able to remember a ton of things that confused me initially.

Thanks -- there are a bunch of good clarifications in this version and some simplifications of formerly-confusing things like the return codes and the three different ways for a context to be "done".

I'm a bit worried that the arrays of iovecs in some places are going to make the raw ABI harder to use for the sake of efficiency in some cases, but if others think they're important then people will figure it out.


Adding new enum values requires increasing the patch version.

Wasm modules should export `proxy_abi_version_<major>_<minor>_<patch>` function, indicating which

I think that "Wasm modules must export..." otherwise they will not work, correct?

Member Author

Correct.
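
For concreteness, a hedged sketch of that marker export as a C toolchain might emit it; the version digits below are made up and the export attribute is a toolchain detail, not part of the ABI:

```c
/* A module compiled against a hypothetical ABI 0.3.0 would export this symbol.
 * The export's name alone is presumably enough for the host to detect the
 * targeted ABI version; the body does nothing. */
__attribute__((export_name("proxy_abi_version_0_3_0")))
void proxy_abi_version_0_3_0(void) {}
```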

Wasm modules should export `proxy_abi_version_<major>_<minor>_<patch>` function, indicating which
version of the ABI they were compiled against.

Host environments should be able to support multiple versions of the ABI.

If this is "should," then what should a host do if it does not? "Must" it return a readable error and refuse to start the module?

Member Author

Correct.

* returns:
- `i32 (proxy_result_t) call_result`

Logs message (`message_data`, `message_size`) at the log level (`log_level`).

Any guidance on the data? "Should" it be UTF-8?

Member Author

It should, but should we be enforcing this in the spec? Do we need to define this for all "string" types, then?
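
As a usage illustration only, a module-side call might look like the sketch below. The import attributes, the "env" import module name, and the exact parameter order are assumptions based on the prose above; the message is UTF-8 text as discussed:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed host import; parameters follow the description above:
 * log level, message pointer, message length. */
__attribute__((import_module("env"), import_name("proxy_log")))
int32_t proxy_log(int32_t log_level, const char *message_data, size_t message_size);

void log_startup(void) {
  static const char msg[] = "module started"; /* UTF-8 text */
  proxy_log(/*log_level=*/2, msg, sizeof(msg) - 1); /* level value is illustrative */
}
```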

module, which is expected to free this memory when it no longer needs it.

SDKs implementing this ABI are expected to automatically handle freeing of the memory-managed data
for the developers.

Do you want to say anything about the data that the Wasm module sends TO the host? For instance, when calling proxy_log, can the module assume that the host takes ownership of the data passed and that the caller is free to free it when the call returns?

Member Author

But the host doesn't take ownership of the data referenced in the hostcalls.

Yes, the caller is free to free all parameters as soon as the function returns. This is the standard behavior for virtually (literally?) any programming language, so I'm not sure if that's something worth pointing out... it might result in more confusion than anything else, IMHO.

implicitly created in `proxy_on_vm_start`, `proxy_on_plugin_start`,
`proxy_on_new_connection` and `proxy_on_http_request_headers` callbacks.

Return value (`new_context_id`) can be used to assign new context identifier

Dumb question but are all 32 bits of "new_context_id" always required to be uninterpreted by the host? Like, could a filter return a pointer to memory in its own "address space?"

Member Author

Yes, the custom context identifier is opaque to the host, and using it to point to the context object inside the WasmVm is the intended use case for it (this was requested by @yuval-k)... But it feels a bit unsafe to me, to be honest.
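
A hedged sketch of that pattern follows: the module returns the address of its own context object and later casts it back. The parameter list (context_id plus context_type) is pieced together from the excerpts in this review and may not match the final signature:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-stream state owned entirely by the module. */
struct stream_ctx {
  uint32_t bytes_seen;
};

uint32_t proxy_on_context_create(uint32_t context_id, uint32_t context_type) {
  (void)context_id;
  (void)context_type;
  struct stream_ctx *ctx = calloc(1, sizeof(*ctx));
  if (ctx == NULL) {
    return 0; /* 0 retains the host-assigned context identifier */
  }
  /* On wasm32 a pointer fits in 32 bits, so later callbacks can recover the
   * object with a cast. This only works if the host echoes the value back
   * untouched, which is exactly the "a bit unsafe" concern above. */
  return (uint32_t)(uintptr_t)ctx;
}
```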


Returns `NotAllowed` when requests to given upstream are not allowed.

Returns `InvalidMemoryAccess` when `upstream_name_data`, `upstream_name_size`,

FWIW I learned that Envoy will return an error in this call if the ":path", ":method", and "host" headers aren't set. In general we should say something about what the various header fields mean to the host. Do we want to document that?

Member Author

Yes, thanks.

#### `proxy_dispatch_grpc_call`

* params:
- `i32 (const char*) upstream_name_data`

So in this version can this just be the name of the upstream? As discussed elsewhere this is currently a gnarly protobuf.

Member Author

Correct.

Returns `Ok` on success.

Returns `NotFound` for unknown service (`service_name_data`,
`service_name_size`).

Doesn't that error come back from the upstream gRPC server? In that case would the error really be there or in the response? OTOH, I imagine that an invalid upstream name would immediately return an error.

Member Author

Right, this should be for an unknown upstream.

Dispatch a call to a gRPC service (`service_name_data`, `service_name_size`)
using upstream (`upstream_name_data`, `upstream_name_size`).

Once the response is returned to the host, `proxy_on_grpc_call_response` and/or

Do we mean proxy_on_grpc_call_response (which I can't find) or something else?

Member Author

I meant proxy_on_grpc_call_response_message, although maybe we should have an "all-in-one" proxy_on_grpc_call_response similar to proxy_on_http_call_response and leave proxy_on_grpc_call_response_{header_metadata,message,trailer_metadata} only for the gRPC stream? What do you think?

* returns:
- `i32 (proxy_result_t) call_result`

Returns map containing key-values for keys from the provided list of vectors

If this map is going to be used for HTTP headers, then do we want to standardize what names are there? Envoy has a particular set of system-defined headers like :path, :method, etc, and the rest are just lower-cased versions of the HTTP header. Should the spec discuss this so that people can write modules that can theoretically support multiple hosts?

Member Author

:path, :method and other pseudo-headers come from the HTTP/2 transport, not Envoy. But yes, we should clarify that, since exposing those together with other headers is unique to Envoy.
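
Purely as an illustration of the point (not spec text), the keys a module might see in a request header map on an HTTP/2-capable host such as Envoy look like this, with pseudo-headers coming from the HTTP/2 transport and ordinary header names lower-cased; the values are made up:

```c
/* Illustrative example only. */
static const char *example_request_headers[][2] = {
    {":method", "GET"},
    {":path", "/status"},
    {":scheme", "https"},
    {":authority", "example.com"},
    {"user-agent", "curl/7.88.1"},
};
```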

@PiotrSikora (Member Author) left a comment

Thanks for the feedback. I responded in comments for now, but I will add the necessary clarifications and changes to the document tomorrow.

Note: `downstream` means the connection between client and proxy, and `upstream`
means the connection between proxy and backend (aka “next hop”).

Note: The same unique context identifier (`stream_id`) is shared by a pair of
Member Author

It's the context identifier, just named more accurately to make it clear what it represents, instead of having a generic context_id in each function.

Return value (`next_action`) instructs the host environment what action to take:
- `Continue` means that this HTTP request should be accepted and that HTTP
headers should be forwarded upstream.
- `EndStream` means that this HTTP request should be accepted, that HTTP headers
Member Author

Yes, "afterwards" mean "after the headers are sent".

It's used to set end_stream=true in the HTTP/2 HEADERS frame, since we don't have any other mechanism for doing that, which can be used e.g. when you're converting GET request to HEAD request and want to forward stream without forwarding original request body.

extension shouldn't be called again for the current stream.
- `Pause` means that further processing should be paused until
`proxy_resume_stream` is called.
- `WaitForMoreData` means that further processing should be paused and that host
Member Author

We don't have that right now. I can add proxy_send_bytes_downstream if you want.


### Callbacks optionally exposed by the Wasm module

#### `proxy_on_queue_ready`
Member Author

It's not correlated with any context_id, since a queue is a standalone object.

proxy_open_shared_queue returns a unique queue_id handle, which is used in proxy_on_queue_ready.

Always notifying all WasmVms seems wasteful, so we should do some kind of load-balancing (probably with ACK/NACK), but I guess we could declare each queue as either unicast or multicast upon creation, if you have a use case for it?

Returns `InvalidOperation` when `metric_type` doesn't support incrementing
and/or decrementing values.

#### `proxy_delete_metric`
Member Author

In theory, it's here to remove metrics from the host and avoid resource leaks, but I'm not sure how useful this will be in practice. It probably should match the number of proxy_create_metric calls, but I could be convinced otherwise.

@yuval-k commented Aug 20, 2020

I don't have any further comments.
Thank you for making this happen!

connection should be closed afterwards.
- `Done` means that data chunk should be forwarded upstream and that this
extension shouldn't be called again for the current stream.
- `Pause` means that further processing should be paused until


What is the difference between Pause and WaitForMoreData?

What should happen to data that arrives while the filter is in the Pause state?

Collaborator

Also, it would be good to comment on watermarking/buffering as part of this.

- `Done` means that this HTTP request should be accepted, that HTTP headers
should be forwarded upstream and that this extension shouldn't be called again
for this HTTP request.
- `Pause` instructs the host environment to pause processing until


What should happen to data that arrives while the filter is in the Pause state?

Return value (`next_action`) instructs the host environment what action to take:
- `Continue` means that this HTTP request body chunk should be forwarded
upstream.
- `EndStream` means that this HTTP request body chunk should be forwarded


What is the use case for this?

It invalidates the Content-Length header that has already been forwarded upstream.

upstream and that the underlying stream should be closed afterwards.
- `Done` means that this HTTP request body should be forwarded upstream and that
this extension shouldn't be called again for this HTTP request.
- `Pause` means that further processing should be paused until `proxy_resume` is


What is the difference between Pause and WaitForMoreData?

What should happen to data that arrives while the filter is in the Pause state?


WaitForMoreData would call this callback when more data arrives, whereas Pause would swallow that event until the wasm filter resumed.

I assume this maps to StopAllIterationAndBuffer, rather than StopIteration (which is kind of the Envoy default) or StopAllIterationAndWatermark. Might be worth clarifying?

Collaborator

In general it seems like we are missing the options around watermarking vs. buffering. I think that is needed. I commented on this elsewhere.

this extension shouldn't be called again for this HTTP request.
- `Pause` means that further processing should be paused until `proxy_resume` is
called.
- `WaitForMoreData` means that further processing should be paused and host


Would it be possible to document the complete workflow with regard to buffered data?

E.g.,

  • if a filter returns WaitForMoreData, what data exactly will be available in the context of the next call to proxy_on_http_request_body?
    • Only new data since the last call?
    • A single buffer with both old and new data together?
    • Only old data in the buffer? (surprisingly, the current behaviour of envoy-wasm)

Context:

  • I've noticed that the current implementation (envoy-wasm) implements a different workflow with regard to buffered data:
    • native Envoy extensions have access both to the new chunk of data and the data buffered previously; Proxy Wasm extensions can see only one of those at a time
    • when a native Envoy extension returns StopIterationAndBuffer, next time it will be called with the new chunk of data; Proxy Wasm extensions get called next time with the data in the buffer (which doesn't include the new chunk yet)
    • native Envoy extensions observe similar behaviour for StopIterationAndBuffer and StopIterationAndWatermark; Proxy Wasm extensions observe different behaviour

If Proxy Wasm needs a different model for buffered data than Envoy, it should be documented.

* returns:
- `i32 (proxy_result_t) call_result`

Resume processing of paused downstream or upstream (`stream_type`) connection


Would it be possible to document what exactly should happen on a call to proxy_resume_stream?

E.g.,

  • in what states is it valid to call this function? Pause / WaitForMoreData / WaitForEndOrFull
  • will proxy_on_upstream_data/proxy_on_downstream_data be called again with all data in the buffer? Will behaviour be different for Pause / WaitForMoreData / WaitForEndOrFull?

* returns:
- `i32 (proxy_result_t) call_result`

Resume processing of paused HTTP request or response (`stream_type`) in current


Would it be possible to document what exactly should happen on a call to proxy_resume_stream?

E.g.,

  • in what states is it valid to call this function? Pause / WaitForMoreData / WaitForEndOrFull
  • will proxy_on_request_body/proxy_on_response_body be called again with all data in the buffer? Will behaviour be different for Pause / WaitForMoreData / WaitForEndOrFull?

- `i32 (bool) is_done`

Called when the host environment is done processing context (`context_id`).
Any finalization (e.g. logging) should be done within this callback.


get_property was also used by SDKs to get ahold of Root ID and implement support for multiple extensions packed inside a single Wasm module.

@htuch left a comment

Looks great, this is a very readable spec. Since I haven't spent any time writing Wasm yet, my comments reflect those of a new reader and someone coming from the Envoy world; clarifying these would help make the ABI clearer to folks with this perspective.


The ABI is in the form of: `<major>.<minor>.<patch>`

Breaking changes require increasing the major version.

You might want to clarify somewhere what is meant by a breaking change. One experience we've had from xDS is that distinguishing between build and runtime breakage is important (I imagine it's both in this case). Also, bug fixes can be pretty ambiguous, e.g. a fix that introduces the "correct" behavior as per the specification, but where the existing implementation(s) all followed the "incorrect" behavior can be a challenge to handle.

## Naming conventions

All functions are using `snake_case`, and they start with the `proxy_` prefix. Implementation
specific extensions should use different prefixes (e.g. `envoy_`, `nginx_`, `ats_`, etc.) in order

@markdroth will we ever have Wasm gRPC interceptors? Have you folks been tracking these ABI specs? This is another purportedly multi-client ABI/API that might benefit from gRPC input if you plan on supporting it at some point in the future.


## Memory ownership

No memory-managed data is passed to Wasm module in any of the event callbacks, and Wasm module must

I think the term "memory-managed data" needs to be spelled out in a specification like this. It means different things in different situations, e.g. depending on who is managing the data, whether you have GC, etc. I think the point you want to make is that ownership of all memory that participates in ABI calls belongs to the Wasm code; it's allocated/freed by it. So, you might want to say that "No host managed memory is referencable by a Wasm module" or something like that as well. This is all semantics, but I think it makes the spec more readable.


Allocates a contiguous memory buffer of size `memory_size` using the in-VM memory
allocator and returns the address of the memory buffer to the host environment using
`memory_data`. Returning `0` indicates failure.

Are you using 0 or null in this spec for pointers?

shouldn't be called again for the current stream.
- `Pause` means that further processing should be paused until
`proxy_resume_stream` is called.
- `Close` means that this connection should be rejected.

Out of curiosity, in Envoy we handle these via a callback model, which is strictly more general as you can also do things like create local replies. What is the rationale for moving to return values?


Actually, I see why this applies for proxy_on_new_connection, but not for proxy_on_downstream_data.

- `Continue` means that this connection should be accepted.
- `Done` means that this connection should be accepted and that this extension
shouldn't be called again for the current stream.
- `Pause` means that further processing should be paused until

Have you thought about how watermarking, flow control and/or other buffer management is going to work in Wasm? The Envoy experience is that this is hard to add after the fact and would best be designed from the get-go. @jplevyak @mattklein123 @alyssawilk.

Collaborator

Last we talked about this the API was going to replicate all of the filter return codes we already have, so watermarking should be included, though I haven't read that far yet.


### Functions exposed by the host environment

#### `proxy_dispatch_grpc_call`

Why is there a difference in the ABI between a call and a stream? Why can't we just treat everything like a stream and let higher level code deal with the distinction?


The only difference I think is in terms of ideas like timeout, which the SDK can potentially handle itself.


### Functions exposed by the host environment

#### `proxy_open_shared_kvstore`

Why can't the kvstore be used to model headers? It seems a bit awkward to have two different ABI mechanisms for key-values.

* returns:
- `i32 (proxy_action_t) next_action`

Called when HTTP response headers for stream `stream_id` are received from

Are you planning on explicitly modeling 1xx in the ABI?

@alyssawilk left a comment

Very cool to see this moving forward!

Got through up to the L7 section - some thoughts from my end

portability.

Function names are in the form of: `proxy_<verb>_<namespace>_<field>_<field>` (e.g.
`proxy_on_http_request_headers`, `proxy_get_map_value`, etc).


Super nitty, but "on" is a preposition or adverb, not a verb?

when its internal buffers are full.
- `Close` means that this connection should be closed immediately.

#### `proxy_on_downstream_close`


I assume this is full close, and half close is indicated by end stream?

Return value (`next_action`) instructs the host environment what action to take:
- `Continue` means that data should be forwarded downstream.
- `EndStream` means that data should be forwarded downstream and that this
connection should be closed afterwards.


Envoy has 3 close types: NoFlush (which seems to correspond to Close); FlushWrite, which flushes the data and then closes the socket once it's all written; and FlushWriteAndDelay, which handles the case where there's bidirectional data and we want to leave time for the peer to receive the FIN, since the peer might be sending data and if you immediately close the connection it could get its connection RST and not actually receive the data.

Do we want to allow all 3 from wasm? Else I'm not sure if Done maps to FlushWrite or FlushWriteAndDelay.

Collaborator

I commented on this above. It's not super clear to me that closing should be a return code from the filter vs. a callback API, but I agree we need to be able to do all of these types.

* returns:
- `i32 (proxy_result_t) call_result`

Close downstream or upstream (`stream_type`) connections in current context.


see comments above about Envoy close types

Collaborator

And if we have this why do we need the return codes for end stream, etc.? Can we just have a stop iteration return code?

* returns:
- `i32 (proxy_action_t) next_action`

Called for each chunk of HTTP request body for stream `stream_id` received from


"chunk" in the context of HTTP could be confused with HTTP/1.1 chunks. Rephrase or clarify?


* params:
- `i32 (uint32_t) stream_id`
- `i32 (size_t) body_size`


Body size, or the size of this chunk?

- `EndStream` means that HTTP trailers should be forwarded downstream and that
the underlying stream should be closed afterwards.
- `Done` means that HTTP trailers should be forwarded downstream and that this
extension shouldn't be called again for this HTTP response.


Are EndStream and Done redundant for trailers calls?

- `EndStream` means that this HTTP request body chunk should be forwarded
upstream and that the underlying stream should be closed afterwards.
- `Done` means that this HTTP request body should be forwarded upstream and that
this extension shouldn't be called again for this HTTP request.


I think for HTTP bodies specifically it would be useful to have DoneWithData.

I could easily imagine filters which don't want to process body data (or incur the context switch to even be informed about it) but do want to interact with trailers (and maybe metadata).

Returns `InvalidMemoryAccess` when `response_code_details_data`,
`response_code_details_size`, `response_body_data`, `response_body_size`,
`additional_headers_map_data` and/or `additional_headers_size` point to invalid
memory address.


I don't think we've added the API yet but it's on my todo list to allow trySendLocalReply on the encode path (see comments in sendLocalReply for the 3 best-effort modes)

Given wasm APIs are supposed to be stable, it's worth considering starting with this from the get-go.

@mattklein123 (Collaborator) left a comment

Thanks, very excited to see this happening. Flushing out some comments from a first pass.

- `i32 (proxy_result_t) call_result`

Indicates to the host environment that the Wasm module finished processing
current context. It should be used after previously returning `false` in
Collaborator

nit: you might want to mention proxy_set_effective_context which I assume might need to be called if the contexts have switched before finalizing.

Called when the host environment starts WebAssembly Virtual Machine (WasmVm).

Its configuration (of size `vm_configuration_size`) can be retrieved using
`proxy_get_buffer`.
Collaborator

FWIW I find this proxy_get_buffer dance to be a bit magical from an API design perspective. What if in the future we have two buffers that we want to get? IMO for this type of thing I think it would be much cleaner and more future proof to pass an opaque_handle that could then be passed back to proxy_get_buffer, or a size and handle if that makes more sense. This would make it very explicit what the user is fetching if we ever need to add the ability to fetch multiple things in the context of an API call.


+1. I was going to comment on this, but I realized that for the practical use cases in the existing ABI the key/value method seemed to work fine. But I suspect that the proposal here will future-proof things and make the ABI a little more regular.
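
To make the two shapes being compared concrete, a purely hypothetical sketch (every signature below is simplified and invented for illustration; none of it is spec text):

```c
#include <stddef.h>
#include <stdint.h>

/* Shape in the current draft: the callback passes only a size, and the module
 * fetches the bytes by naming a well-known buffer type. */
int32_t proxy_on_vm_start(uint32_t vm_id, size_t vm_configuration_size);
int32_t proxy_get_buffer(int32_t buffer_type, size_t offset, size_t max_size,
                         const char **data, size_t *size);

/* Shape suggested above: the host hands over an opaque handle (plus size),
 * and the module passes that handle back, so a single callback could expose
 * several distinct buffers without adding new buffer_type values. */
int32_t proxy_on_vm_start_v2(uint32_t vm_id, uint32_t config_handle,
                             size_t config_size);
int32_t proxy_get_buffer_by_handle(uint32_t handle, size_t offset, size_t max_size,
                                   const char **data, size_t *size);
```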

Comment on lines +216 to +217
- `EndStream` means that data chunk should be forwarded upstream and that this
connection should be closed afterwards.
Collaborator

What does this mean in terms of half duplex? It's a bit odd to me that this is a return code. Would it be better to have this happen via a callback on the connection?


Returns `InvalidOperation` when stream wasn't paused.

#### `proxy_close_http_stream`
Collaborator

Similar comment: if we have this, why do we need the special return codes from the filter callbacks? Seems like it would be simpler to reduce the return codes there and just have people call this before they stop iteration or whatever, though I don't feel strongly about it.

Comment on lines +599 to +600
slices pointed by a list of vectors (`return_buffer_data`,
`return_buffer_size`).
Collaborator

Can you clarify this? The memory is allocated by the callout via the allocate API, right? I'm not completely clear on what the in/out params are here and how this works. I think more text would help.


Sets content of the buffer `buffer_type` to the values from memory slices
pointed by the provided list of vectors (`buffer_data`, `buffer_size`),
replacing `size` bytes in the existing buffer, starting at the `offset`. Host
Collaborator

Can you add this type of thing into the spec? It would help similar to my comments above.

`proxy_create_timer`.


## Stats/metrics
Collaborator

Where did we land on the differences between counters/gauges/histograms? Do we need a histogram API? It's not clear to me from reading the spec how a generalized API makes sense; it seems like it should be clearer about which APIs are supported on which types and will otherwise return an error.

@seblaz commented Dec 1, 2021

Hi @PiotrSikora, is there something I can do to help you with these changes?
