Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ext_proc: potential changes to attributes in request #32125

Closed
rshriram opened this issue Jan 31, 2024 · 30 comments
Closed

ext_proc: potential changes to attributes in request #32125

rshriram opened this issue Jan 31, 2024 · 30 comments
Labels
enhancement Feature requests. Not bugs or questions.

Comments

@rshriram
Copy link
Member

rshriram commented Jan 31, 2024

Today, we can send attributes in the ProcessingRequest (thanks to @jbohanon 's work). However, there are some gaps here that we can fix if we act quickly.

We send attributes as part of request header event only(https://github.com/envoyproxy/envoy/blob/main/api/envoy/service/ext_proc/v3/external_processor.proto#L212). But not as part of the body. This is assuming that someone will always do the headers, store the attributes and use it for body. It would be better to remove (deprecate/reserved) the attributes in the ProcessingRequest's request field (its not even published so it should be okay to reserve that field https://github.com/envoyproxy/envoy/blob/main/api/envoy/service/ext_proc/v3/external_processor.proto#L83) and add a new attributes field at the top level in ProcessingRequest

message ProcessingRequest {
 one_of { }
config.core.v3.Metadata metadata_context = 8;
map<string, google.protobuf.Struct> attributes = 9; // NEW
}

Doing so allows one to forward the attributes regardless of the ext_proc event.

Secondly, we also need a way to add additional header/value pairs for traffic sent to the ext_proc service. These are headers that are not part of the client request to Envoy and not meant for the regular upstream. Something along the lines of this change to the ExtProc filter is required to achieve this functionality

message ExternalProcessor {
  repeated string request_attributes = 5;

  // Envoy provides a number of :ref:`attributes <arch_overview_attributes>`
  // for expressive policies. Each attribute name provided in this field will be
  // matched against that list and populated in the response_headers message.
  // See the :ref:`attribute documentation <arch_overview_attributes>`
  // for the list of supported attributes and their types.
  repeated string response_attributes = 6;

  map<string, google.protobuf.Struct> static_attributes = 7; // NEW. 
@rshriram rshriram added enhancement Feature requests. Not bugs or questions. triage Issue requires triage labels Jan 31, 2024
@rshriram
Copy link
Member Author

cc @htuch @tyxia

@rshriram
Copy link
Member Author

rshriram commented Feb 1, 2024

@jbohanon as well (sorry I missed this).

@htuch
Copy link
Member

htuch commented Feb 1, 2024

I think the semantics we are after is that we send attributes (all of them) on exactly one message, the first message sent from Envoy to the server. No matter what lifecycle event it is (headers, first body chunk). Does this make sense to folks?

The alternative is we have a separate OOB metadata for this (e.g. using gRPC initial metadata). The reason I prefer attributes is that there is conceptual economy (when to use an attributes vs. metadata for example?) and also that we plan other ext_proc transports than gRPC, e.g. HTTP. For HTTP there is a fairly obvious mapping of initial metadata (it's just outer headers), but TBH it feels like we're mixing transport and data model unnecessarily.

@htuch
Copy link
Member

htuch commented Feb 1, 2024

I think there are actually two independent issues here:

  1. How do we sensibly optimize sending attributes for ext_proc so it only happens on the first message, including when request headers are not being sent over ext_proc? The idea of sending attributes on first message makes sense to me here.
  2. As discussed in ext_proc: allow override metadata without grpc_service #31544 with @sitano, there is a question on where things like authentication tokens go - is it initial metadata or is it attributes? It seems that we can provide semantics where Envoy attributes can be augmented with some static attributes and not use gRPC specific mechanisms. This becomes interesting when mapping attributes to other environments like ProxyWasm as well.

For (2) and authentication tokens, probably initial metadata is fine in practice, since this is really an RPC property, but when it becomes things like metadata (e.g. the config name/version being used on the LB) this probably matters across different environments. Probably a mix is fine.

Interested in hearing others' takes on (2). For (1) that seems less open ended.

@htuch
Copy link
Member

htuch commented Feb 1, 2024

Pinging @gbrail @mpwarres as well for thoughts.

@rshriram
Copy link
Member Author

rshriram commented Feb 1, 2024

Whichever way we go, @jbohanon can we pleaes make sure that you folks dont deploy your last PR to any customer? We still have a window to remove that attribute field from the ProcessingRequest.http_headers and move it to top level because currently, that field is not documented and marked as not-implemented-hide, and your PR just merged a couple of days ago.

@gbrail
Copy link
Contributor

gbrail commented Feb 1, 2024

You all are much closer to how you want to use metadata from Envoy in the ext_proc stream than I am -- I'll leave that to the team since I don't have any specific requirements, other than this was something that I knew we needed eventually.

As for additional headers, I can definitely see the case where you may want to send an HTTP header from Envoy to the ext_proc processor that's not part of the actual request / response flow. For example, perhaps you need to send a security token, API key, or some metadata to help the processor decide what to do. It makes sense to me that you would want that in the proto, and then need some way to configure that, either statically or dynamically, etc.

I could imagine needing a way to have some other Envoy filter or bit of logic calculate a value from the request info, or generate a security token, that needs to be forwarded to the ext_proc processor, but not to the upstream or downstream systems. That would imply that not only is there a field in the proto to do this, but that there would be somewhere in the Envoy metadata where another filter could stash data to be sent to ext_proc in this way.

@rshriram
Copy link
Member Author

rshriram commented Feb 1, 2024

Here is a taxonomy we could use to decide..

headers - properties of a HTTP request and content is what the client sends to Envoy. Needs to be sent per request/response.

client_attributes - information extracted/inferred by Envoy from the connection or session with the client (principals, ips, etc.) [attributes like request.headers dont make much sense here IMO]. But the entirety of dynamic metadata should fall into this as its inferred state. If we disregard the request.headers/response.headers or per request attributes, then it stands to reason that its sufficient to send this state once per new client session, i.e. per new ext_proc stream

rpc_metadata - information that the user/operator (not client) provides to Envoy to send to the ext_proc/ext_authz server for various purposes. Its just out of band info between Envoy & the server. Has no bearing on the client connection or the HTTP payload. In fact, its common across many clients/HTTP requests entering Envoy. Needs to be sent once at the start of channel/connection establishment to ext_proc server, i.e. shared across ext_proc streams.

Sadly gRPC provides a mechanism to add rpc_metadata only per stream (many streams per channel). And grpc streams correspond to individual client connections to Envoy, in ext_proc land. So, we could either just merge attributes & rpc_metadata into one but it feels like we are piling them all into one?

@rshriram
Copy link
Member Author

rshriram commented Feb 1, 2024

So a refined proposal is as follows:

message ProcessingRequest {
 config.core.v3.Metadata metadata_context = 8; // has rpc_metadata
 map<string, google.protobuf.Struct> attributes = 9;  // has client attributes
}

message ExternalProcessor {
  repeated string request_attributes = 5; // attributes to forward
  repeated string response_attributes = 6;

  map<string, google.protobuf.Struct> rpc_metadata = 7; // NEW. Goes into metadata_context.

@jbohanon
Copy link
Contributor

jbohanon commented Feb 1, 2024

I think the semantics we are after is that we send attributes (all of them) on exactly one message, the first message sent from Envoy to the server. No matter what lifecycle event it is (headers, first body chunk). Does this make sense to folks?

The issue here is that not all of the attributes are available at each potential point where a processing request may be sent. For example, response.* attributes are only available after the upstream request completes. If a user requires request and response processing, then they would have to be included in separate messages.

That said, I do agree with the proposal of moving the attributes field to the top-level of the processing request alongside metadata. Would it make sense and be efficient to keep track of attributes sent and only include attributes that are new or changed for any messages after the first message sent from Envoy to the processing server?

Whichever way we go, @jbohanon can we pleaes make sure that you folks dont deploy your last PR to any customer? We still have a window to remove that attribute field from the ProcessingRequest.http_headers and move it to top level because currently, that field is not documented and marked as not-implemented-hide, and your PR just merged a couple of days ago.

Yes, our customers that have been toying with this functionality are fully under the caveat that the API may change before final release.

@htuch
Copy link
Member

htuch commented Feb 1, 2024

For example, response.* attributes are only available after the upstream request completes. If a user requires request and response processing, then they would have to be included in separate messages.

Right - I think this would be a property in both directions. The very first request-side message has the request attributes, the very first response-side message has the response attributes.

@htuch
Copy link
Member

htuch commented Feb 1, 2024

Regarding

I could imagine needing a way to have some other Envoy filter or bit of logic calculate a value from the request info, or generate a security token, that needs to be forwarded to the ext_proc processor, but not to the upstream or downstream systems. That would imply that not only is there a field in the proto to do this, but that there would be somewhere in the Envoy metadata where another filter could stash data to be sent to ext_proc in this way.

and

rpc_metadata - information that the user/operator (not client) provides to Envoy to send to the ext_proc/ext_authz server for various purposes. Its just out of band info between Envoy & the server.

and

So, we could either just merge attributes & rpc_metadata into one but it feels like we are piling them all into one?

I think that's fair, but we do then need some way to dynamically set things like auth tokens (RPC metadata) if they don't come from the control plane, e.g. when they are generated by a previous filter or extension. One option would be to do initial metadata as today but then in the future allow header values subject to header formatter substitution, to provide access to dynamic metadata and filter state.

If we go with this, I think my preference is to have distinct request / response attributes at top-level but then we can just use initial metadata as today for the RPC metadata.

@jbohanon
Copy link
Contributor

jbohanon commented Feb 2, 2024

I think overloading the term attributes to mean not only Attributes as defined in the docs and used elsewhere but also dynamic metadata and any other inferred properties of a request adds needless confusion.

@rshriram
Copy link
Member Author

rshriram commented Feb 2, 2024

So whats the verdict? are we sticking with grpc initial_metadata for auth stuff as grpc intended them to be, but still move the attributes field to top of processingrequest?

@htuch
Copy link
Member

htuch commented Feb 2, 2024

@jbohanon I'm confused by:

I think overloading the term attributes to mean not only Attributes as defined in the docs and used elsewhere but also dynamic metadata and any other inferred properties of a request adds needless confusion.

The docs for attributes already state that dynamic metadata is an attribute, see https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/advanced/attributes.html#configuration-attributes.

These are legitimate first-class attributes today, although not listed under "request attributes". It's entirely reasonable that we pass these to ext_proc and we (Google) probably will in the future, as use cases for chaining together ext_proc calls emerge and we want to dynamically get/set state similar to native filters.

@htuch
Copy link
Member

htuch commented Feb 2, 2024

@rshriram re:

So whats the verdict? are we sticking with grpc initial_metadata for auth stuff as grpc intended them to be, but still move the attributes field to top of processingrequest?

I think we will continue to support initial metadata and for metadata that is RPC specific like an auth token it probably belongs there. We need some story to dynamically generate them in the future but I have a suggestion for this (i.e. with header formatter) above.

For other attributes, e.g. metadata giving config provenance and version, I think we should use Envoy attributes, which already have support for metadata and filter state. Every use case that motivated filter state or dynamic metadata basically applies to ext_proc services, and we definitely don't want a 3rd or 4th mechanism for OOB metadata given there is already established precedent for modeling these as attributes.

In any case, we need to fix things so that #32125 (comment) applies.

@jbohanon
Copy link
Contributor

jbohanon commented Feb 2, 2024

The docs for attributes already state that dynamic metadata is an attribute, see https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/advanced/attributes.html#configuration-attributes.

Ah I had not seen that there is an attribute for dynamic metadata (and filter state!). This is actually at the anchor metadata-and-filter-state for future readers.

The current state of the dynamic metadata feature in ext_proc allows fine-grained control over the forwarding namespaces. I think it is reasonable to keep the filter config API for dynamic metadata separate to allow this, while under the hood including it as attributes to the processing server. WDYT @htuch?

@rshriram
Copy link
Member Author

rshriram commented Feb 2, 2024

Clarifying intent.. are we agreeing on this then?

message ProcessingRequest {
 config.core.v3.Metadata metadata_context = 8; // has rpc_metadata
 map<string, google.protobuf.Struct> attributes = 9;  // NEW. has client attributes
}

message ExternalProcessor {
  repeated string request_attributes = 5; // attributes to forward
  repeated string response_attributes = 6;

@jbohanon , would you be able to update the API/code to use the attributes from top level processing request and fully remove the attributes from the http_headers section of api? Thankfully, its not marked as live yet.

@htuch
Copy link
Member

htuch commented Feb 2, 2024

I think we can skip metadata_context and just use initial metadata as today, the rest makes sense. I think for metadata specifically pertaining to the RPC, we can use initial metadata as already supported.

@jbohanon
Copy link
Contributor

jbohanon commented Feb 2, 2024

@jbohanon , would you be able to update the API/code to use the attributes from top level processing request and fully remove the attributes from the http_headers section of api? Thankfully, its not marked as live yet.

I would be able to get to this in the next few weeks. Could I be added to the group that allows me to be assigned for reviews in this space please?

@htuch
Copy link
Member

htuch commented Feb 2, 2024

@jbohanon invite sent for envoyproxy/assignable.

Regarding timing, I think we probably want to at least hide the fields under a not-implemented-hide or WiP annotation to allow for a breaking change to be made if it's more than a few days.

@rshriram
Copy link
Member Author

rshriram commented Feb 5, 2024

I think we can skip metadata_context and just use initial metadata as today, the rest makes sense. I think for metadata specifically pertaining to the RPC, we can use initial metadata as already supported.

My two cents would be to keep the API clean and keep these two fields separate. The metadata is a struct while the attributes are just string-string pairs. It also helps the ext_proc authors clearly distinguish what was configuration induced info vs derived info. From that point of view, the change is also very minimal:

message ProcessingRequest {
 config.core.v3.Metadata metadata_context = 8; // has rpc_metadata. Exists. No change.
 map<string, google.protobuf.Struct> attributes = 9;  // NEW. has attributes
}

message HttpHeaders {
  // map<string, google.protobuf.Struct> attributes = 9;  // current
  reserved 9; // NEW
}

@jbohanon
Copy link
Contributor

jbohanon commented Feb 5, 2024

As the change to promote attributes field up into the ProcessingRequest message is not controversial, I have more tightly scoped #32176 to that API change, and I can check with my team on when I will be able to get heads down on the internals of that.

@sitano
Copy link
Contributor

sitano commented Feb 5, 2024

I am sorry I am not participating in the discussion. I don't think I am deep enough in the context of the problem and can't bring anything useful at the moment. Can you pls clarify a bit what we are gonna do with new ProcessingRequest.attributes = 9? is it going to be a replacement for grpc_service.initial_metadata, an alternative of it or something else? also, what are consequences for #31544 in the light of what we have agreed on here?

@rshriram
Copy link
Member Author

rshriram commented Feb 6, 2024

@sitano ProcessingRequest.attributes will not replace the grpc_service.initial_metadata. However, if your goal is to transport metadata to your extension server, our suggestion would be to use the ProcessingRequest.metadata_context or even ProcessingRequest.attributes as the data flows as part of the RPC rather than as HTTP2 headers.

@martijneken
Copy link
Contributor

Clarifying question:

Do these string fields, to be attached with the first ProcessingRequest in each direction:

message ExternalProcessor {
  repeated string request_attributes = 5;
  repeated string response_attributes = 6;

Get written to this map? If so, what is the conversion from string to key-value?

message ProcessingRequest {
  map<string, google.protobuf.Struct> attributes = 9;  // NEW. has client attributes

@jbohanon
Copy link
Contributor

jbohanon commented Feb 7, 2024

Clarifying question:

Do these string fields, to be attached with the first ProcessingRequest in each direction:


message ExternalProcessor {

  repeated string request_attributes = 5;

  repeated string response_attributes = 6;

Get written to this map? If so, what is the conversion from string to key-value?


message ProcessingRequest {

  map<string, google.protobuf.Struct> attributes = 9;  // NEW. has client attributes

The attributes listed as a repeated string in the config will be included in the map in a struct under the key envoy.filters.http.ext_proc. The keys of the struct are the attribute names from the config, and the values are the derived values as documented here.

@jbohanon
Copy link
Contributor

What is the status of this issue? Are there outstanding decisions left to make? It seems to me that the merging of #32176 to move attributes to top-level of processing requests, and soon #31544 to allow adding/overriding HTTP headers included in the stream open request to the server covers the cases we were discussing.

cc @htuch @rshriram

@htuch
Copy link
Member

htuch commented Feb 20, 2024

I think this issue is largely addressed, @rshriram to confirm.

@rshriram
Copy link
Member Author

rshriram commented Mar 7, 2024

Yep. This is done.

@rshriram rshriram closed this as completed Mar 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature requests. Not bugs or questions.
Projects
None yet
Development

No branches or pull requests

7 participants