Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Envoy's proxy protocol filter now sanitizes TLV values, breaking some existing use cases #32718

Closed
fedya-at-db opened this issue Mar 6, 2024 · 6 comments · Fixed by #33146
Closed

Comments

@fedya-at-db
Copy link

Description:
I believe the commit from this PR to sanitize the proxy protocol values seems wrong. The sanitizeUtf8String function used modifies the bytes in a non-recoverable way, but the actual value isn't always meant to be a valid utf8 string. I believe the proxy protocol spec allows for this.

Materially, Google's private service connect (PSC) feature encodes a TLV value as a binary encoded int64. With this commit there's no way to recover the original value in most cases. Example value bytes that should be a valid PSC id
00 22 6b cf 0a 00 00 02 (decodes to 9688686178336770)

I think either using set_bytes within ProtobufWkt::Value (which doesn't seem to exist today) or escaping the bad bytes in a reverseable way would be necessary.

Severity:
This is an upgrade blocker for us as it'll break our production critical traffic since we won't be able to accurately compare private link ids. We'd either avoid upgrading envoy or need a local patch to undo the fix.

Repro steps:
The above commit's unit test already covers a relevant example, it's just the expected assertions makes existing use cases non-functional. Data should be stored as is for consumption by other filters (like the lua filter) or at least needs to be escaped to be proper utf8 in a reversible way.

Potential fixes:
Fix with least side effects is probably to try to only escape/sanitize the value when actually serializing it since memory binary data is ok. But this is pretty fragile since the field we're storing it in is just a string, suggesting serializing is ok. Ideally something like ProtobufWkt::Value has a bytes type, but it doesn't seem to because I guess it's meant to represent a Json value and that doesn't support binary data?

A less ideal fix is to save the data encoded in some reversable way (e.g. base64) and then document that for consumers. If they want the original data they can decode the value back out.

I guess another approach is to provide a bytes value separate from ProtobufWkt::Value for the proxy protocol filter to save data to. Then it's clearer during serialization that encoding/sanitization is necessary, and until that happens other filters can keep working with the exact payloads but this is probably the most intrusive from a code perspective.

cc @ggreenway

@fedya-at-db fedya-at-db added bug triage Issue requires triage labels Mar 6, 2024
@KBaichoo KBaichoo added area/proxy_proto and removed triage Issue requires triage labels Mar 6, 2024
@KBaichoo
Copy link
Contributor

KBaichoo commented Mar 6, 2024

cc @nezdolik

@yanavlasov
Copy link
Contributor

yanavlasov commented Mar 6, 2024

@fedya-at-db can you clarify how this fails for you, please? Do you have proprietary Envoy filter that uses TLV values from proxy protocol header?

@fedya-at-db
Copy link
Author

fedya-at-db commented Mar 7, 2024

Hi @yanavlasov !

Oh shoot, I forgot our PR to envoy to upstream that change never got submitted. So this PR allows a lua filter to access connectionStreamInfo and the associated dynamicMetadata, which is where the proxy_protocol filter writes TLV values. We have this in our local fork. This means we can use the TLV values in lua, and we base64 encode the GCP PSC id (and AWS privatelink id, but that one is valid utf8, though the first byte is a 0x01) into a header to proxy to upstream.

@akamac
Copy link

akamac commented Apr 10, 2024

Hi @yanavlasov !

Oh shoot, I forgot our PR to envoy to upstream that change never got submitted. So this PR allows a lua filter to access connectionStreamInfo and the associated dynamicMetadata, which is where the proxy_protocol filter writes TLV values. We have this in our local fork. This means we can use the TLV values in lua, and we base64 encode the GCP PSC id (and AWS privatelink id, but that one is valid utf8, though the first byte is a 0x01) into a header to proxy to upstream.

Does your fork allow to set TLVs from Lua (to pass to the upstream)?

@agrawroh
Copy link
Contributor

Hi @yanavlasov !
Oh shoot, I forgot our PR to envoy to upstream that change never got submitted. So this PR allows a lua filter to access connectionStreamInfo and the associated dynamicMetadata, which is where the proxy_protocol filter writes TLV values. We have this in our local fork. This means we can use the TLV values in lua, and we base64 encode the GCP PSC id (and AWS privatelink id, but that one is valid utf8, though the first byte is a 0x01) into a header to proxy to upstream.

Does your fork allow to set TLVs from Lua (to pass to the upstream)?

Yes, it does. The idea is to expose the connection-level metadata to LUA and read the TLVs from the Proxy Protocol filters and add those as HTTP headers which could be forwarded to the upstream.

cc @wbpcode I know the original PR was blocked on the refactoring which is non-trivial work. Is it fine to get this feature in and do follow-ups for the refactoring you suggested?

@yanavlasov
Copy link
Contributor

There is #33146 that modifies how TLV are stored in the metadata, which may solve this problem in the LUA filter.

ggreenway pushed a commit that referenced this issue May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants