Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Headers (User-Agent) containing non UTF-8 are not well handled (502 http error) #509

Closed
bburnichon opened this issue Aug 5, 2022 · 13 comments
Labels
help wanted Extra attention is needed

Comments

@bburnichon
Copy link

bburnichon commented Aug 5, 2022

In one of my projects, created ages ago (prior to going async) I could handle the issue by myself:

use lambda_http::{
  lambda_runtime::Context,
  Body, Request, Response,
};

// snipped

    // Call handler function based on path
    fn run(&self, req: Request, ctx: Context) -> Result<Response<Body>, Error> {
        let user_agent = req
            .headers()
            .get("User-Agent")
            .map_or("(no user agent)", |ua| {
                ua.to_str().unwrap_or("(non utf-8 user-agent)")
            });

I could then handle bad headers by simply ignoring them or return a 400 status code.

I should and will upgrade my code to version 0.6.0 but I can check that this is failing still in lambda_runtime even before it is made available for my code.

test.json

{
  "requestContext": {
      "elb": {
          "targetGroupArn": "arn:aws:elasticloadbalancing:region:123456789012:targetgroup/my-target-group/6d0ecf831eec9f09"
      }
  },
  "httpMethod": "OPTIONS",
  "path": "",
  "queryStringParameters": {},
  "headers": {
      "accept": "application/json",
      "content-type": "application/json",
      "host": "my-app-287742195.eu-west-3.elb.amazonaws.com:443",
      "user-agent": "Beno\xEEt's User-Agent",
      "x-amzn-trace-id": "Root=1-5bdb40ca-556d8b0c50dc66f0511bf520",
      "x-forwarded-for": "72.21.198.66",
      "x-forwarded-port": "443",
      "x-forwarded-proto": "https",
      "origin": "example.com"
  },
  "isBase64Encoded": false,
  "body": ""
}

Testing the above payload gives the error:

cargo lambda start &
cargo lambda invoke --data-file test.json
Error: Error("invalid escape", line: 14, column: 27)
[Finished running. Exit status: 1]

I could check the same behavior on a real lambda instance to which I sent an actual request and not a crafted payload as above.

curl https://example.com/myaws-fsafs/test --header "User-Agent: Beno`echo -ne '\xEE'`t's User-Agent"
@calavera
Copy link
Contributor

calavera commented Aug 5, 2022

Error: Error("invalid escape", line: 14, column: 27)

I believe that error comes from Serde when we try to deserialize the event.

The problem might be probably here:
https://github.com/LegNeato/aws-lambda-events/blob/master/aws_lambda_events/src/custom_serde/headers.rs#L80-L128

Did the real lambda crash too? I would have expected that information to be sanitized before arriving to the runtime 🤔

@bburnichon
Copy link
Author

The real lambda does not crash and can receive other requests but returns 502 http errors. I think only a worker thread crashes and the main thread then recovers and continues processing data.

Errors are not available at lambda but on elb side. Hopefully I had some monitorings on both and detected 502 surge but had a hard time understanding the root cause of the issue.

@calavera
Copy link
Contributor

calavera commented Aug 7, 2022

I'm unsure how you could handle this before because the code has not changed in that area very much. When the runtime receives the next event from the Lambda API, it tries to deserialize the response body into the structure that gets passed to your function, with the following code:

Here in the current version:
https://github.com/awslabs/aws-lambda-rust-runtime/blob/main/lambda-runtime/src/lib.rs#L124

And here in one of the early versions:
https://github.com/awslabs/aws-lambda-rust-runtime/blob/v0.2.0/lambda-runtime/src/lib.rs#L83

But anyways, you can see the problem reproduced plainly with serde_json and your payload:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=845cd133e6c5f12cd2b7ea1a1676fd79

@bburnichon
Copy link
Author

Probably somewhere, target deserialization type changed from byte slice to str.

https://github.com/serde-rs/json/blob/master/src/de.rs#L1571-L1573

I do agree that the payload is somewhat invalid. But instead of deserializing headers into Cow<_, str>, it should be deserialized into Bytes[] prior to store it as HeaderValue (which are slices of bytes)

@calavera
Copy link
Contributor

calavera commented Aug 9, 2022

Contributions to solve this are very welcome, I did not find a way to fix it myself so far.

@bburnichon
Copy link
Author

I am also looking at this issue. But it seems serde_json does not like non-utf8 characters at all. The least that can be done is handle the error and send a 400 status error.

By the way, I rechecked the old behavior and it seems there were also an issue handling this case but the output was different

With the old program I have normal processing of the request:

bburnichon@bburnichon-Latitude-7420:~/Projects/service$ docker run --rm -v "$PWD":/var/task -e RUST_LOG=info lambci/lambda:provided handler "$(cat test.json)"
{"message":"Starting up","level":"info","date":"2022-08-09T07:56:07.919139520Z"}
{"message":"Initialization done, starting event loop","level":"info","date":"2022-08-09T07:56:07.923578006Z"}
START RequestId: 14a9607f-7a23-1ef0-19f6-0427538ff58e Version: $LATEST
{"message":"Received new event with AWS request id: 14a9607f-7a23-1ef0-19f6-0427538ff58e","type":"lambda_runtime_core::runtime","level":"info","date":"2022-08-09T07:56:07.926159133Z"}
{"message":"Handler returned an error for 14a9607f-7a23-1ef0-19f6-0427538ff58e: JsonError: invalid escape at line 14 column 88","type":"lambda_runtime_core::runtime","level":"error","date":"2022-08-09T07:56:07.926506718Z"}
{"message":"Error response for 14a9607f-7a23-1ef0-19f6-0427538ff58e accepted by Runtime API","type":"lambda_runtime_core::runtime","level":"info","date":"2022-08-09T07:56:07.927456821Z"}
END RequestId: 14a9607f-7a23-1ef0-19f6-0427538ff58e
REPORT RequestId: 14a9607f-7a23-1ef0-19f6-0427538ff58e	Init Duration: 40.51 ms	Duration: 3.35 ms	Billed Duration: 4 ms	Memory Size: 1536 MB	Max Memory Used: 18 MB	

{"errorType":"JsonError","errorMessage":"JsonError: invalid escape at line 14 column 88"}
bburnichon@bburnichon-Latitude-7420:~/Projects/service$ 

With new program:

bburnichon@bburnichon-Latitude-7420:~/Projects/service$ docker run --rm -v "$PWD":/var/task -e RUST_LOG=info lambci/lambda:provided handler "$(cat test.json)"
{"message":"Starting up","level":"info","date":"2022-08-09T08:01:01.661130563Z"}
{"message":"Initialization done, starting event loop","level":"info","date":"2022-08-09T08:01:01.662750520Z"}
START RequestId: 7aad8189-9d59-106a-0825-1a6c9e4d5ff4 Version: $LATEST
{"message":"An error occurred: invalid escape at line 14 column 88","level":"error","date":"2022-08-09T08:01:01.664194102Z"}

then I need to kill the process to have the console back.

@calavera calavera added the help wanted Extra attention is needed label Aug 9, 2022
@bburnichon
Copy link
Author

Actually, the issue is that JSON payload of the ALB is invalid. Should an issue be declared there and find a way to handle these kind of issues

@calavera
Copy link
Contributor

the issue is that JSON payload of the ALB is invalid.

can you explain this further?

@bburnichon
Copy link
Author

serde_json considers a JSON "Beno\xEEt's User-Agent" as invalid because it is trying to have a raw character outside ASCII and not UTF-8 character. According to https://www.json.org/json-en.html or https://www.ietf.org/rfc/rfc4627.txt , \xXX escape sequence does not exist, the only one existing is for \uXXXX. serde_json accepts \x escape sequence but only for values in the range [\x00-\x7f]

Then, the JSON produced by AWS infrastructure as an input to the lambda is not valid.

I am afraid that the way the input JSON is currently produced, it can not be converted to an http::Request using serde_json.

If a header name or value contains characters outside of ASCII/UTF-8 characters, a new payload should be created/designed.
A simple way could be to add headers_are_base64_encoded: bool flag To be able to change the deserializer and properly handle those cases.

Otherwise, lambda runtime should be able to detect deserializing issues and respond with a generic Response. (For my case, I could send a 400 http response indicating invalid characters detected in headers.)

@calavera
Copy link
Contributor

For my case, I could send a 400 http response indicating invalid characters detected in headers

we're already reporting an error when we cannot deserialize the payload, see #520

@bburnichon
Copy link
Author

we're already reporting an error when we cannot deserialize the payload, see #520

Nice! Did not see this. I should definitively take time to upgrade to version including this pr.

@calavera
Copy link
Contributor

I'm closing this ticket because it's not actionable. Serde doesn't handle non UTF-8 characters, and we don't have plans to fix this.

@calavera calavera closed this as not planned Won't fix, can't repro, duplicate, stale Jun 30, 2023
@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for the maintainers of this repository to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants