Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HttpURI.parseQuery rejects [ and ] characters in path section #12259

Closed
joakime opened this issue Sep 11, 2024 · 9 comments
Closed

HttpURI.parseQuery rejects [ and ] characters in path section #12259

joakime opened this issue Sep 11, 2024 · 9 comments
Assignees
Labels
Bug For general bugs on Jetty side

Comments

@joakime
Copy link
Contributor

joakime commented Sep 11, 2024

Jetty version(s)
12.0.13

Jetty Environment
Any

Java version/vendor (use: java -version)
Any

OS type/version
Any

Description
As reported in

This change causes unencoded [ and ] to be rejected too (400 Illegal Path Character). Not sure if that is intended?

How to reproduce?

> GET /[] HTTP/1.1
> Host: api:8080
> User-Agent: curl/7.64.0
> Accept: */*
>
< HTTP/1.1 400 Bad Request
< Server: Jetty(12.0.12)
< Cache-Control: must-revalidate,no-cache,no-store
< Content-Type: text/html;charset=iso-8859-1
< Content-Length: 437
< Connection: close
<
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 400 Illegal Path Character</title>
</head>
<body>
<h2>HTTP ERROR 400 Illegal Path Character</h2>
<table>
<tr><th>URI:</th><td>/badURI</td></tr>
<tr><th>STATUS:</th><td>400</td></tr>
<tr><th>MESSAGE:</th><td>Illegal Path Character</td></tr>

The [ and ] are considered reserved characters in the gen-delims ABNF in the URI spec.
https://datatracker.ietf.org/doc/html/rfc3986#section-2.2

Those two characters are reserved for IPv6 or IPvLiteral authority sections on the URI.

Seems like the change from parsing the whole URI to just parsing the pathQuery is tripping up the gen-delims vs sub-delims nuance of the path parsing.

For the parsing of URI path, the ANBF doesn't mention that the gen-delims characters as part of pchar, is that the flaw?
See: https://datatracker.ietf.org/doc/html/rfc3986#section-3.3

@joakime joakime added the Bug For general bugs on Jetty side label Sep 11, 2024
@joakime joakime self-assigned this Sep 11, 2024
@joakime joakime moved this to 🏗 In progress in Jetty 12.0.14 Sep 11, 2024
@joakime
Copy link
Contributor Author

joakime commented Sep 11, 2024

I'm inclined to close this, as the input URL/URI for this to trigger would be http://localhost:8080/[]

That input URI would be in violation of https://datatracker.ietf.org/doc/html/rfc3986 as the [] would be seen as the authority.

Example, against Jetty 12.0.13 and the ee10-demo-jetty webapp.

$ curl -vvvv http://localhost:8080/[]
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /[] HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 400 Bad Request
< Server: Jetty(12.0.13)
< Cache-Control: must-revalidate,no-cache,no-store
< Content-Type: text/html;charset=iso-8859-1
< Content-Length: 437
< Connection: close
< 
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 400 Illegal Path Character</title>
</head>
<body>
<h2>HTTP ERROR 400 Illegal Path Character</h2>
<table>
<tr><th>URI:</th><td>/badURI</td></tr>
<tr><th>STATUS:</th><td>400</td></tr>
<tr><th>MESSAGE:</th><td>Illegal Path Character</td></tr>
</table>
<hr/><a href="https://jetty.org/">Powered by Jetty:// 12.0.13</a><hr/>

</body>
</html>
* Closing connection 0

Also of note, is that Java itself doesn't like this URI and expects those characters to be encoded.

$ jshell
|  Welcome to JShell -- Version 17.0.11
|  For an introduction type: /help intro

jshell> var uu = new URI("http://localhost:8080/[]")
|  Exception java.net.URISyntaxException: Illegal character in path at index 22: http://localhost:8080/[]
|        at URI$Parser.fail (URI.java:2976)
|        at URI$Parser.checkChars (URI.java:3147)
|        at URI$Parser.parseHierarchical (URI.java:3229)
|        at URI$Parser.parse (URI.java:3177)
|        at URI.<init> (URI.java:623)
|        at do_it$Aux (#1:1)
|        at (#1:1)

jshell> var ue = new URI("http://localhost:8080/%5B%5D")
ue ==> http://localhost:8080/%5B%5D

jshell> ue.getPath()
$3 ==> "/[]"

@joakime
Copy link
Contributor Author

joakime commented Sep 11, 2024

@gregw thoughts?

@joakime joakime changed the title HttpURI.parseQuery incorrectly includes gen-delims in violation character space. HttpURI.parseQuery rejects [ and ] characters in path section Sep 11, 2024
@joakime
Copy link
Contributor Author

joakime commented Sep 11, 2024

The java URI parsing itself rejects the in the raw [ and ] for non-absolute URIs too.

jshell> var uu = new URI("/[]")
|  Exception java.net.URISyntaxException: Illegal character in path at index 1: /[]
|        at URI$Parser.fail (URI.java:2976)
|        at URI$Parser.checkChars (URI.java:3147)
|        at URI$Parser.parseHierarchical (URI.java:3229)
|        at URI$Parser.parse (URI.java:3188)
|        at URI.<init> (URI.java:623)
|        at do_it$Aux (#1:1)
|        at (#1:1)

And the java URL class will accept the unencoded path, but then reject an attempt to present it as a URI.

jshell> var uu = new URL("http://localhost:8080/[]")
uu ==> http://localhost:8080/[]

jshell> uu.getPath()
$2 ==> "/[]"

jshell> uu.toURI().toASCIIString()
|  Exception java.net.URISyntaxException: Illegal character in path at index 22: http://localhost:8080/[]
|        at URI$Parser.fail (URI.java:2976)
|        at URI$Parser.checkChars (URI.java:3147)
|        at URI$Parser.parseHierarchical (URI.java:3229)
|        at URI$Parser.parse (URI.java:3177)
|        at URI.<init> (URI.java:623)
|        at URL.toURI (URL.java:1056)
|        at (#3:1)

@moddular
Copy link

moddular commented Sep 23, 2024

FWIW this tripped us up due to some (IMO) slightly strange behaviour in nginx. We have a URL containing [ and ] along with some other characters that we would url encode and proxy_pass through to a service running jetty. It turns out when doing this nginx will decode %5B and %5D back to their raw form [ and ] (other encoded characters retained their encoding)

We haven't yet figured out a way to prevent nginx doing that so we had to rollback jetty for now.

https://stackoverflow.com/questions/67521017/nginx-rewriting-brackets-in-reverse-proxy

@joakime
Copy link
Contributor Author

joakime commented Sep 23, 2024

We have a URL containing [ and ] along with some other characters that we would url encode and proxy_pass through to a service running jetty. It turns out when doing this nginx will decode %5B and %5D back to their raw form [ and ] (other encoded characters retained their encoding)

That would be a different issue.
The [ and ] are allowed in the query section per the ANBF at https://datatracker.ietf.org/doc/html/rfc3986#appendix-A

@moddular
Copy link

Ah sorry, I think I was unclear - the urlencoded characters are in the path not the query.

e.g.

http://localhost:8000/path/with(encoded)values[here]

which would go out from nginx as:

proxy_pass http://localhost:8000/path/with%28encoded%29values%5Bhere%5D

but arrive at Jetty as

http://localhost:8000/path/with%28encoded%29values[here]

And now returns a 400 error.

@joakime
Copy link
Contributor Author

joakime commented Sep 23, 2024

The raw, unencoded, [ and ] are not allowed in the path portion per the URI RFC ABNF.

See https://datatracker.ietf.org/doc/html/rfc3986#section-3.3

The defined pchar in the path portion does not include the [ and ] characters.
Any character not in the pchar list must be pct-encoded.

https://datatracker.ietf.org/doc/html/rfc3986#appendix-A

It's a shame that nginx doesn't honor user provided pct-encoded.
But that seems like a bug in nginx.

@d2a-pnagel
Copy link

FWIW this tripped us up due to some (IMO) slightly strange behaviour in nginx. We have a URL containing [ and ] along with some other characters that we would url encode and proxy_pass through to a service running jetty. It turns out when doing this nginx will decode %5B and %5D back to their raw form [ and ] (other encoded characters retained their encoding)

We haven't yet figured out a way to prevent nginx doing that so we had to rollback jetty for now.

https://stackoverflow.com/questions/67521017/nginx-rewriting-brackets-in-reverse-proxy

@moddular I agree this is strange behaviour, but it is also (kind of) documented, and can be solved in the configuration. See https://serverfault.com/questions/459369/disabling-url-decoding-in-nginx-proxy

@joakime
Copy link
Contributor Author

joakime commented Oct 23, 2024

We are not going to make this change.

@joakime joakime closed this as not planned Won't fix, can't repro, duplicate, stale Oct 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For general bugs on Jetty side
Projects
No open projects
Status: ✅ Done
Development

No branches or pull requests

3 participants