Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url.parse() doesn't correctly handle URL with no path component. #417

Closed
flrgh opened this issue Nov 8, 2023 · 3 comments · Fixed by #418
Closed

url.parse() doesn't correctly handle URL with no path component. #417

flrgh opened this issue Nov 8, 2023 · 3 comments · Fixed by #418
Assignees
Labels

Comments

@flrgh
Copy link

flrgh commented Nov 8, 2023

Here we have a URL with a port component (:8071) and a query string component (?token=xxx&type=http-bulk) but with the path component left out:

local url = require "socket.url"

local parsed = url.parse("https://listener-eu.logz.io:8071?token=xxx&type=http-bulk")

print(require("inspect")(parsed))

Result:

{
  authority = "listener-eu.logz.io:8071?token=xxx&type=http-bulk",
  host = "listener-eu.logz.io",
  port = "8071?token=xxx&type=http-bulk",
  scheme = "https"
}

I feel like one of the following must be true (probably need to read through an RFC to know for sure):

  • This is a valid URL, but it's not parsed correctly
  • This is not a valid URL for $reasons (path is required when query string is used or... something), and it should be rejected with an error

ref: Kong/kong#11931

@daurnimator
Copy link
Contributor

At least trying with lpeg_patterns, it parses correctly:

uri_patts = require "lpeg_patterns.uri"
uri_patts.uri:match("https://listener-eu.logz.io:8071?token=xxx&type=http-bulk")
host	listener-eu.logz.io
port	8071
scheme	https
query	token=xxx&type=http-bulk

So I'm going to say it's indeed an issue with luasocket.

@tzssangglass
Copy link

probably need to read through an RFC to know for sure

I read RFC3986-Syntax Components, which mentions that

  • The scheme and path components are required, though the path may be empty (no characters)
  • If a URI contains an authority component, then the path component must either be empty or begin with a slash ("/") character.
https://listener-eu.logz.io:8042?token=xxx&type=http-bulk
\___/   \______________________/ \______________________/
  |               |                        |       
scheme        authority                  query  


https://listener-eu.logz.io:8071?token=xxx&type=http-bulk matched

  • has authority, the path component can be empty(no characters)

@alerque
Copy link
Member

alerque commented Nov 8, 2023

Yes, this is evidently a bug. Even if the empty path was not valid the port can't have a query string in it!

Contributions welcome if anybody wants to jump in with a PR.

@alerque alerque added the bug label Nov 8, 2023
@alerque alerque self-assigned this Nov 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.

4 participants