parse_multipart_form does not parse "Content-Type" correctly. #782

Closed
rgankema opened this issue Nov 30, 2021 · 3 comments · Fixed by #832
Labels: parser (HTTP (headers) parser)

Comments


rgankema commented Nov 30, 2021

Julia 1.6.4

Forms with fields with content types other than "text/plain" don't seem to get parsed correctly. Here's an MRE:

julia> using HTTP
julia> form = HTTP.Form(["foo" => HTTP.Multipart("", IOBuffer("foo"), "application/octet-stream")])
julia> headers = [HTTP.content_type(form)]
julia> request = HTTP.Request("GET", "foo.nl", headers, read(form))
julia> HTTP.parse_multipart_form(request)
1-element Vector{HTTP.Multipart}:
 HTTP.Multipart(filename="", data=::Base.GenericIOBuffer{SubArray{UInt8, 1, Vector{UInt8}, Tuple{UnitRange{Int64}}, true}}, contenttype="text/plain", contenttransferencoding=""))

Note that the parsed Multipart has contenttype="text/plain" instead of "application/octet-stream".

Collaborator

NHDaly commented Mar 25, 2022

Okay, after investigating in #815, I've found that the failure to match comes from the x at the end of the regex, which, from what I've read online, puts the regex into "comment mode." I'm still not exactly clear on what regex comment mode means. This is what the Julia docs say under ?r"":

  • x enables "comment mode": whitespace is enabled except when escaped with \, and #
    is treated as starting a comment.

Somehow that makes this regex fail to match:

infil> re_ct.re
r"(?i)Content-Type: (\S*[^;\s])"x

infil> header
"Content-Disposition: form-data; name=\"namevalue\"; filename=\"multipart.txt\"\r\nContent-Type: text/plain\r\n\r\n"

infil> occursin(r"(?i)Content-Type: (\S*[^;\s])"x, header)
false

infil> occursin(r"(?i)Content-Type: (\S*[^;\s])", header)
true

Do you know what that x is there for? What is it meant to do in this case? Can we safely get rid of it? Thanks!
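
A likely explanation, sketched here in plain Julia without HTTP.jl: under PCRE's extended mode (the x flag), unescaped whitespace in the pattern is ignored. The space after "Content-Type:" is therefore dropped, and the regex requires a non-whitespace character immediately after the colon. The variable names below are illustrative only, and the escaped-space variant is just one way to keep the space literal, not necessarily what the eventual fix does.

```julia
header = "Content-Disposition: form-data; name=\"namevalue\"; filename=\"multipart.txt\"\r\nContent-Type: text/plain\r\n\r\n"

# With x, the bare space after the colon is not part of the pattern, so this
# behaves like r"(?i)Content-Type:(\S*[^;\s])" and cannot match ": text/plain".
bare_space_x = occursin(r"(?i)Content-Type: (\S*[^;\s])"x, header)

# Escaping the space with a backslash keeps it literal even under x.
escaped_space_x = occursin(r"(?i)Content-Type:\ (\S*[^;\s])"x, header)

# Dropping the flag also restores the literal space.
m = match(r"(?i)Content-Type: (\S*[^;\s])", header)
content_type = m === nothing ? nothing : String(m.captures[1])

println((bare_space_x, escaped_space_x, content_type))
```

Since this regex contains no comments or layout whitespace, removing the x flag looks like the simpler of the two options.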

Member

quinnj commented May 25, 2022

PR up: #832. I imagine the original code probably just copied the regex usage from places like here where we do include comments in the regex string that we'd like to ignore. I think in general this isn't necessary though.
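
For context, here is a minimal sketch of the kind of regex where the x flag does help: a pattern spread over several lines with inline comments. This is an illustration, not code from HTTP.jl. The sample header and the names `re_commented` and `media_type` are made up for the example. The literal space is written as `[ ]` so that it survives extended mode.

```julia
# Hypothetical sample header, not taken from the issue.
header = "Content-Type: application/octet-stream; charset=binary\r\n\r\n"

# Under x, layout whitespace and everything from # to end of line are ignored.
re_commented = r"""
    (?i)Content-Type:   # header name, matched case-insensitively
    [ ]                 # one literal space; a bare space is ignored under x
    (\S*[^;\s])         # media type, excluding any trailing semicolon
"""x

m = match(re_commented, header)
media_type = m === nothing ? nothing : String(m.captures[1])
println(media_type)
```

For a one-line pattern with no comments, the flag adds nothing and only makes literal spaces easy to get wrong.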

Collaborator

NHDaly commented May 27, 2022

Thanks!
