handler: Allow more filetypes as Content-Type #1253
pkg/handler/unrouted_handler.go
@@ -34,7 +34,7 @@ const (
 var (
 	reForwardedHost = regexp.MustCompile(`host="?([^;"]+)`)
 	reForwardedProto = regexp.MustCompile(`proto=(https?)`)
-	reMimeType = regexp.MustCompile(`^[a-z]+\/[a-z0-9\-\+\.]+$`)
+	reMimeType = regexp.MustCompile(`^(?:application|audio|example|font|haptics|image|message|model|multipart|text|video|x-(?:[0-9A-Za-z!#$%&'*+.^_` + "`" + `|~-]+))\/([0-9A-Za-z!#$%&'*+.^_` + "`" + `|~-]+)((?:[ \t]*;[ \t]*[0-9A-Za-z!#$%&'*+.^_` + "`" + `|~-]+=(?:[0-9A-Za-z!#$%&'*+.^_` + "`" + `|~-]+|"(?:[^"\\]|\.)*"))*)$`)
Thank you very much for this PR! I'm a bit worried about this regular expression, as its complexity makes it hard for me to understand. I would be in favor of exploring solutions using Go's builtin media type parser (see #1194 (comment)) to ease maintenance for us.
Go's builtin media type parser (https://pkg.go.dev/mime#ParseMediaType) only parses the media type, which means it separates type and subtype from the optional parameters (as defined in RFC 6838).
For example, `image_png; foo=bar` becomes `image_png` with parameters `foo` -> `bar`, but it does not check whether `image_png` is a valid media type known by IANA (which it is not).
We could:
- Split the filetype at the `/` and check that both parts are non-empty (a really simple check).
- Find a simpler regex.
- Use Go's builtin `ExtensionsByType` (https://pkg.go.dev/mime#ExtensionsByType), with which we could check whether any file extensions are associated with this media type.
- Use Go's builtin `TypeByExtension` (https://pkg.go.dev/mime#TypeByExtension), with which we could check whether the filename's extension matches our filetype. What do we do if we have no filename?
- Use external Go dependencies like filetype (https://github.com/h2non/filetype) or mimetype (https://github.com/gabriel-vasile/mimetype).

Notes on Go's builtin `mime` package:
- It has a very small builtin table.
- It relies on information available in the OS. On Alpine, for example, you would have to install the `mailcap` package. On Windows we would not be able to check for `application/zip`, since Windows maps `.zip` to `application/x-zip-compressed`.
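To make the first and third options concrete, here is a rough sketch. The helper names `hasTypeAndSubtype` and `hasKnownExtension` are hypothetical and not part of tusd. As noted above, the result of `mime.ExtensionsByType` depends on the MIME information the operating system provides, so the second column can differ between systems.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// hasTypeAndSubtype is a hypothetical helper: it reports whether s has the
// shape "type/subtype" with both parts non-empty. It performs no further
// validation, so a value such as "a/b/c" is accepted as well.
func hasTypeAndSubtype(s string) bool {
	typ, subtype, found := strings.Cut(s, "/")
	return found && typ != "" && subtype != ""
}

// hasKnownExtension is a hypothetical helper: it reports whether Go's mime
// package knows at least one file extension for the given media type.
// The answer depends on the built-in table plus whatever the OS supplies.
func hasKnownExtension(mediaType string) bool {
	exts, err := mime.ExtensionsByType(mediaType)
	return err == nil && len(exts) > 0
}

func main() {
	for _, s := range []string{"image/png", "image_png", "/png", "video/ogg"} {
		fmt.Printf("%q: shape ok=%v, known extension=%v\n",
			s, hasTypeAndSubtype(s), hasKnownExtension(s))
	}
}
```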
I updated the PR to use Go's builtin `ExtensionsByType`, also adding `mailcap` to the `Dockerfile`.
I think you are misunderstanding what this code is supposed to do and what it shouldn't do. Tusd itself does not care whether a media type has been registered with the IANA or not. Checking that is not tusd's responsibility. If you want to restrict file uploads to specific media types depending on your application, the logic has to be implemented in the `pre-create` hook. Tusd will accept whatever media type `pre-create` allows.
The only way in which tusd inspects the media type is to determine whether it can instruct browsers to render the file content or force a download. `image/png` can be safely rendered, for example, while `text/html` cannot, as it opens the door to XSS vulnerabilities. Hence, tusd only allows inline rendering via Content-Disposition for a list of selected media types. If a file with an unknown media type is encountered, tusd will still forward the media type to the client, but instruct browsers not to render it directly.
As far as I understand, your original problem is that `video/ogg` files with parameters in the media type are served with a Content-Disposition header field that instructs browsers to download the video instead of showing it directly. That's because the current regular expression for parsing media types does not support parameters. That's why I suggest replacing this limiting regular expression with Go's built-in parser, which handles parameters correctly.
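The difference can be sketched as follows. The regular expression is the original `reMimeType` from the diff. Everything else is an assumption for illustration: `inlineAllowlist` is a shortened stand-in for tusd's internal list, `dispositionFor` is a hypothetical helper rather than tusd's actual function, and `codecs=theora` is just an example parameter.

```go
package main

import (
	"fmt"
	"mime"
	"regexp"
)

// The original regular expression from the diff. It has no notion of
// parameters, so any value containing ";" fails to match.
var reMimeType = regexp.MustCompile(`^[a-z]+\/[a-z0-9\-\+\.]+$`)

// inlineAllowlist is a hypothetical, shortened stand-in for the list of
// media types that may be rendered inline by browsers.
var inlineAllowlist = map[string]struct{}{
	"image/png": {},
	"video/ogg": {},
}

// dispositionFor is a hypothetical helper: it parses the media type,
// ignores the parameters, and returns "inline" only for allowlisted
// types. Unparseable or unknown types fall back to "attachment".
func dispositionFor(filetype string) string {
	mediaType, _, err := mime.ParseMediaType(filetype)
	if err != nil {
		return "attachment"
	}
	if _, ok := inlineAllowlist[mediaType]; ok {
		return "inline"
	}
	return "attachment"
}

func main() {
	filetype := "video/ogg; codecs=theora"

	fmt.Println(reMimeType.MatchString(filetype)) // false: parameters are rejected
	fmt.Println(dispositionFor(filetype))         // inline
	fmt.Println(dispositionFor("text/html"))      // attachment
}
```

Comparing only the parsed media type against the list keeps the allowlist free of parameter variants.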
> Go's builtin media type parser (https://pkg.go.dev/mime#ParseMediaType) only parses the media type, which means it separates type and subtype from the optional parameters (as defined in RFC 6838).
Yes, you are correct. But tusd only needs to parse the media type and check it against an internal list to determine the Content-Disposition header. It doesn't have to determine whether the media type is registered with the IANA (or any other registry).
I hope this helps.
@Acconut I think I had already understood it, since there are plenty of comments explaining what the code is supposed to do, but I thought you wanted to change the behaviour because of the link to tus/tus-node-server#655.
I simplified the PR again. It now uses `ParseMediaType` to get rid of the extensions. I hope this suits your goals better now.
469b083 to 432a76b
Great to hear that we are now on the same page!
Can you please have a look at the failing tests?
@Acconut Seems like a flaky e2e test. Adding a bit of delay between PATCH and HEAD seems to make the test pass more reliably, but maybe it also hides some underlying problem.
This pull request includes several changes to the `pkg/handler` package, focusing on improving the handling of MIME types and adding new tests. The most important changes include fixing a regular expression, updating comments, modifying the MIME type whitelist, and adding new unit tests.

Improvements to MIME type handling:
- `pkg/handler/unrouted_handler.go`: Fixed a regular expression in `reMimeType` to correctly match MIME types with a plus sign.
- `pkg/handler/unrouted_handler.go`: Updated comments to correct typos and improve clarity.
- `pkg/handler/unrouted_handler.go`: Added `video/mp4` to the `mimeInlineBrowserWhitelist` and reordered entries for better readability.

Testing enhancements:
- `pkg/handler/unrouted_handler_test.go`: Changed the package name from `handler_test` to `handler` to allow testing of unexported functions.
- `pkg/handler/unrouted_handler_test.go`: Added a new test function `TestFilterContentType` to ensure the `filterContentType` function behaves correctly with various metadata inputs.