Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handler: Allow more filetypes as Content-Type #1253

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

masewo
Copy link

@masewo masewo commented Mar 7, 2025

This pull request includes several changes to the pkg/handler package, focusing on improving the handling of MIME types and adding new tests. The most important changes include fixing a regular expression, updating comments, modifying the MIME type whitelist, and adding new unit tests.

Improvements to MIME type handling:

Testing enhancements:

@@ -34,7 +34,7 @@ const (
var (
reForwardedHost = regexp.MustCompile(`host="?([^;"]+)`)
reForwardedProto = regexp.MustCompile(`proto=(https?)`)
reMimeType = regexp.MustCompile(`^[a-z]+\/[a-z0-9\-\+\.]+$`)
reMimeType = regexp.MustCompile(`^(?:application|audio|example|font|haptics|image|message|model|multipart|text|video|x-(?:[0-9A-Za-z!#$%&'*+.^_` + "`" + `|~-]+))\/([0-9A-Za-z!#$%&'*+.^_` + "`" + `|~-]+)((?:[ \t]*;[ \t]*[0-9A-Za-z!#$%&'*+.^_` + "`" + `|~-]+=(?:[0-9A-Za-z!#$%&'*+.^_` + "`" + `|~-]+|"(?:[^"\\]|\.)*"))*)$`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for this PR! I'm a bit worried about this regular expression as I'm not able to understand it due to its complexity. I would be in favor of exploring solutions using Go's builtin media type parser (see #1194 (comment)) to reduce the ease of maintenance for us.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go's builtin media type parser (https://pkg.go.dev/mime#ParseMediaType) only parses the media type, which means it separates type and subtype from the optional parameters (as defined in RFC 6838).
For example image_png; foo=bar becomes image_png with parameters foo -> bar but it does not check if image_png is a valid media types known by IANA (which it's not).

Copy link
Author

@masewo masewo Mar 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could:

Hints to Go's builtin mime package:

  • It has a very small buildin table.
  • It relies on information available in the OS. On Alpine for example you would have to install the package mailcap. On Windows we would not be able to check for application/zip since Windows matches .zip to application/x-zip-compressed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the PR using Go's buildin ExtensionsByType, also adding mailcap to the Dockerfile.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are misunderstand what this code is supposed to do and what it shouldn't do. Tusd itself does not care whether a media type has been registered with the IANA or not. Checking that is not tusd's responsibility. If you want to restrict file uploads to specific media types depending your application, the logic has to be implemented in the pre-create hook. Tusd will accept whatever media type pre-create allows.

The only way in which tusd inspects the media type is to determine whether it can instruct browsers to render the file content or force a download. image/png can be safely rendered, for example, while text/html cannot as it leads the door to XSS vulnerabilities. Hence, tusd only allows inline rendering via Content-Disposition for a list of selected media types. If a file with an unknown media type is encountered, tusd will still forward the media type to the client, but just instruct browsers to not render it directly.

As far as I understand, your original problem is that video/ogg files with parameters in the media type as served with a Content-Disposition header field that instructs browser to download the video instead of showing it directly. That's because the current regular expression for parsing media types does not support parameters. That's why I suggest replacing this limiting regular expression with Go's built-in parser which handles parameters correctly.

Go's builtin media type parser (https://pkg.go.dev/mime#ParseMediaType) only parses the media type, which means it separates type and subtype from the optional parameters (as defined in RFC 6838).

Yes, you are correct. But tusd only needs to parse the media type and check it against an internal list to determine the Content-Disposition header. It doesn't have to determine whether the media type is registered with the IANA (or any other registry).

I hope this helps.

Copy link
Author

@masewo masewo Mar 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Acconut I think I already understood it, since there is a good amount of comments explaining what the code is supposed to do, but I thought you would like to change the behaviour because of the link to tus/tus-node-server#655

I simplified the PR again. It uses the ParseMediaType now to get rid of the extensions. Hope this suits now more your goals.

@masewo masewo force-pushed the main branch 2 times, most recently from 469b083 to 432a76b Compare March 8, 2025 22:12
Copy link
Member

@Acconut Acconut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to hear that we are now on the same page!

@Acconut
Copy link
Member

Acconut commented Mar 8, 2025

Can you please have a look at the failing tests?

@masewo
Copy link
Author

masewo commented Mar 8, 2025

non-a-valid-mime-type is now a valid mime type. I changed it to non a valid mime type.

@masewo masewo requested a review from Acconut March 9, 2025 10:21
@masewo
Copy link
Author

masewo commented Mar 9, 2025

@Acconut Seems like a flaky e2e test. The TestUnexpectedNetworkReset sometimes fails on Windows, because the PATCH and HEAD requests happen too fast after each other, causing the file locker not unlock the lock file fast enough. Happens for me in every 10th run or so.

Adding a bit of delay between PATCH and HEAD seems to let pass the test more reliable, but maybe it also hides some underlying problem.

A conn.(*net.TCPConn).CloseWrite() (flush the sent data) before the RST seems to work also. According to Wireshark the RST gets send anyway. But maybe it is changing the purpose of this test too much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants