Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lenient file-type checking mode #10862 #10863

Merged
merged 9 commits into from
Jun 25, 2024

Conversation

jacobtylerwalls
Copy link
Member

@jacobtylerwalls jacobtylerwalls commented May 2, 2024

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Description of Change

There are several useful safety checks guarded behind the FILE_TYPE_CHECKING setting, not all of which have to do with the file type, strictly speaking.

The main reason this was off by default was the lack of a "lenient" option to allow through files whose extensions cannot be determined.

Add this lenient mode, default new projects to it, and deprecate the old true/false values.

Issues Solved

Closes #10862

Checklist

  • Unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Ticket Background

Demo

Run any management command (runserver, check, test, etc.) with python -Wall::DeprecationWarning manage.py ...

/web_root/arches/arches/apps.py:13: DeprecationWarning: Providing boolean values to FILE_TYPE_CHECKING is deprecated. Starting with Arches 8.0, the only allowed options will be None, 'lenient', and 'strict'.

Copy link
Contributor

@aarongundel aarongundel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like where this is heading, but I do have some concerns here.

One of the problems with file type checking (and the primary reason that it's optional) is because the filetype guesser is not exhaustive. There are lots of file types that it will not (and cannot) validate. A simple example is a text file with a txt extension. So if file type checking is permanently true, even if you add "txt" to FILE_TYPES - it will be returned as "unknown" and validation will fail. See test_unknown_filetypes in file_validator.py. We do some manual checking of xlsx and csv in there for this reason (they cannot be validated any other way - and this is admittedly not the best way to validate them - that is, actually reading potentially malicious content into the csv reader to verify that the file is safe!). I don't know of a good way around this issue. We could potentially allow for custom validation of files. That is to say, if you want a txt file, you can write a custom validator for said txt files and use your own logic to verify that it's acceptable. Maybe this is not an issue that needs to be solved now, since the option is still around until 8. But if we are telling people to turn it on to test it - a subset of users are almost certain to do so in production, in which case we're going to get questions on why files are not uploading after enabling the setting we warned them to enable.

@jacobtylerwalls
Copy link
Member Author

jacobtylerwalls commented Jun 5, 2024

What about keeping the option and having three values:

  • none (behavior today if FILE_TYPE_CHECKING is false)
  • lenient (behavior today if FILE_TYPE_CHECKING is true, modified to allow unknown filetypes through, so long as the extension matches something in FILE_TYPES)
  • strict (behavior today if FILE_TYPE_CHECKING is true--file types have to be both guessable and in FILE_TYPES)

We could deprecate any value other than None, "lenient", or "strict" and then map falsy values to None, and truthy values to "strict" until the deprecation period is over, and then set the new project template to "lenient".

That would solve the basic use case sending me here: we have useful checks hidden behind this setting, and we shouldn't need to off all of them just to let a few useful things through.

@jacobtylerwalls jacobtylerwalls changed the title Deprecate FILE_TYPE_CHECKING = False #10862 Add lenient file-type checking mode #10862 Jun 14, 2024
@jacobtylerwalls jacobtylerwalls force-pushed the jtw/deprecate-file-type-checking-setting branch from 62e82f4 to bcaae04 Compare June 14, 2024 21:01
Comment on lines +102 to +111
try:
self.read_zip_file(content, result)
except zipfile.BadZipFile as e:
return {
"success": False,
"data": FileValidationError(
message=e.args[0],
code=400,
),
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed that in #10885 I forgot to surface this error in the UI where FILE_TYPE_CHECKING is false and we go straight to trying read a file as a zip file.

@@ -312,6 +312,11 @@ def test_save_provisional_from_athoritative(self):

provisionaledits = provisional_tile.provisionaledits
self.assertEqual(tiles.count(), 2)
self.assertIn(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated, but not really worth a whole new PR for it: follow-up to #10860 where I wanted to add some more debug output, but was stymied by the KeyError being raised much earlier.

Copy link
Contributor

@aarongundel aarongundel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a a nice improvement!

@jacobtylerwalls jacobtylerwalls merged commit 89b5061 into dev/7.6.x Jun 25, 2024
7 checks passed
@jacobtylerwalls jacobtylerwalls deleted the jtw/deprecate-file-type-checking-setting branch June 25, 2024 15:21
@jacobtylerwalls
Copy link
Member Author

Thanks for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add "lenient" file-checking mode
2 participants