Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new regex config parameter version to blacklisted files and excluded directories #36360

Merged
merged 6 commits into from
Jan 3, 2020

Conversation

mmattel
Copy link
Contributor

@mmattel mmattel commented Nov 2, 2019

This PR is based on the work of @tdaget at PR #36331 (closed)
The referenced PR is now closed because of rebasing issues and CI will never get green there.
I improved the original work a bit.

Description

This PR enhances the existing config parameters blacklisted_files and excluded_directories by a new regex parameter version blacklisted_files_regex and excluded_directories_regex.

Related Issue

  • No issue related

Motivation and Context

Sometimes it is necessary to exclude/blacklist files or directories to be written/renamed/scanned on a more complex parameter than a static version as we have right now. This PR allows eg. to prevent uploading Microsoft Outlook .pst files or creating / renaming / scanning directories based on a pattern defined by a regex. The existing parameters are not touched.

How Has This Been Tested?

Running local with various scenarios successfully.

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Database schema changes (next release will require increase of minor version instead of patch)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Technical debt
  • Tests only (no source changes)

Checklist:

  • Code changes
  • Unit tests added
  • Acceptance tests added
  • Documentation ticket raised: Not necessary as doc is part of config.sample.php
  • Changelog item, see TEMPLATE

@codecov
Copy link

codecov bot commented Nov 2, 2019

Codecov Report

Merging #36360 into master will increase coverage by <.01%.
The diff coverage is 77.27%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #36360      +/-   ##
============================================
+ Coverage     64.68%   64.68%   +<.01%     
- Complexity    19106    19122      +16     
============================================
  Files          1269     1269              
  Lines         74728    74778      +50     
  Branches       1320     1320              
============================================
+ Hits          48337    48372      +35     
- Misses        26003    26018      +15     
  Partials        388      388
Flag Coverage Δ Complexity Δ
#javascript 54.12% <ø> (ø) 0 <ø> (ø) ⬇️
#phpunit 65.85% <77.27%> (ø) 19122 <11> (+16) ⬆️
Impacted Files Coverage Δ Complexity Δ
lib/private/Files/Filesystem.php 69.91% <77.27%> (+0.01%) 141 <11> (+16) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a9413e4...339c2df. Read the comment docs.

Copy link

@tdaget tdaget left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An error is located in each preg_match call, regex must be surrounded by \. Plus, as regex match is case insensitive an i must be added to the end of the string.

In lib/private/Files/Filesystem.php:
It might be: \preg_match('/'.$item.'/i', $path_part)

Copy link

@tdaget tdaget left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You add a comment for regex variables :

  • INFO: Compare pairs are converted to lowercase by default.

I'm ok for regular variables, for regex the comment might be adapted as things are not simple as a lowercase string.

Maybe :

  • INFO: regex matching is forced case insensitive.

@mmattel mmattel force-pushed the additional_regex_in_blacklisted_dirs_and_files branch from b8af495 to fc7d14a Compare November 2, 2019 23:06
@mmattel
Copy link
Contributor Author

mmattel commented Nov 2, 2019

@tdaget thanks for the comments, both fixed

@mmattel mmattel force-pushed the additional_regex_in_blacklisted_dirs_and_files branch from fc7d14a to c5fccd8 Compare November 3, 2019 10:56
@mmattel
Copy link
Contributor Author

mmattel commented Nov 3, 2019

@phil-davis can you do me a favour and take a look why CI fails.
I have rebased on latest master (on and on...) and it never gets green.

@phil-davis
Copy link
Contributor

phil-davis commented Nov 4, 2019

The unit tests fail:

There were 3 failures:

1) Test\Files\FilesystemTest::testIsFileBlacklistedRegex with data set #6 ('/etc/dummy/machin.chose', array('.*\.pst$', '.*dummy.*', '^sample.*'), false)
Failed asserting that true matches expected false.

/drone/src/tests/lib/Files/FilesystemTest.php:299

2) Test\Files\FilesystemTest::testIsExcludedRegex with data set #8 ('/foo/_Thomas/Files', array('.*backup.*', 'Thomas.*'), false)
Failed asserting that true matches expected false.

/drone/src/tests/lib/Files/FilesystemTest.php:347

3) Test\Files\FilesystemTest::testIsExcludedRegex with data set #10 ('/foo/_ThomasFoo/Files', array('.*backup.*', 'Thomas.*'), false)
Failed asserting that true matches expected false.

/drone/src/tests/lib/Files/FilesystemTest.php:347

Run them yourself locally:
make test-php-unit NOCOVERAGE=true TEST_PHP_SUITE=tests/lib/Files/FilesystemTest.php
And then fix the code or the tests (whichever is wrong)

@mmattel mmattel force-pushed the additional_regex_in_blacklisted_dirs_and_files branch 3 times, most recently from 61ea873 to d93880f Compare November 7, 2019 07:13
@mmattel
Copy link
Contributor Author

mmattel commented Nov 7, 2019

Tests fixed (locally tested), rebased and pushed, CI is green 😄
@phil-davis can you do a review pls

Copy link
Contributor

@phil-davis phil-davis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just nits about spelling etc.
Somebody in product management needs to decide if this feature should be added.

@mmattel mmattel force-pushed the additional_regex_in_blacklisted_dirs_and_files branch from d93880f to 8c69cca Compare November 7, 2019 14:09
@mmattel
Copy link
Contributor Author

mmattel commented Nov 7, 2019

@micbar @pmaier1 adding you according the suggestion of @phil-davis
This PR is a great and useful feature, adding good functionality, not breaking an exiting config.
Your review / approval is welcomed.

@mmattel mmattel requested review from micbar and pmaier1 November 7, 2019 20:53
@phil-davis phil-davis force-pushed the additional_regex_in_blacklisted_dirs_and_files branch from c94d216 to 27b2d91 Compare December 27, 2019 16:27
@phil-davis
Copy link
Contributor

@micbar I have completed acceptance tests for this. The new regex features work.

Somebody could give a technical review of the source code changes.

@individual-it can do a review of the acceptance tests.

If this is approved, then we could also cherry-pick this back into release-10.4 branch so it can get released.

In making tests for the existing blacklist and excluded directories features, I discovered #36645 - in "old" "v1" chunking uploads the blocked uploads return a 507 (rather than 403). The uploads are blocked - so that existing issue is not so important. Similar behaviour happens in the new regex features.

Copy link
Member

@individual-it individual-it left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests look good
we could move them all over to a separate feature file though

@phil-davis
Copy link
Contributor

I created issue #36654 for the reorganisation of feature files, because that impacts more than just the scenarios in this PR

@phil-davis phil-davis force-pushed the additional_regex_in_blacklisted_dirs_and_files branch 3 times, most recently from 475020f to d4ac6e3 Compare January 2, 2020 09:56
@phil-davis
Copy link
Contributor

Rebased and resolved conflict

@phil-davis phil-davis force-pushed the additional_regex_in_blacklisted_dirs_and_files branch from d4ac6e3 to 001c231 Compare January 2, 2020 14:39
mmattel and others added 6 commits January 3, 2020 11:43
upload tests for blacklisted_files_regex and excluded_directories_regex

rename tests for blacklisted_files_regex and excluded_directories_regex

move async tests for blacklisted_files_regex and excluded_directories_regex

move folder tests for blacklisted_files_regex and excluded_directories_regex

webUI tests for blacklisted_files_regex and excluded_directories_regex

uploadFileAsyncUsingNewChunking tests for blacklisted_files_regex and excluded_directories_regex

uploadFileUsingNew-and-OldChunking tests for blacklisted_files_regex and excluded_directories_regex
@phil-davis phil-davis force-pushed the additional_regex_in_blacklisted_dirs_and_files branch from 21717a0 to 339c2df Compare January 3, 2020 05:58
@mmattel mmattel merged commit e54f0ef into master Jan 3, 2020
@delete-merged-branch delete-merged-branch bot deleted the additional_regex_in_blacklisted_dirs_and_files branch January 3, 2020 10:14
* @return string
* @see https://www.php.net/manual/function.preg-last-error.php
*/
private static function preg_error_text() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using camel case in method names. I do not think we need an exception in here.

}
@\preg_match($newItem, null);
if (self::regexValidityCheck($item, $callerMessage)) {
foreach ($path as $path_part) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use camel case for variable names.

// important, because if you define an empty config value, '.htaccess' will not be added...
// prevents misuse with a fake config value
$blacklist_files[] = '.htaccess';
$blacklist_files = \array_unique($blacklist_files);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To prevent code duplication, we can move these assignments to a private function. But, I am not sure is it worth. It is up to writer.

$match = \array_intersect($path_parts, $excluded);
if ($match) {

// force that the last element (possibly the filename) is an array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is confusing, the last element of $path_parts is string, not an array. As far as understand, what is meant here, the last element of $path_parts should be holded on an array to easy comparison or I am misunderstanding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have rewritten the comment a bit. PR incoming.

*
* WARNING: USE THIS ONLY IF YOU KNOW WHAT YOU ARE DOING.
*/
'excluded_directories_regex' => [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not this only apply to directories? As far as I see, we are also applying it to files.

// check if the first and last character is a '/' and add one if not
// terminate with i == case insensitive
$newItem = $item;
if (\substr($item, 0, 1) !== '/') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we use $item[0] instead of calling substr. Guess accessing index directly, when we know is much faster?

if (\substr($item, 0, 1) !== '/') {
$newItem = '/' . $item;
}
if (\substr($item, -1) !== '/') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here? How about using $item[-1]

} else {
$newItem = $newItem . 'i';
}
@\preg_match($newItem, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we ar suppressing error using @. Any reason to add it here?

$exclude_folders = $ed;
$exclude_folders_regex = $ed;
$blacklist_files = $ed;
$blacklist_files_regex = $ed;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about writing it in a line say:
$exclude_folders = $exclude_folders_regex = $blacklist_files = blacklist_files_regex = $ed. Or any other suggestion?

}

// second, check if the file is blacklisted
if (isset($blacklist_files)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be a bad idea to set blacklist_files to null in the beginning and then try to check if its non null here?

return true;
}
}
if (isset($blacklist_files_regex)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Couldn't we assign blacklist_files_regex to null in the beginning of the method?

$blacklist_array[] = \end($path_parts);

// first, check if the folder is excluded
if (isset($exclude_folders)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to set the exclude_folders to null in the beginning of the method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm - I am not even sure why this test is here - there is already code above that always sets $exclude_folders - so the code inside this if will always be run.
I will remove the unneeded if in all 4 places.

$match = \array_intersect($path_parts, $blacklist);
if ($match) {
return true;
if (isset($exclude_folders_regex)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to set the exclude_folders_regex to null in the beginning of the method?

This was referenced Jan 3, 2020
@phil-davis
Copy link
Contributor

See #36694 and #36695 for the changes to address the review comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants