When iterating over Zipfiles, always use the Unix file separator to fix a Windows issue #638
Merged: jsirois merged 10 commits into pex-tool:master from yorinasub17:yori-vendor-regex-supports-windows on Apr 20, 2019
2b52fb4 Change windows separator with unix separator in regex (yorinasub17)
8567355 Fix indentation (yorinasub17)
f17e3a1 Simplify to force / instead of using os.sep (yorinasub17)
081fd1c Update link to zip spec and style update for pattern (yorinasub17)
08db4b3 Fix style error (yorinasub17)
539e671 Fix prefix to use / instead os.sep as well (yorinasub17)
3241d07 ensure relpath uses / (yorinasub17)
07e88b3 Clean up some more by using replace (yorinasub17)
83ae158 Make comments better (yorinasub17)
2546df0 Clean up code based on PR comments (yorinasub17)
Is there any guarantee `prefix` will never already have `os.sep` in it? That's an earnest question - I couldn't figure it out from a very quick search for `_ZipIterator`.

Either way, for consistency and safety, it might be better to change this line to this idiom:
I think this can be guaranteed because `prefix` starts off as `''` outside the while loop, and only contains parts constructed with `os.path.basename`. `os.path.basename` takes the last path element, so it is equivalent to `arg.split(os.sep)[-1]`, which means the result won't contain the `os.sep` character.

That said, there is something that doesn't make sense to me about this while loop. This threw me off at first, but I believe the while loop is repeatedly walking up `root` directory by directory until it hits a zip file. E.g., ignoring `prefix` for a minute, the loop boils down to:

Given that, I think the `prefix` is actually backwards. If we go into the loop with `foo.zip/bar/baz`, assuming `foo.zip` is the zip file, `prefix` will become `baz/bar`, because after the first iteration, `prefix` will be the basename of the path (which is `baz`), and the second iteration joins that with the basename of the path again (the original was `prefix = os.path.join(prefix, os.path.basename(path))`, so `os.path.join('baz', os.path.basename('foo.zip/bar')) = 'baz/bar'`). Am I missing something with my logic here?

This must be working because the tests pass, but I wonder if this is actually an uncaught bug? I haven't had a chance to walk through all the test cases to know if there is any test case for this, or if the caller of `_ZipIterator` somehow guarantees this loop only goes one iteration deep.

Also, this loop is an infinite loop if `root` is ever an absolute path that does not contain a zip file, because `os.path.dirname('/') == '/'`, so it will never be falsy.
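To make the walk-through above concrete, here is a minimal sketch of the loop as described in this comment. It is not the actual pex source: `find_zip_buggy` and `looks_like_zip` are hypothetical names, the zip check is a simple suffix test rather than a real file check, and `posixpath` is used so the result is the same on every platform.

```python
import posixpath


def looks_like_zip(path):
    # Hypothetical stand-in for a real check such as zipfile.is_zipfile.
    return path.endswith('.zip')


def find_zip_buggy(root, is_zip):
    """Walk up `root` until `is_zip(path)` is true; return (zip_path, prefix)."""
    prefix = ''
    path = root
    while path:
        if is_zip(path):
            return path, prefix
        # Join order under discussion: the parent's basename is appended
        # after the child's, so the components come out reversed.
        prefix = posixpath.join(prefix, posixpath.basename(path))
        path = posixpath.dirname(path)
    raise ValueError('no zip file found above {}'.format(root))


zip_path, prefix = find_zip_buggy('foo.zip/bar/baz', looks_like_zip)
print(zip_path, prefix)

# The reason an absolute root without a zip never terminates:
print(posixpath.dirname('/'))
```

With a root only one level below the zip (for example `foo.zip/bar`), the reversed join is invisible, which would explain why the tests pass.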
You're right about the bug - and it skates by because items are exactly one level deep. The `prefix = os.path.join(prefix, os.path.basename(path))` line should be `prefix = os.path.join(os.path.basename(path), prefix)`.

As far as the infinite loop goes, the one call of `containing` ensures `root` is not a dir (but not that it's not a file). It probably makes sense to either guard the infinite loop case or else assert early that root is either a zipfile or else not a file and not a directory (`foo.zip/bar/baz` is not either of these).
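One possible shape for both changes, as a sketch only: the names `find_zip_fixed` and `looks_like_zip` are hypothetical, the zip check is a suffix test standing in for a real one, and the guard shown (stop when `dirname` no longer changes the path) is just one of the options mentioned above, not necessarily what was merged.

```python
import posixpath


def looks_like_zip(path):
    # Hypothetical stand-in for a real check such as zipfile.is_zipfile.
    return path.endswith('.zip')


def find_zip_fixed(root, is_zip):
    """Walk up `root` until `is_zip(path)` is true; return (zip_path, prefix)."""
    prefix = ''
    path = root
    while path:
        if is_zip(path):
            return path, prefix
        name = posixpath.basename(path)
        # Prepend the parent's basename so components keep their original order.
        # Joining onto an empty prefix would leave a trailing '/', so skip it.
        prefix = posixpath.join(name, prefix) if prefix else name
        parent = posixpath.dirname(path)
        if parent == path:
            # Reached the filesystem root: dirname('/') == '/', so stop here
            # instead of looping forever.
            break
        path = parent
    raise ValueError('no zip file found above {}'.format(root))


print(find_zip_fixed('foo.zip/bar/baz', looks_like_zip))
```

An absolute root with no zip above it, such as `/no/zip/here`, now raises `ValueError` rather than spinning.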
Great investigation @yorinasub17 and thanks @jsirois for confirming.
I think it'd be best to save this bug fix for a separate PR, as this one is focused on Windows support. If you're up for it, it'd be great if you could open that PR. No worries if you're busy - we can do it otherwise.
Thanks for the confirmation. I made the fix for the prefix here, but have not addressed the infinite loop.
I won't be able to address it today though. I'll check back again when I have time and if it still hasn't been fixed yet, I'll open a PR.