[api] fix issue in Tar/Zip Utils that resulted in incorrect artifact … #3544
}
return name.substring(index);
return name;
Why return the input variable as is? Maybe this method should be getAbsolutePath, and the method would be expected to throw if the path is invalid.
We are using the output in vis.validate. Do we still need that after this change?
Agree, this method can be void
I think we still need vis.validate, since that method is doing some deeper validation specific to zips. It validates that the entries in the header match what is in the archive.
For a zip file, an entry name that starts with "/" is valid and will be ignored by common zip utilities. I think it's OK to make this stricter and disallow "/", but your changes makes |
public void testLinuxCreatedWindowsUsedOffendingTar() throws IOException {
TestRequirements.windows();
Path tarPath = Paths.get("src/test/resources/linux_create_windows_use.tar");
Path output = Paths.get("C:/out");
Why choose "C:/out"? It would be better to output to the build folder. Although the test throws an exception before it accesses the file system, using "C:/out" is a bit confusing:
- not every machine has a C drive
- not every environment has write access to the /out directory
Good catch, this was left over from my testing on a Windows machine. build/out suffices here.
Agree, this method can be void
@si2d @frankfliu what do you think about this method? We would use
|
In an archive file, the path is always Linux style, so we don't really need to use an OS-specific Path to validate it. And we don't have to support special cases (even if they are valid). We can just check whether the entry name starts with "/" or contains "..", and treat those entries as invalid. This was the original algorithm. I'm curious why that causes a test failure on Windows.
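For reference, a minimal sketch of the string-only check described above. The class and method names are hypothetical and are not taken from the PR. It treats the entry name as a plain "/"-separated string and never touches File.separatorChar or an OS-specific Path.

```java
final class EntryNameCheck {
    private EntryNameCheck() {}

    /**
     * Hypothetical helper: returns false for entry names that start with "/"
     * or contain "..", true otherwise. No OS-specific separator is involved.
     */
    public static boolean isValidEntryName(String name) {
        // Archive entry names use "/" regardless of the platform that reads them.
        if (name.startsWith("/")) {
            return false;
        }
        // A plain substring check, as described in the comment. Note that it is
        // deliberately strict and also rejects names such as "notes..txt".
        return !name.contains("..");
    }
}
```

Because the check is purely textual, it should give the same answer on Windows, macOS and Linux for the same entry name.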
File.separatorChar is os specific. On mac/linux it is |
@frankfliu with the existing logic (ignoring any of my changes here), this is what the testOffendingTar unit test produces on Windows (I added some print statements, but otherwise the logic is the same).
The output dir in that test is |
I wonder: instead of trying to block this, should we only disallow overwrites? Even if the path is a/../b.txt, it would be allowed as long as nothing else writes to b.txt. We are explicitly setting REPLACE_EXISTING when unarchiving, but is overwriting files a valid use case? (Also, this change might break customers if they are somehow relying on it.)
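A rough sketch of what "only disallow overwrites" could look like, assuming each entry is written with Files.copy. The class and method names here are hypothetical and this is not the code in the PR. The only point it illustrates is that leaving out StandardCopyOption.REPLACE_EXISTING makes Files.copy fail when the target already exists.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class NoOverwriteExtract {
    private NoOverwriteExtract() {}

    /**
     * Hypothetical helper: writes one archive entry to target and refuses
     * to replace a file that an earlier entry already produced.
     */
    public static void writeEntry(InputStream in, Path target) throws IOException {
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // No REPLACE_EXISTING option: Files.copy throws
        // FileAlreadyExistsException if target is already present.
        Files.copy(in, target);
    }
}
```

With this variant, an archive containing b.txt followed by a/../b.txt would fail on the second entry instead of silently replacing the first.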
The simplest change that solves the issue is to keep everything the same, except for this modification to
|
Right, we should not use File.separatorChar. The archive file format is not OS-specific.
|
The original algorithm was using File.separatorChar, not
|
Force-pushed from 82d65a8 to 2ce8065.
To me, checking that the path resolves to a location within the destination is the more correct change, rather than modifying our custom method. What is the reason for limiting this to the simplest change?
I agree with you. I think we should be checking that the path resolves to be within the destination. I've kept that portion in and removed the check on "path contains .." since that would be redundant in my opinion. The part that still seems open is whether we want to sanitize the archive entry at all. If it's possible and valid for zip archives to start with |
Force-pushed from 2ce8065 to ac401e7.
@si2d @frankfliu I have updated the PR based on the above discussions. The changes are now
This will fix the issue with the current code. Some additional things we may consider:
|
static String removeLeadingFileSeparator(String name) {
String osAwareArchiveEntryName = FilenameUtils.separatorsToSystem(name);
- Try to avoid a dependency on commons.io.
- If we always run sanitizeAndValidateArchiveEntry(), we only need to remove the "/" char here; the "\" issue will be caught by sanitizeAndValidateArchiveEntry().
Why avoid commons.io? We're using it in a few places in this code path already.
- In the api module, the only place that uses commons.io is TarUtils.java. commons-io is a transitive dependency of commons-compress that was added recently, in 1.27.x; depending on it is not the intention of the api project.
- There are customers who don't use tar files, and they can exclude commons-compress from their project. See: Make commons-compress an optional dependency #2949
Force-pushed from ac401e7 to 836e1f0.
Updated the PR - i've opted for the more strict approach where entries that start with '/' are invalid. We validate that each archive entry will be written to a location under the provided output directory. If it won't (either because it starts with |
}
static void validateArchiveEntry(String name, Path destination) throws IOException {
Path expectedOutputPath = destination.resolve(name).normalize();
if (!expectedOutputPath.startsWith(destination.normalize())) {
I think we'd better block ".." in this method as well. It prevents file overwrites inside the destination folder. In the original version we were already blocking "..", and nobody complained about it.
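To make the suggestion concrete, here is a self-contained sketch that combines the containment check from the diff with an explicit ".." rejection. The method signature follows the diff excerpt; the wrapper class name, the exception messages and the exact form of the ".." check are assumptions for illustration, not the merged code.

```java
import java.io.IOException;
import java.nio.file.Path;

final class ArchiveEntryValidation {
    private ArchiveEntryValidation() {}

    /** Sketch: rejects entries containing ".." or resolving outside destination. */
    public static void validateArchiveEntry(String name, Path destination) throws IOException {
        // Assumed form of the "block '..'" rule: a plain substring check.
        if (name.contains("..")) {
            throw new IOException("Archive entry contains '..': " + name);
        }
        Path base = destination.normalize();
        // An absolute entry name replaces the base when resolved, so the
        // startsWith check below also rejects names such as "/etc/passwd".
        Path expectedOutputPath = base.resolve(name).normalize();
        if (!expectedOutputPath.startsWith(base)) {
            throw new IOException("Archive entry escapes the destination: " + name);
        }
    }
}
```

Path.startsWith compares whole path elements, so a sibling directory such as build/out2 is not treated as being inside build/out.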
The default behavior when extracting a tar is to overwrite, isn't it? I'm not sure why we need to differ.
If an archive mytar.tar had (in order)
b.txt
a/../b.txt
then a/../b.txt would overwrite b.txt (e.g. using tar -xvf mytar.tar).
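The path arithmetic behind this example can be checked with a few lines of java.nio. This is only an illustration: the class name is hypothetical and "out" stands in for whatever output directory is used. It shows that a/../b.txt and b.txt normalize to the same target inside the destination, which is why a containment check alone does not tell the two entries apart.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class NormalizeDemo {
    private NormalizeDemo() {}

    /** Returns the normalized location an entry name maps to under destination. */
    public static Path targetFor(Path destination, String entryName) {
        return destination.resolve(entryName).normalize();
    }

    public static void main(String[] args) {
        Path destination = Paths.get("out");
        Path direct = targetFor(destination, "b.txt");
        Path viaDotDot = targetFor(destination, "a/../b.txt");
        // Both entry names map to the same file under the destination.
        System.out.println(direct.equals(viaDotDot));
        // The ".." entry still passes a pure containment check.
        System.out.println(viaDotDot.startsWith(destination.normalize()));
    }
}
```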
I've gone ahead and added it back to keep in line with what we had before, but curious to know what you think about my point above.
Force-pushed from 836e1f0 to d543ffe.
Force-pushed from cd00a44 to 34d457a.
Force-pushed from 34d457a to 971e251.
…extraction
Description
This change fixes an issue with ZipUtils.unzip and TarUtils.untar. This issue was obscured by the fact that our CI tests were not failing when they should have been. See #3543 for details.

Rather than stripping leading file separators and still extracting the archive, this change validates that the tar entry will extract into the expected output directory.

While this change means that some tars that previously worked (such as those with entries starting with / or \) will no longer work, I argue that the new behavior is correct and those tar entries are invalid/incorrect.