Regression: ZipFile/ZipArchive entry name decoding not working correctly when entryNameEncoding is specified #92283
Comments
Tagging subscribers to this area: @dotnet/area-system-io-compression
|
I would like to emphasize that this is not some kind of special case. It is quite normal to have to specify entryNameEncoding; otherwise almost all zip files containing filenames with special characters (including those created by Windows File Explorer) will not be read correctly. So not specifying entryNameEncoding is not really an option either. I've taken a look at the source code and my suspicion was confirmed. In .NET 6 the filename was decoded like this:
But in .NET 7/8 it's done like this:
In the new version, when EntryNameAndEncoding is specified, it is always used, regardless of whether _generalPurposeBitFlag has the BitFlagValues.UnicodeFileName flag set. It should probably be changed to something like this:
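For illustration only, the rule described in this comment can be sketched in Python rather than the library's C#. This is not the .NET source: `decode_legacy`, `decode_regressed`, and their parameters are hypothetical names, and `cp437` is assumed as the fallback when no encoding is passed, following the ZIP specification's default.

```python
UNICODE_FILE_NAME = 0x0800  # general purpose bit 11: the entry name is UTF-8


def decode_legacy(raw, flags, entry_name_encoding=None):
    """Rule described for .NET 6: the UTF-8 flag wins over the caller's encoding."""
    if flags & UNICODE_FILE_NAME:
        return raw.decode("utf-8")
    # Flag not set: use the caller's encoding (cp437 fallback is an assumption).
    return raw.decode(entry_name_encoding or "cp437")


def decode_regressed(raw, flags, entry_name_encoding=None):
    """Behaviour reported for .NET 7/8: a supplied encoding is always used."""
    if entry_name_encoding is not None:
        return raw.decode(entry_name_encoding)
    return raw.decode("utf-8" if flags & UNICODE_FILE_NAME else "cp437")


# Entry written with the flag set, so the name bytes are UTF-8.
flagged = "Nürburgring.txt".encode("utf-8")
# Entry written without the flag; CP437 name bytes are assumed here.
unflagged = "Nürburgring.txt".encode("cp437")

print(decode_legacy(flagged, UNICODE_FILE_NAME, "cp437"))
print(decode_legacy(unflagged, 0, "cp437"))
print(decode_regressed(flagged, UNICODE_FILE_NAME, "cp437"))
```

Under these assumptions the first two calls return the intended name, while the last one returns the garbled name reported in the issue, because the supplied code page is applied to UTF-8 bytes.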
|
I ran into the same problem. Currently, it seems that the UTF-8 flag stored in the ZIP file is not used at all when reading. |
We recently ran into this issue as well and agree it is a pretty serious regression. We are now basically forced to stop using the .NET built-in functionality to uncompress zip files and start using some other library, probably SharpZipLib. |
Not my area, but I'm guessing if someone is interested in offering a PR this might meet the bar for servicing. |
I can confirm that we are seeing the same bug. Here is the relevant portion of PKWARE's .ZIP File Format Specification:
So basically: when the bit is unset use CP437, when set use UTF-8.
In case it helps, here are some sample ZIP archives whose names should decode correctly when passing
Documentación.zip - the bit is unset
Passing CP437 should also work in both cases, but this time it fails on the Japanese ZIP.
Here is a simple console project illustrating the problem: |
Here is a quick unit test:
ref: |
Description
The documentation for ZipFile.Open contains the following remarks:
This is how it always worked, but in .NET 7 and 8 there seems to be a bug, so that the last rule is no longer applied.
It seems entryNameEncoding is always used, even when the zip file entry has the language encoding flag set.
In my opinion this is a serious regression.
Reproduction Steps
Here are some test zip files to reproduce the problem:
test_win.zip
test_dotnet.zip
The first zip was created by the Windows 11 File Explorer and the second was created by .NET without specifying entryNameEncoding. Windows File Explorer does not set the language encoding flag, but .NET does.
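One way to check which entries carry the language encoding flag (general purpose bit 11, value 0x0800) is to read the flag bits from the central directory. The following is a minimal sketch using Python's standard `zipfile` module, not .NET. The helper name `utf8_flag_by_entry` is hypothetical, and the archive is built in memory instead of reading the attached test_win.zip or test_dotnet.zip.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x0800  # general purpose bit 11 (language encoding flag)


def utf8_flag_by_entry(data: bytes) -> dict:
    """Map each entry name to whether its language encoding flag is set."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {
            info.filename: bool(info.flag_bits & UTF8_NAME_FLAG)
            for info in archive.infolist()
        }


# Build a small archive in memory. Python's zipfile sets the flag only
# for names that cannot be encoded as ASCII.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Nürburgring.txt", "")
    archive.writestr("plain.txt", "")

flags = utf8_flag_by_entry(buffer.getvalue())
print(flags)
```

Passing the bytes of a real archive to `utf8_flag_by_entry` in the same way would show whether its entries were written with the flag set.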
The problems begin when you try to read those zip files with .NET.
When reading a zip file with .NET you always had to specify the entryNameEncoding; otherwise the special file name characters would not be read correctly. Something like this:
Expected behavior
For both zip files the name of the extracted file should be "Nürburgring.txt"
Actual behavior
The zip file created by File Explorer is extracted correctly, but the file from the other zip is extracted with the garbled name "N├╝rburgring.txt" when using .NET 7/8.
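The garbled name is what results when UTF-8 name bytes are decoded with an OEM code page. A short Python check, assuming code page 437 (the issue does not state which encoding was passed), reproduces it at the byte level:

```python
expected = "Nürburgring.txt"

# "ü" (U+00FC) is stored as the two UTF-8 bytes 0xC3 0xBC.
utf8_bytes = expected.encode("utf-8")

# Decoding those bytes as CP437 maps 0xC3 to "├" and 0xBC to "╝".
garbled = utf8_bytes.decode("cp437")

print(garbled)  # N├╝rburgring.txt
```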
Regression?
In .NET Framework and .NET 6 this worked correctly.
You could always specify an entryNameEncoding and .NET correctly respected the language encoding flag.
Known Workarounds
No response
Configuration
No response
Other information
No response