Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AES decryption support in ZipInputStream #551

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

remogloor
Copy link

This pull request continues the work of @Numpsy and replaces Pull Request #381

The issue with the HMAC calculation in the original pull request was that it always decrypted and calculated HMAC for the entire input buffer. But actually it should only decrypt and calculate the HMAC up to the compressed size of the current entry. This issue is fixed now and all the unit tests pass. AES encrypted Zips can now be decrypted and decompressed using the ZipInput Stream.

Original Information from #381

I started looking at this and then haven't had time to try to finish it yet, so here it is in case anyone has any ideas.

The basic decryption seems to be ok, but the auth code checking is disabled as it doesn't work. I think the issue is the way that inflater input buffer runs all its data through the crypto transform rather than just the part which is actually encrypted, so the auth code calculation in the AES transform gets the wrong result.
Not sure of the best way to handle it - might need changes in the input buffer, or a different approach to calculating the value?

The changes also have a couple of more minor changes in separate commits, in case they're useful on their own.

I certify that I own, and have sufficient rights to contribute, all source code and related material intended to be compiled or integrated with the source code for the SharpZipLib open source product (the "Contribution"). My Contribution is licensed under the MIT License.

@remogloor remogloor mentioned this pull request Dec 11, 2020
11 tasks
@Numpsy
Copy link
Contributor

Numpsy commented Dec 11, 2020

One of the issues here is the way that the inflator input machinery works by trying to consume all the input so that it can work in cases where it doesn't know the length of the data in advance (i.e. when reading a Zip entry with a descriptor).
In that case, it doesn't know what value DecryptionLimit should be, so it can't set it.

I've thought in the past that it could be changed to use the compressed size fields in cases where they're known, but that would be a more generic change and not encryption specific.

Saying that - just being able to decrypt entries that don't use a descriptor is still a benefit, just needs a few extra checks to make sure things like CanDecompress give the correct values.

(/not particularly a fan of more zip specific encryption code in the inflator, but that's how it works so, !)

@remogloor
Copy link
Author

One of the issues here is the way that the inflator input machinery works by trying to consume all the input so that it can work in cases where it doesn't know the length of the data in advance (i.e. when reading a Zip entry with a descriptor).
In that case, it doesn't know what value DecryptionLimit should be, so it can't set it.

I've thought in the past that it could be changed to use the compressed size fields in cases where they're known, but that would be a more generic change and not encryption specific.

Saying that - just being able to decrypt entries that don't use a descriptor is still a benefit, just needs a few extra checks to make sure things like CanDecompress give the correct values.

(/not particularly a fan of more zip specific encryption code in the inflator, but that's how it works so, !)

I don't quite get what situation you are refering to.

To my understanding the Zip specification ensures that a Zip entry always comes with a header that specifies the compressed size. The DecryptionLimit is actually set to that compressed size as this is exactly what we want to be used here. The compressed data is decrypted and HMAC is calculated on the compressed data. After that happend the inflator can inflate the data. The uncompressed size does not need to be known in advance. (even though that is part of the entry header too.)

@Numpsy
Copy link
Contributor

Numpsy commented Dec 11, 2020

To my understanding the Zip specification ensures that a Zip entry always comes with a header that specifies the compressed size.

Not quite - there are two modes of operation for entries - one which puts the sizes in the file header, and one which sets the header values to 0 and puts the sizes in a Descriptor field which is after the file data. (it's described in the AppNote, e.g in sections 4.4.8 and 4.3.9).
In those cases ZipInputStream has to read the data before it can know the length (that;s also the reason why it can't read entries which have a descriptor and are stored rather than compressed).

@remogloor
Copy link
Author

Not quite - there are two modes of operation for entries - one which puts the sizes in the file header, and one which sets the header values to 0 and puts the sizes in a Descriptor field which is after the file data. (it's described in the AppNote, e.g in sections 4.4.8 and 4.3.9).

Ok got it. That use case is now covered and tested as well.
Although it seems that this is really an edge case as non of the big players actually supports this use case:

  • Linux zip => no support to create AES encrypted files
  • Pkzip => no support to create AES encrypted files
  • 7Zip => no support to create files with data descruptor
  • WinZip => no support to create files with data descruptor

The only way I could create such a file that has both AES encryption and data descriptor was using SharpZipLib itself.

@piksel
Copy link
Member

piksel commented Dec 12, 2020

Well, the reason is that Descriptors are used for streaming data, where you might not know the file size before hand, or you cannot modify the stream to update the header (like with a web server response). This scenario doesn't really make sense with a file-based stand-alone executable. Perhaps for a CLI version that pipes to stdout, not sure if 7z CLI might do it perhaps?
Update: 7z does not support piping to stdout for zip-files.

@remogloor remogloor marked this pull request as ready for review March 8, 2021 10:25
@remogloor remogloor changed the title [WIP] AES decryption support in ZipInputStream AES decryption support in ZipInputStream Mar 8, 2021
@remogloor
Copy link
Author

Ready for review

@piksel piksel requested review from piksel and Numpsy March 12, 2021 12:45
@Numpsy
Copy link
Contributor

Numpsy commented Mar 13, 2021

I'll try to have a look later

This scenario doesn't really make sense with a file-based stand-alone executable

I've seen document files created by LibreOffice (using the file formats that use zip file containers) that use descriptors, though probably not together with this form of encryption (just as an example of how these sort of files can come about through 'normal' file based tools)

@Numpsy
Copy link
Contributor

Numpsy commented Mar 14, 2021

Did I ever query whether there would be any benefir of making use of the entry size (when known) in all cases, rather than just for AES entries (e.g. something like what DecryptionLimit does, but for plain compressed entries)?

@Numpsy
Copy link
Contributor

Numpsy commented Mar 14, 2021

(I still need to have a more detailed read over the other logic in the nexzt day or two)

@codecov
Copy link

codecov bot commented Mar 15, 2021

Codecov Report

Attention: Patch coverage is 90.62500% with 9 lines in your changes missing coverage. Please review.

Project coverage is 74.92%. Comparing base (d2a0c68) to head (ba5296b).
Report is 18 commits behind head on master.

Files with missing lines Patch % Lines
src/ICSharpCode.SharpZipLib/Zip/ZipInputStream.cs 86.15% 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #551      +/-   ##
==========================================
+ Coverage   74.70%   74.92%   +0.21%     
==========================================
  Files          72       72              
  Lines        8465     8533      +68     
==========================================
+ Hits         6324     6393      +69     
+ Misses       2141     2140       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@remogloor
Copy link
Author

I'll try to have a look later

This scenario doesn't really make sense with a file-based stand-alone executable

I've seen document files created by LibreOffice (using the file formats that use zip file containers) that use descriptors, though probably not together with this form of encryption (just as an example of how these sort of files can come about through 'normal' file based tools)

It doesn't matter how these entries can be created with other libraries. It is possible to create them with SharpZipLib. And there is a test to verify these entries can be unzipped and decrypted.

@remogloor
Copy link
Author

Is there any update for this pull request?
Is something needed so that it can get merged?

@Numpsy
Copy link
Contributor

Numpsy commented Apr 21, 2021

Sorry, I haven't had much spare time to do things lately, I'll have another look over the weekend

@Numpsy
Copy link
Contributor

Numpsy commented Apr 25, 2021

Is that last change a separate bug fix for ZipOutputStream?

@remogloor
Copy link
Author

Yes and No

It's a separate bugfix for the ZipOutputStream.

But due to that bug it is not possible to create certain zip files needed fo the unit tests of this pull request.
The tests fail without this change as the zip files created during setup of the unit tests are not according to the specification.

@piksel
Copy link
Member

piksel commented Apr 26, 2021

Regarding ZipOutputStream:

The specification says that the CRC should be set to 0, but only for AE-2 and explicitly states that

NOTE: Zip utilities that support the AE-2 format are required to be able to read files that were created in the AE-1 format, and during decryption/extraction of files in AE-1 format should verify that the file's CRC matches the value stored in the CRC field.

And I think that

the standard Zip CRC value is not used

should be interpreted as "when reading, ignore the Zip CRC", it shouldn't refuse to read it.
And the same goes for the descriptor, even if it's not strictly needed, it shouldn't cause any exceptions if it's there. I don't really see what test you were fixing with this though, as the commit before it had all tests passing (except for InvalidPasswordSeekable which may randomly fail due to the random zip generated may lead to invalid dist codes being found with the wrong password)?

@remogloor
Copy link
Author

Seems I had the issue not correctly in mind anymore. The problem is not that the tests fail. But that the tests do not test what they are supposed to test.

There are multiple use cases that need to be tested. AES decryption needs to be able to handle both cases the one where there is a data descrpytor and the one where there isn't one. This is done by one tests that used InlineData to test the different variations.

public void ZipInputStreamDecryption(int aesKeySize, CompressionMethod compressionMethod, bool forceDataDescriptor, bool skipEntries, bool partiallyReadEntries)

The thing is here that the parameter forceDataDescriptor is basically meaningless if the ZipOutputStream always adds one anyways in case there is a password. With the unchanged ZipOutputStream it is only possible to test the cases where there is a data descryptor as only such files can be created by it. All the cases where there isn't a data desscryptor can't be tested otherwise.

With my second last change I fixed a bug that lead to an exception in case an entry was only read partially and not until the end of the stream. If you undo these changes (only the ZipInputStream not the tests) and the changes to the ZipOutputStream you will notice that only the test case that "stores" entries will fail. The test case that "deflate" the entries will pass successfully which is not what is expected. But if you change the ZipOutputStream so that there is no data descryptor when it is not needed it will fail as expected.

By the way, tools like 7-Zip are also not adding a data descryptor in this case.

Copy link
Member

@piksel piksel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, not adding it when not needed is indeed the right thing to do, I just questioned not allowing them to be there.

I think I understand what happened, but my internal zip emulation is at its limits 😁.
Just a note on the test arguments...

Copy link
Member

@piksel piksel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I get how it works now, more or less. These where just some thoughts when trying to follow the exciting journey of DecryptionLimit.

@pmpbpl
Copy link

pmpbpl commented May 27, 2022

Hi, any updates?

@remogloor remogloor force-pushed the zis_aes branch 2 times, most recently from 769b6b3 to 5972bae Compare October 18, 2022 09:39
@remogloor remogloor requested a review from piksel October 18, 2022 12:06
@lukasweber
Copy link

Hi @piksel, is there any chance you could review the changes from @remogloor anytime soon? We would be very happy to have this feature in the library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants