Fix buffer over-read and memory leaks when using long filepaths in minizip API #69677

Macksaur · 2022-12-06T18:17:31Z

Fixed a buffer over-read when building a List<String> of filepaths due to implicit String(c_str) constructor use.
Fixed a memory leak when using long filepaths (>256 characters) in ZIPReader::get_files().
Added a new unzGetCurrentFileInfo replacement method, godot_unzip_get_current_file_info, that safely gets the file information and filepath as a String without exposing/leaking dangerous C strings.
Added a new unzLocateFile replacement method, godot_unzip_locate_file, that handles long filepaths as the default unzLocateFile doesn't handle paths >256 characters at all (it could if it wanted to...). This method also uses Godot's String class for comparison in order to support Unicode case-comparison.
Improved the robustness of ZIPReader::read_file, it should now safely handle uncompressed blobs of up to 2gb and verify each action it takes before returning data.
Added documentation for ZipAppend enum.

fire · 2022-12-07T14:18:26Z

@RevoluPowered If you're using this, you may want to review.

core/io/zip_io.h

Calinou · 2022-12-30T13:06:06Z

It may be worth checking if this PR helps with #34626. (Not blocking for a merge, but it may be worth looking into for a future PR.)

bitsawer · 2023-02-05T16:15:09Z

Is there anything needed to advance this PR? I have been testing this with some troublesome .zip-files which contain very long file names (file hashes and directories longer than 256/260 characters) and this PR solves the issues very nicely. I also tested that the .zip files in #34626 can be at least read by the ZIPReader now, but that might be also caused by the minizip library updates after that 2019 bug report (or maybe something other than the zip reading causes that bug). At least .zips created using Windows 10 native functionality, 7-zip and Python's zlib seem to work with this PR, long file names and all.

One suggestion I have is tweaking the implementation of godot_unzip_get_current_file_info(). My version is a bit shorter and doesn't use raw memory allocations, lambda functions or the auto-keyword (which is generally banned in Godot codebase). Functionally, they should be identical. Here is my version, feel free to modify:

int godot_unzip_get_current_file_info(unzFile p_zip_file, unz_file_info64 &r_file_info, String &r_filepath) {
	// First, ask only for the file name size. If that succeeds, allocate memory and actually retrieve the name.
	if (unzGetCurrentFileInfo64(p_zip_file, &r_file_info, nullptr, 0, nullptr, 0, nullptr, 0) == UNZ_OK) {
		LocalVector<char> path;
		path.resize(r_file_info.size_filename);

		if (unzGetCurrentFileInfo64(p_zip_file, &r_file_info, path.ptr(), path.size(), nullptr, 0, nullptr, 0) == UNZ_OK) {
			r_filepath = String::utf8(path.ptr(), path.size());
			return UNZ_OK;
		}
	}

	return UNZ_ERRNO;
}

Macksaur · 2023-02-05T16:44:51Z

My version is a bit shorter and doesn't use raw memory allocations, lambda functions...

While I agree that the raw memory allocation is somewhat distasteful I kept it in the shape I found it from the original code because it follows the same style throughout the engine of using minizip on the stack when and where possible. Your version instead does multiple memory allocations (even on the good path!) and always drops the first block on the floor (a good candidate for stack use). The way it is currently written permits that faster path of reading from the zip.

I don't want to dumb the code down because it otherwise cuts corners and doesn't offer much in terms of maintenance. How often should this code change or be modified? I'd the prefer slightly uglier code that does less allocations / repeat reading from the zip.

In saying all of that, I should probably have changed the memory allocation to use LocalVector on the bad path! That's a good call.

Can anyone review this PR to move it forward / identify blockers?

bruvzg

Seems fine to me.

Note: There are few unnecessary // comments, that should be removed. And auto.

core/io/zip_io.cpp

bitsawer · 2023-05-29T09:38:55Z

Anything needed to progress this, @Macksaur? Looks like the only remaining issue is removing the auto-keyword and replacing it with an explicit std::function type, unless I missed something. Lambda can stay as bruvzg mentioned, although I kind of agree it's a pretty weird way to do that. My alternative version posted above is free to use and modify as always if needed :)

I would also reduce the stack buffer size to maybe 2048 or even 1024 instead of 16K. There are several platforms where thread stack size is very small. At RocketChat people porting things to WASM and Nintedo 3DS noticed that minizip blows the stack almost immediately with a 64K stack allocation and they had to patch it.

Macksaur · 2023-05-29T10:39:13Z

Hello! This PR is done and approved as far as I can tell and has been that way for at least three months. I have not been notified of anything outstanding and I have already made any necessary changes. (@akien-mga @YuriSizov ?)

Regarding auto etc: I've no interest in introducing unnecessary allocations into this code (std::function or repeating calls to unzGetCurrentFileInfo64). It's important to me that this code works fast and correct. It doesn't have to be simplified, just reused. The way I have written it does that and optimizes out very well. I've already justified this reasoning above. I don't want to make any more changes, especially if the PR is just going to rot. :(

As for the stack size issue? Wow! That's not good! However, I don't want to special case this PR when it's already just lingering. Searching the codebase for unzGetCurrentFileInfo still lists tonnes of 16k magic constants that have not yet been fixed. I would rather someone else make a PR to fix all of those at once than tack anything extra onto this.

YuriSizov · 2023-05-29T10:52:29Z

especially if the PR is just going to rot

Well, I think the fact that you have auto which we do not allow in Godot's codebase, is why it's been sitting here unmerged. This hasn't been addressed, but it's important to us that it is. There are a few exceptions of it being allowed (or missed during a review), but nobody from the core team has approved it here. Instead you were asked to replace it, and that's where the discussion has ended.

I'm not saying that your reasons are not valid, but that's the reason it's not been merged yet.

YuriSizov · 2023-05-29T11:00:00Z

By the way, before this can be merged, it would be good to rebase it, since it hasn't been updated in 5 months.

modules/zip/zip_reader.cpp

…zip archive and improved robustness of long filepaths and reading files.

akien-mga · 2023-05-31T22:46:19Z

Thanks! And congrats for your first merged Godot contribution 🎉

Macksaur requested a review from a team as a code owner December 6, 2022 18:17

Macksaur force-pushed the robust-minizip branch 2 times, most recently from a18ec5e to 2962452 Compare December 6, 2022 18:44

Macksaur mentioned this pull request Dec 7, 2022

Expose minizip API to allow creating zips using scripts #65281

Merged

Chaosus added bug topic:core labels Dec 7, 2022

Chaosus added this to the 4.0 milestone Dec 7, 2022

fire requested a review from a team December 7, 2022 14:17

Anutrix reviewed Dec 7, 2022

View reviewed changes

core/io/zip_io.h Show resolved Hide resolved

Macksaur force-pushed the robust-minizip branch 2 times, most recently from 906a74c to 555c496 Compare February 5, 2023 17:13

akien-mga requested review from bruvzg, Faless and hpvb February 6, 2023 14:16

bruvzg approved these changes Feb 13, 2023

View reviewed changes

core/io/zip_io.cpp Outdated Show resolved Hide resolved

core/io/zip_io.cpp Outdated Show resolved Hide resolved

core/io/zip_io.cpp Show resolved Hide resolved

YuriSizov modified the milestones: 4.0, 4.1 Feb 17, 2023

YuriSizov added the needs work label Feb 17, 2023

bitsawer mentioned this pull request Mar 4, 2023

Strange code fragments #74344

Closed

Macksaur force-pushed the robust-minizip branch from 555c496 to c37a603 Compare May 29, 2023 11:43

Macksaur requested a review from akien-mga May 29, 2023 12:09

Macksaur force-pushed the robust-minizip branch from c37a603 to 7c601a6 Compare May 31, 2023 11:33

YuriSizov requested review from bruvzg and lawnjelly and removed request for akien-mga May 31, 2023 12:36

bruvzg approved these changes May 31, 2023

View reviewed changes

modules/zip/zip_reader.cpp Outdated Show resolved Hide resolved

Fix buffer over-read and memory leaks when using long filepaths in a …

1326b7e

…zip archive and improved robustness of long filepaths and reading files.

Macksaur force-pushed the robust-minizip branch from 7c601a6 to 1326b7e Compare May 31, 2023 13:12

YuriSizov removed the needs work label May 31, 2023

akien-mga merged commit 180a5eb into godotengine:master May 31, 2023

zedrun00 mentioned this pull request Jun 28, 2023

Set language encoding flag when using ZIPPacker #78732

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix buffer over-read and memory leaks when using long filepaths in minizip API #69677

Fix buffer over-read and memory leaks when using long filepaths in minizip API #69677

Macksaur commented Dec 6, 2022 •

edited

Loading

fire commented Dec 7, 2022

Calinou commented Dec 30, 2022

bitsawer commented Feb 5, 2023

Macksaur commented Feb 5, 2023 •

edited

Loading

bruvzg left a comment

bitsawer commented May 29, 2023

Macksaur commented May 29, 2023

YuriSizov commented May 29, 2023 •

edited

Loading

YuriSizov commented May 29, 2023

akien-mga commented May 31, 2023

Fix buffer over-read and memory leaks when using long filepaths in minizip API #69677

Fix buffer over-read and memory leaks when using long filepaths in minizip API #69677

Conversation

Macksaur commented Dec 6, 2022 • edited Loading

fire commented Dec 7, 2022

Calinou commented Dec 30, 2022

bitsawer commented Feb 5, 2023

Macksaur commented Feb 5, 2023 • edited Loading

bruvzg left a comment

Choose a reason for hiding this comment

bitsawer commented May 29, 2023

Macksaur commented May 29, 2023

YuriSizov commented May 29, 2023 • edited Loading

YuriSizov commented May 29, 2023

akien-mga commented May 31, 2023

Macksaur commented Dec 6, 2022 •

edited

Loading

Macksaur commented Feb 5, 2023 •

edited

Loading

YuriSizov commented May 29, 2023 •

edited

Loading