Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix buffer over-read and memory leaks when using long filepaths in minizip API #69677

Merged
merged 1 commit into from
May 31, 2023

Conversation

Macksaur
Copy link
Contributor

@Macksaur Macksaur commented Dec 6, 2022

  • Fixed a buffer over-read when building a List<String> of filepaths due to implicit String(c_str) constructor use.
  • Fixed a memory leak when using long filepaths (>256 characters) in ZIPReader::get_files().
  • Added a new unzGetCurrentFileInfo replacement method, godot_unzip_get_current_file_info, that safely gets the file information and filepath as a String without exposing/leaking dangerous C strings.
  • Added a new unzLocateFile replacement method, godot_unzip_locate_file, that handles long filepaths as the default unzLocateFile doesn't handle paths >256 characters at all (it could if it wanted to...). This method also uses Godot's String class for comparison in order to support Unicode case-comparison.
  • Improved the robustness of ZIPReader::read_file, it should now safely handle uncompressed blobs of up to 2gb and verify each action it takes before returning data.
  • Added documentation for ZipAppend enum.

@Macksaur Macksaur requested a review from a team as a code owner December 6, 2022 18:17
@Macksaur Macksaur force-pushed the robust-minizip branch 2 times, most recently from a18ec5e to 2962452 Compare December 6, 2022 18:44
@Chaosus Chaosus added this to the 4.0 milestone Dec 7, 2022
@fire fire requested a review from a team December 7, 2022 14:17
@fire
Copy link
Member

fire commented Dec 7, 2022

@RevoluPowered If you're using this, you may want to review.

@Calinou
Copy link
Member

Calinou commented Dec 30, 2022

It may be worth checking if this PR helps with #34626. (Not blocking for a merge, but it may be worth looking into for a future PR.)

@bitsawer
Copy link
Member

bitsawer commented Feb 5, 2023

Is there anything needed to advance this PR? I have been testing this with some troublesome .zip-files which contain very long file names (file hashes and directories longer than 256/260 characters) and this PR solves the issues very nicely. I also tested that the .zip files in #34626 can be at least read by the ZIPReader now, but that might be also caused by the minizip library updates after that 2019 bug report (or maybe something other than the zip reading causes that bug). At least .zips created using Windows 10 native functionality, 7-zip and Python's zlib seem to work with this PR, long file names and all.

One suggestion I have is tweaking the implementation of godot_unzip_get_current_file_info(). My version is a bit shorter and doesn't use raw memory allocations, lambda functions or the auto-keyword (which is generally banned in Godot codebase). Functionally, they should be identical. Here is my version, feel free to modify:

int godot_unzip_get_current_file_info(unzFile p_zip_file, unz_file_info64 &r_file_info, String &r_filepath) {
	// First, ask only for the file name size. If that succeeds, allocate memory and actually retrieve the name.
	if (unzGetCurrentFileInfo64(p_zip_file, &r_file_info, nullptr, 0, nullptr, 0, nullptr, 0) == UNZ_OK) {
		LocalVector<char> path;
		path.resize(r_file_info.size_filename);

		if (unzGetCurrentFileInfo64(p_zip_file, &r_file_info, path.ptr(), path.size(), nullptr, 0, nullptr, 0) == UNZ_OK) {
			r_filepath = String::utf8(path.ptr(), path.size());
			return UNZ_OK;
		}
	}

	return UNZ_ERRNO;
}

@Macksaur
Copy link
Contributor Author

Macksaur commented Feb 5, 2023

My version is a bit shorter and doesn't use raw memory allocations, lambda functions...

While I agree that the raw memory allocation is somewhat distasteful I kept it in the shape I found it from the original code because it follows the same style throughout the engine of using minizip on the stack when and where possible. Your version instead does multiple memory allocations (even on the good path!) and always drops the first block on the floor (a good candidate for stack use). The way it is currently written permits that faster path of reading from the zip.

I don't want to dumb the code down because it otherwise cuts corners and doesn't offer much in terms of maintenance. How often should this code change or be modified? I'd the prefer slightly uglier code that does less allocations / repeat reading from the zip.

In saying all of that, I should probably have changed the memory allocation to use LocalVector on the bad path! That's a good call.


Can anyone review this PR to move it forward / identify blockers?

@Macksaur Macksaur force-pushed the robust-minizip branch 2 times, most recently from 906a74c to 555c496 Compare February 5, 2023 17:13
@akien-mga akien-mga requested review from bruvzg, Faless and hpvb February 6, 2023 14:16
Copy link
Member

@bruvzg bruvzg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me.

Note: There are few unnecessary // comments, that should be removed. And auto.

core/io/zip_io.cpp Outdated Show resolved Hide resolved
core/io/zip_io.cpp Outdated Show resolved Hide resolved
core/io/zip_io.cpp Show resolved Hide resolved
@YuriSizov YuriSizov modified the milestones: 4.0, 4.1 Feb 17, 2023
@bitsawer bitsawer mentioned this pull request Mar 4, 2023
@bitsawer
Copy link
Member

Anything needed to progress this, @Macksaur? Looks like the only remaining issue is removing the auto-keyword and replacing it with an explicit std::function type, unless I missed something. Lambda can stay as bruvzg mentioned, although I kind of agree it's a pretty weird way to do that. My alternative version posted above is free to use and modify as always if needed :)

I would also reduce the stack buffer size to maybe 2048 or even 1024 instead of 16K. There are several platforms where thread stack size is very small. At RocketChat people porting things to WASM and Nintedo 3DS noticed that minizip blows the stack almost immediately with a 64K stack allocation and they had to patch it.

@Macksaur
Copy link
Contributor Author

Hello! This PR is done and approved as far as I can tell and has been that way for at least three months. I have not been notified of anything outstanding and I have already made any necessary changes. (@akien-mga @YuriSizov ?)


Regarding auto etc: I've no interest in introducing unnecessary allocations into this code (std::function or repeating calls to unzGetCurrentFileInfo64). It's important to me that this code works fast and correct. It doesn't have to be simplified, just reused. The way I have written it does that and optimizes out very well. I've already justified this reasoning above. I don't want to make any more changes, especially if the PR is just going to rot. :(

As for the stack size issue? Wow! That's not good! However, I don't want to special case this PR when it's already just lingering. Searching the codebase for unzGetCurrentFileInfo still lists tonnes of 16k magic constants that have not yet been fixed. I would rather someone else make a PR to fix all of those at once than tack anything extra onto this.

@YuriSizov
Copy link
Contributor

YuriSizov commented May 29, 2023

especially if the PR is just going to rot

Well, I think the fact that you have auto which we do not allow in Godot's codebase, is why it's been sitting here unmerged. This hasn't been addressed, but it's important to us that it is. There are a few exceptions of it being allowed (or missed during a review), but nobody from the core team has approved it here. Instead you were asked to replace it, and that's where the discussion has ended.

I'm not saying that your reasons are not valid, but that's the reason it's not been merged yet.

@YuriSizov
Copy link
Contributor

By the way, before this can be merged, it would be good to rebase it, since it hasn't been updated in 5 months.

@YuriSizov YuriSizov requested review from bruvzg and lawnjelly and removed request for akien-mga May 31, 2023 12:36
modules/zip/zip_reader.cpp Outdated Show resolved Hide resolved
…zip archive and improved robustness of long filepaths and reading files.
@akien-mga akien-mga merged commit 180a5eb into godotengine:master May 31, 2023
@akien-mga
Copy link
Member

Thanks! And congrats for your first merged Godot contribution 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants