Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Godot crashes when importing OGG file or renaming folder / OGG Vorbis (release builds only) #87076

Closed
DevMimas opened this issue Jan 11, 2024 · 12 comments · Fixed by #87246
Closed

Comments

@DevMimas
Copy link

DevMimas commented Jan 11, 2024

Tested versions

Godot Engine v4.2.1.stable.mono.official

System information

Windows 10 - Godot v4.2.1.stable.mono.official (Forward+)

Issue description

Godot Editor crashes (v4.2.1),

  • if you try to import an OGG file
  • if you rename the folder in which the OGG file is located. Reimport causes crash

If you remove the OGG file from the project folder, the project is loaded successfully.

Better description needed
I would also like the error text to be better defined. For Lain it is absolutely meaningless.
See also my proposal: godotengine/godot-proposals#8808
My commit in the pull request: #84723

Vide open OGG file project

2024-01-15.23-26-39.mp4

Steps to reproduce

N/A

Minimal reproduction project (MRP)

OGG TestProject.zip

@alessandrofama
Copy link
Contributor

alessandrofama commented Jan 11, 2024

It would have been nice if this issue included an example project or an .ogg file to verify the crash.

I can reproduce the crash with this project ogg_error_spam.zip from #84712 on Godot v4.2.1.stable.mono.official and Godot Engine v4.2.1.stable.official just by importing the file the first time, by double clicking the audio file & renaming the folder in which the file is located, which i guess triggers a re-import?

I can't reproduce the crash on latest 3524346 (v4.3.dev)
Godot will print Error during vorbis synthesis -135 but will not crash.

Important to note that the crash seems to only be triggered by corrupt .ogg files and not any .ogg file as the description implies.

(Windows 10.0.22621)

@bs-mwoerner
Copy link
Contributor

I get an Access Violation in both 4.2.1 and 4.3 Dev 1 when importing the file from that ZIP archive, but only in the release builds, not in my debug build. That may be a bit tricky to debug.

Error -135 by the way means that the file contains non-audio data (comments?). I suppose that's an internal detail and not necessarily something that needs to be reported to the user.

@RichTeaMan
Copy link

RichTeaMan commented Jan 13, 2024

I've also seen this problem manifest when cloning my exsting Godot project. If the .godot/import directory doesn't already have ogg import files the entire Godot editor segfaults. I have observed this on several Ubuntu computers and Github CI.

RichTeaMan added a commit to RichTeaMan/combine-derby-godot that referenced this issue Jan 13, 2024
Will revert when Godot can import a clean workspace without crashing.
See godotengine/godot#87076.
@bs-mwoerner
Copy link
Contributor

I did some assembly-level debugging on the release exe. So far it looks like the crash happens shortly after processing a zero-length packet in the OGG file, even though zero-length packets cause no issues in the debug builds. The actual cause for the crash seems to be trying to decrement the reference count of a zero pointer in a PackedByteArray when it goes out of scope, which should not be possible without an inter-thread race condition. Not sure how that happens, yet. Might be some kind of memory corruption.

@akien-mga akien-mga changed the title Godot crashes when importing OGG file or renaming folder / OGG Vorbis Godot crashes when importing OGG file or renaming folder / OGG Vorbis (release builds only) Jan 15, 2024
@bs-mwoerner
Copy link
Contributor

Uh, this is kind of wild. First of all: I think the crash could be prevented just by adding an if (packet.bytes > 0), but I think the actual problem runs much deeper and this may not be the best place to fix this.

A packet.bytes of 0 gives an empty PackedByteArray with an internal _cowdata._ptr == nullptr, which is well defined and should not cause any problems, so the actual question is: If all methods involved correctly check for nullptr and a debug build has no problem with the data pointer being null, why do the release builds crash? After staring at the code for hours, here's my theory.

The crash happens in CowData<T>::_unref(). This is implemented like this:

template <class T>
void CowData<T>::_unref(void *p_data) {
	if (!p_data) {
		return;
	}

	SafeNumeric<uint32_t> *refc = _get_refcount();

	if (refc->decrement() > 0) {
		return; // still in use
	}
	// clean up

	if (!std::is_trivially_destructible<T>::value) {
		uint32_t *count = _get_size();
		T *data = (T *)(count + 1);

		for (uint32_t i = 0; i < *count; ++i) {
			// call destructors
			data[i].~T();
		}
	}

	// free mem
	Memory::free_static((uint8_t *)p_data, true);
}

At first glance, it appears a bit suspicious that a null check is done only for the method parameter p_data but not for the pointer returned from _get_refcount(), which is derived from this->_ptr. I think the rationale here is that:

  • _unref() is a private method.
  • Every time it is called, this->_ptr is passed in as p_data, so the method works under the implicit assumption that p_data == this->_ptr and thus a check for p_data == nullptr is equivalent to a check for this->_ptr == nullptr.

Spoiler alert: The C++ compiler does not make that assumption.

The crash occurs in an inlined version of the method and the actual code being executed looks more like this:

SafeNumeric<uint32_t> *refc = this->_ptr ?
	reinterpret_cast<SafeNumeric<uint32_t> *>(_ptr) - 2 :
	nullptr;

if (refc->decrement() > 0) {
	return; // still in use
}

Memory::free_static((uint8_t *)p_data, true);

Note that the check for if (!p_data) is optimized out. If the method is called while this->_ptr == nullptr, there is no effective check to prevent a null pointer dereference in refc->decrement(). That's exactly what happens with those specific OGG files.

So is this a bug in the compiler? Isn't this code semantically different from the C++ source, especially, since p_data is used later in the method? That's what I thought initially, but I think the key here is that Memory::free_static() also checks its parameter for null, so the additional check in the caller is indeed redundant. The compiler does not see that checking the function parameter is supposed to be a check of a member variable, because all calls to the function pass the value of that member.

This might be kind of a big deal, since CowData is used all over Godot. Or it might not, because the bug requires the code to be optimized in a certain way to actually trigger. Nevertheless, this issue proves that there are situations where it does happen.

@DevMimas
Copy link
Author

Hello everyone,
Thank you for your feedback.

Sorry, I was busy, so it took a while.
I have now added a test project with my OGG file + a video. The project crashes immediately. Even when I created the test project and dragged the file into the folder, the project crashed immediately.

What is a corrupt .ogg file? A file that is damaged?

I can open and play the file on Windows 10, just as a hint.

@RichTeaMan
Copy link

@bs-mwoerner thanks for your deep dive into the code, it was an interesting read.

I've built your code in the linked pull request (a production build using mono, scons platform=linuxbsd target=editor production=yes module_mono_enabled=yes), but it appears I'm still getting the same error:

ERROR: Caller thread can't call this function in this node (/root). Use call_deferred() or call_thread_group() instead.
   at: propagate_notification (scene/main/node.cpp:2281)

================================================================
handle_crash: Program crashed with signal 11
Engine version: Godot Engine v4.3.dev.mono.custom_build (f6ae492523699cabbe59175f72839b30d0be573e)
Dumping the backtrace. Please include this when reporting the bug to the project developer.
[1] /usr/lib/dotnet/shared/Microsoft.NETCore.App/7.0.15/libcoreclr.so(+0x621a5e) [0x7f5e73621a5e] (??:0)
[2] /usr/lib/dotnet/shared/Microsoft.NETCore.App/7.0.15/libcoreclr.so(+0x6210d5) [0x7f5e736210d5] (??:0)
[3] /lib/x86_64-linux-gnu/libc.so.6(+0x42910) [0x7f5ec6c42910] (??:0)
[4] /home/tom/projects/backup/godot/bin/godot.linuxbsd.editor.x86_64.mono(+0x673b1c) [0x55decc69db1c] (??:0)
[5] /home/tom/projects/backup/godot/bin/godot.linuxbsd.editor.x86_64.mono(+0x674876) [0x55decc69e876] (??:0)
[6] /home/tom/projects/backup/godot/bin/godot.linuxbsd.editor.x86_64.mono(+0x6750b0) [0x55decc69f0b0] (??:0)
[7] /home/tom/projects/backup/godot/bin/godot.linuxbsd.editor.x86_64.mono(+0x1368a5b) [0x55decd392a5b] (??:0)
[8] /home/tom/projects/backup/godot/bin/godot.linuxbsd.editor.x86_64.mono(+0x136b1f6) [0x55decd3951f6] (??:0)
[9] /home/tom/projects/backup/godot/bin/godot.linuxbsd.editor.x86_64.mono(+0x400bfb6) [0x55ded0035fb6] (??:0)
[10] /home/tom/projects/backup/godot/bin/godot.linuxbsd.editor.x86_64.mono(+0x400c76c) [0x55ded003676c] (??:0)
[11] /home/tom/projects/backup/godot/bin/godot.linuxbsd.editor.x86_64.mono(+0x3b3101d) [0x55decfb5b01d] (??:0)
[12] /home/tom/projects/backup/godot/bin/godot.linuxbsd.editor.x86_64.mono(+0x4718803) [0x55ded0742803] (??:0)
[13] /lib/x86_64-linux-gnu/libc.so.6(+0x97ada) [0x7f5ec6c97ada] (??:0)
[14] /lib/x86_64-linux-gnu/libc.so.6(+0x12847c) [0x7f5ec6d2847c] (??:0)
-- END OF BACKTRACE --
================================================================

@bs-mwoerner
Copy link
Contributor

bs-mwoerner commented Jan 15, 2024

Ah! production=yes is probably what I was looking for! Thanks a lot. I'll see if I can get this to build on Windows, that should make it much easier to get to the bottom of this.

Edit: I wasn't able to reproduce the crash in a production=yes build with the Microsoft compiler even without the change, so my new theory is that this is a gcc quirk. The cause of the crash is probably indeed the null check being optimized away. The reason, however, is not that CowData checks a different pointer, but that the caller in ResourceImporterOggVorbis::load_from_buffer previously did a zero-length memcpy into that pointer. And while copying 0 bytes into a pointer is a nop and doesn't break anything execution-wise, gcc takes this as indication that the pointer is not null and removes future null checks on that pointer. Live and learn.

@bs-mwoerner
Copy link
Contributor

bs-mwoerner commented Jan 16, 2024

@RichTeaMan I created a new pull request #87246 that explicitly checks the packet size before calling memcpy. It would be great if you could test this again, as I don't have a working gcc setup.

I'm not 100% sure this fixes your problem, because your backtrace makes it sound as if your segfault happens in libcoreclr.so, but maybe that's just how the exception travels to the handler. It's certainly worth a try.

@DevMimas

I have now added a test project with my OGG file + a video. The project crashes immediately. Even when I created the test project and dragged the file into the folder, the project crashed immediately.

This might be solved by the fix I am proposing, but so far, I've only worked with OGG files that contain only an audio stream. I can't comment on what exactly happens when there's an OGG Theora video stream as well. In theory, the low-level library should separate the streams so that the importer doesn't see a difference.

What is a corrupt .ogg file? A file that is damaged?

In this particular context, the crash is probably triggered by OGG streams that contain zero-length packets. That's not forbidden, so if that's the only issue, then the file is not actually "corrupt" or "damaged", just "unusual".

Edit: I had a look at the Loading-Glitch.ogg you provided, @DevMimas, and I can't play that file in VLC or Windows Media Player and ffmpeg says it's a "Broken file, non-keyframe not correctly marked." So yes, this file is indeed "corrupt/damaged". We can fix the crash, but you won't get proper audio from that file.

@RichTeaMan
Copy link

@bs-mwoerner thanks, the new build seems to import and play ogg files just fine.

As an observation, when one specific sound plays a runtime error is logged:

E 0:00:31:0347   next_ogg_packet: Condition "page_cursor >= ogg_packet_sequence->page_data.size()" is true. Returning: false
  <C++ Source>   modules/ogg/ogg_packet_sequence.cpp:140 @ next_ogg_packet()

The sound is still played and the game continues, but there's clearly something strange with this one particular file. VLC plays it without complaint.

Thank you.

@bs-mwoerner
Copy link
Contributor

bs-mwoerner commented Jan 16, 2024

That's great news, thanks for testing! I get that same message at the end of the OST.ogg file. This also seems to be caused by the fact that the file ends on a zero-length packet, which means that at one point, there are packets left but no bytes. Playback still ends correctly, but the code expresses its astonishment about this fact with a warning message.

Since the importer pre-parses the entire file anyway, it may be cleanest to just drop empty packet from the parsed data. There's a source code comment that says empty pages should still be added to the index, though, so I'll have to think about the implications a bit.

@DevMimas
Copy link
Author

DevMimas commented Jan 21, 2024

@bs-mwoerner

@DevMimas

I have now added a test project with my OGG file + a video. The project crashes immediately. Even when I created the test project and dragged the file into the folder, the project crashed immediately.

This might be solved by the fix I am proposing, but so far, I've only worked with OGG files that contain only an audio stream. I can't comment on what exactly happens when there's an OGG Theora video stream as well. In theory, the low-level library should separate the streams so that the importer doesn't see a difference.

What is a corrupt .ogg file? A file that is damaged?

In this particular context, the crash is probably triggered by OGG streams that contain zero-length packets. That's not forbidden, so if that's the only issue, then the file is not actually "corrupt" or "damaged", just "unusual".

Edit: I had a look at the Loading-Glitch.ogg you provided, @DevMimas, and I can't play that file in VLC or Windows Media Player and ffmpeg says it's a "Broken file, non-keyframe not correctly marked." So yes, this file is indeed "corrupt/damaged". We can fix the crash, but you won't get proper audio from that file.

All right. Thank you very much for your explanation.
I have now also located the damaged OGG. It is strange. In WindowsPlayer the file can be played for the first time without errors, several times in succession. If you open the file via the VLC Player, it cannot be played and then not in WindowsPlayer either. No error is displayed in VLC. An error code is then displayed in WindowsPlayer.

Yes, the file does not contain an audio line.

So now I know that the file contains some kind of error.
Great. Thanks for fixing the bug 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment