Fix crashes with invalid utf-8. #59
Thanks for the investigation and the fix. Great finding!
That's inconsistent with the answer here, though it's consistent with my observation in #28 (comment). Either way, Emacs's documentation here is too fuzzy, which leaves this as undefined behavior. First, we'll need to figure out what Emacs's intended behavior is. Then we can decide on a suitable approach, similar to how we dealt with Emacs's GC bug in #2.
@Eli-Zaretskii could you please check the answer above?
Sorry, I don't understand what you are asking me to check. Which "answer above", and what to check in it?
Can we decide that the answer here is not correct and it's expected that
I don't know. I haven't yet had time to step through the code with a debugger and see what happens there and why. And FWIW, it doesn't help that the original bug report doesn't provide any reproduction recipe except by using a module written in Rust, which I cannot even compile on my development system. All I can say at this point is that the module-support code in emacs-module.c seems to behave according to the documented interface of the function from coding.c it is using, so this ought to behave as you expect. But the Emacs encoding machinery is very complex, and its design doesn't inherently guarantee that any raw bytes will be flagged; instead, it makes every effort to produce those raw bytes back as they were input, thereby leaving the handling of this to the application. It could very well be that what you see is one consequence of this basic design. I also doubt that this particular behavior was extensively tested, since the assumption was that modules deal with text, not with binary junk. If you are in a hurry to get the answers, I suggest that you step through the code with GDB and tell us what you see, including why Emacs doesn't signal an error in this particular case. But in general, I don't recommend that modules rely on behavior that is only documented in comments in the source code. If you need valid UTF-8 strings, I suggest verifying that in the first place. Having said all that, I have this issue on my todo list, and will get to investigating it eventually, as my time permits.
I think I know what happened: your module called What you should do is make sure the string passed to
and then call You need to realize that And before you ask: no,
Ping! Would someone please confirm or disprove my guess above?
Hi! I just checked - passed
You can reproduce it with any module, not necessarily one written in Rust. Just call
Ah, wait, I didn't check it properly - I was still calling
You need to pass a multibyte string, not a unibyte string. That is the important part.
According to the discussion in Emacs bug#74922, it's possible that Emacs passes invalid strings to dynamic libraries.
OK, this makes sense, thanks! @ubolonton I updated the MR to just remove
Fixes #58
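Since the discussion concludes that Emacs may hand a module bytes that are not valid UTF-8, here is a minimal sketch of how a Rust module could check such bytes before treating them as a `String`. The helper names `bytes_to_string_strict` and `bytes_to_string_lossy` are hypothetical and are not part of this project's API, and this is not necessarily the change made in this PR; only the standard library calls `String::from_utf8` and `String::from_utf8_lossy` are real.

```rust
use std::string::FromUtf8Error;

/// Hypothetical helper: accept the bytes only if they are valid UTF-8.
/// On failure the caller gets an error value instead of undefined behavior.
fn bytes_to_string_strict(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

/// Hypothetical helper: never fail, but replace each invalid sequence
/// with U+FFFD (the Unicode replacement character).
fn bytes_to_string_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

The strict variant fits callers that want to signal an error back to Lisp, while the lossy variant fits callers that prefer to keep going with slightly altered text. Either one avoids building a `String` from unchecked bytes, which Rust treats as undefined behavior when the bytes are not valid UTF-8.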