Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use cppsize when constructing String from StdString #378

Merged
merged 4 commits into from
Oct 4, 2023

Conversation

kleinschmidt
Copy link
Contributor

@kleinschmidt kleinschmidt commented Oct 2, 2023

Currently, std::strings that have null bytes in them cannot be converted to Strings without truncation using the StdLib-provided String constructor. Base.unsafe_string with one argument assumes the pointed-to string is null-terminated, but std::strings are not necessarily null-terminated. Base.unsafe_string supports an optional second argument (thanks @omus for pointing this out!) which is the number of bytes in the pointed-to string; this PR uses cppsize to get that. I've also added some small tests to confirm how null-byte containing strings are handled (from the String-to-std::string direction and vice-versa); I didn't change the default StdString constructor to use the length of the julia String, although IMO whether that default should be changed is IMO up for debate, but given that it's already supported and the length-less method is the default I'm assuming that's for a good reason.

This change is technically breaking: if you were relying on the (IMO bugged) behavior of String(::StdString) discarding anything starting from the first null byte, you'll get different behavior. How you want to handle the version bump here is up to you; I'd probably release it as a patch (bugfix) but for something that's deep in some fiddly codebases it might be better to be safe rather than sorry here.

If you do want to make a breaking release with this, I'd also change the handling of the String-to-StdString method to use the string length by default, but as with this change that's breaking.

The handling of null bytes has bitten us quite a few times though, interacting with APIs taht generate strings with null bytes in them introduces very weird and difficult to debug errors.


@barche
Copy link
Collaborator

barche commented Oct 2, 2023

Thanks for this, I have to admit I never gave the "null character containing" string any thought, so I do prefer your solution. I have added the length argument also to three functions that convert to StdString, is that what you mean? Potentially we could also go for an overload like this:

StdString(x::String) = StdString(x,ncodeunits(x))

I changed length to ncodeunits as I was testing this last change locally, I think it needs to be ncodeunits in the 3 changes I committed too.

@kleinschmidt
Copy link
Contributor Author

Yeah, I think that's more or less what I had in mind! And yes ncodeunits does seem appropriate; I'm not 100% sure that covers all the weird unicode cases but I also don't think std::string is meant to handle those either so seems fine to me. I don't think we want to be doing something like checking whether the string is well-formed UTF-8 since a lot of times strings are used as "a buncha bytes" (at least that's the cases we're dealing with)

@barche barche merged commit ecd5279 into JuliaInterop:main Oct 4, 2023
9 checks passed
@barche
Copy link
Collaborator

barche commented Oct 4, 2023

Thanks, I will probably release this as non-breaking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants