escape_string / unescape_string not symmetric #2867

Closed
WestleyArgentum opened this issue Apr 15, 2013 · 4 comments

Comments

@WestleyArgentum
Member

I have this issue when trying to escape / unescape strings that were originally parsed expressions containing floats (integers work fine, haven't tried much else).

julia> i = IOBuffer()
IOBuffer([],true,true,true,false,0,9223372036854775807,1)

julia> serialize(i, :(r = 88.9))

julia> seekstart(i)
true

julia> str = readall(i)
"\x16\x02?1\\\x0e�����9V@"

julia> estr = escape_string(str)
"\\x16\\x02?1\\\\\\x0e�����9V@"

julia> ustr = unescape_string(estr)
"\x16\x02?1\\\x0e�����9V@"

julia> str == ustr
false

And, indeed, if you try to parse ustr:

julia> o = IOBuffer(ustr)
IOBuffer([0x16, 0x02, 0x3f, 0x31, 0x5c, 0x0e, 0xef, 0xbf, 0xbd, 0xef  …  0xef, 0xbf, 0xbd, 0xef, 0xbf, 0xbd, 0x39, 0x56, 0x40],true,false,true,false,24,9223372036854775807,1)

julia> deserialize(o)
:(r = -0.9919128115125185)
@JeffBezanson
Member

This happens because the string is not valid UTF-8. In many ways it is the same issue as #1792.
It would probably be better to throw an error than to return an invalid UTF8String object, but then that might leave you with no way to read text in an unsupported encoding. We could also add readbytes, to do a readall as a byte array instead of as text.
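
To make the validity point concrete, here is a minimal sketch. It assumes a current Julia 1.x release rather than the 2013 API used in this thread, and the byte values are illustrative, not the ones from the report. It checks whether a byte sequence is valid UTF-8 with `isvalid(String, ...)` and reads a buffer back as bytes with `read(io)`, which is roughly what the `readbytes` suggestion amounts to.

```julia
# Assumption: Julia 1.x. The bytes are made up for illustration:
# "hi" followed by two stray UTF-8 continuation bytes.
bytes = UInt8[0x68, 0x69, 0x9a, 0x99]

valid_prefix = isvalid(String, bytes[1:2])  # true: plain ASCII is valid UTF-8
valid_all    = isvalid(String, bytes)       # false: 0x9a has no lead byte

# Reading as bytes keeps the data intact, with no text decoding involved.
io  = IOBuffer(bytes)
raw = read(io)                              # Vector{UInt8}
```

Keeping the data as a `Vector{UInt8}` sidesteps the question of which string type to pick, since no encoding is assumed at all.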

@WestleyArgentum
Member Author

So, actually, it looks like IOBuffer has readbytes and that readall is defined using it.

What I get back is of type UTF8String. I notice that ByteString is really a union of that and ASCIIString. Is it possible that there needs to be more sophisticated magic to determine which member of the union should be used?

I imagine what's happening now is that the thing reading the string recognizes valid UTF-8 characters and so bumps the type of the whole thing to UTF8String... maybe it needs to realize that there are also invalid characters and then take it down a notch (here meaning ASCIIString)?

@JeffBezanson
Member

This particular data is neither valid UTF-8 nor ASCII, so to pursue that we'd need to add Latin1String, and even then we couldn't be sure whether the data is truly Latin-1.

@JeffBezanson
Member

I think you will have to use readall(i).data together with something like base64, which is designed for sending arbitrary binary data, instead of the text API.
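
A rough sketch of that approach, assuming a current Julia 1.x release where the `Base64` and `Serialization` standard libraries are available (the `readall(i).data` spelling above is the 2013 API; `take!` is used here instead to get the raw bytes). The serialized expression is base64-encoded for transport as text, then decoded and deserialized on the other side.

```julia
# Assumption: Julia 1.x with the Base64 and Serialization stdlibs.
using Base64, Serialization

# Serialize the expression and grab the raw bytes, never treating them as text.
io = IOBuffer()
serialize(io, :(r = 88.9))
bytes = take!(io)                  # Vector{UInt8}

# Base64 turns arbitrary bytes into plain ASCII that is safe to send as text.
encoded = base64encode(bytes)      # String
decoded = base64decode(encoded)    # Vector{UInt8}

# The round trip is lossless, so deserialization recovers the original expression.
expr = deserialize(IOBuffer(decoded))
```

Unlike the escape_string / unescape_string pair, base64 never interprets the payload as characters, so invalid UTF-8 in the serialized data does not matter.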
