Invalid UTF8 strings can be created but not displayed on the repl? #11744

Closed
malmaud opened this issue Jun 18, 2015 · 4 comments

Comments

@malmaud
Contributor

malmaud commented Jun 18, 2015

Puzzled by this:

b = IOBuffer()
serialize(b, SharedArray(Float64, (10,)))
seek(b, 0)
s = readall(b);  # Works: the trailing semicolon suppresses display
s                # Error: displaying the string at the REPL fails
ERROR: invalid character index
 in next at ./utf8.jl:69
 in print_escaped at string.jl:880
 in print_quoted at string.jl:897
 in show at string.jl:74
 in anonymous at show.jl:1254
 in with_output_limit at ./show.jl:1231
 in showlimited at show.jl:1253
 in writemime at replutil.jl:4
 in display at REPL.jl:113
 in display at REPL.jl:116
 in display at multimedia.jl:151
 in print_response at REPL.jl:133
 in print_response at REPL.jl:120
 in anonymous at REPL.jl:600
 in run_interface at ./LineEdit.jl:1566
 in run_frontend at ./REPL.jl:841
 in run_repl at ./REPL.jl:165
 in _start at ./client.jl:446
@JeffBezanson
Member

Basically a dup of #1792. We are fairly saturated with issues about when to validate text encodings. As it stands, this behavior makes sense to me: you can store invalid data in a UTF8String, but operations on it might have undefined behavior.

@malmaud
Contributor Author

malmaud commented Jun 18, 2015

Thanks, makes sense.

@ScottPJones
Contributor

Sorry, but I'd disagree on that, @JeffBezanson... I still think that if you want to store invalid data, it should go in a Vector{UInt8} or an IOBuffer while you are constructing it, but that any immutable string of type ASCIIString or UTF*String should be completely valid. There are security issues with that hole of "undefined" behavior...

@nalimilan
Member

@JeffBezanson That comment seems at odds with what you said in #1792. I agree that the current situation has some consistency, but as you argued there, ensuring that strings are valid in some future version would allow for performance improvements and prevent bugs.
