strip() fails on string read from non-utf8 file #7525

swadey · 2014-07-05T01:24:43Z

test.txt contains a single character:

\377

I get this error:

julia> f = open("test.txt")
IOStream(<file test.txt>)

julia> x = readline(f)
"\ufffd"

julia> strip(x)
ERROR: BoundsError()
 in getindex at ./array.jl:267
 in getindex at ./utf8.jl:111
 in lstrip at string.jl:1414
 in lstrip at string.jl:1410
 in strip at string.jl:1434

stevengj · 2014-07-05T08:26:13Z

Can you run readbytes(f) instead so that we can see the actual bytes in the file?

Is it UTF-16, or...?

swadey · 2014-07-05T09:33:45Z

Sure. It's the 0xff byte that causes the issue. The file is from the 20 Newsgroups dataset.

julia> x = readbytes(f)
2-element Array{Uint8,1}:
 0xff
 0x0a

JeffBezanson · 2014-07-05T16:17:34Z
