UnicodeDecodeError running test_avro.py on Windows #19

JBPressac · 2014-01-10T19:27:06Z

Hello,
I get a UnicodeDecodeError running test_avro.py on Windows 7. Here is a copy of my Windows terminal. Thank you for your help:

C:\Users\me\Documents\Agile_Data>python test_avro.py
Traceback (most recent call last):
  File "test_avro.py", line 50, in <module>
    for record in df_reader:
  File "c:\Python27\lib\site-packages\avro\datafile.py", line 362, in next
    datum = self.datum_reader.read(self.datum_decoder)
  File "c:\Python27\lib\site-packages\avro\io.py", line 445, in read
    return self.read_data(self.writers_schema, self.readers_schema, decoder)
  File "c:\Python27\lib\site-packages\avro\io.py", line 490, in read_data
    return self.read_record(writers_schema, readers_schema, decoder)
  File "c:\Python27\lib\site-packages\avro\io.py", line 690, in read_record
    field_val = self.read_data(field.type, readers_field.type, decoder)
  File "c:\Python27\lib\site-packages\avro\io.py", line 468, in read_data
    return decoder.read_utf8()
  File "c:\Python27\lib\site-packages\avro\io.py", line 233, in read_utf8
    return unicode(self.read_bytes(), "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c in position 0: invalid start byte

Jean-Baptiste
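
For readers hitting the same message, here is a quick Python 3 check (not part of test_avro.py) of what the last line of the traceback means. Avro's read_utf8 decodes the bytes it just read as UTF-8, and 0x9c has the bit pattern 10xxxxxx of a UTF-8 continuation byte, which can never start a character. A string beginning with 0x9c therefore suggests the reader is positioned at the wrong offset, or is looking at bytes that were never UTF-8 text; this is an inference from the error, not something the thread confirms.

```python
# The traceback ends in unicode(self.read_bytes(), "utf-8") failing on byte 0x9c.
raw = b"\x9c"

# 0x9c is 0b10011100: the 10xxxxxx pattern marks a UTF-8 continuation byte,
# which may only follow a lead byte and can never begin a character.
bits = format(raw[0], "08b")
print(bits)  # 10011100

message = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)  # 'utf-8' codec can't decode byte 0x9c in position 0: invalid start byte
```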


rjurney · 2014-01-12T00:36:02Z

Working on reproducing, thanks

gh4yarli · 2014-05-06T15:53:49Z

The problem is the whitespace in the "topic" strings. Avro tries to read each string, and the spaces cause the decoding error. If you remove the spaces from the strings, it runs with no errors:

df_writer.append( {"message_id": 11, "topic": "Hellogalaxy", "user_id": 1} )
df_writer.append( {"message_id": 12, "topic": "Jimissilly!", "user_id": 1} )
df_writer.append( {"message_id": 23, "topic": "Ilikeapples.", "user_id": 2} )
df_writer.close()

This does not bode well for the next chapters, so I am looking into Avro's io.py file to see whether I can change something there, or whether I can handle the encoding in test_avro.py instead.

jiujiu18 · 2016-01-11T11:23:04Z

Try opening the file in binary mode ('rb'):

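# Create a 'data file' (avro file) reader.
# Avro container files are binary, so open them in binary mode ('rb'):
# in text mode, Windows translates line endings and can alter the bytes.
df_reader = datafile.DataFileReader(
  open(OUTFILE_NAME, 'rb'),
  rec_reader
)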
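
A minimal sketch of why 'rb' matters, assuming the cause is text-mode newline translation (the thread shows the fix works but does not spell out why). Avro encodes a string as its UTF-8 byte length (a zigzag varint "long") followed by the bytes, so if a text-mode reader collapses a "\r\n" pair inside the data, every later length prefix points one byte too far. The helpers encode_long, encode_string, read_long and read_string are hypothetical illustrations of that encoding, not the avro library's API. Python 3's universal-newline text mode performs the same "\r\n" to "\n" collapse on every platform, so it stands in here for Python 2's 'r' mode on Windows.

```python
import os
import tempfile


def encode_long(n: int) -> bytes:
    """Avro 'long': zigzag-encode, then emit 7 bits per byte, low bits first."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro 'string': UTF-8 byte length (as a long), then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def read_long(buf: bytes, pos: int):
    shift = result = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if b < 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1), pos  # undo the zigzag step


def read_string(buf: bytes, pos: int):
    n, pos = read_long(buf, pos)
    return buf[pos:pos + n].decode("utf-8"), pos + n


# Two string fields; the first one happens to contain the bytes CR LF.
payload = encode_string("line1\r\nline2") + encode_string("\u2713 ok")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(payload)
    with open(path, "rb") as f:  # binary mode: bytes come back unchanged
        binary_bytes = f.read()
    # Text mode with universal newlines collapses "\r\n" to "\n". latin-1 maps
    # each byte to one character, so re-encoding recovers the bytes handed back.
    with open(path, "r", encoding="latin-1", newline=None) as f:
        text_bytes = f.read().encode("latin-1")

print(len(binary_bytes), len(text_bytes))     # 20 19
print(repr(read_string(binary_bytes, 0)[0]))  # 'line1\r\nline2'
print(repr(read_string(text_bytes, 0)[0]))    # 'line1\nline2\x0c'
```

In the text-mode case the first string swallows the next field's length byte (0x0c), so every read after it starts at the wrong offset; in real Avro data that misalignment can land on a byte such as 0x9c and fail to decode.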
