UnicodeDecodeError running test_avro.py on Windows #19

Open
JBPressac opened this issue Jan 10, 2014 · 3 comments

@JBPressac

Hello,
I get a UnicodeDecodeError when running test_avro.py on Windows 7. Here is a copy of my Windows terminal output. Thank you for your help:

C:\Users\me\Documents\Agile_Data>python test_avro.py
Traceback (most recent call last):
  File "test_avro.py", line 50, in <module>
    for record in df_reader:
  File "c:\Python27\lib\site-packages\avro\datafile.py", line 362, in next
    datum = self.datum_reader.read(self.datum_decoder)
  File "c:\Python27\lib\site-packages\avro\io.py", line 445, in read
    return self.read_data(self.writers_schema, self.readers_schema, decoder)
  File "c:\Python27\lib\site-packages\avro\io.py", line 490, in read_data
    return self.read_record(writers_schema, readers_schema, decoder)
  File "c:\Python27\lib\site-packages\avro\io.py", line 690, in read_record
    field_val = self.read_data(field.type, readers_field.type, decoder)
  File "c:\Python27\lib\site-packages\avro\io.py", line 468, in read_data
    return decoder.read_utf8()
  File "c:\Python27\lib\site-packages\avro\io.py", line 233, in read_utf8
    return unicode(self.read_bytes(), "utf-8")
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c in position 0: invalid start byte

Jean-Baptiste

@rjurney
Owner

rjurney commented Jan 12, 2014

Working on reproducing, thanks

@gh4yarli

gh4yarli commented May 6, 2014

The problem appears to be the whitespace in the topic strings. Avro fails while decoding the strings that contain spaces, so if you remove the spaces from the strings, the script runs with no errors:

df_writer.append( {"message_id": 11, "topic": "Hellogalaxy", "user_id": 1} )
df_writer.append( {"message_id": 12, "topic": "Jimissilly!", "user_id": 1} )
df_writer.append( {"message_id": 23, "topic": "Ilikeapples.", "user_id": 2} )
df_writer.close()

This does not bode well for the next chapters, so I am looking into avro's io.py file to see if I can change something there, or whether I can do some encoding in the test_avro.py file instead.

@jiujiu18

Try the 'rb' option when opening the file.

# Create a 'data file' (avro file) reader
df_reader = datafile.DataFileReader(
  open(OUTFILE_NAME, 'rb'),
  rec_reader
)
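For anyone wondering why 'rb' would matter: on Windows, a file opened without the 'b' flag is in text mode, where line endings are translated (LF becomes CRLF on write) and a 0x1A byte can be treated as end-of-file on read. Avro files are binary, so either effect can hand the decoder bytes it did not expect. Below is a minimal, stdlib-only sketch of that effect, not the actual Avro code path. The `payload` bytes and both helper functions are made up for illustration, and the text-mode behaviour is only simulated so the snippet runs the same on any platform. Presumably the writer side should be opened with 'wb' for the same reason.

```python
import os
import tempfile

# Hypothetical stand-in for Avro's binary output (NOT a real Avro file).
# It contains bytes that Windows text mode treats specially:
# 0x0A (newline) and 0x1A (Ctrl-Z).
payload = b"Obj\x01\x0a\x1a\x9c\x00binary\r\ndata"


def roundtrip_binary(data):
    """Write and read the bytes in binary mode; nothing is translated."""
    fd, path = tempfile.mkstemp(suffix=".avro")
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


def simulate_windows_text_write(data):
    """Approximate a text-mode write on Windows: every LF becomes CRLF."""
    return data.replace(b"\n", b"\r\n")


restored = roundtrip_binary(payload)
corrupted = simulate_windows_text_write(payload)

print(restored == payload)           # binary mode preserves every byte
print(len(payload), len(corrupted))  # the simulated text-mode copy is longer
```

Once extra bytes are inserted (or the read stops early), every length prefix after that point is misaligned, which would be consistent with a decoder failing on an arbitrary byte such as the 0x9c in the traceback above.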
