Use binary read for stdin on python3 #16

Merged 1 commit on Sep 15, 2018

@ps2 (Contributor) commented Sep 11, 2018

I hit this error when piping sox into auditok on macOS with Python 3:

$ sox -t coreaudio -d -t raw -r 16000 -c 1 -b 16 -e signed - | auditok -i -

Input File     : 'default' (coreaudio)
Channels       : 1
Sample Rate    : 48000
Precision      : 32-bit
Sample Encoding: 32-bit Signed Integer PCM

In:0.00% 00:00:00.26 [00:00:00.00] Out:2.45k [      |      ]        Clip:0
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.6.4_3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.6/site-packages/auditok-0.1.6-py3.6.egg/auditok/cmdline.py", line 370, in run
    self.tokenizer.tokenize(data_source=self, callback=notify_observers)
  File "/usr/local/lib/python3.6/site-packages/auditok-0.1.6-py3.6.egg/auditok/core.py", line 300, in tokenize
    frame = data_source.read()
  File "/usr/local/lib/python3.6/site-packages/auditok-0.1.6-py3.6.egg/auditok/cmdline.py", line 384, in read
    return self.ads.read()
  File "/usr/local/lib/python3.6/site-packages/auditok-0.1.6-py3.6.egg/auditok/util.py", line 547, in read
    return self.audio_source.read(self.block_size)
  File "/usr/local/lib/python3.6/site-packages/auditok-0.1.6-py3.6.egg/auditok/io.py", line 390, in read
    data = sys.stdin.read(to_read)
  File "/usr/local/Cellar/python/3.6.4_3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 12: invalid start byte

The error occurs because Python 3 opens sys.stdin in text mode, so sys.stdin.read tries to decode the raw audio bytes as UTF-8 and fails on the first invalid byte. This patch reads from sys.stdin.buffer on Python 3 and keeps the existing sys.stdin.read call on Python 2.
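
For illustration only, here is a minimal sketch of the idea, not the actual patch. The helper name `read_raw` and the in-memory `fake_stdin` stand-in are assumptions made up for this example; the real change lives in auditok's io.py and may be structured differently.

```python
import io
import sys


def read_raw(stream, size):
    """Read up to `size` raw bytes from a stdin-like stream (hypothetical helper)."""
    if sys.version_info >= (3, 0):
        # Python 3 text streams wrap a binary stream exposed as .buffer,
        # so reading from it bypasses the UTF-8 decoder entirely.
        return stream.buffer.read(size)
    # Python 2: sys.stdin.read already returns a byte string.
    return stream.read(size)


# In-memory stand-in for sys.stdin carrying non-UTF-8 audio-like bytes.
raw = b"\x00\x01\xff\x7f" * 4
fake_stdin = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
chunk = read_raw(fake_stdin, 8)
```

In real use the call would be `read_raw(sys.stdin, block_size)`; reading the same bytes through the text layer instead would raise the UnicodeDecodeError shown in the traceback.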

@amsehili amsehili merged commit 52ebf33 into amsehili:dev Sep 15, 2018
@amsehili (Owner) commented

Works fine, thank you for this fix!

amsehili added a commit that referenced this pull request Dec 11, 2024
Use binary read for stdin on python3