UnicodeDecodeError when _tty_in=True and output is stdout #346

Closed
brainpower opened this issue Dec 6, 2016 · 2 comments
brainpower commented Dec 6, 2016

After updating to 1.12.6, I sometimes get the following error when running commands:

Exception in thread STDOUT/ERR thread for pid 2123:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.5/site-packages/sh.py", line 2243, in output_thread
    done = stream.read()
  File "/usr/lib/python3.5/site-packages/sh.py", line 2688, in read
    self.write_chunk(chunk)
  File "/usr/lib/python3.5/site-packages/sh.py", line 2663, in write_chunk
    self.should_quit = self.process_chunk(chunk)
  File "/usr/lib/python3.5/site-packages/sh.py", line 2565, in process
    handler.write(encode(chunk))
  File "/usr/lib/python3.5/site-packages/sh.py", line 2558, in <lambda>
    encode = lambda chunk: chunk.decode(handler.encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 0: unexpected end of data

I get the same error with python 2.7.

The error only appears when _tty_in=True is set.
The following simple script triggers this:

#!/usr/bin/python

import sys
from sh import echo

echo("äääääääääääääääääääääääääääääääääääääääääää", _out=sys.stdout, _out_bufsize=0, _tty_in=True)

Maybe the splitting issue described in the following answer is the reason for the error? The name 'chunk' suggests that the string gets split, after all...
http://stackoverflow.com/a/24004343/1563584
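
For illustration, the same error message can be reproduced without sh at all. This is a minimal sketch, assuming the chunk boundary falls after the first byte of a two-byte UTF-8 character; the variable names are made up for the example and are not taken from sh.py:

```python
# "ä" is encoded in UTF-8 as the two bytes 0xc3 0xa4.
data = "ää".encode("utf-8")

# Hypothetical chunking: the first chunk ends after a single byte,
# so it contains only the lead byte 0xc3 of the first character.
first_chunk, rest = data[:1], data[1:]

error = None
try:
    first_chunk.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decoding the unsplit data works fine.
whole = data.decode("utf-8")
```

If this is what happens inside sh, it would explain why the error is intermittent: it depends on where the chunk boundary happens to land.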

@amoffat amoffat added the bug label Dec 6, 2016
amoffat pushed a commit that referenced this issue Dec 7, 2016
amoffat pushed a commit that referenced this issue Dec 7, 2016
Owner

amoffat commented Dec 7, 2016

Thanks for reporting. There were actually a couple of bugs knotted up into the behavior you were seeing. One problem, from a user's perspective, is that you shouldn't be able to set _out_bufsize to a value and set _out to be a pipe or tty. The reason is that pipes and ttys already have their own buffering that you have no control over, so sh saying "apply some buffering" is more or less useless. The read ends of pipes and ttys will receive data according to your OS buffering.

I've added an argument validator to prevent people from setting both options at once.

The other issue that you helped discover was an internal one: some internal processing threads (the ones that did the chunk splitting) were running when they shouldn't have been. So I pushed up a fix for that.

Version 1.12.7 is live and has those fixes.

Keep in mind for the future, though: if you use _out targets whose buffering is controlled by _out_bufsize, such as functions or queues, be careful with your encoding. A chunk boundary can still fall in the middle of a multi-byte character, and decoding that chunk on its own will raise the error you're seeing.
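
One general way to decode split bytes safely is an incremental decoder from the standard library's codecs module, which holds back an incomplete trailing sequence until the next chunk arrives. The following is only a sketch of that technique on raw byte chunks: make_chunk_decoder is a hypothetical helper, not part of sh, and how (or whether) raw bytes reach your own _out target depends on your sh version and options.

```python
import codecs


def make_chunk_decoder(encoding="utf-8"):
    """Return a function that decodes byte chunks, tolerating split characters."""
    decoder = codecs.getincrementaldecoder(encoding)()

    def decode_chunk(chunk, final=False):
        # Incomplete trailing bytes are buffered inside the decoder and
        # prepended to the next chunk; pass final=True on the last call
        # so that a truly truncated stream still raises an error.
        return decoder.decode(chunk, final)

    return decode_chunk


decode_chunk = make_chunk_decoder()

# Simulate the worst case: every chunk is a single byte.
raw = "äää".encode("utf-8")
chunks = [raw[i:i + 1] for i in range(len(raw))]

text = "".join(decode_chunk(c) for c in chunks)
text += decode_chunk(b"", final=True)  # flush at end of stream
print(text)
```

The decoder is stateful, so each output stream needs its own instance; sharing one between stdout and stderr would mix their partial bytes.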

@brainpower
Author

Thanks for the fix and the background information.

0-wiz-0 added a commit to NetBSD/pkgsrc-wip that referenced this issue Dec 12, 2016
*   added `_out` and `_out_bufsize` validator [#346](amoffat/sh#346)
*   bugfix for internal stdout thread running when it shouldn't [#346](amoffat/sh#346)

*   regression bugfix on timeout [#344](amoffat/sh#344)
*   regression bugfix on `_ok_code=None`

*   further improvements on cpu usage

*   regression in cpu usage [#339](amoffat/sh#339)

*   fd leak regression and fix for flawed fd leak detection test [#337](amoffat/sh#337)

*   support for `io.StringIO` in python2

*   added support for using raw file descriptors for `_in`, `_out`, and `_err`
*   removed `.close()`ing `_out` handler if FIFO detected

*   composed commands no longer propagate `_bg`
*   better support for using `sys.stdin` and `sys.stdout` for `_in` and `_out`
*   bugfix where `which()` would not stop searching at the first valid executable found in PATH
*   added `_long_prefix` for programs whose long arguments start with something other than `--` [#278](amoffat/sh#278)
*   added `_log_msg` for advanced configuration of log message [#311](amoffat/sh#311)
*   added `sh.contrib.sudo`
*   added `_arg_preprocess` for advanced command wrapping
*   alter callable `_in` arguments to signify completion with falsy chunk
*   bugfix where pipes passed into `_out` or `_err` were not flushed on process end [#252](amoffat/sh#252)
*   deprecated `with sh.args(**kwargs)` in favor of `sh2 = sh(**kwargs)`
*   made `sh.pushd` thread safe
*   added `.kill_group()` and `.signal_group()` methods for better process control [#237](amoffat/sh#237)
*   added `new_session` special keyword argument for controlling spawned process session [#266](amoffat/sh#266)
*   bugfix better handling for EINTR on system calls [#292](amoffat/sh#292)
*   bugfix where with-contexts were not threadsafe [#247](amoffat/sh#195)
*   `_uid` new special keyword param for specifying the user id of the process [#133](amoffat/sh#133)
*   bugfix where exceptions were swallowed by processes that weren't waited on [#309](amoffat/sh#309)
*   bugfix where processes that dupd their stdout/stderr to a long running child process would cause sh to hang [#310](amoffat/sh#310)
*   improved logging output [#323](amoffat/sh#323)
*   bugfix for python3+ where binary data was passed into a process's stdin [#325](amoffat/sh#325)
*   Introduced execution contexts which allow baking of common special keyword arguments into all commands [#269](amoffat/sh#269)
*   `Command` and `which` now can take an optional `paths` parameter which specifies the search paths [#226](amoffat/sh#226)
*   `_preexec_fn` option for executing a function after the child process forks but before it execs [#260](amoffat/sh#260)
*   `_fg` reintroduced, with limited functionality.  hurrah! [#92](amoffat/sh#92)
*   bugfix where a command would block if passed a fd for stdin that wasn't yet ready to read [#253](amoffat/sh#253)
*   `_long_sep` can now take `None` which splits the long form arguments into individual arguments [#258](amoffat/sh#258)
*   making `_piped` perform "direct" piping by default (linking fds together).  this fixes memory problems [#270](amoffat/sh#270)
*   bugfix where calling `next()` on an iterable process that has raised `StopIteration`, hangs [#273](amoffat/sh#273)
*   `sh.cd` called with no arguments now changes into the user's home directory, like native `cd` [#275](amoffat/sh#275)
*   `sh.glob` removed entirely.  the rationale is correctness over hand-holding. [#279](amoffat/sh#279)
*   added `_truncate_exc`, defaulting to `True`, which tells our exceptions to truncate output.
*   bugfix for exceptions whose messages contained unicode
*   `_done` callback no longer assumes you want your command put in the background.
*   `_done` callback is now called asynchronously in a separate thread.
*   `_done` callback is called regardless of exception, which is necessary in order to release held resources, for example a process pool
@amoffat amoffat closed this as completed Dec 22, 2016