UnicodeDecodeError #49

Closed
Nando-bog opened this issue May 21, 2014 · 7 comments
@Nando-bog

I am getting a UnicodeDecodeError when decrypting text that was encrypted using the library. However, the text is decrypted properly despite the error.

Details:
OS: Mac OS Mavericks 10.9.2
Python: 2.7.6
gnupg: 1.2.5

Sample error from my Terminal:

>>> from gnupg import GPG
>>> g=GPG(homedir='MY GPG HOME DIR')
>>> c=g.encrypt('hola', 'KEY ID')
>>> p=g.decrypt(str(c), passphrase='MYPASSWORD')
Exception in thread Thread-8:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Library/Python/2.7/site-packages/gnupg/_meta.py", line 532, in _read_response
    line = stream.readline()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 530, in readline
    data = self.read(readsize, firstline=True)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 477, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xed in position 0: invalid continuation byte
>>> print(p)
hola

As the last line shows, decryption worked, but the error was still raised (in a background thread).
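A minimal sketch (assuming Python 3, with made-up input bytes) that reproduces the same kind of failure outside python-gnupg: a `codecs` StreamReader created with the default `errors="strict"` raises from `readline()` as soon as it meets a byte that is not valid UTF-8. The helper `strict_readline_error` is a hypothetical name; the two sample byte strings mirror the bytes reported in the tracebacks in this thread (0xed and 0xf6).

```python
import codecs
import io


def strict_readline_error(data):
    """Hypothetical helper: read one line from `data` with a strict UTF-8
    StreamReader and return the UnicodeDecodeError raised, or None."""
    # codecs.getreader() returns a StreamReader class; errors defaults to "strict".
    reader = codecs.getreader("utf-8")(io.BytesIO(data))
    try:
        reader.readline()
    except UnicodeDecodeError as exc:
        return exc
    return None


# 0xed starts a 3-byte sequence, but "A" (0x41) is not a continuation byte.
err = strict_readline_error(b"\xedAbc\n")
print(err.start, err.reason)  # 0 invalid continuation byte

# 0xf6 can never start a UTF-8 sequence.
err = strict_readline_error(b"\xf6xyz\n")
print(err.start, err.reason)  # 0 invalid start byte

# Valid UTF-8 reads without error.
print(strict_readline_error("hola\n".encode("utf-8")))  # None
```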

Thanks!

@isislovecruft
Owner

Hello @Nando-bog! Thanks for reporting this bug.

I don't have a Mac to test it on, but on a Linux machine I get the following:

>>> from gnupg import GPG
>>> g = GPG(homedir='foobar')
>>> c = g.encrypt('hallo', '50CC7744')
>>> print c.data
-----BEGIN PGP MESSAGE-----

hIwDz4uqK8zd5zkBA/962lezKEAsh157nZsiR+KYd/PW1jdxPG2u1RD4BaSEpkGF
cUlIkJmpliC0qiYvjA2ssnP4DPQ582z4rYAWVmbGjbrBIuQ3FBJBWxWbCkbDqCyu
tFzoCFkmILRQo6DLNgjNtXZPHiqYrP9ll5BaeteE1ooroJ0x3YSDMxbayX61OtJA
Aa2ST3t7iBU6xe6vRr8+4n3stbAwYB2H0RDh5/S/buVJQCI0tbmVSwLxdLZwadFF
XEq8W1X7iWPcGEmKlOkEag==
=9Rge
-----END PGP MESSAGE-----
>>> c.status
'encryption ok'
>>> d = g.decrypt(c.data)
>>> d.data
'hallo'

This is with Python 2.7.6 and python-gnupg-1.2.6.

It could have something to do with the locale settings in your terminal: python-gnupg tries to be smart and respect them when necessary, and otherwise defaults to UTF-8. Or it could be because you used str(c). I'm not sure.
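The "respect the locale, else fall back to UTF-8" idea can be sketched roughly as follows. This is a hypothetical illustration assuming Python 3's standard `locale` and `codecs` modules, not python-gnupg's actual encoding logic; `pick_encoding` is an invented name.

```python
import codecs
import locale


def pick_encoding(fallback="utf-8"):
    """Hypothetical: prefer the locale's encoding, else fall back to UTF-8."""
    # getpreferredencoding(False) reports the locale encoding without calling setlocale().
    candidate = locale.getpreferredencoding(False) or fallback
    try:
        # codecs.lookup() normalizes the name and fails if Python has no such codec.
        return codecs.lookup(candidate).name
    except LookupError:
        return codecs.lookup(fallback).name


print(pick_encoding())  # depends on the terminal's locale settings
```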

@Nando-bog, do you think you could try again, using c.data and d.data, rather than str(c) and str(d), please?

@isislovecruft isislovecruft added this to the 1.2.8 milestone Jul 9, 2014
@isislovecruft isislovecruft self-assigned this Jul 9, 2014
isislovecruft added a commit that referenced this issue Oct 28, 2014
 * CHANGE gnupg._meta.GPGBase.__init__() to register the builtin
   `codecs.replace_errors` handler and a global codecs "strict" error
   handler.
 * FIXES Issue #49.
@isislovecruft isislovecruft modified the milestones: 1.3.2, 1.3.x, 1.3.3 Oct 28, 2014
@isislovecruft
Owner

I believe this issue is fixed in my fix/49-unicode-decode-on-readline branch, which I've merged into develop; the fix will be available in the next version (1.3.3).

Please reopen if the issue persists.

@extempore

I got a lot of UnicodeDecodeError exceptions while calling gpg.recv_keys() for a batch of keys. The keys do seem to have been imported into the pubring successfully, though.

Here is a key that gave me one of these errors: A0D180F35F45D0A0FBED9CD36E68F80607AF1977

I was using the current PyPI version. Should I switch to the develop branch, or is it unstable?

Exception in thread Thread-683:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 808, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 761, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/ba/env_deed/local/lib/python2.7/site-packages/gnupg/_meta.py", line 564, in _read_response
    line = stream.readline()
  File "/home/ba/env_deed/lib/python2.7/codecs.py", line 530, in readline
    data = self.read(readsize, firstline=True)
  File "/home/ba/env_deed/lib/python2.7/codecs.py", line 477, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 0: invalid start byte

@pwfff

pwfff commented May 17, 2016

This solution needs to be rethought. Replacing this global error handler affects other code, most notably the DataStax Cassandra driver. The exception in the following code is never raised, so what is already valid UTF-8 has its characters replaced with garbage: https://github.com/datastax/python-driver/blob/master/cassandra/cqltypes.py#L675

@sawall

sawall commented Oct 24, 2016

This solution definitely needs to be rethought. It breaks MIME encodings of attachments even if I am not applying PGP to them. For example, the base64 representation of an image will be munged here:

import gnupg
gpg = gnupg.GPG('/path/to/gpg')

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.image import MIMEImage

def send_my_email():
    msg = MIMEMultipart()
    msg['Subject'] = 'subject'
    msg['From'] = '[email protected]'
    msg['To'] = '[email protected]'
    with open('/tmp/image.jpg', mode='rb') as image_file:
        image = MIMEImage(image_file.read())
    msg.attach(image)
    s = smtplib.SMTP('smtp.gmail.com', 587)
    s.starttls()
    s.login('[email protected]', 'password')
    s.send_message(msg)
    s.quit()

@sawall

sawall commented Oct 25, 2016

Note that one workaround for the monkey-patch is to define gpgon and gpgoff functions and call them around any gpg calls. Presumably a similar approach could be turned into a decorator and injected into python-gnupg.

If code like this runs when a package is loaded, it handles the situation:

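import codecs
# Save the stock "strict" handler before python-gnupg replaces it.
default_strict_func = codecs.lookup_error('strict')
import gnupg
gpg = gnupg.GPG('/path/to/gpg')  # GPGBase.__init__() registers its own "strict" handler here
# Save python-gnupg's replacement handler so it can be re-installed on demand.
gpg_strict_func = codecs.lookup_error('strict')
def gpgon(): codecs.register_error('strict', gpg_strict_func)      # call before gpg calls
def gpgoff(): codecs.register_error('strict', default_strict_func)  # call after gpg calls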
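The gpgon/gpgoff pair can also be packaged as the decorator suggested above. A minimal sketch, assuming Python 3.2+ (where objects built by `contextlib.contextmanager` also work as decorators); `error_handler` is a hypothetical name, not part of python-gnupg. It restores the previous handler in a `finally` block, so an exception inside a gpg call cannot leave the wrong handler installed. The codecs error registry is process-wide, though, so while the swap is active other threads still see it.

```python
import codecs
import contextlib


@contextlib.contextmanager
def error_handler(handler, name="strict"):
    """Hypothetical helper: register `handler` under `name` for the duration
    of a with-block (or a decorated call), then restore the previous handler."""
    previous = codecs.lookup_error(name)
    codecs.register_error(name, handler)
    try:
        yield
    finally:
        # Runs even if the body raises, so the old handler always comes back.
        codecs.register_error(name, previous)


# Assumed usage with the names from the workaround above, after gpgoff()
# has been called once so the default handler is in place between calls:
#
#     with error_handler(gpg_strict_func):
#         result = gpg.decrypt(data)
#
#     @error_handler(gpg_strict_func)
#     def decrypt(data):
#         return gpg.decrypt(data)
```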

@e3rd

e3rd commented Jan 4, 2018

Thanks for posting this workaround!! I've expanded it into a GPGSafe class that you can use in place of gnupg.GPG without having to manage anything.

gpg = GPGSafe(use_agent=False, homedir="~/.gnupg/") # (instead of gpg = gnupg.GPG(...))
gpg.sign(text)
...

https://gist.github.com/e3rd/45aed2e93ac20843b6790b6b642da396

(Since this issue remains closed, I've also created a pull request so that it is noted by project visitors.)

jannschu added a commit to jannschu/python-gnupg that referenced this issue Oct 7, 2018
This removes the monkey-patch from isislovecruft/python-gnupg@d9116ba and instead uses a local modification of the StreamReader by switching from ›strict‹ error handlers (the default) to ›replace‹ error handlers.

This should resolve isislovecruft#219 and isislovecruft#49, as well as the breakage of email attachments.
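A minimal sketch of the per-stream idea that commit describes, assuming Python 3 and made-up input bytes: only this one StreamReader is created with `errors="replace"`, so undecodable bytes become U+FFFD while the global "strict" handler that other libraries rely on is never touched. `read_lines_replacing` is a hypothetical helper, not the code from that commit.

```python
import codecs
import io


def read_lines_replacing(raw_stream, encoding="utf-8"):
    """Hypothetical helper: decode a byte stream line by line, substituting
    U+FFFD for undecodable bytes instead of raising."""
    # errors="replace" applies to this reader only; the codecs registry is unchanged.
    reader = codecs.getreader(encoding)(raw_stream, errors="replace")
    lines = []
    while True:
        line = reader.readline()
        if not line:  # an empty string means end of stream
            break
        lines.append(line.rstrip("\n"))
    return lines


# Made-up sample: the first line starts with 0xf6, which is never valid UTF-8.
sample = b"\xf6 not utf-8\n[GNUPG:] IMPORT_OK 1\n"
print(read_lines_replacing(io.BytesIO(sample)))
```

Because the replacement is scoped to the reader object, code such as the MIME example earlier in this thread keeps the normal strict behavior.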