
Exception if GIT_COMMITTER_NAME contains UTF-8 encoded Umlaut #237

Closed
guettli opened this issue Jan 19, 2015 · 8 comments

Comments

@guettli

guettli commented Jan 19, 2015

If I run my Python script (which uses GitPython) with this environment variable set:

export GIT_COMMITTER_NAME="Thomas Müller"

I get this exception:


Error
Traceback (most recent call last):
  File "/home/foo_eins_d/src/scmtools/scm/tests/test_git.py", line 25, in test_git_status
    repo.index.commit('initial commit')
  File "/home/foo_eins_d/local/lib/python2.7/site-packages/git/index/base.py", line 900, in commit
    head, author=author, committer=committer)
  File "/home/foo_eins_d/local/lib/python2.7/site-packages/git/objects/commit.py", line 349, in create_from_tree
    new_commit._serialize(stream)
  File "/home/foo_eins_d/local/lib/python2.7/site-packages/git/objects/commit.py", line 392, in _serialize
    altz_to_utctz_str(self.committer_tz_offset))).encode(self.encoding))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 18: ordinal not in range(128)

Version: GitPython==0.3.5

Byron added this to the v0.3.6 - Features milestone Jan 19, 2015
@Byron
Member

Byron commented Jan 19, 2015

This is an interesting one! GitPython does handle non-ASCII encodings, and defaults to UTF-8.
In your case, either there is a non-obvious implicit conversion, or self.encoding ended up being 'ascii'.
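To illustrate the "non-obvious implicit conversion" case: in Python 2, calling .encode() on a byte string first decodes it implicitly with the default 'ascii' codec, which fails on 0xc3, the first UTF-8 byte of "ü". The sketch below performs that decode step explicitly in Python 3. The line layout (`committer <name> <email>`, as in a git commit object) and the email address are assumptions about what was being serialized; with that layout, 0xc3 happens to land at position 18, the same position reported in the traceback.

```python
# Sketch: reproduce the Python 2 implicit ASCII decode failure explicitly in Python 3.
# Assumption: the failing string was a git-style "committer <name> <email> ..." line.
name_bytes = "Thomas Müller".encode("utf-8")                # b'Thomas M\xc3\xbcller'
line = b"committer " + name_bytes + b" <tm@example.com>"    # hypothetical email address

failing_position = None
message = ""
try:
    # Python 2's str.encode() performed this decode step implicitly, using 'ascii'
    line.decode("ascii")
except UnicodeDecodeError as err:
    failing_position = err.start
    message = str(err)
    print(message)  # 'ascii' codec can't decode byte 0xc3 in position 18: ordinal not in range(128)

# Decoding with the codec the bytes were actually written in succeeds.
text = line.decode("utf-8")
```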

I will review how encodings are handled to make sure something like this can't easily happen.

In the meantime, can you post the test from test_git.py here so that I can reproduce the issue?

@Byron
Member

Byron commented Jan 19, 2015

You can watch the archived development stream on YouTube.

@guettli
Author

guettli commented Jan 26, 2015

Thank you very much for the fast fix.

@saheel1115

@Byron A similar issue occurs in GitPython 1.0.1 (tested with Python 2.7 and 3.4) when I call Repo.git.blame() and the blame output contains some unusual characters. Is this fixed somewhere upstream, so that I could use a later version?

@Byron
Member

Byron commented Aug 8, 2015

@saheel1115 Repo.git.blame() executes the git command and attempts to interpret the returned bytes as a UTF-8 string. If that is not expected to work, you can keep the bytes unchanged like so: Repo.git.blame(stdout_as_string=False).
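Once the output is kept as bytes, the caller decides how to decode it. A minimal sketch, assuming UTF-8 is the common case and Latin-1 is an acceptable fallback (both are assumptions; git does not guarantee the encoding of names or file contents). decode_git_output is a hypothetical helper, not part of GitPython.

```python
def decode_git_output(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode raw bytes from a git command, trying UTF-8 first.

    Hypothetical helper. With the default fallback this never raises,
    because Latin-1 maps every byte 0x00-0xFF to a code point.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to the (assumed) alternative encoding
        return raw.decode(fallback)
```

With GitPython, `raw` would be the bytes returned by a call such as `repo.git.blame("HEAD", "--", "path/to/file", stdout_as_string=False)`, where the revision and path are placeholders.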

@saheel1115

@Byron Cool, sounds perfect. Thanks!

@jrydberg
Copy link

jrydberg commented Sep 4, 2018

The stdout_as_string=False workaround for blame no longer works, since it is already passed to self.git.blame. defenc is UTF-8, but the unusual character in the committer name throws the parser off track.

@Byron
Member

Byron commented Oct 2, 2018

@jrydberg It could be that whatever it is trying to parse is not actually encoded in UTF-8. Unfortunately, the current implementation does not handle encodings properly at all, and I am afraid there is no fix for that short of changing all str usages to bytes.
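The "change all str usages to bytes" approach can be sketched as parsing git output entirely as bytes and decoding only individual fields at the edge. The example assumes the line-oriented `git blame --porcelain` header format (one `key value` pair per line, file content lines prefixed with a tab); parse_blame_headers and field_as_text are hypothetical helpers, not GitPython's actual parser.

```python
def parse_blame_headers(raw: bytes) -> dict:
    """Collect `key value` header lines from porcelain-style blame output.

    Keys and values stay as bytes, so a committer name in an unexpected
    encoding cannot throw the parsing itself off track. In this sketch the
    leading `<sha> <line numbers>` line is also stored, keyed by the sha,
    and later chunks overwrite earlier ones.
    """
    headers = {}
    for line in raw.split(b"\n"):
        if not line or line.startswith(b"\t"):
            continue  # skip blank lines and tab-prefixed file content lines
        key, _, value = line.partition(b" ")
        headers[key] = value
    return headers


def field_as_text(headers: dict, key: bytes) -> str:
    # Decode a single field only when text is needed; invalid bytes become U+FFFD.
    return headers.get(key, b"").decode("utf-8", errors="replace")
```

Keeping the split on bytes means a bad byte can only affect the one field that contains it, rather than the whole parse.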


4 participants