Encoding errors in non-ASCII locales #153

Closed
gflohr opened this issue Apr 1, 2014 · 3 comments

gflohr commented Apr 1, 2014

Current git checkout with Python 2.7.5 on Gentoo, running a GNOME session in the locale de_DE.UTF-8:

$ /usr/bin/hamster list

Traceback (most recent call last):
  File "/usr/bin/hamster", line 391, in <module>
    getattr(hamster_client, command)(_args)
  File "/usr/bin/hamster", line 254, in list
    self._list(start_time, end_time)
  File "/usr/bin/hamster", line 313, in _list
    print fact_line.format(*headers)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 1: ordinal not in range(128)
$

The string "Activity" in /usr/bin/hamster line 313 translates to "Tätigkeit" in German, and 0xe4 is the code point of "ä". The issue vanishes when gettext.install() in src/hamster/lib/i18n.py is called with the "unicode" argument set to False.
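
The codec error itself can be reproduced outside Hamster. Below is a minimal sketch, written for Python 3 rather than the Python 2.7 of this report, and assuming that stdout fell back to the ASCII codec. The explicit encode call stands in for what the print statement did implicitly:

```python
# Sketch only: reproduces the codec error from the traceback in isolation.
header = u"Tätigkeit"  # German translation of "Activity", as quoted above

error = None
try:
    # Python 2's print did this implicitly when stdout used the ASCII codec.
    header.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print("failed at position", exc.start, "on", repr(exc.object[exc.start]))

# Encoding explicitly with UTF-8 succeeds; the "ä" becomes two bytes.
encoded = header.encode("utf-8")
print(encoded)
```

Position 1 and the character 0xe4 match the traceback, which suggests the translated header reached print as a unicode object while the output stream only accepted ASCII.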

Now without the "list" command:

$ /usr/bin/hamster
...
File "/usr/lib64/python2.7/site-packages/hamster/lib/stuff.py", line 91, in format_range
title = (u"%(start_B)s %(start_d)s – %(end_B)s %(end_d)s, %(end_Y)s") % dates_dict
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
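
This second traceback is the mirror image of the first: a byte string is decoded with the ASCII codec when it is interpolated into a u"" template. Below is a minimal Python 3 sketch of that step. The month name "März" and the helper to_text are illustrative assumptions, since the report does not show the contents of dates_dict:

```python
# Sketch only: the month name and the helper below are illustrative assumptions.
month = u"März".encode("utf-8")  # bytes such as a UTF-8 locale could yield for %B

decode_error = None
try:
    # Python 2 performed this ASCII decode implicitly when a byte string
    # was interpolated into a u"" template.
    month.decode("ascii")
except UnicodeDecodeError as exc:
    decode_error = exc


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode bytes explicitly, pass text through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# With every value decoded up front, the template formats without errors.
dates = {"start_B": to_text(month), "start_d": "31"}
title = u"%(start_B)s %(start_d)s" % dates
print(title)
```

Decoding once at the boundary, with a known encoding, avoids relying on the implicit ASCII conversion.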

After changing the u"" strings into byte strings without prefix, I get this:

...
self.label.set_markup('<b>%s</b>' % stuff.format_range(start_date, end_date).encode("utf-8"))

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

Note that 0xc3 is the code point of "Ã" and also the first byte of the UTF-8 encoding of "ä" (0xc3 0xa4). So this is obviously a double encoding error for utf-8. The problem vanishes after removing the encode("utf-8") from set_range() in src/hamster/widgets/dates.py:

def set_range(self, start_date, end_date=None):
    end_date = end_date or start_date
    self.start_date, self.end_date = start_date, end_date
    #self.label.set_markup('<b>%s</b>' % stuff.format_range(start_date, end_date).encode("utf-8"))
    self.label.set_markup('<b>%s</b>' % stuff.format_range(start_date, end_date))
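
Why calling encode on an already encoded string raises a decode error can be sketched as follows. This is Python 3 code that emulates the Python 2 behaviour; py2_style_encode is a hypothetical stand-in for Python 2's str.encode, not a function from Hamster or the standard library:

```python
# Sketch only: py2_style_encode is a hypothetical stand-in for Python 2's
# str.encode, which decoded the byte string with ASCII before encoding it.
def py2_style_encode(data, encoding="utf-8"):
    return data.decode("ascii").encode(encoding)


already_encoded = u"ä".encode("utf-8")  # two bytes: 0xc3 0xa4

failure = None
try:
    py2_style_encode(b"M" + already_encoded + b"rz")
except UnicodeDecodeError as exc:
    failure = exc  # raised by the hidden decode step, not by the encode

# Read as Latin-1, the lead byte 0xc3 shows up as "Ã".
misread = already_encoded.decode("latin-1")
print(repr(misread))
```

The string returned by format_range was apparently UTF-8 bytes already at that point, so the extra encode could only fail or corrupt it.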

After those modifications the window shows up and the application seems to work.

I can send you a real patch if you want, but to me it looks like a crude workaround for a more general problem.

@toupeira toupeira added the bug label Sep 5, 2014
nchachereau pushed a commit to nchachereau/hamster that referenced this issue Nov 9, 2014
nchachereau pushed a commit to nchachereau/hamster that referenced this issue Nov 9, 2014
@nchachereau

A few notes:

First, the title is slightly misleading. As a GTK+ app, Hamster is using a “non-ASCII locale” all the time (UTF-8). The problem is rather that you get errors when you use characters that are not part of 7-bit ASCII in your locale and/or your activity names. This matters for most languages other than English...

This bug report mentions two errors: the second one, in format_range, was fixed in commit b7733c5. The first one is still there. Furthermore, as @gflohr noted, it is a more general problem. With the code in the trunk, you cannot even start an activity with an accent in it (try “Boire mon thé” through the Add Activity dialog).

As @kazeevn argues in PR #174, porting to Python 3 would definitely make things easier. As this is a huge transition, however, I have started working on a smaller set of fixes to these encoding issues in my fix-encoding-issues branch.

I will propose a pull request once it seems done.


stedybg commented Apr 29, 2015

I use the version from the Ubuntu 14.04 repositories, and everything seems to work fine, except that when I write tasks and tags using Cyrillic letters, only the headings in the reports are readable (and even translated into Bulgarian), while the actual data is displayed with incorrect encoding. Are there any workarounds or fixes?
[Image: screenshot from 2015-04-29 15:26:40]

@nchachereau nchachereau self-assigned this May 22, 2015
ederag added a commit to ederag/hamster that referenced this issue Aug 22, 2018
Hopefully, fix utf-8 related issues projecthamster#153, projecthamster#239, projecthamster#317

Co-authored-by: Nikita Kazeev
Thanks for showing the way in PR projecthamster#174 !
Collaborator

ederag commented Nov 30, 2018

Thanks a lot.
This utf-8 issue has hopefully been fixed by the python3 migration.

@ederag ederag closed this as completed Nov 30, 2018