--update-user-info and special characters #167

Closed
Gor3t3x opened this issue Apr 21, 2017 · 5 comments

Comments

@Gor3t3x

Gor3t3x commented Apr 21, 2017

Hi,

The latest bug I found: using --update-user-info with special characters like "ç", "é", "è" produces strange behavior.

Those characters are pretty common in countries where we speak French...

2017-04-21 14:19:54 2712 INFO processor - Updating info for user key: federatedID,[email protected], changes: {'firstname': 'Andr\xc3\xa9'}
2017-04-21 14:19:54 2712 INFO processor - Updating info for user key: federatedID,[email protected], changes: {'firstname': 'Aur\xc3\xa9lie'}
2017-04-21 14:19:54 2712 INFO processor - Updating info for user key: federatedID,[email protected], changes: {'firstname': 'Andr\xc3\xa9'}
2017-04-21 14:19:54 2712 INFO processor - Updating info for user key: federatedID,[email protected], changes: {'firstname': 'Andr\xc3\xa9'}

Any idea how to resolve this?

Is this related? => http://stackoverflow.com/questions/6956799/working-with-unicode-encoded-strings-from-active-directory-via-python-ldap

EDIT: The problem seems to be present only in the console. In the Adobe dashboard, the special characters are encoded correctly, but I still get this warning on each sync with --update-user-info, always for the same users to modify:

2017-04-21 16:00:08 3368 INFO processor - ---------- Start Sync Umapi --------------------------------
user_sync-2.0-py2-none-any.whl.58f0d6835e0dec629e1283c1839d8f5ad6f21614\user_sync-2.0-py2-none-any.whl\user_sync\rules.py:844:
UnicodeWarning: Unicode unequal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
if (value != umapi_value):

@ianmak
Contributor

ianmak commented Apr 22, 2017

This appears to be a limitation of Python printing/logging: when you log a dictionary value, sub-values containing symbols or non-English characters don't seem to show up properly. I haven't found a quick solution to this; I'll try digging into it a bit more next week.
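For illustration, the escaped bytes in the log above are simply the UTF-8 encoding of "é". Below is a minimal sketch in Python 3, where a bytes literal stands in for the Python 2.7 `str` value that came from the directory (that substitution, and all the variable names, are assumptions made to keep the example runnable):

```python
# Bytes as they appear in the log line above (UTF-8 encoded "André").
raw_firstname = b"Andr\xc3\xa9"

changes = {"firstname": raw_firstname}

# Formatting a dict calls repr() on its values, so non-ASCII bytes
# are shown as escape sequences rather than as readable characters.
logged = "changes: %s" % changes

# Decoding the bytes as UTF-8 recovers the readable name.
decoded = raw_firstname.decode("utf-8")

print(logged)
print(decoded)
```

So the data itself is intact; only its logged representation is escaped.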

The Unicode warning you're seeing is sort of a separate problem... I think it's because the directory loads string values, whereas umapi works with unicode strings, and it gets confused when it compares strings with symbols to unicode strings. I'll look into that next week as well...

@adobeDan
Contributor

adobeDan commented May 4, 2017

The problem here is that we are not explicitly converting the strings fetched from the directory to unicode by decoding them as UTF-8, so Python falls back to the default (ASCII) encoding it assumes for 2.7 byte strings. The strings that come back from the umapi-client (and the UMAPI server) are always unicode strings, which is why Python is attempting to convert the directory strings to unicode when doing the comparison.
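A rough sketch of the kind of normalization described here, written in Python 3 for runnability. The helper `to_text` and the variable names are hypothetical and are not taken from the user-sync code; the sketch only assumes that directory values arrive as UTF-8 bytes and UMAPI values as text:

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding bytes with the given encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


directory_value = b"Aur\xc3\xa9lie"  # as fetched from the directory (bytes)
umapi_value = "Aur\u00e9lie"         # as returned by the UMAPI client (text)

# Comparing raw bytes to text never matches, so the user would look
# "changed" on every sync even though the name is identical.
naive_differs = directory_value != umapi_value

# After decoding, both sides are text and compare equal.
normalized_differs = to_text(directory_value) != umapi_value

print(naive_differs, normalized_differs)
```

In Python 2.7 the unnormalized comparison is what emits the `UnicodeWarning` quoted in the report and is then treated as unequal, which would explain why the same users are updated on each run.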

This should be fairly easy to fix. I will take a look.

@adobeDan
Contributor

adobeDan commented May 4, 2017

Not such a simple fix after all: it revealed that the UMAPI client couldn't handle unicode strings: adobe-apiplatform/umapi-client.py#41. So first I had to go fix that and release it; now I can use the new version to fix this bug.

@adobeDan
Contributor

adobeDan commented May 5, 2017

Wow, did this ever turn out to be a rabbit hole! Having allowed non-ascii strings everywhere, I got to find out where they are and are not allowed:

Yes, allowed:

  • in people's first and last names
  • in adobe group names (both PCs and user groups)

No, not allowed:

  • in email addresses
  • in federated usernames (since they are the local part of email addresses)
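As a sketch of how the rules above could be checked before sending a user to the server (the field names `email` and `username`, and both helper functions, are hypothetical and not part of the tool):

```python
def is_ascii(text):
    """Return True if text contains only ASCII characters."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def non_ascii_restricted_fields(user):
    """List the ASCII-only fields of a user record that contain non-ASCII text."""
    restricted = ("email", "username")
    return [field for field in restricted if not is_ascii(user.get(field, ""))]


ok_user = {"firstname": "Andr\u00e9", "email": "andre@example.com", "username": "andre"}
bad_user = {"firstname": "Andr\u00e9", "email": "andr\u00e9@example.com", "username": "andre"}

print(non_ascii_restricted_fields(ok_user))
print(non_ascii_restricted_fields(bad_user))
```

Names are left unchecked on purpose, since non-ascii characters are allowed there.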

Since non-ascii chars are allowed in adobe group names, that means they can show up in config files, in the directory group mapping! So in addition to allowing non-ascii input from ldap and csv, I had to allow for non-ascii config files as well! So there is a new, optional command-line parameter --config-file-encoding whose first arg specifies the encoding of the config files (default ascii).
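A minimal sketch of what such a parameter implies, assuming the config files are read as text with the requested encoding. Only the flag name and its ascii default come from the comment above; the parser setup, the `read_config_text` helper, and the sample config content are hypothetical:

```python
import argparse
import io
import os
import tempfile

parser = argparse.ArgumentParser()
parser.add_argument("--config-file-encoding", default="ascii")


def read_config_text(path, encoding):
    """Read a config file as text using the given encoding."""
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


# Write a throwaway config with a non-ASCII group name, stored as UTF-8.
config_text = "groups:\n  - directory_group: Employ\u00e9s\n"
fd, path = tempfile.mkstemp(suffix=".yml")
with os.fdopen(fd, "wb") as f:
    f.write(config_text.encode("utf-8"))

# With the encoding given explicitly, the file loads correctly.
args = parser.parse_args(["--config-file-encoding", "utf8"])
loaded = read_config_text(path, args.config_file_encoding)

# With the default (ascii), the non-ASCII group name cannot be decoded.
default_args = parser.parse_args([])
try:
    read_config_text(path, default_args.config_file_encoding)
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

os.remove(path)
print(loaded == config_text, ascii_ok)
```

Keeping ascii as the default means existing ASCII-only config files behave as before, while files with non-ascii group names fail loudly unless an encoding is supplied.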

adobeDan added a commit that referenced this issue May 5, 2017
Fix #167: allow non-ascii unicode chars in user and group names.  Also fix #159 and fix #173, both for the second time :(.
@adobeDan
Contributor

adobeDan commented Jun 2, 2017

So this turns out not to be completely fixed, for two reasons:

  • csv input should be done in binary mode to handle all encodings properly
  • LDAP format strings are unicode, so you can't do string formatting unless you decode what goes into them.
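The second point can be sketched as follows, in Python 3. The format string, the record, and the `format_ldap_value` helper are hypothetical; the record only mimics the shape python-ldap returns, where each attribute maps to a list of byte strings:

```python
def format_ldap_value(fmt, record, encoding="utf-8"):
    """Decode each attribute's first value, then apply the text format string."""
    values = {name: vals[0].decode(encoding) for name, vals in record.items()}
    return fmt.format(**values)


# Hypothetical unicode format string and LDAP record.
display_format = "{cn} <{mail}>"
ldap_record = {
    "cn": [b"Andr\xc3\xa9 Dupont"],
    "mail": [b"andre@example.com"],
}

formatted = format_ldap_value(display_format, ldap_record)
print(formatted)
```

Because decoding happens before formatting, the result is already text and needs no further encoding or decoding.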

@adobeDan adobeDan reopened this Jun 2, 2017
@adobeDan adobeDan modified the milestones: v2.1.1, v2.1 Jun 2, 2017
adobeDan added a commit that referenced this issue Jun 6, 2017
* Modularize the CSV handling into an object that's unicode-aware.  This not only fixes a file mode bug, and does catching of unicode issues, but it also makes us ready for py3 where the CSV module actually handles unicode strings.
* NOTE: because emails cannot contain non-ascii chars, the stray files don't need encoding on input or output.
* Make the LDAP attribute formatters fully unicode aware.  Before they didn't realize that the format strings were themselves unicode, so they were re-encoding the results of formatting.
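For comparison, here is a small sketch of unicode-aware CSV reading in Python 3, where the `csv` module works on text directly. The `read_csv_rows` helper and the column names are hypothetical and are not the object introduced by this commit:

```python
import csv
import io


def read_csv_rows(raw_bytes, encoding="utf-8"):
    """Decode raw CSV bytes with the given encoding and return rows as dicts."""
    # newline="" lets the csv module handle line endings itself.
    text = io.StringIO(raw_bytes.decode(encoding), newline="")
    return list(csv.DictReader(text))


# Input as it would be read from a file opened in binary mode.
raw = "firstname,lastname,email\r\nAndr\u00e9,Dupont,andre@example.com\r\n".encode("utf-8")

rows = read_csv_rows(raw)
print(rows[0]["firstname"], rows[0]["email"])
```

Reading the bytes first and decoding in one place keeps the choice of encoding explicit, which is the same idea as reading in binary mode on Python 2.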