Encoding and shibboleth #77
Comments
We have the very same problem with our installations at Uni Düsseldorf. This is showcased by the names and surnames of our users. This information is retrieved from our Shibboleth server. For example, my full name gets displayed like this: As far as I know, our Shibboleth server provides all data UTF-8 encoded. Apparently, the default database encoding causes this issue as MySQL (or some versions of it) uses latin1 as default encoding where it should use UTF-8 instead to fully work with RDMO and its libraries. We changed our MySQL database encoding to UTF-8 with no effect however. This is definitely a very visual issue as it causes both the names and content to not be displayed (and stored) correctly! |
Is a bugfix already scheduled? There are more umlauts in names than one might think. |
Yes, we are working on it, but Shibboleth problems are always hard to reproduce. Sorry for the delay. |
@maurice-schleussinger @rosenke can you tell me which versions of the IdP and SP you are currently using? |
IdP 3.3.0 |
Ok, I could reproduce the bug and this is tricky. The Shibboleth middleware gets its information from the service provider via HTTP headers. In my test case a user 'ü[email protected]' results in

    {
        ...
        'REMOTE_USER': '\xc3\x83\xc2\xbc[email protected]'
        ...
    }

A UTF-8 'ü' should be

    >>> 'ü'.encode()
    b'\xc3\xbc'

I can reproduce the weird encoding with

    >>> 'ü'.encode().decode('latin1').encode()
    b'\xc3\x83\xc2\xbc'

So the problem is that the Shibboleth SP uses latin1 as encoding. This can also be seen when pasting https://sp.test.rdmo.org/Shibboleth.sso/Session into

I don't know how to teach the SP to use UTF-8. What I could do is fork https://github.com/Brown-University-Library/django-shibboleth-remoteuser and add an encoding/decoding step. But the cleaner solution would be to configure the SP correctly. |
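
The "encoding/decoding step" mentioned above could look roughly like the following. This is only a minimal sketch of reversing one round of the latin1 misdecoding shown in the REPL session; the function name `repair_mojibake` is made up for illustration and is not part of django-shibboleth-remoteuser. Depending on where the value is read, the step might need to be applied more than once.

```python
def repair_mojibake(value: str) -> str:
    """Undo one UTF-8 -> latin1 misdecoding (hypothetical helper).

    If the string cannot be re-encoded as latin1, or the resulting bytes
    are not valid UTF-8, the input is returned unchanged.
    """
    try:
        return value.encode("latin1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return value


# Reproduce the garbling from the thread, then reverse it.
garbled = "ü".encode("utf-8").decode("latin1")
repaired = repair_mojibake(garbled)
```

Falling back to the unchanged input keeps plain ASCII and already correct values intact, which matters if only some headers are affected.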
Yes, so if this occurs with certain (or even all?) versions of Shibboleth, explicit decoding for these versions should be implemented. Changing the encoding of Shibboleth would probably break many existing configurations. It would indeed be good if they would switch default encoding or even enforce UTF-8 in a future release. This would be a slow transition in any case, so RDMO would continue to display all text wrong for quite some time, which should be avoided. Personally I consider everything non-UTF-8 legacy software, but other encodings are still widely used… 🤷♂ |
I would like to check with other people who are using shibboleth first. Maybe it is just a missing option in the service provider. |
I can confirm that we observe the same behaviour in our RDMO/Shibboleth installation (running RDMO 0.14.6 with Shibboleth SP 3.0.4). What puzzles me is that I see the garbled umlaut in the SAML token (using the SAML-Tracer add-on for Firefox), while /Shibboleth.sso/Session shows the correct umlaut (after changing the value of the showAttributeValues attribute of the Handler type="Session" element in the shibboleth2.xml). It seems that the information transmitted between Shibboleth IdP and Shibboleth SP is UTF-8-encoded (as is the default behaviour according to https://wiki.shibboleth.net/confluence/display/SP3/RequestMap and https://wiki.shibboleth.net/confluence/display/SP3/ContentSettings -- description of the encoding attribute) and the problem lies in the django-shibboleth-remoteuser middleware. |
As a follow-up: the correct umlaut can be seen in the shibd.log (after setting the log level to DEBUG), so this indicates as well that the transmission between IdP and SP is not the culprit. |
Ok I found a solution based on this: https://shibboleth.1660669.n2.nabble.com/Url-encode-http-header-values-in-Shibboleth-SP-td7629614.html
The attributes are now URL quoted before handing them over to Django and unquoted again by my change to the django-shibboleth-remoteuser lib. Important: If these changes are made, and you have unicode characters in the username, those users would get a new user object in RDMO, thus losing all their project memberships. Hopefully your |
Hello, I can confirm that this fix works for our installation - thank you very much! However, on first try, I've got a server error, since the username (eppn in our case) was already present in the database:
After deleting the account via the Django Admin Interface, everything worked fine so far. |
There was still something wrong with my fix to
The REMOTE_USER was not unquoted and this conflicts with the username obtained in the second step of the Shibboleth auth process ... |
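
To illustrate the quote/unquote approach described in this thread, here is a small sketch using only the Python standard library. The helper `unquote_shibboleth_meta`, the header key `HTTP_DISPLAYNAME`, and the sample values are assumptions for illustration; the actual change in the forked django-shibboleth-remoteuser may be structured differently. The point is that REMOTE_USER needs the same unquoting as the other attributes, otherwise the two usernames will not match.

```python
from urllib.parse import quote, unquote


def unquote_shibboleth_meta(meta, keys):
    """Return a copy of `meta` with the given keys percent-decoded as UTF-8.

    Hypothetical helper: `meta` stands in for Django's request.META.
    """
    cleaned = dict(meta)
    for key in keys:
        if key in cleaned:
            cleaned[key] = unquote(cleaned[key])
    return cleaned


# Simulate what a URL-quoting SP would hand over (sample values are made up).
meta = {
    "REMOTE_USER": quote("ü@example.org"),
    "HTTP_DISPLAYNAME": quote("Jörg Müller"),
}
cleaned = unquote_shibboleth_meta(meta, ["REMOTE_USER", "HTTP_DISPLAYNAME"])
```

Percent-encoded values are pure ASCII, so they pass through the HTTP header layer unchanged regardless of which encoding that layer assumes.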
Ok, thanks for the fast reply. I can confirm that it works for our environment. |
Closing. Seems to be fixed. |
this fix should also be included in the docs or not? https://github.com/rdmorganiser/rdmo-docs-en/blob/master/docs/configuration/authentication/shibboleth.md and in the Line 62 in a54867e
|
Packages which are not on pypi do not work in |
we can make our own release on the fork and include it in the shibboleth = [
"django-shibboleth-remoteuser @ https://github.com/rdmorganiser/django-shibboleth-remoteuser/archive/refs/tags/v0.11.zip"
] ... that works |
ok, good to know. I am not sure if we still need the patch. I will look into it. |
Yes, we still need the patch for the right encoding. |
UTF-8 strings from the IdP are converted to latin1 and stay like this in the RDMO database, which is bad.
The error message in shibd.log might be related.