
Encoding and shibboleth #77

Closed
jochenklar opened this issue Apr 13, 2018 · 20 comments

@jochenklar
Member

jochenklar commented Apr 13, 2018

UTF-8 strings from the IdP are converted to latin1 and are stored like this in the RDMO database, so umlauts and other non-ASCII characters end up garbled.

The error message in shibd.log:

2018-04-12 11:08:12 WARN OpenSAML.MessageDecoder.SAML2SOAP [3]: ignoring incorrect content type (application/x-www-form-urlencoded)

might be related.

@maurice-schleussinger

We have the very same problem with our installations at Uni Düsseldorf. It shows up in the first names and surnames of our users, which are retrieved from our Shibboleth server. For example, my full name is displayed like this:
(screenshot not preserved)

As far as I know, our Shibboleth server provides all data UTF-8-encoded. Apparently, the default database encoding causes this issue: MySQL (or some versions of it) uses latin1 as its default encoding, whereas RDMO and its libraries need UTF-8 to work fully.
Django 2.2 should use UTF-8 (also see https://docs.djangoproject.com/en/2.2/ref/unicode/) and handles unicode input from the browser just fine. I can change my name via the web admin interface, and it's displayed correctly until I refresh my session.

However, changing our MySQL database encoding to UTF-8 had no effect.

This is a very visible issue, since names and content are neither displayed nor stored correctly!

@rosenke

rosenke commented Jun 27, 2019

Is a bugfix already scheduled? There are more umlauts in names than one might think.

@jochenklar
Member Author

Yes, we are working on it, but Shibboleth problems are always hard to reproduce. Sorry for the delay.

@jochenklar
Member Author

@maurice-schleussinger @rosenke can you tell me which versions of the IdP and SP you are currently using?

@maurice-schleussinger

maurice-schleussinger commented Jul 5, 2019

@jochenklar

IdP 3.3.0
SP 2.5.3

@jochenklar
Member Author

OK, I could reproduce the bug, and it is tricky. The Shibboleth middleware gets its information from the service provider via HTTP headers. In my test case, a user 'ü[email protected]' results in

    {
        ...
        'REMOTE_USER': '\xc3\x83\xc2\xbc[email protected]'
        ...
    }

A UTF-8-encoded 'ü' should be

>>> 'ü'.encode()
b'\xc3\xbc'

I can reproduce the weird encoding with

>>> 'ü'.encode().decode('latin1').encode()
b'\xc3\x83\xc2\xbc'
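The round trip above can be spelled out as a small script that also undoes the damage. A minimal sketch: the function names `mangle` and `unmangle` are hypothetical, and the script only models "UTF-8 bytes decoded as latin-1 somewhere on the way"; where in the chain that happens is not settled at this point in the thread.

```python
# Reproduce the double encoding seen in REMOTE_USER and undo it.

def mangle(text: str) -> str:
    # UTF-8 bytes that get (wrongly) decoded as latin-1 somewhere on the way
    return text.encode("utf-8").decode("latin-1")


def unmangle(text: str) -> str:
    # Reverse step: recover the original bytes, then decode them as UTF-8
    return text.encode("latin-1").decode("utf-8")


garbled = mangle("ü")
print(repr(garbled))            # 'Ã¼'
print(garbled.encode("utf-8"))  # b'\xc3\x83\xc2\xbc', the bytes seen in REMOTE_USER
print(unmangle(garbled))        # ü
```

Because latin-1 maps every byte to exactly one character, `unmangle` is lossless as long as nothing else touched the string in between.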

So the problem is that the Shibboleth SP uses latin1 as its encoding. This can also be seen by pasting https://sp.test.rdmo.org/Shibboleth.sso/Session into chardet3 to detect the encoding.

I don't know how to teach the SP to use UTF-8.

What I could do is fork https://github.com/Brown-University-Library/django-shibboleth-remoteuser and add an encoding/decoding step. But the cleaner solution would be to configure the SP correctly.
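An encoding/decoding step of the kind mentioned above might look like the following. This is a hypothetical helper, not code from django-shibboleth-remoteuser or the later fork; it assumes the attribute values arrive as strings whose UTF-8 bytes were decoded as latin-1.

```python
from typing import Dict


def fix_latin1_mojibake(value: str) -> str:
    """Hypothetical repair: reinterpret a latin-1-decoded string as UTF-8.

    The value is returned unchanged if it cannot be round-tripped, e.g. because
    it contains characters outside latin-1 or its bytes are not valid UTF-8.
    """
    try:
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return value


def fix_attributes(attrs: Dict[str, str]) -> Dict[str, str]:
    # Apply the repair to every Shibboleth attribute value
    return {key: fix_latin1_mojibake(val) for key, val in attrs.items()}
```

This is a heuristic: a value that was genuinely latin-1 text but happens to form valid UTF-8 bytes would be altered, which is one argument for fixing the encoding at the SP instead.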

@maurice-schleussinger

Yes. If this occurs with certain (or even all?) versions of Shibboleth, explicit decoding should be implemented for these versions.

Changing the encoding of Shibboleth would probably break many existing configurations. It would indeed be good if they switched the default encoding, or even enforced UTF-8, in a future release. That would be a slow transition in any case, so RDMO would keep displaying such text incorrectly for quite some time, which should be avoided.

Personally I consider everything non-UTF-8 legacy software, but other encodings are still widely used… 🤷‍♂

@jochenklar
Member Author

I would like to check with other people who are using Shibboleth first. Maybe it is just a missing option in the service provider.

@mstuehrenberg

I can confirm that we observe the same behaviour in our RDMO/Shibboleth installation (running RDMO 0.14.6 with Shibboleth SP 3.0.4).

What puzzles me is that I see the garbled umlaut in the SAML token (using the SAML-tracer add-on for Firefox), while /Shibboleth.sso/Session shows the correct umlaut (after changing the value of the showAttributeValues attribute of the Handler type="Session" element in shibboleth2.xml).

It seems that the information transmitted between Shibboleth IdP and Shibboleth SP is UTF-8-encoded (as is the default behaviour according to https://wiki.shibboleth.net/confluence/display/SP3/RequestMap and https://wiki.shibboleth.net/confluence/display/SP3/ContentSettings -- description of the encoding attribute) and that the problem lies in the django-shibboleth-remoteuser middleware.

@mstuehrenberg

As a follow-up: the correct umlaut can be seen in shibd.log (after setting the log level to DEBUG), which also indicates that the transmission between IdP and SP is not the culprit.

@jochenklar
Member Author

jochenklar commented Oct 24, 2019

OK, I found a solution based on this: https://shibboleth.1660669.n2.nabble.com/Url-encode-http-header-values-in-Shibboleth-SP-td7629614.html

  1. Add ShibRequestSetting encoding URL to the LocationMatch part of your vhost:

    <LocationMatch /(account|domain|options|projects|questions|tasks|conditions|views)>
        AuthType shibboleth
        require shibboleth
        ShibRequireSession On
        ShibUseHeaders On
        ShibRequestSetting encoding URL
    </LocationMatch>
    
  2. Uninstall django-shibboleth-remoteuser:

    pip uninstall django-shibboleth-remoteuser
    
  3. Install our fork:

    pip install git+https://github.com/rdmorganiser/django-shibboleth-remoteuser
    
  4. Add SHIBBOLETH_UNQUOTE_ATTRIBUTES = True to your Django settings

  5. Restart shibd and apache2

  6. Rejoice!

The attributes are now URL-quoted before they are handed over to Django, and unquoted again by my change to the django-shibboleth-remoteuser library.
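The quote/unquote round trip can be sketched with Python's urllib.parse. Assumptions: the SP's URL encoding behaves like percent-encoding of the UTF-8 bytes (approximated here with `quote(..., safe="")`), and the header bytes reach Django decoded as latin-1, which is how WSGI servers expose header values under PEP 3333. The function names are hypothetical; this is not the fork's actual code.

```python
from urllib.parse import quote, unquote

SHIBBOLETH_UNQUOTE_ATTRIBUTES = True  # the setting from step 4


def sp_header(value: str) -> bytes:
    # Assumed SP behaviour with `ShibRequestSetting encoding URL`:
    # percent-encode the UTF-8 bytes so the header is pure ASCII
    return quote(value, safe="").encode("ascii")


def wsgi_environ_value(raw: bytes) -> str:
    # WSGI servers expose header bytes as latin-1-decoded str (PEP 3333)
    return raw.decode("latin-1")


def middleware_value(value: str) -> str:
    # Hypothetical stand-in for the unquote step added in the fork
    return unquote(value) if SHIBBOLETH_UNQUOTE_ATTRIBUTES else value


raw = sp_header("müller@example.org")
print(raw)                                        # b'm%C3%BCller%40example.org'
print(middleware_value(wsgi_environ_value(raw)))  # müller@example.org
```

Percent-encoding keeps the header ASCII-only, and ASCII survives a latin-1 decode unchanged, so the encoding used at the WSGI boundary no longer matters; `unquote` then decodes the percent escapes as UTF-8 by default.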

Important: If these changes are made and you have Unicode characters in usernames, those users will get a new user object in RDMO and thus lose all their project memberships. Hopefully your eppn or similar are ASCII-only...
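Before switching, it may help to list which stored usernames would change. A minimal sketch, assuming the garbled usernames in the database are exactly the "UTF-8 decoded as latin-1" form shown earlier in this thread; `repaired` and `plan_renames` are hypothetical names. In a Django shell, the input could come from `User.objects.values_list("username", flat=True)`.

```python
from typing import Dict, Iterable, List, Tuple


def repaired(username: str) -> str:
    # Undo the latin-1 mis-decoding; keep the name if it does not round-trip
    try:
        return username.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return username


def plan_renames(usernames: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (renames, conflicts) for usernames that would change after the fix.

    renames maps a stored (garbled) name to the name the fixed setup will send;
    conflicts lists stored names whose repaired form already exists.
    """
    existing = set(usernames)
    renames: Dict[str, str] = {}
    conflicts: List[str] = []
    for name in sorted(existing):
        new = repaired(name)
        if new == name:
            continue  # ASCII-only or already correct: nothing to do
        if new in existing:
            conflicts.append(name)
        else:
            renames[name] = new
    return renames, conflicts
```

Renaming the listed accounts before the switch should let the existing user objects be matched again (an untested assumption); entries in `conflicts` would need to be merged by hand.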

@mstuehrenberg

Hello, I can confirm that this fix works for our installation - thank you very much!

However, on the first try I got a server error, since the username (eppn in our case) was already present in the database:

  File "/data/rdmo/rdmo-app/env/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute 
    return self.cursor.execute(sql, params)
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "auth_user_username_key"
DETAIL:  Key (username)=([email protected]) already exists.

After deleting the account via the Django Admin Interface, everything worked fine so far.

@jochenklar
Member Author

There was still something wrong with my fix to django-shibboleth-remoteuser, please reinstall:

pip install -I git+https://github.com/rdmorganiser/django-shibboleth-remoteuser

The REMOTE_USER was not unquoted, and this conflicted with the username obtained in the second step of the Shibboleth authentication process ...
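One plausible reading of why even an ASCII-only eppn collided (not confirmed in the thread): if the SP percent-encodes '@' the way `quote(..., safe="")` does (an assumption), a REMOTE_USER that is not unquoted no longer matches the stored username, while the unquoted attribute points at a name that is already taken. The usernames below are made up for illustration.

```python
from urllib.parse import quote, unquote

stored_usernames = {"jdoe@example.org"}  # hypothetical existing auth_user row

remote_user = quote("jdoe@example.org", safe="")  # 'jdoe%40example.org' (assumed SP output)
eppn = unquote(remote_user)                       # the unquoted attribute value

# Before the reinstall: REMOTE_USER was used as-is, so the lookup misses the
# existing user, while the unquoted eppn names a user that already exists.
print(remote_user in stored_usernames)  # False
print(eppn in stored_usernames)         # True

# With the fix, both values go through unquote and agree.
print(unquote(remote_user) == eppn)     # True
```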

@mstuehrenberg

Ok, thanks for the fast reply. I can confirm that it works for our environment.

@triole
Member

triole commented Mar 18, 2020

Closing. Seems to be fixed.

@MyPyDavid
Member

MyPyDavid commented Nov 9, 2023

Should this fix also be included in the docs? https://github.com/rdmorganiser/rdmo-docs-en/blob/master/docs/configuration/authentication/shibboleth.md

and in the pyproject.toml?

[project.optional-dependencies]

@jochenklar
Member Author

Packages which are not on PyPI do not work in [project.optional-dependencies].

@MyPyDavid
Member

We can make our own release on the fork and include it in the pyproject.toml:

shibboleth = [
  "django-shibboleth-remoteuser @ https://github.com/rdmorganiser/django-shibboleth-remoteuser/archive/refs/tags/v0.11.zip"
]

... that works

@jochenklar
Member Author

OK, good to know. I am not sure if we still need the patch. I will look into it.

@MyPyDavid
Member

Yes, we still need the patch for the correct encoding.
I will put the rdmorganiser fork in the RDMO optional-dependencies.
Jochen wants to open a PR against the upstream repository at some point.


7 participants