Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design Document for SAML2 Implementation as a Service Provider #1928

Closed
6 tasks done
ThibHrrd opened this issue Nov 3, 2021 · 29 comments
Closed
6 tasks done

Design Document for SAML2 Implementation as a Service Provider #1928

ThibHrrd opened this issue Nov 3, 2021 · 29 comments
Labels
rfc A request for comments to discuss and share ideas. stale Feedback from one or more authors is required to proceed.
Milestone

Comments

@ThibHrrd
Copy link

ThibHrrd commented Nov 3, 2021

Preflight checklist

Context and scope

The current options proposed by Kratos to authenticate a user are :

  • Asking the user directly for a password
  • Using OpenID Connect as a form of Single Sign On

Many companies want to be able to use Single Sign On (SSO) to be able to use their own employee database as their single source of truth for authentication of users across all the services they use. This allows to easily manage newcomers in the company and departures as it instantly registers new users on every services used by the company, and remove them when they depart.

A protocol commonly used to provide this feature is SAML2. Currently Kratos does not provide any way to use SAML2 as a way to authenticate a user. Many companies already have a SAML2 ecosystem, and cannot transition easily to OpenID Connect as an alternative protocol of SSO.

Goals and non-goals

In SAML2, there are 2 actors :

  • The Service Provider (SP)
  • The Identity Provider (IdP).

The Identity Provider is the single source of truth for the identities that can able to connect. The Service Provider is any service that wishes to log in using the Identity Provider as their source of user. In an ecosystem using SAML, there is usually 1 IdP and several SPs that turn to the IdP to know which users can login.

The goal is to implement the SAML2 protocol in Kratos as a Service Provider that could be connected to an external Identity Provider. Thanks to the SAML implementation, users who already have a SAML2 ecosystem will be able to use Kratos to provide a consistent login protected session to an application. Users who already have an Identity Provider (like Active Directory) will only have to plug their Identity Provider to Kratos.

However the goal is absolutely not to be able to use Kratos as an Identity Provider. We assume that the users of this feature already use an external IDP.

The design

The go library crewjam/saml handles most of the complexity of the SAML protocol. Using this library adds an external dependency, but SAML is a complex protocol that is a project on its own, and is not that simple to implement in its entirety.

In this library, routes that needs to be protected call the RequireAccount method. When using this method, a SAML request is sent to the IdP to ensure that the user is logged in with the IdP (if he's not, it starts a login flow on the IdP). Once this call is completed, the login flow is completed on the SP side (Kratos here), and the user has 2 sessions : one with the SP, and one with the IdP.

There are 2 possibilities regarding when to use the RequireAccount method:

  • Option 1 : Call the RequireAccount method every time that a route protected by Kratos is accessed (so every time the route /sessions/whoami is called)
  • Option 2 : Call the RequireAccount only once when the user logs in and trust only the Kratos session for access to protected resource (and maybe call RequireAccount on strategic points, such as when updating user settings).

Option 1 has the maximum level of security with 2 sessions that are being checked every time a protected resource is accessed. It ensures that the Kratos session is always up to date with the session on the IdP. It also comes with a performance overhead, since there are now twice the session checks to do.
Option 2 sacrifices a bit of security for better performance. Not making the check on every resource allows a mismatch in sessions, for example if the user has its access to the IdP revoked, he would still be able to access content protected by Kratos as long as his Kratos session does not expire.

One part that is not handled by the crewjam/saml library is the configuration of the metadata file. For SAML to work, the service provider (here Kratos) must provide an XML file which contains all the attributes that the SP need the IdP to provide when logging a user in (such as username, email, name, first name etc.). On the Kratos side, all these attributes correspond to the Identity Schema. That means that there needs to be way to automatically generate this XML SAML metadata from the identity schemas configured in Kratos.

When logging out of the IdP, a logout SAML assertion can be sent by the IdP. This needs to trigger and complete the log out flow on Kratos

About provisioning: When the SP has identity data that is different from what is received in the SAML assertion (eg. first login on the SP, name changed), the SP data needs to be updated. This is called just-in-time (JIT) provisioning. On the Kratos side, this means that whenever a SAML assertion is received, the attributes received need to be compared to what's in Kratos' database.

If there is a difference, it needs to update the DB to be in sync with the data given by the IdP. This synchronization from the IdP raises the question of what to do when a user updates their settings. If nothing is done, the settings will be modified in the Kratos DB. Next time a SAML assertion, there will be a difference with the DB, since settings were just changed, and the JIT provisioning mechanism will revert the change that was done.

Just-in-time provisioning also require the previous RequireAccount call to be called every time a page with personal data is accessed (which can be any page protected by Kratos) in order to be sure that this data is up to date with the IdP. This means that option 1 mentioned earlier has to be selected when enforcing the session check.

To transport messages between the SP and the IDP, SAML supports 3 types of bindings :

  • HTTP Post Bindings
  • HTTP Redirect Bindings
  • HTTP Artifact Bindings

The library that we want to use only supports 2 bindings:

  • HTTP Post Bindings : HTTP POST Binding enables SAML protocol messages to be transmitted within an HTML form by using base64-encoded content.
  • HTTP Redirect Bindings : SAML protocol messages are carried directly in the URL query string of an HTTP GET request.

These two protocols are the most used so the fact that the library only supports these two is not really constraining.

APIs

In our case, we would have to add to Kratos everything needed to act as a SAML Service Provider by adding all the necessary endpoints and associated handlers :

  • /saml/acs (POST) = Receives SAML assertion, processes them, and check signature. (SAML AssertionConsumerService)
  • /saml/metadata (GET) = Return Kratos SAML metadata

The crewjam SAML library in Go will handle these routes.

Data storage

The SAML implementation will NOT add new database or tables. It will just use the existing data models and make sure to synchronize users from the IDP database with those of the Kratos database when adequate.

One important thing to note is that both the SP and the IDP have a database with the same users but with (probably) a different nomenclature. This also means that there is no obligation to have the two databases in the exact same state regarding the users present in them, and the attributes associated with them.

Degree of constraint

Concerning the constraints, the first one is obviously SAML. It is a very complex protocol that is not really maintained anymore. Of course, it has many advantages but also some disadvantages. We must be sure to master this protocol to avoid any security problems.

Then, there is the constraint concerning the crewjam/saml library. At the time where this document is being written, the contribution on this project are infrequent. We don't know if the library is very secure and if all known exploit are properly patched (like XXE exploit for example). Before using it, we must ensure that the library does not include CVEs or any other exploit.

Alternatives considered

A very important part of SSO is provisioning. This topic deals with how the SP's database should (or should not) match the IDP's database.

Provisioning is not obligatory, it is entirely possible to accept that there is a difference between the two database.

In our case, we want to implement this provisioning. To do this, there are two possible options :

  • Just in time provisioning (When the SP receive message from the IDP, it verifies that the user concerned has the same information that those in its database. If it is not the case, it modifies them.)
  • Real time provisioning. (The IDP detects modification of user's information in its database and instantly modify SP user's database by using different protocols)

Example of Just-in-Time provisioning : A user logs into the Identity Provider of his company and wants to access a service provider for the first time. At this moment, the user exists in the IDP database but not yet in the SP database. The IDP then sends a SAML Response to the SP. The SP receives this response and notes that the user indicated is not present in its database. It creates the associated account automatically.

Example of Real-Time provisioning : There is an external system that is constantly syncing the databases of the IdP with the SP. When a user is added to the IdP's database, it is also added the the SP's database. When the user logs in for the first time on the SP, they are already in the SP's database, and no particular action is required.

The real time provisioning require the usage of an external protocol/application. So we don't think to use this method because it is out-of-scope.

We are thinking about using the JITP (Just in time provisioning). This implies that for each message sent by the IDP, Kratos has to check that the user is present in its DB and if the information sent matches. If there is no match, the Kratos DB has to be updated accordingly.

@ThibHrrd ThibHrrd added the rfc A request for comments to discuss and share ideas. label Nov 3, 2021
@psauvage0
Copy link

This is a first draft, if there is something is unclear don't hesitate to point it out

@psauvage0
Copy link

Also there was already an issue (#275) about implementing SAML authentication, but we had to create a new one to use the design document form. Maybe we should copy paste and close, or mark one as duplicate ?

@aeneasr
Copy link
Member

aeneasr commented Nov 7, 2021

I did not have time yet to look over this, I am sorry. This week might also be difficult for me, as it definitely takes some thinking time. I will try my best though and encourage everyone interested to participate :)

@ThibHrrd
Copy link
Author

No problem, feel free to look when you have time. I understand that it it can take a little time to understand because of the comlexity of saml :)

@aeneasr
Copy link
Member

aeneasr commented Nov 13, 2021

Thank you! The proposal sounds really good and is well explained. Regarding sync, maybe we could use JsonNet as we do with OIDC to select which attributes to update?

@ThibHrrd
Copy link
Author

Thanks a lot! And concerning attributes, we had looked at how the attributes were handled with OIDC and thought of using the same system so it's perfect.

Thanks for your feedback :)

@aeneasr
Copy link
Member

aeneasr commented Nov 19, 2021

Cool! In that case, I think this proposal is pretty much complete! I think JITP is the right approach! I can also imagine to disable certain update types (e.g. updating traits) or profile updates in general if a SAML connection is established for an identity.

Does it still make sense to have 2fa or is that solved at the SAML IDP?

@psauvage0
Copy link

Indeed, disabling some updates is a good idea since allowing to change settings that are synced by the IdP would result in those changes being reset as soon as the next SAML assertion is received, which makes for a poor user experience.

I'm not sure what you mean with the 2fa, do you mean disabling kratos's 2fa when using SAML as a connection method ?

@aeneasr
Copy link
Member

aeneasr commented Nov 26, 2021

I'm not sure what you mean with the 2fa, do you mean disabling kratos's 2fa when using SAML as a connection method ?

Yes, that was my question :) So does SAML address 2FA, or would we still offer 2FA for SAML users?

@psauvage0
Copy link

I don't think there is a reason to prevent doing 2FA with SAML. 2FA could be solved on the IdP or on the SP side (or even both if you really wanted, even though I doubt that would actually increase security). Although it's an unusual use case, you could set it up so that you would need to log in with the IdP, then provide 2FA on the SP side. This allows to have some SP that require higher AAL than other SP. For example, it seems Github allows that : https://docs.github.com/en/organizations/granting-access-to-your-organization-with-saml-single-sign-on/about-two-factor-authentication-and-saml-single-sign-on

So I would say that there is no need to disable 2FA with SAML, unless it turns out that it introduces complications that are too much to handle, given how unusual the use case is.

@kszafran
Copy link
Contributor

I believe that's also how the current OIDC implementation works. You can set up both OIDC and a second factor (e.g. TOTP) and when you log in, you'll first go through the OIDC flow, and then provide e.g. a TOTP code to get to AAL2. So unless SAML is fundamentally different, I think it should work the same.

@ThibHrrd
Copy link
Author

ThibHrrd commented Dec 16, 2021

Our technical approach

As you may have seen in this Design Document, I would like to implement SAML in Ory Kratos. We have to approval of @aeneasr so I will try to explain our approach in a more technical way. The purpose of this message is to present our idea in a more technical way and propose to interested people to help us develop all this.

First of all, we have no knowledge of the security of the SAML library we want to use. It is more than necessary to perform a security audit on this library to validate its security.

Completion Progress

  • First Part
  • Second Part
  • Third Part
  • Fourth Part

We thought about 4 main tasks :

First Part

Concerning the first part, the goal is to develop the two main endpoints :

  • /metadata (GET) : Generate the metadata of the SP (Kratos)
  • /acs (POST) : Handle SAML request

Second Part

The second part will deal with the way endpoints work. The library already implements what we want to make these endpoints work. The library allows you to create a metadata file very easily so we will need to incorporate it into Kratos to allow the endpoint /metadata to create them easily. Concerning the endpoint /acs, the Crewjam library allows to receive the SAML requests, to understand them and to treat them accordingly.

Third Part

Now that the endpoints are created, the SAML requests must be processed by Kratos. This means that the endpoint /acs must receive the SAML requests, understand them and translate them into a language that Kratos can understand. More clearly, this endpoint must allow Kratos to support SAML requests and to perform the actions associated with these requests.

It is also in this part that you must check if the session has not expired (according to the duration indicated in option). If it is the case, you have to send a SAML Request to the IDP.

Fourth part

Finally, the last part will concern the configuration. Not everyone wants to use SAML so we will have to use the YAML and Kratos configuration system to adapt it to SAML by adding new options to indicate if we want to use SAML and fill in the endpoints. The objective here is to make the final link between Kratos and SAML and thus be able to create instances of Kratos implementing SAML.

Concerning the options, here are the variables we can modify :

  • Bindings
  • Session duration
  • Level of security (Call the RequireAccount method every time that a route protected by Kratos is accessed or not)
  • Traits update

Conclusion

This is only a first draft and is not intended to represent the final work. It allows you to see the surface of the workload and the action plan.

Do not hesitate to tell me if you are interested in the project and if you want to contribute. All comments are welcome!

@akshay196
Copy link

akshay196 commented Jan 14, 2022

Hi team, I would like to work on this. Let me know in case someone already picked this up.

I have recently worked on a SAML implementation for one of my project and I used crewjam/saml library for that. The project is still in progress but I have tested basic SAML functionality and it is working fine. Thanks to @ThXb35 for the initial design which helped me to understand SAML and how we should go ahead for implementation. :)

I am new to Ory Kratos, so I will begin with Kratos doc and contribution guide. Let me know if there something else that I should read before I start.

@ThibHrrd
Copy link
Author

Hello @akshay196 ! I'm glad you're interested in this project! I've already made a good progress on it but I'm a beginner in Golang and SAML so some help would be welcome.

You can find my progress here https://github.com/ThXb35/kratos/tree/saml

Don't hesitate to contact me, it would be a pleasure to work together on it!

@akshay196
Copy link

@ThXb35 Okay. I will look into your code. Feel free to open draft PR once you are ready, we can collaborate there if anything is needed.

@ThibHrrd
Copy link
Author

This is it @akshay196

#2148

@ThibHrrd
Copy link
Author

Hi @akshay196, if you want to help, I am currently trying to create a method to generate session from SAML Assertion by using Crewjam library. Don't hesitate to tell me if you want us to work on it!

@akshay196
Copy link

akshay196 commented Jan 18, 2022

if you want to help, I am currently trying to create a method to generate session from SAML Assertion by using Crewjam library. Don't hesitate to tell me if you want us to work on it!

@ThXb35 Sure. We are trying to solve There is also the very important problem of converting the session created by the Crewjam/saml library into a Kratos session to remain homogeneous. from Third part of your completion progress, right? I am just trying to understand where we are.

Crewjam/saml library's samlsp.SessionProvider interface is used inside samlsp.Middleware struct. So we can write our own KratosSessionProvider which satisfies samlsp.SessionProvider interface and then let's use it in instantiating Middleware. Hope it works? Any thoughts?

I still need to understand kratos/session, how it works, what it includes etc. I will take a look at it, if we are fine with above approach.

@ThibHrrd
Copy link
Author

ThibHrrd commented Jan 19, 2022

@akshay196 Yes exactly, we are trying to modify a little bit the use of the library to create a Kratos session after receiving a SAML assertion.

I'll explain you a little bit my idea : currently, in crewjam/saml, we have this method which receives a SAML assertion and create a session : https://github.com/crewjam/saml/blob/main/samlsp/middleware.go#L185. This session works with a JWT and therefore not recognized by Kratos.

The goal would be to modify it to allow to create a Kratos session.

I already had a small discussion with the developer of Ory on slack: https://ory-community.slack.com/archives/C012USDT5QQ/p1642431769008200 What he says to me is that one can be inspired by the code of creation of session of OIDC which already exists in the code because the operation approaches SAML.

It will therefore probably be necessary to create three new classes: strategy_login.go strategy_register.go strategy.go like what already exists here for OIDC: https://github.com/ory/kratos/tree/master/selfservice/strategy/oidc.

I like your idea of writing our own KratosSessionProvider that satisfies the samlsp.SessionProvider interface but I'm afraid it's more complicated to integrate

I hope my explanation is clear! :)

@aeneasr aeneasr added this to the v0.10.0-alpha.1 milestone Mar 7, 2022
@siennathesane
Copy link

I just wanted to touch base and see how this is going :)

@ThibHrrd
Copy link
Author

ThibHrrd commented Apr 5, 2022

@mxplusb Everything is going well for the moment, we have a rather functional version and we are currently writing unit tests :)

ThibHrrd added a commit to ovh/kratos that referenced this issue May 6, 2022
Signed-off-by: ThibaultHerard <[email protected]>

Co-authored-by: sebferrer <[email protected]>
Co-authored-by: alexGNX <[email protected]>
ThibHrrd added a commit to ovh/kratos that referenced this issue May 6, 2022
Signed-off-by: ThibaultHerard <[email protected]>

Co-authored-by: sebferrer <[email protected]>
Co-authored-by: alexGNX <[email protected]>
ThibHrrd added a commit to ovh/kratos that referenced this issue May 6, 2022
Signed-off-by: ThibaultHerard <[email protected]>

Co-authored-by: sebferrer <[email protected]>
Co-authored-by: alexGNX <[email protected]>
ThibHrrd added a commit to ovh/kratos that referenced this issue May 6, 2022
Signed-off-by: ThibaultHerard <[email protected]>

Co-authored-by: sebferrer <[email protected]>
Co-authored-by: alexGNX <[email protected]>
alexGNX added a commit to ovh/kratos that referenced this issue May 9, 2022
Signed-off-by: ThibaultHerard <[email protected]>

Co-authored-by: sebferrer <[email protected]>
Co-authored-by: alexGNX <[email protected]>
@kopancek
Copy link

Hi, I see that there is an open PR on a fork ovh#3. How far is it from being merged upstream? Currently considering using ory/kratos, but this is a blocker feature for us.

@psauvage0
Copy link

@kopancek We're currently waiting for a review from the Ory team on #2148. Unfortunately the PR is rather large so they may take some time to review. In the mean time, if you need it really fast, the fork is usable on the branch saml (https://github.com/ovh/kratos/tree/saml). Although you might need to dig around in the code to figure out how to set up the SAML federation.

@mmellison
Copy link

This is awesome work and my team is really excited for this, as it was one of the main challenges with implementing a new auth service based on Kratos. We rely heavily on SAML for our enterprise customers of our platform.

I think a crucial missing part of this design is for the multi-tenant/enterprise use-case where the authentication stack (in this case Kratos) supports dynamic IdP configuration for a sub-set of users based on affiliation. This is typically done by looking at the user's email address domain.

For example, users under the @companya.com domain name will be sent to the IdP for Company A while users with @companyb.com will be sent to the IdP for Company B. Ideally, administrators of Company A and Company B would also be able to manage SAML configuration (perhaps through the admin API) without having to update any YAML configuration files on the backend.

Curious if this has been thought about / accounted for in any follow up design or work? If not, I'm happy to look into having my team contribute here. We've been enjoying working with the Ory stack!

@psauvage0
Copy link

@seglberg Currently, in what we're developing, we can only federate to one IdP. Adding the possibility to configure multiple IdP should not be too hard.

However there is a significant problem with that : Kratos does not and will not handle software multi-tenancy. This is a choice from the Ory team and they do not seem to want to change that (seeing how complicated it would be to add this feature, I understand them). This means that even though the users come from different IdP, they would end up in the same Kratos database, as if they were coming from the same IdP. Admins on this Kratos instance would be able to see every users from every IdP, as if it was coming from a single IdP.

With that limitation in mind, I don't know if the Ory team would approve of this. @aeneasr should be able to get back to you on that.

Regardless of that, the PR to add SAML federation currently spans over 60 files (and we're still missing several things) and is pretty complicated on its own. I would say that the feature you are suggesting should be the object of a new design document, after this PR is finally merged.

@mmellison
Copy link

Thank you for the insightful response!

This means that even though the users come from different IdP, they would end up in the same Kratos database, as if they were coming from the same IdP. Admins on this Kratos instance would be able to see every users from every IdP, as if it was coming from a single IdP.

This is the behavior we're trying to replicate, where we have a single Kratos tenant/realm which powers external authentication for our entire SaaS platform. Our users are similar to that of the GitHub.com or GitLab.com, where despite the IdP, they are still part of our single user directory, on top of which they are given access to teams, projects etc. So ultimately I don't think we're looking to support true multi-tenancy, but just the ability to federate to multiple IdPs based on the user's email address domain.

I'm eagerly watching the existing PR and will be happy to have further discussions once that lands! 😄

@theogravity
Copy link

Lack of SAML support is the reason why we would be unable to use Ory and have to use an alternative auth stack instead as our customer base wants to be able to use SSO + SAML.

It's really cool to see the community coming together to work on support for it.

@timdrysdale
Copy link

This means that even though the users come from different IdP, they would end up in the same Kratos database, as if they were coming from the same IdP. Admins on this Kratos instance would be able to see every users from every IdP, as if it was coming from a single IdP.

This is the behavior we're trying to replicate, where we have a single Kratos tenant/realm which powers external authentication for our entire SaaS platform.

Delighted to see this SAML integration underway - well done to the team on getting the PR ready in recent days.

FWIW, in terms of multiple IdP, I have a similar use case to the above - acting as an SP to the UK University shibboleth federation. This will require interacting with a different IdP at each university, for which offloading the discovery process is probably the path of least resistance - at the cost of requiring a DiscoveryResponse endpoint. There is also seamless access as a possibility e.g. see thiss.io.

I suppose an alternative to supporting multiple SAML IdP is to have a single Ory Kratos instance with above PR that can do a single SAML IdP and use OAuth 2.0 via hydra for sharing resources between them all, but that adds a barrier to on-boarding a new institution/organisation and adds complexity around managing users across multiple Kratos instances, and seems out of step with the current one-SP-to-many-IdP model in use for SAML already.

Happy to discuss/maybe contribute within bandwidth limits if we do indeed go with Ory - just a team of one on the backend but would be good to be able to stick with a golang stack and get the other cool features in wider Ory like ACL etc.

@github-actions
Copy link

Hello contributors!

I am marking this issue as stale as it has not received any engagement from the community or maintainers for a year. That does not imply that the issue has no merit! If you feel strongly about this issue

  • open a PR referencing and resolving the issue;
  • leave a comment on it and discuss ideas on how you could contribute towards resolving it;
  • leave a comment and describe in detail why this issue is critical for your use case;
  • open a new issue with updated details and a plan for resolving the issue.

Throughout its lifetime, Ory has received over 10.000 issues and PRs. To sustain that growth, we need to prioritize and focus on issues that are important to the community. A good indication of importance, and thus priority, is activity on a topic.

Unfortunately, burnout has become a topic of concern amongst open-source projects.

It can lead to severe personal and health issues as well as opening catastrophic attack vectors.

The motivation for this automation is to help prioritize issues in the backlog and not ignore, reject, or belittle anyone.

If this issue was marked as stale erroneously you can exempt it by adding the backlog label, assigning someone, or setting a milestone for it.

Thank you for your understanding and to anyone who participated in the conversation! And as written above, please do participate in the conversation if this topic is important to you!

Thank you 🙏✌️

@github-actions github-actions bot added the stale Feedback from one or more authors is required to proceed. label May 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rfc A request for comments to discuss and share ideas. stale Feedback from one or more authors is required to proceed.
Projects
None yet
Development

No branches or pull requests

10 participants