Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harden OIDC migration and make optional #2170

Merged
merged 14 commits into from
Nov 23, 2024

Conversation

kradalby
Copy link
Collaborator

@kradalby kradalby commented Oct 4, 2024

This commit hardens the migration part of the OIDC from the old username based approach to the new sub based approach and makes it possible for the operator to opt out entirely.

Fixes #1990

@kradalby kradalby mentioned this pull request Oct 4, 2024
@kradalby kradalby marked this pull request as ready for review October 4, 2024 10:31
@kradalby
Copy link
Collaborator Author

kradalby commented Oct 4, 2024

Followup on @micolous comment on #2020 (comment)

@kradalby kradalby force-pushed the kradalby/oidc-takeover branch from 54984c8 to e11b4b7 Compare October 4, 2024 12:27
Copy link

@micolous micolous left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting onto this so quickly!

It looks like this addresses the concerns I previously raised, but I think that username collisions are going to be a problem with this change.

On further inspection, I've noticed a couple of other issues with #2020 where user migration might not work correctly and user renames might not work correctly.

The more I think about this, the more that I think the pre-#2020 Headscale OIDC behaviour might be impossible (or at least, extremely difficult) to automatically, securely and reliably recover from for an existing installation.

There may be IdP-specific ways for an installation to recover (involving manually querying and editing databases), but not much within the scope of the OIDC protocol and Headscale's pre-#2020 OIDC schema.

I still think that switching to sub is the right thing to do, even if it causes some short term pain.

CHANGELOG.md Outdated
@@ -9,6 +9,8 @@
- Redo OpenID Connect configuration [#2020](https://github.com/juanfont/headscale/pull/2020)
- `strip_email_domain` has been removed, domain is _always_ part of the username for OIDC.
Copy link

@micolous micolous Oct 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct, because OIDCClaims.Username (as of #2020) maps to the preferred_username field.

This could contain several things, depending on the IdP (and potentially, how it is configured):

  • A short username (eg: user) which matches the user part of their email address, which is what post-Redo OIDC configuration #2020 Headscale assumes
  • A short username (eg: u1234) which doesn't match their email address (eg: a user signing in to an organisation's IdP which has their personal email address on file)
  • An email address (eg: [email protected]), which may or may not match the email field
  • An SPN (eg: [email protected], \ad.example.com\user, \AzureAD\[email protected])
  • A phone number (noted in Entra docs)

The previous behaviour was to either use the "user" part of the email address (with the default strip_email_domain = true) or use the whole email address (with strip_email_domain = false).

On an installation which explicitly set strip_email_domain = false, the mapping would be easy and pretty reliable, provided the IdP could be trusted to only provide verified email addresses (which is not always true).

However, with the default strip_email_domain = true, the mapping would have treated [email protected] and [email protected] as the same user -- another account takeover vector.

Depending on how much Headscale can trust the User.Email field (is it possible for users to change this without verification?), it might be possible to use that to map users without worrying about what strip_email_domain was set to, or needing to restore that flag.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will need to think and research this a bit more. The main goal is that I want to go towards using "email style" for users in the Policy, so either the email as provided, or maybe just the "preferred_username" with @ added to the end.

Copy link

@micolous micolous Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't modify the preferred_username in any way – it's intended to be used as-is.

As for policy preferences, Kanidm defaults to using an SPN that looks like an email address (eg: [email protected]), but is probably not an actual email address. This can be changed on an application configuration level with kanidm system oauth2 prefer-short-username $client_id, which makes Kanidm return the Person's name field as a preferred_username (eg: user) 1.

Footnotes

  1. Kanidm's name field is similar to a "username" field. There are also displayname and legalname fields which are different again.

@@ -274,6 +275,7 @@ func LoadConfig(path string, isFile bool) error {
viper.SetDefault("oidc.only_start_if_oidc_is_available", true)
viper.SetDefault("oidc.expiry", "180d")
viper.SetDefault("oidc.use_expiry_from_token", false)
viper.SetDefault("oidc.map_legacy_users", true)
Copy link

@micolous micolous Oct 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make map_legacy_users = false the default, so that migration is opt-in:

  • map_legacy_users = false is the only secure option, though it'd break any account created pre-0.24.0.

    While this PR only allows a user to migrate if the account has no recorded sub, there's still a window for account takeovers between an administrator upgrading Headscale and those users logging in.

  • Requiring a Headscale administrator to explicitly set it to true means that if/when the time comes to retire that option, Headscale could issue advance warnings or make it a hard error to include it, so administrators can discover if their installation depended on it.

Making this opt-in means that administrators need to explicitly acknowledge the issue, and could mitigate the risks (eg: defining a limited window for automated migrations, or manually updating Headscale's database with known-correct values).

When strip_email_domain was removed, the default was true, and generally the only reason it'd be included is an administrator setting it to false. #2020 removed the option entirely, and assumed the new behaviour was as if it was effectively set to true (though depending on the IdP and how it is configured, it could be as if it were set to false, or completely wrong).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I agree that this feature would be problematic, and the best might be to do a clean cut (but that requires us getting it "right" this time), I am thinking of an alternative approach:

No automatic migration, remove this option and all logins will create new users.
Either:

  • a new "merge" command, moving all nodes (and preauthkeys) from one user to another (so the admin can script the merger them selves, knowing the variables of their environment)
  • Not add a new command, but show an example of how this can be done with the already existing headscale nodes move command.

Feels like this will be messy either way, but this approach is more explicit than implicit.

Copy link

@micolous micolous Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a fan of an explicit approach, but mindful of the overhead if it needs multi-party (administrator + user) intervention.

No automatic migration, remove this option and all logins will create new users. Either:

  • a new "merge" command, moving all nodes (and preauthkeys) from one user to another (so the admin can script the merger them selves, knowing the variables of their environment)

  • Not add a new command, but show an example of how this can be done with the already existing headscale nodes move command.

I think with both of those approaches, the process would be something like:

  1. Administrator upgrades Headscale, and announces to users that things will break.

    This is effectively the start of a service outage for all users.

  2. User attempts to sign in to Headscale with OIDC.

  3. Headscale creates a new account, and the user discovers that they are locked out of everything.

    This is when the user discovers the outage, if they didn't get the administrator's announcement.

  4. Administrator (somehow?) confirms that the new user account created with OIDC belongs to some old account.

  5. Administrator runs a command to merge the new account to their old account.

    This is when the outage is resolved for the user.

There's some back and forth with the administrator needing to do things for the user – so there's a potential for delays and overheads.

Alternative

Another way to do this is to have users authenticate without OIDC (ie: with their Headscale password) if an account already exists for that email address.

That way the linking can happen automatically, but it is checked against "something they know"... rather than trusting the email provided by the IdP implicitly or relying on the contents of preferred_username.

Then the migration process for an existing user would be:

  1. Administrator upgrades Headscale, opts in to auto-migration.

    This is effectively the start of a service outage for all users.

  2. User attempts to sign in to Headscale with OIDC.

  3. Headscale can't find any account with that sub, so falls back to email address matching. It finds an account, which has no sub recorded.

  4. Headscale prompts the user to either provide the Headscale password for the existing account (to re-establish linking), or to create a new account.

    Note: it may be best to always to prompt to setup a new linking, even if the account doesn't exist or there is an existing linking, to defend against account enumeration / information leakage.

    If the account exists, there is a linking, and the user provides the correct password, then the linking attempt could be rejected.

  5. User enters the Headscale password for the existing account.

  6. Headscale records the provided sub, and logs the user in normally.

    This is when the outage is resolved for the user.

  7. Going forward, user can sign in with OIDC just as they always did, and is matched on sub. If details changed, they're automatically updated on log-in.

Under the hood, Headscale could make a new (temporary) user account when signing in with an unknown sub, but put it in a "pending creation" state which allows it to be either associated with an existing account with the same email address (after proving knowledge of the password) or promoted into a new "full" account.

The benefit of that is it's entirely self-service. But it couldn't handle things like:

  • if an account does not have a Headscale password set
  • if multiple accounts are associated with the same email address
  • if a user's email address changed between the last time they logged in pre-upgrade, and the next time they next logged in post-upgrade (unless they were allowed to link it to an unlinked account with a different email address, and they could prove control with Headscale password).

An administrator could also do the migration manually themselves:

  1. Administrator dumps a list of users with their validated email address and sub from the IdP.

  2. Administrator dumps a list of users from Headscale.

  3. Administrator creates a list of known valid linkages for both sides. If there's any unlinked entries, they can be followed up separately, and they can also handle things like renamed users.

  4. Administrator upgrades Headscale, and does not opt in to auto-migration.

    This is effectively the start of a service outage for all users.

  5. Administrator runs some command like headscale user ${username} set-oidc-identifier "${issuer}/${sub}" for each validated user.

    This is when the outage is resolved.

That manual method would work regardless of whether the users have a Headscale password set.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, I just realised I put in bunch of commentary about a "master password" which doesn't apply to Headscale. That's a Bitwarden/Vaultwarden thing... and I'm getting my apps mixed up. 🙃

I've updated my previous comment accordingly, but I suspect having a Headscale password may not be a guarantee.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh (again), I've now also realised that #2020 added the email address field in the first place.

So the only way you could reliably auto-migrate is if strip_email_domain = false.

// to be updated with the new OIDC identifier inexplicitly which might be the cause of an
// account takeover.
if user.ProviderIdentifier != "" {
log.Info().Str("username", claims.Username).Str("sub", claims.Sub).Msg("user found by username, but has provider identifier, creating new user.")
Copy link

@micolous micolous Oct 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be marked as a unique key constraint in the ORM:

// Username for the user, is used if email is empty
// Should not be used, please use Username().
Name string `gorm:"unique"`

Per OpenID Connect Basic Client Implementer's Guide 1.0, Draft 47, Section 2.5.3: Claim Stability and Uniqueness (emphasis added):

The sub (subject) and iss (issuer) Claims, used together, are the only Claims that an RP can rely upon as a stable identifier for the End-User, since the sub Claim MUST be locally unique and never reassigned within the Issuer for a particular End-User, as described in Section 2.2. Therefore, the only guaranteed unique identifier for a given End-User is the combination of the iss Claim and the sub Claim.

All other Claims carry no such guarantees across different issuers in terms of stability over time or uniqueness across users, and Issuers are permitted to apply local restrictions and policies. For instance, an Issuer MAY re-use an email Claim Value across different End-Users at different points in time, and the claimed email address for a given End-User MAY change over time. Therefore, other Claims such as email, phone_number, and preferred_username and MUST NOT be used as unique identifiers for the End-User.

It looks like Headscale might use User.Name as a foreign key in other parts of the application, eg:

type PreAuthKey struct {
ID uint64 `gorm:"primary_key"`
Key string
UserID uint
User User `gorm:"constraint:OnDelete:CASCADE;"`
Reusable bool
Ephemeral bool `gorm:"default:false"`
Used bool `gorm:"default:false"`
Tags []string `gorm:"serializer:json"`
CreatedAt *time.Time
Expiration *time.Time
}
func (key *PreAuthKey) Proto() *v1.PreAuthKey {
protoKey := v1.PreAuthKey{
User: key.User.Name,

I recognise fixing this may be difficult; so maybe an interim workaround should be that Headscale does not create a new account if the username clashes?

if user.ProviderIdentifier != "" {
log.Info().Str("username", claims.Username).Str("sub", claims.Sub).Msg("user found by username, but has provider identifier, creating new user.")
user = &types.User{}
}
}

user.FromClaim(claims)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per above, are there any relations which could be broken if user.Name changed?

(I'm not familiar with gorm; so it might handle it for you.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GORM stuff is based on an underlying ID, so that should be fine.

if user == nil {
// This branch will presist for a number of versions after the OIDC migration and
// then be removed following a deprecation.
if a.cfg.MapLegacyUsers && user == nil {
user, err = a.db.GetUserByName(claims.Username)
Copy link

@micolous micolous Oct 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this some more, and I don't think this migration is reliable either, because preferred_username might not be the same as the "user" part of an email address.

See my comments on CHANGELOG.md.

Copy link

@aalmenar aalmenar Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have cases where username or preferred_username are not the same as user part of email address. But also user aren't able to change their username or email address. So i believe for the migration to have less issues, if strip_email_domain is true, then the user part should be used for the migration.

@kradalby
Copy link
Collaborator Author

kradalby commented Oct 7, 2024

@micolous I highly appreciate that you have spent time helping me research and get this right, I need some time before I can sit down properly, but I will try to start researching and think about solutions, I've answered some of the parts where I have some ideas already.

@micolous
Copy link

micolous commented Oct 8, 2024

I've been poking at this in another branch, based on this PR: https://github.com/micolous/headscale/tree/oidc-takeover-suggestions:

Consider that a "work in progress", feel free to incorporate things into this PR. 😄

I've now realised a few more problems which make some of my suggestions (including in that branch) infeasible:

  • Existing OIDC users may not have a password set, so we can't verify them based on "something they know".
  • I wrongly assumed the email address field was already there, but Redo OIDC configuration #2020 added it, so we can't use that for automatic mapping (though I attempted it in 27856fb)
  • This also means we can't add a uniqueness constraint to User.Email, because it is unset for all users created before Redo OIDC configuration #2020.
  • Before Redo OIDC configuration #2020, if you changed the oidc.strip_email_domain setting, it would invalidate all your existing user accounts, so we can't use that to migrate accounts.

I think the auto-migration will have to be:

  1. Reintroduce the strip_email_domain setting, with the same logic as before Redo OIDC configuration #2020.
  2. If map_legacy_users: false (the default), panic if the strip_email_domain option is set explicitly (we don't want to use that if not mapping!)
  3. new check: Require allowed_domains contain exactly 1 entry if strip_email_domain: true (the best we can do to lock this down)
  4. new check: Require the email_verified OIDC claim to be true (it's not much...)
  5. If map_legacy_users: true and there's no account with the sub, attempt to get user by username in the old way (taking into account strip_email_domain and ignoring accounts where sub is already recorded).
  6. After migration, update the account to use the preferred_username

There's still ways this could fail or be exploited, but it's at least something.

@kradalby
Copy link
Collaborator Author

kradalby commented Oct 8, 2024

I'm replying a bit before I have managed to grasp everything as I am not well versed in these topics but I am trying to learn fast, a couple of notes and some comments:

I've been poking at this in another branch, based on this PR: micolous/headscale@oidc-takeover-suggestions

This pr and the proposed migration, with reintroducing strip_email_domain sounds like the most sensible way right now.
I dont want neither the migration code, option nor strip email around for long, so I propose that when we get this sorted, we release 0.24.0 with this code, then everyone who uses OIDC should use this release to migrate.
Then main will remove all this code, and 0.25.0 will contain no migration code and represent the state we want to be the new "base" for OIDC, that we then can build upon.

if an account does not have a Headscale password set

Headscale does not have any passwords, I am not sure if I am missing something from #2170 (comment)

I wouldn't modify the preferred_username in any way – it's intended to be used as-is.

As for policy preferences, Kanidm defaults to using an SPN that looks like an email address (eg: [email protected]), but is probably not an actual email address. This can be changed on an application configuration level with kanidm system oauth2 prefer-short-username $client_id, which makes Kanidm return the Person's name field as a preferred_username (eg: user) 1.

That is find I think, an important goal with this work tho (eliminating strip_email_domain) is to ensure all users "DisplayName" which is used in headscale nodes list for owners, in headscale users and in the Policy file must include an @. This is for future ACL/Policy work.

I dont have any particular opinion if it should be:

Email makes the most sense to me.

@kradalby
Copy link
Collaborator Author

kradalby commented Oct 9, 2024

Another thought on the level of effort to put into this migration,

It is currently "quite broken" or at least insecure as we have established since it is vulnerable to these kind of email/username takeover attacks.

Based on this, I think we can make the argument that by using the current setup, you are equally at risk compared to a "inperfect" migration strategy. I feel like we can make the argument for using the Username based migration given that admins already trust what is fed into the username currently, and that should be sufficient to trust it to do the migration.
With the addition that strip_email_domain has to be reintroduced so it can be matched exactly like we currently do.

It sounds to me like the ideal scenario is to get this fixed, and maybe the best is to keep the simple migration, then tell all your users to login again and disable the migration path after it has been completed.

@micolous
Copy link

micolous commented Oct 10, 2024

That is find I think, an important goal with this work tho (eliminating strip_email_domain) is to ensure all users "DisplayName" which is used in headscale nodes list for owners, in headscale users and in the Policy file must include an @. This is for future ACL/Policy work.

The claim stability and uniqueness constraints apply here too. If an OIDC account is renamed or changes email address in the IdP, Headscale needs to be able to track that and continue to apply the same policies.

There's nothing stopping Headscale recording or consuming attributes like email address and display name in appropriate contexts (like user lists). However, it should not rely on those attributes to be stable, or use them to apply access controls.

What does Tailscale do?

According to Tailscale's ACL policy docs and grant docs, users can be referenced by:

I couldn't find anywhere in the docs that mentions if you can change your email address on an account, or the name of a passkey.

However, you can rename your GitHub account. I haven't tested whether someone taking over a renamed account could be used as an account take-over vector, or if that would just create a new Tailscale account (keyed by sub) and then apply the wrong ACLs/grants to anything using the old name (much like many other references in GitHub).

Tailscale's Entra SCIM sync and Okta SCIM provisioning docs suggest that Tailscale maps SSO users by email address (rather than sub).

Other parts of Tailscale's docs state that SCIM groups can be referenced by name or ID. They also state that if you rename a group referenced by name, then those references in ACLs or grants will break. This suggests that they're not internally converted to IDs (though this may not be possible, depending on how this is all provisioned).

All of this seems to suggest that Tailscale does not handle user (or group) renames over OIDC or SCIM correctly, though I don't have a Tailscale Enterprise account to verify this.

What should Headscale do instead?

Headscale should support referencing:

  • Users by local user ID (which seems to be a uint64), eg: 1234@headscale
  • OIDC users by iss + sub, eg: idm.example.com/97fb4c49-c3b2-4e13-a8b9-cbe4d6525dc5@oidc

That way it wouldn't matter if an OIDC user's account was renamed, changed email address, or even had an email address set – the iss + sub would remain the same.

Also, Headscale would need to:

  • parse user references ending in a reserved identifier (eg: @headscale or @oidc) only with that provider, rather than checking it as an username or email address
  • prevent setting a username or email address to a reserved identifier

It could also convert any references to an email address in ACLs/policies to a local user ID or iss+sub, and then emit policy which only contains one of those. It may be better to use the external identifier where available, because then it may be easier to use with configuration management tools that'll be working with IdP identifiers.

Based on this, I think we can make the argument that by using the current setup, you are equally at risk compared to a "inperfect" migration strategy. I feel like we can make the argument for using the Username based migration given that admins already trust what is fed into the username currently, and that should be sufficient to trust it to do the migration.

Yeah, I don't like it either... but I don't see any other way to let you automatically migrate a pre-#2020 auth database to sub without re-introducing the same security risks that were there before.

It sounds to me like the ideal scenario is to get this fixed, and maybe the best is to keep the simple migration, then tell all your users to login again and disable the migration path after it has been completed.

It's important to clearly communicate what's happened, give administrators options for remediation, and they can make a choice of what to do next based on their appetite for risk (whether they attempt to auto-migrate, or they want to audit and patch the user database to fix the problems).

If you have a single-domain configuration (or multi-domain with strip_email_domain: false), and your IdP does not allow user-triggered email address changes, and no users have changed email address, then the bug probably doesn't affect you.

We can't know all of these things for sure, so the "do nothing" option should fail "safe" – that is, no migration at all, only using sub. That way, the bug is definitely fixed, and an administrator must take explicit action to either potentially re-introduce a vulnerable configuration, or to manually migrate users using known-safe data.

Or maybe there could be an oidc.configuration_version: 2 attribute that you need to set to be able to use OIDC in the new version, and it crashes on start-up (with a reference to the docs about this issue) if that's not set and OIDC support is enabled.

There should be some way for an admin to know which users are pending migration, and how many. That way they know when they can safely turn it off, see how remediation is tracking and who they need to follow up.

@kradalby
Copy link
Collaborator Author

ok, I this is how I will approach this based on the information out of hand. I will implement this will a tradeoff between usability, putting constraints on the OIDC operator and documentation:

OIDC implementation

  • The identifier used in the database to identify a user will be iss + sub.
  • We will make email required from OIDC
  • Email will not be updated beyond the first login, this will be documented
  • Email will have to be unique, this will be documented
  • An admin command for manually updating the Email will be implemented
  • pref username, display name, picture etc will be updated on all logins

In the Policy, email will be the identifier for a user. For non OIDC implementations, it will be username@

Migration

To get away from the current insecure implementation, we will implement the migration that works as implemented in this PR:

  • Re-add strip_email_domain so we can correctly match the usernames mirroring the current behaviour
  • If the user has already been migrated, create a new user.

I deem this acceptable as a migration because it is no more insecure than the current implementation, and the trade off of getting off this implementation is more important than making it perfect.

It will be possible to opt-out of the migration, but it will be default true for one version (0.24.0),
then it will be default false for 0.25.0, and removed completely in 0.26.0 (including the strip email functionality).

New columns can be added to headscale users list with Email, which will indicated if a user has been migrated so an Admin can track it.

We will document a recommendation of turning off the migration when all users are migrated.


Considering the high level of configurability of OIDC, providers etc, it is not an option for us to implement the vast level of configurability. It is not an option to sacrifice the usability of the policy, and headscale by forcing users to use the full identifier in Policy. We are therefore happy with requiring certain behaviour from the OIDC.
If an OIDC cannot be configured to provide an email, then those OIDCs will not be supported.

I think this tradeoff will cover the concerns raised:

  • We use a stable ID which will solves the issue of account take over
  • To be able to use the email internally, we lock it to the stable ID
  • We empower administrators to override this manually, allowing them to do their diligence

We are balancing the user impact and user experience and allow administrators to manage their own risk appetite.
It balance the timeline by choosing a simpler migration that is attainable now and will start getting people over to the correct solution.

@kradalby
Copy link
Collaborator Author

Does this sound like it should address the concerns @micolous?

@micolous
Copy link

Oops, I accidentally pressed "Comment" before I was finished. I will try again soon. 😄

@micolous
Copy link

micolous commented Oct 15, 2024

Comments inline, but I've re-ordered some parts to make it a little easier to follow.

OIDC implementation

  • The identifier used in the database to identify a user will be iss + sub.
  • pref username, display name, picture etc will be updated on all logins

Agreed.

  • An admin command for manually updating the Email will be implemented

This is a good idea to support non-OIDC logins.

  • We will make email required from OIDC

I couldn't find anywhere that Headscale sends emails, so it should not require an email address, except for migrations.

Non-OIDC users are already identified by username@.

  • Email will not be updated beyond the first login, this will be documented

The whole point of having an identity provider is that it is a single source of truth, and that all attributes are updated automatically in all applications, eliminating the administrative burden of maintaining identities across many applications for many users.

  • Email will have to be unique, this will be documented

This is akin to requiring email be a stable and unique claim – exactly the issue we're trying to avoid.

The fundamental issue is that a person's email address can always change. OIDC at least gives a way for an identity provider to signal to an application that it is the same user (sub) regardless, and it's up to an application to handle that correctly.

An email address could change for many reasons:

  • Someone moving to another business unit or department which has its own domain (or subdomain) for its users (but still rolls up through an organisation-wide identity management platform)
  • An IdP could automatically mask email address, so that each (user + service) gets its own email address (eg: Apple does this). This could even change over time, such as to responding to a specific service's user list being leaked publicly.
  • Someone's email address was based on their name, and they've changed their name. Continuing to use someone's old name, even indirectly via an email address, can be impolite, offensive, or even life threatening – depending on the circumstances.

Preventing email addreesses from being updated (or making it difficult) doesn't make the issue go away – if anything, it makes things worse.

Considering the high level of configurability of OIDC, providers etc, it is not an option for us to implement the vast level of configurability. It is not an option to sacrifice the usability of the policy, and headscale by forcing users to use the full identifier in Policy. We are therefore happy with requiring certain behaviour from the OIDC.

I acknowledge that using a sub isn't the friendliest thing to use in policy – but it's also the only thing that's actually stable, which means it's also the only thing that's actually secure.

I also acknowledge that this stems from a Tailscale design flaw, but there is some latitude for Headscale to correct this issue, as it already does for the local case (eg: username@).

If non-OIDC users can be renamed, it'll have to be handled there too anyway.

Headscale could support unstable identifiers (like email or preferred_username) and translate them into stable ones based on its user database, throwing an error if none match or there is more than one match.

This would make an email or preferred_username change automatically propagate to any existing policy without the user having to be aware of it, and uses a familiar paradigm in a fairly-safe way.

Local filesystems on UNIX-like systems already do something similar – if you chown user file, file's owner will be set to user's UID at the time of execution, not their username. If user is later renamed, the UID stays the same, and the file's effective ownership will be retained. When you run ls -l file, it translates the UID back to user's new username. Only things which reference user by username (of which there are many) break.

My understanding is that Headscale doesn't even support these policies today. While it's important to acknowledge these potential later issues, this can be a later problem. 😄

Migration

To get away from the current insecure implementation, we will implement the migration that works as implemented in this PR:

  • Re-add strip_email_domain so we can correctly match the usernames mirroring the current behaviour

Sounds good.

  • If the user has already been migrated, create a new user.

Assuming that on first login for migrating users, the user's full email address is recorded in Headscale's user database (ie: fixing strip_email_domain: true); the way we know an account is already migrated is if:

  • An account has already been migrated with the same full email address in Headscale's user database.

  • strip_email_domain: true, and an account with the same "user" part has already been migrated (eg: [email protected] and [email protected])

If both cases are handled as "create new account":

  • If a user already exists with the same full email address: then it suggests two IdP accounts could have the same email address now, or an email may be reassigned to a different user at some point in time.

  • A mis-matched domain with strip_email_domain: true suggests that two users could have been sharing the same Headscale account pre-migration.

This could be flagged for the administrator to take action.

Automatically locking both accounts could be used as a denial of service vector, if Headscale is configured to use a public identity provider which erroneously allows public logins.

That all said, creating a new user is still a reasonable answer. It just contradicts the prior stated constraints.

I deem this acceptable as a migration because it is no more insecure than the current implementation, and the trade off of getting off this implementation is more important than making it perfect.

I agree with you it's important to move forward in some way on this, and that there is no perfect solution.

It's important that an administrator can see what the automated migration process has done, any exceptional circumstances it has detected, and manually reassign things if needed.

It will be possible to opt-out of the migration, but it will be default true for one version (0.24.0), then it will be default false for 0.25.0, and removed completely in 0.26.0 (including the strip email functionality).

We will document a recommendation of turning off the migration when all users are migrated.

What happens if an administrator skips v0.24.0 or v0.25.0, and there are other critical bug/security fixes in the interim?

As much as I dislike having the option around any longer than needed, I think migration should be decoupled from point releases, and be "opt in":

  • The risk with "opt in" migration is that it'll block OIDC logins for existing users until the administrator changes a flag.

    But this means that an administrator is definitely aware a migration is taking place, and they are empowered to choose whether they want to migrate using an automatic-but-imperfect process (probably majority) or manually with verified data sources (probably a minority).

    It also means that new installations don't require email addresses at all, and won't have this issue from day 1.

  • The risk with "opt out" migration is that it's possible for the migration to take place without the administrator even being aware of it.

    Then "suddenly" some point release later, any user who did not log in while the flag was default-enabled will be locked out.

"Opt in" is the only option which empowers administrators to make a choice, because "opt out" is not consent.

New columns can be added to headscale users list with Email, which will indicated if a user has been migrated so an Admin can track it.

Sounds good.

@micolous
Copy link

So in summary, I'd suggest something similar to what you had, with a few tweaks:

OIDC implementation

  • The identifier used in the database to identify an OIDC user will be iss + sub.
    • This is expected to be a stable and unique identifier, per OIDC specs.
  • If email_verified != true, then Headscale disregards any email claim.
  • The email claim (and scope):
    • Is required once to migrate an old OIDC user to iss + sub
      • Email addresses are expected to be stable and unique for unmigrated accounts only.
      • If strip_email_address: true, the "user" part must also be unique until the account is migrated.
    • Otherwise, it's optional, and not required to be unique.
      • ie: it should be possible to run Headscale without an email scope.
    • If the email claim is present, it will always update Headscale's user database when changed.

Other claims should be update Headscale's database when changed, and are not treated as unique:

  • (Required) preferred_username
  • (Optional) name, the user's preferred name for display in UI elements
  • (Optional) picture, the user's preferred avatar for display in UI elements

Policy implementation (for later)

  • stable unique user identifiers (eg: Headscale user ID, OIDC sub + iss):
    • can always be used in policies
  • unstable user identifiers (eg: email, preferred_username):
    • are automatically resolved to a stable user identifier on use
    • if an identifier is ambiguous (does not refer to exactly 1 user), Headscale will reject it

Migrations

Same as you had it, but only "opt in" migration (for reasons mentioned in my previous comment).


This implementation and migration strategy will:

  • Use a stable user identifier, which solves the issue of account take-over (after migration).
  • Makes new installations using OIDC secure by default.
  • Makes migration no less secure than it is today, while keeping implementation fairly simple.
  • Requires administrators with insecure setups to explicitly acknowledge the issue, eliminating any surprises.
  • Allows administrators to perform the migration on their own schedule and risk profile, decoupled from the project's release cadence and any patching process.
  • Allows administrators to use their OIDC IdP as a single source of truth for all user information.
  • Uses OIDC claims consistent with well-established OIDC security practices wherever possible1, while mitigating Tailscale's design flaws, eliminating the need for any IdP-level constraints for secure usage.
  • Removes the hard requirement to use email addresses to use OIDC1, while still allowing it to be used to make things easier.

Footnotes

  1. when auto-migration is disabled 2

@kradalby kradalby force-pushed the kradalby/oidc-takeover branch from bcb3e02 to 462d063 Compare October 15, 2024 16:46
@kradalby
Copy link
Collaborator Author

@micolous I've pushed the new migration, probably need some tidying but please take a look.

#2170 (comment), but only "opt in" migration (for reasons mentioned in my previous comment).

There will have to be one version with it opt out so people who dont read changelogs and then have a bunch of users created run down the issue tracker. A compromise could be a check at start up which sees if all users have oidc identity and automatically turns it off.

I will look at the Policy resolving after i get back from a coffee, it needs to go in at the same time I believe.

@kradalby kradalby force-pushed the kradalby/oidc-takeover branch 5 times, most recently from 94b82d5 to cc52dff Compare October 21, 2024 22:34
@kradalby
Copy link
Collaborator Author

@micolous Can you take a look now?

I think I have implemented the migration and info as discussed in this PR.

Then I have split "resolve" users from ACL into #2205 and I would appreciate if you can take a look that it covers the concerns.

@kradalby kradalby requested a review from micolous October 22, 2024 04:35
@micolous
Copy link

micolous commented Oct 23, 2024

Thinking about things some more, enabling migration should have no effect on the security of new Headscale installations where every user has an iss and sub recorded. I like the idea of making you choose – but I can also see the support angle for this.

I still need to have a look at the code, but I've attempted to reword the change log (and ended up spending quite a long time on it). What's there ("domain is always part of the username for OIDC") is not always true, and some bits are misleading.

I think writing the change log first will make sure we agree on the goal. 😄

I suspect #2205 will block some of what goes into this changelog – but I'll focus on OIDC for now.


Security fix: OIDC changes in Headscale 0.24.0

Headscale v0.23.0 and earlier identified OIDC users by the "username" part of their email address (when strip_email_domain: true, the default) or whole email address (when strip_email_domain: false).

Depending on how Headscale and your Identity Provider (IdP) were configured, only using the email claim could allow a malicious user with an IdP account to take over another Headscale user's account, even when strip_email_domain: false.

This would also cause a user to lose access to their Headscale account if they changed their email address.

Headscale v0.24.0 now identifies OIDC users by the iss and sub claims. These are guaranteed by the OIDC specification to be stable and unique, even if a user changes email address. A well-designed IdP will typically set sub to an opaque identifier like a UUID or numeric ID, which has no relation to the user's name or email address.

This issue only affects Headscale installations which authenticate with OIDC.

Headscale v0.24.0 and later will also automatically update profile fields with OIDC data on login. This means that users can change those details in your IdP, and have it populate to Headscale automatically the next time they log in. However, this may affect the way you reference users in policies.

Migrating existing installations

Headscale v0.23.0 and earlier never recorded the iss and sub fields, so all legacy (existing) OIDC accounts from need to be migrated to be properly secured.

Headscale v0.24.0 has an automatic migration feature, which is enabled by default (map_legacy_users: true). This will be disabled by default in a future version of Headscale – any unmigrated users will get new accounts.

Headscale v0.24.0 will ignore any email claim if the IdP does not provide an email_verified claim set to true. What "verified" actually means is contextually dependent – Headscale uses it as a signal that the contents of the email claim is reasonably trustworthy.

Headscale v0.23.0 and earlier never checked the email_verified claim. This means even if an IdP explicitly indicated to Headscale that its email claim was untrustworthy, Headscale would have still accepted it.

What does automatic migration do?

When automatic migration is enabled (map_legacy_users: true), Headscale will first match an OIDC account to a Headscale account by iss and sub, and then fall back to matching OIDC users similarly to how Headscale v0.23.0 did:

  • If strip_email_domain: true (the default): the Headscale username matches the "username" part of their email address.
  • If strip_email_domain: false: the Headscale username matches the whole email address.

On migration, Headscale will change the account's username to their preferred_username. This could break any ACLs or policies which are configured to match by username.

Like with Headscale v0.23.0 and earlier, this migration only works for users who haven't changed their email address since their last Headscale login.

A successful automated migration should otherwise be transparent to users.

Once a Headscale account has been migrated, it will be unavailable to be matched by the legacy process. An OIDC login with a matching username but non-matching iss and sub will instead get a new Headscale account.

Because of the way OIDC works, Headscale's automated migration process can only work when a user tries to log in after the update. Mass updates would require Headscale implement a protocol like SCIM, which is extremely complicated and not available in all identity providers.

Administrators could also attempt to migrate users manually by editing the database, using their own mapping rules with known-good data sources.

Legacy account migration should have no effect on new installations where all users have a recorded sub and iss.

What happens when automatic migration is disabled?

When automatic migration is disabled (map_legacy_users: false), Headscale will only try to match an OIDC account to a Headscale account by iss and sub.

If there is no match, it will get a new Headscale account – even if there was a legacy account which could have matched and migrated.

We recommend new Headscale users explicitly disable automatic migration – but it should otherwise have no effect if every account has a recorded iss and sub.

When automatic migration is disabled, the strip_email_domain setting will have no effect.

Other OIDC changes

Headscale now uses the standard OIDC claims to populate and update user information every time they log in:

Headscale profile field OIDC claim Notes / examples
email address email Only used when "email_verified": true
display name name eg: Sam Smith
username preferred_username Varies depending on IdP and configuration, eg: ssmith, [email protected], \\example.com\ssmith
profile picture picture URL to a profile picture or avatar

These should show up nicely in the Tailscale client.

This will also affect the way you reference users in policies.


@kradalby kradalby force-pushed the kradalby/oidc-takeover branch from cc52dff to fbf6f24 Compare October 23, 2024 15:47
@kradalby kradalby force-pushed the kradalby/oidc-takeover branch from fbf6f24 to 950d062 Compare October 23, 2024 19:24
@kradalby
Copy link
Collaborator Author

Thinking about things some more, enabling migration should have no effect on the security of new Headscale installations where every user has an iss and sub recorded. I like the idea of making you choose – but I can also see the support angle for this.

That makes sense.

I still need to have a look at the code, but I've attempted to reword the change log (and ended up spending quite a long time on it). What's there ("domain is always part of the username for OIDC") is not always true, and some bits are misleading.

I think writing the change log first will make sure we agree on the goal. 😄

I've included your well-written changes, thank you, I think it covers it and what I have done, we can review and comment on that as part of this PR.

I suspect #2205 will block some of what goes into this changelog – but I'll focus on OIDC for now.

Lets merge this first, then we adopt it to #2205 as part of the process there, while we are improving that in parallel.

Thank you, and let me know how the code review goes.

@jirutka
Copy link

jirutka commented Nov 4, 2024

OIDC users by iss + sub, eg: idm.example.com/97fb4c49-c3b2-4e13-a8b9-cbe4d6525dc5@oidc

Do you plan to implement support for using multiple OIDC issuers (which practically means multiple OIDC providers) simultaneously? If not, then iss is unnecessary and just makes the user identifier impractically long (e.g. for using in ACLs).

Please let the administrator decide what they want to use as the user identifier. Now I mapped the username (centrally managed and so unique within the organisation) to sub (in Keycloak), so I can use normal user identification in the ACLs (just to find out that it still uses email, but I assume that it’s not the final design). It would be even better to make the claim used as the user identifier configurable (for admins who don’t control their IdP). It’s the admin’s responsibility to choose an appropriate identifier (which depends on the organisation’s policy) and make sure that it’s unique and immutable.

BTW, why have an email address in the first place? And why email verification? Is it just for the migration process from the previous OIDC implementation? Headscale just needs some user identifier. Maybe the user (full) name (in a free form) for a nicer UI. Collecting and storing personal data is regulated by law (at least in the EU), so it might even be problematic for some users.

@oneingan
Copy link

oneingan commented Nov 5, 2024

In my opinion multiple OIDC provider will be a must in the short time, considering how zero-secrets workload identification is evolving, for example with GitHub Actions OIDC provider.

@micolous
Copy link

micolous commented Nov 7, 2024

Please let the administrator decide what they want to use as the user identifier. Now I mapped the username (centrally managed and so unique within the organisation) to sub (in Keycloak), so I can use normal user identification in the ACLs (just to find out that it still uses email, but I assume that it’s not the final design).

@jirutka Your installation now makes it impossible for username and email changes to take effect – exactly the issue we're trying to fix.

Name changes happen, and failure to handle that properly is an ethical problem.

As an anecdote: I know of a very large technology company where username changes were possible, but caused you to be locked out of all systems for 1 - 2 weeks while the change propagated to all systems, often through manual intervention. People would plan to change teams (which would result in ACL reassignments anyway) or take vacation when they wanted to change username.

This was because many of the company's systems primarily worked with usernames, email addresses or UPNs, rather than a sub-like identifier.

It would be even better to make the claim used as the user identifier configurable (for admins who don’t control their IdP). It’s the admin’s responsibility to choose an appropriate identifier (which depends on the organisation’s policy) and make sure that it’s unique and immutable.

With respect: administrators consistently shoot themselves in the foot on this one. Mapping username to sub is a great example of that.

The OIDC specification is very clear about what is and isn't a stable identifier. It also states what a sub should look like:

REQUIRED. Subject Identifier. A locally unique and never reassigned identifier within the Issuer for the End-User, which is intended to be consumed by the Client, e.g., 24400320 or AItOawmwtWwcT0k51BayewNvutrJUqsvl6qs7A4. It MUST NOT exceed 255 ASCII [RFC20] characters in length. The sub value is a case-sensitive string.

If your sub looks like jirutka or micolous, then you're doing it wrong.

Instead, lets focus on the goal: to be able to assign policies based on a familiar identifier.

I already proposed a mechanism by which an administrator can continue to reference a user by a familiar-yet unstable identifier, and have it automatically converted into a policy which uses a stable identifier.

That will require some more work to make it happen, which won't land in this PR.

Headscale's previous use of the email claim mimicked Tailscale's design. Unfortunately, Tailscale's design does not appear to consider this issue.

It will take many steps to fully unravel and address this design problem in a safe and secure way. 😄

@micolous
Copy link

micolous commented Nov 7, 2024

In my opinion multiple OIDC provider will be a must in the short time, considering how zero-secrets workload identification is evolving, for example with GitHub Actions OIDC provider.

While multiple OIDC providers probably won't be implemented in the short term, I agree that it's important to consider this in the present design, so that we don't need another migration process to unlock that functionality. 😄

Copy link

@micolous micolous left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With User.Name no longer being a unique constraint (which is good!), there are a number of other issues this opens up:

  • users.GetUserByUsername() assumes that this is a unique way to address a user:

    It should return an error if more than one user matching a username, rather than returning the first record.

  • There are many internal methods which take a username as a parameter that depend on users.GetUserByUsername(), or call it multiple times:

    • users.DestroyUser
    • users.RenameUser (oldName parameter)
    • users.ListNodesByUser
    • users.AssignNodeToUser
    • preauth_keys.CreatePreAuthKey (you've already marked that one)
    • preauth_keys.ListPreAuthKeys
  • preauth_keys.GetPreAuthKey tests a provided username for equality with PreAuthKey.User.Name

  • The DeleteUser CLI command makes a GetUser RPC call first to confirm deletion, but the DeleteUser RPC addresses the user by username again, rather than ID.

    This can cause the command to delete an unexpected user if things change while waiting for confirmation (already an issue without this change), or if there are multiple users with the same username.

    This should instead use the Headscale user ID returned by the GetUser RPC for the DeleteUser RPC, so that it will only try to delete the user that the confirmation prompt indicated.

    There should also be a way to delete users by Headscale user ID.

  • There are methods which provide User.Name as an output, but that won't uniquely identify the user any more:

    • preauth_key.Proto (exposed over RPCs)
    • routes.EnableAutoApprovedRoutes (log line gives user by User.Name)

These are some ACL/policy related issues (we can look at #2205 for this), because they can evaluate only based on User.Username(). I suspect will be security holes in policy because it is both not unique and potentially under user control, but this is a gnarlier part because we need to work around Tailscale design issues.

But otherwise, this is on the right track. 😅

hscontrol/types/users.go Outdated Show resolved Hide resolved
hscontrol/types/users.go Outdated Show resolved Hide resolved
hscontrol/types/users.go Outdated Show resolved Hide resolved
@kradalby
Copy link
Collaborator Author

With User.Name no longer being a unique constraint (which is good!), there are a number of other issues this opens up:

I believe all of these have been addressed now. They will still resolve from username to an ID, but fail if there is more than one user.

I have filed #2246 as a followup to add ID flags to all CLI commands as this PR is already big enough.

@kradalby kradalby requested a review from micolous November 18, 2024 16:34
@kradalby kradalby force-pushed the kradalby/oidc-takeover branch from 2a4a884 to 0f43285 Compare November 22, 2024 11:23
@kradalby kradalby added this to the v0.24.0 milestone Nov 22, 2024
This commit hardens the migration part of the OIDC from
the old username based approach to the new sub based approach
and makes it possible for the operator to opt out entirely.

Fixes juanfont#1990

Signed-off-by: Kristoffer Dalby <[email protected]>
Signed-off-by: Kristoffer Dalby <[email protected]>
Signed-off-by: Kristoffer Dalby <[email protected]>
Signed-off-by: Kristoffer Dalby <[email protected]>
Signed-off-by: Kristoffer Dalby <[email protected]>
Signed-off-by: Kristoffer Dalby <[email protected]>
Signed-off-by: Kristoffer Dalby <[email protected]>
Signed-off-by: Kristoffer Dalby <[email protected]>
@kradalby kradalby force-pushed the kradalby/oidc-takeover branch from 9b37e31 to 4f57410 Compare November 22, 2024 16:51
This commit fixes the constraint syntax so it is both valid for
sqlite and postgres.

To validate this, I've added a new postgres testing library and a
helper that will spin up local postgres, setup a db and use it in
the constraints tests. This should also help testing db stuff in
the future.

postgres has been added to the nix dev shell and is now required
for running the unit tests.

Signed-off-by: Kristoffer Dalby <[email protected]>
@juanfont juanfont merged commit f6276ab into juanfont:main Nov 23, 2024
126 of 127 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] OIDC with permanent ID
6 participants