This repository has been archived by the owner on Apr 28, 2020. It is now read-only.

Email addresses are case sensitive #221

Open
jace opened this issue Sep 21, 2017 · 2 comments

Comments

@jace
Member

jace commented Sep 21, 2017

In #215 we enforced a lowercase index for email addresses. This is pragmatic, as it is extremely unlikely that a given email domain will host distinct accounts whose addresses differ only in case. Most providers prohibit this.

However, email addresses are case sensitive per RFC 5321:

The local-part of a mailbox MUST BE treated as case sensitive. Therefore, SMTP implementations MUST take care to preserve the case of mailbox local-parts. In particular, for some hosts, the user "smith" is different from the user "Smith". However, exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged. Mailbox domains follow normal DNS rules and are hence not case sensitive.

A lower-cased email address may not actually be reachable. We should:

  1. Enforce the lowercase unique index, but
  2. Preserve case in email addresses everywhere (UserEmail, UserEmailClaim and other apps like Hasjob and Boxoffice).

Rather than a SQL index on LOWER(email), we should have a normalised_email column on the UserEmail model and place the unique index on that. By moving normalisation into the app, we handle special cases:

  1. Removing periods in @gmail.com addresses, as Gmail disregards them.
  2. Removing + suffixes as those are the same mailbox (optional).
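
A minimal sketch of the proposal above, using the standard library's `sqlite3` rather than the actual UserEmail model. The table layout, `lowercase_key`, `make_db` and `add_email` are hypothetical names for illustration; only the `normalised_email` column name comes from this issue. The address is stored as typed, while the unique constraint sits on the app-computed key:

```python
import sqlite3


def lowercase_key(email: str) -> str:
    """Hypothetical helper: build the value stored in normalised_email.

    Only lowercases here; provider-specific rules would slot in at this point.
    """
    return email.strip().lower()


def make_db() -> sqlite3.Connection:
    """Create an in-memory table standing in for the UserEmail model."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_email ("
        " id INTEGER PRIMARY KEY,"
        " email TEXT NOT NULL,"                    # as typed, case preserved
        " normalised_email TEXT NOT NULL UNIQUE)"  # app-computed unique key
    )
    return conn


def add_email(conn: sqlite3.Connection, email: str) -> bool:
    """Insert an address; return False if its normalised form already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO user_email (email, normalised_email) VALUES (?, ?)",
                (email, lowercase_key(email)),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this layout, adding `Smith@Example.com` succeeds and keeps its casing for delivery, while a later `smith@example.com` is rejected by the unique constraint.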

Pending issues:

  1. Does Gmail's period-ignoring behaviour apply to all G Suite domains? (Update: it doesn't)
  2. Gravatar requires the MD5 hash to be computed from the lowercased email address. This is the primary use of the MD5 sum (once #165, "Switch from md5sum to sha256", is resolved). What's the data source for calculating the MD5? If we take normalised_email, we also lose the periods in gmail.com addresses, which is not what Gravatar expects.
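
To illustrate the Gravatar concern, here is a small sketch using `hashlib`. It assumes Gravatar's documented scheme of trimming whitespace, lowercasing, then taking the MD5 hex digest; `gravatar_hash` is a hypothetical helper name, not existing code in this project:

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """MD5 hex digest of the trimmed, lowercased address (assumed Gravatar scheme)."""
    cleaned = email.strip().lower()
    return hashlib.md5(cleaned.encode("utf-8")).hexdigest()


# Case differences do not change the hash...
same = gravatar_hash("John.Smith@Gmail.com") == gravatar_hash("john.smith@gmail.com")

# ...but stripping periods does, so a period-stripped normalised_email
# would point at a different (probably non-existent) Gravatar profile.
differs = gravatar_hash("john.smith@gmail.com") != gravatar_hash("johnsmith@gmail.com")
```

So the hash needs a source that is lowercased but otherwise untouched.
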
@jace
Member Author

jace commented Oct 12, 2017

Possible solution to the second problem with Gravatar, etc: store two normalised emails:

  1. Lowercase normalised version, which has a unique constraint (in UserEmail only). This is the reference for queries. It could be the existing LOWER(email) index instead of a distinct column.

  2. Application normalised version in which:

    1. + suffixes are removed,
    2. @googlemail.com is replaced with @gmail.com (ref), and
    3. periods are stripped from @gmail.com addresses
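
The three application-level rules above might look like this in Python. This is a sketch only: `application_normalise` and `GMAIL_DOMAINS` are hypothetical names, and lowercasing the local part is an extra assumption made so that the result is usable for discovery:

```python
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def application_normalise(email: str) -> str:
    """Return a discovery key for an address; not suitable for delivery."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"Not an email address: {email!r}")

    # Assumption: fold case, since this key is only used for matching.
    local = local.lower()
    domain = domain.lower()

    # Rule 1: remove the + suffix (keep the original if nothing would remain).
    local = local.split("+", 1)[0] or local

    if domain in GMAIL_DOMAINS:
        # Rule 2: treat @googlemail.com as @gmail.com.
        domain = "gmail.com"
        # Rule 3: strip periods, which Gmail disregards.
        local = local.replace(".", "") or local

    return f"{local}@{domain}"
```

Under these rules `John.Smith+news@GoogleMail.com` maps to `johnsmith@gmail.com`, while periods in addresses on other domains are left alone.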

The application normalised version is used for discovery of a re-used email address, but uniqueness is not enforced, since there are legitimate reasons for users to re-use addresses. It may be used for relevant security checks if, for example, one version is used as a user's email address and another as an organisation's or team's, to warn the user that shared access to the email address may compromise their own account.
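
One way the discovery check could work without enforcing uniqueness is to group addresses by their application normalised form and flag groups with more than one owner. A rough sketch follows; `find_reused_addresses`, the `(owner, email)` record shape and the pluggable `normalise` callable are all assumptions for illustration:

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (owner label, email as stored)


def find_reused_addresses(
    records: Iterable[Record],
    normalise: Callable[[str], str],
) -> Dict[str, List[Record]]:
    """Group records by normalised address; keep groups with several owners."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for owner, email in records:
        groups[normalise(email)].append((owner, email))
    # Nothing is rejected here: the result only drives a warning.
    return {
        key: entries
        for key, entries in groups.items()
        if len({owner for owner, _ in entries}) > 1
    }
```

The caller would use the returned groups to warn the affected user rather than to block the second address.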

@jace
Member Author

jace commented Oct 12, 2017

The logic for application normalised emails could live in the mxsniff library, although mxsniff currently involves a DNS lookup. Its provider list could be modified to include the primary domain instead of the MX target, along with a custom normalisation function.
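
A sketch of what a provider table keyed on primary domain could look like, with no DNS lookup. This is not mxsniff's actual data structure or API; `PROVIDERS`, `normalise_by_domain` and the helper functions are hypothetical:

```python
from typing import Callable, Dict, Tuple


def _strip_plus(local: str) -> str:
    """Drop a + suffix, keeping the original if nothing would remain."""
    return local.split("+", 1)[0] or local


def _gmail_local(local: str) -> str:
    """Gmail rule: drop the + suffix, then strip periods."""
    stripped = _strip_plus(local)
    return stripped.replace(".", "") or stripped


# Primary domain -> (canonical domain, local-part normaliser).
PROVIDERS: Dict[str, Tuple[str, Callable[[str], str]]] = {
    "gmail.com": ("gmail.com", _gmail_local),
    "googlemail.com": ("gmail.com", _gmail_local),
}


def normalise_by_domain(email: str) -> str:
    """Look up the domain in PROVIDERS; fall back to + suffix removal only."""
    local, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"Not an email address: {email!r}")
    canonical, normaliser = PROVIDERS.get(domain, (domain, _strip_plus))
    return f"{normaliser(local)}@{canonical}"
```

Keying on the primary domain means custom domains hosted by a provider are not recognised, which fits the update above that period-ignoring does not apply to G Suite domains.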
