Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backoffice anti-forgery token can't be decrypted after deployments #16107

Open
enkelmedia opened this issue Apr 19, 2024 · 12 comments
Open

Backoffice anti-forgery token can't be decrypted after deployments #16107

enkelmedia opened this issue Apr 19, 2024 · 12 comments
Assignees
Labels
state/needs-investigation This requires input from HQ or community to proceed type/bug

Comments

@enkelmedia
Copy link
Contributor

enkelmedia commented Apr 19, 2024

Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)

v11-v13 (latest)

Bug summary

We've noticed lately that there's some kind of issue with the backoffice anti-forgery token in some scenarios like app pool recycles and app pool switches causing the backoffice to not work and the server to return HTTP Status: 417 The anti-forgery token could not be decrypted.

This happens for us during deployments and has surfaced because customers have worked on a page that was loaded before our deployment. Then, due to the issue, they could not save their work. The only error indication is the X in the save button (no notification) so sometimes editors think that they have saved.

For the record: We do store the key ring on disk (using AddDataProtection()) and the issue does not affect anti-forgery tokens for the front end at all, only the backoffice. This also works for the Auth-cookie so we do not have to log in again in any of the scenarios.

On v12-v13 we consistently get "HTTP Status: 417 The anti-forgery token could not be decrypted" after app pool recycle on the production machines configured with default App Pool settings.

Specifics

The symptoms indicate that there is something special about how the key ring for the backoffice anti-forgery token is handled, almost as if it does not use the storage configured by ´AddDataProtection()` but creates something that seems to be dependent on the App Pool / App Pool Identity.

The anti forgery token on the front end works in all scenarios below, for both app pool recycle and app pool switches. I've also tested a vanilla MVC site with a form+anti forgery token without being able to reproduce the problem, this indicates that the backoffice token is handled in a different way causing the issue.

There are two types of actions that we've managed to find that would trigger this issue:

  • App Pool recycle in certain scenarios, not all the time a recyle would invalidate the backoffice anti forgery token.
  • App Pool switch, that is having two identical App Pools and switching the Websites App Pool between when the token was created and when the HTTP Post request is sent. This forum thread seems to be talking about something similar that happens with Azure Slot Swap.

Also, to be clear this does not happen:

  • When running via Kesterl (dotnet run)
  • When running via IIS Express from Visual Studio

We've tested on some of our production environments to identify if there were some changes in the CMS that might give us insights. This is the results:

Machine Recycle App Pool Switch Umb-version Runtime App-Pool Identity
Windows Server 2019 #1 Works 9.3.1 6.0.4 ApplicationPoolIdentity
Windows Server 2022 #1 Works 10.3.2 6.0.10 ApplicationPoolIdentity
Windows Server 2022 #2 Works Error 11.2.1 7.0.11 ApplicationPoolIdentity
Windows Server 2019 #2 Works 11.2.2 7.0.14 ApplicationPoolIdentity
Windows 2019 #1 Works Error 11.3.1 7.0.5 ApplicationPoolIdentity
Windows 2019 #1 Works 12.0.1 7.0.5 ApplicationPoolIdentity
Windows Server 2022 #2 Error 12.3.5 7.0.11 ApplicationPoolIdentity
Windows Server 2022 #2 Error 12.3.5 7.0.11 ApplicationPoolIdentity
Windows Server 2022 #2 Error 13.11.1 8.0.4 ApplicationPoolIdentity
Windows Server 2019 #2 Error 13.2.2 8.0.0 ApplicationPoolIdentity

Noticing that recycle worked in our tests up until 12.3.5 which might indicate that changes introduced to facilitate the content delivery API impacted this?

The "app pool switch" seems to have been an issue at least back to v 11.2.1.

Here is a screen share demonstrating the issue:

umb13-recycle

  • First triggering the issue in the backoffice
  • Demonstrating that the front end validates the anti forgery token
  • Demonstrating that the front end anti forgery token works after the recyle, indicating that the issue is isolated to the backoffice token.

Steps to reproduce

I've created a zip for the project that I use for testing: https://www.dropbox.com/scl/fi/bd6wcnm3v19v6f6js4b1v/share-Umb-417-issue-MyProject.zip?rlkey=u1e2ezbzgqsmar30tow1vhx5f&dl=0

Here are the basic steps:

1. Create a new Umbraco website

dotnet new install Umbraco.Templates::13.2.2 --force && dotnet new sln --name "MySolution" && dotnet new umbraco --force -n "MyProject" --friendly-name "Administrator" --email "[email protected]" --password "1234567890" --development-database-type SQLite && dotnet sln add "MyProject" && dotnet add "MyProject" package clean && dotnet run --project "MyProject"

2. Build a standard "MVC" form using a surface controller

This is to test the difference between the front-end anti-forgery token and the one used in the backoffice.

3. Publish the project

In the same folder as the .csproj, publish the project

dotnet publish

4. Setup IIS

In IIS:

  • Create two identical App Pools (e.g. test-pool1 and test-pool2).
  • Create a website and point the publish folder (bin/release/net8.0/publish) of the project, use App Pool "test-pool1".

(Notice: In some scenarios, the recycle error is triggered when running the App Pool as ApplicationPoolIdentity and sometimes we've had to change the App Pool identity to a local user to trigger the problem).

5.1. Replicate "App Pool Recycle error"

  • Run the website and login to the backoffice, open a page and press "save and publish", notice that this works and returns a HTTP status of 200.
  • Keep the browser open, recycle the app pool
  • Without refreshing the browser, press "Save and publish" - notice the 417 error.

5.2. Replicate "App Pool Switch error"

  • Run the website and login to the backoffice, open a page and press "save and publish", notice that this works and returns a HTTP status of 200.
  • Keep the browser open, in IIS: Open the website properties and switch App Pool from "test-pool1" to "test-pool2".
  • Without refreshing the browser, press "Save and publish" - notice the 417 error.

6 Front end

Perform step 5.1 and 5.2 with the form on the front-end to notice that the error is not present here.

Expected result / actual result

The backoffice anti-forgery token should be decrypted even after app pool recycle or app pool switch, without having to reload the backoffice page.

Copy link

Hi there @enkelmedia!

Firstly, a big thank you for raising this issue. Every piece of feedback we receive helps us to make Umbraco better.

We really appreciate your patience while we wait for our team to have a look at this but we wanted to let you know that we see this and share with you the plan for what comes next.

  • We'll assess whether this issue relates to something that has already been fixed in a later version of the release that it has been raised for.
  • If it's a bug, is it related to a release that we are actively supporting or is it related to a release that's in the end-of-life or security-only phase?
  • We'll replicate the issue to ensure that the problem is as described.
  • We'll decide whether the behavior is an issue or if the behavior is intended.

We wish we could work with everyone directly and assess your issue immediately but we're in the fortunate position of having lots of contributions to work with and only a few humans who are able to do it. We are making progress though and in the meantime, we will keep you in the loop and let you know when we have any questions.

Thanks, from your friendly Umbraco GitHub bot 🤖 🙂

@enkelmedia
Copy link
Contributor Author

enkelmedia commented Apr 19, 2024

Okey, I think I've figured out the problem.

The Umbraco.Cms.Web.BackOffice.Security.BackOfficeAntiforgery class contains some interesting hacks =D

A custom ServiceCollection is created to get around some issues with creating a custom AntiForgerty service, the fact that this DI container only exists inside this class means that any configuration made during startup will be ignored. This actually resonates with my initial findings that the backoffice anti forgery token does not use the key store configured with AddDataProtection.

I figured that we could probably pass on the global configuration into the class-isolated DI container to be able to respect the data protection configuration.

I've created a simple work around that can be applied to any site to replace the current implementation:

public class PatchedBackofficeAntiforgery : IBackOfficeAntiforgery
{
    private readonly IAntiforgery _internalAntiForgery;
    private readonly CookieBuilder _angularCookieBuilder;

    public PatchedBackofficeAntiforgery(
        IOptionsMonitor<GlobalSettings> globalSettings,
        ILoggerFactory loggerFactory,
        IServiceProvider serviceProvider)
    {
        CookieSecurePolicy cookieSecurePolicy = globalSettings.CurrentValue.UseHttps ? CookieSecurePolicy.Always : CookieSecurePolicy.SameAsRequest;

        // PATCH: Getting the IDataProtectionProvider from the "real" DI container to pass on to below.
        var dpProvider = serviceProvider.GetRequiredService<IDataProtectionProvider>();

        // NOTE: This is the only way to create a separate IAntiForgery service :(
        // Everything in netcore is internal. I have logged an issue here https://github.com/dotnet/aspnetcore/issues/22217
        // but it will not be handled so we have to revert to this.
        _internalAntiForgery = new ServiceCollection()
            .AddSingleton(loggerFactory)
            .AddSingleton<IDataProtectionProvider>((f) => dpProvider)
            .AddAntiforgery(x =>
            {
                x.HeaderName = Constants.Web.AngularHeadername;
                x.Cookie.Name = Constants.Web.CsrfValidationCookieName;
                x.Cookie.SecurePolicy = cookieSecurePolicy;
            })
            .BuildServiceProvider()
            .GetRequiredService<IAntiforgery>();

        // Configure cookie builder using defaults from antiforgery options
        _angularCookieBuilder = new AntiforgeryOptions().Cookie;
        _angularCookieBuilder.HttpOnly = false; // Needs to be accessed from JavaScript
        _angularCookieBuilder.SecurePolicy = cookieSecurePolicy;
    }

    /// <inheritdoc />
    public async Task<Attempt<string?>> ValidateRequestAsync(HttpContext httpContext)
    {
        try
        {
            await _internalAntiForgery.ValidateRequestAsync(httpContext);
            return Attempt<string?>.Succeed();
        }
        catch (Exception ex)
        {
            return Attempt.Fail(ex.Message);
        }
    }

    /// <inheritdoc />
    public void GetAndStoreTokens(HttpContext httpContext)
    {
        AntiforgeryTokenSet set = _internalAntiForgery.GetAndStoreTokens(httpContext);

        if (set.RequestToken == null)
        {
            throw new InvalidOperationException("Could not resolve a request token.");
        }

        // We need to set 2 cookies:
        // The cookie value that angular will use to set a header value on each request - we need to manually set this here
        // The validation cookie value generated by the anti-forgery helper that we validate the header token against - set above in GetAndStoreTokens
        httpContext.Response.Cookies.Append(Constants.Web.AngularCookieName, set.RequestToken, _angularCookieBuilder.Build(httpContext));
    }
}

Then during service registration, after AddUmbraco():

services.AddUnique<IBackOfficeAntiforgery,PatchedBackofficeAntiforgery>();

This implementation will have to add a new ctor (to get the "real" DI container) so I guess it would be considered a breaking change, maybe if someone from HQ want to have a look at my proposed implementation (that solves the problems for us) and let me know if you would like to have a PR =D

@Migaroez
Copy link
Contributor

Hey Markus, thank you so much for the detailed report #h5yr!

This is going to take a bit more work to verify and investigate, but at first glance it indeed seems wrong that we do not supply a way to configure DataProtection on the isolated servicescollection for the backoffice AF.

Since the backoffice AF is clearly separated from the normal AF, it would make sense to supply a mechanism where you could do services.AddBackofficeDataProtection() so you would have full control over the isolated AF like you do with with the global one instead of us just copying over the same settings by depending on the (globally) configured IDataProtectionProvider.

I will have a chat with the team and hopefully get back to you on this next week.

@Migaroez Migaroez self-assigned this Apr 30, 2024
@Migaroez Migaroez added the state/needs-investigation This requires input from HQ or community to proceed label Apr 30, 2024
@enkelmedia
Copy link
Contributor Author

Hi!

Thanks for looking at this!

I've been running my workaround in production for a while and it works fine, no problems since that update.

Let's establish a common "vocabulary" for the different parts here =D

  • Global configuration = The stuff configured in the "regular" IOC during startup.
  • Anti forgery configuration = The local little IOC-container inside the BackofficeAntiforgery class.

In the current state of the backoffice, the backoffice auth cookies are encrypted using the global configuration and only the anti forgery token uses this local IOC that basically ignores any configuration made during startup.

Personally I would just find it confusing to have to configure data protection two times. I'm guessing that the vast majority of sites would use the same data protection configuration for the frontend and the backoffice. However if it would be implemented in a way so that it reuses the "global configuration" but also make it possible to set backoffice-specific settings if needed that sounds like a good solution to me.

There are plenty of examples out there about how to configure data protection so it would probably be good one could just follow them to "make it work" and then if someone needs to fiddle with different settings for the backoffice they could dive into the Umbraco docs to find out how.

Cheers!

@robertjf
Copy link
Contributor

robertjf commented May 27, 2024

We are seeing this issue in Umbraco 13.3.2 and I've applied the above patch like so:

var umbracoBuilder = builder.CreateUmbracoBuilder()
    .AddBackOffice()
    .AddWebsite()
    .AddDeliveryApi()
    .AddComposers();

umbracoBuilder.Services.AddUnique<IBackOfficeAntiforgery, PatchedBackofficeAntiforgery>();

umbracoBuilder.Build();

I can get a breakpoint in the PatchedBackofficeAntiforgery code and it's being hit, but I'm still getting the following in the logs:

[11:15:12 WRN] Error unprotecting the session cookie.
System.Security.Cryptography.CryptographicException: The payload was invalid. For more information go to https://aka.ms/aspnet/dataprotectionwarning
   at Microsoft.AspNetCore.DataProtection.Managed.ManagedAuthenticatedEncryptor.Decrypt(ArraySegment`1 protectedPayload, ArraySegment`1 additionalAuthenticatedData)
   at Microsoft.AspNetCore.DataProtection.KeyManagement.KeyRingBasedDataProtector.UnprotectCore(Byte[] protectedData, Boolean allowOperationsOnRevokedKeys, UnprotectStatus& status)
   at Microsoft.AspNetCore.DataProtection.KeyManagement.KeyRingBasedDataProtector.Unprotect(Byte[] protectedData)
   at Microsoft.AspNetCore.Session.CookieProtection.Unprotect(IDataProtector protector, String protectedText, ILogger logger)

Even if I inject the new service further up the chain (directly after AddBackOffice()) it doesn't resolve the issue.

@enkelmedia
Copy link
Contributor Author

@robertjf I'm not sure that my hack fixes all kinds of scenarios where this is an issue. It worked for us in our very specific scenario.

Please not that you would probably have to delete any existing cookies for this to work, did you try that?

@ctolkien
Copy link
Contributor

Noting that we are seeing this when storing our data protection keys in blob storage, protected via keyvault (we're running in containers).

@seanrockster
Copy link

How does this apply to front-end (member) login? Not to concerned about backoffice logins however all our customers get logged out on slot swap.

@enkelmedia
Copy link
Contributor Author

@seanrockster the exact issue that I have should not impact the members on the front end. From your description, it sounds more like you are getting new keys for the data protection on each new release?

Some more info here: https://learn.microsoft.com/en-us/aspnet/core/security/data-protection/configuration/overview?view=aspnetcore-8.0

How are you storing the keys?

@seanrockster
Copy link

@enkelmedia Hi, we're not doing anything specific. We haven't configured any data protection other than what comes out of the box with umbraco.

@enkelmedia
Copy link
Contributor Author

@seanrockster The data protection stuff is not a ”Umbraco-thing”, its a .NET Core concept.

The keys used to encrypt/decrypt the cookies will be stored on the local machine by default, when you do a slot wrap there is a new machine, hence new keys that can’t decrypt your members cookies.

You need to persist the keys somewhere so that the new machine can use the same keys.

Have a look at the link from my previous answer.

Cheers!

@mzhang-asi
Copy link

mzhang-asi commented Dec 23, 2024

@enkelmedia Thanks so much! PatchedBackofficeAntiforgery approach solved the Backoffice anti-forgery issues on our production balance loader. We upgraded from Umbraco8 to Umbraco 13.5.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
state/needs-investigation This requires input from HQ or community to proceed type/bug
Projects
None yet
Development

No branches or pull requests

6 participants