Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#2119 Review load balancing (2nd round) and redesign DefaultConsulServiceBuilder with ConsulProviderFactory refactoring to make it thread safe and friendly #2151

Merged
merged 20 commits into from
Oct 3, 2024

Conversation

raman-m
Copy link
Member

@raman-m raman-m commented Sep 13, 2024

Fixes #2119

Root Cause

In commit 34cb3eb by PR #2067, new static definitions were introduced in ConsulProviderFactory due to the DefaultConsulServiceBuilder constructor needing a factory argument to obtain a ConsulRegistryConfiguration object for implementing the IConsulServiceBuilder.Configuration property. The static design of ConsulProviderFactory necessitated retrieving the service builder from the service provider to inject into the Consul object. Consequently, a Func<ConsulRegistryConfiguration> service was added to the DI to enable the acquisition of the config object in the DefaultConsulServiceBuilder constructor. However, the delegate was static, which was a significant flaw in attempting to design the feature with static definitions, leading to only the first request receiving the correct config object for the first service name. Subsequent requests with another service name failed to receive the correct config object.

To be honest, my grasp of this bug is not yet complete... Here is clean branch created from develop with tests that replicate the issue: raman-m/2119-steps-to-repro-clean-experiment.
However, the acceptance test is now passing, and I am optimistic that we can proceed with the merge.

Proposed Changes

Predecessors

@raman-m raman-m self-assigned this Sep 13, 2024
@raman-m raman-m added bug Identified as a potential bug Service Discovery Ocelot feature: Service Discovery high High priority Consul Service discovery by Consul Oct'24 October 2024 release Load Balancer Ocelot feature: Load Balancer labels Sep 13, 2024
@raman-m raman-m added this to the v23.3 Hotfixes milestone Sep 13, 2024
@raman-m raman-m requested review from RaynaldM and ggnaegi and removed request for RaynaldM September 24, 2024 13:48
@raman-m
Copy link
Member Author

raman-m commented Sep 25, 2024

@ggnaegi Hi Gui,
I would value your input on a code review.

I've established a clean branch from develop with tests that replicate the issue: raman-m/2119-steps-to-repro-clean-experiment
Please focus on the ShouldReturnDifferentServicesWhenConcurrentlyRequestingToDifferentServices test, which consistently fails, so it replicates the bug.
This branch will be used to dissect the concurrency bug and better articulate the root cause.
If appropriate, we could utilize this branch for additional analysis and compare it to the bugfix changes in the feature branch.
I'll conduct a thorough analysis later; however, you may commence the code review immediately.

@raman-m raman-m changed the title #2119 Review load balancing (round 2) and redesign DefaultConsulServiceBuilder with ConsulProviderFactory refactoring #2119 Review load balancing (2nd round) and redesign DefaultConsulServiceBuilder with ConsulProviderFactory refactoring to make it thread safe and friendly Sep 25, 2024
@raman-m
Copy link
Member Author

raman-m commented Sep 25, 2024

Task

Comment on lines 29 to 30
.RemoveAll(typeof(IFileConfigurationPollerOptions))
.AddSingleton<IFileConfigurationPollerOptions, ConsulFileConfigurationPollerOption>();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggnaegi Fascinating design, isn't it? 😄 We're upgrading from the In-Memory poller to this advanced one. Exciting!

@raman-m
Copy link
Member Author

raman-m commented Sep 29, 2024

@ggnaegi With the addition of commit eb9be3e, I am now fully confident that the proposed hotfix is effective. 😃 Looking forward to your review... 🙏
📓 The primary acceptance test is ShouldReturnDifferentServicesWhenConcurrentlyRequestingToDifferentServices.

@raman-m
Copy link
Member Author

raman-m commented Sep 30, 2024

Development Complete❗

@ggnaegi @RaynaldM Please review and approve officially.

@minnocenti901 Your assistance with testing would be valuable. Could you please create a local build from the feature branch, deploy it, and conduct basic routing tests from #2119 ?

Copy link
Member

@ggnaegi ggnaegi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, good idea to pass the configuration to the http context items. Are we sure it's sound in case of multiplexing too?

src/Ocelot.Provider.Consul/Consul.cs Show resolved Hide resolved
public const string PollConsul = nameof(Provider.Consul.PollConsul);

private static readonly List<PollConsul> ServiceDiscoveryProviders = new();
private static readonly object LockObject = new();
private static readonly List<PollConsul> ServiceDiscoveryProviders = new(); // TODO It must be scoped service in DI-container
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, it's not good at the minute.

Copy link
Member Author

@raman-m raman-m Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it is necessary. However, there is no time to refactor further because it's beyond the scope of the current bug fix. For now, the comments in src are noted for future refactoring. The present solution may not be elegant, but it functions. I have prepared a TODO task for the next phase of refactoring, which has been added to #2141 (comment) (item 5).

Not an issue.

configuration = new ConsulRegistryConfiguration(config.Scheme, config.Host, config.Port, route.ServiceName, config.Token);
var configuration = new ConsulRegistryConfiguration(config.Scheme, config.Host, config.Port, route.ServiceName, config.Token);
var contextAccessor = provider.GetService<IHttpContextAccessor>();
contextAccessor.HttpContext.Items[nameof(ConsulRegistryConfiguration)] = configuration;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea to use the http context items as storage of the consul registry configuration. Just a stupid question. Could we have some side effects with the multiplexer (copies of the http context)?

Copy link
Member Author

@raman-m raman-m Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, you worry about this baby, your beloved multiplexer... 🤣 ❤️
Good question... The IServiceDiscoveryProviderFactory is incorporated within the LoadBalancerFactory, which is invoked by the LoadBalancerHouse and is a component of the LoadBalancingMiddleware. Consequently, all service discovery providers, including our lovely Consul provider, are part of the LoadBalancingMiddleware.
To address your question, we must analyze the differences between LoadBalancingMiddleware and MultiplexingMiddleware, as well as their respective positions in the Ocelot pipeline.

public static RequestDelegate BuildOcelotPipeline(this IApplicationBuilder app,

app.UseMultiplexingMiddleware();

app.UseLoadBalancingMiddleware();

In the method on line 190, the early bird multiplexer invokes the load-balancing middleware, as seen here:
private async Task<HttpContext> ProcessRouteAsync(HttpContext sourceContext, DownstreamRoute route, List<PlaceholderNameAndValue> placeholders = null)
{
var newHttpContext = await CreateThreadContextAsync(sourceContext, route);
CopyItemsToNewContext(newHttpContext, sourceContext, placeholders);
newHttpContext.Items.UpsertDownstreamRoute(route);
await _next.Invoke(newHttpContext);
return newHttpContext;
}

Subsequently, the next delegate is passed a new context object, correct? Thus, the LoadBalancingMiddleware is provided with a new HttpContext object. However, it appears inconsequential whether a new or old context object is processed due to the injection of IHttpContextAccessor?...
The questions could be phrased as: "Do the IHttpContextAccessor instances in both ConsulProviderFactory and DefaultConsulServiceBuilder reference the same HttpContext objects?'
It seems irrelevant whether they are new or old. Ideally, given the load balancing context, the same HttpContext should be maintained.

I've reviewed the DI container setup, and it appears that our IHttpContextAccessor service is indeed a singleton as shown here:

Services.TryAddSingleton<IHttpContextAccessor, HttpContextAccessor>();

It seems we're processing an old context object, not the one from the multiplexer, which is disappointing. However, this should not be an issue since we use the same context object in both ConsulProviderFactory and DefaultConsulServiceBuilder, so there shouldn't be any exceptions or empty configuration objects during the read operation: DefaultConsulServiceBuilder Line#30

And...

Services.TryAddSingleton<IHttpContextAccessor, HttpContextAccessor>();

Ups❗ Why is it added as singleton? 🙄 I thought it is scoped service. 🙈 "Houston, we have a problem!" 🤣

Oh, no....
https://github.com/ThreeMammals/Ocelot/blame/develop/src/Ocelot/DependencyInjection/OcelotBuilder.cs#L125

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tom added this service as singleton in 2017... 🤯

Oh, no!... This problem exists since ages ❗ 🤕

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My Visual Studio disassembler shows this code:

    /// <summary>
    /// Provides an implementation of <see cref="IHttpContextAccessor" /> based on the current execution context.
    /// </summary>
    public class HttpContextAccessor : IHttpContextAccessor
    {
        private static readonly AsyncLocal<HttpContextHolder> _httpContextCurrent = new AsyncLocal<HttpContextHolder>();

        /// <inheritdoc/>
        public HttpContext? HttpContext
        {
            get
            {
                return  _httpContextCurrent.Value?.Context;
            }
            set
            {
                var holder = _httpContextCurrent.Value;
                if (holder != null)
                {
                    // Clear current HttpContext trapped in the AsyncLocals, as its done.
                    holder.Context = null;
                }

                if (value != null)
                {
                    // Use an object indirection to hold the HttpContext in the AsyncLocal,
                    // so it can be cleared in all ExecutionContexts when its cleared.
                    _httpContextCurrent.Value = new HttpContextHolder { Context = value };
                }
            }
        }

        private class HttpContextHolder
        {
            public HttpContext? Context;
        }
    }

So, it is unclear how this default implementation is consumed by ASP.NET pipeline...
Well... the questions is "How is the setter of HttpContext property consumed?"
So we have a static AsyncLocal instance... which is wrapper for HttpContextHolder
Probably AsyncLocal keeps scoped version of HttpContext but I am not sure...

Copy link
Member Author

@raman-m raman-m Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, I have utilized the official MS DI-helper: AddHttpContextAccessor, which also registers a singleton:

        public static IServiceCollection AddHttpContextAccessor(this IServiceCollection services)
        {
            if (services == null)
            {
                throw new ArgumentNullException(nameof(services));
            }

            services.TryAddSingleton<IHttpContextAccessor, HttpContextAccessor>();
            return services;
        }

However, upon reviewing the documentation, it seems logical that this Singleton design should yield a scoped version of the object, correct? Yet, ASP.NET applications operate correctly.
This design remains a bit perplexing to me, but it does function within the ASP.NET pipeline.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggnaegi

Are we sure it's sound in case of multiplexing too?

No, not quite sure.
What should I do here in your opinion? 'cause I am confused after this code research
I can write some tests for scenario with aggregated routes, should I? But it is a bit beyond the scope.
What I can do right now is submitting one commit with code enhancement in .

Copy link
Member Author

@raman-m raman-m Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discovered an intriguing class: HttpDataRepository in the infrastructure, which employs the same concept of sharing data across all scoped services by storing it in HttpContext.Items. The dilemma then arises: should one inject the native ASP.NET service IHttpContextAccessor, or opt for Ocelot's own IRequestScopedDataRepository? 😃

Update 1

Injecting a data repository is a strict case; therefore, injecting HttpContext is preferable as it provides extensive functionality based on the context. Consequently, the builder class can theoretically act as middleware.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggnaegi FYI, my commit for code review is 8acd287. Honestly, I'm quite exhausted. Could you approve it? If not, I might have to pass this task over to you. 😉
Just a heads up, the configuration isn't actually consumed in the default service builder; it's set up for descendants and intended for future use. You'll find further explanations in the commit as TODO notes.

@raman-m
Copy link
Member Author

raman-m commented Oct 2, 2024

TODO

  • Refactor ConsulProviderFactory according to the ToDo-comments in src. Main goal is making ConsulProviderFactory as a true IServiceDiscoveryProviderFactory. Inherit from the IServiceDiscoveryProviderFactory interface, and finally moving to DI-container.
  • Acceptance test(s for aggregated routes to ensure the Multiplexer is functioning correctly

Convert anonymous delegates to named ones in placeholders processing
@raman-m raman-m removed the Oct'24 October 2024 release label Oct 2, 2024
@ggnaegi
Copy link
Member

ggnaegi commented Oct 2, 2024

@raman-m let's accept your changes and see what's coming, ok?

@raman-m
Copy link
Member Author

raman-m commented Oct 2, 2024

@ggnaegi I could extend the description of the follow up ToDo task with adding a requirement to write acceptance tests for aggregated routes to ensure Multiplexer correctly functions.
So, my plan will be to add those tests for 2119 case plus Aggregation during the next phase of refactoring of ConsulProviderFactory. Sounds good?
If we detect some Consul issue in Production, we will release a hotfix. Sounds better?

Will you deploy this version 23.3.4 to your Production env? I guess you should deploy to try all new features of ver. 23.3.0

Copy link
Member

@ggnaegi ggnaegi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will test this version later this month on a staging environment. I'm unfortunately still using Ocelot 23.2.2 on production...

@raman-m raman-m merged commit 09f2b1a into develop Oct 3, 2024
1 check passed
@raman-m raman-m deleted the raman-m/2119 branch October 3, 2024 15:09
raman-m added a commit that referenced this pull request Oct 3, 2024
…Blue Olympic Balumbes release

* #2084 Apply default config file paths in `GetMergedOcelotJson` when providing the `folder` argument of `AddOcelot` (#2120)

* Adding unit test first

* Fixing default global config file not being found in folder

* Adding PR trait to test

* Backing out whitespace changes

* Code review by @raman-m

* Create Configuration feature folder and move test classes

* Adjust namespace and review what we have

* Acceptance tests for #2084 user scenario

---------

Co-authored-by: Raman Maksimchuk <[email protected]>

* Bump Steeltoe.Discovery.Eureka from 3.2.5 to 3.2.8 in /src/Ocelot.Provider.Eureka (#2122)

* Bump Steeltoe.Discovery.Eureka in /src/Ocelot.Provider.Eureka

Bumps [Steeltoe.Discovery.Eureka](https://github.com/SteeltoeOSS/Steeltoe) from 3.2.5 to 3.2.8.
- [Release notes](https://github.com/SteeltoeOSS/Steeltoe/releases)
- [Changelog](https://github.com/SteeltoeOSS/Steeltoe/blob/main/Steeltoe.Release.ruleset)
- [Commits](SteeltoeOSS/Steeltoe@3.2.5...3.2.8)

---
updated-dependencies:
- dependency-name: Steeltoe.Discovery.Eureka
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* Bump Steeltoe.Discovery.ClientCore from 3.2.5 to 3.2.8

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Raman Maksimchuk <[email protected]>

* #2110 Review load balancing and independent fetching the list of services in `Kube` provider (#2111)

* Move the creation of the services list from the class field to the method, to prevent modification list from different threads

* Early return after data checking

* Add unit test for concurrent get list of services

* Add logging for invalid service configuration error in RoundRobin load balancer

* Code review by @raman-m

* Workaround for mistakes made during acceptance testing of load balancing versus service discovery, where tests designed for parallel requests were mistakenly executed sequentially. This resulted in load balancers being loaded by sequential `HttpClient` calls, which was a significant oversight.

* Let's DRY StickySessionsTests

* Add acceptance tests, but...
RoundRobin is not actually RoundRobin 😁 -> 😆

* Independent static indexing iterators per route via service names

* Stabilize `CookieStickySessions` load balancer.
Review tests after refactoring of `RoundRobin` load balancer

* Refactor Lease operation for load balancing.
Review LeastConnection load balancer

* Leasing mechanism in Round Robin load balancer

* Acceptance tests, final version

* Apply Retry pattern for K8s endpoint integration

* Fix IDE warnings and messages

* Follow suggestions and fix issues from code review by @ggnaegi

* Bump KubeClient from 2.4.10 to 2.5.8

* Fix warnings

* Final version of `Retry` pattern

---------

Co-authored-by: Raman Maksimchuk <[email protected]>

* Downgrade the Warning to Information on missing `Content-Length` header in `MultiplexingMiddleware` (#2146)

* fix: downgrade the warning to information on missing content-length header

* chore: add route name to logs

* test: fixing multiplexing middleware tests

* Code review by @raman-m

---------

Co-authored-by: Paul Roy <[email protected]>
Co-authored-by: Raman Maksimchuk <[email protected]>

* Correct the broken link to the GraphQL sample's `README.md` (#2149)

Signed-off-by: Emmanuel Ferdman <[email protected]>
Co-authored-by: Raman Maksimchuk <[email protected]>

* #2116 Escaping unsafe pattern values of `Regex` constructor ​​derived from URL query parameter values containing special `Regex` chars (#2150)

* regex escape handling for url templates

* refactored regex method to lamda version

* Quick code review by @raman-m

* added acceptance test for url regex bug

* moved acceptance test to routing tests

* Convert to theory: define 2 test cases

---------

Co-authored-by: Raman Maksimchuk <[email protected]>

* #2119 Review load balancing (2nd round) and redesign `DefaultConsulServiceBuilder` with `ConsulProviderFactory` refactoring to make it thread safe and friendly (#2151)

* Review tests

* History of Service Discovery testing: add traits

* LoadBalancer traits

* #2119 Steps to Reproduce

* Reuse service handlers of `ConcurrentSteps`

* Reuse service counters of `ConcurrentSteps`

* Add LoadBalancer namespace and move classes

* Move `Lease`

* Move `LeaseEventArgs`

* Analyze load balancers aka `ILoadBalancerAnalyzer` interface objects

* Prefer using named local methods as delegates over anonymous methods for awesome call stack, ensuring the delegate's typed result matches the typed balancer's creator. Additionally, employ an IServiceProvider workaround.

* Review load balancing. Assert service & leasing counters as concurrent step. Final version of acceptance test.

* Fixed naming violation for asynchronous methods: `Lease` -> `LeaseAsync`

* Fix ugly reflection issue of dymanic detection in favor of static type property

* Propagate the `ConsulRegistryConfiguration` object through `HttpContext` in the scoped version of the default service builder, utilizing the injected `IHttpContextAccessor` object.
Update `ConsulProviderFactory`.
Update docs.
Update tests.

* Add tests from clean experiment

* Final review of the tests

* Review `IHttpContextAccessor` logic.
Convert anonymous delegates to named ones in placeholders processing

* Tried to enhance more, but failed

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Emmanuel Ferdman <[email protected]>
Co-authored-by: Ben Bartholomew <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Roman <[email protected]>
Co-authored-by: Paul Roy <[email protected]>
Co-authored-by: Paul Roy <[email protected]>
Co-authored-by: Emmanuel Ferdman <[email protected]>
Co-authored-by: Finn <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Identified as a potential bug Consul Service discovery by Consul high High priority Load Balancer Ocelot feature: Load Balancer Service Discovery Ocelot feature: Service Discovery
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Concurrent requests with different services, using consul, returns the same service and then 404
2 participants