Strange memory allocation after RabbitMQ-disconnect #1213
-
Does this look like the contents of any of the messages you are sending via RabbitMQ?
-
So here I am with more logs, and they brought me to a new working hypothesis on what is going on (I also figured out some things I probably could have noticed earlier). In my Wireshark capture I don't see the AMQP protocol itself, basically only TCP, with one exception: the packet that went awry. Wireshark thinks it looks like an HTTP packet, and judging by its content it certainly is one. What I don't know yet is where this stray packet comes from. I'm not sure whether it is Traefik or something during the start of the new rabbit container, though my money would be on Traefik. (Which would also explain why I cannot reproduce it locally.) I still think it is good if RabbitMQ.Client is resilient against such a stray packet, but that is a problem you don't need to follow up on.
Full Wireshark traffic captured by the client: rabbit-tcp-full.zip (the container stop is initiated around packets 141+).
A full log of the first container:
A full log of the second container:
-
Right-click on a packet and choose "Decode As" from the menu. You can set port 5674 to decode as AMQP.
-
Thank you for the full packet capture. I can see a typical AMQP connection negotiation in frames 1 through 10. I compared packet 4 and packet 151 and don't see any meaningful difference, so the fact that Wireshark mis-decodes 151 does not indicate an issue.
You can see in packets 152 through 156 that the connection that was going to start using client port 49635 is shut down cleanly. I also don't see anything in the RabbitMQ logs that corresponds to this connection shutting down early (it would be logged). Then a new connection attempt is made using port 49636, and that is shut down too. Then you see 49637, and so on. Again, nothing in the RabbitMQ logs.
Your load balancer should be logging these connection attempts. If you can increase its verbosity to see what the heck it's doing, that would be great. My guess is that the LB is causing this issue.
I should have a beta release of version 6.4.0 soon for you to try in your environment. Have a good weekend.
-
Have a good weekend as well, and thanks for looking into it :-)
-
No rush, but if you get the chance to take a traffic capture on the client and on the RabbitMQ node at the same time, it should show that RabbitMQ is not getting the reconnection attempts. At least, that's what I think is going on.
-
Given the provided information, I'm not convinced this is a bug in this library unless it can be shown that this client rents a memory pool with an enormous size (in the above code). I say this because it's more likely that some other part of your application is renting an extremely large pool, and it just happens that it's also being used by this client to read a frame. The evidence for this is the content of the rented pool, which appears to be:
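(For context, a minimal sketch of the renting mechanism being discussed, using the standard ArrayPool&lt;byte&gt;.Shared; the sizes come from the numbers reported in this thread, not from the library's code:)

```csharp
using System;
using System.Buffers;

class RentDemo
{
    static void Main()
    {
        // Whoever calls Rent gets back an array at least as large as requested,
        // and that array stays reachable until it is returned (or dropped).
        byte[] small = ArrayPool<byte>.Shared.Rent(16);
        Console.WriteLine(small.Length); // >= 16, a small pooled array
        ArrayPool<byte>.Shared.Return(small);

        // A single Rent call with an enormous minimum length is enough to
        // produce the ~1.3 GB array seen in the dotMemory snapshot, regardless
        // of which component made the call. (This really allocates ~1.3 GB;
        // lower the number if you only want to see the mechanism.)
        byte[] huge = ArrayPool<byte>.Shared.Rent(1_345_270_063);
        Console.WriteLine(huge.Length);  // >= 1_345_270_063
        ArrayPool<byte>.Shared.Return(huge);
    }
}
```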
-
Wow, that is surprising!
-
I'm wondering if that content returned by MassTransit somehow makes the client read a frame header, and whether what it thinks is the "frame-size" part of that response makes it rent this huge array? If you take that response and treat it as an AMQP frame, what does it look like?
-
From what I have figured out so far, it looks like Traefik produces an HTTP Bad Request response during container start/failover. To me it looks like RabbitMQ.Client tries to interpret it as an AMQP packet, resulting in the problem: the payloadSize it reads (which is 0x502F312E in hex) corresponds to the ASCII bytes "P/1." of the "HTTP/1.1" in that response.
The part starting at offset 0036 (so 48 54 54 ...) decoded as UTF-8 is the text of the HTTP response ("HTT...").
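To make the arithmetic visible, here is a small stand-alone sketch (not the library's actual reader code) that parses the first seven bytes of such a response the way an AMQP 0-9-1 frame reader would, and shows how the reported array length of 1345270063 falls out of it:

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

class FrameHeaderDemo
{
    static void Main()
    {
        // First bytes of the response the load balancer sends back during failover.
        byte[] data = Encoding.ASCII.GetBytes("HTTP/1.1 400 Bad Request\r\n");

        // AMQP 0-9-1 frame header: type (1 octet), channel (2 octets),
        // payload size (4 octets, big-endian); the payload is followed by a
        // frame-end octet (0xCE).
        byte frameType   = data[0];                                                  // 'H' = 0x48 = 72 (valid frame types are only 1, 2, 3 and 8)
        ushort channel   = BinaryPrimitives.ReadUInt16BigEndian(data.AsSpan(1, 2));  // "TT"   -> 0x5454     = 21588
        uint payloadSize = BinaryPrimitives.ReadUInt32BigEndian(data.AsSpan(3, 4));  // "P/1." -> 0x502F312E = 1,345,270,062

        Console.WriteLine($"type={frameType} channel={channel} payloadSize={payloadSize}");

        // A reader that trusts payloadSize and rents a buffer of payloadSize + 1
        // bytes (payload plus frame-end octet) asks the array pool for exactly
        // 1,345,270,063 bytes, i.e. the array length reported in this discussion.
    }
}
```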
-
What I'm wondering is why the RabbitMQ client even gets that far. Doesn't it go through the protocol handshake again after disconnecting, or is there some magic going on in Traefik?
-
Yes, we see the AMQP 0-9-1 connection header sent over and over. At some point, though, the client does make it past that. You can see it in this capture: several incomplete reconnect attempts, then the one using port 49637 reads the invalid frame with the huge payload size. I will dig into this further, but I agree with @ChristianSteu that this is due to the load balancer, and the best we can do is be a bit more defensive in the code.
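(For illustration only, a sketch of what such a defensive check could look like; this is not the actual 6.4.0 change, and maxAllowedPayload is a made-up parameter, not an existing RabbitMQ.Client setting:)

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;

static class DefensiveFrameReader
{
    // Sketch only: validate the declared payload size against an upper bound
    // before renting a buffer for it.
    public static byte[] RentPayloadBuffer(ReadOnlySpan<byte> frameHeader, uint maxAllowedPayload)
    {
        // The declared payload size sits in octets 3..6 of the 7-octet frame header.
        uint payloadSize = BinaryPrimitives.ReadUInt32BigEndian(frameHeader.Slice(3, 4));

        if (payloadSize > maxAllowedPayload)
        {
            // Refuse the frame instead of renting a gigabyte-sized buffer;
            // the peer is clearly not speaking sane AMQP at this point.
            throw new InvalidOperationException(
                $"Declared frame payload of {payloadSize} bytes exceeds the allowed maximum of {maxAllowedPayload}.");
        }

        // +1 for the frame-end octet that follows the payload.
        return ArrayPool<byte>.Shared.Rent((int)payloadSize + 1);
    }
}
```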
-
@ChristianSteu - please test out this version in your environment: https://www.nuget.org/packages/RabbitMQ.Client/6.4.0-rc.1
Also, please see my comment here - #1213 (reply in thread)
Thanks!
-
Yeah, that is certainly much better. Unfortunately I currently see no way to configure the setting via MassTransit, but whether the setting is or can be exposed via another library is not your responsibility. It is still much better because the client no longer gets stuck in the while(true) loop, so it can recover, and while the allocation still happens, it is not permanent, so the memory isn't lost.
I will see if I can get a TCP dump of the RabbitMQ side, but I'm not familiar enough with this part of the stack to do it myself without investing too much time. I have asked a coworker whether that is feasible, but I cannot promise I can provide one. It seems fairly safe to say, though, that this is not a problem with RabbitMQ but with our Traefik setup: the same thing happens (for a longer time) if the health-check period is changed, so it seems to occur while the old container is down and the new one isn't healthy yet.
If I get a capture I'll post it, but from my side the issue is solved (at least as far as RabbitMQ.Client is concerned), so thank you for your time :-)
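(For anyone who does control the ConnectionFactory directly rather than going through MassTransit, setting the new limit would look roughly like this. MaxMessageSize is my assumption of how 6.4.0 exposes the setting; check the release notes for the exact property name.)

```csharp
using RabbitMQ.Client;

class Program
{
    static void Main()
    {
        var factory = new ConnectionFactory
        {
            HostName = "my-rabbit-host",      // placeholder host
            // Assumed property: cap the size the client will accept for an
            // incoming message instead of trusting whatever the peer declares.
            MaxMessageSize = 16 * 1024 * 1024 // e.g. 16 MiB
        };

        using var connection = factory.CreateConnection();
        // ... create channels / consumers as usual ...
    }
}
```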
-
Great, I appreciate the follow-up and will release 6.4.0 today.
-
https://github.com/rabbitmq/rabbitmq-dotnet-client/releases/tag/v6.4.0
-
I'm tracking down a problematic behavior of my application using RabbitMQ (via MassTransit).
Normally everything works fine, but we recently had a short outage of our RabbitMQ container and the application failed to reconnect, spontaneously allocated ~1.3 GB of memory, and its CPU usage rose to a constant 50% (while it normally needs only a few percent at most). Unfortunately this doesn't occur every time a disconnect happens, which makes it quite a bit harder to figure out...
Investigating the issue so far makes me believe these are network shenanigans that aren't handled well by RabbitMQ.Client: the memory is one byte array of length 1345270063, and the first 96 bytes decoded to UTF-8 are:
while the remainder of it seems to be all zeroes. I was able to memory-profile a run with dotMemory, and the allocation stack is the reason I'm here (though I had previously started a discussion on MassTransit):
I could not reproduce it on a locally run rabbitmq-docker-node.
Tested/affected versions:
The following is the last output of the rabbitmq-container shutting down before the incident happened:
I couldn't get any useful log output from the client; the only warnings that pop up are the same ones that get logged when everything works fine (that is, RabbitMQ being unavailable for some time, but the application reconnecting afterwards without issues):
Here is the full client project code I used while trying to reproduce the issue (apart from the host/credentials being changed to point at the other rabbit node), which can still trigger it (though unfortunately not guaranteed):
Any help/solution would be greatly appreciated. I'm still trying to further isolate/pinpoint the issue, but I have the feeling I'm going to have to Wireshark the network traffic to get closer to the root cause...