
[BUG] ServiceBusProcessorClient stop consuming messages when maxConcurrentCalls > 1 and the network is down for a short period of time #33493

Closed · liukun-msft opened this issue Feb 14, 2023 · 9 comments · Fixed by #33593
Labels: amqp (Service Bus)
Milestone: Backlog

@liukun-msft (Contributor)

This issue comes from #26465 (comment)

Describe the bug

A user encountered a problem where their ServiceBusProcessorClient stops consuming messages when they set maxConcurrentCalls to 10 and cut the network connection at random for a short period of time (< 5 seconds). The threads stop working, stay in the "waiting" state, and never resume. With maxConcurrentCalls set to 1, however, everything works and message processing never stops.

They suspected that the .block() calls in ServiceBusReceivedMessageContext have no timeout, and that because of this missing timeout the thread is blocked forever, so they added timeout options for complete()/abandon()/defer(). In local testing, they could see the thread recover after the timeout expired and resume processing messages as normal.
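The effect of such a timeout can be sketched with plain JDK primitives (an illustration only, not the SDK's internals; all names here are hypothetical): a settlement acknowledgement that never arrives after a network drop is modeled by a CompletableFuture that is never completed. An unbounded get() parks the worker forever, just like the CountDownLatch.await frame in the thread dump below, while a bounded get(timeout) lets the thread recover.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustration only: why a bounded wait lets a processor worker recover.
public class BoundedWaitSketch {
    public static void main(String[] args) throws Exception {
        // Models the broker acknowledgement that never arrives after the
        // connection is cut mid-settlement.
        CompletableFuture<Void> ackNeverArrives = new CompletableFuture<>();

        // Current behavior: ackNeverArrives.get() would park this thread
        // forever. Proposed behavior: a bounded wait times out and frees it.
        try {
            ackNeverArrives.get(200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("settlement timed out; thread can take the next message");
        }
    }
}
```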

Exception or Stack Trace

The thread dump state for complete() calls when they encounter this issue:

java.base/jdk.internal.misc.Unsafe.park(Native Method)
java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
java.base/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
app//reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:87)
app//reactor.core.publisher.Mono.block(Mono.java:1707)
app//com.azure.messaging.servicebus.ServiceBusReceivedMessageContext.complete(ServiceBusReceivedMessageContext.java:81)

Code Snippet

new ServiceBusClientBuilder()
    .connectionString("connectionString")
    .processor()
    .receiveMode(ServiceBusReceiveMode.PEEK_LOCK)
    .topicName("topicName")
    .subscriptionName("subscriptionName")
    .prefetchCount(0)
    .maxConcurrentCalls(5)
    .maxAutoLockRenewDuration(Duration.ofMinutes(1L))
    .disableAutoComplete()
    .processMessage(serviceBusMessageHandler::receiveMessage)
    .processError(serviceBusMessageHandler::handleError);

public void receiveMessage(final ServiceBusReceivedMessageContext context) {
    try {
        final ServiceBusReceivedMessage message = context.getMessage();
        logDebug(() -> String.format("Message received: Header: %s", message.getApplicationProperties()));
        final MessageStatus messageStatus = azureMessageProcessor.processMessage(message);
    } catch (Exception ex) {
        LOGGER.error("CRITICAL: error in message processing!", ex);
    }
    context.complete();
}

public void handleError(final ServiceBusErrorContext context) {
    if (LOGGER.isErrorEnabled()) {
        LOGGER.error(String.format("Exception '%s' occurred during '%s' action for entity '%s'",
            context.getException().getMessage(), context.getErrorSource(), context.getEntityPath()),
            context.getException());
    }
}
@liukun-msft (Contributor, Author) commented Feb 14, 2023

Two questions came up in the issue comments; my answers are below:

Why does ServiceBusProcessorClient stop consuming messages when maxConcurrentCalls=10 but work fine when maxConcurrentCalls=1?

This could be a known issue we have already addressed. Please check the troubleshooting guide entry client-hangs-or-stalls-with-a-high-prefetch-and-maxconcurrentcall-value for the solution.

I assume you are running the processor client on a VM with one CPU core or fewer. In that case, the default thread pool size is 10. You can increase the pool size by adding the JVM option -Dreactor.schedulers.defaultBoundedElasticSize=20 (any number larger than maxConcurrentCalls) to avoid this issue.
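For context, Reactor caps its default boundedElastic scheduler at 10 × availableProcessors, which is exactly what -Dreactor.schedulers.defaultBoundedElasticSize overrides. A quick sketch of that arithmetic (the multiplier 10 is Reactor's documented default; the helper method is mine):

```java
// Sketch: Reactor's default boundedElastic thread cap is
// 10 * availableProcessors unless overridden with
// -Dreactor.schedulers.defaultBoundedElasticSize=<n>.
public class BoundedElasticCap {
    static int defaultCap(int availableProcessors) {
        return 10 * availableProcessors; // Reactor's documented default multiplier
    }

    public static void main(String[] args) {
        // On a pod limited to 1 CPU the cap is 10 -- equal to
        // maxConcurrentCalls=10, so there is no thread headroom left.
        System.out.println("1-core cap = " + defaultCap(1));
        System.out.println("4-core cap = " + defaultCap(4));
    }
}
```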

Can we add a timeout option for the method calls in ServiceBusReceivedMessageContext?

Instead of adding a timeout option to these calls, I think a better change would be to provide async methods, i.e. Mono<Void> complete().

However, we did not provide these async methods before, and I don't know the original design rationale. One reason may be to keep users from misusing them, for example by forgetting to call block(). Another important reason is that these methods already apply a retry timeout while waiting for the service response, which can be configured via the retry options. If we add an additional timeout for complete()/abandon()/defer(), we need to investigate its side effects.
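A stdlib-only sketch of that design direction (hypothetical names throughout; the real proposal would return reactor.core.publisher.Mono<Void>, with a CompletableFuture standing in here to stay self-contained): the settlement method returns the pending operation instead of blocking internally, so the caller decides whether and how long to block.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of async settlement using only the JDK.
public class AsyncSettlementSketch {
    // Instead of calling block() internally, return the pending operation.
    static CompletableFuture<Void> completeAsync() {
        // Stand-in for the AMQP disposition round-trip to the service.
        return CompletableFuture.completedFuture(null);
    }

    public static void main(String[] args) {
        // The caller chooses: fire-and-forget, compose, or block with a bound.
        completeAsync()
            .thenRun(() -> System.out.println("completed"))
            .join();
    }
}
```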

For now, we need to confirm whether the root cause of your issue is problem #1. Please try increasing the default thread pool size and let me know if the issue still happens.

@TheDevOps commented Feb 14, 2023

Hi, thanks a lot for splitting the issue and commenting back!

I assume you are running the processor client on a VM with less than or equal to 1 CPU core

No, in fact I can reproduce this issue pretty reliably within a few tries locally on my machine with 4 physical cores (8 hyper-threads) and a setting of prefetch=0 and concurrent=2. It really seems that any concurrent > 1 can trigger it; it just gets more frequent in some threads the higher the number is.
Our OpenShift deployments also run with prefetch=0 and concurrent=5 on a deployment with a 1 CPU request and a 2 CPU limit. For "heavy" load subscriptions we usually hit the issue multiple times each day: by the end of the day we regularly see individual pods down to only 0-2 active processing threads out of the intended 5, before we automatically restart them during the night as a workaround. Here is, for example, how this currently looks for one application that usually handles around 5k messages per minute. This is a small Kibana dashboard we created that counts how many unique threads logged the message-arrival log line we always emit, within the current time range (the last hour for this screenshot):

[screenshot: Kibana dashboard of active processing threads per pod]

As can be seen, nine pods show up while we actually have ten replicas, so one is already doing absolutely nothing anymore. Of the remaining nine, seven are still working on all 5 threads, one is down to 3 threads and one to 2. After restarting them we are back to all ten running 5 threads each.
All of this combined makes me believe there is likely some very tight time window in the completion step during which the network disruption has to happen for the thread to hang. That would explain why we mainly observe it for high-load clients that simply complete a lot of messages, while it also happens, rarely, for clients with smaller loads as long as they have more than one concurrent consumer. So far I have sadly not managed to pin down exactly where this window is, and we also don't know yet what is causing such regular network interruptions...
I'll still test forcing -Dreactor.schedulers.defaultBoundedElasticSize=20 both locally and for some deployments, and will report back whether it changes anything once I have data!

Instead of adding a timeout option for these calls, I think the changes would be better to provide async methods

I'm also perfectly fine with this since, for all intents and purposes, it would achieve the same thing for us: it would allow us to call it with some timeout (currently we are considering the max lock renew duration), so the threads would at least recover without the application needing a restart, even if I also feel this would basically just hide the underlying root issue.

@liukun-msft (Contributor, Author)

No in fact I can even reproduce this issue pretty reliably within a few tries locally on my machine with 4 real, 8 hyperthreading CPUs and a setting of prefetch=0 and concurrent=2.

Oh, I see. Let me try to reproduce your issue locally to understand the root cause.

Could you provide some more information to help me reproduce it:

  • Which Service Bus SDK version are you using?
  • What is the average size of each message?
  • What is the average processing time for each message?
  • Are you using AKS for deployment, and is it deployed in the same region as the Service Bus namespace?

For deployment, I think you are running the processor client on 10 pods, all consuming messages from one subscription. Please correct me if I am wrong. What is the memory allocation for each pod?

@liukun-msft liukun-msft added this to the Backlog milestone Feb 14, 2023
@TheDevOps commented Feb 14, 2023

Hi again, sorry for the delay. I was trying to make as small a program as possible to reproduce it, and finally have something reasonable enough to post here. Note that it might take a few disconnect attempts before it happens; it's not 100% reproducible, for me at least.

As for versions, the following pom.xml is used for version management:

pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>at.test</groupId>
  <artifactId>azure-test3</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <name>Azure Test Application</name>
  <description>A test project to show issues with azure hanging threads</description>

  <properties>
      <!-- Basic Stuff -->
      <java.version>11</java.version>
      <compileSource>11</compileSource>
      <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
      <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
  </properties>

  <dependencies>
      <dependency>
          <groupId>com.azure</groupId>
          <artifactId>azure-core</artifactId>
          <version>1.36.0</version>
      </dependency>
      <dependency>
          <groupId>com.azure</groupId>
          <artifactId>azure-messaging-servicebus</artifactId>
          <version>7.13.1</version>
      </dependency>
      <dependency>
          <groupId>ch.qos.logback</groupId>
          <artifactId>logback-classic</artifactId>
          <version>1.2.11</version>
      </dependency>
  </dependencies>
  <build>
      <pluginManagement>
          <plugins>
              <plugin>
                  <groupId>org.apache.maven.plugins</groupId>
                  <artifactId>maven-compiler-plugin</artifactId>
                  <version>3.8.1</version>
                  <configuration>
                      <source>${java.version}</source>
                      <target>${java.version}</target>
                  </configuration>
              </plugin>
          </plugins>
      </pluginManagement>
  </build>
</project>

And the following main java class

AzureTestMain.java
package at.test;

import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.HashSet;
import java.util.Set;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.azure.messaging.servicebus.ServiceBusClientBuilder;
import com.azure.messaging.servicebus.ServiceBusErrorContext;
import com.azure.messaging.servicebus.ServiceBusMessage;
import com.azure.messaging.servicebus.ServiceBusMessageBatch;
import com.azure.messaging.servicebus.ServiceBusProcessorClient;
import com.azure.messaging.servicebus.ServiceBusReceivedMessageContext;
import com.azure.messaging.servicebus.ServiceBusSenderClient;
import com.azure.messaging.servicebus.models.CreateMessageBatchOptions;
import com.azure.messaging.servicebus.models.ServiceBusReceiveMode;

public class AzureTestMain
{
  // replace with valid url, also ensure topic and subscription exist as required
  private static final String CON_URL =
      "Endpoint=sb://xxx.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=xxx";
  private static final String TOPIC = "azure-test";
  private static final String SUBSCRIPTION = "TEST";
  private static final int MESSAGES = 100;

  private static final Logger LOGGER = LoggerFactory.getLogger(AzureTestMain.class);

  private static final Set<Integer> PROCESSED_MESSAGES = java.util.concurrent.ConcurrentHashMap.newKeySet(); // thread-safe: the handler runs concurrently on boundedElastic threads

  public static void main(String[] args) throws InterruptedException
  {
      // create the topic client
      // create a Service Bus Sender client for the topic and send messages, note the topic + subscription have to exist in advance
      ServiceBusSenderClient senderClient = new ServiceBusClientBuilder()
          .connectionString(CON_URL)
          .sender()
          .topicName(TOPIC)
          .buildClient();
      LOGGER.info("Going to send {} messages to the topic '{}'", MESSAGES, TOPIC);
      ServiceBusMessageBatch currentBatch = senderClient.createMessageBatch(
          new CreateMessageBatchOptions().setMaximumSizeInBytes(256 * 1024));
      int count = 0;
      for (int j = 1; j <= MESSAGES; j++)
      {
          // this examples assumes the topic already exists
          ServiceBusMessage message = new ServiceBusMessage("Some Body: " + j);
          message.getApplicationProperties().put("message-id", j);
          if (currentBatch.tryAddMessage(message))
          {
              count++;
              continue;
          }
          // The batch is full, so we create a new batch and send the batch.
          senderClient.sendMessages(currentBatch);
          LOGGER.info("finished sending message batch with size {}", count);
          count = 0;
          currentBatch =
              senderClient.createMessageBatch(new CreateMessageBatchOptions().setMaximumSizeInBytes(256 * 1024));

          // Add that message that we couldn't before.
          if (!currentBatch.tryAddMessage(message))
          {
              LOGGER.error("Message is too large for an empty batch. Skipping. Max size: {}. Message: {}",
                  currentBatch.getMaxSizeInBytes(), message.getBody());
          }
          else
          {
              count++;
          }

      }
      senderClient.sendMessages(currentBatch);
      LOGGER.info("finished sending message batch with size {}", count);
      senderClient.close();
      PROCESSED_MESSAGES.clear();

      // Start message processing with a defined number of concurrent calls
      ServiceBusClientBuilder.ServiceBusProcessorClientBuilder builder = new ServiceBusClientBuilder()
          .connectionString(CON_URL)
          .processor()
          .receiveMode(ServiceBusReceiveMode.PEEK_LOCK)
          .topicName(TOPIC)
          .subscriptionName(SUBSCRIPTION)
          .prefetchCount(0)
          .maxConcurrentCalls(10)
          .maxAutoLockRenewDuration(Duration.of(10L, ChronoUnit.MINUTES))
          .disableAutoComplete()
          .processMessage(AzureTestMain::receiveMessage)
          .processError(AzureTestMain::handleError);
      ServiceBusProcessorClient serviceBusProcessorClient = builder.buildProcessorClient();
      serviceBusProcessorClient.start();

      // Just wait for all messages to be received
      while (PROCESSED_MESSAGES.size() < MESSAGES)
      {
          LOGGER.info("Not all messages received yet only {}, sleeping for 10 seconds", PROCESSED_MESSAGES.size());
          Thread.sleep(10000);
      }
      LOGGER.info("All messages processed, completing");
      serviceBusProcessorClient.close();
      System.exit(0);
  }

  private static void receiveMessage(ServiceBusReceivedMessageContext paramServiceBusReceivedMessageContext)
  {
      int messageId =
          (int) paramServiceBusReceivedMessageContext.getMessage().getApplicationProperties().get("message-id");
      PROCESSED_MESSAGES.add(messageId);
      LOGGER.info("Got message {} in thread {}", messageId, Thread.currentThread().getName());

      paramServiceBusReceivedMessageContext.complete();
  }

  private static void handleError(ServiceBusErrorContext paramServiceBusErrorContext)
  {
      LOGGER.error("Error in message processing in thread {}", Thread.currentThread().getName(),
          paramServiceBusErrorContext.getException());
  }

}

With this running on Java 11 right now (but I also previously had it in a Java 17 app), I can usually reproduce it after a few attempts by simply disconnecting my network during message processing. The number of messages, concurrent consumers, the final sleep time and so on can be adjusted as needed.
Here's an example log from one of my executions that ended up encountering the issue

log.txt
2023-02-14 10:55:00.732+0100 07 [main] I c.a.m.s.i.ServiceBusConnectionProcessor - {"az.sdk.message":"Setting next AMQP channel.","entityPath":"N/A"}
2023-02-14 10:55:00.736+0100 07 [main] I      c.a.m.s.ServiceBusClientBuilder - # of open clients with shared connection: 1
2023-02-14 10:55:00.747+0100 07 [main] I                at.test.AzureTestMain - Going to send 100 messages to the topic 'azure-test'
2023-02-14 10:55:00.754+0100 07 [main] I          c.a.c.a.i.ReactorConnection - {"az.sdk.message":"Creating and starting connection.","connectionId":"MF_e8be93_1676368500676","hostName":"poi-retail-developer.servicebus.windows.net","port":5671}
2023-02-14 10:55:00.779+0100 07 [main] I            c.a.c.a.i.ReactorExecutor - {"az.sdk.message":"Starting reactor.","connectionId":"MF_e8be93_1676368500676"}
2023-02-14 10:55:00.804+0100 07 [reactor-executor-1] I  c.a.c.a.i.handler.ConnectionHandler - {"az.sdk.message":"onConnectionInit","connectionId":"MF_e8be93_1676368500676","hostName":"poi-retail-developer.servicebus.windows.net","namespace":"poi-retail-developer.servicebus.windows.net"}
2023-02-14 10:55:00.805+0100 07 [reactor-executor-1] I     c.a.c.a.i.handler.ReactorHandler - {"az.sdk.message":"reactor.onReactorInit","connectionId":"MF_e8be93_1676368500676"}
2023-02-14 10:55:00.805+0100 07 [reactor-executor-1] I  c.a.c.a.i.handler.ConnectionHandler - {"az.sdk.message":"onConnectionLocalOpen","connectionId":"MF_e8be93_1676368500676","errorCondition":null,"errorDescription":null,"hostName":"poi-retail-developer.servicebus.windows.net"}
2023-02-14 10:55:00.872+0100 07 [reactor-executor-1] I  c.a.c.a.i.handler.ConnectionHandler - {"az.sdk.message":"onConnectionBound","connectionId":"MF_e8be93_1676368500676","hostName":"poi-retail-developer.servicebus.windows.net","peerDetails":"poi-retail-developer.servicebus.windows.net:5671"}
2023-02-14 10:55:00.994+0100 07 [reactor-executor-1] I      c.a.c.a.i.h.StrictTlsContextSpi - SSLv2Hello was an enabled protocol. Filtering out.
2023-02-14 10:55:01.235+0100 07 [reactor-executor-1] I  c.a.c.a.i.handler.ConnectionHandler - {"az.sdk.message":"onConnectionRemoteOpen","connectionId":"MF_e8be93_1676368500676","hostName":"poi-retail-developer.servicebus.windows.net","remoteContainer":"7b6022f3f56c4e8c81ad65f312f63ae8_G5"}
2023-02-14 10:55:01.236+0100 07 [reactor-executor-1] I c.a.m.s.i.ServiceBusConnectionProcessor - {"az.sdk.message":"Channel is now active.","entityPath":"N/A"}
2023-02-14 10:55:01.285+0100 07 [reactor-executor-1] I     c.a.c.a.i.handler.SessionHandler - {"az.sdk.message":"onSessionRemoteOpen","connectionId":"MF_e8be93_1676368500676","sessionName":"azure-test","sessionIncCapacity":0,"sessionOutgoingWindow":2147483647}
2023-02-14 10:55:01.305+0100 07 [reactor-executor-1] I          c.a.c.a.i.ReactorConnection - {"az.sdk.message":"Setting CBS channel.","connectionId":"MF_e8be93_1676368500676"}
2023-02-14 10:55:01.344+0100 07 [reactor-executor-1] I     c.a.c.a.i.handler.SessionHandler - {"az.sdk.message":"onSessionRemoteOpen","connectionId":"MF_e8be93_1676368500676","sessionName":"cbs-session","sessionIncCapacity":0,"sessionOutgoingWindow":2147483647}
2023-02-14 10:55:01.352+0100 07 [reactor-executor-1] I          c.a.c.a.i.ReactorConnection - {"az.sdk.message":"Emitting new response channel.","connectionId":"MF_e8be93_1676368500676","entityPath":"$cbs","linkName":"cbs"}
2023-02-14 10:55:01.352+0100 07 [reactor-executor-1] I       c.a.c.a.i.AmqpChannelProcessor - {"az.sdk.message":"Setting next AMQP channel.","connectionId":"MF_e8be93_1676368500676","entityPath":"$cbs"}
2023-02-14 10:55:01.359+0100 07 [reactor-executor-1] I       c.a.c.a.i.AmqpChannelProcessor - {"az.sdk.message":"Next AMQP channel received.","connectionId":"MF_e8be93_1676368500676","entityPath":"$cbs","subscriberId":"un_f03b9d_1676368501309"}
2023-02-14 10:55:01.395+0100 07 [reactor-executor-1] I    c.a.c.a.i.handler.SendLinkHandler - {"az.sdk.message":"onLinkRemoteOpen","connectionId":"MF_e8be93_1676368500676","linkName":"cbs:sender","entityPath":"$cbs","remoteTarget":"Target{address='$cbs', durable=NONE, expiryPolicy=SESSION_END, timeout=0, dynamic=false, dynamicNodeProperties=null, capabilities=null}"}
2023-02-14 10:55:01.396+0100 07 [reactor-executor-1] I       c.a.c.a.i.AmqpChannelProcessor - {"az.sdk.message":"Channel is now active.","connectionId":"MF_e8be93_1676368500676","entityPath":"$cbs"}
2023-02-14 10:55:01.398+0100 07 [reactor-executor-1] I c.a.c.a.i.handler.ReceiveLinkHandler - {"az.sdk.message":"onLinkRemoteOpen","connectionId":"MF_e8be93_1676368500676","entityPath":"$cbs","linkName":"cbs:receiver","remoteSource":"Source{address='$cbs', durable=NONE, expiryPolicy=SESSION_END, timeout=0, dynamic=false, dynamicNodeProperties=null, distributionMode=null, filter=null, defaultOutcome=null, outcomes=null, capabilities=null}"}
2023-02-14 10:55:01.436+0100 07 [reactor-executor-1] I   c.a.c.a.i.ActiveClientTokenManager - {"az.sdk.message":"Scheduling refresh token task.","scopes":"amqp://poi-retail-developer.servicebus.windows.net/azure-test"}
2023-02-14 10:55:01.443+0100 07 [reactor-executor-1] I             c.a.c.a.i.ReactorSession - {"az.sdk.message":"Creating a new send link.","connectionId":"MF_e8be93_1676368500676","linkName":"azure-testazure-test","sessionName":"azure-test"}
2023-02-14 10:55:01.488+0100 07 [reactor-executor-1] I    c.a.c.a.i.handler.SendLinkHandler - {"az.sdk.message":"onLinkRemoteOpen","connectionId":"MF_e8be93_1676368500676","linkName":"azure-testazure-test","entityPath":"azure-test","remoteTarget":"Target{address='azure-test', durable=NONE, expiryPolicy=SESSION_END, timeout=0, dynamic=false, dynamicNodeProperties=null, capabilities=null}"}
2023-02-14 10:55:01.508+0100 07 [main] I       c.a.c.i.jackson.JacksonVersion - Package versions: jackson-core=2.13.4, jackson-databind=2.13.4-2, jackson-dataformat-xml=2.13.4, jackson-datatype-jsr310=2.13.4, azure-core=1.36.0, Troubleshooting version conflicts: https://aka.ms/azsdk/java/dependency/troubleshoot
2023-02-14 10:55:01.674+0100 07 [main] I  c.a.m.s.ServiceBusSenderAsyncClient - {"az.sdk.message":"Sending batch.","batchSize":100}
2023-02-14 10:55:01.751+0100 07 [main] I                at.test.AzureTestMain - finished sending message batch with size 100
2023-02-14 10:55:01.751+0100 07 [main] I      c.a.m.s.ServiceBusClientBuilder - {"az.sdk.message":"Closing a dependent client.","numberOfOpenClients":0}
2023-02-14 10:55:01.751+0100 07 [main] I      c.a.m.s.ServiceBusClientBuilder - No more open clients, closing shared connection.
2023-02-14 10:55:01.751+0100 07 [main] I c.a.m.s.i.ServiceBusConnectionProcessor - {"az.sdk.message":"Upstream connection publisher was completed. Terminating processor.","entityPath":"N/A"}
2023-02-14 10:55:01.751+0100 07 [main] I          c.a.c.a.i.ReactorConnection - {"az.sdk.message":"Disposing of ReactorConnection.","connectionId":"MF_e8be93_1676368500676","isTransient":false,"isInitiatedByClient":true,"shutdownMessage":"Disposed by client."}
2023-02-14 10:55:01.751+0100 07 [main] I c.a.m.s.i.ServiceBusConnectionProcessor - {"az.sdk.message":"Channel is disposed.","entityPath":"N/A"}
2023-02-14 10:55:01.763+0100 07 [main] I c.a.m.s.i.ServiceBusConnectionProcessor - {"az.sdk.message":"Setting next AMQP channel.","entityPath":"N/A"}
2023-02-14 10:55:01.764+0100 07 [main] I      c.a.m.s.ServiceBusClientBuilder - # of open clients with shared connection: 1
2023-02-14 10:55:01.772+0100 07 [main] I c.a.m.s.ServiceBusReceiverAsyncClient - {"az.sdk.message":"Creating consumer.","linkName":"azure-test/subscriptions/TEST_afcf71_1676368501771","entityPath":"azure-test/subscriptions/TEST"}
2023-02-14 10:55:01.774+0100 07 [main] I c.a.m.s.i.ServiceBusReceiveLinkProcessor - Requesting a new AmqpReceiveLink from upstream.
2023-02-14 10:55:01.775+0100 07 [main] I          c.a.c.a.i.ReactorConnection - {"az.sdk.message":"Creating and starting connection.","connectionId":"MF_f52aa4_1676368501763","hostName":"poi-retail-developer.servicebus.windows.net","port":5671}
2023-02-14 10:55:01.776+0100 07 [main] I            c.a.c.a.i.ReactorExecutor - {"az.sdk.message":"Starting reactor.","connectionId":"MF_f52aa4_1676368501763"}
2023-02-14 10:55:01.778+0100 07 [reactor-executor-2] I  c.a.c.a.i.handler.ConnectionHandler - {"az.sdk.message":"onConnectionInit","connectionId":"MF_f52aa4_1676368501763","hostName":"poi-retail-developer.servicebus.windows.net","namespace":"poi-retail-developer.servicebus.windows.net"}
2023-02-14 10:55:01.778+0100 07 [reactor-executor-2] I     c.a.c.a.i.handler.ReactorHandler - {"az.sdk.message":"reactor.onReactorInit","connectionId":"MF_f52aa4_1676368501763"}
2023-02-14 10:55:01.778+0100 07 [reactor-executor-2] I  c.a.c.a.i.handler.ConnectionHandler - {"az.sdk.message":"onConnectionLocalOpen","connectionId":"MF_f52aa4_1676368501763","errorCondition":null,"errorDescription":null,"hostName":"poi-retail-developer.servicebus.windows.net"}
2023-02-14 10:55:01.778+0100 07 [reactor-executor-2] I  c.a.c.a.i.handler.ConnectionHandler - {"az.sdk.message":"onConnectionBound","connectionId":"MF_f52aa4_1676368501763","hostName":"poi-retail-developer.servicebus.windows.net","peerDetails":"poi-retail-developer.servicebus.windows.net:5671"}
2023-02-14 10:55:01.779+0100 07 [reactor-executor-2] I      c.a.c.a.i.h.StrictTlsContextSpi - SSLv2Hello was an enabled protocol. Filtering out.
2023-02-14 10:55:01.786+0100 07 [reactor-executor-1] I     c.a.c.a.i.handler.SessionHandler - {"az.sdk.message":"onSessionRemoteClose","connectionId":"MF_e8be93_1676368500676","errorCondition":null,"errorDescription":null,"sessionName":"azure-test"}
2023-02-14 10:55:01.786+0100 07 [reactor-executor-1] I     c.a.c.a.i.handler.SessionHandler - {"az.sdk.message":"onSessionRemoteClose","connectionId":"MF_e8be93_1676368500676","errorCondition":null,"errorDescription":null,"sessionName":"cbs-session"}
2023-02-14 10:55:01.796+0100 07 [main] I                at.test.AzureTestMain - Not all messages received yet only 0, sleeping for 10 seconds
2023-02-14 10:55:01.955+0100 07 [reactor-executor-2] I  c.a.c.a.i.handler.ConnectionHandler - {"az.sdk.message":"onConnectionRemoteOpen","connectionId":"MF_f52aa4_1676368501763","hostName":"poi-retail-developer.servicebus.windows.net","remoteContainer":"318d48cc9fab42fea1c8a9d357c69387_G50"}
2023-02-14 10:55:01.956+0100 07 [reactor-executor-2] I c.a.m.s.i.ServiceBusConnectionProcessor - {"az.sdk.message":"Channel is now active.","entityPath":"N/A"}
2023-02-14 10:55:01.988+0100 07 [reactor-executor-2] I     c.a.c.a.i.handler.SessionHandler - {"az.sdk.message":"onSessionRemoteOpen","connectionId":"MF_f52aa4_1676368501763","sessionName":"azure-test/subscriptions/TEST","sessionIncCapacity":0,"sessionOutgoingWindow":2147483647}
2023-02-14 10:55:01.991+0100 07 [reactor-executor-2] I          c.a.c.a.i.ReactorConnection - {"az.sdk.message":"Setting CBS channel.","connectionId":"MF_f52aa4_1676368501763"}
2023-02-14 10:55:02.025+0100 07 [reactor-executor-2] I     c.a.c.a.i.handler.SessionHandler - {"az.sdk.message":"onSessionRemoteOpen","connectionId":"MF_f52aa4_1676368501763","sessionName":"cbs-session","sessionIncCapacity":0,"sessionOutgoingWindow":2147483647}
2023-02-14 10:55:02.025+0100 07 [reactor-executor-2] I          c.a.c.a.i.ReactorConnection - {"az.sdk.message":"Emitting new response channel.","connectionId":"MF_f52aa4_1676368501763","entityPath":"$cbs","linkName":"cbs"}
2023-02-14 10:55:02.025+0100 07 [reactor-executor-2] I       c.a.c.a.i.AmqpChannelProcessor - {"az.sdk.message":"Setting next AMQP channel.","connectionId":"MF_f52aa4_1676368501763","entityPath":"$cbs"}
2023-02-14 10:55:02.026+0100 07 [reactor-executor-2] I       c.a.c.a.i.AmqpChannelProcessor - {"az.sdk.message":"Next AMQP channel received.","connectionId":"MF_f52aa4_1676368501763","entityPath":"$cbs","subscriberId":"un_64458e_1676368501992"}
2023-02-14 10:55:02.059+0100 07 [reactor-executor-2] I    c.a.c.a.i.handler.SendLinkHandler - {"az.sdk.message":"onLinkRemoteOpen","connectionId":"MF_f52aa4_1676368501763","linkName":"cbs:sender","entityPath":"$cbs","remoteTarget":"Target{address='$cbs', durable=NONE, expiryPolicy=SESSION_END, timeout=0, dynamic=false, dynamicNodeProperties=null, capabilities=null}"}
2023-02-14 10:55:02.059+0100 07 [reactor-executor-2] I       c.a.c.a.i.AmqpChannelProcessor - {"az.sdk.message":"Channel is now active.","connectionId":"MF_f52aa4_1676368501763","entityPath":"$cbs"}
2023-02-14 10:55:02.059+0100 07 [reactor-executor-2] I c.a.c.a.i.handler.ReceiveLinkHandler - {"az.sdk.message":"onLinkRemoteOpen","connectionId":"MF_f52aa4_1676368501763","entityPath":"$cbs","linkName":"cbs:receiver","remoteSource":"Source{address='$cbs', durable=NONE, expiryPolicy=SESSION_END, timeout=0, dynamic=false, dynamicNodeProperties=null, distributionMode=null, filter=null, defaultOutcome=null, outcomes=null, capabilities=null}"}
2023-02-14 10:55:02.092+0100 07 [reactor-executor-2] I   c.a.c.a.i.ActiveClientTokenManager - {"az.sdk.message":"Scheduling refresh token task.","scopes":"amqp://poi-retail-developer.servicebus.windows.net/azure-test/subscriptions/TEST"}
2023-02-14 10:55:02.094+0100 07 [reactor-executor-2] I             c.a.c.a.i.ReactorSession - {"az.sdk.message":"Creating a new receiver link.","connectionId":"MF_f52aa4_1676368501763","sessionName":"azure-test/subscriptions/TEST","linkName":"azure-test/subscriptions/TEST_afcf71_1676368501771"}
2023-02-14 10:55:02.098+0100 07 [reactor-executor-2] I c.a.m.s.i.ServiceBusReceiveLinkProcessor - {"az.sdk.message":"Setting next AMQP receive link.","linkName":"azure-test/subscriptions/TEST_afcf71_1676368501771","entityPath":"azure-test/subscriptions/TEST"}
2023-02-14 10:55:02.104+0100 07 [reactor-executor-2] I             c.a.c.a.i.ReactorSession - {"az.sdk.message":"Returning existing receive link.","connectionId":"MF_f52aa4_1676368501763","linkName":"azure-test/subscriptions/TEST_afcf71_1676368501771","entityPath":"azure-test/subscriptions/TEST"}
2023-02-14 10:55:02.142+0100 07 [reactor-executor-2] I c.a.c.a.i.handler.ReceiveLinkHandler - {"az.sdk.message":"onLinkRemoteOpen","connectionId":"MF_f52aa4_1676368501763","entityPath":"azure-test/subscriptions/TEST","linkName":"azure-test/subscriptions/TEST_afcf71_1676368501771","remoteSource":"Source{address='azure-test/subscriptions/TEST', durable=NONE, expiryPolicy=SESSION_END, timeout=0, dynamic=false, dynamicNodeProperties=null, distributionMode=null, filter=null, defaultOutcome=null, outcomes=null, capabilities=null}"}
2023-02-14 10:55:02.158+0100 07 [boundedElastic-3] I                at.test.AzureTestMain - Got message 1 in thread boundedElastic-3
2023-02-14 10:55:02.192+0100 07 [boundedElastic-4] I                at.test.AzureTestMain - Got message 2 in thread boundedElastic-4
2023-02-14 10:55:02.323+0100 07 [boundedElastic-5] I                at.test.AzureTestMain - Got message 3 in thread boundedElastic-5
2023-02-14 10:55:02.358+0100 07 [boundedElastic-6] I                at.test.AzureTestMain - Got message 4 in thread boundedElastic-6
2023-02-14 10:55:02.392+0100 07 [boundedElastic-7] I                at.test.AzureTestMain - Got message 5 in thread boundedElastic-7
2023-02-14 10:55:02.461+0100 07 [boundedElastic-8] I                at.test.AzureTestMain - Got message 6 in thread boundedElastic-8
2023-02-14 10:55:02.495+0100 07 [boundedElastic-9] I                at.test.AzureTestMain - Got message 7 in thread boundedElastic-9
2023-02-14 10:55:02.598+0100 07 [boundedElastic-10] I                at.test.AzureTestMain - Got message 8 in thread boundedElastic-10
2023-02-14 10:55:02.632+0100 07 [boundedElastic-11] I                at.test.AzureTestMain - Got message 9 in thread boundedElastic-11
2023-02-14 10:55:02.748+0100 07 [boundedElastic-12] I                at.test.AzureTestMain - Got message 10 in thread boundedElastic-12
2023-02-14 10:55:02.781+0100 07 [boundedElastic-3] I                at.test.AzureTestMain - Got message 11 in thread boundedElastic-3
2023-02-14 10:55:02.859+0100 07 [boundedElastic-4] I                at.test.AzureTestMain - Got message 12 in thread boundedElastic-4
2023-02-14 10:55:02.892+0100 07 [boundedElastic-5] I                at.test.AzureTestMain - Got message 13 in thread boundedElastic-5
2023-02-14 10:55:02.926+0100 07 [boundedElastic-6] I                at.test.AzureTestMain - Got message 14 in thread boundedElastic-6
2023-02-14 10:55:02.973+0100 07 [boundedElastic-7] I                at.test.AzureTestMain - Got message 15 in thread boundedElastic-7
2023-02-14 10:55:03.006+0100 07 [boundedElastic-8] I                at.test.AzureTestMain - Got message 16 in thread boundedElastic-8
2023-02-14 10:55:03.119+0100 07 [boundedElastic-9] I                at.test.AzureTestMain - Got message 17 in thread boundedElastic-9
2023-02-14 10:55:11.796+0100 07 [main] I                at.test.AzureTestMain - Not all messages received yet only 17, sleeping for 10 seconds
2023-02-14 10:55:21.802+0100 07 [main] I                at.test.AzureTestMain - Not all messages received yet only 17, sleeping for 10 seconds
2023-02-14 10:55:22.344+0100 07 [reactor-executor-2] W  c.a.c.a.i.handler.ConnectionHandler - {"az.sdk.message":"onTransportError","connectionId":"MF_f52aa4_1676368501763","errorCondition":"proton:io","errorDescription":"Eine vorhandene Verbindung wurde vom Remotehost geschlossen","hostName":"poi-retail-developer.servicebus.windows.net"}
2023-02-14 10:55:22.346+0100 07 [reactor-executor-2] I          c.a.c.a.i.ReactorConnection - {"az.sdk.message":"Disposing of ReactorConnection.","connectionId":"MF_f52aa4_1676368501763","isTransient":false,"isInitiatedByClient":false,"shutdownMessage":"Eine vorhandene Verbindung wurde vom Remotehost geschlossen, errorContext[NAMESPACE: poi-retail-developer.servicebus.windows.net. ERROR CONTEXT: N/A]"}
2023-02-14 10:55:22.348+0100 07 [reactor-executor-2] I  c.a.m.s.i.ServiceBusReactorReceiver - {"az.sdk.message":"Waiting for pending updates to complete. Locks: f8b324f1-c396-4d70-ae6f-f04a3918edcf, a389785c-9787-4d04-83fd-f9b849e92293","entityPath":"azure-test/subscriptions/TEST","linkName":"azure-test/subscriptions/TEST_afcf71_1676368501771"}
2023-02-14 10:55:22.349+0100 07 [reactor-executor-2] I  c.a.c.a.i.handler.ConnectionHandler - {"az.sdk.message":"onConnectionUnbound","connectionId":"MF_f52aa4_1676368501763","hostName":"poi-retail-developer.servicebus.windows.net","state":"ACTIVE","remoteState":"ACTIVE"}
2023-02-14 10:55:22.350+0100 07 [reactor-executor-2] I c.a.c.a.i.handler.ReceiveLinkHandler - {"az.sdk.message":"onLinkFinal","connectionId":"MF_f52aa4_1676368501763","linkName":"azure-test/subscriptions/TEST_afcf71_1676368501771","entityPath":"azure-test/subscriptions/TEST"}
2023-02-14 10:55:22.351+0100 07 [reactor-executor-2] I c.a.m.s.i.ServiceBusReceiveLinkProcessor - Receive link endpoint states are closed. Requesting another.
2023-02-14 10:55:22.351+0100 07 [reactor-executor-2] I c.a.m.s.i.ServiceBusReceiveLinkProcessor - Requesting a new AmqpReceiveLink from upstream.
2023-02-14 10:55:22.351+0100 07 [reactor-executor-2] I c.a.m.s.i.ServiceBusReceiveLinkProcessor - {"az.sdk.message":"Setting next AMQP receive link.","linkName":"azure-test/subscriptions/TEST_afcf71_1676368501771","entityPath":"azure-test/subscriptions/TEST"}
2023-02-14 10:55:22.352+0100 07 [boundedElastic-19] I c.a.m.s.i.ServiceBusReceiveLinkProcessor - Receive link endpoint states are closed. Requesting another.
2023-02-14 10:55:22.353+0100 07 [boundedElastic-19] I c.a.m.s.i.ServiceBusReceiveLinkProcessor - Requesting a new AmqpReceiveLink from upstream.
2023-02-14 10:55:22.355+0100 07 [reactor-executor-2] E             c.a.c.a.i.ReactorSession - {"az.sdk.message":"","exception":"Cannot create receive link from a closed session., errorContext[NAMESPACE: poi-retail-developer.servicebus.windows.net. ERROR CONTEXT: N/A, PATH: azure-test/subscriptions/TEST]","connectionId":"MF_f52aa4_1676368501763","sessionName":"azure-test/subscriptions/TEST","entityPath":"azure-test/subscriptions/TEST","linkName":"azure-test/subscriptions/TEST_afcf71_1676368501771"}
2023-02-14 10:55:22.357+0100 07 [reactor-executor-2] I     c.a.c.a.i.handler.SessionHandler - {"az.sdk.message":"onSessionFinal.","connectionId":"MF_f52aa4_1676368501763","errorCondition":null,"errorDescription":null,"sessionName":"azure-test/subscriptions/TEST"}
2023-02-14 10:55:22.357+0100 07 [reactor-executor-2] I    c.a.c.a.i.handler.SendLinkHandler - {"az.sdk.message":"onLinkFinal","connectionId":"MF_f52aa4_1676368501763","linkName":"cbs:sender","entityPath":"$cbs"}
2023-02-14 10:55:22.357+0100 07 [reactor-executor-2] I c.a.c.a.i.handler.ReceiveLinkHandler - {"az.sdk.message":"onLinkFinal","connectionId":"MF_f52aa4_1676368501763","linkName":"cbs:receiver","entityPath":"$cbs"}
2023-02-14 10:55:22.358+0100 07 [reactor-executor-2] I       c.a.c.a.i.AmqpChannelProcessor - {"az.sdk.message":"Channel is closed. Requesting upstream.","connectionId":"MF_f52aa4_1676368501763","entityPath":"$cbs"}
2023-02-14 10:55:22.358+0100 07 [reactor-executor-2] I       c.a.c.a.i.AmqpChannelProcessor - {"az.sdk.message":"Connection not requested, yet. Requesting one.","connectionId":"MF_f52aa4_1676368501763","entityPath":"$cbs"}
2023-02-14 10:55:22.359+0100 07 [reactor-executor-2] I          c.a.c.a.i.ReactorConnection - {"az.sdk.message":"Emitting new response channel.","connectionId":"MF_f52aa4_1676368501763","entityPath":"$cbs","linkName":"cbs"}
2023-02-14 10:55:22.359+0100 07 [reactor-executor-2] I       c.a.c.a.i.AmqpChannelProcessor - {"az.sdk.message":"Setting next AMQP channel.","connectionId":"MF_f52aa4_1676368501763","entityPath":"$cbs"}
2023-02-14 10:55:22.359+0100 07 [reactor-executor-2] I       c.a.c.a.i.AmqpChannelProcessor - {"az.sdk.message":"Upstream connection publisher was completed. Terminating processor.","connectionId":"MF_f52aa4_1676368501763","entityPath":"$cbs"}
2023-02-14 10:55:22.360+0100 07 [reactor-executor-2] I     c.a.c.a.i.handler.SessionHandler - {"az.sdk.message":"onSessionFinal.","connectionId":"MF_f52aa4_1676368501763","errorCondition":null,"errorDescription":null,"sessionName":"cbs-session"}
2023-02-14 10:55:22.360+0100 07 [reactor-executor-2] I  c.a.c.a.i.handler.ConnectionHandler - {"az.sdk.message":"onConnectionFinal","connectionId":"MF_f52aa4_1676368501763","errorCondition":null,"errorDescription":null,"hostName":"poi-retail-developer.servicebus.windows.net"}
2023-02-14 10:55:22.361+0100 07 [reactor-executor-2] I          c.a.c.a.i.ReactorConnection - {"az.sdk.message":"Closing executor.","connectionId":"MF_f52aa4_1676368501763"}
2023-02-14 10:55:22.362+0100 07 [reactor-executor-2] I    c.a.c.a.i.handler.SendLinkHandler - {"az.sdk.message":"Sender link was never active. Closing endpoint states.","connectionId":"MF_f52aa4_1676368501763","linkName":"cbs","entityPath":"$cbs"}
2023-02-14 10:55:22.362+0100 07 [reactor-executor-2] I c.a.c.a.i.handler.ReceiveLinkHandler - {"az.sdk.message":"Receiver link was never active. Closing endpoint states","connectionId":"MF_f52aa4_1676368501763","entityPath":"$cbs","linkName":"cbs"}
2023-02-14 10:55:22.363+0100 07 [reactor-executor-2] I       c.a.c.a.i.AmqpChannelProcessor - {"az.sdk.message":"Channel is disposed.","connectionId":"MF_f52aa4_1676368501763","entityPath":"$cbs"}
2023-02-14 10:55:22.363+0100 07 [reactor-executor-2] I  c.a.c.a.i.handler.ConnectionHandler - {"az.sdk.message":"onConnectionLocalClose","connectionId":"MF_f52aa4_1676368501763","errorCondition":null,"errorDescription":null,"hostName":"poi-retail-developer.servicebus.windows.net"}
2023-02-14 10:55:23.084+0100 07 [parallel-8] I         c.a.m.s.LockRenewalOperation - {"az.sdk.message":"Starting lock renewal.","isSession":false,"lockToken":"a389785c-9787-4d04-83fd-f9b849e92293"}
2023-02-14 10:55:23.085+0100 07 [parallel-8] E c.a.m.s.i.ServiceBusReactorAmqpConnection - connectionId[MF_f52aa4_1676368501763]: Connection is disposed. Cannot get management instance for 'azure-test/subscriptions/TEST'
2023-02-14 10:55:23.086+0100 07 [parallel-8] E         c.a.m.s.LockRenewalOperation - {"az.sdk.message":"Error occurred while renewing lock token.","exception":"connectionId[MF_f52aa4_1676368501763]: Connection is disposed. Cannot get management instance for 'azure-test/subscriptions/TEST'","isSession":false,"lockToken":"a389785c-9787-4d04-83fd-f9b849e92293"}
2023-02-14 10:55:23.207+0100 07 [parallel-10] I         c.a.m.s.LockRenewalOperation - {"az.sdk.message":"Starting lock renewal.","isSession":false,"lockToken":"f8b324f1-c396-4d70-ae6f-f04a3918edcf"}
2023-02-14 10:55:23.207+0100 07 [parallel-10] E c.a.m.s.i.ServiceBusReactorAmqpConnection - connectionId[MF_f52aa4_1676368501763]: Connection is disposed. Cannot get management instance for 'azure-test/subscriptions/TEST'
2023-02-14 10:55:23.207+0100 07 [parallel-10] E         c.a.m.s.LockRenewalOperation - {"az.sdk.message":"Error occurred while renewing lock token.","exception":"connectionId[MF_f52aa4_1676368501763]: Connection is disposed. Cannot get management instance for 'azure-test/subscriptions/TEST'","isSession":false,"lockToken":"f8b324f1-c396-4d70-ae6f-f04a3918edcf"}
2023-02-14 10:55:26.372+0100 07 [reactor-executor-2] I            c.a.c.a.i.ReactorExecutor - {"az.sdk.message":"Processing all pending tasks and closing old reactor.","connectionId":"MF_f52aa4_1676368501763"}
2023-02-14 10:55:26.373+0100 07 [reactor-executor-2] I          c.a.c.a.i.ReactorDispatcher - {"az.sdk.message":"Reactor selectable is being disposed.","connectionId":"MF_f52aa4_1676368501763"}
2023-02-14 10:55:26.373+0100 07 [reactor-executor-2] I          c.a.c.a.i.ReactorConnection - {"az.sdk.message":"onConnectionShutdown. Shutting down.","connectionId":"MF_f52aa4_1676368501763","isTransient":false,"isInitiatedByClient":false,"shutdownMessage":"connectionId[MF_f52aa4_1676368501763] Reactor selectable is disposed.","namespace":"poi-retail-developer.servicebus.windows.net"}
2023-02-14 10:55:26.377+0100 07 [reactor-executor-2] E     reactor.core.publisher.Operators - Operator called default onErrorDropped
reactor.core.Exceptions$ErrorCallbackNotImplemented: com.azure.core.amqp.exception.AmqpException: Eine vorhandene Verbindung wurde vom Remotehost geschlossen, errorContext[NAMESPACE: poi-retail-developer.servicebus.windows.net. ERROR CONTEXT: N/A]
Caused by: com.azure.core.amqp.exception.AmqpException: Eine vorhandene Verbindung wurde vom Remotehost geschlossen, errorContext[NAMESPACE: poi-retail-developer.servicebus.windows.net. ERROR CONTEXT: N/A]
  at com.azure.core.amqp.implementation.ExceptionUtil.toException(ExceptionUtil.java:85)
  at com.azure.core.amqp.implementation.handler.ConnectionHandler.notifyErrorContext(ConnectionHandler.java:351)
  at com.azure.core.amqp.implementation.handler.ConnectionHandler.onTransportError(ConnectionHandler.java:253)
  at org.apache.qpid.proton.engine.BaseHandler.handle(BaseHandler.java:191)
  at org.apache.qpid.proton.engine.impl.EventImpl.dispatch(EventImpl.java:108)
  at org.apache.qpid.proton.reactor.impl.ReactorImpl.dispatch(ReactorImpl.java:324)
  at org.apache.qpid.proton.reactor.impl.ReactorImpl.process(ReactorImpl.java:291)
  at com.azure.core.amqp.implementation.ReactorExecutor.run(ReactorExecutor.java:91)
  at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
  at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
  at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
  at java.base/java.lang.Thread.run(Thread.java:834)
2023-02-14 10:55:26.377+0100 07 [reactor-executor-2] I c.a.m.s.i.ServiceBusConnectionProcessor - {"az.sdk.message":"Transient error occurred. Retrying.","exception":"Eine vorhandene Verbindung wurde vom Remotehost geschlossen, errorContext[NAMESPACE: poi-retail-developer.servicebus.windows.net. ERROR CONTEXT: N/A]","entityPath":"N/A","tryCount":0,"interval_ms":4511}
2023-02-14 10:55:26.377+0100 07 [reactor-executor-2] I          c.a.c.a.i.ReactorConnection - {"az.sdk.message":"onConnectionShutdown. Shutting down.","connectionId":"MF_f52aa4_1676368501763","isTransient":false,"isInitiatedByClient":false,"shutdownMessage":"Finished processing pending tasks.","namespace":"poi-retail-developer.servicebus.windows.net"}
2023-02-14 10:55:30.903+0100 07 [parallel-9] I c.a.m.s.i.ServiceBusConnectionProcessor - {"az.sdk.message":"Requesting from upstream.","entityPath":"N/A","tryCount":0}
2023-02-14 10:55:30.903+0100 07 [parallel-9] I c.a.m.s.i.ServiceBusConnectionProcessor - {"az.sdk.message":"Connection not requested, yet. Requesting one.","entityPath":"N/A"}
2023-02-14 10:55:30.903+0100 07 [parallel-9] I c.a.m.s.i.ServiceBusConnectionProcessor - {"az.sdk.message":"Setting next AMQP channel.","entityPath":"N/A"}
2023-02-14 10:55:30.903+0100 07 [parallel-9] I          c.a.c.a.i.ReactorConnection - {"az.sdk.message":"Creating and starting connection.","connectionId":"MF_46079f_1676368501764","hostName":"poi-retail-developer.servicebus.windows.net","port":5671}
2023-02-14 10:55:30.904+0100 07 [parallel-9] I            c.a.c.a.i.ReactorExecutor - {"az.sdk.message":"Starting reactor.","connectionId":"MF_46079f_1676368501764"}
2023-02-14 10:55:30.905+0100 07 [parallel-9] I c.a.m.s.i.ServiceBusConnectionProcessor - {"az.sdk.message":"Next AMQP channel received.","entityPath":"N/A","subscriberId":"un_433a3b_1676368527448"}
2023-02-14 10:55:30.906+0100 07 [reactor-executor-3] I  c.a.c.a.i.handler.ConnectionHandler - {"az.sdk.message":"onConnectionInit","connectionId":"MF_46079f_1676368501764","hostName":"poi-retail-developer.servicebus.windows.net","namespace":"poi-retail-developer.servicebus.windows.net"}
2023-02-14 10:55:30.906+0100 07 [reactor-executor-3] I     c.a.c.a.i.handler.ReactorHandler - {"az.sdk.message":"reactor.onReactorInit","connectionId":"MF_46079f_1676368501764"}
2023-02-14 10:55:30.906+0100 07 [reactor-executor-3] I  c.a.c.a.i.handler.ConnectionHandler - {"az.sdk.message":"onConnectionLocalOpen","connectionId":"MF_46079f_1676368501764","errorCondition":null,"errorDescription":null,"hostName":"poi-retail-developer.servicebus.windows.net"}
2023-02-14 10:55:30.907+0100 07 [reactor-executor-3] I  c.a.c.a.i.handler.ConnectionHandler - {"az.sdk.message":"onConnectionBound","connectionId":"MF_46079f_1676368501764","hostName":"poi-retail-developer.servicebus.windows.net","peerDetails":"poi-retail-developer.servicebus.windows.net:5671"}
2023-02-14 10:55:30.907+0100 07 [reactor-executor-3] I      c.a.c.a.i.h.StrictTlsContextSpi - SSLv2Hello was an enabled protocol. Filtering out.
2023-02-14 10:55:31.810+0100 07 [main] I                at.test.AzureTestMain - Not all messages received yet only 17, sleeping for 10 seconds
2023-02-14 10:55:34.515+0100 07 [reactor-executor-3] I  c.a.c.a.i.handler.ConnectionHandler - {"az.sdk.message":"onConnectionRemoteOpen","connectionId":"MF_46079f_1676368501764","hostName":"poi-retail-developer.servicebus.windows.net","remoteContainer":"416e3502ebc342e687cb37ea6283dcbe_G10"}
2023-02-14 10:55:34.516+0100 07 [reactor-executor-3] I c.a.m.s.i.ServiceBusConnectionProcessor - {"az.sdk.message":"Channel is now active.","entityPath":"N/A"}
2023-02-14 10:55:34.557+0100 07 [reactor-executor-3] I     c.a.c.a.i.handler.SessionHandler - {"az.sdk.message":"onSessionRemoteOpen","connectionId":"MF_46079f_1676368501764","sessionName":"azure-test/subscriptions/TEST","sessionIncCapacity":0,"sessionOutgoingWindow":2147483647}
2023-02-14 10:55:34.558+0100 07 [reactor-executor-3] I          c.a.c.a.i.ReactorConnection - {"az.sdk.message":"Setting CBS channel.","connectionId":"MF_46079f_1676368501764"}
2023-02-14 10:55:34.595+0100 07 [reactor-executor-3] I     c.a.c.a.i.handler.SessionHandler - {"az.sdk.message":"onSessionRemoteOpen","connectionId":"MF_46079f_1676368501764","sessionName":"cbs-session","sessionIncCapacity":0,"sessionOutgoingWindow":2147483647}
2023-02-14 10:55:34.595+0100 07 [reactor-executor-3] I          c.a.c.a.i.ReactorConnection - {"az.sdk.message":"Emitting new response channel.","connectionId":"MF_46079f_1676368501764","entityPath":"$cbs","linkName":"cbs"}
2023-02-14 10:55:34.595+0100 07 [reactor-executor-3] I       c.a.c.a.i.AmqpChannelProcessor - {"az.sdk.message":"Setting next AMQP channel.","connectionId":"MF_46079f_1676368501764","entityPath":"$cbs"}
2023-02-14 10:55:34.596+0100 07 [reactor-executor-3] I       c.a.c.a.i.AmqpChannelProcessor - {"az.sdk.message":"Next AMQP channel received.","connectionId":"MF_46079f_1676368501764","entityPath":"$cbs","subscriberId":"un_0f710e_1676368534558"}
2023-02-14 10:55:34.632+0100 07 [reactor-executor-3] I    c.a.c.a.i.handler.SendLinkHandler - {"az.sdk.message":"onLinkRemoteOpen","connectionId":"MF_46079f_1676368501764","linkName":"cbs:sender","entityPath":"$cbs","remoteTarget":"Target{address='$cbs', durable=NONE, expiryPolicy=SESSION_END, timeout=0, dynamic=false, dynamicNodeProperties=null, capabilities=null}"}
2023-02-14 10:55:34.636+0100 07 [reactor-executor-3] I       c.a.c.a.i.AmqpChannelProcessor - {"az.sdk.message":"Channel is now active.","connectionId":"MF_46079f_1676368501764","entityPath":"$cbs"}
2023-02-14 10:55:34.636+0100 07 [reactor-executor-3] I c.a.c.a.i.handler.ReceiveLinkHandler - {"az.sdk.message":"onLinkRemoteOpen","connectionId":"MF_46079f_1676368501764","entityPath":"$cbs","linkName":"cbs:receiver","remoteSource":"Source{address='$cbs', durable=NONE, expiryPolicy=SESSION_END, timeout=0, dynamic=false, dynamicNodeProperties=null, distributionMode=null, filter=null, defaultOutcome=null, outcomes=null, capabilities=null}"}
2023-02-14 10:55:34.671+0100 07 [reactor-executor-3] I   c.a.c.a.i.ActiveClientTokenManager - {"az.sdk.message":"Scheduling refresh token task.","scopes":"amqp://poi-retail-developer.servicebus.windows.net/azure-test/subscriptions/TEST"}
2023-02-14 10:55:34.671+0100 07 [reactor-executor-3] I             c.a.c.a.i.ReactorSession - {"az.sdk.message":"Creating a new receiver link.","connectionId":"MF_46079f_1676368501764","sessionName":"azure-test/subscriptions/TEST","linkName":"azure-test/subscriptions/TEST_afcf71_1676368501771"}
2023-02-14 10:55:34.672+0100 07 [reactor-executor-3] I c.a.m.s.i.ServiceBusReceiveLinkProcessor - {"az.sdk.message":"Setting next AMQP receive link.","linkName":"azure-test/subscriptions/TEST_afcf71_1676368501771","entityPath":"azure-test/subscriptions/TEST"}
2023-02-14 10:55:34.673+0100 07 [reactor-executor-3] I             c.a.c.a.i.ReactorSession - {"az.sdk.message":"Returning existing receive link.","connectionId":"MF_46079f_1676368501764","linkName":"azure-test/subscriptions/TEST_afcf71_1676368501771","entityPath":"azure-test/subscriptions/TEST"}
2023-02-14 10:55:34.712+0100 07 [reactor-executor-3] I c.a.c.a.i.handler.ReceiveLinkHandler - {"az.sdk.message":"onLinkRemoteOpen","connectionId":"MF_46079f_1676368501764","entityPath":"azure-test/subscriptions/TEST","linkName":"azure-test/subscriptions/TEST_afcf71_1676368501771","remoteSource":"Source{address='azure-test/subscriptions/TEST', durable=NONE, expiryPolicy=SESSION_END, timeout=0, dynamic=false, dynamicNodeProperties=null, distributionMode=null, filter=null, defaultOutcome=null, outcomes=null, capabilities=null}"}
2023-02-14 10:55:34.716+0100 07 [boundedElastic-10] I                at.test.AzureTestMain - Got message 17 in thread boundedElastic-10
2023-02-14 10:55:34.766+0100 07 [boundedElastic-11] I                at.test.AzureTestMain - Got message 18 in thread boundedElastic-11
2023-02-14 10:55:34.798+0100 07 [boundedElastic-12] I                at.test.AzureTestMain - Got message 19 in thread boundedElastic-12
2023-02-14 10:55:34.877+0100 07 [boundedElastic-3] I                at.test.AzureTestMain - Got message 20 in thread boundedElastic-3
2023-02-14 10:55:34.910+0100 07 [boundedElastic-4] I                at.test.AzureTestMain - Got message 21 in thread boundedElastic-4
2023-02-14 10:55:34.984+0100 07 [boundedElastic-5] I                at.test.AzureTestMain - Got message 22 in thread boundedElastic-5
2023-02-14 10:55:35.016+0100 07 [boundedElastic-6] I                at.test.AzureTestMain - Got message 23 in thread boundedElastic-6
2023-02-14 10:55:35.387+0100 07 [boundedElastic-7] I                at.test.AzureTestMain - Got message 24 in thread boundedElastic-7
2023-02-14 10:55:35.448+0100 07 [boundedElastic-10] I                at.test.AzureTestMain - Got message 25 in thread boundedElastic-10
2023-02-14 10:55:35.516+0100 07 [boundedElastic-11] I                at.test.AzureTestMain - Got message 26 in thread boundedElastic-11
2023-02-14 10:55:35.549+0100 07 [boundedElastic-12] I                at.test.AzureTestMain - Got message 27 in thread boundedElastic-12
2023-02-14 10:55:35.689+0100 07 [boundedElastic-3] I                at.test.AzureTestMain - Got message 28 in thread boundedElastic-3
2023-02-14 10:55:35.730+0100 07 [boundedElastic-4] I                at.test.AzureTestMain - Got message 29 in thread boundedElastic-4
2023-02-14 10:55:35.824+0100 07 [boundedElastic-5] I                at.test.AzureTestMain - Got message 30 in thread boundedElastic-5
2023-02-14 10:55:35.868+0100 07 [boundedElastic-6] I                at.test.AzureTestMain - Got message 31 in thread boundedElastic-6
2023-02-14 10:55:36.197+0100 07 [boundedElastic-7] I                at.test.AzureTestMain - Got message 32 in thread boundedElastic-7
2023-02-14 10:55:36.234+0100 07 [boundedElastic-10] I                at.test.AzureTestMain - Got message 33 in thread boundedElastic-10
2023-02-14 10:55:36.266+0100 07 [boundedElastic-11] I                at.test.AzureTestMain - Got message 34 in thread boundedElastic-11
2023-02-14 10:55:36.302+0100 07 [boundedElastic-12] I                at.test.AzureTestMain - Got message 35 in thread boundedElastic-12
2023-02-14 10:55:36.379+0100 07 [boundedElastic-3] I                at.test.AzureTestMain - Got message 36 in thread boundedElastic-3
2023-02-14 10:55:36.411+0100 07 [boundedElastic-4] I                at.test.AzureTestMain - Got message 37 in thread boundedElastic-4
2023-02-14 10:55:36.477+0100 07 [boundedElastic-5] I                at.test.AzureTestMain - Got message 38 in thread boundedElastic-5
2023-02-14 10:55:36.512+0100 07 [boundedElastic-6] I                at.test.AzureTestMain - Got message 39 in thread boundedElastic-6
2023-02-14 10:55:36.574+0100 07 [boundedElastic-7] I                at.test.AzureTestMain - Got message 40 in thread boundedElastic-7
2023-02-14 10:55:36.606+0100 07 [boundedElastic-10] I                at.test.AzureTestMain - Got message 41 in thread boundedElastic-10
2023-02-14 10:55:36.691+0100 07 [boundedElastic-11] I                at.test.AzureTestMain - Got message 42 in thread boundedElastic-11
2023-02-14 10:55:36.724+0100 07 [boundedElastic-12] I                at.test.AzureTestMain - Got message 43 in thread boundedElastic-12
2023-02-14 10:55:36.829+0100 07 [boundedElastic-3] I                at.test.AzureTestMain - Got message 44 in thread boundedElastic-3
2023-02-14 10:55:36.861+0100 07 [boundedElastic-4] I                at.test.AzureTestMain - Got message 45 in thread boundedElastic-4
2023-02-14 10:55:36.998+0100 07 [boundedElastic-5] I                at.test.AzureTestMain - Got message 46 in thread boundedElastic-5
2023-02-14 10:55:37.030+0100 07 [boundedElastic-6] I                at.test.AzureTestMain - Got message 47 in thread boundedElastic-6
2023-02-14 10:55:37.120+0100 07 [boundedElastic-7] I                at.test.AzureTestMain - Got message 48 in thread boundedElastic-7
2023-02-14 10:55:37.152+0100 07 [boundedElastic-10] I                at.test.AzureTestMain - Got message 49 in thread boundedElastic-10
2023-02-14 10:55:37.302+0100 07 [boundedElastic-11] I                at.test.AzureTestMain - Got message 50 in thread boundedElastic-11
2023-02-14 10:55:37.334+0100 07 [boundedElastic-12] I                at.test.AzureTestMain - Got message 51 in thread boundedElastic-12
2023-02-14 10:55:37.662+0100 07 [boundedElastic-3] I                at.test.AzureTestMain - Got message 52 in thread boundedElastic-3
2023-02-14 10:55:37.704+0100 07 [boundedElastic-4] I                at.test.AzureTestMain - Got message 53 in thread boundedElastic-4
2023-02-14 10:55:37.832+0100 07 [boundedElastic-5] I                at.test.AzureTestMain - Got message 54 in thread boundedElastic-5
2023-02-14 10:55:37.867+0100 07 [boundedElastic-6] I                at.test.AzureTestMain - Got message 55 in thread boundedElastic-6
2023-02-14 10:55:37.944+0100 07 [boundedElastic-7] I                at.test.AzureTestMain - Got message 56 in thread boundedElastic-7
2023-02-14 10:55:37.978+0100 07 [boundedElastic-10] I                at.test.AzureTestMain - Got message 57 in thread boundedElastic-10
2023-02-14 10:55:38.105+0100 07 [boundedElastic-11] I                at.test.AzureTestMain - Got message 58 in thread boundedElastic-11
2023-02-14 10:55:38.137+0100 07 [boundedElastic-12] I                at.test.AzureTestMain - Got message 59 in thread boundedElastic-12
2023-02-14 10:55:38.237+0100 07 [boundedElastic-3] I                at.test.AzureTestMain - Got message 60 in thread boundedElastic-3
2023-02-14 10:55:38.268+0100 07 [boundedElastic-4] I                at.test.AzureTestMain - Got message 61 in thread boundedElastic-4
2023-02-14 10:55:38.332+0100 07 [boundedElastic-5] I                at.test.AzureTestMain - Got message 62 in thread boundedElastic-5
2023-02-14 10:55:38.364+0100 07 [boundedElastic-6] I                at.test.AzureTestMain - Got message 63 in thread boundedElastic-6
2023-02-14 10:55:38.451+0100 07 [boundedElastic-7] I                at.test.AzureTestMain - Got message 64 in thread boundedElastic-7
2023-02-14 10:55:38.484+0100 07 [boundedElastic-10] I                at.test.AzureTestMain - Got message 65 in thread boundedElastic-10
2023-02-14 10:55:38.588+0100 07 [boundedElastic-11] I                at.test.AzureTestMain - Got message 66 in thread boundedElastic-11
2023-02-14 10:55:38.621+0100 07 [boundedElastic-12] I                at.test.AzureTestMain - Got message 67 in thread boundedElastic-12
2023-02-14 10:55:38.662+0100 07 [boundedElastic-3] I                at.test.AzureTestMain - Got message 68 in thread boundedElastic-3
2023-02-14 10:55:38.693+0100 07 [boundedElastic-4] I                at.test.AzureTestMain - Got message 69 in thread boundedElastic-4
2023-02-14 10:55:38.725+0100 07 [boundedElastic-5] I                at.test.AzureTestMain - Got message 70 in thread boundedElastic-5
2023-02-14 10:55:38.810+0100 07 [boundedElastic-6] I                at.test.AzureTestMain - Got message 71 in thread boundedElastic-6
2023-02-14 10:55:38.841+0100 07 [boundedElastic-7] I                at.test.AzureTestMain - Got message 72 in thread boundedElastic-7
2023-02-14 10:55:38.902+0100 07 [boundedElastic-10] I                at.test.AzureTestMain - Got message 73 in thread boundedElastic-10
2023-02-14 10:55:38.933+0100 07 [boundedElastic-11] I                at.test.AzureTestMain - Got message 74 in thread boundedElastic-11
2023-02-14 10:55:39.016+0100 07 [boundedElastic-12] I                at.test.AzureTestMain - Got message 75 in thread boundedElastic-12
2023-02-14 10:55:39.050+0100 07 [boundedElastic-3] I                at.test.AzureTestMain - Got message 76 in thread boundedElastic-3
2023-02-14 10:55:39.124+0100 07 [boundedElastic-4] I                at.test.AzureTestMain - Got message 77 in thread boundedElastic-4
2023-02-14 10:55:39.165+0100 07 [boundedElastic-5] I                at.test.AzureTestMain - Got message 78 in thread boundedElastic-5
2023-02-14 10:55:39.311+0100 07 [boundedElastic-6] I                at.test.AzureTestMain - Got message 79 in thread boundedElastic-6
2023-02-14 10:55:39.355+0100 07 [boundedElastic-7] I                at.test.AzureTestMain - Got message 80 in thread boundedElastic-7
2023-02-14 10:55:39.445+0100 07 [boundedElastic-10] I                at.test.AzureTestMain - Got message 81 in thread boundedElastic-10
2023-02-14 10:55:39.734+0100 07 [boundedElastic-11] I                at.test.AzureTestMain - Got message 82 in thread boundedElastic-11
2023-02-14 10:55:41.811+0100 07 [main] I                at.test.AzureTestMain - Not all messages received yet only 82, sleeping for 10 seconds
2023-02-14 10:55:47.943+0100 07 [boundedElastic-12] I                at.test.AzureTestMain - Got message 83 in thread boundedElastic-12
2023-02-14 10:55:48.256+0100 07 [boundedElastic-3] I                at.test.AzureTestMain - Got message 84 in thread boundedElastic-3
2023-02-14 10:55:48.491+0100 07 [boundedElastic-4] I                at.test.AzureTestMain - Got message 85 in thread boundedElastic-4
2023-02-14 10:55:48.628+0100 07 [boundedElastic-5] I                at.test.AzureTestMain - Got message 86 in thread boundedElastic-5
2023-02-14 10:55:49.057+0100 07 [boundedElastic-6] I                at.test.AzureTestMain - Got message 87 in thread boundedElastic-6
2023-02-14 10:55:49.173+0100 07 [boundedElastic-7] I                at.test.AzureTestMain - Got message 88 in thread boundedElastic-7
2023-02-14 10:55:49.291+0100 07 [boundedElastic-10] I                at.test.AzureTestMain - Got message 89 in thread boundedElastic-10
2023-02-14 10:55:49.403+0100 07 [boundedElastic-11] I                at.test.AzureTestMain - Got message 90 in thread boundedElastic-11
2023-02-14 10:55:49.474+0100 07 [boundedElastic-12] I                at.test.AzureTestMain - Got message 91 in thread boundedElastic-12
2023-02-14 10:55:49.579+0100 07 [boundedElastic-3] I                at.test.AzureTestMain - Got message 92 in thread boundedElastic-3
2023-02-14 10:55:49.661+0100 07 [boundedElastic-4] I                at.test.AzureTestMain - Got message 93 in thread boundedElastic-4
2023-02-14 10:55:49.715+0100 07 [boundedElastic-5] I                at.test.AzureTestMain - Got message 94 in thread boundedElastic-5
2023-02-14 10:55:49.775+0100 07 [boundedElastic-6] I                at.test.AzureTestMain - Got message 95 in thread boundedElastic-6
2023-02-14 10:55:49.846+0100 07 [boundedElastic-7] I                at.test.AzureTestMain - Got message 96 in thread boundedElastic-7
2023-02-14 10:55:50.056+0100 07 [boundedElastic-10] I                at.test.AzureTestMain - Got message 97 in thread boundedElastic-10
2023-02-14 10:55:50.122+0100 07 [boundedElastic-11] I                at.test.AzureTestMain - Got message 98 in thread boundedElastic-11
2023-02-14 10:55:50.212+0100 07 [boundedElastic-12] I                at.test.AzureTestMain - Got message 99 in thread boundedElastic-12
2023-02-14 10:55:50.268+0100 07 [boundedElastic-3] I                at.test.AzureTestMain - Got message 100 in thread boundedElastic-3
2023-02-14 10:55:51.814+0100 07 [main] I                at.test.AzureTestMain - All messages processed, completing
2023-02-14 10:55:51.816+0100 07 [main] I c.a.m.s.ServiceBusReceiverAsyncClient - Removing receiver links.
2023-02-14 10:55:51.816+0100 07 [main] I      c.a.m.s.ServiceBusClientBuilder - {"az.sdk.message":"Closing a dependent client.","numberOfOpenClients":0}
2023-02-14 10:55:51.816+0100 07 [main] I      c.a.m.s.ServiceBusClientBuilder - No more open clients, closing shared connection.
2023-02-14 10:55:51.816+0100 07 [main] I c.a.m.s.i.ServiceBusConnectionProcessor - {"az.sdk.message":"Upstream connection publisher was completed. Terminating processor.","entityPath":"N/A"}
2023-02-14 10:55:51.817+0100 07 [main] I          c.a.c.a.i.ReactorConnection - {"az.sdk.message":"Disposing of ReactorConnection.","connectionId":"MF_46079f_1676368501764","isTransient":false,"isInitiatedByClient":true,"shutdownMessage":"Disposed by client."}
2023-02-14 10:55:51.817+0100 07 [main] I c.a.m.s.i.ServiceBusConnectionProcessor - {"az.sdk.message":"Channel is disposed.","entityPath":"N/A"}

What can be seen there is that initially boundedElastic threads 3 through 12 were processing messages without interruption, for a total of 10 as configured. After I pulled my network cable following message 17 and plugged it back in, only 8 threads recovered: 3, 4, 5, 6, 7, 10, 11, 12. Threads 8 and 9 never did anything again, and no other thread took over to bring the total back to 10. Checking jconsole while the program is running shows threads 8 and 9 still waiting in the complete().block() call.
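For context, the hang matches the thread dump above: complete() ultimately parks on a CountDownLatch with no timeout. A minimal, self-contained sketch (plain Java, not the SDK's actual code) of the difference between an unbounded and a bounded wait on such a latch:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BlockingWaitSketch {
    // Simulates a disposition acknowledgment that never arrives because the
    // link was torn down while the request was in flight.
    static final CountDownLatch neverCompleted = new CountDownLatch(1);

    // Bounded wait: returns false after the timeout instead of parking forever.
    static boolean awaitWithTimeout(long millis) throws InterruptedException {
        return neverCompleted.await(millis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // neverCompleted.await(); // unbounded: this is where the dump shows threads parked
        boolean completed = awaitWithTimeout(100);
        System.out.println(completed ? "completed" : "timed out"); // prints "timed out"
    }
}
```

This is why the stuck threads show up as WAITING forever: nothing ever counts the latch down once the link that would have delivered the acknowledgment is gone.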

I think the example should answer most questions but to make sure all are answered:

What is your service bus sdk version?

There is some slight variance between our real apps, ranging roughly from 7.12.x to 7.13.x, but all are fairly recent releases.

What is the average size of each message?

This fluctuates a lot between apps: some send very small messages of a few bytes, while others send messages of 1 KB+ with a large "batch" body in our logic. In the example there is practically no significant body, and increasing it does not seem to change how frequently the problem occurs.

What is the average processing time for each message?

I feel like I'm repeating myself, but that's just how it is: in real apps anything from under 100 ms up to roughly 2 minutes for some processing. It depends a lot on what is being done, but we have observed the issue at both extremes. The example currently has practically zero processing time beyond writing one log line; I could easily add a sleep there to make it longer.

Are you using the AKS for deployment, does it deploy on the same region as the service bus?

No, we are running an on-prem OpenShift cluster for now. The cluster is located in Europe, as is our Service Bus subscription.

For deployment, I think you are running the processor client on 10 pods and consuming messages from one queue.
Please correct me if I am wrong.

The application from my Kibana screenshot above, which shows that some threads are no longer logging, has one consumer client for 1 subscription on 1 topic, as does my code example above. This is the most common setup for our applications. However, a few applications consume from more than one subscription, each with a different client object in the same JVM process, and we see the same behavior for all of the individual clients' consumer threads.

How about memory allocation for each pod?

Our applications range from around 1 GB Java heap with about 2 GB pod memory up to 9 GB Java heap with 12 GB pod memory. We also monitor for long garbage collections and OOM issues, and almost all apps run stable, small GC cycles with only very infrequent larger ones. As more and more customers onboard, we slowly but steadily increase the pod count whenever we notice GCs becoming more frequent. We are still just a normal human-run project, so I won't deny there is an occasional long GC pause during unexpected traffic spikes, but these are rare, and I could not spot any correlation with the times when threads stopped processing (determined by checking when they stopped producing logs).

Hope this helps, if anything is missing just ask!

@TheDevOps

One more thing I noticed for the first time with this newly created test program is that often (but, oddly, not always) the final serviceBusProcessorClient.close() produces exceptions for the threads that had stopped processing messages earlier:

2023-02-14 14:43:38.441+0100 07 [boundedElastic-7] E                at.test.AzureTestMain - Error in message processing in thread boundedElastic-7
com.azure.messaging.servicebus.ServiceBusException: java.lang.InterruptedException
	at com.azure.messaging.servicebus.ServiceBusProcessorClient$1.onNext(ServiceBusProcessorClient.java:387)
	at com.azure.messaging.servicebus.ServiceBusProcessorClient$1.onNext(ServiceBusProcessorClient.java:362)
	at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.runAsync(FluxPublishOn.java:440)
	at reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.run(FluxPublishOn.java:527)
	at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84)
	at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: reactor.core.Exceptions$ReactiveException: java.lang.InterruptedException
	at reactor.core.Exceptions.propagate(Exceptions.java:396)
	at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:91)
	at reactor.core.publisher.Mono.block(Mono.java:1742)
	at com.azure.messaging.servicebus.ServiceBusReceivedMessageContext.complete(ServiceBusReceivedMessageContext.java:81)
	at at.test.AzureTestMain.receiveMessage(AzureTestMain.java:115)
	at com.azure.messaging.servicebus.ServiceBusProcessorClient$1.onNext(ServiceBusProcessorClient.java:383)
	... 10 common frames omitted
Caused by: java.lang.InterruptedException: null
	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1040)
	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
	at java.base/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
	at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:87)
	... 14 common frames omitted
(identical stack traces follow for threads boundedElastic-6, boundedElastic-5, and boundedElastic-4)

My previous local setup was a small Spring Boot application similar to our real application, and Eclipse's termination action just ruthlessly kills apps, so there never was a clean shutdown with a close; that explains why I never saw these exceptions before.
Checking our server logs, I can confirm they also appear there regularly on shutdown of applications with previously hanging threads. They usually occur several hours later, so no one ever correlated them with the hanging threads, but maybe this provides further input on where exactly the problem is.

@liukun-msft liukun-msft modified the milestones: Backlog, 2023-03 Feb 15, 2023
@liukun-msft

Hi @TheDevOps
Thank you for the details. I can now reproduce the issue. I will try to find the root cause and keep you updated.

@liukun-msft

liukun-msft commented Feb 16, 2023

I found that the root cause is that the internal timeout we set for the complete()/abandon() calls is removed when the link is closed completely. So while the connection is recovering, threads blocked in complete()/abandon() calls wait forever.

This is why adding a timeout option to block() works around the problem, but I am afraid it may introduce side effects, since there is already an internal timeout.

To give more internal context for this issue:

  1. complete()/abandon() send an update-disposition request to Service Bus with the corresponding disposition status. We wrap each of these requests in an UpdateDispositionWorkItem and set a timeout value on it.
  2. We have a subscription that periodically cleans up work items that have exceeded their timeout.
  3. When the link is closing, ideally we wait for all pending work items to finish and only then dispose the periodic clean-up subscription.
  4. However, the 'onLinkFinal' signal disposes the link-closure subscription and the periodic clean-up subscription without waiting for the pending work items to finish, so threads stay blocked at this stage even after the connection recovers.

I'll continue working on a fix for this issue.
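The race described in steps 3 and 4 can be sketched roughly as follows. This is a simplified, hypothetical model (the class and method names are invented for illustration, not the SDK's actual code): if the periodic timeout sweep is disposed before the pending items are drained, a caller blocked on one of those items is never released; failing the remaining items on close releases it with an error instead.

```java
import java.util.Map;
import java.util.concurrent.*;

public class WorkItemDrainSketch {
    // Hypothetical stand-in for the SDK's pending disposition work items.
    private final Map<String, CompletableFuture<Void>> pending = new ConcurrentHashMap<>();
    private final ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();

    WorkItemDrainSketch() {
        // Periodic sweep: in the real client, items older than their timeout are
        // failed; this sketch simply fails every pending item on each sweep.
        cleaner.scheduleAtFixedRate(
            () -> pending.forEach((id, f) ->
                f.completeExceptionally(new TimeoutException("disposition timed out: " + id))),
            100, 100, TimeUnit.MILLISECONDS);
    }

    CompletableFuture<Void> sendDisposition(String id) {
        return pending.computeIfAbsent(id, k -> new CompletableFuture<>());
    }

    // Step 4 (the bug): the sweep is disposed without draining, so a caller
    // blocked on an unfinished item waits forever.
    void closeBuggy() {
        cleaner.shutdownNow();
    }

    // Step 3 (the intent): fail the remaining items before disposing the sweep,
    // so blocked callers are released with an error instead of hanging.
    void closeFixed() {
        pending.forEach((id, f) ->
            f.completeExceptionally(new IllegalStateException("link closed: " + id)));
        cleaner.shutdownNow();
    }
}
```

In the buggy path, a thread waiting on the future returned by sendDisposition() parks forever, which matches the CountDownLatch.await frames in the thread dumps above.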

@liukun-msft liukun-msft added amqp Label for tracking issues related to AMQP issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. and removed issue-addressed Workflow: The Azure SDK team believes it to be addressed and ready to close. labels Feb 16, 2023
@TheDevOps

Great to hear you already managed to identify the root cause, and thanks for the detailed explanation; everything makes a lot of sense now!
I've also closed my pull request: as you said, having another timeout in a different place for the same thing is likely to cause further problems down the line, and since you are already addressing the root cause it isn't needed anyway.

@anuchandy

@TheDevOps We've merged the fix for this issue, which was root-caused using the sample and the logs you shared. It will be released toward the second-to-last week of March. Thank you for collaborating on this!
