grpc-js 1.4.1 - Channelz tracing uses unbounded memory which causes OOM #1941
It's not supposed to do traces on a per-request basis. The traces are all about connection state, which shouldn't change more quickly when there are more requests. Because of that, if anything, caching the client should decrease the number of traces that occur, so I wouldn't recommend changing that.

It might help to see what information is actually getting stored in channelz in your case. You can run a channelz server with code like this:

```js
const grpc = require('@grpc/grpc-js');

const server = new grpc.Server();
grpc.addAdminServicesToServer(server);
// Modify the first argument to suit your environment
server.bindAsync('localhost:0', grpc.ServerCredentials.createInsecure(), (error, port) => {
  console.log(`Serving channelz on port ${port}`);
  server.start();
});
```

Then you can access it with any channelz client pointed at that port. In the meantime, I will push out an update that rotates out old channelz trace events on a per-channel/subchannel/server basis.
Thank you for the swift and helpful response! We have had issues with pushing too many concurrent requests down the same channel, which seems to (at least implicitly) affect channel status, and we also force round-robin on - could that be adding to the volume of trace events? I will find some time to inspect the channelz logs.
I would be interested to see a trace log for one of your tests, which can be output by setting the environment variables GRPC_TRACE=all and GRPC_VERBOSITY=DEBUG.
This took a little longer than I expected, but grpc-js version 1.4.2 is out with two relevant changes: each channel, subchannel, and server now stores only a limited number of trace events at any one time, and channelz stats tracking can be disabled entirely for a channel and its subchannels, or for a server, with the channel option 'grpc.enable_channelz' set to 0.
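For example, a client constructed directly with grpc.Client can pass the option in its third constructor argument. This is a minimal sketch; the target address is a placeholder, and the option key is the 'grpc.enable_channelz' name mentioned above:

```js
const grpc = require('@grpc/grpc-js');

// The third constructor argument is the channel options object; setting
// 'grpc.enable_channelz' to 0 disables channelz stats tracking for this
// channel and its subchannels.
const client = new grpc.Client(
  'my.backend.example.com:443', // placeholder target
  grpc.credentials.createSsl(),
  { 'grpc.enable_channelz': 0 }
);
```

Generated service clients extend grpc.Client and take the same options object as their third constructor argument, so the same key can be passed wherever the client is created; the Server constructor accepts an options object as well.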
If that change does solve the problem, it would still help to see a log to understand what caused the problem in the first place.
Posting this here as a potential FYI, as it may be a bug with the Google Ads API (or us messing something up), but we see a difference in behaviour between setting the channelz-disabling channel option and not setting it.
No, now that you pointed it out, I found a bug with the handling of that option in grpc-js. I will fix that. In the meantime, without that option, do you still see the unbounded memory growth?
We are still seeing significant additional memory usage over the baseline (so far about 2.5x memory usage, from 800MiB to 2GiB), but we do now utilise a static pool of grpc clients, so there may be many multiples of channels and subchannels logging. I'm waiting to see whether it tops out or keeps growing in prod, but we may still hit our memory limit, so bear with me on that. I'll try to get a sample of some logs in the meantime.
The bug with that option is fixed in version 1.4.3.
Hi, we're in the same boat, but we also see an increase in memory with 1.4.3 (also Node 16). Our case is an Express server that serves GraphQL and calls an internal grpc service for each GraphQL request. I tried specifying the channel option as well. @murgatroid99, what logs would be helpful for you?
@murgatroid99 I am casting around in the dark, so I really apologise if this is way off and a distraction, but is this line the culprit? grpc-node/packages/grpc-js/src/subchannel.ts, line 558 (commit cb29b6a).
There's a guarded invocation in the same function that assigns the output to a ref, but then this unguarded invocation at the end of the function does nothing with the output. I'm wondering if the unguarded version should be there at all?
I don't seem to get a memory leak if I comment out the unguarded line when channelz is disabled via the channel option.
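To illustrate what I mean, here is a hypothetical stand-in sketch of the pattern (not the actual grpc-js source): the guarded call keeps its handle so the entry can be removed later, while the unguarded call registers an entry whose handle is discarded, so it can never be unregistered.

```js
// Hypothetical stand-in for the channelz registration described above;
// this is not the real subchannel.ts / channelz code.
const channelzRegistry = new Map();
let nextId = 0;

function registerChannelzSocket(info) {
  const id = nextId++;
  channelzRegistry.set(id, info);
  return id; // the "ref" a caller needs in order to unregister the entry later
}

function createSession(channelzEnabled) {
  let channelzSocketRef = null;

  if (channelzEnabled) {
    // Guarded call: the ref is kept, so the entry can be removed later.
    channelzSocketRef = registerChannelzSocket({ kind: 'socket' });
  }

  // ...connection setup would happen here...

  // Unguarded call at the end of the function: it runs even when channelz is
  // disabled, and its return value is discarded, so this entry can never be
  // unregistered. That leaks one entry per connection attempt.
  registerChannelzSocket({ kind: 'socket' });

  return channelzSocketRef;
}

// Every "connection attempt" grows the registry by one orphaned entry:
for (let i = 0; i < 3; i++) createSession(false);
console.log(channelzRegistry.size); // 3, even though channelz was disabled
```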
You're right, that line shouldn't be there at all, which explains why I missed it when adding the guards. Its output never gets recorded, so it never gets unregistered. I have published a new version, 1.4.4, that removes it.

@Gilwe Even with the extra line mentioned in the last comment, setting the channel option to disable channelz should have had a significant impact on the rate of the memory leak if you set the option.

If you still have problems with the new version, as stated in my earlier comment, the logs that would help here are client trace logs, which you can get by running your code with the environment variables GRPC_TRACE=all and GRPC_VERBOSITY=DEBUG.
1.4.4 has completely fixed the memory leak for me with channelz tracing disabled, thank you! Not sure if the above was to me or @Gilwe, but I had been setting the option.

In terms of the actual channelz tracing itself, in our case it looks like it might be frequent TRANSIENT_FAILURE state transitions that fill up the logs. This is a set of entries from a completely idle application I left running by accident on my local machine just now, with no traffic going through it:

Is that expected behaviour? From my side, given that we no longer experience a memory leak and we have the option of disabling channelz tracing, I consider my issue completely resolved - but I'm more than happy to continue to contribute logs or testing if there is more that can be done. Thank you again for your help @murgatroid99
It looks like the server is telling the client to go away after the connection is idle for 75 seconds. The client behavior when handling that is normal, specifically for the round_robin LB policy you have enabled.

However, the total channelz memory usage should be bounded for the duration of that log. There's no mention of any channel construction, so the number of channels is constant. I see 4 instances of subchannels getting constructed, and 4 instances of them getting removed (when the refcount goes to 0), so the total number of subchannels is approximately constant. Each subchannel should register a socket with channelz when it transitions to CONNECTING and unregister it when it transitions to TRANSIENT_FAILURE or IDLE, so the number of registered sockets should be constant as well. And with the changes in 1.4.3, the number of log statements retained for each is strictly limited.

Have you checked whether you still see the memory leak with channelz enabled as of 1.4.4, and if you do, can you share a log with channelz enabled over a timespan in which memory significantly increases?
Problem description
Up front: I wouldn't necessarily call this a bug, but it has caused an issue in our code.
We use a single grpc-js client to service calls to the Google Ads API. It is instantiated when the server is initialized and used throughout the life of the program to try to reduce memory leakage. This works quite well, and memory use is relatively stable, but when grpc-js is upgraded to 1.4.1 (from 1.3.7), memory usage starts to increase linearly even though our implementation is unchanged. This seems to be caused by Date objects (and others) being created when events are appended to the ChannelzTrace object associated with our channel.
This might otherwise be a small amount of memory, but we service a very large amount of traffic through this server, and it eventually causes an OOM event.
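As a rough illustration of how that adds up (hypothetical code, not the actual ChannelzTrace implementation): if every connectivity event appends an object holding a Date to a per-channel array that is never trimmed, a long-lived cached channel accumulates them for the life of the process.

```js
// Hypothetical illustration only; not the real ChannelzTrace class.
class UnboundedTrace {
  constructor() {
    this.events = [];
  }

  addTraceEvent(severity, description) {
    // Each event allocates a Date (plus the wrapping object) and is retained
    // forever, so a long-lived cached channel grows without bound.
    this.events.push({ timestamp: new Date(), severity, description });
  }
}

const trace = new UnboundedTrace();
trace.addTraceEvent('CT_INFO', 'Connectivity state change to CONNECTING');
trace.addTraceEvent('CT_INFO', 'Connectivity state change to READY');
console.log(trace.events.length); // keeps climbing over the process lifetime
```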
Should we cease caching our grpc client? What would be the recommended lifecycle of a client in that case? Or is there a way to opt out of tracing?
Reproduction steps
Create a single grpc Client object, fire traffic at it at a rate of 5 requests per second for 10 minutes, and observe memory usage climbing without reaching a cap.
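A minimal sketch of such a reproduction, assuming a hypothetical echo.proto with an echo.Echo service exposing an Echo method, and a reachable test server on localhost:50051 (all placeholders for whatever client and backend you actually use):

```js
const grpc = require('@grpc/grpc-js');
const protoLoader = require('@grpc/proto-loader');

// 'echo.proto', the echo package, and the server address are placeholders;
// substitute whatever generated client and backend you actually use.
const pkgDef = protoLoader.loadSync('echo.proto');
const proto = grpc.loadPackageDefinition(pkgDef);

// One long-lived, cached client, as in the setup described above.
const client = new proto.echo.Echo('localhost:50051', grpc.credentials.createInsecure());

// Fire roughly 5 requests per second.
setInterval(() => {
  client.Echo({ message: 'ping' }, (err) => {
    if (err) console.error(err.code, err.details);
  });
}, 200);

// Log heap usage every 30 seconds to observe the trend.
setInterval(() => {
  const mib = process.memoryUsage().heapUsed / (1024 * 1024);
  console.log(`heapUsed: ${mib.toFixed(1)} MiB`);
}, 30000);
```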
Environment
Additional context
I'm not sure what would be most helpful but I am happy to provide anything required.