Skip to content
This repository has been archived by the owner on Jan 24, 2024. It is now read-only.

Use LinkedBlockingQueue as the container of ResponseAndRequest #500

Conversation

BewareMyPower
Copy link
Collaborator

@BewareMyPower BewareMyPower commented May 14, 2021

Motivation

There're requestQueue and responseQueue in each channel. responseQueue stores the pending ResponseAndRequest to make responses be sent in order. However, requestQueue is just to limit the max number of requests and it's a fake queue because it only maintains a request count.

Sometimes an exception may be thrown like

[pulsar-io-28-27] WARN io.netty.util.concurrent.AbstractEventExecutor - A task raised an exception. Task: io.streamnative.pulsar.handlers.kop.KafkaCommandDecoder$$Lambda$1097/0x000000084087f440@60104314
java.lang.IllegalStateException: poll() is called when count is 0

It's because when an empty requestQueue called poll() method. It may be caused by the close() method, which clears the requestQueue. However, the pending response future may complete and trigger writeAndFlushResponseToClient, which triggers another poll() call. In addition, writeAndFlushResponseToClient may not remove any responses from responseQueue but each time it's called the requestQueue.poll() will be called.

Modifications

  • Use LinkedBlockingQueue to store ResponseAndRequest and limit the max number of requests, use the requestQueue as its name to avoid misunderstanding.
  • When the channel is closed, before clear the requestQueue, cancel all response futures and release the response buffer when it's a FETCH response. If the response future is cancelled, don't call writeAndFlushResponseToClient in callback.
  • Check if the response future is completed in DelayedOperation's callback because the callback may be triggered twice.
  • If a response future is completed, send the response even if it's expired.

@BewareMyPower BewareMyPower requested a review from jiazhai as a code owner May 14, 2021 04:01
@BewareMyPower BewareMyPower self-assigned this May 14, 2021
@BewareMyPower
Copy link
Collaborator Author

DifferentNamespaceTestBase still failed, I'll first take a look.

@BewareMyPower BewareMyPower force-pushed the bewaremypower/refactor-max-requests-limit branch from d19fd87 to e293791 Compare May 14, 2021 06:23
@BewareMyPower BewareMyPower changed the title Use LinkedBlockingQueue as the container of ResponseAndRequest [WIP] Use LinkedBlockingQueue as the container of ResponseAndRequest May 14, 2021
@BewareMyPower
Copy link
Collaborator Author

@hangc0276 From your comment, I just thought of that this problem also existed in current code because the responseQueue will be cleared. Even it was not cleared, because isActive was false now, writeAndFlushResponseToClient wouldn't remove the response and the callback wouldn't be called. And for the case that the response is expired, we also completed it with null. It might also cause direct memory leak.

I think we need another way to handle this case.

@BewareMyPower BewareMyPower changed the title [WIP] Use LinkedBlockingQueue as the container of ResponseAndRequest Use LinkedBlockingQueue as the container of ResponseAndRequest May 14, 2021
@BewareMyPower BewareMyPower force-pushed the bewaremypower/refactor-max-requests-limit branch from e293791 to acbef17 Compare May 14, 2021 11:49
@BewareMyPower
Copy link
Collaborator Author

@hangc0276 I've pushed the fix, PTAL again.

@BewareMyPower BewareMyPower changed the title Use LinkedBlockingQueue as the container of ResponseAndRequest [WIP] Use LinkedBlockingQueue as the container of ResponseAndRequest May 14, 2021
@BewareMyPower
Copy link
Collaborator Author

It looks like after rebasing to master, some tests failed. I'll take a look.

@BewareMyPower BewareMyPower changed the title [WIP] Use LinkedBlockingQueue as the container of ResponseAndRequest Use LinkedBlockingQueue as the container of ResponseAndRequest May 14, 2021
@BewareMyPower
Copy link
Collaborator Author

The DifferentNamespacePulsarTest is flaky, I'll open a new PR to fix it.

@BewareMyPower BewareMyPower force-pushed the bewaremypower/refactor-max-requests-limit branch 2 times, most recently from 11c2935 to 74a9590 Compare May 16, 2021 06:11
@BewareMyPower
Copy link
Collaborator Author

@hangc0276 PTAL again

@BewareMyPower
Copy link
Collaborator Author

@hangc0276 I adjusted the logic of response future's callback. The key point is that when the future is cancelled, skip writeAndFlushResponseToClient.

There're two main reasons:

  1. If the response future is cancelled during close, the responseQueue will be cleared by close() in loop. We don't want to trigger writeAndFlushResponseToClient in each loop. Even it will do nothing because isActive is false.
  2. If the response future is cancelled by expired request, we'll print the log and send the error response in writeAndFlushResponseToClient.

If the response future is completed exceptionally or completed with null value, it will be handled in writeAndFlushResponseToClient later so we don't need to do anything in whenComplete.

Copy link
Collaborator

@hangc0276 hangc0276 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hangc0276 hangc0276 merged commit 281d72e into streamnative:master May 18, 2021
@BewareMyPower BewareMyPower deleted the bewaremypower/refactor-max-requests-limit branch May 18, 2021 07:13
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants