[Bug]: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded #38030

TonyAnn opened this issue Nov 26, 2024 · 5 comments


TonyAnn commented Nov 26, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.3.21
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): pulsar
- SDK version (e.g. pymilvus v2.0.0rc2):
- OS (Ubuntu or CentOS): CentOS
- CPU/Memory:
- GPU:
- Others:

Current Behavior

The client request timed out

rootcoord throws the following error:
[ERROR] [retry/retry.go:46] ["retry func failed"] ["retry time"=4] [error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded"] [stack="github.com/milvus-io/milvus/pkg/util/retry.Do\n\t/workspace/source/pkg/util/retry/retry.go:46\ngithub.com/milvus-io/milvus/pkg/mq/msgstream.(*mqMsgStream).AsProducer\n\t/workspace/source/pkg/mq/msgstream/mq_msgstream.go:144\ngithub.com/milvus-io/milvus/internal/rootcoord.newDmlChannels\n\t/workspace/source/internal/rootcoord/dml_channels.go:201\ngithub.com/milvus-io/milvus/internal/rootcoord.newTimeTickSync\n\t/workspace/source/internal/rootcoord/timeticksync.go:121\ngithub.com/milvus-io/milvus/internal/rootcoord.(*Core).initInternal\n\t/workspace/source/internal/rootcoord/root_coord.go:473\ngithub.com/milvus-io/milvus/internal/rootcoord.(*Core).Init.func1.1\n\t/workspace/source/internal/rootcoord/root_coord.go:530\nsync.(*Once).doSlow\n\t/usr/local/go/src/sync/once.go:74\nsync.(*Once).Do\n\t/usr/local/go/src/sync/once.go:65\ngithub.com/milvus-io/milvus/internal/rootcoord.(*Core).Init.func1\n\t/workspace/source/internal/rootcoord/root_coord.go:529\ngithub.com/milvus-io/milvus/internal/util/sessionutil.(*Session).ProcessActiveStandBy\n\t/workspace/source/internal/util/sessionutil/session_util.go:1103\ngithub.com/milvus-io/milvus/internal/rootcoord.(*Core).Register.func2\n\t/workspace/source/internal/rootcoord/root_coord.go:283"]
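
For reference, the backlog state can also be checked from the Pulsar side with pulsar-admin. This is only a sketch: it assumes the default public/default namespace that Milvus uses for its topics and that pulsar-admin is reachable on a broker pod; the topic name is a placeholder for whichever pchannel is stale.

# Backlog quota configured on the namespace
pulsar-admin namespaces get-backlog-quotas public/default

# Per-subscription backlog for one of the DML topics (placeholder topic name)
pulsar-admin topics stats persistent://public/default/by-dev-rootcoord-dml_10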

At the same time, some collection checkpoints have not been updated in time.
Listing the latest checkpoint of all physical channels:
pchannel: by-dev-rootcoord-dml_0, the lastest checkpoint ts: 2024-11-25 15:49:49.874 +0800 CST
pchannel: by-dev-rootcoord-dml_10, the lastest checkpoint ts: 2024-11-13 20:18:21.874 +0800 CST
pchannel: by-dev-rootcoord-dml_14, the lastest checkpoint ts: 2024-11-21 22:52:58.473 +0800 CST
pchannel: by-dev-rootcoord-dml_1, the lastest checkpoint ts: 2024-11-26 15:35:59.111 +0800 CST
pchannel: by-dev-rootcoord-dml_11, the lastest checkpoint ts: 2024-11-26 15:35:44.211 +0800 CST
pchannel: by-dev-rootcoord-dml_13, the lastest checkpoint ts: 2024-11-21 23:40:58.474 +0800 CST
pchannel: by-dev-rootcoord-dml_4, the lastest checkpoint ts: 2024-11-25 14:51:15.273 +0800 CST
pchannel: by-dev-rootcoord-dml_2, the lastest checkpoint ts: 2024-11-22 13:59:01.873 +0800 CST
pchannel: by-dev-rootcoord-dml_7, the lastest checkpoint ts: 2024-11-23 15:09:22.274 +0800 CST
pchannel: by-dev-rootcoord-dml_9, the lastest checkpoint ts: 2024-11-23 15:10:02.673 +0800 CST
pchannel: by-dev-rootcoord-dml_15, the lastest checkpoint ts: 2024-11-25 14:57:53.673 +0800 CST
pchannel: by-dev-rootcoord-dml_8, the lastest checkpoint ts: 2024-11-19 14:16:38.274 +0800 CST
pchannel: by-dev-rootcoord-dml_12, the lastest checkpoint ts: 2024-11-21 16:17:38.074 +0800 CST
pchannel: by-dev-rootcoord-dml_6, the lastest checkpoint ts: 2024-11-26 15:45:44.117 +0800 CST
vchannel: doesn't exists in collection: 453281916976337927

Expected Behavior

Please help me locate the root cause and explain how to recover when the checkpoints are not being updated.

Steps To Reproduce

No response

Milvus Log

milvus-log.tar.gz

Anything else?

No response

TonyAnn added the kind/bug and needs-triage labels on Nov 26, 2024
xiaofan-luan (Collaborator) commented:

@TonyAnn

xiaofan-luan (Collaborator) commented:

Try using the pulsarctl tool to find out who the subscriber of this topic is; removing the Pulsar topic should resolve this problem.
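
A rough sketch of that procedure using pulsar-admin (pulsarctl exposes equivalent subcommands); the topic and subscription names are placeholders, and skip-all discards the backlogged messages for that subscription, so confirm the subscription really is stale first:

# List the subscriptions on the backlogged topic (placeholder topic name)
pulsar-admin topics subscriptions persistent://public/default/by-dev-rootcoord-dml_10

# See which subscription is holding the backlog
pulsar-admin topics stats persistent://public/default/by-dev-rootcoord-dml_10

# For a subscription confirmed to be stale, drop its backlog (destructive for that subscription)
pulsar-admin topics skip-all persistent://public/default/by-dev-rootcoord-dml_10 -s <stale-subscription>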

yanliang567 (Contributor) commented:

/assign @TonyAnn

yanliang567 added the triage/needs-information label and removed the needs-triage label on Nov 26, 2024
TonyAnn (Author) commented Nov 27, 2024

Try using the pulsarctl tool to find out who the subscriber of this topic is; removing the Pulsar topic should resolve this problem.

@xiaofan-luan I have a question: if I manually clean up the Pulsar topic, will it cause data loss?
My understanding is that the current error is due to a backlog caused by the datanode not consuming in time. Is that correct?
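
One way to check whether the backlog really comes from a subscription that has stopped consuming (a sketch; it assumes jq is available and relies on the msgBacklog and consumers fields of the topic stats output, with a placeholder topic name):

# Backlog size and live consumer count per subscription; a large backlog with zero
# consumers usually points at a subscriber that no longer exists.
pulsar-admin topics stats persistent://public/default/by-dev-rootcoord-dml_10 \
  | jq '.subscriptions | to_entries[] | {subscription: .key, msgBacklog: .value.msgBacklog, consumers: (.value.consumers | length)}'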

yanliang567 (Contributor) commented:

@LoveEachDay any comments?

/assign @LoveEachDay
/unassign
