All storage instances crashed after 7 hours of pressure test #3373
Comments
Did graph and meta crash?
No, they are running well.
There are thread-unsafe operations (see the sketch below), but I think they should not be the cause of the crash: threadLocalInfo.localCache_[spaceId] = infoDeepCopy; // infoDeepCopy is a shared_ptr
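A minimal sketch of the kind of race that assignment implies, with hypothetical stand-in types (SpaceInfo, ThreadLocalInfo, and the loop bounds are made up for illustration, not the real MetaClient layout): writing into an unordered_map from one thread while another thread reads it is undefined behavior, even though the shared_ptr control block itself is thread-safe.

```cpp
#include <memory>
#include <thread>
#include <unordered_map>

// Hypothetical stand-ins; not the real MetaClient internals.
struct SpaceInfo {};

struct ThreadLocalInfo {
  std::unordered_map<int, std::shared_ptr<SpaceInfo>> localCache_;
};

int main() {
  ThreadLocalInfo threadLocalInfo;
  auto infoDeepCopy = std::make_shared<SpaceInfo>();

  // Writer: each insertion may rehash and mutate the map's buckets.
  std::thread writer([&] {
    for (int spaceId = 0; spaceId < 100000; ++spaceId) {
      threadLocalInfo.localCache_[spaceId] = infoDeepCopy;  // unsynchronized write
    }
  });

  // Reader: a concurrent find() on the same map is a data race (UB);
  // the atomic refcount inside shared_ptr does not protect the container.
  std::thread reader([&] {
    for (int spaceId = 0; spaceId < 100000; ++spaceId) {
      auto it = threadLocalInfo.localCache_.find(spaceId);  // unsynchronized read
      (void)it;
    }
  });

  writer.join();
  reader.join();
  return 0;
}
```

Running a pattern like this under ThreadSanitizer (-fsanitize=thread) flags the race immediately, but a race of this kind would typically corrupt one map rather than bring down all three storage instances at once, which matches the view that it is probably not the root cause.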
Perhaps related to #3192: there is a hidden bug in MetaClient. Frequent leader changes would cause the meta version to be updated, and the meta client would then pull data from the meta server as a consequence.
I found a similar bug in the folly repo: facebook/folly#1252.
The current guess is that a NULL pointer returned under OOM is not checked, so memory near address 0x00 gets modified, and the program then crashes when destructors run at stop.
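A minimal sketch of the failure pattern that guess describes, using a made-up Record type and a raw malloc for illustration (the actual allocation site in storaged is not identified here): if an allocation fails under memory pressure and the NULL result is used unchecked, the subsequent member stores land at small offsets from 0x00; if such an address happens to be mapped, the write corrupts memory silently and the damage only surfaces later, e.g. when destructors run at shutdown.

```cpp
#include <cstdlib>

// Hypothetical record; the large leading member pushes the second field's
// address to roughly NULL + 1 MB when the base pointer is NULL.
struct Record {
  char buffer[1 << 20];
  long refCount;
};

int main() {
  // Under OOM, malloc can return NULL.
  Record* r = static_cast<Record*>(std::malloc(sizeof(Record)));

  // Missing check, e.g.:
  //   if (r == nullptr) { /* report OOM and bail out */ }
  //
  // With r == NULL, the store below targets a low address
  // (offsetof(Record, refCount) from 0x00). If that region happens to be
  // mapped, the write does not fault; it silently clobbers whatever lives
  // there, and the process only crashes later, e.g. in a destructor at stop.
  r->refCount = 1;

  std::free(r);
  return 0;
}
```

If this theory holds, auditing allocation sites for unchecked NULL returns while repeating the pressure test should surface the offending path.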
Please check the FAQ documentation before raising an issue
Describe the bug (required)
A Nebula cluster of 3 storaged + 1 graphd + 1 metad; we keep inserting edges and triggering leader changes. After running for about 7 hours, all storage instances crash almost at the same time, and they all have similar crash stacks:
Your Environments (required)
g++ --version
or clang++ --version
lscpu
Commit id (e.g. a3ffc7d8): c6d1046
How To Reproduce (required)
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
Additional context
Provide logs and configs, or any other context to trace the problem.