Node fails to perform data write on replica shards while upgrading to OS 2.12 #4085
Comments
@cwperks Tagging you in case you can look into this further: opensearch-project/OpenSearch#11491
This issue is in performance-analyzer, transferring to that repo.
When the second node is being rebooted, could there be an in-flight transport request that gets resumed when the second node is brought back up as a 2.12 node? I'd have to dig into it further, but this line determines what serialization to use when sending a transport request. It could be that the first node, which was replaced with a 2.12 node, is in the middle of sending a request to the second 1.3 node, and that node is rebooted before it replies to the 2.12 node. When the node comes back online, could the transport request be getting replayed?
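To make the hypothesis above concrete, here is a toy model of version-gated serialization. It is only a sketch, not the security plugin's code: the names `pick_format` and `is_mismatch`, the format labels, and the `CUSTOM_FORMAT_MIN` cutoff are all hypothetical and chosen for illustration. It shows how a sender that still believes the peer is a 1.3 node could pick a different format than the one the rebooted 2.12 node expects.

```python
from typing import Tuple

Version = Tuple[int, int, int]

# Hypothetical cutoff (an assumption, not taken from the plugin source):
# the first version assumed to use the newer serialization format.
CUSTOM_FORMAT_MIN: Version = (2, 11, 0)


def pick_format(remote_version: Version) -> str:
    """Choose a serialization format from the version the sender believes the peer runs."""
    return "custom" if remote_version >= CUSTOM_FORMAT_MIN else "jdk"


def is_mismatch(version_seen_by_sender: Version, actual_receiver_version: Version) -> bool:
    """True when the sender's stale view of the peer leads to a different format."""
    return pick_format(version_seen_by_sender) != pick_format(actual_receiver_version)


# The sender cached the peer as 1.3.9, but the peer came back as 2.12.0.
stale_view = (1, 3, 9)
after_reboot = (2, 12, 0)
print(is_mismatch(stale_view, after_reboot))
```

If something like this is happening, the mismatch would only appear for requests prepared before the reboot and delivered after it, which fits the in-flight request theory.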
@peternied We don't have PA enabled. This is happening without that plugin, and it seems to be a generic security module problem. We should move it back to the security repo. Thanks.
@opensearch-project/admin Please transfer this issue to the security repo.
[Triage] Hi @Dhruvan1217, thanks for filing this issue and providing detailed reproduction steps. Someone will take a look and see what the problem is and note steps forward.
Also, AFAIU there were no issues in OpenSearch 2.10, so maybe we can take a look at what was changed after that (I believe the introduction of custom serialization/deserialization).
What is the bug?
While upgrading the OpenSearch cluster from 1.3.9 to 2.12, the shards fail to recover. Data cannot be replicated to the replicas, with the following exception in the logs.
NOTE: This issue was also showing up in 2.11.x and was expected to be fixed in 2.12.0, Reference: opensearch-project/OpenSearch#11491
How can one reproduce the bug?
Steps to reproduce the behavior:
[...] primaries [...]
[...] all [...]) before restarting the following nodes.
[...] primaries again and reboot the second node.
[...] all, if a replica shard comes to this node for which the primary shard is on the first node upgraded, you will see errors in the logs when the first node tries to write/replicate the data to the replica shard.
What is the expected behavior?
The data write to shard replicas should be successful.
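For reference, the allocation toggling in the reproduction steps can be sketched as follows. This assumes the `primaries` and `all` values refer to the `cluster.routing.allocation.enable` cluster setting, and that the wait between restarts is a cluster health check for green status. Neither detail is confirmed by the steps as captured here, and the node names are placeholders. The sketch only builds the request plan and does not contact a cluster.

```python
import json

# Assumption: the `primaries` / `all` values in the steps are values of this setting.
SETTING = "cluster.routing.allocation.enable"
VALID_MODES = {"all", "primaries", "new_primaries", "none"}


def allocation_body(mode: str) -> str:
    """Return the JSON body for PUT _cluster/settings that sets the allocation mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported allocation mode: {mode}")
    return json.dumps({"persistent": {SETTING: mode}})


def rolling_upgrade_plan(nodes):
    """Yield (action, detail) pairs for restarting each node in turn."""
    for node in nodes:
        # Restrict allocation to primaries before taking the node down.
        yield ("PUT _cluster/settings", allocation_body("primaries"))
        yield ("restart", node)
        # Re-enable full allocation so replicas can be assigned again.
        yield ("PUT _cluster/settings", allocation_body("all"))
        # Assumed wait condition before moving on to the next node.
        yield ("wait", "GET _cluster/health?wait_for_status=green")


for action, detail in rolling_upgrade_plan(["node-1", "node-2"]):
    print(action, detail)
```

Under this reading, the failure would show up after allocation is set back to `all` for the second node, when a replica lands on it while its primary sits on the first upgraded node.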
What is your host/environment?