[BUG] Incorrect awareness attribute count captured in decisionExplanationString in AwarenessAllocationDecider #3413
Labels
bug
Something isn't working
pending backport
Identifies an issue or PR that still needs to be backported
Describe the bug
In some scenarios, when the allocation decision from AwarenessAllocationDecider is NO, the explanation message reports an incorrect number of awareness attribute values when forced zone awareness is enabled.
For example, a domain with 3 availability zones and forced zone awareness set as
"cluster.routing.allocation.awareness.force.zone.values" = "a,b,c"
gives the following reason for the unallocated shard: there are too many copies of the shard allocated to nodes with attribute [zone], there are [3] total configured shard copies for this shard id and **[4] total attribute values**, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]
Although the number of awareness attribute values is 3, it is reported as 4.
To Reproduce
Steps to reproduce the behavior:
Create an unbalanced domain with 3 availability zones and a node distribution of (2, 1, 0), i.e. 2 data nodes in zone a, 1 data node in zone b, and no data nodes in zone c. This distribution can occur during zonal failures.
Set
cluster.routing.allocation.awareness.force.zone.values = a,b,c
Create an index with 1 shard and 2 replicas, i.e. 3 shard copies. 2 shard copies get assigned to nodes in zones a and b, but 1 shard copy stays unassigned due to forced zone awareness.
Add a data node in zone c and try to assign the unassigned shard copy to a node in zone a.
Shard allocation fails with the reason:
there are too many copies of the shard allocated to nodes with attribute [zone], there are [3] total configured shard copies for this shard id and [4] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1] by AwarenessAllocationDecider
I wrote a unit test that fails due to this bug.
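The reproduction steps above can be sketched as REST requests. This is only a sketch and not taken from the report: it assumes a cluster reachable at http://localhost:9200, data nodes started with a `zone` node attribute, and the hypothetical names `test-index` and `node-a-2` (the data node in zone a that does not yet hold a copy). It builds the requests without sending them.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class ReproRequests {
    // Assumed local endpoint; adjust for the cluster under test.
    static final String BASE = "http://localhost:9200";

    // Builds a JSON request; nothing is sent over the network here.
    static HttpRequest json(String method, String path, String body) {
        return HttpRequest.newBuilder(URI.create(BASE + path))
            .header("Content-Type", "application/json")
            .method(method, HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    // Step 2: enable awareness on "zone" and force the values a,b,c.
    static HttpRequest forceZoneAwareness() {
        return json("PUT", "/_cluster/settings",
            "{\"persistent\":{"
            + "\"cluster.routing.allocation.awareness.attributes\":\"zone\","
            + "\"cluster.routing.allocation.awareness.force.zone.values\":\"a,b,c\"}}");
    }

    // Step 3: 1 primary + 2 replicas = 3 shard copies ("test-index" is hypothetical).
    static HttpRequest createIndex() {
        return json("PUT", "/test-index",
            "{\"settings\":{\"number_of_shards\":1,\"number_of_replicas\":2}}");
    }

    // Step 4: after adding a node in zone c, try to place the unassigned
    // replica on a node in zone a ("node-a-2" is a hypothetical node name).
    static HttpRequest allocateReplicaInZoneA() {
        return json("POST", "/_cluster/reroute?explain=true",
            "{\"commands\":[{\"allocate_replica\":"
            + "{\"index\":\"test-index\",\"shard\":0,\"node\":\"node-a-2\"}}]}");
    }

    public static void main(String[] args) {
        for (HttpRequest r : new HttpRequest[] {
                forceZoneAwareness(), createIndex(), allocateReplicaInZoneA()}) {
            System.out.println(r.method() + " " + r.uri());
        }
    }
}
```

Each request could be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. The `explain=true` parameter on the reroute call should return the decider explanations, including the message quoted above.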
Expected behavior
The decision explanation should capture the correct number of attribute values (zones), which is 3.
It should be:
there are too many copies of the shard allocated to nodes with attribute [zone], there are [3] total configured shard copies for this shard id and [3] total attribute values, expected the allocated shard count per attribute [2] to be less than or equal to the upper bound of the required number of shards per attribute [1]
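The expected numbers can be checked with a small sketch. It is not the decider's actual code: the class and method names are hypothetical, and it assumes the attribute value count is the number of distinct zones across zones that have nodes and the forced values, and that the per-zone upper bound is the ceiling of shard copies divided by that count.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class AwarenessCountSketch {
    // Distinct zone values across zones with nodes and forced values (assumed rule).
    static int distinctZones(Set<String> zonesWithNodes, List<String> forcedValues) {
        Set<String> all = new TreeSet<>(zonesWithNodes);
        all.addAll(forcedValues);
        return all.size();
    }

    // Ceiling division: the most copies one zone may hold (assumed rule).
    static int maxCopiesPerZone(int shardCopies, int zoneCount) {
        return (shardCopies + zoneCount - 1) / zoneCount;
    }

    public static void main(String[] args) {
        // Scenario from the report once the node in zone c has joined.
        int zones = distinctZones(Set.of("a", "b", "c"), List.of("a", "b", "c"));
        int bound = maxCopiesPerZone(3, zones);
        System.out.println("zones=" + zones + " bound=" + bound); // zones=3 bound=1
    }
}
```

Under the same ceiling-division assumption, a count of 4 also gives an upper bound of 1 for 3 shard copies, which is consistent with the bound [1] appearing in both the reported and the expected message while only the attribute value count differs.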