Do not fail retry of scale up operation (#511)
A scale-up operation can sometimes take a long time, depending on the amount of data in each partition. By default, the job is locked for 5 minutes and retried afterwards. If the scale-up operation did not complete within those 5 minutes, the retry failed because an operation was already in progress. This resulted in an incident after 3 retries. A manual retry to resolve the incident also failed, because the command fails when the clusterSize is already the requested one.

This PR fixes this by:
1. Not failing the scale-up command if the clusterSize is already the requested one.
2. Not waiting for the operation to complete in the scale-up command. Instead, two commands are run separately: scale and wait. This way only `wait` has to be retried. Since it is a query, it can be safely retried.
3. Allowing `wait` to run without a changeId, so that it can be used in chaos experiments and e2e tests. When no changeId is specified, it reads the changeId from the `pendingChange` or `lastChange`.
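The split between an idempotent scale command and a retryable wait query can be illustrated with a minimal Python sketch. This is not the code from this PR: `FakeCluster`, `Change`, `scale`, and `is_change_completed` are hypothetical names, the cluster is an in-memory stand-in, and the sketch assumes the cluster size is updated as soon as the change is requested.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    change_id: int
    status: str  # "IN_PROGRESS" or "COMPLETED"


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for the cluster's topology state."""
    cluster_size: int
    pending_change: Optional[Change] = None
    last_change: Optional[Change] = None
    next_id: int = 1

    def request_scale(self, target: int) -> Change:
        # Assumption: the size is recorded at request time,
        # while the change itself is still in progress.
        change = Change(self.next_id, "IN_PROGRESS")
        self.next_id += 1
        self.cluster_size = target
        self.pending_change = change
        return change

    def complete_pending(self) -> None:
        # Simulates the operation finishing in the background.
        if self.pending_change is not None:
            self.pending_change.status = "COMPLETED"
            self.last_change = self.pending_change
            self.pending_change = None


def scale(cluster: FakeCluster, target: int) -> Optional[int]:
    """Start scaling and return immediately, without waiting.

    Returns the new change id, or None when the cluster already has the
    requested size, so a retried command succeeds instead of failing.
    """
    if cluster.cluster_size == target:
        return None
    return cluster.request_scale(target).change_id


def is_change_completed(cluster: FakeCluster,
                        change_id: Optional[int] = None) -> bool:
    """Read-only query, so it is safe to call repeatedly from a retry loop."""
    if change_id is None:
        # No id given: fall back to the pending change, then the last one.
        current = cluster.pending_change or cluster.last_change
        if current is None:
            return True  # assumption: nothing to wait for
        change_id = current.change_id
    pending = cluster.pending_change
    if pending is not None and pending.change_id == change_id:
        return False
    last = cluster.last_change
    return (last is not None
            and last.change_id == change_id
            and last.status == "COMPLETED")
```

In this sketch a worker would call `scale` once and then poll `is_change_completed` on every retry. Repeating either call has no side effect beyond the first request, which is what makes the retry safe.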