This repository has been archived by the owner on Aug 2, 2022. It is now read-only.
Describe the bug
There is a known bug in force merge that is currently being worked on.
The ForceMerge action's call to the force merge API does not return an acknowledgement; instead, it holds the connection open until a rejection is thrown or the force merge completes.
Because force merges can take hours on large indices, ISM ends up in an inconsistent state for the index: after an hour the lock on the job expires and the job executes again.
This second execution sees the StepStatus.STARTING left by the first, still-running execution, so it disables the job and moves it into a StepStatus.FAILED state.
The index shows this state until the first execution completes, if it ever does. That execution then updates the status to StepStatus.COMPLETED and the info to "Starting force merge" while the job is still disabled, which is confusing.
Unfortunately, the force merge API has no way to gracefully close the connection once it has confirmed that the force merge task is queued or ongoing. Our current idea for a workaround is to let the force merge connection stay open for up to 5 minutes; if it has neither thrown a rejection nor completed in that time, we assume it is safe to move forward to the WaitFor step. The original call to the force merge API will continue and log its response in a separate coroutine.
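To make the proposed workaround concrete, here is a rough sketch of the idea. It is written in Python with asyncio for illustration only and is not the plugin's actual code. `attempt_force_merge`, `force_merge_call`, `timeout_seconds`, and `log` are hypothetical names, and the `StepStatus` enum is a stand-in that only mirrors the status names mentioned above.

```python
import asyncio
from enum import Enum


class StepStatus(Enum):
    # Stand-in for the statuses named in this issue (assumption, not the real class).
    STARTING = "starting"
    COMPLETED = "completed"
    FAILED = "failed"


async def attempt_force_merge(force_merge_call, timeout_seconds=300.0, log=print):
    """Start the force merge and wait at most `timeout_seconds` for an outcome.

    `force_merge_call` is a hypothetical zero-argument coroutine function that
    holds the connection open until the merge completes or is rejected.
    """
    task = asyncio.create_task(force_merge_call())

    def log_outcome(finished):
        # Runs whenever the original call ends, even long after the step moved on.
        if finished.cancelled():
            log("force merge call was cancelled")
        elif finished.exception() is not None:
            log(f"force merge failed: {finished.exception()!r}")
        else:
            log(f"force merge response: {finished.result()!r}")

    task.add_done_callback(log_outcome)

    # asyncio.wait does not cancel the task when the timeout elapses,
    # so the original call keeps running in the background.
    done, _pending = await asyncio.wait({task}, timeout=timeout_seconds)

    if task in done and (task.cancelled() or task.exception() is not None):
        # A rejection (or cancellation) arrived within the window: fail the step.
        return StepStatus.FAILED, task

    # Either the merge completed, or nothing was rejected within the window;
    # in both cases the caller can move on to the WaitFor step.
    return StepStatus.COMPLETED, task
```

The point of the sketch is that the timeout bounds how long the step waits, not how long the force merge itself runs, so the step would finish well before the one-hour job lock expires.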