[BUG] Force merge fails when taking too long #266

dbbaughe · 2020-08-01T00:01:38Z

Describe the bug
There is a known bug in force merge that is currently being worked on.

The call to the force merge API from the ForceMerge action does not return an acknowledgement; it holds the connection open until a rejection is thrown or the force merge completes.

Since force merges can take hours for large indices, this puts ISM into an inconsistent state for the index: after an hour the lock on the job expires and the job executes again.

This second execution sees the StepStatus.STARTING left by the previous, still-ongoing execution, so it disables the job and moves it into a StepStatus.FAILED state.

The index shows this state until the first execution potentially completes. That execution then updates the status to StepStatus.COMPLETED and the info to "Starting force merge" while the job is still disabled, which is confusing.

Unfortunately, the force merge API does not have a way to gracefully close the connection after confirming that the force merge task is queued or ongoing. Our current idea for a workaround is to let the force merge connection stay open for up to 5 minutes. If it has neither thrown a rejection nor completed in those 5 minutes, we assume it is safe to move forward to the WaitFor step. The original call to the force merge API will continue and log the response in a separate coroutine.
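To make the idea concrete, here is a rough sketch of the timeout pattern in Python's asyncio. This is not the plugin code: `attempt_force_merge`, `force_merge_call`, the status strings, and the logger name are all made up for illustration, and the 300 second default just mirrors the 5 minutes mentioned above. The call is started as its own task, the caller waits on it for a bounded time, and on timeout the task is left running so that its response still gets logged.

```python
import asyncio
import logging

logger = logging.getLogger("force_merge_sketch")  # hypothetical logger name


async def attempt_force_merge(force_merge_call, timeout_seconds=300.0):
    """Start the call and wait at most timeout_seconds for an early result.

    Returns ("completed", task) if the call finished within the timeout and
    ("assumed_started", task) if it is still running when the timeout expires.
    An exception raised by the call within the timeout (for example a
    rejection) propagates to the caller.
    """
    task = asyncio.ensure_future(force_merge_call())

    def log_result(finished):
        # Runs whenever the original call ends, even long after the timeout.
        if finished.cancelled():
            logger.warning("force merge call was cancelled")
        elif finished.exception() is not None:
            logger.error("force merge failed: %s", finished.exception())
        else:
            logger.info("force merge response: %s", finished.result())

    task.add_done_callback(log_result)

    try:
        # shield() keeps the underlying task alive when wait_for() times out.
        await asyncio.wait_for(asyncio.shield(task), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return "assumed_started", task
    return "completed", task


async def _demo():
    async def slow_force_merge():
        # Stand-in for a long-running force merge request.
        await asyncio.sleep(0.2)
        return {"acknowledged": True}

    status, task = await attempt_force_merge(slow_force_merge, timeout_seconds=0.05)
    print(status)      # the caller would move on to the WaitFor step here
    print(await task)  # the original call still finishes in the background


if __name__ == "__main__":
    asyncio.run(_demo())
```

The key design choice is that the timeout only bounds how long the step waits, not how long the force merge itself may run. A rejection inside the window still fails the step, while anything slower is treated as "started" and left to the WaitFor step to confirm.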

dbbaughe · 2020-08-03T23:42:52Z

Fixed in #267

dbbaughe added the bug label Aug 1, 2020

dbbaughe self-assigned this Aug 1, 2020

dbbaughe mentioned this issue Aug 3, 2020

Fixes force merge failing on long executions, changes some action mes… #267

Merged

dbbaughe closed this as completed Aug 3, 2020
