This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

[BUG] Force merge fails when taking too long #266

Closed
dbbaughe opened this issue Aug 1, 2020 · 1 comment
Labels
bug Something isn't working

Comments

@dbbaughe
Contributor

dbbaughe commented Aug 1, 2020

Describe the bug
This is a known bug in the force merge action, and a fix is in progress.

The call to the force merge API from the ForceMerge action does not return an acknowledgement; it holds the connection open until either a rejection is thrown or the force merge completes.

Force merges can take hours on large indices, so ISM ends up in an inconsistent state for the index: after an hour the lock on the job expires and the job executes again.

This second execution sees the StepStatus.STARTING left by the first execution, which is still in progress, and then disables the job and moves it into a StepStatus.FAILED state.

The index shows this state until the first execution eventually completes (if it does), at which point it updates the status to StepStatus.COMPLETED and the info to "Starting force merge" while the job is still disabled, which is confusing.
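
To make the sequence above easier to follow, here is a toy timeline in Python. It is only a sketch of the behaviour described in this issue, not ISM code: the `simulate` function, the `Event` tuple, and the `LOCK_LEASE_MINUTES` constant are hypothetical names, and the statuses are plain strings standing in for the `StepStatus` values.

```python
from typing import List, Tuple

# The issue says the job lock expires after an hour.
LOCK_LEASE_MINUTES = 60

# (minute, step status, job enabled) -- hypothetical shape, for illustration only.
Event = Tuple[int, str, bool]


def simulate(merge_minutes: int, lease_minutes: int = LOCK_LEASE_MINUTES) -> List[Event]:
    """Replay the status changes described above for one force merge."""
    events: List[Event] = []
    job_enabled = True

    # First execution records STARTING, then blocks on the open connection.
    events.append((0, "STARTING", job_enabled))

    if merge_minutes > lease_minutes:
        # The lock expired while the call was still open, so a second
        # execution runs, sees STARTING, disables the job and marks FAILED.
        job_enabled = False
        events.append((lease_minutes, "FAILED", job_enabled))

    # The first execution's call finally returns and overwrites the status,
    # without re-enabling the job.
    events.append((merge_minutes, "COMPLETED", job_enabled))
    return events


# A three-hour merge outlives the one-hour lock.
for event in simulate(merge_minutes=180):
    print(event)
```

With a merge longer than the lease, the final event pairs a COMPLETED status with a disabled job, which is the confusing end state reported here.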

Unfortunately, the force merge API has no way to gracefully close the connection once it has confirmed that the force merge task is queued or ongoing. Our current idea for a workaround is to keep the force merge connection open for up to 5 minutes; if it has neither thrown a rejection nor completed in that time, we assume it is safe to move forward to the WaitFor step. The original call to the force merge API will continue, and its response will be logged in a separate coroutine.
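
The workaround can be sketched roughly as follows. This is an illustration in Python with asyncio, not the plugin's actual implementation: `start_force_merge`, `force_merge_call`, and `_log_outcome` are hypothetical names, and `force_merge_call` stands in for whatever issues the blocking force merge request.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable

log = logging.getLogger(__name__)


def _log_outcome(task: "asyncio.Future[Any]") -> None:
    """Log how the original force merge call ended, whenever that happens."""
    if task.cancelled():
        log.warning("force merge call was cancelled")
    elif task.exception() is not None:
        log.error("force merge call failed: %r", task.exception())
    else:
        log.info("force merge call finished: %r", task.result())


async def start_force_merge(
    force_merge_call: Callable[[], Awaitable[Any]],
    wait_seconds: float = 300.0,  # the 5 minute window from the issue
) -> str:
    """Wait briefly for a rejection; otherwise let the call run in the background."""
    task = asyncio.ensure_future(force_merge_call())

    # asyncio.wait returns at the timeout without cancelling the task.
    done, _pending = await asyncio.wait({task}, timeout=wait_seconds)

    if task in done:
        # Finished inside the window: result() re-raises a rejection, if any.
        task.result()
        return "completed"

    # No rejection yet: assume the merge is under way, keep the call alive,
    # and only log its eventual response.
    task.add_done_callback(_log_outcome)
    return "in_progress"
```

A caller would move on to the WaitFor step on either return value and treat a raised exception as a failed attempt. The background task only lives as long as the event loop does, so a real implementation would need to decide what happens to it on shutdown.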

@dbbaughe
Contributor Author

dbbaughe commented Aug 3, 2020

Fixed in:

#267

@dbbaughe dbbaughe closed this as completed Aug 3, 2020