When a job is stopped and the workers take some time to exit, you get the following error:
curl -XPOST localhost:5678/ex/56dea3e4-820c-4508-a946-5704ff273675/_stop
{
"error": 500,
"message": "Could not stop job, error: Error communicating with node: localhost, Could not send msg: cluster:job:stop, data: {\"ex_id\":\"56dea3e4-820c-4508-a946-5704ff273675\",\"_msgID\":\"SJQofzdQb\"}"
}
This appears to be the result of a timeout, but I think there are several issues here:
The error message is misleading. If this is a timeout, the message should say so explicitly.
The default timeout is probably too short. What would be the impact of increasing it to something like 5m?
The job does appear to stop and all workers exit; however, it is not labeled as stopped and remains in its prior state. Immediately re-running _stop works and sets the state to stopped, even though the job no longer appears to be running.
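The behavior asked for in the points above (an explicit timeout error, a longer default timeout, and a status that only changes once the workers have really exited) can be sketched roughly as follows. This is a minimal Python model under stated assumptions, not Teraslice's actual code: `stop_job`, `StopTimeoutError`, and the dict-based job record are hypothetical, and each worker is modeled as a coroutine that returns when it exits.

```python
import asyncio


class StopTimeoutError(Exception):
    """Hypothetical error raised when workers do not exit within the stop timeout."""


async def stop_job(job: dict, workers: list, timeout_s: float = 300.0) -> None:
    """Wait for every worker to exit, then mark the job as stopped.

    Hypothetical model: `job` is a dict with 'ex_id' and 'status' keys, and
    each entry in `workers` is a coroutine function that returns once that
    worker has exited.
    """
    try:
        # One deadline covers all workers; 300s mirrors the 5 minutes discussed here.
        await asyncio.wait_for(
            asyncio.gather(*(worker() for worker in workers)), timeout_s
        )
    except asyncio.TimeoutError:
        # Name the timeout explicitly and leave the status unchanged,
        # because we cannot yet guarantee that every worker has exited.
        raise StopTimeoutError(
            f"timed out after {timeout_s}s waiting for workers of job "
            f"{job['ex_id']} to exit; status left as {job['status']!r}"
        ) from None
    job["status"] = "stopped"


async def _demo() -> None:
    async def quick_worker() -> None:
        await asyncio.sleep(0.01)  # simulates a worker that exits promptly

    async def slow_worker() -> None:
        await asyncio.sleep(1.0)  # simulates a worker that takes a while to exit

    job = {"ex_id": "example-ex-id", "status": "running"}
    try:
        await stop_job(job, [quick_worker, slow_worker], timeout_s=0.1)
    except StopTimeoutError as err:
        print(err)
    print(job["status"])  # still "running" because the timeout fired first


if __name__ == "__main__":
    asyncio.run(_demo())
```

Keeping the status unchanged on timeout preserves the guarantee that a job is only labeled stopped after its workers exit; the trade-off is that a slow but successful shutdown still needs a second _stop (or a longer timeout) to be recorded.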
This is mostly resolved and linked to #436. The error message was changed and the timeout is now 5 minutes. I left that code as is, since we need a guarantee that the job has actually stopped before marking it as stopped.
I've been seeing this issue consistently. On the first call to _stop, the slicer stops but the job stays running, and the response is "Request timed out (30s)". A second call to _stop returns successfully and stops the job.
My job includes a Kafka reader with wait:30s. Changing it to wait:10s and interval:10s had no impact; the request still timed out after 30s.
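Since the second _stop call has been observed to succeed, a client could work around this by retrying. The sketch below is a hypothetical Python helper, not part of Teraslice: only the host, port, and `POST /ex/<ex_id>/_stop` path come from the curl command in this issue, while `send_stop`, `stop_with_retry`, the 60s client timeout, and the default of two attempts are assumptions. The `send` parameter lets the HTTP call be swapped out, for example in tests.

```python
import urllib.request
from typing import Callable, Optional

# Host, port, and path come from the curl command earlier in this issue.
BASE_URL = "http://localhost:5678"


def send_stop(ex_id: str, timeout_s: float = 60.0) -> int:
    """POST to /ex/<ex_id>/_stop and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for error responses such as the
    500 shown above; HTTPError is a subclass of OSError.
    """
    req = urllib.request.Request(f"{BASE_URL}/ex/{ex_id}/_stop", method="POST")
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.status


def stop_with_retry(
    ex_id: str, send: Callable[[str], int] = send_stop, attempts: int = 2
) -> int:
    """Call `send` up to `attempts` times and return the first successful result."""
    last_err: Optional[OSError] = None
    for _ in range(attempts):
        try:
            return send(ex_id)
        except OSError as err:
            # URLError, HTTPError, and socket timeouts are all OSError subclasses.
            last_err = err
    raise RuntimeError(f"_stop for {ex_id} failed after {attempts} attempts") from last_err
```

For example, `stop_with_retry("56dea3e4-820c-4508-a946-5704ff273675")` sends at most two stop requests. This only papers over the symptom; the server-side state handling discussed above is the real fix.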