Parent job does not execute when child job fails #800
Comments
Currently by design, until all child jobs have been completed the parent job will not be processed. |
@manast Gotcha thanks, so then the only way would be to catch errors and manually mark child jobs as completed or is there an option for this ? I assumed that |
@manast to clarify what I am trying to achieve is:
|
Regarding the two last points:
This would imply new functionality, namely that a parent job should be "processed" OR "failed" depending on different criteria. For example, it could be the case that we want the parent to fail as soon as one child fails, OR, as you wrote, to complete once all children have either failed or completed, etc. |
This affects us as well. Is there a possibility to see a configurable outcome in the near future? |
@Slind14 what is your use case? |
We would like to keep a history of recently failed jobs within Redis. Otherwise |
We have the same issue. We have one top-level job with multiple sub-components. If any of the components fail, we would like for the parent job to be marked as failed so that we can log the errors into a database based on the status of the top-level job |
@bilalshaikh42 the problem here is that, for example, you could have retries set up on the children, so you probably do not want to fail the parent until all the retries have been exhausted. The other common case is that a child job fails, you fix the reason for the failure, and then you want to retry the job manually. If the job finally completes, we should complete the parent. |
@roggervalf you could link to this issue regarding the new feature to support failing parents. |
Sure, in this case I would expect all the retries to be exhausted before the child job or the parent job is marked as failed. Similarly, even though one child job fails, I expect the other child jobs would still be attempted/pass as long as they do not also depend on that child job. I had not considered the case you mention about manual intervention into a failed job to get the parent job to fail. To support that, I guess there would need to be some status for the parent other than failed, such as "failed-children", just as there is "waiting" and "waiting-children". This way, if a child job is retried, you could re-evaluate or rerun the parent. In the way things are set up now, given just the parent job, is there a way to know when the child jobs have failed without iterating through them all? The status of the parent job will just be stuck on "waiting-children", so our code was initially waiting forever instead of realizing that a child job had failed. |
@bilalshaikh42 we are trying to iron out different edge cases, so the case when child jobs fail will be improved. But we need to be very careful as any new change usually implies lots of new edge cases. |
@manast yup, makes sense! Would be happy to help with feedback and/or testing out ideas. In the meantime, do you have any recommendations on the best way to check for a parent job whose dependencies have failed? |
@roggervalf do we have anything new to update here after the last updates to flows? |
This is in my pending list, this week I can work on this feature 👀 |
Hi, |
+1 on this, with some notes:
|
Hi, I stumbled upon this issue as well. I have failing children jobs, and when all execution attempts on children jobs have been made (and failed), it should return execution to the parent job. Perhaps adding With the current design, one has to write children workers to be bulletproof so they never fail, which I don't think is feasible. |
Hi, this is affecting us as well. I agree with the past comments; I think it's unrealistic to expect that child jobs should never fail. Is this issue still being worked on? Thanks! |
Hi @Leobaillard, we currently have the failParentOnFailure option (https://docs.bullmq.io/guide/flows/fail-parent). Could it address your case? |
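For readers following along, here is a rough sketch of where the failParentOnFailure option mentioned above would sit in a flow definition. The queue and job names are made up for illustration, and the `FlowNode` type is a local stand-in declared only so the snippet compiles without the library; the real types come from BullMQ itself, so treat this as an approximation of the shape described in the linked guide rather than the definitive API.

```typescript
// Local stand-in types mirroring the shape of a flow definition.
// They are an assumption made so this sketch compiles on its own.
interface ChildOpts {
  failParentOnFailure?: boolean;
}

interface FlowNode {
  name: string;
  queueName: string;
  data?: Record<string, unknown>;
  opts?: ChildOpts;
  children?: FlowNode[];
}

// Hypothetical flow: one parent with a critical and a non-critical child.
const reportFlow: FlowNode = {
  name: "build-report",
  queueName: "reports",
  children: [
    // If this child fails, the parent is expected to be failed as well.
    { name: "fetch-critical", queueName: "steps", opts: { failParentOnFailure: true } },
    // No option set: a failure here does not fail the parent.
    { name: "fetch-optional", queueName: "steps" },
  ],
};

// With BullMQ installed, the object would be handed to a flow producer:
//   import { FlowProducer } from "bullmq";
//   await new FlowProducer({ connection }).add(reportFlow);

// Small helper to list which direct children are marked as critical.
function criticalChildren(flow: FlowNode): string[] {
  return (flow.children ?? [])
    .filter((child) => child.opts?.failParentOnFailure === true)
    .map((child) => child.name);
}
```

Because the option is set per child, some children can be marked as critical while others are not.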
Hi! Thanks for your quick answer! It does, in part. There are still use cases where it would be nice to allow child jobs to fail if they are not critical to the success (or partial success) of the parent. Users would then be expected to handle this "partial" state in their business logic. This is useful to keep failed child job history while retaining the possibility to have the parent job executed. Some sort of job report can then be generated by the app, listing the successful and failed child tasks. |
I'm in the same boat as everyone else. It seems odd that the only two options right now for when a child fails are:
I wish we had, either as the default or a different option, something like Adding this option to children would allow us to let some children actually fail the parent, while letting others not affect the parent. But doing nothing—not even appearing in the queue—just seems strange and not an obvious result. |
Hey @theDanielJLewis, we have a PR for it: #1953. You can also take a look at that one; @manast and I are evaluating this new feature. |
Awesome, @roggervalf! Only a week old, so I can have hope it will be merged soon. Thanks! |
This is also causing my project a problem now, thought I'd worked out how to run a tree of jobs and fetch the states afterwards to display what passed and failed, but having to detach failed children with |
I think this issue is related to #2092 |
I assume there's an underlying architectural problem, because it's not entirely clear why we need to disconnect a child and then find it again as per #2092, instead of the parent having a flag to continue on child failures and us using some API to check children manually and decide what state the parent should end up in. We're going to end up doing that last leg regardless, but it's a bit convoluted. Anyway, the workaround for now seems to be:
Arguably that's actually quite a good way to go about it regardless as it's more obvious than an incantation of several flags, and "completed" doesn't have to mean "succeeded" |
The workaround you suggest is precisely what we do. Then we have some helper utils to check for the special "succeeded but actually failed" return value, and allow the parent to continue or fail based on things like the percentage of child jobs that truly succeeded or failed. |
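A minimal sketch of what such helper utils might look like, assuming the child processor catches its own errors and returns a tagged result instead of throwing. The names `ChildResult`, `safeChild`, `successRatio` and `parentShouldFail`, as well as the 0.8 threshold, are invented for this illustration and are not part of BullMQ.

```typescript
// Tagged result a child returns instead of throwing, so the job still "completes".
type ChildResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

// Wraps the real child work so any thrown error becomes a "failed" result.
function safeChild<D, T>(
  work: (data: D) => Promise<T>,
): (data: D) => Promise<ChildResult<T>> {
  return async (data: D): Promise<ChildResult<T>> => {
    try {
      return { ok: true, value: await work(data) };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { ok: false, error: message };
    }
  };
}

// Fraction of children that truly succeeded (1 when there are no children).
function successRatio(values: Record<string, ChildResult<unknown>>): number {
  const results = Object.values(values);
  if (results.length === 0) {
    return 1;
  }
  return results.filter((result) => result.ok).length / results.length;
}

// Decision the parent processor can apply to its children's return values.
// The default threshold is an arbitrary example value.
function parentShouldFail(
  values: Record<string, ChildResult<unknown>>,
  minRatio = 0.8,
): boolean {
  return successRatio(values) < minRatio;
}

// In a parent processor, the map would come from the job itself, e.g.:
//   const values = await job.getChildrenValues();
//   if (parentShouldFail(values)) throw new Error("too many children failed");
```

The trade-off noted in the thread still applies: the children show up as completed in the queue, so the real outcome lives only in the return value.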
Ran into the same issue here. It feels like the parent should have control over whether or not it fails, based on its children. This is a little more intuitive. If we funneled all failures/completions etc. to the parent to process, it would really make things easier, and the processor would actually be hit (and leave waiting-children depending on your action). I ended up returning failed data from a try/catch in the child processor and it works fine (forcing a completion), although it is a little misleading, since failed children do not show up in the queue. ignoreDependencyOnFailure didn't appear to do anything after testing. It would be nice to see this feature allow processing of the parent no matter the failures/completions etc., where the parent would decide its fate and take action where needed. |
Hi @sick-sw, it sounds like ignoreDependencyOnFailure is in fact what you are looking for (https://docs.bullmq.io/guide/flows/ignore-dependency). To get the children's failures, use the getFailedChildrenValues method if you want to apply some logic to them. |
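As a rough illustration of that suggestion, the sketch below builds a small report in the parent from the completed and failed children. `ParentJobLike` is a local stand-in for the two job methods used here (getChildrenValues and the getFailedChildrenValues method named above), and the return shapes are assumed to be maps keyed by child job key; `FlowReport` and `buildReport` are invented names. Check the linked guide for the exact behaviour in your BullMQ version.

```typescript
// Local stand-in for the parts of a job this sketch relies on (an assumption,
// declared here so the snippet compiles without the library).
interface ParentJobLike {
  getChildrenValues(): Promise<Record<string, unknown>>;
  getFailedChildrenValues(): Promise<Record<string, string>>;
}

interface FlowReport {
  succeeded: string[];
  failed: { key: string; reason: string }[];
}

// Pure helper: turns the two maps into a sorted report.
function buildReport(
  completed: Record<string, unknown>,
  failed: Record<string, string>,
): FlowReport {
  return {
    succeeded: Object.keys(completed).sort(),
    failed: Object.entries(failed)
      .sort(([a], [b]) => a.localeCompare(b))
      .map(([key, reason]) => ({ key, reason })),
  };
}

// Parent processor sketch: runs once the children have finished, where the
// children were added with opts: { ignoreDependencyOnFailure: true }.
async function parentProcessor(job: ParentJobLike): Promise<FlowReport> {
  const [completed, failed] = await Promise.all([
    job.getChildrenValues(),
    job.getFailedChildrenValues(),
  ]);
  return buildReport(completed, failed);
}
```

The parent can then decide for itself whether the report counts as a success, a partial success, or a failure.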
I have a FlowProducer that runs a parent job after a set of children jobs have been executed. There are about 10k children that are being run concurrently. I've set the option removeOnFail to true on both the children and the parents, but it seems that if a child fails, the execution of the parent just hangs. Could this be a bug or is it a configuration issue on my side?
Here is what the code that creates the job looks like:
and the worker that runs the children jobs looks like this: