Timeout execution requirement doesn't work on starlark rules and genrule #12349
Comments
Local execution already enforces action timeouts where specified. However, there is no default timeout. You can add a tag "timeout:" to enforce a timeout for local execution (I think). That might be enough to cover the use case you have. It would also not be too difficult to add a flag to Bazel to set a default timeout for all locally executed actions. I think you're proposing to add a feature to
I see the I couldn't find any documentation on it, am I using it wrong? I'll change the issue in the meantime.
@jin would you mind double checking the tags on this issue? Now that it's about making the
@ishikhman do you know if there's a known issue with the interaction between
That is odd. Do you have a simple reproduction that we could look at?
Can you show a bit more of what you're doing that's not working? A rule definition or a build log? I just ran across the timeout in connection with multiplex workers, and it certainly looks like timeouts are not treated right in either worker - the timeout is set on the process, but for workers the process lives forever, and nobody seems to call the methods that check the timeout under normal circumstances.
Please reply, if you have a repro case and we can reopen this issue.
Here's a minimal reproduction under remote execution (bazel client v3.0.0):
The action succeeds:
Of course, this would happen if our remote execution implementation was faulty. However, there's simply no timeout field in the
Full GRPC log: https://gist.github.com/sushain97/366d9efa18a529fb2c62cb18117d1ce0 A naive inspection of the relevant code doesn't reveal an obvious bug:
However, there are many layers of abstraction here, so I'm probably missing something. Relatedly, I wasn't able to find any documentation for build action
Grooming the backlog, is "more data needed" still accurate here?
I can also add more information if needed.
Remote execution doesn't support timeout execution requirement yet. I have the entry to revisit timeout behavior for remote executor on my TODO list and will address this during that effort.
It doesn't work on local execution either though, that's what this issue is about.
Can you please share a repro case with local execution?
@coeuvre @ulfjack I am running into this problem right now where my users have a buggy / flaky It seems like in this issue, the solution is deviating toward using Are you guys ok with changing this into a Here is a simple reproducible
We need this flag too. We've seen some actions take half an hour on a CI machine while taking less than half a minute on a local machine. We are still not sure what's going on in those CI machines, but it would really help if we could time out a local action so that we can restart the build early, possibly on a different machine.
@coeuvre, what's the status of getting support for custom action timeouts in remote action execution requirements?
Hey from 2025! Still lacking a feature to set a timeout for build actions.
#23961 will fix this, but some review comments need to be addressed before it can be merged.
Description of the problem / feature request:
Support timeout execution requirement on starlark rules and genrules as a way to terminate stuck / hanging actions, so that their stderr / stdout can be printed out to aid debugging.
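For concreteness, the setup this request refers to might look like the sketch below. The file name `hang.bzl`, the rule name `hang`, and the target names are hypothetical and chosen only for illustration; the `execution_requirements` value and the `timeout:5` tag are the ones quoted in this issue.

```starlark
# --- hang.bzl (hypothetical file name) ---
def _hang_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".out")
    ctx.actions.run_shell(
        outputs = [out],
        # Loops forever, so the declared output is never produced.
        command = "echo 'about to hang'; while true; do sleep 1; done",
        # The execution requirement that this issue reports as having no effect.
        execution_requirements = {"timeout": "5"},
    )
    return [DefaultInfo(files = depset([out]))]

hang = rule(implementation = _hang_impl)

# --- BUILD (same package as hang.bzl) ---
load(":hang.bzl", "hang")

hang(
    name = "hang_rule",
    tags = ["timeout:5"],
)

genrule(
    name = "hang_genrule",
    outs = ["hang_genrule.out"],
    # Also loops forever.
    cmd = "while true; do sleep 1; done",
    tags = ["timeout:5"],
)
```

Building both targets with --experimental_allow_tags_propagation is the scenario in which, according to this report, neither action is terminated after the requested 5 seconds.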
Feature requests: what underlying problem are you trying to solve with this feature?
Without such a feature, when an action hangs it can be hard to see the output it has printed so far, especially on non-interactive systems like CI. Another option might be to stream the output of the actions as requested in #7213, but that would require running actions sequentially, so it could only be used when reproducing the issue.
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Create a starlark rule and a genrule that never terminate, set execution_requirements = {"timeout": "5"} on the starlark rule, and tags = ["timeout:5"] on both, then run the build with --experimental_allow_tags_propagation, note that neither rule is terminated.
What operating system are you running Bazel on?
Both macOS and Linux
What's the output of bazel info release?
Have you found anything relevant by searching the web?
#7766
#8830