Catch silent subproc_execute failures #2404

pryce-turner · 2024-05-09T23:21:49Z

Why are the changes needed?

Under some circumstances, subproc_execute will exit with 0 but have a non-empty stderr.

What changes were proposed in this pull request?

Adds an assert statement to handle 0 exit with stderr.

How was this patch tested?

Unit test to capture AssertionError and ensure the message contains the right content.

Setup process

make setup
make unit_test

Signed-off-by: pryce-turner <[email protected]>

codecov · 2024-05-09T23:30:38Z

Codecov Report

Attention: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Project coverage is 72.97%. Comparing base (4dd4d22) to head (9132cf3).
Report is 53 commits behind head on master.

❗ Current head 9132cf3 differs from pull request most recent head e2f41dd

Please upload reports for the commit e2f41dd to get more accurate results.

Files	Patch %	Lines
flytekit/extras/tasks/shell.py	0.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2404      +/-   ##
==========================================
- Coverage   75.81%   72.97%   -2.84%     
==========================================
  Files         181      215      +34     
  Lines       18327    19541    +1214     
  Branches     2580     3602    +1022     
==========================================
+ Hits        13895    14261     +366     
- Misses       3830     4660     +830     
- Partials      602      620      +18

eapolinario · 2024-05-10T18:07:14Z

tests/flytekit/unit/extras/tasks/test_shell.py

+    # This is a corner case I ran into that really shouldn't
+    # ever happen. The assert catches anything in stderr despite
+    # a 0 exit.
+    cmd = " ".join(["bcftools", "isec", "|", "gzip", "-c", ">", f"{tmp_path.joinpath('out')}"])


The exit code 0 only happens because the second command (i.e. gzip) returns 0. Bash implements a feature called PIPESTATUS, which captures the exit codes of all commands in a pipe. It's just an array that you can inspect after the pipe has run, for example:

bash-5.2$ inexistet command --flag | echo bla | gzip -c > out
bash: inexistet: command not found
bash-5.2$ echo "${PIPESTATUS[@]}"
127 0 0

Notice how the first command returns 127 and the other two return 0.
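The same behaviour can be reproduced from Python. The following is a minimal sketch, assuming a POSIX shell is available; `pipeline_returncode` is a hypothetical helper written for this illustration and is not part of flytekit.

```python
import subprocess


def pipeline_returncode(command: str) -> int:
    """Run `command` through the default shell and return the exit code Python sees."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.returncode


# The first stage fails, but the last stage (cat) succeeds,
# so the pipeline as a whole reports success.
failing_first = pipeline_returncode("false | cat")

# Only a failure in the last stage is visible in the return code.
failing_last = pipeline_returncode("true | false")
```

Only the exit status of the last stage reaches `returncode`, which is why the failing bcftools call in the test above goes unnoticed.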

TIL, thank you!

eapolinario · 2024-05-10T18:08:06Z

flytekit/extras/tasks/shell.py

@@ -80,6 +80,8 @@ def subproc_execute(command: typing.Union[List[str], str], **kwargs) -> ProcessR
        # Execute the command and capture stdout and stderr
        result = subprocess.run(command, **kwargs)

+        assert result.stderr == ""


Sometimes commands write debug information to stderr, so we should have a more resilient way of checking whether a subprocess ran to completion.
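To make the concern concrete, here is a small sketch using only the standard library. The shell command is an illustrative stand-in for a tool that logs to stderr and still succeeds; it is not taken from the PR.

```python
import subprocess

# Hypothetical stand-in for a tool that writes debug output to stderr
# and still finishes successfully.
result = subprocess.run(
    "echo 'debug: starting' >&2; echo done",
    shell=True,
    capture_output=True,
    text=True,
)

exit_code = result.returncode
stderr_text = result.stderr
stdout_text = result.stdout
```

Here the exit code is 0 and stderr is non-empty, so an assertion of the form `result.stderr == ""` would reject a run that actually succeeded.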

I wish they didn't, but you're right. Unfortunately, it seems subprocess.run doesn't capture PIPESTATUS and only reports the return code of the last command. Do you have any thoughts? The only thing I can think of would be to check stderr for "command not found" or "error", but that's pretty awkward.

Unfortunately I don't have a great solution.

PIPESTATUS is a bash feature, so this complicates things a little, since we use /bin/sh as the default shell in ShellTask and, depending on the OS, that may not be bash. On macOS, for example, /bin/sh resolves to bash:

❯ uname -a
Darwin gondor 23.3.0 Darwin Kernel Version 23.3.0: Wed Dec 20 21:31:00 PST 2023; root:xnu-10002.81.5~7/RELEASE_ARM64_T6020 arm64
❯ readlink /private/var/select/sh
/bin/bash
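For illustration, PIPESTATUS can be read from Python by invoking bash explicitly instead of relying on /bin/sh. This is only a sketch under stated assumptions: `pipe_statuses` is a hypothetical helper, it requires bash on the PATH, and it assumes the command's own last line of stdout can be overwritten by the status line it appends. It is not what flytekit does.

```python
import shutil
import subprocess
from typing import List, Optional

# Appended after the user's command; must run immediately after the pipeline
# so that PIPESTATUS still refers to it. The leading newline in the printf
# format keeps the status on its own line even if the command's output does
# not end with one.
_STATUS_LINE = r'''printf '\n%s\n' "${PIPESTATUS[*]}"'''


def pipe_statuses(command: str) -> Optional[List[int]]:
    """Return the exit code of every stage in `command`, or None if bash is missing."""
    bash = shutil.which("bash")
    if bash is None:
        return None  # PIPESTATUS is a bash feature; nothing to report without it
    script = command + "\n" + _STATUS_LINE
    result = subprocess.run([bash, "-c", script], capture_output=True, text=True)
    last_line = result.stdout.strip().splitlines()[-1]
    return [int(code) for code in last_line.split()]


statuses = pipe_statuses("false | cat")
```

The explicit dependency on bash is exactly the complication described above, since a task that runs under a different /bin/sh has no equivalent array to inspect.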

A low-lift solution is to write a warning to the flytekit logs saying that we detected a pipe in the command, which means we might not capture the outcome of every command in it. That's not a great solution either, but at least we let users know that they are in pipeland, which means they will have to handle it themselves (maybe mention PIPESTATUS?). wdyt?

Yeah, I think that's the best solution given the current constraints. It is a bit of a corner case and I don't think we should be expected to handle every failure state. How does the latest commit look?
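The warning approach agreed on here could look roughly like the sketch below. It uses the standard `logging` module rather than flytekit's logger, and `warn_if_piped` together with its message text are hypothetical names chosen for illustration, not the code that was merged.

```python
import logging
from typing import List, Union

logger = logging.getLogger("shell_sketch")  # stand-in for the flytekit logger


def warn_if_piped(command: Union[List[str], str], shell: bool) -> bool:
    """Log a warning when a shell command contains a pipe; return whether it warned."""
    text = command if isinstance(command, str) else " ".join(command)
    # A pipe is only interpreted by a shell, so without shell=True
    # the "|" is just a literal argument.
    if shell and "|" in text:
        logger.warning(
            "Found a pipe in a shell command. Failures in earlier stages may go "
            "unnoticed because only the last exit code is reported."
        )
        return True
    return False


warned_for_pipe = warn_if_piped("bcftools isec | gzip -c > out", shell=True)
warned_without_shell = warn_if_piped(["echo", "a|b"], shell=False)
warned_without_pipe = warn_if_piped("echo hello", shell=True)
```

A plain substring check will also fire for a `|` inside quotes or for `||`, which seems an acceptable trade-off for a warning that only informs and never fails the task.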

Signed-off-by: pryce-turner <[email protected]>

* Made outfile ephemeral
Signed-off-by: pryce-turner <[email protected]>
* Changed error handling to warn log for pipe with shell commands
Signed-off-by: pryce-turner <[email protected]>
---------
Signed-off-by: pryce-turner <[email protected]>
Signed-off-by: bugra.gedik <[email protected]>

* Made outfile ephemeral
Signed-off-by: pryce-turner <[email protected]>
* Changed error handling to warn log for pipe with shell commands
Signed-off-by: pryce-turner <[email protected]>
---------
Signed-off-by: pryce-turner <[email protected]>
Signed-off-by: Jan Fiedler <[email protected]>

* Made outfile ephemeral
Signed-off-by: pryce-turner <[email protected]>
* Changed error handling to warn log for pipe with shell commands
Signed-off-by: pryce-turner <[email protected]>
---------
Signed-off-by: pryce-turner <[email protected]>
Signed-off-by: mao3267 <[email protected]>

Made outfile ephemeral

9132cf3

Signed-off-by: pryce-turner <[email protected]>

pryce-turner requested review from wild-endeavor, kumare3, eapolinario, pingsutw, cosmicBboy and samhita-alla as code owners May 9, 2024 23:21

pryce-turner changed the title ~~Made outfile ephemeral~~ Catch silent subproc_execute failures May 10, 2024

eapolinario reviewed May 10, 2024

Changed error handling to warn log for pipe with shell commands

e2f41dd

Signed-off-by: pryce-turner <[email protected]>

eapolinario approved these changes Jun 27, 2024

eapolinario merged commit 6d35b75 into master Jun 27, 2024
44 of 46 checks passed

pryce-turner deleted the pryce-turner/catch-shell-error branch June 28, 2024 19:12
