
Reduce error handling verbosity in CI tests scripts #1113

Merged
2 commits merged into rapidsai:branch-23.04 on Feb 10, 2023

Conversation

AjayThorve
Member

This PR adds a less verbose trap method for error handling, to help ensure that we capture all potential error codes in our test scripts. It works as follows:

  • setting an environment variable, EXITCODE, with a default value of 0
  • setting a trap statement triggered by ERR signals, which sets EXITCODE=1 whenever a command returns a non-zero exit code (a sketch of the pattern is shown below)

cc @ajschmidt8
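For illustration, a minimal sketch of that pattern might look like the following (the commands here are placeholders, not the exact contents of the scripts touched by this PR):

#!/bin/bash
# Default to success; the ERR trap below flips this to 1 if any command fails.
EXITCODE=0
trap "EXITCODE=1" ERR
# Do not abort on the first failure so every remaining test still runs.
set +e

pytest dask_cuda                    # placeholder test command
python benchmarks/run_benchmark.py  # placeholder benchmark command

# Exit non-zero if any command above failed, zero otherwise.
exit ${EXITCODE}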

@AjayThorve requested a review from a team as a code owner February 8, 2023 23:41
@github-actions bot added the gpuCI label Feb 8, 2023
@madsbk added the improvement and non-breaking labels Feb 9, 2023
Member

@pentschev left a comment


Unless I'm missing some detail, I must say I don't like this approach. The problem is, and correct me if I'm wrong, that the script will terminate upon the first error and we would thus miss running all the other tests, which IMO is not something we want; if there's a problem, we would like to see everything that is failing, not just the first failure.

@AjayThorve
Member Author

Unless I'm missing some detail, I must say I don't like this approach. The problem is, and correct me if I'm wrong, that the script will terminate upon the first error and we would thus miss running all the other tests, which IMO is not something we want; if there's a problem, we would like to see everything that is failing, not just the first failure.

@pentschev from what I understand the set -e command makes sure we do not abort the script on error. This allows all tests to run regardless of pass/fail, but relies on the ERR trap above to manage the EXITCODE for the script.

So in essence, the behavior of the test scripts should not change with these changes; it's just a less verbose method of doing the same thing. @ajschmidt8 correct me if I am wrong.
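A quick way to see that in action (a toy snippet, not taken from the PR):

#!/bin/bash
EXITCODE=0
trap "EXITCODE=1" ERR
set +e

false                                       # fails; the ERR trap sets EXITCODE=1
echo "still running, EXITCODE=${EXITCODE}"  # prints: still running, EXITCODE=1
true                                        # later successes do not reset EXITCODE

exit ${EXITCODE}                            # exits with 1 because one command failed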

@pentschev
Member

@AjayThorve I think you meant set +e, but yes, it seems you're right. I did not know it would also apply to trap. I wrote a small script to confirm:

trap "echo CTRL+C pressed, exiting; break" SIGINT
set +e

while true
do
    echo "Press CTRL+C to stop"
    sleep 1
done

echo Trapped

With set +e on line 2, we see Trapped being printed after pressing CTRL+C, but if we write set -e instead, the script exits instantly. Therefore, I agree, this looks like a better way of handling issues. The only part I think is a pity to lose is the custom messages; depending on how a test ends, it may be difficult to discern which test(s) failed. However, I think we can still do that by rewriting the trap, for example:

set +e

trap "echo CTRL+C pressed, going to next loop; break" SIGINT
while true
do
    echo "Press CTRL+C to stop first trap"
    sleep 1
done

trap "echo CTRL+C pressed again, exiting; break" SIGINT
while true
do
    echo "Press CTRL+C to stop second trap"
    sleep 1
done

echo Trapped

The code above outputs the following:

$ bash trap.sh
Press CTRL+C to stop first trap
Press CTRL+C to stop first trap
^CCTRL+C pressed, going to next loop
Press CTRL+C to stop second trap
Press CTRL+C to stop second trap
^CCTRL+C pressed again, exiting
Trapped

I'm not sure this is absolutely necessary, but I'm throwing it out as an idea to see what you think.
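Applied to the ERR trap in the CI scripts, the same idea might look roughly like this (a sketch only; the commands and messages are hypothetical):

EXITCODE=0
set +e

trap "EXITCODE=1; echo 'pytest suite failed'" ERR
pytest dask_cuda                     # hypothetical test invocation

trap "EXITCODE=1; echo 'benchmark run failed'" ERR
python benchmarks/run_benchmark.py   # hypothetical benchmark invocation

exit ${EXITCODE}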

@ajschmidt8
Member

@pentschev, I'd like to keep this PR as it is for consistency with other projects.

To your point about easily discerning test failures: our shared workflow for Python tests makes use of the test-summary/action (see here for where we implement it).

This means that pytest job summaries can be viewed on the GitHub Action workflow summary cards (in addition to the job's raw log output).

Here's a quick example of what that looks like for a failing pytest for cuml.

@ajschmidt8
Member

@pentschev, I'll wait for your reply to merge.

Or feel free to merge this yourself if you're okay with the current changes.

@pentschev
Member

This means that pytest job summaries can be viewed on the GitHub Action workflow summary cards (in addition to the job's raw log output).

In our CI, not everything is a pytest; we also have some benchmarks that are just CLI tools, for example this and this. It's not a big deal currently, but I'd be interested in knowing if there's anything we can also add for non-pytest executables. Do you know if there's anything we can report based on their outputs, or just on their exit codes?

@ajschmidt8
Member

do you know if there's anything we can report based on their outputs, or just on their exit codes?

@pentschev, what's your ideal way to report?

GitHub Summary Cards are markdown files that you can redirect output to.

So you could effectively print anything you want to it.

In your shell script, you would just have to do something like:

echo "this is my output" >> "$GITHUB_STEP_SUMMARY"

@pentschev
Member

GitHub Summary Cards are markdown files that you can redirect output to.

This looks super cool. I think we could print a summary of our results there, but we'd probably have to rethink how we run those benchmarks, and whether it makes sense to run them here. In any case, I think this is a little beyond this PR, so let's merge it as is for now and I'll try to rethink this a bit later. Thanks so much for the suggestion, I was not aware of that!

@pentschev
Member

/merge

@rapids-bot bot merged commit f44ba33 into rapidsai:branch-23.04 Feb 10, 2023
@AjayThorve deleted the ci/update-error-handling branch February 28, 2023 18:42