Hospital Data Scraper Does Not Log Errors #1024

Open
Mr0grog opened this issue Mar 14, 2021 · 3 comments

@Mr0grog
Collaborator

Mr0grog commented Mar 14, 2021

It turns out that the hospital scraper has been failing for a while (e.g. https://github.com/sfbrigade/stop-covid19-sfbayarea/runs/2107127471?check_suite_focus=true), but we didn't know because it's not logging failures to Slack like the news and data scrapers are. This needs fixing.

@benghancock
Collaborator

Thanks for flagging this @Mr0grog - I'll take a look at how we're logging in the other code and see how that can be integrated into the hospital script.

@benghancock
Collaborator

The scraper_hospital_data.py script does have exception handling that appears the same as the other scripts:

    except Exception as error:
        message = click.style(
            'Hospitalization data fetch encountered error', fg='red'
        )
        click.echo(f'{message}: {error}', err=True)
        traceback.print_exc()
        sys.exit(1)
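
To make the point above concrete, here is a minimal, self-contained sketch of what a handler shaped like this one emits. It is an illustration only: `fetch_hospital_data` is a hypothetical stand-in that always fails, and a plain `print(..., file=sys.stderr)` replaces the `click` calls so the snippet runs without third-party packages. The script's real fetch logic is not shown in this thread.

```python
import contextlib
import io
import sys
import traceback


def fetch_hospital_data():
    # Hypothetical stand-in for the real fetch; it always fails.
    raise RuntimeError("upstream API returned 500")


def main():
    try:
        fetch_hospital_data()
    except Exception as error:
        # Plain print stands in for click.echo(..., err=True).
        print(
            f"Hospitalization data fetch encountered error: {error}",
            file=sys.stderr,
        )
        traceback.print_exc()  # also goes to stderr
        sys.exit(1)


# Capture stderr and the exit status the way a calling process would see them.
captured = io.StringIO()
exit_code = None
with contextlib.redirect_stderr(captured):
    try:
        main()
    except SystemExit as stop:
        exit_code = stop.code
stderr_text = captured.getvalue()
```

Both the message and the traceback land on stderr and the process exits non-zero, so the handler itself already produces everything a workflow step would need to capture and forward.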

Does something need to change in the GH action YAML? I see this in the Action for data.v2.json but nothing comparable in the hospital data action:

    python scraper_data.py > today.json 2> errors.txt || true
    # Merge new data into previous data. (Note this has to be two steps;
    # if we read $OUT_PATH as input for jq and write stdout to it at the
    # same time, we get conflicts and bad output.)
    jq -s '.[0] + .[1]' $OUT_PATH today.json > merged.json
    mv merged.json $OUT_PATH
    ERRORS=`cat errors.txt`
    if [ -n "${ERRORS}" ]; then
      echo "Encountered the following errors while scraping:"
      echo "------------------------------------------------"
      echo "${ERRORS}"

      # The error text can contain unescaped quotes, newlines, etc.
      # Use jq to make sure we are composing correctly formatted JSON.
      # `--raw-input` treats the input as strings instead of JSON.
      # `--slurp` causes all lines to be combined into one string.
      ERRORS_JSON=`cat errors.txt | jq --slurp --raw-input '{"text": .}'`
      curl -X POST -H 'Content-type: application/json' --data "${ERRORS_JSON}" $SLACK_WEBHOOK_URL

      # Raise an error so this step fails.
      false
    fi

I'm pretty unfamiliar with GH actions but happy to try my hand at this. Looks like a bash script inside the YAML, and I can see how it integrates with Slack in the section above. Seems like the next step would be to put in a PR to adjust .github/workflows/data_update_hospitalization.yml? Is there any way to test prior to merging, or are we basically just testing in production?
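
For anyone reading along who is more at home in Python than bash, the capture-and-report logic of the shell step quoted above can be sketched roughly as follows. This is a Python rendition for illustration, not the workflow's actual code: the function names are made up here, and the "scraper" is a one-line stand-in that writes to stderr and exits with status 1.

```python
import json
import subprocess
import sys


def run_and_capture(command):
    """Run a command and return (exit_code, stderr_text), like `2> errors.txt`."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stderr


def build_slack_payload(error_text):
    """Mirror `jq --slurp --raw-input '{"text": .}'`: one JSON string field.

    json.dumps escapes quotes and newlines, which is why the shell version
    goes through jq rather than pasting the text into a JSON literal.
    """
    return json.dumps({"text": error_text})


# Hypothetical failing scraper: prints a message with quotes to stderr.
failing_scraper = [
    sys.executable,
    "-c",
    "import sys; print('boom \"quoted\"', file=sys.stderr); sys.exit(1)",
]

code, errors = run_and_capture(failing_scraper)
# Only build a payload when something was written to stderr.
payload = build_slack_payload(errors) if errors else None
```

The hospital workflow would need the same three pieces: capture stderr, check whether it is non-empty, and wrap it as valid JSON before posting it to the webhook.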

@Mr0grog
Collaborator Author

Mr0grog commented Apr 1, 2021

> Does something need to change in the GH action YAML? I see this in the Action for data.v2.json but nothing comparable in the hospital data action

Yep, you've got it. I should have been clearer in the issue description.

The other key bit is the part on the step that runs the scraper, which lets the job continue even if that step errored out.

Best way to test is to create a free Slack org and Webhook URL for yourself, alter the scraper to make it fail, and try that out. You can run it on a fork or test locally with act.

All that is really high overhead, though, and I’m OK if we just review this with eyeballs, even though that may not be ideal.
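
A lighter-weight check, which is an alternative not proposed in this thread, would be to point the posting logic at a local stand-in for the Slack webhook. The sketch below assumes nothing about Slack beyond "it accepts a JSON POST with a `text` field"; `FakeWebhook` and `post_error` are hypothetical names, and this exercises only the payload and the POST, not the GitHub Actions wiring.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # JSON bodies captured by the stand-in webhook


class FakeWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        received.append(json.loads(self.rfile.read(length)))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence per-request logging


def post_error(webhook_url, error_text):
    """POST {"text": ...} as JSON and return the HTTP status code."""
    body = json.dumps({"text": error_text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-type": "application/json"},
        method="POST",
    )
    # An empty ProxyHandler keeps the request on localhost even if
    # proxy environment variables are set.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return response.status


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), FakeWebhook)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/"
status = post_error(url, 'simulated "scraper" failure\nTraceback ...')

server.shutdown()
server.server_close()
```

After this runs, `received` holds the decoded payload, so quoting and newline handling can be inspected without a Slack workspace. A fork or `act` run would still be needed to confirm the workflow itself.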
