Add a negative lookahead assertion for log check #253

Merged
taldcroft merged 1 commit into master from exclude-hdf5-retry-error-warning-from-check
Dec 11, 2023

taldcroft
Member

@taldcroft taldcroft commented Dec 7, 2023

Description

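The HDF5 retry errors and warnings should not generate an alert email. This PR adds negative lookahead assertions to the log check so that those lines are no longer matched.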

Interface impacts

None

Testing

Unit tests

  • No unit tests (for task-schedule code)

Functional tests

The watch_cron_logs3.pl documentation includes an example with a negative lookbehind assertion, which indicates that this kind of regex syntax should be valid:

<check>
 	<error>
              #    File           Expression
              #  ----------      ---------------------------
 		*		Use of uninitialized value
 		*		(?<!Program caused arithmetic )Error
 		*		Warning
                 *               fatal
 	</error>
</check>
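
Note that the documentation example is a negative lookbehind, while this PR relies on negative lookaheads. As a rough sketch of the difference (written in Python rather than Perl, using made-up sample lines that are not from any real log):

```python
import re

# Hypothetical sample lines, for illustration only.
samples = [
    "Program caused arithmetic Error",
    "Error: disk full",
    "error back trace",
]

# Negative lookbehind: reject "Error" when the text BEFORE it is the given prefix.
lookbehind = re.compile(r"(?<!Program caused arithmetic )Error")

# Negative lookahead: reject "error" when the text AFTER it is " back trace".
lookahead = re.compile(r"error(?! back trace)", re.IGNORECASE)

lookbehind_hits = [s for s in samples if lookbehind.search(s)]
lookahead_hits = [s for s in samples if lookahead.search(s)]

print(lookbehind_hits)
print(lookahead_hits)
```

The lookbehind pattern is case-sensitive here and the lookahead pattern is not; whether watch_cron_logs3.pl matches case-insensitively is not something this sketch assumes either way.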

Ran the following Python script to verify that the regexes behave as intended. The benign HDF5 retry lines containing "error" or "warning" are not matched, while other lines containing "error" or "warning" are still matched.

The text is adapted from an alert email today, with a couple of extra lines added.

import re

text = """
warning: could not open file!
WARNING: open_file(/proj/sot/ska3/flight/data/eng_archive/data/acis2eng/5min/1DEAMZT.h5, mode=a, filters=Filters(complevel=5, complib='zlib', shuffle=True, bitshuffle=False, fletcher32=False, least_significant_digit=None)) exception: HDF5 error back trace

  File "H5F.c", line 620, in H5Fopen
    unable to open file
  File "H5VLcallback.c", line 3501, in H5VL_file_open
    failed to iterate over available VOL connector plugins
  File "H5PLpath.c", line 578, in H5PL__path_table_iterate
    can't iterate over plugins in plugin path '(null)'
  File "H5PLpath.c", line 620, in H5PL__path_table_iterate_process_path
    can't open directory: /proj/sot/ska3/flight/lib/hdf5/plugin
  File "H5VLcallback.c", line 3351, in H5VL__file_open
    open failed
  File "H5VLnative_file.c", line 97, in H5VL__native_file_open
    unable to open file
  File "H5Fint.c", line 1898, in H5F_open
    unable to lock the file
  File "H5FD.c", line 1625, in H5FD_lock
    driver lock request failed
  File "H5FDsec2.c", line 1002, in H5FD__sec2_lock
    unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'
Error Message:
  unable to lock file
End of HDF5 error back trace
"""

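lines = text.split('\n')

# Match "error" unless it is immediately followed by " back trace" or
# " message =", which appear in the benign HDF5 retry output.
regex = re.compile(r"error(?! (back trace|message =))", re.IGNORECASE)
print(regex)
for line in lines:
    if regex.search(line):
        print(line)

print()

# Match "warning" unless it is immediately followed by ": open_file",
# which is the prefix of the benign HDF5 retry warning.
regex = re.compile(r"warning(?!: open_file)", re.IGNORECASE)
print(regex)
for line in lines:
    if regex.search(line):
        print(line)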

Output:

$ python check_error_regex.py
re.compile('error(?! (back trace|message =))', re.IGNORECASE)
Error Message:

re.compile('warning(?!: open_file)', re.IGNORECASE)
warning: could not open file!

@taldcroft taldcroft requested a review from jeanconn December 7, 2023 11:52
@jeanconn
Contributor

jeanconn commented Dec 7, 2023

I could use a bit more context here. Are the warnings benign because the retry eventually succeeded? Or are they benign because there's nothing to do except try again the next day?

@taldcroft
Member Author

They are benign because the retries eventually succeed. If all the tries fail, an exception is raised and the program halts with a non-zero exit status.
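
To illustrate the behavior described here, a minimal sketch of a retry loop that logs a warning on each failed attempt and re-raises after the last one. The helper `call_with_retries`, its parameters, and `flaky_open` are hypothetical and are not the actual retry code:

```python
import logging
import time

logger = logging.getLogger(__name__)


def call_with_retries(func, tries=3, delay=0.0, exceptions=(OSError,)):
    """Call func(), logging a warning and retrying on failure.

    Hypothetical helper for illustration; not the actual retry code.
    """
    for attempt in range(1, tries + 1):
        try:
            return func()
        except exceptions as err:
            if attempt == tries:
                # Out of tries: propagate so the process exits non-zero.
                raise
            logger.warning("attempt %d of %d failed: %s", attempt, tries, err)
            time.sleep(delay)


# Example: a function that fails twice and then succeeds.
calls = {"n": 0}


def flaky_open():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError(11, "Resource temporarily unavailable")
    return "opened"


result = call_with_retries(flaky_open)
print(result)
```

In this sketch the two warnings logged before the successful third attempt are the kind of log lines the new lookahead patterns are meant to ignore.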

@jeanconn
Contributor

jeanconn commented Dec 7, 2023

OK, is that exception caught with the current checks?

@taldcroft
Member Author

I believe task_schedule will send an alert email if the job fails outright from an uncaught exception. This is no longer the responsibility of watch_cron_job:

https://github.com/sot/task_schedule/blob/128ffa073ce44bc6cd84bf5db144cb2218de0a60/task_schedule3.pl#L265

@taldcroft
Member Author

A recent example is in the email with subject "ACA weekly report: ALERT".

@taldcroft taldcroft merged commit 43985d4 into master Dec 11, 2023
2 checks passed
@taldcroft taldcroft deleted the exclude-hdf5-retry-error-warning-from-check branch December 11, 2023 11:57
@javierggt javierggt mentioned this pull request Feb 6, 2024