-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option to mangle alert words in retry warnings and ruff it #57
Conversation
@taldcroft Can you point me to an example of one of the task schedule alerts that was unhelpful? I'm a little confused as I figured that if we don't want the retry warnings as warnings, the fix would be to go to ska_helpers/ska_helpers/retry/api.py Line 79 in d80b23e
|
@jeanconn - I want the full warning messages to be in the logs, but just not trigger task_schedule alerts if they are warnings and not actual exceptions. Setting the logger to debug would make the warnings invisible to us since e.g. cheta runs logging at FYI I gave @javierggt a detailed code walkthrough and after some convincing he said this was at least "acceptable".
|
"I want the full warning messages to be in the logs, but just not trigger task_schedule alerts if they are warnings and not actual exceptions". OK, though then instead of changing the logging output to debug wouldn't that just be changing the "warning" text to "retry warn" or something in the try output? I'm still a little confused about the need for mangling. |
And using a reference email, I'm not confident about how the task_schedule regex is applied, but I'd figure the back trace in the example email would get caught by the first check like: |
It's a good point that this PR should also be accompanied by a cheta PR to effectively back out sot/cheta#253. In retrospect that was not a great solution and I think this PR is better. This PR applies universally for all applications that use
To the last bullet point, the contents of the exception message from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are probably ways to trigger the alerts we want and not see the others without this technique but if 1/3 of the user base wants this, it seems fine to me.
|
||
Example:: | ||
|
||
>>> mangle_alert_words("WARNING: This is a fatal Error message.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In passing I note this example suggests mangle_alert_words is available as a function without the leading underscore.
I am open to concrete suggestions which address all of the concerns I raised above. |
OK, first then, do we need full tracebacks in the logs for warnings -- the ones handled in the retry? |
Your comment "the contents of the exception message from tables (that gets put into the logger.warn() output) should be considered out of our control." was a bit confusing to me, as I thought including the stack trace is within our control and seems to be one of the things we're working around here. |
And if the stack trace is required (and with my code reading and no real debugging I'm confused about why we would be seeing those traces for handled exceptions out of ska_helpers retry in the first place) it seems like the error and the trace and anything out of ska_helpers retry could be reformatted to something more transparent like prepending something like "log_check_ignore" to every line out of that retry function? And removing the lookaheads. |
True, though that is just a different form of mangling in which each line is mangled with a magic word in front. In this case you would still need a negative lookbefore assertion to preclude matching any alert word which is on a line that starts with |
I meant that the original exception message is generated by |
@jeanconn - the original version transformed like |
Yes, I don't disagree that adding word to mark for exclusion would just be another kind of mangling, but I though it would be a little easier to find these "try then succeeds" in the logs without the letter-to-number mangling. I think the other problem I had in review was that I've only got 7 email messages |
sot/cheta#253 does actually work. But Basically I just want to upstream the solution to this particular problem instead of using regex lookahead/behind assertions (which always break my brain). If you want to make a new PR that mangles the lines and then make and test the sister PR to watch_cron_logs then I'd be fine with that. Otherwise I'm done with this. I'd like one solution in speedy. |
That's fine. I approved when fine with this. I agree that my suggestion to mangle differently is not substantively different from this PR. I was a little confused about the tracebacks and lookahead so the other PR to remove the lookahead check for cheta is a good outcome of this conversation. |
Description
This is intended to make it so that warnings about HDF5 file "resource unavailable" which are eventually resolved (the H5 file open finally succeeds) do NOT result in a task_schedule alert. This has been a problem because it is difficult to distinguish from the email whether it failed and stopped processing or it succeeded and processing completed successfully.
The idea here is a little hacky, but when issuing a logger warning it mangles certain alert keywords, replacing an
i
orl
with1
ando
with0
. SoWARNING: Failing fatAL Error exception message
becomesWARN1NG: Fai1ing fatA1 Err0r excepti0n message
. This will remove all the matches in task_schedule checking for warnings only.Any exceptions (i.e. it failed every time) will be passed through as normal.
Unrelated fixes
This replaces use of
logging.getLogger
withska_helpers.logging.basic_logger
following modern best-practices. In particularlogging.getLogger
will end up delegating to the root logger and there are no guarantees on how that is configured.This introduces use of
ruff
in the standard way (https://github.com/sot/skare3/wiki/Configuration-for-linting-and-formatting-with-ruff). It was failingblack
andlint
and it was just easier to migrate toruff
since that is what we see in our IDE.Interface impacts
Testing
Unit tests
Independent check of unit tests by Jean
Functional tests
No functional testing.