-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(embedded): Retry when executing alert queries to avoid sending transient errors to users as alert failure notifications #20419
Conversation
Codecov Report
@@ Coverage Diff @@
## master #20419 +/- ##
==========================================
- Coverage 66.70% 66.64% -0.07%
==========================================
Files 1739 1744 +5
Lines 65135 65381 +246
Branches 6897 6897
==========================================
+ Hits 43450 43570 +120
- Misses 19932 20058 +126
Partials 1753 1753
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
d2ac5b0
to
6b8e8ea
Compare
d4e8834
to
77c3fd2
Compare
superset/reports/commands/alert.py
Outdated
@@ -46,6 +47,10 @@ | |||
# to avoid heavy loads done by a user mistake | |||
OPERATOR_FUNCTIONS = {">=": ge, ">": gt, "<=": le, "<": lt, "==": eq, "!=": ne} | |||
|
|||
# Max tries to run queries to prevent false errors caused by transient errors | |||
# being returned to users | |||
EXECUTE_QUERY_MAX_TRIES = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep it at 1 to keep the default behavior
also let's move it to the config to let other to decide what retry policy they are looking for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's move const to the config
367cc9b
to
9525525
Compare
superset/reports/commands/alert.py
Outdated
@@ -27,6 +27,7 @@ | |||
|
|||
from superset import app, jinja_context, security_manager | |||
from superset.commands.base import BaseCommand | |||
from superset.config import ALERT_REPORTS_QUERY_EXECUTION_MAX_TRIES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be accessed in a different way:
superset/superset/reports/commands/execute.py
Line 388 in 4f77824
if app.config["ALERT_REPORTS_NOTIFICATION_DRY_RUN"]: |
import pandas as pd | ||
from pytest_mock import MockFixture | ||
|
||
from superset.config import ALERT_REPORTS_QUERY_EXECUTION_MAX_TRIES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's overwrite it for the test to be set to 2, feel free to use a test config
2be5b46
to
962cd2e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG%nit
|
||
command.validate() | ||
|
||
assert execute_query_mock.call_count == expected_max_retries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do
assert execute_query_mock.call_count == 3
instead, it will make sure that you are using a right config
…ansient errors to users as alert failure notifications
…ansient errors to users as alert failure notifications (#20419) Co-authored-by: Rui Zhao <[email protected]> (cherry picked from commit 818962c)
fix(embedded): Retry when executing alert queries to avoid sending transient errors to users as alert failure notifications
SUMMARY
Details documented at #20418
When there are transient errors, alert owners will receive a notification saying the alert fails even though the underlying dataset is working properly. Adding retries to avoid users receiving redundant notifications.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A
TESTING INSTRUCTIONS
This only happens when there are transient errors, so there is no stable way to reproduce the issue. Added unit tests to make sure execute_query function is retried when it should.
ADDITIONAL INFORMATION