-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Fix unintended cache misses with async queries #14291
fix: Fix unintended cache misses with async queries #14291
Conversation
@@ -1065,6 +1065,9 @@ def to_adhoc( | |||
elif expression_type == "SQL": | |||
result.update({"sqlExpression": filt.get(clause)}) | |||
|
|||
deterministic_name = md5_sha_from_dict(result) | |||
result["filterOptionName"] = deterministic_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree we should get some 👀 on this from folks more familiar. AFAICT, this preserves the existing behavior of generating a unique key (which I assume is needed on clients) but also keeps it deterministic so that hashing form_data
will work properly (both with async/non-async caching).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With a deterministic filterOptionName
we of course now have the risk of having duplicate filterOptionNames
which probably is precisely what the uuid4 key is aiming to avoid (could cause trouble in React components if these get returned to the frontend). So maybe we should deduplicate filters, too, which in edge cases would avoid unnecessary cache misses (=if the exact same filter has been defined twice vs another query where the same filter is only defined once)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... can you point to a specific example of where that might be an issue? Wouldn't each filter have different attributes? If not, is it a bug on the client that the same filters for the same chart would be added more than once? The idea is this should be deterministic for the same filter dict, but the set of filter dicts should have different properties, causing a different sha.
# objects modify the form_data object. If the modified version were | ||
# to be cached here, it will lead to a cache miss when clients | ||
# attempt to retrieve the value of the completed async query. | ||
original_form_data = copy.deepcopy(form_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
Codecov Report
@@ Coverage Diff @@
## master #14291 +/- ##
==========================================
- Coverage 76.79% 76.71% -0.09%
==========================================
Files 955 955
Lines 48251 48255 +4
Branches 6030 6030
==========================================
- Hits 37055 37018 -37
- Misses 11001 11042 +41
Partials 195 195
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor comment
@@ -1065,6 +1065,9 @@ def to_adhoc( | |||
elif expression_type == "SQL": | |||
result.update({"sqlExpression": filt.get(clause)}) | |||
|
|||
deterministic_name = md5_sha_from_dict(result) | |||
result["filterOptionName"] = deterministic_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With a deterministic filterOptionName
we of course now have the risk of having duplicate filterOptionNames
which probably is precisely what the uuid4 key is aiming to avoid (could cause trouble in React components if these get returned to the frontend). So maybe we should deduplicate filters, too, which in edge cases would avoid unnecessary cache misses (=if the exact same filter has been defined twice vs another query where the same filter is only defined once)?
860f7fc
to
789e109
Compare
Following up on this: I just spent some time playing with the UI and poking through the code. I am pretty confident the As mentioned above, the only way I see a conflict is if two adhoc filters contained the exact same key/value pairs, which I don't think is an expected / desired use case to support. I think we're good on dashboards too since the filter handling should be scoped to a single chart. FWIW, I tested adding a duplicate filter pair, running the chart query, saving the chart, reloading it, etc., and it worked as expected. |
* bug: Fix unintended cache misses with async queries * Ensure sort order * Ensure columns are sorted * Update failing tests
Some charts are rendering errors when
GLOBAL_ASYNC_QUERIES
is turned on (#14289).Some of the fixes:
form_data
to be used for validation on subsequent requests when clients request the query results. However,form_data
is mutated by the code that runs the query and the background worker is storing the mutated object. Since this object is used to derive a unique cache key, any mutations to it will lead to a different key and, subsequently, a cache miss.form_data
in a non-deterministic way before the cache key is derived from the object (e.g., adding UUIDs to it). This will always lead to a unique cache key.Note: even though this is broken in the async queries experience, it will benefit everyone since these fixes should prevent unnecessary cache misses in the regular flow.