dashboard and chart cache refresh unexpected behaviour #30150

calfieri opened this issue Sep 4, 2024 · 4 comments

calfieri commented Sep 4, 2024

Bug description

First anomaly: dashboard data doesn't rely on the table chart cache
The dashboard showing a data table does not get its data from the underlying chart's cache; it appears to read directly from the chart's dataset.

Second anomaly: dashboard file download doesn't rely on the dashboard cache
The file download (both csv and xlsx) from a dashboard reads directly from the underlying dataset, using neither the dashboard cache nor the chart cache.

Third anomaly: chart file download (xlsx only) doesn't rely on the chart cache
The xlsx file download from a chart reads directly from the underlying dataset, not from the chart cache. The csv download, by contrast, appears to work correctly.

How to reproduce the bug

  1. precondition: cache enabled, backed by Redis
  2. precondition: dataset cache timeout: default
  3. precondition: chart cache timeout: 1 day
  4. precondition: dashboard refresh frequency: 1 day
  5. insert three lines l1, l2, l3 into the dataset
  6. create a chart X with cache timeout 1 day based on the dataset
  7. create a dashboard D with refresh frequency 1 day based on chart X
  8. open dashboard D; three lines appear
  9. open chart X; three lines appear
  10. delete line l3
  11. open dashboard D; three lines appear: l1, l2, l3
  12. download from the dashboard in csv format; two lines appear: l1, l2 (the download doesn't rely on the dashboard cache)
  13. download from the dashboard in xlsx format; two lines appear: l1, l2 (the download doesn't rely on the dashboard cache)
  14. open chart X; three lines appear: l1, l2, l3
  15. download from the chart in xlsx format; three lines appear: l1, l2, l3
  16. download from the chart in csv format; three lines appear: l1, l2, l3
  17. open dashboard D; two lines appear: l1, l2 (a dashboard refresh was triggered simply by opening chart X)
  18. wait three hours
  19. open chart X; three lines appear: l1, l2, l3
  20. download from the chart in xlsx format; two lines appear: l1, l2 (the download doesn't rely on the chart cache)
  21. download from the chart in csv format; three lines appear: l1, l2, l3
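
For reference, the behaviour the steps above expect can be sketched with a tiny in-memory cache. This is only an illustrative model, not Superset code: `TTLCache`, `get_or_load`, the `"chart-X"` key, and the fake clock are all hypothetical names introduced here. It assumes a chart cache timeout of 1 day and shows that every read path sharing the same cache entry should keep returning l1, l2, l3 until the entry expires.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Tiny in-memory stand-in for a Redis-backed chart data cache (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # key -> (expiry timestamp, cached rows)
        self._store: Dict[str, Tuple[float, List[str]]] = {}

    def get_or_load(self, key: str, ttl_seconds: float,
                    loader: Callable[[], List[str]]) -> List[str]:
        """Return the cached rows while they are fresh, otherwise reload them."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        rows = list(loader())  # snapshot of the dataset at load time
        self._store[key] = (now + ttl_seconds, rows)
        return rows


DAY = 24 * 60 * 60
now = [0.0]  # fake clock in seconds, so the example does not need to sleep
cache = TTLCache(clock=lambda: now[0])
dataset = ["l1", "l2", "l3"]

first_view = cache.get_or_load("chart-X", DAY, lambda: dataset)

dataset.remove("l3")   # step 10: delete line l3
now[0] += 3 * 60 * 60  # step 18: three hours later
after_three_hours = cache.get_or_load("chart-X", DAY, lambda: dataset)

now[0] += DAY          # the 1 day timeout has now elapsed
after_expiry = cache.get_or_load("chart-X", DAY, lambda: dataset)
```

In this model `after_three_hours` still contains three lines and only `after_expiry` drops to two, which is the behaviour the report expects from the dashboard view and from both download formats.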

Screenshots/recordings

Three lines inserted into the dataset
image

image

image

After deleting third row

image

The dashboard remains stable, relying on the cache

image

excel download reports 2 rows

image

csv download reports 2 rows

The downloaded data does not match the dashboard data

image

After opening the chart, the dashboard appears to have refreshed from the dataset rather than from the cached chart data

image

Note that the chart is still showing the cached data (14 minutes old):

image

Downloading csv from chart after a few minutes:
image

Downloading xlsx from chart after a few minutes:

image

The downloaded data matches the chart data

After 3 hours

image

The chart still relies on its data cache

Downloading csv from chart:
image

The downloaded data matches the chart data

Downloading xlsx from chart:
image
The downloaded data does not match the chart data

Superset version

3.1.3

Python version

3.9

Node version

16

Browser

Chrome

Additional context

No response

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
@dosubot dosubot bot added dashboard:performance Related to Dashboard performance viz:charts:export Related to exporting charts labels Sep 4, 2024

dosubot bot commented Sep 4, 2024

@calfieri, hello! I'm here to help you with any bugs, questions, or contributions. Let's work together to solve the issue you're facing.

The issues you're experiencing with the cache behavior in Apache Superset can be attributed to how the system handles caching and data retrieval for dashboards and charts. Here are some insights and potential explanations for each anomaly:

  1. Dashboard Data Not Relying on Table Chart Cache:

    • Dashboards in Superset may fetch data directly from the underlying dataset rather than relying on the cached data of individual charts. This behavior ensures that the dashboard displays the most up-to-date data, but it can lead to inconsistencies if the chart cache is not refreshed simultaneously.
  2. Dashboard File Download Not Relying on Dashboard Cache:

    • When downloading data from a dashboard in CSV or XLSX format, the system fetches data directly from the underlying dataset rather than using the cached data. This ensures that the downloaded data is current, but it bypasses the dashboard cache, leading to discrepancies between the displayed data and the downloaded data.
  3. Chart File Download (Only XLSX) Not Relying on Chart Cache:

    • The XLSX download for charts fetches data directly from the dataset, bypassing the chart cache. This behavior is different from the CSV download, which correctly uses the cached data. This inconsistency can cause confusion and needs to be addressed to ensure uniform behavior across different download formats.

Steps to Reproduce and Observations:

  • The steps you provided clearly outline the process and the anomalies observed at each stage. The key observations are:
    • The dashboard and chart caches are not being utilized for data downloads.
    • The chart cache is not being refreshed when the underlying dataset changes, leading to stale data being displayed.
    • The XLSX download for charts does not rely on the chart cache, unlike the CSV download.

Potential Solutions:

  1. Ensure Consistent Cache Usage:

    • Modify the data retrieval logic to ensure that both CSV and XLSX downloads use the cached data for charts and dashboards. This will require changes in the backend API to check for cached data before fetching from the dataset.
  2. Cache Invalidation and Refresh:

    • Implement a mechanism to invalidate and refresh the cache for charts and dashboards when the underlying dataset changes. This can be achieved by setting appropriate cache timeout values and using the force parameter to bypass the cache when necessary.
  3. Unified Data Retrieval Logic:

    • Ensure that the data retrieval logic for dashboards and charts is consistent across different formats (CSV, XLSX) and actions (viewing, downloading). This will help avoid discrepancies and ensure that the displayed and downloaded data match.
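
A minimal sketch of what "unified data retrieval" with a `force` option could look like follows. It is not Superset's implementation: `fetch_rows`, `export`, and the `"chart-X"` key are hypothetical, a plain dict stands in for the cache, and JSON stands in for a second export format because an XLSX writer needs a third-party library. The point is only that every format serializes the same cached rows, and that `force=True` is the one path that goes back to the dataset.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Row = Dict[str, str]


def fetch_rows(cache: Dict[str, List[Row]], key: str,
               loader: Callable[[], List[Row]], force: bool = False) -> List[Row]:
    """Cache-aside read; force=True skips the cached entry and refreshes it."""
    if force or key not in cache:
        cache[key] = [dict(row) for row in loader()]
    return cache[key]


def to_csv(rows: List[Row]) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def export(cache: Dict[str, List[Row]], key: str, loader: Callable[[], List[Row]],
           fmt: str, force: bool = False) -> str:
    """Every format is rendered from the same cached rows."""
    rows = fetch_rows(cache, key, loader, force)
    if fmt == "csv":
        return to_csv(rows)
    if fmt == "json":
        return json.dumps(rows)
    raise ValueError(f"unsupported format: {fmt}")


dataset = [{"name": "l1"}, {"name": "l2"}, {"name": "l3"}]
cache: Dict[str, List[Row]] = {}

export(cache, "chart-X", lambda: dataset, "csv")  # first request fills the cache
dataset.pop()                                     # l3 is deleted afterwards

csv_cached = export(cache, "chart-X", lambda: dataset, "csv")
json_cached = json.loads(export(cache, "chart-X", lambda: dataset, "json"))
csv_forced = export(cache, "chart-X", lambda: dataset, "csv", force=True)
```

Here both cached exports still contain l3, while the forced export reflects the deletion and also refreshes the entry for later requests.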

Example Configuration for Cache:

To configure the cache using Redis and ensure proper cache timeout settings, you can use the following configuration in superset_config.py:

# Cache for dashboard filter state.
FILTER_STATE_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 86400,  # 1 day, in seconds
    "CACHE_KEY_PREFIX": "superset_filter_cache",
    "CACHE_REDIS_URL": "redis://localhost:6379/0",
}

# Cache for chart query results.
DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_KEY_PREFIX": "superset_results",
    "CACHE_DEFAULT_TIMEOUT": 86400,  # 1 day, in seconds
    "CACHE_REDIS_URL": "redis://localhost:6379/0",
}
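
Because the reproduction mixes a chart-level timeout (1 day) with a dataset timeout left at "default", it can help to reason about which value is effective. The sketch below assumes a fallback order of chart, then dataset, then database, then the `CACHE_DEFAULT_TIMEOUT` from `DATA_CACHE_CONFIG`; that order should be verified against the Superset caching documentation for the version in use. `resolve_cache_timeout` is a hypothetical helper, not a Superset function.

```python
from typing import Optional


def resolve_cache_timeout(chart_timeout: Optional[int],
                          dataset_timeout: Optional[int],
                          database_timeout: Optional[int],
                          default_timeout: int) -> int:
    """Return the first explicitly set timeout, else the configured default.

    The chart -> dataset -> database -> default order is an assumption
    made for this illustration.
    """
    for candidate in (chart_timeout, dataset_timeout, database_timeout):
        if candidate is not None:
            return candidate
    return default_timeout


DEFAULT = 86400  # CACHE_DEFAULT_TIMEOUT from the config above

# Chart X from the report: chart timeout of 1 day, dataset left at default.
chart_x = resolve_cache_timeout(86400, None, None, DEFAULT)

# A chart with no override that sits on a dataset with a 1 hour timeout.
other_chart = resolve_cache_timeout(None, 3600, None, DEFAULT)

# Nothing set anywhere: the configured default applies.
fallback = resolve_cache_timeout(None, None, None, DEFAULT)
```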

Conclusion:

Addressing these anomalies will require changes in the data retrieval and caching logic to ensure consistent behavior across different actions and formats. By ensuring that the cache is properly utilized and refreshed, you can avoid discrepancies and provide a more reliable user experience.

For more detailed information on caching mechanisms and configurations, you can refer to the Superset documentation [1][2][3].

rusackas (Member) commented Sep 4, 2024

I'm not sure who knows the most about chart caching at this point. Maybe @villebro or @michael-s-molina know who might be able to answer this? Let's see if we can hot-potato it to success :)

villebro (Member) commented Sep 4, 2024

There are two things here that could be causing the issue:

  • Charts rendered via the dashboard sometimes generate a cache key that's different from the one in Explore view. This is due to the order of the filters, which is slightly different when certain native filters are used. I've been meaning to open a PR to fix this that guarantees the same order for filters irrespective of Dashboard or Explore view, but it's far down on my backlog.
  • IIRC, requesting chart data as FULL vs JSON vs XLS vs CSV generates a different cache key, which may be causing the unexpected cache miss (I'm not 100% sure of the exact specifics). At any rate, I believe we should ensure that result type doesn't affect the cache key.

Anyone feel like taking these on? 😛
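
The two causes described in this comment can be illustrated with a small, self-contained sketch. It does not reproduce Superset's actual cache key logic: the query dictionaries, the `filters` and `result_format` field names, and both key functions are assumptions made for the example. It shows how hashing the request as-is yields different keys for the same data, and how sorting the filters and leaving the result format out of the hash makes the keys agree.

```python
import hashlib
import json
from typing import Any, Dict


def naive_cache_key(query: Dict[str, Any]) -> str:
    """Hash the request as-is: filter order and result format both affect the key."""
    payload = json.dumps(query)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def normalized_cache_key(query: Dict[str, Any]) -> str:
    """Hash only what determines the data: sorted filters, no result format."""
    normalized = {k: v for k, v in query.items() if k != "result_format"}
    normalized["filters"] = sorted(
        query.get("filters", []),
        key=lambda f: json.dumps(f, sort_keys=True),
    )
    payload = json.dumps(normalized, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# The same chart requested from Explore (JSON) and from a dashboard (XLSX),
# with the filters arriving in a different order.
explore_request = {
    "datasource": "7__table",
    "filters": [
        {"col": "a", "op": "==", "val": 1},
        {"col": "b", "op": "==", "val": 2},
    ],
    "result_format": "json",
}
dashboard_request = {
    "datasource": "7__table",
    "filters": [
        {"col": "b", "op": "==", "val": 2},
        {"col": "a", "op": "==", "val": 1},
    ],
    "result_format": "xlsx",
}

naive_keys_match = naive_cache_key(explore_request) == naive_cache_key(dashboard_request)
normalized_keys_match = (
    normalized_cache_key(explore_request) == normalized_cache_key(dashboard_request)
)
```

With the naive key the two requests miss each other's cache entry; with the normalized key they share one entry, so a download would be served from the same cached result as the chart view.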

calfieri (Author) commented Sep 5, 2024

Hi villebro, thanks a lot for your comment. If we understood correctly, the behavior you describe is related to our first issue (dashboard data doesn't rely on the table chart cache), and of course we welcome the idea of opening a PR on this.
As for the other two issues (i.e. file downloads don't rely on the related cache), we understood from the previous dosubot comment that the Excel download logic should be modified so that both CSV and XLSX downloads use the cached data. Is this a matter of configuration, or does it require a source code fix? Thanks in advance.
