Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache warm-ups never succeed #9597

Closed
3 tasks done
Pinimo opened this issue Apr 20, 2020 · 27 comments · Fixed by #21076
Closed
3 tasks done

Cache warm-ups never succeed #9597

Pinimo opened this issue Apr 20, 2020 · 27 comments · Fixed by #21076
Labels
!deprecated-label:bug Deprecated label - Use #bug instead .pinned Draws attention

Comments

@Pinimo
Copy link

Pinimo commented Apr 20, 2020

Globally: the cache warm-up tasks launched by Celery workers all silently fail. Indeed, they perform GETs on the main server's URL without providing the required authentication. However, dashboards may not be loaded without being logged in.

Related bugs:

  • unit tests on this feature miss the error
  • the documentation should mention that the Celery worker needs the --beat flag to listen on CeleryBeat schedules (cf docker-compose.yml configuration)

At stake: long dashboard load times for our users, or outdated dashboards.

Main files to be fixed:

  • superset/tasks/cache.py

Expected results

When the Celery worker logs this (notice 'errors': []):

superset-worker_1  | [2020-04-20 13:05:00,299: INFO/ForkPoolWorker-3] Task cache-warmup[73c09754-4dcb-4674-9ac2-087b04b6e209] 
                     succeeded in 0.1351924880000297s: 
                     {'success': [
                         'http://superset:8088/superset/explore/?form_data=%7B%22slice_id%22%3A%2031%7D', 
                         'http://superset:8088/superset/explore/?form_data=%7B%22slice_id%22%3A%2032%7D', 
                         'http://superset:8088/superset/explore/?form_data=%7B%22slice_id%22%3A%2033%7D'], 
                     'errors': []}

... we would expect to have something (more or less) like this in the Superset server logs:

superset_1         | 172.20.0.6 - - [2020-04-20 13:05:00,049] "POST /superset/explore_json/?form_data=%7B%22slice_id%22%3A HTTP/1.1" 
                     200 738 "http://superset:8088/superset/dashboard/1/" "python-urllib2"

Of course, we also hope to have a bunch of items in the Redis logs, and that loading dashboards is lightning-quick.

Actual results

But we get these logs instead, which show there is a 302 redirect to the login page, followed by a 200 on the login page. This redirect is interpreted as a success by the tests.

superset_1         | 172.20.0.6 - - [20/Apr/2020 08:12:00] "GET /superset/explore/?form_data=%7B%22slice_id%22%3A%2030%7D HTTP/1.1" 
                     302 -
superset_1         | INFO:werkzeug:172.20.0.6 - - [20/Apr/2020 08:12:00] "GET /superset/explore/?form_data=%7B%22slice_id%22%3A%2030%7D HTTP/1.1" 
                     302 -
superset_1         | 172.20.0.6 - - [20/Apr/2020 08:12:00] "GET /login/?next=http%3A%2F%2Fsuperset%3A8088%2Fsuperset%2Fexplore%2F%3Fform_data%3D%257B%2522slice_id%2522%253A%252030%257D HTTP/1.1" 
                     200 -

(I added a few line returns)

In the Redis, here is the only stored key:

$ docker-compose exec redis redis-cli
127.0.0.1:6379> KEYS *
1) "_kombu.binding.celery"

Last, the dashboards take time loading the data on the first connection.

Screenshots

None

How to reproduce the bug

I had to patch the master branch to get this to work. In particular, I have to admit it was not very clear to me whether the config was read from file docker/pythonpath_dev/superset_config.py or file superset/config.py. So I kind of adapted superset/config.py and copied it over to the pythonpath one (which looks like it is read by the celery worker, but not the server).

Anyway, this reproduces the bug:

  1. $ docker system prune --all to remove all dangling images, exited containers and volumes.
  2. $ git checkout master && git pull origin master
  3. $ wget -O configs.patch https://gist.githubusercontent.com/Pinimo/c339ea828974d2141423b6ae64192aa4/raw/e449c97c11f81f7270d6e0b2369d55ec41b079a9/0001-bug-Patch-master-to-reproduce-sweetly-the-cache-warm.patch && git apply configs.patch
    This will apply patches to master to make the scenario work out neatly, in particular add the --beat flag and specify a cache warmup task on all dashboards every minute.
  4. $ docker-compose up -d
  5. Wait for the containers to be built and up.
  6. $ docker-compose logs superset-worker | grep cache-warmup
  7. $ docker-compose logs superset | grep slice
  8. $ docker-compose exec redis redis-cli then type KEYS *

Environment

(please complete the following information):

  • superset version: 0.36.0
  • python version: dockerized
  • node.js version: dockerized
  • npm version: dockerized

Checklist

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.
@Pinimo Pinimo added the !deprecated-label:bug Deprecated label - Use #bug instead label Apr 20, 2020
@Pinimo
Copy link
Author

Pinimo commented Apr 20, 2020

BTW, I noticed there is such a login procedure (with a headless browser) in the email report generator. Perhaps that procedure could be factored out and reused to warm up the cache in our case?

@jay-finn
Copy link

@Pinimo what does your cache config look like?
I setup redis caching with with a timeout of 5 minutes and cache warmup with the topndashboard strategy every 2 minutes (just to test).
I can see this in Celery worker logs:

[2020-04-21 00:36:00,009: INFO/ForkPoolWorker-1] cache-warmup[e41de539-0bf7-4e70-b02b-4d2a132d8d0e]: Loading strategy
[2020-04-21 00:36:00,010: INFO/ForkPoolWorker-1] cache-warmup[e41de539-0bf7-4e70-b02b-4d2a132d8d0e]: Loading TopNDashboardsStrategy
[2020-04-21 00:36:00,014: INFO/ForkPoolWorker-1] cache-warmup[e41de539-0bf7-4e70-b02b-4d2a132d8d0e]: Success!
[2020-04-21 00:36:00,043: INFO/ForkPoolWorker-1] cache-warmup[e41de539-0bf7-4e70-b02b-4d2a132d8d0e]: Fetching http://0.0.0.0:8088/superset/explore/?form_data=%7B%22slice_id%22%3A%201%7D
[2020-04-21 01:06:00,131: INFO/ForkPoolWorker-2] cache-warmup[d2d68627-adce-4fa5-852e-522e95350a6c]: {'success': ['http://0.0.0.0:8088/superset/explore/?form_data=%7B%22slice_id%22%3A%201%7D'], 'errors': []}

but in Superset logs, I only see:

superset_1      | 2020-04-21 00:36:00,049 [DEBUG] [stats_logger] (incr) explore

Needless to say, my charts are not being updated

This is my config:

CACHE_DEFAULT_TIMEOUT = 300
CACHE_CONFIG = {
    'CACHE_TYPE': 'redis',
    'CACHE_DEFAULT_TIMEOUT': 180,
    'CACHE_KEY_PREFIX': 'superset_results',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
}

class CeleryConfig(object):
    BROKER_URL = 'redis://localhost:6379/0'
    CELERY_IMPORTS = (
        'superset.sql_lab',
        'superset.tasks',
    )
    CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
    CELERYD_LOG_LEVEL = 'DEBUG'
    CELERYD_PREFETCH_MULTIPLIER = 10
    CELERY_ACKS_LATE = True
    CELERYBEAT_SCHEDULE = {
        'cache-warmup-hourly': {
            'task': 'cache-warmup',
            'schedule': crontab(minute='*/2', hour='*'),
            'kwargs': {
                'strategy_name': 'top_n_dashboards',
                'top_n': 5,
                'since': '7 days ago',
            },
        },
    }

CELERY_CONFIG = CeleryConfig

@Pinimo
Copy link
Author

Pinimo commented Apr 21, 2020

Hi @jayhjha, thank you for looking into this. Interesting bug. 🤔

My cache config is one of these two (which are identical): one and two.

I've put everything in a git patch so you can normally start from master, apply the patch and reproduce.

@Pinimo
Copy link
Author

Pinimo commented Apr 21, 2020

@jayhjha Perhaps it would be worth changing your config variable SUPERSET_SERVER_ADDRESS to "superset".

@stefanmo
Copy link

Any news on this?

@Pinimo
Copy link
Author

Pinimo commented May 14, 2020

A colleague made a POC on this, but came to the conclusion it is already quite difficult to have the email reports working... He wanted to use part of that code (headless browser + login) to work around the login problem. I think he found out the dependencies for the feature were not included in the Dockerfile.

@Pinimo
Copy link
Author

Pinimo commented May 14, 2020

To my knowledge the feature is (and will stay...) broken 😢

@stale
Copy link

stale bot commented Jul 13, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label Jul 13, 2020
@Pinimo
Copy link
Author

Pinimo commented Jul 14, 2020

Any news from the community on this issue?

@stale stale bot removed the inactive Inactive for >= 30 days label Jul 14, 2020
@mukulsaini
Copy link

@Pinimo any workaround you find to this issue? I am getting the same exact issue.

@Pinimo
Copy link
Author

Pinimo commented Aug 6, 2020

@mukulsaini I have not found the time to address the issue, to my best knowledge it has not been solved yet. If you too find this issue is a real problem, I invite you to talk it over on Superset's Slack 🗨️

Here are a few educated guesses as how to solve the issue:

  1. We could work around the auth by... signing in through a headless browser in the Celery process. After thinking it over, it seems difficult to me:

    • the browser software needs to be installed and launched on the Celery worker (looks like an overkill to me)
    • the passphrase and user name need to be handled as secrets
  2. Perhaps a better solution would involve setting up an API server with its own authentication procedure -- or a new auth method on the same server, to allow the Celery worker to perform cache requests:

    • Logging in would require a login token. The token would need to be handed over to the Celery worker on start-up, possibly through an environment variable or shell argument.
    • When the cache warm-up starts, the worker would use the token to authenticate through this special auth channel
    • He would not be granted any rights on the server except performing cache requests, i.e. POST requests that only return empty documents (not usable to extract data from the instance).
    • However, we should make sure this would not breach the server's security or reliability. For instance, the token should not be used to flood the server, DB engine and Redis instance (DDOS).
  3. Not sure at all about this last idea: we could code a new CLI route to return the chart data. The Celery worker would then execute the CLI (if I'm remembering right, all the configs and docker images are the same). However, that would possibly infringe memory limits for the Celery worker.

@Pinimo
Copy link
Author

Pinimo commented Aug 6, 2020

Yet another draft solution:

  1. Simpler than (2): create a user/password for the Celery worker on db-inits, with a very specific caching role. I find it important that this role should never be able to actually extract any data (so as not to care too much for the password being stolen), just to ping the server and get it to cache the data. It would even be possible to modify the @login_required decorator to add the constraint:
    • "if the user belongs only to group __cache_worker, then
      • he should never POST or PUT
      • he should never see any data he requested
      • but his data should be sent out to Redis just like for anybody"

@mistercrunch
Copy link
Member

Mmmh, maybe I'm missing something, but it seems like we shouldn't have to go through the web server to do this.

Refactoring / mimicking what explore_json does might be an option.
https://github.com/apache/incubator-superset/blob/master/superset/views/core.py#L525-L536

@mukulsaini
Copy link

mukulsaini commented Aug 7, 2020

@mistercrunch What I see from the previous commits, previously the route used to cache warmup in cache.py get_url was /explore_json. Any reason it was changed to /explore ?

@stale
Copy link

stale bot commented Oct 10, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label Oct 10, 2020
@daniel-mueller
Copy link

daniel-mueller commented Mar 22, 2021

Any news or workarounds for avoiding the 302 to the login endpoint?

I still run into this issue using latest docker image (warm-up succeeds on worker, superset logs show redirect to login, no caches refreshed). Not being able to warm-up caches periodically feels like a missing vital feature.

@stale stale bot removed the inactive Inactive for >= 30 days label Mar 22, 2021
@SathishkumarSubramanian
Copy link

SathishkumarSubramanian commented Jul 17, 2021

Cache Warmup is not working on my superset1.2 (docker, ubuntu 18.04), my superset log is below,
Is this known defect in current version, Any suggestion on this or workaround?

superset_worker | [2021-07-17 16:20:00,031: INFO/ForkPoolWorker-1] cache-warmup[e9f826a6-794a-46e5-b6b4-3e2122c7ae03]: Loading strategy
superset_worker | [2021-07-17 16:20:00,031: INFO/ForkPoolWorker-1] cache-warmup[e9f826a6-794a-46e5-b6b4-3e2122c7ae03]: Loading TopNDashboardsStrategy
superset_worker | [2021-07-17 16:20:00,032: INFO/ForkPoolWorker-1] cache-warmup[e9f826a6-794a-46e5-b6b4-3e2122c7ae03]: Success!
superset_worker | [2021-07-17 16:20:00,049: INFO/ForkPoolWorker-1] cache-warmup[e9f826a6-794a-46e5-b6b4-3e2122c7ae03]: Fetching http://0.0.0.0:8088/superset/explore/?form_data=%7B%22slice_id%22%3A%20164%7D
superset_worker | [2021-07-17 16:20:00,051: ERROR/ForkPoolWorker-1] cache-warmup[e9f826a6-794a-46e5-b6b4-3e2122c7ae03]: Error warming up cache!
superset_worker | Traceback (most recent call last):
superset_worker | File "/usr/local/lib/python3.7/urllib/request.py", line 1350, in do_open
superset_worker | encode_chunked=req.has_header('Transfer-encoding'))
superset_worker | File "/usr/local/lib/python3.7/http/client.py", line 1277, in request
superset_worker | self._send_request(method, url, body, headers, encode_chunked)
superset_worker | File "/usr/local/lib/python3.7/http/client.py", line 1323, in _send_request
superset_worker | self.endheaders(body, encode_chunked=encode_chunked)
superset_worker | File "/usr/local/lib/python3.7/http/client.py", line 1272, in endheaders
superset_worker | self._send_output(message_body, encode_chunked=encode_chunked)
superset_worker | File "/usr/local/lib/python3.7/http/client.py", line 1032, in _send_output
superset_worker | self.send(msg)
superset_worker | File "/usr/local/lib/python3.7/http/client.py", line 972, in send
superset_worker | self.connect()
superset_worker | File "/usr/local/lib/python3.7/http/client.py", line 944, in connect
superset_worker | (self.host,self.port), self.timeout, self.source_address)
superset_worker | File "/usr/local/lib/python3.7/socket.py", line 728, in create_connection
superset_worker | raise err
superset_worker | File "/usr/local/lib/python3.7/socket.py", line 716, in create_connection
superset_worker | sock.connect(sa)
superset_worker | ConnectionRefusedError: [Errno 111] Connection refused
superset_worker |
superset_worker | During handling of the above exception, another exception occurred:
superset_worker |
superset_worker | Traceback (most recent call last):
superset_worker | File "/app/superset/tasks/cache.py", line 291, in cache_warmup
superset_worker | request.urlopen(url)
superset_worker | File "/usr/local/lib/python3.7/urllib/request.py", line 222, in urlopen
superset_worker | return opener.open(url, data, timeout)
superset_worker | File "/usr/local/lib/python3.7/urllib/request.py", line 525, in open
superset_worker | response = self._open(req, data)
superset_worker | File "/usr/local/lib/python3.7/urllib/request.py", line 543, in _open
superset_worker | '_open', req)
superset_worker | File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
superset_worker | result = func(*args)
superset_worker | File "/usr/local/lib/python3.7/urllib/request.py", line 1378, in http_open
superset_worker | return self.do_open(http.client.HTTPConnection, req)
superset_worker | File "/usr/local/lib/python3.7/urllib/request.py", line 1352, in do_open
superset_worker | raise URLError(err)
superset_worker | urllib.error.URLError: <urlopen error [Errno 111] Connection refused>
superset_worker | [2021-07-17 16:20:00,054: INFO/ForkPoolWorker-2] Task reports.scheduler[8a25cd68-3d17-4969-bb69-752aee1fb177] succeeded in 0.02081612199981464s: None
superset_worker | [2021-07-17 16:20:00,055: INFO/ForkPoolWorker-1] Task cache-warmup[e9f826a6-794a-46e5-b6b4-3e2122c7ae03] succeeded in 0.024644025999805308s: {'success': [], 'errors': ['http://0.0.0.0:8088/superset/explore/?form_data=%7B%22slice_id%22%3A%20164%7D']}

@nytai
Copy link
Member

nytai commented Oct 19, 2021

Yes, unfortunately the cache warmup code is missing auth. It should be fairly simple to pass auth cookies in the request though, most of the code is already in superset. There's a function that returns auth cookies given a user (can use the THUMBNAIL_SELENIUM_USER config, as that user should be an admin user that can view all charts), it lives here. So you would simply have to add the code that grabs the cookies and then in each chart request include the Cookies header.

@nytai nytai added the .pinned Draws attention label Oct 19, 2021
@nytai
Copy link
Member

nytai commented Oct 19, 2021

Additionally, it seems that the celery worker is reporting success since 302s do not throw URLError exceptions. The code should probably be updated to only report success if the response is in the 200 range.

@nikenfls
Copy link

nikenfls commented Apr 18, 2022

superset 1.3.2 (ecs not docker env)
python 3.7.9

I tried the warmup feature last week. And find out several problems:

  1. currently I am in UTC+8 tz. And crontab should subtract 8 hours.
  2. the authentication problem that talked about in previous comments. I reused the code in screenshot to get cookie and make the request works (machine_auth_provider_factory part).
  3. Then I redo the refresh cache operation in superset and found that it is a POST request(original code is a GET) and body are chart params with url param force=true. (original url GETrequest only gets the result from cache)
  4. I tried to get params from slices params field and submit the POST request (need disable CSRF protection in my case). The process worked, however the cache key I generated is different from the request from frontend. Seem like the cache key is generated by query object with extra unknown params. And I didn't find a way to build the post body like frontend.

And I read the cache part code in master branch and nothing change.

  • I just wonder is there a way to replace the content of the cache key?
  • or how to refresh chart cache key without building POST request myself?

@musicmuthu
Copy link

Still this issue was not fixed even with the latest image Some one please throw the light.

@atticoos
Copy link
Contributor

atticoos commented Jun 4, 2022

I'm quite confused how this is advertised in the documentation as a way to warm the cache, yet is impossible since all the cache-warmup strategies are stuck behind the auth wall... is the community overlooking something?

Tagging original author @betodealmeida and recent maintainers @graceguo-supercat & @john-bodley

@betodealmeida
Copy link
Member

At Lyft, when I worked on this, we had a custom security manager, and we could access any endpoint using a master token that gave full permissions.

@atticoos
Copy link
Contributor

atticoos commented Jun 5, 2022

@betodealmeida i started to explore slapping on a token, but noticed everything was cookie based so i didn't pursue much further.

I managed to just grab the Set-Cookie by making a login request and then attaching the cookie to the cache warming requests f57fab7

I just need my graphs precomputed, saw it documented and thought this would be a good fit. Not interested in forking this, maintaining it, & building images from underlying source etc. I see you work at preset.io which is based on superset; does your on-prem offering happen to have this working?

atticoos added a commit to atticoos/superset that referenced this issue Jun 5, 2022
The cache warming strategies perform unauthorized `GET` requests that are redirected to the login page.

As a result, they do not work as documented. Please remove documentation references until these are ready for production use.

apache#9597
atticoos added a commit to atticoos/superset that referenced this issue Jun 5, 2022
The cache warming strategies perform unauthorized `GET` requests that are redirected to the login page apache#9597
atticoos added a commit to atticoos/superset that referenced this issue Jun 5, 2022
The cache warming strategies perform unauthorized `GET` requests that are redirected to the login page apache#9597
@atticoos
Copy link
Contributor

atticoos commented Jun 5, 2022

Maybe i can do something with

@expose("/guest_token/", methods=["POST"])
@event_logger.log_this
@protect()
@safe
@permission_name("grant_guest_token")
def guest_token(self) -> Response:
"""Response
Returns a guest token that can be used for auth in embedded Superset
---
post:
description: >-
Fetches a guest token
requestBody:
description: Parameters for the guest token
required: true
content:
application/json:
schema: GuestTokenCreateSchema
responses:
200:
description: Result contains the guest token
content:
application/json:
schema:
type: object
properties:
token:
type: string
401:
$ref: '#/components/responses/401'
400:
$ref: '#/components/responses/400'
500:
$ref: '#/components/responses/500'
"""
try:
body = guest_token_create_schema.load(request.json)
self.appbuilder.sm.validate_guest_token_resources(body["resources"])
# todo validate stuff:
# make sure username doesn't reference an existing user
# check rls rules for validity?
token = self.appbuilder.sm.create_guest_access_token(
body["user"], body["resources"], body["rls"]
)
return self.response(200, token=token)
except EmbeddedDashboardNotFoundError as error:
return self.response_400(message=error.message)
except ValidationError as error:
return self.response_400(message=error.messages)

But looks like it would need to be regenerated whenever a new dashboard is added

def validate_guest_token_resources(resources: GuestTokenResources) -> None:
# pylint: disable=import-outside-toplevel
from superset.embedded.dao import EmbeddedDAO
from superset.embedded_dashboard.commands.exceptions import (
EmbeddedDashboardNotFoundError,
)
from superset.models.dashboard import Dashboard
for resource in resources:
if resource["type"] == GuestTokenResourceType.DASHBOARD.value:
# TODO (embedded): remove this check once uuids are rolled out
dashboard = Dashboard.get(str(resource["id"]))
if not dashboard:
embedded = EmbeddedDAO.find_by_id(str(resource["id"]))
if not embedded:
raise EmbeddedDashboardNotFoundError()

@betodealmeida
Copy link
Member

Ah, you're right, I don't think there's an easy way to customize the celery workers to pass a custom token in the request. Let me take another look at this, it's been a couple years.

@ensky
Copy link
Contributor

ensky commented Jun 15, 2022

@ajwhite @betodealmeida I sent a PR to address this issue, which is working in my environment.

ensky pushed a commit to ensky/superset that referenced this issue Jun 15, 2022
eschutho pushed a commit that referenced this issue Jun 15, 2022
The cache warming strategies perform unauthorized `GET` requests that are redirected to the login page #9597
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
!deprecated-label:bug Deprecated label - Use #bug instead .pinned Draws attention
Projects
None yet
Development

Successfully merging a pull request may close this issue.