diff --git a/docs/docs/installation/alerts-reports.mdx b/docs/docs/installation/alerts-reports.mdx index 580fb9965a540..09f680e6e7ab9 100644 --- a/docs/docs/installation/alerts-reports.mdx +++ b/docs/docs/installation/alerts-reports.mdx @@ -7,7 +7,7 @@ version: 2 ## Alerts and Reports -(version 1.0.1 and above) +*This covers versions 1.0.1 to current.* Users can configure automated alerts and reports to send dashboards or charts to an email recipient or Slack channel. @@ -20,21 +20,28 @@ Alerts and reports are disabled by default. To turn them on, you need to do some #### Commons -##### In your `superset_config.py` +##### In your `superset_config.py` or `superset_config_docker.py` - `"ALERT_REPORTS"` [feature flag](https://superset.apache.org/docs/installation/configuring-superset#feature-flags) must be turned to True. -- `CELERYBEAT_SCHEDULE` in CeleryConfig must contain schedule for `reports.scheduler`. +- `beat_schedule` in CeleryConfig must contain schedule for `reports.scheduler`. - At least one of those must be configured, depending on what you want to use: - emails: `SMTP_*` settings - Slack messages: `SLACK_API_TOKEN` +###### Disable dry-run mode + +Screenshots will be taken but no messages actually sent as long as `ALERT_REPORTS_NOTIFICATION_DRY_RUN = True`, its default value in `config.py`. To disable dry-run mode and start receiving email/Slack notifications, set `ALERT_REPORTS_NOTIFICATION_DRY_RUN` to `False` in [superset config](https://github.com/apache/superset/blob/master/docker/pythonpath_dev/superset_config.py). + ##### In your `Dockerfile` - You must install a headless browser, for taking screenshots of the charts and dashboards. Only Firefox and Chrome are currently supported. > If you choose Chrome, you must also change the value of `WEBDRIVER_TYPE` to `"chrome"` in your `superset_config.py`. 
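As a rough illustration of the "Commons" settings listed in this hunk, the fragment below sketches what the relevant part of a `superset_config.py` might look like. It is a minimal sketch, not the project's canonical config: `FEATURE_FLAGS` is assumed to be the dictionary described on the linked feature-flags page, the Slack token is a placeholder, and `WEBDRIVER_TYPE` only needs changing if Chrome is installed instead of Firefox.

```python
# superset_config.py (or superset_config_docker.py); illustrative values only.

# Alerts and reports are disabled by default, so the feature flag must be on.
# Assumption: FEATURE_FLAGS is the dict described in the feature-flags docs.
FEATURE_FLAGS = {"ALERT_REPORTS": True}

# Defaults to True in config.py. While it is True, screenshots are taken
# but no email or Slack message is actually sent.
ALERT_REPORTS_NOTIFICATION_DRY_RUN = False

# Only needed when the worker image ships Chrome rather than Firefox.
WEBDRIVER_TYPE = "chrome"

# Placeholder: paste the "Bot User OAuth Access Token" of your Slack app here.
SLACK_API_TOKEN = "xoxb-"
```

The `SMTP_*` settings and the `beat_schedule` entry for `reports.scheduler` are still required and are covered by the detailed config later in the diff.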
-Note : All the components required (headless browser, redis, postgres db, celery worker and celery beat) are present in the docker image if you are following [Installing Superset Locally](https://superset.apache.org/docs/installation/installing-superset-using-docker-compose/). -All you need to do is add the required config (See `Detailed Config`). Set `ALERT_REPORTS_NOTIFICATION_DRY_RUN` to `False` in [superset config](https://github.com/apache/superset/blob/master/docker/pythonpath_dev/superset_config.py) to disable dry-run mode and start receiving email/slack notifications. +Note: All the components required (Firefox headless browser, Redis, Postgres db, celery worker and celery beat) are present in the *dev* docker image if you are following [Installing Superset Locally](https://superset.apache.org/docs/installation/installing-superset-using-docker-compose/). +All you need to do is add the required config variables described in this guide (See `Detailed Config`). + +If you are running a non-dev docker image, e.g., a stable release like `apache/superset:2.0.1`, that image does not include a headless browser. Only the `superset_worker` container needs this headless browser to browse to the target chart or dashboard. +You can either install and configure the headless browser - see "Custom Dockerfile" section below - or when deploying via `docker-compose`, modify your `docker-compose.yml` file to use a dev image for the worker container and a stable release image for the `superset_app` container. #### Slack integration @@ -52,21 +59,23 @@ To send alerts and reports to Slack channels, you need to create a new Slack App 6. The app should now be installed in your workspace, and a "Bot User OAuth Access Token" should have been created. Copy that token in the `SLACK_API_TOKEN` variable of your `superset_config.py`. 7. Restart the service (or run `superset init`) to pull in the new configuration. 
-Note: when you configure an alert or a report, the Slack channel list take channel names without the leading '#' e.g. use `alerts` instead of `#alerts`. +Note: when you configure an alert or a report, the Slack channel list takes channel names without the leading '#', e.g. use `alerts` instead of `#alerts`. -#### Kubernetes specific +#### Kubernetes-specific - You must have a `celery beat` pod running. If you're using the chart included in the GitHub repository under [helm/superset](https://github.com/apache/superset/tree/master/helm/superset), you need to put `supersetCeleryBeat.enabled = true` in your values override. - You can see the dedicated docs about [Kubernetes installation](/docs/installation/running-on-kubernetes) for more generic details. #### Docker-compose specific -##### You must have in your`docker-compose.yaml` +##### You must have in your `docker-compose.yml` -- a redis message broker +- A Redis message broker - PostgreSQL DB instead of SQLlite -- one or more `celery worker` -- a single `celery beat` +- One or more `celery worker` +- A single `celery beat` + +This process also works in a Docker swarm environment; you would just need to add `deploy:` to the Superset, Redis and Postgres services along with your specific configs for your swarm. ### Detailed config @@ -76,7 +85,11 @@ You can find documentation about each field in the default `config.py` in the Gi You need to replace default values with your custom Redis, Slack and/or SMTP config. -In the `CeleryConfig`, only the `CELERYBEAT_SCHEDULE` is relative to this feature, the rest of the `CeleryConfig` can be changed for your needs. +Superset uses Celery beat and Celery worker(s) to send alerts and reports. +- The beat is the scheduler that tells the worker when to perform its tasks. This schedule is defined when you create the alert or report. +- The worker will process the tasks that need to be performed when an alert or report is fired.
+ +In the `CeleryConfig`, only the `beat_schedule` is relevant to this feature; the rest of the `CeleryConfig` can be changed to suit your needs. ```python from celery.schedules import crontab @@ -124,14 +137,15 @@ SCREENSHOT_LOAD_WAIT = 600 SLACK_API_TOKEN = "xoxb-" # Email configuration -SMTP_HOST = "smtp.sendgrid.net" #change to your host +SMTP_HOST = "smtp.sendgrid.net" # change to your host +SMTP_PORT = 2525 # your port, e.g. 587 SMTP_STARTTLS = True SMTP_SSL_SERVER_AUTH = True # If your using an SMTP server with a valid certificate SMTP_SSL = False -SMTP_USER = "your_user" -SMTP_PORT = 2525 # your port eg. 587 -SMTP_PASSWORD = "your_password" +SMTP_USER = "your_user" # use the empty string "" if using an unauthenticated SMTP server +SMTP_PASSWORD = "your_password" # use the empty string "" if using an unauthenticated SMTP server SMTP_MAIL_FROM = "noreply@youremail.com" +EMAIL_REPORTS_SUBJECT_PREFIX = "[Superset] " # optional - overwrites default value in config.py of "[Report] " # WebDriver configuration # If you use Firefox, you can stick with default values @@ -149,19 +163,70 @@ WEBDRIVER_OPTION_ARGS = [ ] # This is for internal use, you can keep http -WEBDRIVER_BASEURL="http://superset:8088" -# This is the link sent to the recipient, change to your domain eg. https://superset.mydomain.com -WEBDRIVER_BASEURL_USER_FRIENDLY="http://localhost:8088" +WEBDRIVER_BASEURL = "http://superset:8088" +# This is the link sent to the recipient. Change to your domain, e.g. https://superset.mydomain.com +WEBDRIVER_BASEURL_USER_FRIENDLY = "http://localhost:8088" +``` + +You also need +to specify on behalf of which username to render the dashboards. In general, dashboards and charts +are not accessible to unauthorized requests, which is why the worker needs to take over credentials +of an existing user to take a snapshot. + +By default, Alerts and Reports are executed as the user that the `THUMBNAIL_SELENIUM_USER` config +parameter is set to.
To change this user, just change the config as follows: + +```python +THUMBNAIL_SELENIUM_USER = 'username_with_permission_to_access_dashboards' +``` + +In addition, it's also possible to execute the reports as the report owners/creators. This is typically +needed if there isn't a central service account that has access to all objects or databases (e.g. +when using user impersonation on database connections). For this there's the config flag +`ALERT_REPORTS_EXECUTE_AS` which makes it possible to customize how alerts and reports are executed. +To first try to execute as the creator in the owners list (if present), then fall +back to the creator, then the last modifier in the owners list (if present), then the +last modifier, then an owner (giving priority to the last modifier and then the +creator if either is contained within the list of owners, otherwise the first owner +will be used) and finally `THUMBNAIL_SELENIUM_USER`, set as follows: + +```python +from superset.reports.types import ReportScheduleExecutor + +ALERT_REPORTS_EXECUTE_AS = [ + ReportScheduleExecutor.CREATOR_OWNER, + ReportScheduleExecutor.CREATOR, + ReportScheduleExecutor.MODIFIER_OWNER, + ReportScheduleExecutor.MODIFIER, + ReportScheduleExecutor.OWNER, + ReportScheduleExecutor.SELENIUM, +] ``` + +**Important notes** + +- Be mindful of the concurrency setting for celery (using `-c 4`). Selenium/webdriver instances can + consume a lot of CPU / memory on your servers. +- In some cases, if you notice a lot of leaked geckodriver processes, try running your celery + processes with `celery worker --pool=prefork --max-tasks-per-child=128 ...` +- It is recommended to run separate workers for the `sql_lab` and `email_reports` tasks. This can be + done using the `queue` field in `task_annotations`. +- Adjust `WEBDRIVER_BASEURL` in your configuration file if celery workers can’t access Superset via + its default value of `http://0.0.0.0:8080/`.
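The fallback order described above for `ALERT_REPORTS_EXECUTE_AS` can be easier to follow as code. The sketch below is a standalone illustration of that prose, not Superset's implementation: `Executor`, `Report` and `resolve_executor` are hypothetical names invented here, and real reports carry user objects rather than plain usernames.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Executor(Enum):
    """Hypothetical stand-in for superset.reports.types.ReportScheduleExecutor."""
    CREATOR_OWNER = "creator_owner"
    CREATOR = "creator"
    MODIFIER_OWNER = "modifier_owner"
    MODIFIER = "modifier"
    OWNER = "owner"
    SELENIUM = "selenium"


@dataclass
class Report:
    """Hypothetical report record holding only what the fallback needs."""
    owners: List[str] = field(default_factory=list)
    creator: Optional[str] = None
    modifier: Optional[str] = None


def resolve_executor(report: Report, order: List[Executor],
                     selenium_user: str) -> Optional[str]:
    """Return the username produced by the first executor type that applies."""
    for kind in order:
        if kind is Executor.CREATOR_OWNER:
            if report.creator and report.creator in report.owners:
                return report.creator
        elif kind is Executor.CREATOR:
            if report.creator:
                return report.creator
        elif kind is Executor.MODIFIER_OWNER:
            if report.modifier and report.modifier in report.owners:
                return report.modifier
        elif kind is Executor.MODIFIER:
            if report.modifier:
                return report.modifier
        elif kind is Executor.OWNER:
            # Prefer the last modifier, then the creator, then the first owner.
            for candidate in (report.modifier, report.creator):
                if candidate and candidate in report.owners:
                    return candidate
            if report.owners:
                return report.owners[0]
        elif kind is Executor.SELENIUM:
            # Corresponds to THUMBNAIL_SELENIUM_USER in the real config.
            return selenium_user
    return None  # no executor type in the list applied


# Same order as the ALERT_REPORTS_EXECUTE_AS list shown in the diff.
ORDER = list(Executor)
report = Report(owners=["alice", "bob"], creator="carol", modifier="bob")
print(resolve_executor(report, ORDER, "admin"))
```

In this example the creator `carol` is not an owner, so `CREATOR_OWNER` is skipped and the plain `CREATOR` entry selects her. Dropping `CREATOR` from the list would let `MODIFIER_OWNER` pick `bob` instead.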
+ + ### Custom Dockerfile -A webdriver (and headless browser) is needed to capture screenshots of the charts and dashboards which are then sent to the recipient. As the base superset image does not have a webdriver installed, we need to extend it and install the webdriver. +If you're running the dev version of a released Superset image, like `apache/superset:2.0.1-dev`, you should be set with the above. + +But if you're building your own image, or starting with a non-dev version, a webdriver (and headless browser) is needed to capture screenshots of the charts and dashboards which are then sent to the recipient. +Here's how you can modify your Dockerfile to take the screenshots either with Firefox or Chrome. #### Using Firefox ```docker -FROM apache/superset:1.0.1 +FROM apache/superset:2.0.1 USER root @@ -182,7 +247,7 @@ USER superset #### Using Chrome ```docker -FROM apache/superset:1.0.1 +FROM apache/superset:2.0.1 USER root @@ -202,215 +267,7 @@ RUN pip install --no-cache gevent psycopg2 redis USER superset ``` -> Don't forget to set `WEBDRIVER_TYPE` and `WEBDRIVER_OPTION_ARGS` in your config if you use Chrome. - -### Summary of steps to turn on alerts and reporting: - -Using the templates below, - -1. Create a new directory and create the Dockerfile -2. Build the extended image using the Dockerfile -3. Create the `docker-compose.yaml` file in the same directory -4. Create a new subdirectory called `config` -5. Create the `superset_config.py` file in the `config` subdirectory -6. Run the image using `docker-compose up` in the same directory as the `docker-compose.py` file -7. In a new terminal window, upgrade the DB by running `docker exec -it superset-1.0.1-extended superset db upgrade` -8. Then run `docker exec -it superset-1.0.1-extended superset init` -9. Then setup your admin user if need be, `docker exec -it superset-1.0.1-extended superset fab create-admin` -10. 
Finally, restart the running instance - `CTRL-C`, then `docker-compose up` - -(note: v 1.0.1 is current at time of writing, you can change the version number to the latest version if a newer version is available) - -### Docker compose - -The docker compose file lists the services that will be used when running the image. The specific services needed for alerts and reporting are outlined below. - -#### Redis message broker - -To ferry requests between the celery worker and the Superset instance, we use a message broker. This template uses Redis. - -#### Replacing SQLite with Postgres - -While it might be possible to use SQLite for alerts and reporting, it is highly recommended using a more production ready DB for Superset in general. Our template uses Postgres. - -#### Celery worker - -The worker will process the tasks that need to be performed when an alert or report is fired. - -#### Celery beat - -The beat is the scheduler that tells the worker when to perform its tasks. This schedule is defined when you create the alert or report. 
- -#### Full `docker-compose.yaml` configuration - -The Redis, Postgres, Celery worker and Celery beat services are defined in the template: - -Config for `docker-compose.yaml`: - -```docker -version: '3.6' -services: - redis: - image: redis:6.0.9-buster - restart: on-failure - volumes: - - redis:/data - postgres: - image: postgres - restart: on-failure - environment: - POSTGRES_DB: superset - POSTGRES_PASSWORD: superset - POSTGRES_USER: superset - volumes: - - db:/var/lib/postgresql/data - worker: - image: superset-1.0.1-extended - restart: on-failure - healthcheck: - disable: true - depends_on: - - superset - - postgres - - redis - command: "celery --app=superset.tasks.celery_app:app worker --pool=gevent --concurrency=500" - volumes: - - ./config/:/app/pythonpath/ - beat: - image: superset-1.0.1-extended - restart: on-failure - healthcheck: - disable: true - depends_on: - - superset - - postgres - - redis - command: "celery --app=superset.tasks.celery_app:app beat --pidfile /tmp/celerybeat.pid --schedule /tmp/celerybeat-schedule" - volumes: - - ./config/:/app/pythonpath/ - superset: - image: superset-1.0.1-extended - restart: on-failure - environment: - - SUPERSET_PORT=8088 - ports: - - "8088:8088" - depends_on: - - postgres - - redis - command: gunicorn --bind 0.0.0.0:8088 --access-logfile - --error-logfile - --workers 5 --worker-class gthread --threads 4 --timeout 200 --limit-request-line 4094 --limit-request-field_size 8190 superset.app:create_app() - volumes: - - ./config/:/app/pythonpath/ -volumes: - db: - external: true - redis: - external: false -``` - -### Summary - -With the extended image created by using the `Dockerfile`, and then running that image using `docker-compose.yaml`, plus the required configurations in the `superset_config.py` you should now have alerts and reporting working correctly. 
- -- The above templates also work in a Docker swarm environment, you would just need to add `Deploy:` to the Superset, Redis and Postgres services along with your specific configs for your swarm - -# Old Reports feature - -## Scheduling and Emailing Reports - -(version 0.38 and below) - -### Email Reports - -Email reports allow users to schedule email reports for: - -- chart and dashboard visualization (attachment or inline) -- chart data (CSV attachment on inline table) - -Enable email reports in your `superset_config.py` file: - -```python -ENABLE_SCHEDULED_EMAIL_REPORTS = True -``` - -This flag enables some permissions that are stored in your database, so you'll want to run `superset init` again if you are running this in a dev environment. -Now you will find two new items in the navigation bar that allow you to schedule email reports: - -- **Manage > Dashboard Emails** -- **Manage > Chart Email Schedules** - -Schedules are defined in [crontab format](https://crontab.guru/) and each schedule can have a list -of recipients (all of them can receive a single mail, or separate mails). For audit purposes, all -outgoing mails can have a mandatory BCC. - -In order get picked up you need to configure a celery worker and a celery beat (see section above -“Celery Tasks”). Your celery configuration also needs an entry `email_reports.schedule_hourly` for -`CELERYBEAT_SCHEDULE`. - -To send emails you need to configure SMTP settings in your `superset_config.py` configuration file. 
- -```python -EMAIL_NOTIFICATIONS = True - -SMTP_HOST = "email-smtp.eu-west-1.amazonaws.com" -SMTP_STARTTLS = True -SMTP_SSL = False -SMTP_USER = "smtp_username" -SMTP_PORT = 25 -SMTP_PASSWORD = os.environ.get("SMTP_PASSWORD") -SMTP_MAIL_FROM = "insights@komoot.com" -``` - -To render dashboards you need to install a local browser on your Superset instance: - -- [geckodriver](https://github.com/mozilla/geckodriver) for Firefox -- [chromedriver](http://chromedriver.chromium.org/) for Chrome - -You'll need to adjust the `WEBDRIVER_TYPE` accordingly in your configuration. You also need -to specify on behalf of which username to render the dashboards. In general dashboards and charts -are not accessible to unauthorized requests, that is why the worker needs to take over credentials -of an existing user to take a snapshot. - -By default, Alerts and Reports are executed as the user that the `THUMBNAIL_SELENIUM_USER` config -parameter is set to. To change this user, just change the config as follows: - -```python -THUMBNAIL_SELENIUM_USER = 'username_with_permission_to_access_dashboards' -``` - -In addition, it's also possible to execute the reports as the report owners/creators. This is typically -needed if there isn't a central service account that has access to all objects or databases (e.g. -when using user impersonation on database connections). For this there's the config flag -`ALERTS_REPORTS_EXECUTE_AS` which makes it possible to customize how alerts and reports are executed. 
-To first try to execute as the creator in the owners list (if present), then fall -back to the creator, then the last modifier in the owners list (if present), then the -last modifier, then an owner (giving priority to the last modifier and then the -creator if either is contained within the list of owners, otherwise the first owner -will be used) and finally `THUMBNAIL_SELENIUM_USER`, set as follows: - -```python -from superset.reports.types import ReportScheduleExecutor - -ALERT_REPORTS_EXECUTE_AS = [ - ReportScheduleExecutor.CREATOR_OWNER, - ReportScheduleExecutor.CREATOR, - ReportScheduleExecutor.MODIFIER_OWNER, - ReportScheduleExecutor.MODIFIER, - ReportScheduleExecutor.OWNER, - ReportScheduleExecutor.SELENIUM, -] -``` - -**Important notes** - -- Be mindful of the concurrency setting for celery (using `-c 4`). Selenium/webdriver instances can - consume a lot of CPU / memory on your servers. -- In some cases, if you notice a lot of leaked geckodriver processes, try running your celery - processes with `celery worker --pool=prefork --max-tasks-per-child=128 ...` -- It is recommended to run separate workers for the `sql_lab` and `email_reports` tasks. This can be - done using the `queue` field in `task_annotations`. -- Adjust `WEBDRIVER_BASEURL` in your configuration file if celery workers can’t access Superset via - its default value of `http://0.0.0.0:8080/`. +Don't forget to set `WEBDRIVER_TYPE` and `WEBDRIVER_OPTION_ARGS` in your config if you use Chrome. ### Schedule Reports