
Enable trigger logging in webserver #27758

Merged
dstandish merged 100 commits into apache:main from enable-trigger-logging on Feb 4, 2023

Conversation

@dstandish (Contributor) commented Nov 17, 2022

Implemented for all handlers except Alibaba.

Different handlers may be implemented in slightly different ways depending on their characteristics.

Blob storage handlers work by writing to a file and then uploading it when the task is complete. For these handlers, each trigger writes to its own file, and at trigger end the file is uploaded with a suffix that distinguishes it from the task log file (see the sketch after the list below).

Blob storage handlers include the following:

  • GCS
  • S3
  • WASB
  • Alibaba (not implemented due to missing functionality in hook)
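
Here is a minimal sketch of that write-then-upload pattern, assuming a hypothetical upload_to_blob_storage helper and an invented suffix convention; the actual provider handlers differ in detail:

import logging


def upload_to_blob_storage(local_path: str, remote_path: str) -> None:
    """Hypothetical placeholder for the provider-specific upload (GCS, S3, WASB)."""
    raise NotImplementedError


class TriggerBlobHandler(logging.FileHandler):
    """Each trigger writes to its own local file; the upload happens on close."""

    def __init__(self, trigger_id: int, remote_base: str):
        self.local_path = f"/tmp/trigger_{trigger_id}.log"
        # The suffix distinguishes the trigger log from the task log file.
        self.remote_path = f"{remote_base}.trigger_{trigger_id}.log"
        super().__init__(self.local_path)

    def close(self) -> None:
        super().close()  # flush and close the local file first
        upload_to_blob_storage(self.local_path, self.remote_path)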

For file-based handlers, we have to do two things to make this work with triggers.

The first is that we don't emit to them synchronously. Instead, we pipe the messages through a queue to a QueueListener that runs in a thread and emits them to the potentially blocking handler.
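
This pattern can be illustrated with the standard library alone (the wiring below is simplified relative to Airflow's actual setup):

import logging
import queue
from logging.handlers import QueueHandler, QueueListener

log_queue = queue.SimpleQueue()
blocking_handler = logging.FileHandler("trigger.log")  # stand-in for a slow handler

# The listener drains the queue in its own thread and calls the blocking handler.
listener = QueueListener(log_queue, blocking_handler)
listener.start()

logger = logging.getLogger("airflow.triggers")
logger.addHandler(QueueHandler(log_queue))  # emit() here just enqueues the record
logger.warning("this call returns immediately, without blocking the event loop")

listener.stop()  # flushes remaining records and joins the listener thread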

The second is that we need to create a distinct instance of the task handler for each trigger, because each instance corresponds to a specific file and each trigger needs to write to a distinct file. To accomplish this we add a wrapper handler that manages the individual handlers and routes the messages accordingly.
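
Sketched roughly, the routing wrapper looks something like this (class and attribute names are invented for illustration, not the actual implementation):

import logging


class TriggerRoutingHandler(logging.Handler):
    """Keeps one file-based handler per trigger and routes each record to it."""

    def __init__(self, handler_factory):
        super().__init__()
        self.handler_factory = handler_factory  # builds a handler for a given trigger
        self.handlers = {}  # trigger_id -> handler instance

    def emit(self, record: logging.LogRecord) -> None:
        trigger_id = getattr(record, "trigger_id", None)  # assumed to be set on the record
        if trigger_id is None:
            return
        if trigger_id not in self.handlers:
            self.handlers[trigger_id] = self.handler_factory(trigger_id)
        self.handlers[trigger_id].handle(record)

    def close_trigger(self, trigger_id) -> None:
        handler = self.handlers.pop(trigger_id, None)
        if handler is not None:
            handler.close()  # for blob handlers, this is when the upload happens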

The other category is "streaming" handlers, where messages are pushed through an API to a remote logging service more or less as they are emitted. This category includes the following handlers:

  • Cloudwatch
  • Elasticsearch
  • Stackdriver

Each of the streaming handlers has slightly different characteristics.

Stackdriver essentially has fully native support for triggers: it doesn't require wrapping or routing messages through a QueueListener. For one thing, it already runs in a thread and routes messages through a queue internally so as not to block. Additionally, it already attaches the necessary labels to the record in each call to emit, so updating it to generate those labels from the task instance attached to the LogRecord was trivial.
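
The gist of that label generation, in a hedged sketch (the attachment point and attribute names here are assumptions, not the actual handler code):

import logging


def labels_for_record(record: logging.LogRecord) -> dict:
    """Derive per-record labels from a task instance attached to the record."""
    ti = getattr(record, "task_instance", None)  # e.g. logger.info(..., extra={"task_instance": ti})
    if ti is None:
        return {}
    return {
        "dag_id": ti.dag_id,
        "task_id": ti.task_id,
        "run_id": ti.run_id,
        "try_number": str(ti.try_number),
    }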

Cloudwatch could be made "native" like Stackdriver, but the underlying library (watchtower, a third-party community library) doesn't quite have the necessary behavior. So for now it still requires the wrapper and QueueListener.

Elasticsearch is a bit of a hybrid. By default it is file-based and assumes you have something like Filebeat to ship the files; in that way it behaves the same as the blob storage handlers. But it has an optional config where messages go to stdout. It is possible that it could be modified such that, when run in stdout mode, it is "native" like Stackdriver, with just one instance of the handler, but that work is not taken on here.
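
For reference, the stdout mode amounts to writing each record as a line to stdout for a shipper to collect; a minimal sketch (the field names are illustrative, not the actual Elasticsearch handler's format):

import json
import logging
import sys


class StdoutJsonHandler(logging.StreamHandler):
    """Write each record as a JSON line to stdout for a log shipper to pick up."""

    def __init__(self):
        super().__init__(stream=sys.stdout)

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({"level": record.levelname, "message": record.getMessage()})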

Try it with this sample DAG:

import datetime
import pendulum
from airflow import DAG
from airflow.sensors.time_delta import TimeDeltaSensorAsync

with DAG(
    dag_id="example_time_delta_sensor_async",
    schedule=None,
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    tags=["example"],
):
    wait = TimeDeltaSensorAsync(task_id="wait", delta=datetime.timedelta(seconds=30))

Example:

screen-recording-tigger-logs.mov

@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:CLI area:logging area:Scheduler including HA (high availability) scheduler labels Nov 17, 2022
@potiuk (Member) commented Nov 18, 2022

nice

@dstandish dstandish force-pushed the enable-trigger-logging branch from d8911a8 to 20b4ce2 on November 21, 2022 08:39
@dstandish dstandish force-pushed the enable-trigger-logging branch from bf660ba to 782c5a0 on November 30, 2022 06:36
@dstandish dstandish force-pushed the enable-trigger-logging branch 5 times, most recently from 6e929a8 to f6ef57a on December 12, 2022 22:45
@dstandish dstandish changed the title from "WIP - Enable trigger logging in webserver -- proof of concept" to "WIP - Enable trigger logging in webserver" on Dec 15, 2022
@dstandish dstandish force-pushed the enable-trigger-logging branch 2 times, most recently from 3450f65 to 8469e8d on December 17, 2022 08:31
airflow/utils/serve_logs.py (review thread, outdated, resolved)
@ashb (Member) left a comment:

I'll need to re-read the configure_trigger_log_handler a couple of times to grok it I think :lolsob:

@dstandish dstandish force-pushed the enable-trigger-logging branch 3 times, most recently from 975b75e to 2921f37 on December 22, 2022 09:30
@dstandish (Contributor, Author) replied:

> I'll need to re-read the configure_trigger_log_handler a couple of times to grok it I think :lolsob:

yeah it's def a bit much

i have just pushed a change trying to make it a bit more readable

@dstandish dstandish force-pushed the enable-trigger-logging branch 3 times, most recently from 7534d8c to 3b40361 on December 22, 2022 09:52
@dstandish dstandish merged commit 1b18a50 into apache:main Feb 4, 2023
@dstandish dstandish deleted the enable-trigger-logging branch February 4, 2023 04:39
o-nikolas added a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Feb 11, 2023
dstandish pushed a commit that referenced this pull request Feb 11, 2023
dstandish added a commit to astronomer/airflow that referenced this pull request Feb 12, 2023
dstandish added a commit that referenced this pull request Feb 12, 2023
Restores trigger logging PR #27758 which was reverted (#29472) due to CI problems.

See commit 5e8aed9 for the fix. We just mock _serve_logs so the log server isn't started during the test.

This reverts commit 60d4bcd.
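
The shape of that test fix, with a patch target path assumed for illustration (the real test patches wherever _serve_logs is looked up):

from unittest import mock

# The dotted path below is an assumption, not the actual test code.
with mock.patch("airflow.jobs.triggerer_job._serve_logs"):
    ...  # exercise the triggerer without starting the log server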
pankajkoti added a commit to astronomer/astronomer-providers that referenced this pull request Feb 27, 2023
With the 5.0.0 release of the Microsoft Azure provider, the PR apache/airflow#27758
introduced a breaking change whereby, when creating new connections,
the ``extra__`` prefix is no longer set for extra fields in the connection.
This issue was not identified when testing the 5.0.0 RC because it
only happens for newly created connections. Existing connections
still contain the extra fields with the ``extra__`` prefix. Hence, the
existing code, which looks for the connection field with the prefix
``extra__azure_data_factory__subscriptionId``, works on older deployments
with the new provider release (where the connection was created before the release),
but it fails on new deployments when a fresh connection is created.

To fix this, we remove the prefix while retrieving the connection
field, and at the same time we support previously created connections
by using the same ``get_field`` method from Airflow OSS, introduced
in the same PR above, to allow backward compatibility.
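
The backward-compatible lookup amounts to something like this sketch (the signature and names are illustrative, not the exact OSS helper):

def get_field(extras: dict, conn_type: str, field_name: str):
    """Prefer the unprefixed key, falling back to the legacy prefixed key."""
    if field_name in extras:
        return extras[field_name]  # connections created after the change
    return extras.get(f"extra__{conn_type}__{field_name}")  # older connections

For example, get_field(conn.extra_dejson, "azure_data_factory", "subscriptionId") would find the value whether or not the connection was created with the ``extra__azure_data_factory__subscriptionId`` key.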
@pierrejeambrun pierrejeambrun added the type:new-feature Changelog: New Features label Feb 27, 2023