Enable trigger logging in webserver #27758
Conversation
nice
I'll need to re-read the configure_trigger_log_handler a couple of times to grok it I think :lolsob:
yeah it's def a bit much, i have just pushed a change trying to make it a bit more readable
thanks @uranusjr @potiuk @ashb @bbovenzi @jedcunningham
With the 5.0.0 release of the Microsoft Azure provider, the PR apache/airflow#27758 introduced a breaking change: when creating new connections, the ``extra__`` prefix is no longer set for extra fields in the connection. This issue was not identified while testing the 5.0.0 RC because it only happens for newly created connections; existing connections still contain the extra fields with the ``extra__`` prefix. Hence, the existing code, which looks for the connection field with the prefix ``extra__azure_data_factory__subscriptionId``, works on older deployments with the new provider release (the connection was created before the release) but fails on new deployments when a fresh connection is created. To fix this, we remove the prefix while retrieving the connection field, and at the same time we support previously created connections by using the same ``get_field`` method from Airflow OSS introduced in the PR above, which allows backward compatibility.
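To illustrate that backward-compatible lookup, here is a minimal sketch; the helper below is not the actual ``get_field`` implementation, and the connection id and field name are just examples. The idea is to try the unprefixed key first and fall back to the legacy ``extra__<conn_type>__`` key:

```python
from airflow.hooks.base import BaseHook


def get_field(conn, conn_type: str, field_name: str):
    """Return an extra field, accepting both new-style and legacy prefixed keys.

    Illustrative sketch only; the real helper in Airflow may differ.
    """
    extras = conn.extra_dejson
    prefixed = f"extra__{conn_type}__{field_name}"
    if field_name in extras:
        return extras[field_name]
    if prefixed in extras:
        return extras[prefixed]
    return None


# Usage (connection id and field name are examples):
conn = BaseHook.get_connection("azure_data_factory_default")
subscription_id = get_field(conn, "azure_data_factory", "subscriptionId")
```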
Implemented for all handlers except alibaba.
Different handlers may be implemented in slightly different ways depending on their characteristics.
Blob storage handlers work by writing to file and then uploading when task is complete. For these handlers, each trigger writes to its own file and at trigger end the file is uploaded with a suffix that distinguishes it from the task log file.
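As a rough sketch of that write-then-upload pattern (the class, the upload call, and the suffix format below are illustrative stand-ins, not the actual Airflow handlers):

```python
import logging
import shutil


class BlobTriggerHandlerSketch(logging.FileHandler):
    """Illustrative only: write trigger logs to a local file, then on close
    ship them next to the task log with a suffix that marks them as trigger logs."""

    def __init__(self, local_path: str, remote_task_log_path: str, trigger_id: int):
        super().__init__(local_path)
        self.remote_path = f"{remote_task_log_path}.trigger.{trigger_id}.log"

    def upload(self, local_path: str, remote_path: str) -> None:
        # Stand-in for the real blob upload (S3/GCS/WASB client call).
        shutil.copy(local_path, remote_path)

    def close(self) -> None:
        super().close()
        # When the trigger finishes, upload the file with a suffix that
        # distinguishes it from the task's own log file.
        self.upload(self.baseFilename, self.remote_path)
```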
Blob storage handlers include the following:
For file-based handlers, we have to do two things to make this work with triggers.
The first is that we don't emit to them synchronously. We pipe the messages through a queue to a QueueListener that runs in a thread and emits them to the potentially blocking handler.
The second is that we need to create a distinct instance of the task handler for each trigger, because each instance corresponds to a specific file (and each trigger needs to write to a distinct file). To accomplish this we add a wrapper handler that manages the individual handlers and routes the messages accordingly (see the sketch below).
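The snippet below is a rough sketch of both pieces using only the standard library; the wrapper class, the ``trigger_id`` record attribute, and the file paths are illustrative and do not mirror Airflow's actual class names:

```python
import logging
from logging.handlers import QueueHandler, QueueListener
from queue import SimpleQueue


class TriggererRoutingHandlerSketch(logging.Handler):
    """Illustrative wrapper: keep one file-based handler instance per trigger
    and route each record to the instance for the trigger that emitted it."""

    def __init__(self, handler_factory):
        super().__init__()
        self.handler_factory = handler_factory  # builds a handler for one trigger
        self.handlers: dict[int, logging.Handler] = {}

    def emit(self, record: logging.LogRecord) -> None:
        trigger_id = getattr(record, "trigger_id", None)
        if trigger_id is None:
            return  # not a trigger record; ignore it in this sketch
        if trigger_id not in self.handlers:
            self.handlers[trigger_id] = self.handler_factory(trigger_id)
        self.handlers[trigger_id].emit(record)

    def close_trigger(self, trigger_id: int) -> None:
        handler = self.handlers.pop(trigger_id, None)
        if handler:
            handler.close()  # e.g. triggers the upload for blob storage handlers


# Emit asynchronously: the event loop only puts records on a queue; a
# QueueListener thread forwards them to the potentially blocking handler.
queue: SimpleQueue = SimpleQueue()
root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(QueueHandler(queue))

wrapper = TriggererRoutingHandlerSketch(
    handler_factory=lambda trigger_id: logging.FileHandler(f"/tmp/trigger_{trigger_id}.log")
)
listener = QueueListener(queue, wrapper)
listener.start()

# Example record carrying a trigger id; it ends up in /tmp/trigger_123.log.
logging.getLogger("triggerer").info("hello from a trigger", extra={"trigger_id": 123})
listener.stop()
```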
The other category is "streaming" handlers, where messages are pushed through an API to a remote logging service more or less as they are emitted. This category includes Stackdriver, Cloudwatch, and Elasticsearch, discussed below.
Each of the streaming handlers has slightly different characteristics.
Stackdriver essentially has fully native support for triggers: it doesn't require wrapping or routing messages through a QueueListener. For one, it already runs in a thread and routes messages through a queue internally, so it is not blocking. Additionally, it already attaches the necessary labels to the record in each call of emit, so updating it to generate those labels based on the task instance attached to the LogRecord was a trivial matter.
Cloudwatch could be made "native" like Stackdriver, but the underlying library (watchtower, a third-party community library) doesn't quite have the necessary behavior, so for now it still requires the wrapper and QueueListener.
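To illustrate the Stackdriver-style approach, here is a hedged sketch of a handler whose emit derives labels from the task instance attached to the LogRecord; the class name, attribute names, and the print stand-in are assumptions for illustration, not the real Stackdriver handler:

```python
import logging


class LabelAttachingHandlerSketch(logging.Handler):
    """Illustrative only: derive per-message labels from the task instance
    attached to the LogRecord, so no per-trigger handler instance or routing
    wrapper is needed."""

    def emit(self, record: logging.LogRecord) -> None:
        ti = getattr(record, "task_instance", None)
        labels = {}
        if ti is not None:
            labels = {
                "dag_id": ti.dag_id,
                "task_id": ti.task_id,
                "run_id": ti.run_id,
                "try_number": str(ti.try_number),
            }
        # A real handler would send the formatted message plus labels to the
        # remote logging service here; this sketch just prints them.
        print(self.format(record), labels)
```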
Elasticsearch is a bit of a hybrid. By default it is file-based and assumes you have something like Filebeat to ship the files; in that way it behaves the same as the blob storage handlers. But it has an optional config where messages go to stdout instead. It is possible that it could be modified so that, when run in stdout mode, it is "native" like Stackdriver, with just one instance of the handler, but that work is not taken on here.
Try it with sample dag:
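The original sample DAG isn't reproduced here. Assuming any deferrable operator will do, a minimal DAG like the following (the dag id and sensor choice are just examples) should defer to the triggerer and produce trigger log messages in the webserver's task log view:

```python
import pendulum

from airflow import DAG
from airflow.sensors.time_delta import TimeDeltaSensorAsync

with DAG(
    dag_id="example_trigger_logging",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
) as dag:
    # Deferrable sensor: it hands off to the triggerer, whose log messages
    # should then show up alongside the task log in the webserver.
    wait = TimeDeltaSensorAsync(task_id="wait", delta=pendulum.duration(minutes=1))
```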
Example:
screen-recording-tigger-logs.mov