-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade watchtower to 3.0.1 (#25019) #34747
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
The config option should be prefixed with |
airflow/config_templates/config.yml
Outdated
json_serializer: | ||
description: | | ||
By default, for non-string logged messages all non-json-parsable objects are logged as `null` except | ||
`datetime` objects which are ISO formatted. Users can optionally provide their own JSON serializer or | ||
opt to use a `repr` serializer which calls `repr(object)` for any non-JSON-serializable objects in the | ||
logged message. The `airflow.providers.amazon.aws.log.cloudwatch_task_handler.json_serialize` uses | ||
`repr` while `airflow.providers.amazon.aws.log.cloudwatch_task_handler.json_serialize_legacy` uses | ||
`null`. If a custom serializer is provide, it must adhear to `Callable[[Any], str]` | ||
(`def my_serializer(o: Any) -> str`). Be aware, that if opting in to using the `repr` serializer, you | ||
should take extra care that no new, sensitive, data is logged (e.g. credentials). If creating your own | ||
json-serializer take special care to fail gracefully, without throwing. | ||
type: string | ||
version_added: 2.7.1 | ||
example: airflow.providers.amazon.aws.log.cloudwatch_task_handler.json_serialize | ||
default: airflow.providers.amazon.aws.log.cloudwatch_task_handler.json_serialize_legacy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be core configuration or provider configuration?
we now have the option to define configurations as part of the amazon provider
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, it should be part of Amazon provider package config to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is only being used by the cloudwatch logger then yeah, Provider config sounds right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would also a good idea move all Amazon specific loggers configs from core to Provider (as separate PR), I'm just not sure is we have any mechanism to deprecate config in core and move in provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved this new config to the aws
provider.
airflow/config_templates/config.yml
Outdated
json_serializer: | ||
description: | | ||
By default, for non-string logged messages all non-json-parsable objects are logged as `null` except | ||
`datetime` objects which are ISO formatted. Users can optionally provide their own JSON serializer or | ||
opt to use a `repr` serializer which calls `repr(object)` for any non-JSON-serializable objects in the | ||
logged message. The `airflow.providers.amazon.aws.log.cloudwatch_task_handler.json_serialize` uses | ||
`repr` while `airflow.providers.amazon.aws.log.cloudwatch_task_handler.json_serialize_legacy` uses | ||
`null`. If a custom serializer is provide, it must adhear to `Callable[[Any], str]` | ||
(`def my_serializer(o: Any) -> str`). Be aware, that if opting in to using the `repr` serializer, you | ||
should take extra care that no new, sensitive, data is logged (e.g. credentials). If creating your own | ||
json-serializer take special care to fail gracefully, without throwing. | ||
type: string | ||
version_added: 2.7.1 | ||
example: airflow.providers.amazon.aws.log.cloudwatch_task_handler.json_serialize | ||
default: airflow.providers.amazon.aws.log.cloudwatch_task_handler.json_serialize_legacy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, it should be part of Amazon provider package config to me
Thank you everyone for your input. I've addressed your comments. P.S. |
@@ -33,22 +33,33 @@ | |||
from airflow.models import TaskInstance | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bolkedebruin - Does the following method step on your serializing PR any?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think so, but I am not familiar with this code.
qouting @o-nikolas comment from the issue
@vincbeck @ferruzzi just to confirm that your review take it into account |
Good point! @cBiscuitSurprise Could you check that and maybe some sample of logs before and after the change? From the code it is impossible to say |
The breaking change is that in We're preserving the "legacy" behavior by default, so no new information will be logged. Customers can "opt-in" to the new behavior by setting the configuration I've added tests to illustrate the difference (the first case is the default, legacy, behavior and the second case is the opt-in new watchtower behavior): airflow/tests/providers/amazon/aws/log/test_cloudwatch_task_handler.py Lines 179 to 183 in d03759e
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved pending confirmation from @bolkedebruin that this isn't going to conflict with his changes in #34683
Failed CI test is helm failing in the KubernetesPodOperator unit test... I can't imagine how this PR would trigger that so I've re-run it, it may just be a bit of unrelated flake. |
Yes this test has been failing intermittently a lot these last days |
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions. |
we are not participating in hacktoberfest this year but if this helps you I'm happy to accommodate the request. |
Nice work @cBiscuitSurprise, I'm glad to see this one finally closed out. Smart idea to make the new behaviour configurable and off by default, it limits the blast radius of doing this upgrade! Thanks for the contribution. |
This PR upgrades
watchtower
from2.0.1
to3.0.1
. A new config item is introduced to allow customer to opt-in to the "new" serialization format.Watchtower functionality
Watchtower introduced a change whereby they use
repr
for any non-serializable objects in place of what was justnull
.Source
Was
{"datetime": "2023-01-01T00:00:00+00:00", "customObject": null}
Now
{"datetime": "2023-01-01T00:00:00+00:00", "customObject": "SomeCustomSerializationProvidedByRepr(...)"}
Airflow functionality
The default behavior for
airflow
will be to maintain thenull
serialization, but allow the option to use the new style (or provide your own).The new config is
logging.json_serializer
which is an import path (string). The import should be a callable taking an object and returning a string.closes: #25019
@ferruzzi