Upgrade watchtower to 3.0.1 (#25019) #34747

Merged
merged 2 commits into from
Oct 6, 2023

cBiscuitSurprise
Contributor


This PR upgrades watchtower from 2.0.1 to 3.0.1. A new config item is introduced to allow customers to opt in to the "new" serialization format.

Watchtower functionality

Watchtower changed its serialization: any non-JSON-serializable object is now logged using repr, where previously it was logged as null.

Source

{"datetime": datetime(2023,1,1), "customObject": SomeCustomObject(1, 2, 3)}

Was

{"datetime": "2023-01-01T00:00:00+00:00", "customObject": null}

Now

{"datetime": "2023-01-01T00:00:00+00:00", "customObject": "SomeCustomSerializationProvidedByRepr(...)"}
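The difference can be reproduced with the standard library alone. The sketch below is not watchtower's code: it assumes a hypothetical SomeCustomObject class and two hypothetical fallback functions (legacy_default and repr_default) passed to json.dumps through its default parameter.

```python
import json
from datetime import datetime, timezone


class SomeCustomObject:
    """Hypothetical object that json.dumps cannot serialize on its own."""

    def __init__(self, *args):
        self.args = args

    def __repr__(self):
        return f"SomeCustomObject{self.args!r}"


def legacy_default(o):
    # Old style: ISO-format datetimes, everything else becomes null.
    if isinstance(o, datetime):
        return o.isoformat()
    return None


def repr_default(o):
    # New style: ISO-format datetimes, everything else falls back to repr().
    if isinstance(o, datetime):
        return o.isoformat()
    return repr(o)


message = {
    "datetime": datetime(2023, 1, 1, tzinfo=timezone.utc),
    "customObject": SomeCustomObject(1, 2, 3),
}

legacy = json.dumps(message, default=legacy_default)
new = json.dumps(message, default=repr_default)
print(legacy)  # {"datetime": "2023-01-01T00:00:00+00:00", "customObject": null}
print(new)     # {"datetime": "2023-01-01T00:00:00+00:00", "customObject": "SomeCustomObject(1, 2, 3)"}
```

Whatever an object's repr exposes ends up in the log line, which is why the new style is opt-in.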

Airflow functionality

By default, Airflow keeps the null serialization, but users can opt in to the new style (or provide their own serializer).

The new config option is logging.json_serializer, an import path given as a string. The imported object should be a callable that takes an object and returns a string.
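As a rough illustration of what "an import path to a callable" means, here is a minimal sketch that resolves a dotted path with importlib. The helper name load_callable is hypothetical and this is not necessarily how Airflow loads the option; builtins.repr is used only as a stand-in path that exists in every Python installation.

```python
import importlib
from typing import Any, Callable


def load_callable(dotted_path: str) -> Callable[[Any], str]:
    """Resolve 'package.module.attr' to the callable it names (hypothetical helper)."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"{dotted_path!r} is not a dotted import path")
    obj = getattr(importlib.import_module(module_path), attr_name)
    if not callable(obj):
        raise TypeError(f"{dotted_path!r} does not point to a callable")
    return obj


# Stand-in path for demonstration only.
serializer = load_callable("builtins.repr")
print(serializer([1, 2, 3]))  # [1, 2, 3]
```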

closes: #25019

@ferruzzi

@boring-cyborg

boring-cyborg bot commented Oct 4, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using the Breeze environment for testing locally; it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow the ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: [email protected]
Slack: https://s.apache.org/airflow-slack

@uranusjr
Member

uranusjr commented Oct 4, 2023

The config option should be prefixed with cloudwatch_task_handler since it only applies to that handler.

Comment on lines 931 to 945
json_serializer:
  description: |
    By default, for non-string logged messages all non-json-parsable objects are logged as `null` except
    `datetime` objects which are ISO formatted. Users can optionally provide their own JSON serializer or
    opt to use a `repr` serializer which calls `repr(object)` for any non-JSON-serializable objects in the
    logged message. The `airflow.providers.amazon.aws.log.cloudwatch_task_handler.json_serialize` uses
    `repr` while `airflow.providers.amazon.aws.log.cloudwatch_task_handler.json_serialize_legacy` uses
    `null`. If a custom serializer is provided, it must adhere to `Callable[[Any], str]`
    (`def my_serializer(o: Any) -> str`). Be aware that if opting in to using the `repr` serializer, you
    should take extra care that no new sensitive data is logged (e.g. credentials). If creating your own
    JSON serializer, take special care to fail gracefully, without throwing.
  type: string
  version_added: 2.7.1
  example: airflow.providers.amazon.aws.log.cloudwatch_task_handler.json_serialize
  default: airflow.providers.amazon.aws.log.cloudwatch_task_handler.json_serialize_legacy
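To make the "fail gracefully, without throwing" advice concrete, here is a minimal sketch of a custom serializer matching the `Callable[[Any], str]` shape from the description. The MAX_LENGTH cap and the placeholder text are assumptions chosen for illustration; they are not part of Airflow or watchtower.

```python
from datetime import date, datetime
from typing import Any

MAX_LENGTH = 200  # hypothetical cap to keep oversized reprs out of the logs


def my_serializer(o: Any) -> str:
    """Serialize a non-JSON-serializable object without ever raising."""
    try:
        if isinstance(o, (date, datetime)):
            return o.isoformat()
        text = repr(o)
    except Exception:
        # A broken __repr__ must not break logging; emit a placeholder instead.
        return f"<unserializable {type(o).__name__}>"
    return text if len(text) <= MAX_LENGTH else text[:MAX_LENGTH] + "..."
```

A function like this can be tried locally with `json.dumps(message, default=my_serializer)` before pointing the config option at its import path.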
Contributor

Should this be core configuration or provider configuration?
We now have the option to define configurations as part of the Amazon provider.

Contributor

+1, to me it should be part of the Amazon provider package config.

Contributor

If it is only being used by the cloudwatch logger then yeah, Provider config sounds right.

Contributor

I think it would also be a good idea to move all Amazon-specific logger configs from core to the provider (as a separate PR). I'm just not sure whether we have any mechanism to deprecate a config in core and move it to a provider.

Contributor Author

I've moved this new config to the aws provider.

@eladkal eladkal requested review from ferruzzi and vincbeck October 4, 2023 09:01
airflow/config_templates/config.yml Outdated

@cBiscuitSurprise
Contributor Author

Thank you everyone for your input. I've addressed your comments.


P.S.
Would someone be able to add the hacktoberfest-accepted label to this PR?

@@ -33,22 +33,33 @@
from airflow.models import TaskInstance


Contributor

@bolkedebruin - Does the following method step on your serializing PR any?

Contributor

I do not think so, but I am not familiar with this code.

@eladkal
Contributor

eladkal commented Oct 4, 2023

Quoting @o-nikolas's comment from the issue:

But note: This change will need a careful check of Airflow logs, since v3 of Watcher can now include much more information in the logs than previous

@vincbeck @ferruzzi just to confirm that your review takes this into account

@vincbeck
Contributor

vincbeck commented Oct 4, 2023

Good point! @cBiscuitSurprise Could you check that and maybe share some sample logs from before and after the change? From the code alone it is impossible to say.

@cBiscuitSurprise
Contributor Author

cBiscuitSurprise commented Oct 4, 2023

@vincbeck @eladkal

The breaking change is that in watchtower 3.0.1 json-serialized log messages now get serialized with repr instead of just returning None (if you log an object as opposed to a string). Depending on the implementation of an object's repr this could include more information than previously logged. The airflow tests did not have any instances of logging an object.

We're preserving the "legacy" behavior by default, so no new information will be logged. Customers can "opt-in" to the new behavior by setting the configuration cloudwatch_task_handler_json_serializer to "airflow.providers.amazon.aws.log.cloudwatch_task_handler.json_serialize" (as opposed to the default value of "airflow.providers.amazon.aws.log.cloudwatch_task_handler.json_serialize_legacy").
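One way to opt in is through an environment variable, using Airflow's AIRFLOW__{SECTION}__{KEY} naming convention. In the sketch below, the section name "aws" is an assumption (this thread does not name the section), and the helper airflow_env_var is hypothetical; check the provider's configuration reference for the actual section.

```python
import os

SECTION = "aws"  # assumption: the provider config section is not named in this thread
KEY = "cloudwatch_task_handler_json_serializer"
NEW_STYLE = "airflow.providers.amazon.aws.log.cloudwatch_task_handler.json_serialize"


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name for an Airflow config option (hypothetical helper)."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Must be set in the environment before Airflow reads its configuration.
os.environ[airflow_env_var(SECTION, KEY)] = NEW_STYLE
print(airflow_env_var(SECTION, KEY))  # AIRFLOW__AWS__CLOUDWATCH_TASK_HANDLER_JSON_SERIALIZER
```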

I've added tests to illustrate the difference (the first case is the default, legacy, behavior and the second case is the opt-in new watchtower behavior):

(None, '{"datetime": "2023-01-01T00:00:00+00:00", "customObject": null}'),
(
    "airflow.providers.amazon.aws.log.cloudwatch_task_handler.json_serialize",
    '{"datetime": "2023-01-01T00:00:00+00:00", "customObject": "SomeCustomSerialization(...)"}',
),

Contributor

@ferruzzi ferruzzi left a comment

Approved pending confirmation from @bolkedebruin that this isn't going to conflict with his changes in #34683

@ferruzzi
Contributor

ferruzzi commented Oct 5, 2023

Failed CI test is helm failing in the KubernetesPodOperator unit test... I can't imagine how this PR would trigger that so I've re-run it, it may just be a bit of unrelated flake.

@vincbeck
Contributor

vincbeck commented Oct 5, 2023

Failed CI test is helm failing in the KubernetesPodOperator unit test... I can't imagine how this PR would trigger that so I've re-run it, it may just be a bit of unrelated flake.

Yes, this test has been failing intermittently a lot over the last few days.

@vincbeck vincbeck merged commit c01abd1 into apache:main Oct 6, 2023
@boring-cyborg

boring-cyborg bot commented Oct 6, 2023

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@eladkal
Contributor

eladkal commented Oct 6, 2023

P.S.
Would someone be able to add the hacktoberfest-accepted label to this PR?

We are not participating in Hacktoberfest this year, but if this helps you I'm happy to accommodate the request.
Let me know if it is still relevant.

@o-nikolas
Contributor

Nice work @cBiscuitSurprise, I'm glad to see this one finally closed out. Smart idea to make the new behaviour configurable and off by default, it limits the blast radius of doing this upgrade! Thanks for the contribution.

Development

Successfully merging this pull request may close these issues.

update watchtower version in amazon provider
8 participants