Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove tasks annotation in AppflowHook #33881

Merged
merged 1 commit into from
Aug 29, 2023

Conversation

Taragolis
Copy link
Contributor

@Taragolis Taragolis commented Aug 29, 2023

Type hints + Static Checkers + boto3 = Pain
#33822 (comment)

In latest mypy-boto3-appflow (1.28.29) version tasks have type list[TaskTypeDef] and TaskOutputTypeDef not exists anymore

image

Even if we change it and it pass CI static checks, it still won't pass in 1.28.16 (breeze ci-image build --python 3.8), for avoid fighting with mypy, just remove annotations related mypy_boto3_appflow.type_defs

P.S.: I'm not sure that we should have annotate something more than specific boto3 clients


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@Taragolis Taragolis requested a review from potiuk August 29, 2023 12:11
@boring-cyborg boring-cyborg bot added area:providers provider:amazon-aws AWS/Amazon - related issues labels Aug 29, 2023
@Taragolis
Copy link
Contributor Author

cc: @potiuk @Lee-W

@Taragolis Taragolis added the type:misc/internal Changelog: Misc changes that should appear in change log label Aug 29, 2023
@Taragolis Taragolis changed the title Fix type annotation in AppflowHook Remove tasks annotation in AppflowHook Aug 29, 2023
from mypy_boto3_appflow.type_defs import TaskOutputTypeDef, TaskTypeDef

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI for someone who want to keep type-checking for boto3/botocore expected values.
For correct work we should have exactly same version of mypy-boto3-{service-name} and boto3

root@5e78bbc9c51e:/opt/airflow# pip list | grep boto
aiobotocore                              2.6.0
boto                                     2.49.0
boto3                                    1.28.17
botocore                                 1.31.17
mypy-boto3-appflow                       1.28.16
mypy-boto3-rds                           1.28.34
mypy-boto3-redshift-data                 1.28.16
mypy-boto3-s3                            1.28.27
types-boto                               2.49.18.9

Copy link
Contributor Author

@Taragolis Taragolis Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fun fact, deprecated boto and types-boto we got from apache-airflow-providers-qubole and unmaintained qds-sdk 🤣

https://github.com/qubole/qds-sdk-py#where-are-the-maintainers-

Where are the maintainers ?
Qubole was acquired. All the maintainers of this repo have moved on. Some of the employees founded ClearFeed. Others are at big data teams in Microsoft, Amazon et al.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually (sort of) do. The packages get upgraded automatically as the boto3 - which means that --eager-upgrade of ours will update them automatically when they are released. We certainly do not (and cannot) pin and update those packages manually (unless someone volunteers for updating it every few days). And to add the complexity we are also limited by aiobotocore releases - which automatically pin boto3 to specific versions.

And it ain't so easy as having the same version. Currently (due to aiobotocore) our "base CI image" has

  • boto3==1.28.17 => we have not upgraded to latest 1.28.36 because aiobotocore holds us at 1.28.17 - we will upgrade automatically once aiobotocore releases newer version. Latest aiobotocore has:
Requires-Dist: boto3 <1.28.18,>=1.28.17 ; extra == 'boto3'
  • mypy-boto3-appflow==1.28.16 -> simply because there is no 1.28.17 and 1.28.36 (which is next version) apparently has been released today

  • the mypy-boto3-rds 1.28.36 has been released today as well and we will update to 1.28.36 likely (unless it has some requirements preventing from doing it).

same redshift-data

There is no consistency (for example some all the mypy libs had version betwen and some did not). And "eventually consistent" approach is I think the best we can do.

Of course we could try to somehow manyally manage it and put manual limits and bump them when we see we need to. But with 650 dependencies we have, that's far too much work - even if we would have a dependabot attempting to do it for us we often have 5-10 packages updated a day - so this automation we have to attempt to upgrade everything, let pypi figure out the latest "eager upgraded" set of dependencies that are not conflicting and passing the tests (only after they pass the tests they make it into latest constraints") is probably the best we can do.

At the expense of course that sometimes our main and PRs that change dependencies (which are few and far between) will get broken by unrelated dependency changes.

I have not figured a better way of doing it, but if somone has any ideas, I am all ears :).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point just keep annotations only for boto3.client in this case it is pretty safe to survive after changes, AFAIK mypy-boto3-{service-name} build in top of botocore JSON data definition, changes in methods are rare, rather than in methods types.

I have an idea in the past to make all AWS Hooks based on Generic, maybe it is even better than I decide do not implemented it. Also mypy-boto3-{service-name} pretty heavy, I don't remember exact numbers but I guess full size would be 100-200 MB

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you think it's just safe to remove mypy ones? can we do it without loosing anyting ? Can You do it :D ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requires-Dist: boto3 <1.28.18,>=1.28.17 ; extra == 'boto3'

Author of aiobotocore explain about about why it required, and it wouldn't be resolved until botocore have native asyncio support (9 years since request created an P3 priority)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you think it's just safe to remove mypy ones? can we do it without loosing anyting ? Can You do it :D ?

I would be better keep what we have now, and maybe move mypy-boto3-{service-name} packages into provider extra block, I could have a look what we should do for it.

@o-nikolas Do you think is it even possible to implements #11297 or maybe we should decide that in won't fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@o-nikolas Do you think is it even possible to implements #11297 or maybe we should decide that in won't fixed

So far I've seen mostly grief working with these types packages, there was some breakage a week or so back as well. I'd agree with resolving as wont fix, for now, we can always re-open later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be better keep what we have now, and maybe move mypy-boto3-{service-name} packages into provider extra block, I could have a look what we should do for it.

We could simply move them to devel-only extra. This is where we keep all things we want to have in CI image, but we do not want to make them direct dependencies of the packages. It's a bit disorganised and As I noticed yesterday we had forgottten "qubole" in there but let me just do it really quickly to split it into several dicts "per-provider" so that it will at least be organized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First step of reorganizing #33907

Later we might move the provider one's to devel extras but this will require a bit more tool update when CI building so I wil leave it as next step. #33908

@potiuk potiuk merged commit 7d267fb into apache:main Aug 29, 2023
42 checks passed
@Taragolis Taragolis deleted the fix-appflow-static-checks branch August 29, 2023 12:41
@potiuk
Copy link
Member

potiuk commented Aug 29, 2023

Fun fact, deprecated boto and types-boto we got from apache-airflow-providers-qubole and unmaintained qds-sdk 🤣

https://github.com/qubole/qds-sdk-py#where-are-the-maintainers-

Where are the maintainers ?
Qubole was acquired. All the maintainers of this repo have moved on. Some of the employees founded ClearFeed. Others are at big data teams in Microsoft, Amazon et al.

Ok. Time to suspend Qubole then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon-aws AWS/Amazon - related issues type:misc/internal Changelog: Misc changes that should appear in change log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants