add more accurate typing for DbApiHook.run method #31846

dwreeves · 2023-06-11T17:28:54Z

Overview

The DbApiHook.run method has incredibly complex typing, which is currently reduced to just the following:

    def run(
        self,
        sql: str | Iterable[str],
        autocommit: bool = False,
        parameters: Iterable | Mapping | None = None,
        handler: Callable | None = None,
        split_statements: bool = True,
        return_last: bool = True,
    ) -> Any | list[Any] | None:

This is incomplete typing for a few reasons:

Some of the bools in the kwargs determine the return type.
Whether the handler is None determines whether the return type is None.
The return type of the handler is generic and determines the return type of the run method.

Addressing the first bullet point would be a bit of work, and would add a lot of complicated typing to each provider hook.

But I figured addressing the 2nd and 3rd bullet points was easy and takes just 2 overloads. This alone helps vastly improve the typing of of the DbApiHook.run method with a minimal amount of added annotation complexity.

Misc.

I do not anticipate that this should cause any backwards compatibility concerns except for users who meet all the following criteria:
- (1) they are relying on the DbiApiHook directly, (2) they are using mypy or another type checker, and (3) they had type checking issues that were not being properly flagged before due to the less precise typing.
- This may also cause users' IDEs to get upset at them if they have a function that looks like this as a handler, even though it would technically work. This is probably fine to flag to users, even if it works, since the only reason it works is by ignoring the kwargs entirely.
```
# This works as a handler because `handler(cursor)` is a valid input, and `foo` is ignored.
def handler(cursor, foo=None):
```
I did not open an issue because the issues template for feature requests says:

Features should be small improvements that do not dramatically change Airflow assumptions. Note, that in this case you do not even need to create an issue if you have a code change ready to submit! You can open Pull Request immediately instead.

airflow/providers/common/sql/hooks/sql.py

…e-typing-for-dbapihook

uranusjr

Some thoughts and a question

uranusjr · 2023-06-12T18:21:34Z

airflow/providers/databricks/hooks/databricks_sql.py

+        sql: str | Iterable[str],
+        autocommit: bool = False,
+        parameters: Iterable | Mapping | None = None,
+        handler: Callable[[Any], T] = None,  # type: ignore[assignment]


Suggested change

handler: Callable[[Any], T] = None, # type: ignore[assignment]

handler: Callable[[Any], T] | None = None,

I don’t think handler is required for this signature…?

I'm confused and I'm curious what you mean.

For the Databricks hook's run, we still have in the method these lines of code:

if handler is None: return None

So, it seems it should still be overloaded. And the # type: ignore[assignment] is needed if we assign defaults to the overloads instead of ....

What am I missing?

So is it the other way around, for the T | list[T] variant, handler must be set to a callable right? I think in that case I’d perfer we simply remove the argument default from the overload signature altogether. Maybe also make it keyword-only (not technically perfect but practically that’s how the argument should be specified anyway).

So is it the other way around, for the T | list[T] variant, handler must be set to a callable right?

Yes.

I think in that case I’d perfer we simply remove the argument default from the overload signature altogether. Maybe also make it keyword-only (not technically perfect but practically that’s how the argument should be specified anyway).

The problem with doing that is Mypy doesn't allow for it. The mypy-providers returns 18 errors if I do that:

Found 18 errors in 16 files (checked 993 source files)

Here is what I replaced each run typing with (+ Snowflake one has extra kwarg).

@overload def run( self, sql: str | Iterable[str], *, autocommit: bool, parameters: Iterable | Mapping[str, Any] | None, handler: None, split_statements: bool, return_last: bool, ) -> None: ... @overload def run( self, sql: str | Iterable[str], *, autocommit: bool, parameters: Iterable | Mapping[str, Any] | None, handler: Callable[[Any], T], split_statements: bool, return_last: bool, ) -> T | list[T]: ... def run( self, sql: str | Iterable[str], autocommit: bool = False, parameters: Iterable | Mapping[str, Any] | None = None, handler: Callable[[Any], T] | None = None, split_statements: bool = False, return_last: bool = True, ) -> T | list[T] | None:

Similarly, Leaving as positional args (i.e. excluding the *) also returns 18 errors.

I believe the only way to avoid Mypy errors are the following 2 options:

To do what I did originally, and set = ... as the kwargs. Although I could be wrong, I do not believe this breaks any IDEs or static type checkers. And Mypy is cool with it, and it doesn't involve using # type: ignore.

Do what I have in the current PR.

Ah I see. PyCharm will use ... here for the overloaded type. I don't see that as wrong per se, but I suppose it depends on what user experience you want.

uranusjr · 2023-06-12T18:23:11Z

airflow/providers/databricks/hooks/databricks_sql.py

+        return_last: bool = True,
+    ) -> T | list[T]:


I think this can be further split into Literal[True] returning T and Literal[False] returning list[T]

Yes, you're correct. I decided initially that I wanted to take baby steps with this PR, but we can go all the way if you'd like!

Do whatever you feel comfortable

I'm gonna be real. I tried breaking out T and list[T] return types based on Literals but I couldn't get Mypy to agree with me that there were no conflicts. I will just leave that to someone else in the future who is more knowledgeable than I am about Mypy. 😅

airflow/providers/databricks/hooks/databricks_sql.py

…ethods

dwreeves · 2023-06-13T17:10:17Z

The scope of the PR increased a little bit here. Hope that isn't too bad.

I replaced all parameters: Iterable | Mapping | Nones with parameters: Iterable | Mapping[str, Any] | Nones.
I also type-annotated most (but not all-- some I wasn't sure whether they hooked into a SQLAlchemy connection) parameters kwargs that were un-annotated that I could find.
I removed two superfluous run() methods for DbApiHook subclasses. These were just doing return super().run(...) with no other changes. I could not find any reason for these methods not just default to the subclassed version. I am aware that provider package version conflicts can sometimes necessitate doing things like this, but that didn't seem to make sense here. (Let me know if I'm missing something.) The last major changes to the run() methods occur in this PR: df00436 and from what I can tell:
- The Trino hook used to be a lot of custom code that was replaced with return super().run(...).
- The Presto hook used to have a super().run() to warn users about a deprecated kwarg (hql) that no longer exists.

dwreeves · 2023-06-13T17:26:28Z

airflow/providers/exasol/hooks/exasol.py

    def run(
        self,
        sql: str | Iterable[str],
        autocommit: bool = False,
-        parameters: Iterable | Mapping | None = None,
-        handler: Callable | None = None,
+        parameters: Iterable | Mapping[str, Any] | None = None,


This is the one package where I'm unsure... 🤔 I'm actually thinking, the old annotation was incorrect and it never allowed for Iterable types:

Note that the original annotation was Iterable | Mapping | None. So, Airflow has said for a while that it allows for an iterable.

pyexasol isn't using a SQLAlchemy connection object, as far as I can tell.

When I look at pyexasol's documentation, and I look at the API https://github.com/exasol/pyexasol/blob/master/docs/REFERENCE.md I don't see any ability to take in a non-mapping iterable type as a valid input. The reason the other connections take in an Iterable is because they all wrap SQLAlchemy. But if you look carefully, you'll notice pyexasol is completely foregoing SQLAlchemy.

It's possible the safest thing to do here would be to not touch this file's parameters type annotations at all, and just leave as Iterable | Mapping | None for the run() method + don't touch anything else.

That said, we may as well not let this little bit of digging go to waste. Someone who is knowledgeable on pyexasol should confirm whether I am correct that the Iterable type should not be here.

Annotations in providers can be inaccurate since not all of them are very often used. I’d say we can just change the annotation and see if anyone complains.

Oh no, mypy yelled at me for violating the Liskov substitution principle when I swap to Mapping[str, Any] | None, of course... 🤦 Ahh. I'm just going to keep as-is. I don't want to mess around too much with this. I hope that is OK.

dwreeves · 2023-06-18T03:53:20Z

@uranusjr Coming back to / checking in on this.

I think the main outstanding thing with this PR is in regards of what to do with default arguments in overloads.

(1) Excluding the defaults entirely (something you suggested) causes MyPy to fail due to incompatible typing.
The current approach includes (2) overloaded defaults that match the actual defaults, as per your initial request. It requires adding some # type: ignores though.
Another approach that MyPy will pass is to use (3) ellipses in overloaded defaults. This does not require any # type: ignores.

So right now we're on 2, originally I started with 3, and the latest communication I received was to switch to 1 but this causes MyPy to fail without extensive use of # type: ignores.

Due to 1 not really working, I think we should stick with either 2 or 3. Let me know which of them you'd like.

An updated summary of everything else would be:

I'm not sure if I can add additional overloads. I could not figure out how to overload defaults with the Literal[True]s without MyPy describing conflicts. I imagine this is my fault for not understanding something. In any case, I think the currently proposed changes are a vast improvement over the status quo, and perhaps I or someone else can figure out the additional typing at a later date.
I also removed some superfluous .run() methods in a few provider packages. I did this to keep the code changes simpler overall, as my would have necessitated adding a ton of overloads to those files. Looking at the code and the code history of the provider package releases (for trino, presto, and common-sql), I do not foresee any problems with this change (also, the tests pass). But it's not literally zero risk either.

And that's about it.

Let me know what you want to do. 😄

uranusjr

I think this is about as good as we can get. This kind of one or many depending on flag kind of thing is by design difficult to type (pro tip: don’t design API like this).

uranusjr · 2023-06-19T06:55:19Z

(Forgot to respond to the comment)

default arguments in overloads

Using ... is probably the best we can do, and slightly better than ignore comments. Not perfect, but unless we explode the overloads (cross-product with and without argument variants) it’s the best we can do.

…e-typing-for-dbapihook

dwreeves · 2023-06-20T01:01:26Z

@uranusjr should be all set.

…reeves/airflow into improve-typing-for-dbapihook

dwreeves · 2023-07-03T23:40:47Z

Fixed merge conflicts and merged changes that have passing CI.

dstandish

i don't think i saw anything that would change behavior so looks ok to me

dwreeves requested review from eladkal, potiuk and mik-laj as code owners June 11, 2023 17:28

boring-cyborg bot added area:providers provider:common-sql provider:databricks provider:snowflake Issues related to Snowflake provider provider:trino labels Jun 11, 2023

add more accurate typing for DbApiHook.run method

985d29b

dwreeves force-pushed the improve-typing-for-dbapihook branch from 1331739 to 985d29b Compare June 11, 2023 17:47

uranusjr reviewed Jun 12, 2023

View reviewed changes

airflow/providers/common/sql/hooks/sql.py Show resolved Hide resolved

dwreeves added 2 commits June 12, 2023 13:18

Merge branch 'main' of github.aaakk.us.kg-dwreeves:apache/airflow into improv…

e7b535e

…e-typing-for-dbapihook

add real defaults to overloads

a20554a

dwreeves force-pushed the improve-typing-for-dbapihook branch from aa86823 to a20554a Compare June 12, 2023 18:16

uranusjr reviewed Jun 12, 2023

View reviewed changes

dwreeves and others added 2 commits June 13, 2023 12:41

change Mapping to Mapping[str, Any] and remove unnecessary subclass m…

ea77f2a

…ethods

Merge branch 'main' into improve-typing-for-dbapihook

1e626ad

dwreeves requested a review from uranusjr June 13, 2023 17:10

dwreeves commented Jun 13, 2023

View reviewed changes

Merge branch 'main' into improve-typing-for-dbapihook

8502619

uranusjr approved these changes Jun 19, 2023

View reviewed changes

dwreeves added 2 commits June 19, 2023 17:51

move back to ellipses for overloads

86c698d

Merge branch 'main' of github.aaakk.us.kg-dwreeves:apache/airflow into improv…

c345d8a

…e-typing-for-dbapihook

Merge branch 'improve-typing-for-dbapihook' of github.aaakk.us.kg-dwreeves:dw…

d03e1bc

…reeves/airflow into improve-typing-for-dbapihook

dwreeves force-pushed the improve-typing-for-dbapihook branch from 6a77dae to d03e1bc Compare June 20, 2023 01:05

uranusjr approved these changes Jun 20, 2023

View reviewed changes

dwreeves and others added 4 commits June 22, 2023 11:44

Merge branch 'main' into improve-typing-for-dbapihook

26aa5e1

resolve merge conflicts

edfb2c1

Merge branch 'main' into improve-typing-for-dbapihook

be0cd4a

Merge branch 'main' into improve-typing-for-dbapihook

90b0a61

Merge branch 'main' into improve-typing-for-dbapihook

11ed2fd

dstandish approved these changes Jul 18, 2023

View reviewed changes

uranusjr merged commit 60c49ab into apache:main Jul 18, 2023

eladkal mentioned this pull request Jul 29, 2023

Status of testing Providers that were prepared on July 29, 2023 #32936

Closed

69 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add more accurate typing for DbApiHook.run method #31846

add more accurate typing for DbApiHook.run method #31846

dwreeves commented Jun 11, 2023 •

edited

Loading

uranusjr left a comment

uranusjr Jun 12, 2023

dwreeves Jun 12, 2023

uranusjr Jun 13, 2023

dwreeves Jun 13, 2023 •

edited

Loading

dwreeves Jun 13, 2023

uranusjr Jun 12, 2023

dwreeves Jun 12, 2023

uranusjr Jun 12, 2023

dwreeves Jun 13, 2023

dwreeves commented Jun 13, 2023

dwreeves Jun 13, 2023 •

edited

Loading

uranusjr Jun 13, 2023

dwreeves Jun 13, 2023

dwreeves commented Jun 18, 2023 •

edited

Loading

uranusjr left a comment

uranusjr commented Jun 19, 2023

dwreeves commented Jun 20, 2023

dwreeves commented Jul 3, 2023

dstandish left a comment

	handler: Callable[[Any], T] = None, # type: ignore[assignment]
	handler: Callable[[Any], T] \| None = None,

add more accurate typing for DbApiHook.run method #31846

add more accurate typing for DbApiHook.run method #31846

Conversation

dwreeves commented Jun 11, 2023 • edited Loading

Overview

Misc.

uranusjr left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dwreeves Jun 13, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dwreeves commented Jun 13, 2023

dwreeves Jun 13, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dwreeves commented Jun 18, 2023 • edited Loading

uranusjr left a comment

Choose a reason for hiding this comment

uranusjr commented Jun 19, 2023

dwreeves commented Jun 20, 2023

dwreeves commented Jul 3, 2023

dstandish left a comment

Choose a reason for hiding this comment

dwreeves commented Jun 11, 2023 •

edited

Loading

dwreeves Jun 13, 2023 •

edited

Loading

dwreeves Jun 13, 2023 •

edited

Loading

dwreeves commented Jun 18, 2023 •

edited

Loading