add more accurate typing for DbApiHook.run method #31846
Some thoughts and a question
    sql: str | Iterable[str],
    autocommit: bool = False,
    parameters: Iterable | Mapping | None = None,
    handler: Callable[[Any], T] = None,  # type: ignore[assignment]
    - handler: Callable[[Any], T] = None,  # type: ignore[assignment]
    + handler: Callable[[Any], T] | None = None,

I don’t think `handler` is required for this signature…?
I'm confused and I'm curious what you mean.
For the Databricks hook's `run`, we still have in the method these lines of code:

    if handler is None:
        return None

So, it seems it should still be overloaded. And the `# type: ignore[assignment]` is needed if we assign defaults to the overloads instead of `...`.
What am I missing?
So is it the other way around: for the `T | list[T]` variant, handler must be set to a callable, right? I think in that case I’d prefer we simply remove the argument default from the overload signature altogether. Maybe also make it keyword-only (not technically perfect but practically that’s how the argument should be specified anyway).
> So is it the other way around, for the `T | list[T]` variant, handler must be set to a callable right?

Yes.

> I think in that case I’d prefer we simply remove the argument default from the overload signature altogether. Maybe also make it keyword-only (not technically perfect but practically that’s how the argument should be specified anyway).

The problem with doing that is Mypy doesn't allow for it. The `mypy-providers` check returns 18 errors if I do that:

    Found 18 errors in 16 files (checked 993 source files)

Here is what I replaced each `run` typing with (the Snowflake one has an extra kwarg):
    @overload
    def run(
        self,
        sql: str | Iterable[str],
        *,
        autocommit: bool,
        parameters: Iterable | Mapping[str, Any] | None,
        handler: None,
        split_statements: bool,
        return_last: bool,
    ) -> None:
        ...

    @overload
    def run(
        self,
        sql: str | Iterable[str],
        *,
        autocommit: bool,
        parameters: Iterable | Mapping[str, Any] | None,
        handler: Callable[[Any], T],
        split_statements: bool,
        return_last: bool,
    ) -> T | list[T]:
        ...

    def run(
        self,
        sql: str | Iterable[str],
        autocommit: bool = False,
        parameters: Iterable | Mapping[str, Any] | None = None,
        handler: Callable[[Any], T] | None = None,
        split_statements: bool = False,
        return_last: bool = True,
    ) -> T | list[T] | None:
Similarly, leaving them as positional args (i.e. excluding the `*`) also returns 18 errors.
I believe the only ways to avoid Mypy errors are the following 2 options:
- Do what I did originally, and set `= ...` as the kwarg defaults. Although I could be wrong, I do not believe this breaks any IDEs or static type checkers. And Mypy is cool with it, and it doesn't involve using `# type: ignore`.
- Do what I have in the current PR.
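For readers following along, here is a minimal standalone sketch of the `= ...` style from the first option. It uses a toy module-level `run` function, not Airflow's actual hook. Making `handler` keyword-only and required in the second overload is an assumption of this sketch, and the handler is applied to the statement text rather than to a cursor purely for illustration.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable, TypeVar, overload

T = TypeVar("T")


@overload
def run(
    sql: str | Iterable[str],
    *,
    handler: None = ...,
    return_last: bool = ...,
) -> None: ...


@overload
def run(
    sql: str | Iterable[str],
    *,
    handler: Callable[[Any], T],
    return_last: bool = ...,
) -> T | list[T]: ...


def run(
    sql: str | Iterable[str],
    *,
    handler: Callable[[Any], T] | None = None,
    return_last: bool = True,
) -> T | list[T] | None:
    # Toy implementation: a real hook would execute each statement and pass
    # the cursor to the handler; here the handler receives the statement text.
    statements = [sql] if isinstance(sql, str) else list(sql)
    if handler is None:
        return None
    results = [handler(statement) for statement in statements]
    return results[-1] if return_last else results
```

With these overloads, a call without `handler` is typed as returning `None`, while a call such as `run("select 1", handler=len)` is typed as `int | list[int]`. The real defaults live only on the implementation signature; the `...` in the overloads just marks the parameter as optional.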
        return_last: bool = True,
    ) -> T | list[T]:
I think this can be further split into `Literal[True]` returning `T` and `Literal[False]` returning `list[T]`.
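As an illustration of the idea, here is a rough sketch of a `Literal`-based split on a toy function. The name `collect` and its simplified signature are hypothetical; the real `run` method also has `split_statements` and an optional `handler`, which this sketch does not attempt to model, so it says nothing about whether the same split works cleanly there.

```python
from __future__ import annotations

from typing import Any, Callable, Literal, Sequence, TypeVar, overload

T = TypeVar("T")


@overload
def collect(
    items: Sequence[Any],
    handler: Callable[[Any], T],
    return_last: Literal[True] = ...,
) -> T: ...


@overload
def collect(
    items: Sequence[Any],
    handler: Callable[[Any], T],
    return_last: Literal[False],
) -> list[T]: ...


@overload
def collect(
    items: Sequence[Any],
    handler: Callable[[Any], T],
    return_last: bool,
) -> T | list[T]: ...


def collect(
    items: Sequence[Any],
    handler: Callable[[Any], T],
    return_last: bool = True,
) -> T | list[T]:
    # Apply the handler to every item; return only the last result when
    # return_last is true, otherwise the full list. Assumes items is non-empty.
    results = [handler(item) for item in items]
    return results[-1] if return_last else results
```

The third overload is a fallback for callers who pass a plain `bool` variable, where the checker cannot know which literal it holds.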
Yes, you're correct. I decided initially that I wanted to take baby steps with this PR, but we can go all the way if you'd like!
Do whatever you feel comfortable with.
I'm gonna be real. I tried breaking out `T` and `list[T]` return types based on `Literal`s but I couldn't get Mypy to agree with me that there were no conflicts. I will just leave that to someone else in the future who is more knowledgeable than I am about Mypy. 😅
The scope of the PR increased a little bit here. Hope that isn't too bad.
    def run(
        self,
        sql: str | Iterable[str],
        autocommit: bool = False,
        parameters: Iterable | Mapping | None = None,
        handler: Callable | None = None,
        parameters: Iterable | Mapping[str, Any] | None = None,
This is the one package where I'm unsure... 🤔 I'm actually thinking the old annotation was incorrect and it never allowed for `Iterable` types:
- Note that the original annotation was `Iterable | Mapping | None`. So, Airflow has said for a while that it allows for an iterable. `pyexasol` isn't using a SQLAlchemy connection object, as far as I can tell.
- When I look at `pyexasol`'s documentation and at the API reference (https://github.com/exasol/pyexasol/blob/master/docs/REFERENCE.md), I don't see any ability to take in a non-mapping iterable type as a valid input. The reason the other connections take in an `Iterable` is because they all wrap SQLAlchemy. But if you look carefully, you'll notice `pyexasol` is completely forgoing SQLAlchemy.

It's possible the safest thing to do here would be to not touch this file's `parameters` type annotations at all, and just leave it as `Iterable | Mapping | None` for the `run()` method + don't touch anything else.
That said, we may as well not let this little bit of digging go to waste. Someone who is knowledgeable on `pyexasol` should confirm whether I am correct that the `Iterable` type should not be here.
Annotations in providers can be inaccurate since not all of them are very often used. I’d say we can just change the annotation and see if anyone complains.
Oh no, mypy yelled at me for violating the Liskov substitution principle when I swap to `Mapping[str, Any] | None`, of course... 🤦 Ahh. I'm just going to keep as-is. I don't want to mess around too much with this. I hope that is OK.
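For context, a small sketch of why narrowing a parameter type in a subclass trips the Liskov check. The classes `BaseHook` and `NarrowHook` and the helper `execute` are hypothetical stand-ins, not Airflow code; the point is only that code written against the base signature can pass a value the subclass no longer accepts.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping


class BaseHook:
    def run(self, sql: str, parameters: Iterable | Mapping | None = None) -> str:
        return f"{sql} with {parameters!r}"


class NarrowHook(BaseHook):
    # A type checker such as mypy is expected to flag this override, because
    # the parameter type is narrower than the one declared in BaseHook.
    def run(self, sql: str, parameters: Mapping[str, Any] | None = None) -> str:
        # dict() here stands in for code that genuinely needs a mapping.
        return f"{sql} with {dict(parameters or {})!r}"


def execute(hook: BaseHook) -> str:
    # Perfectly valid against BaseHook's signature: a list is an Iterable.
    return hook.run("SELECT 1", parameters=[1, 2])
```

Calling `execute(BaseHook())` works, while `execute(NarrowHook())` fails at runtime with a `TypeError`, which is the kind of breakage the override check is meant to catch ahead of time.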
@uranusjr Coming back to / checking in on this. I think the main outstanding thing with this PR is what to do with default arguments in overloads.
So right now we're on 2, originally I started with 3, and the latest communication I received was to switch to 1, but this causes Mypy to fail without extensive use of … Since 1 doesn't really work, I think we should stick with either 2 or 3. Let me know which of them you'd like. An updated summary of everything else would be:
And that's about it. Let me know what you want to do. 😄
I think this is about as good as we can get. This kind of one-or-many-depending-on-a-flag thing is by design difficult to type (pro tip: don’t design APIs like this).
(Forgot to respond to the comment)
Using |
@uranusjr should be all set.
Fixed merge conflicts and merged changes that have passing CI.
I don't think I saw anything that would change behavior, so this looks OK to me.
Overview
The `DbApiHook.run` method has incredibly complex typing, which is currently reduced to just the following:

This is incomplete typing for a few reasons:
- `bool`s in the kwargs determine the return type.
- Whether `handler` is `None` determines whether the return type is `None`.
- `handler` is generic and determines the return type of the `run` method.

Addressing the first bullet point would be a bit of work, and would add a lot of complicated typing to each provider hook.
But I figured addressing the 2nd and 3rd bullet points was easy and takes just 2 overloads. This alone helps vastly improve the typing of the `DbApiHook.run` method with a minimal amount of added annotation complexity.

Misc.
- … `DbApiHook` directly, (2) they are using mypy or another type checker, and (3) they had type checking issues that were not being properly flagged before due to the less precise typing.