Improve annotation of `fromstring` #64
`parser` seems like it can be an `HTMLParser`. There's code in the wild that does this without obviously being wrong. As far as I can tell from the lxml Cython source, `parser` is a `_BaseParser`, which encompasses `XMLParser`, `HTMLParser` and the `iterparse` class. I wasn't sure if the latter ought to be allowed here too. I erred on the side of caution.

It's possible for this function to return `None` if the parser has `recover=True`. I couldn't find a good way to express this without affecting every other call to `fromstring`. (Perhaps one could make the Parser generic over some kind of `Recoverable` indicator type... but that seemed like overkill.)

Resolves lxml#63. See discussion there for more context.
lxml-stubs/etree.pyi
@@ -496,8 +496,11 @@ def parse(
source: _FileSource, parser: XMLParser = ..., base_url: _AnyStr = ...
The `HTMLParser` is also allowed here.
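A minimal sketch of the widened `parser` annotation, written as plain Python so it can be checked at runtime rather than as a `.pyi` stub. The classes are empty stand-ins named after lxml's, not the real ones; `source` is simplified to `str`, and the body is a placeholder, so only the shape of the `Union` matters here. Following the PR description, the iterparse class is left out.

```python
from typing import Optional, Union


class _BaseParser:
    """Hypothetical stand-in for lxml's internal parser base class."""


class XMLParser(_BaseParser):
    """Hypothetical stand-in for lxml.etree.XMLParser."""


class HTMLParser(_BaseParser):
    """Hypothetical stand-in for lxml.etree.HTMLParser."""


# Accept either concrete parser; the iterparse class is deliberately not listed.
_ParserArg = Union[XMLParser, HTMLParser]


def parse(source: str, parser: Optional[_ParserArg] = None) -> str:
    """Placeholder body: report which parser class would handle `source`."""
    chosen = parser if parser is not None else XMLParser()
    return type(chosen).__name__
```

A checker given this signature accepts `parse(text, HTMLParser())` as well as `parse(text, XMLParser())`, but would reject an arbitrary `_BaseParser` subclass.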
test-data/test-etree.yml
from lxml import etree
document = etree.fromstring("<doc></doc>")
reveal_type(document) # N: Revealed type is "lxml.etree._Element"
reveal_type(document) # N: Revealed type is "Union[lxml.etree._Element, None]"
Since `None` is really a thing that you'd only get if you pass a suitable parser (and that gets in the way otherwise), maybe we can split up the declarations of `fromstring()` and `parse()` to only allow `None` return values if the parser argument is provided?
Admittedly, you can also change the default parser globally, which then allows a `None` return value here. And, in fact, it's not even guaranteed that the return value is an `Element` otherwise, since parsers can really return whatever they choose. However, that's such an extremely rare use case that I'd lean towards not expressing it in the type system. (OTOH, the whole point of this PR is to cater for an extremely rare use case, so …)
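A runnable sketch of the overload split suggested in this comment, using the standard library's `xml.etree.ElementTree` as a stand-in parser backend instead of lxml. `XMLParser` here is a hypothetical class that only models a `recover` flag, and the `None` branch merely imitates a recovering parser that produces no root; it is not how lxml implements recovery. A type checker sees `ET.Element` when no parser is passed and `Optional[ET.Element]` when one is.

```python
import xml.etree.ElementTree as ET
from typing import Optional, overload


class XMLParser:
    """Hypothetical stand-in for lxml's XMLParser; only models `recover`."""

    def __init__(self, recover: bool = False) -> None:
        self.recover = recover


# No parser given: the default parser either returns an element or raises.
@overload
def fromstring(text: str, parser: None = None) -> ET.Element: ...


# Explicit parser: a recovering parser might give back no root at all.
@overload
def fromstring(text: str, parser: XMLParser) -> Optional[ET.Element]: ...


def fromstring(text: str, parser: Optional[XMLParser] = None) -> Optional[ET.Element]:
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        # Imitate recovery by returning None; without recovery, re-raise.
        if parser is not None and parser.recover:
            return None
        raise
```

With this split, `fromstring("<doc/>").tag` type-checks without a `None` guard, while callers that pass a parser are asked to handle `None`.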
In typeshed we generally avoid Unions in return types, for reasons discussed in python/mypy#1693. In this case, it sounds like returning None is a very rare edge case, but if we return `Element | None`, we force every caller to check for None after calling this function. As you say, one improvement could be to use overloads for cases where we know the return value is definitely not None. Another approach we've used in typeshed is to return something like `Element | Any`. This gives you a lot of type safety but also allows callers to do None checks without hitting "unreachable code" warnings.
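A small sketch of that trade-off, using the standard library's `xml.etree.ElementTree` in place of lxml so it runs anywhere. `fromstring_or_none`, `root_tag_unchecked` and `root_tag_checked` are hypothetical names; the helper returns `None` on malformed input to play the role of a recovering parser, and its declared return type is `Union[ET.Element, Any]`, i.e. the typeshed-style `Element | Any`.

```python
import xml.etree.ElementTree as ET
from typing import Any, Union


def fromstring_or_none(text: str) -> Union[ET.Element, Any]:
    """Hypothetical helper: return None on malformed input instead of raising."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


def root_tag_unchecked(text: str) -> str:
    # Type-checks without narrowing: Element has `.tag`, and Any allows anything.
    # At runtime this raises AttributeError if the helper returned None.
    return fromstring_or_none(text).tag


def root_tag_checked(text: str) -> str:
    root = fromstring_or_none(text)
    # The None branch stays reachable for a type checker because Any may be None.
    if root is None:
        return ""
    return root.tag
```

If the return type were plain `ET.Element`, mypy with `--warn-unreachable` would report the `return ""` line as unreachable; if it were `Optional[ET.Element]`, `root_tag_unchecked` would be rejected for accessing `.tag` on a possibly-`None` value.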
Many thanks both!
On returning Union types: I'd read some of the discussion in python/typing#566 which sounded promising at first, but as Jukka writes here:
For new code and new APIs the recommended way is to avoid signatures that would require the use of AnyOf anyway.
I'm about to push three commits which:
- clarify that `parse` accepts an HTML parser too;
- express that `fromstring()` without a parser returns an `_Element` (the comment about changing the default parser notwithstanding); and
- change the return type of `fromstring` to `_Element | Any`.
To be explicit: I don't have strong opinions about this PR. I like the idea of annotations that describe everything a function can possibly do, but there's a balance to strike between that and pragmatism, and I'm happy to drop the `| None` change if it's too niche. It'd be nice to get the bit about `HTMLParser` in, though, if nothing else!
This isn't strictly true: one can apparently change the default parser. But it's a nice refinement.
So as not to burden existing users with having to check the `None`-ness of `fromstring()` return values.
Co-authored-by: scoder
Thanks