Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comments are not accounted for #46

Closed
eemeli opened this issue Apr 12, 2024 · 5 comments
Closed

Comments are not accounted for #46

eemeli opened this issue Apr 12, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@eemeli
Copy link
Contributor

eemeli commented Apr 12, 2024

At least the following make using the types somewhat challenging:

  • The tag of an element can be etree.Comment, a function.
  • The text of a comment is always a str.

Given these, I wouldn't be surprised to find other places where the types are misaligned with respect to comments.

@abelcheung
Copy link
Owner

abelcheung commented Apr 12, 2024

So I got it all wrong.

@eemeli I presume you mean the tag selector here, please tell me (or even better, provide an example) if I didn't get your real intention.

It is indeed a challenge on how to fit in other types of element factory into the _TagSelector alias, but ultimately there's no good solution. Not just etree.Comment(), there are etree.Entity() and etree.ProcessingInstruction() too, all of them having different call arguments.

That's why I choose to use generic Callable in _TagSelector, this should cover usage of etree.Comment too. For example, the following code should be ok for type checkers:

    from lxml import etree
    tree = etree.parse(somefile)
    result = tree.iter(etree.Comment)

@eemeli
Copy link
Contributor Author

eemeli commented Apr 12, 2024

The code that I encountered these issues with looks roughly like this:

from lxml import etree
root = etree.fromstring('<foo><!-- comment --></foo>')
comment: list[str | None] = []  # TODO: should be list[str]
for el in root:
    if el.tag == etree.Comment:  # type: ignore[comparison-overlap]
        comment.append(el.text)

To satisfy mypy, I need the type: ignore comment because the current types are presuming that el.tag is always a string, and it isn't for comments.

Also, even if I narrow the typing of el with cast(etree._Comment, el), el.text won't be typed as str, but always as str | None; hence the wider typing for comment.

@abelcheung abelcheung added the bug Something isn't working label Apr 12, 2024
@abelcheung
Copy link
Owner

abelcheung commented Apr 12, 2024

For the first problem, I don't have any good solution so far. This is the price needed to pay when I decided to follow lxml on how _Comment is inherited from _Element. _Comment has totally different behavior from _Element, yet it is a subclass of _Element. I could have severed the relation between _Comment and _Element, but that leads to another set of problems.

Right now you may want to try narrowing down the type of el itself, for example:

for el in root:
    if isinstance(el, etree._Comment):
        comment.append(el.text)

On second problem, you're right. It is possible to use None as input argument in etree.Comment(), but the .text attribute of comment element is always str. Will fix that soon. Done.

abelcheung added a commit that referenced this issue Apr 12, 2024
It is empty string even when there is no text content for __ContentOnlyElement. Partially addresses #46.
@eemeli
Copy link
Contributor Author

eemeli commented Apr 13, 2024

The instanceof check is indeed better, and the text fix looks promising. Thank you!

@abelcheung
Copy link
Owner

The fixable part of this issue is addressed, so I'm closing. Please feel free to reopen if something related to this issue still doesn't work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants