
Releases: abelcheung/types-lxml

2024.12.13

13 Dec 12:59
525e6f8

Breaking changes and features

  • bytearray accepted as tag names, attribute names and attribute values
    • Related change: create _TextArg type alias to slowly replace existing _AnyStr (#71)
  • Warn IDE users via warnings.deprecated about argument combinations in HTML link functions that raise exceptions
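
The deprecation-based warning in the last item can be illustrated with a small sketch. It marks one overload as deprecated so that a type checker or IDE flags the argument combination that would raise at runtime. The function `absolute_link()` and its parameters are hypothetical and are not taken from lxml or types-lxml; the no-op fallback decorator is likewise an assumption that only keeps the sketch runnable when `typing_extensions` is absent.

```python
from __future__ import annotations

import sys
from typing import overload

if sys.version_info >= (3, 13):
    from warnings import deprecated
else:
    try:
        from typing_extensions import deprecated
    except ImportError:
        # No-op fallback so the sketch still runs without typing_extensions;
        # type checkers need the real decorator to show the warning.
        def deprecated(message: str):
            def decorator(func):
                return func
            return decorator


# Overload matching the bad combination: flagged by type checkers and IDEs.
@overload
@deprecated("base_url=None raises TypeError at runtime")
def absolute_link(href: str, base_url: None) -> str: ...
# Overload matching the valid combination: no warning.
@overload
def absolute_link(href: str, base_url: str) -> str: ...
def absolute_link(href: str, base_url: str | None) -> str:
    """Hypothetical link helper, not the lxml implementation."""
    if base_url is None:
        raise TypeError("No base_url given")
    return base_url.rstrip("/") + "/" + href.lstrip("/")
```

Only the overload is decorated, so the implementation itself emits no runtime deprecation warning; the hint exists purely for static analysis.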

Bug fixes

  • Property deleter missing for HTML elements (#73)
  • etree.strip_attributes() supports bytes and QName as input
  • Completion of #64 for remaining known cases
  • Corrected link replacement function return type in html.rewrite_links()
  • etree.canonicalize() shouldn't accept bytes as input

Tests related

  • Use hypothesis for extensive tests on function arguments, currently used in _Attrib and HTML link function tests (#75)
  • reveal_type() injector has been split into its own project and pulled via dependency

Internal changes

  • Folder structure changes for the whole repository (#70)
  • Remove _HANDLE_FAILURES type alias and show values directly to users
  • Rename type-only protocol SupportsLaxedItems to SupportsLaxItems

Full Changelog: 2024.11.08...2024.12.13

2024.11.08

08 Nov 10:48
99f984d

Breaking and important changes

  • pyright users (and IDEs that can make use of pyright) will see a warning if a single string is supplied where a collection of strings is expected (tuple, set, list, etc.). In terms of typing, a single str is itself valid as a Sequence, so type checkers normally raise no alarm when str is used for such function parameters, yet doing so can cause unexpected runtime behavior. (#64)
    • _ElementTree.write(), etree.fromstringlist(), etree.tostring(), html.soupparser.fromstring(), html.soupparser.parse()
  • It is possible to verify that release files indeed come from GitHub and have not been maliciously altered. See Release file attestation for details.
  • Runtime tests support comparing with mypy results, therefore officially making static stub tests obsolete
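
The single-string pitfall described in the first item above can be reproduced without lxml. This is a minimal sketch; `join_lines()` is a hypothetical helper invented for illustration and is not part of lxml or types-lxml.

```python
from typing import Sequence


def join_lines(lines: Sequence[str]) -> str:
    """Hypothetical helper: number each item in a collection of strings."""
    return "\n".join(f"{i}: {line}" for i, line in enumerate(lines))


# Intended usage: a list of strings yields one numbered line per item.
as_list = join_lines(["<a/>", "<b/>"])

# A bare str also satisfies Sequence[str], so a type checker stays silent,
# but iteration happens character by character.
as_str = join_lines("<a/>")
```

Here `as_list` contains two numbered lines, while `as_str` contains one line per character of the string, which is rarely what the caller intended.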

Bug fixes

  • Element tag names, attribute names and attribute values support bytearray. This was discovered via hypothesis testing, which is intended to be utilized in the next release
  • Compatibility with pyright ⩾ 1.1.378, which imposes additional overload warning for etree.iterparse()
  • Use relative import in lxml.ElementInclude, otherwise mypy triggers --install-types behavior.
  • ObjectifiedElement __getitem__() and __setitem__() should accept str as key, which behaves mostly like __getattr__() and __setattr__(). That means elem["foo"] is equivalent to elem.foo for non-repeating subelements.

fixes for etree submodule

  • _Element.tag property is not just a str. It is str after initial document or string parsing, but can be set manually to any type supported by tag name and returns the same object.
  • When QName is initialized with first argument set to None, _Element can be used as second argument (which is promoted to first argument in implementation)
  • Relax single argument usage in the _Element.iter*() method family: the tag= keyword is not needed when the argument is None
  • FunctionNamespace() should generate an _XPathFunctionNamespaceRegistry object, not its superclass
  • For decorator usage of _XPathFunctionNamespaceRegistry and _ClassNamespaceRegistry, decorator signature included an extraneous argument, though it doesn't affect any existing correct usage.
  • indent() first parameter has wrong name

fixes for html submodule

  • soupparser.parse() should accept pathlib.Path object as input
  • .value property of SelectElement can't be set to bytes
  • .action property of FormElement can have a value of None, and can be set to None. They have different meanings though.

Small and internal changes

  • Declare python 3.13 support and perform CI tests.
  • Separation of pyright and mypy ignore comments: in previous releases # type: ignore[code] was enabled in pyright settings. Now it only uses # pyright: ignore[code] so mypy comment won't affect pyright behavior.
  • Add ._name property to html.FormElement for form name
  • Eliminate typing.TypeAlias usage (declared obsolete, and we can do without it)
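
The two ignore-comment forms mentioned above look like this in practice. This is only an illustrative sketch: `parse_port()` is a hypothetical function, and the error codes shown (`arg-type` for mypy, `reportArgumentType` for pyright) are the ones those checkers would be expected to report for this particular mismatch.

```python
def parse_port(value: str) -> int:
    """Hypothetical function used only to provoke a type error."""
    return int(value)


# Honoured by mypy; pyright honours it only when its
# enableTypeIgnoreComments setting is turned on.
port_a = parse_port(8080)  # type: ignore[arg-type]

# Honoured by pyright only; mypy does not recognise this comment form.
port_b = parse_port(8080)  # pyright: ignore[reportArgumentType]
```

A line that both checkers flag would need both comments once the shared `# type: ignore` handling is disabled in pyright.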

Test related changes

  • Stub tests migration to runtime:
    • Most of remaining etree._Element methods, now only .makeelement() and .xpath() left in stub test
  • Runtime test additions:
    • ElementNamespaceClassLookup()
  • tox config migrated to pyproject.toml, thus requiring tox ⩾ 4.22
  • Runtime tests are now executed within test-rt folder due to python/mypy#8400
  • Some tests need to be performed conditionally when multi-subclass patch is applied
  • Some tests or syntaxes need to be turned off to cope with mypy deficiencies
  • Usage of Rust-based uv as well as related tox plugin to speed up test environment recreation
  • Don't force users to install tox-gh-actions when checking out the repository; it is only useful for GitHub workflows

Docstring additions

  • etree submodule: parse(), fromstringlist(), tostring(), indent(), iselement(), adopt_external_document(), DocInfo properties, QName, CData, some exception classes
  • html.soupparser submodule: fromstring(), parse(), convert_tree()

2024.09.16

16 Sep 07:07
470f1bf

Bug fix and small changes

  • Namespace argument in ElementPath methods should allow None (#60, thanks to @cukiernick)

Internal changes

  • Perform runtime tests against lxml 5.3

2024.08.07

07 Aug 08:04
9187118

Breaking changes

  • Multiple builds are available, with the alternative build improving support for multiple XML subclassing scenarios. See the relevant README section for details. Thanks to @scanny for being the driving force behind #51.
  • Mypy 1.11 required, which introduced backward incompatible @typing.overload changes.
  • lxml.html.clean stub deprecated; lxml 5.2.0 completely removes the submodule due to multiple security issues. Corresponding code and type definitions are split into a new independent repo.

Features

  • (#56) Replace typing.TypeGuard with typing.TypeIs
  • Use callback protocol for more precise element and ElementMaker factory function typing
  • lxml.etree.ICONV_COMPILED_VERSION exported since 5.2.2
  • Special handling for ObjectifiedElement and HTMLElement in lxml.cssselect.CSSSelector and various cssselect() methods
  • html.builder shorthands return more precise element type for certain HTML elements. For example, html.builder.LABEL(), corresponding to <LABEL> tag, yields LabelElement.
  • More precise etree.Extension() annotation depending on supplied namespace
  • Stricter namespace argument type in _Element ElementPath methods
  • For lxml.builder.ElementMaker class:
    • Provide better hint in __call__() argument
    • Accepts namespace tuple in nsmap argument
    • Export private properties
  • For lxml.sax module:
    • Export private properties in various classes
    • Explicitly list all inherited methods in ElementTreeContentHandler class, as method arguments names are different from superclass ones
  • Alert etree.HTMLParser users to remove deprecated strip_cdata argument
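
The callback protocol technique named in the feature list above can be sketched as follows. All names here (`FakeElement`, `ElementFactory`, `make_element`, `build_root`) are hypothetical and do not reflect the actual definitions in types-lxml; the point is only that a `Protocol` with `__call__` can describe optional and keyword arguments, which a plain `Callable[...]` annotation cannot.

```python
from typing import Dict, Optional, Protocol


class FakeElement:
    """Hypothetical element type used only for this sketch."""

    def __init__(self, tag: str, attrib: Dict[str, str]) -> None:
        self.tag = tag
        self.attrib = attrib


class ElementFactory(Protocol):
    """Callback protocol: describes the full call signature of a factory."""

    def __call__(
        self, tag: str, attrib: Optional[Dict[str, str]] = None, **extra: str
    ) -> FakeElement: ...


def make_element(
    tag: str, attrib: Optional[Dict[str, str]] = None, **extra: str
) -> FakeElement:
    # Merge the attribute mapping with any extra keyword attributes.
    merged = dict(attrib or {})
    merged.update(extra)
    return FakeElement(tag, merged)


def build_root(factory: ElementFactory) -> FakeElement:
    # Keyword arguments are type-checked against the protocol signature.
    return factory("root", {"id": "1"}, lang="en")


root = build_root(make_element)
```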

Bug fix and small changes

  • Some _Element related input arguments fixed to use typing.Sequence instead of Iterable, as _Element is already an Iterable itself. Supplying an _Element where a proper Iterable is expected would cause problems.
  • A similar situation arises for str or bytes in tag selector arguments; typing.Collection is used to alert users more clearly.
  • None can't be used as etree.strip_*() argument
  • Some etree.DocInfo read-only properties can't be None
  • Fix etree.Resolver method return types
  • Avoid exception raising arg combinations in html.html5parser.HTMLParser
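
The reasoning behind the Sequence-versus-Iterable fix above can be shown with a small runtime sketch. `Node` is a hypothetical stand-in for an element that iterates over its children; it is not the lxml class.

```python
from collections.abc import Iterable, Sequence
from typing import Iterator, List


class Node:
    """Hypothetical stand-in for an element that iterates over its children."""

    def __init__(self, tag: str) -> None:
        self.tag = tag
        self.children: List["Node"] = []

    def __iter__(self) -> Iterator["Node"]:
        return iter(self.children)


root = Node("root")
root.children.append(Node("child"))

# A lone Node is structurally an Iterable, so an Iterable[Node] parameter
# would accept it and iterate over its children instead of treating it
# as a single item.
node_is_iterable = isinstance(root, Iterable)

# It is not a Sequence, so a Sequence[Node] parameter rejects the lone Node,
# while a real list of nodes still passes.
node_is_sequence = isinstance(root, Sequence)
list_is_sequence = isinstance([root], Sequence)
```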

Internal changes

  • The usual static stub to runtime test migration:
    • Part of basic _Element tests and its find*() methods
    • More extensive _Attrib tests
  • Use ruff to replace black and isort as code formatter
  • Migrate stub tests to support pytest-mypy-plugins ⩾ 2.0
  • Use pdm-backend as build backend due to its more versatile versioning support

2024.04.14

14 Apr 04:45
8335d33

Breaking changes

  • Mypy 1.9 is required, dropping 1.5 support. Versions 1.6 - 1.8 were never supported.
  • lxml.ElementInclude completely reworked

Features

  • PEP 696 support, simplifying usage of some subscripted types (#42)
    • As a convenient side effect, lxml.html parser constructor signatures can be removed
  • All annotations now provide default values in their signatures instead of ...

Bug fix and small changes

  • Type of _Comment.text property (and those of similar elements) is always str (#46, thanks to @eemeli)
  • Tag selector argument in element iterator methods should support keyword with a single tag (#45, thanks to @eemeli)
  • html.fragments_fromstring() should receive the same fix as html.html5parser.fragments_fromstring() did (#43, thanks to @Wuestengecko)
  • @overload for etree.SubElement() on handling of HtmlElement and ObjectifiedElement
  • Some exported constants were missing from lxml.ElementInclude stub
  • html.soupparser module functions return type depends on makeelement argument
  • Keyword arguments in html.soupparser module functions are explicitly listed now (instead of generic **kwargs before)
  • The 2 arguments in html.diff.html_annotate() should align their annotation types
  • html.submit_form() return type depends on the result of open_http function argument
  • Add missing exported variable for lxml.isoschematron
  • Uppercase variants of output method arguments ("HTML", "TEXT", "XML") were dropped

Internal changes

  • Usual runtime test additions: lxml.html.soupparser, lxml.ElementInclude, various exported constants
  • Runtime tests are also run against lxml 5.2

2024.03.27

27 Mar 17:13
138fcaf

Breaking change

  • Requires cssselect ⩾ 1.2 for annotation in lxml.cssselect, since cssselect is now inline annotated.

Bug fix and small changes

  • Compatibility with pyright ⩾ 1.1.353
  • In etree.clean_* functions, first argument (the Element or ElementTree to be processed) must be strictly positional
  • etree._LogEntry.filename property is never empty, as it uses the value <string> as fallback
  • etree._BaseErrorLog.receive() argument name was wrong
  • Home-brewed SupportsReadClose protocol dropped in favor of the more standardized SupportsRead
  • html.html5parser.parse() should support data stream as input
  • html.html5parser.fragments_fromstring() return type is dependent on no_leading_text argument
  • encoding arguments in various methods / functions used to only support ASCII and UTF-8 as byte encodings, now the restriction is lifted
  • Place some typing usage under python version check (if sys.version_info >= (3, x))
  • etree.PyErrorLog constructor shouldn't accept 2 logger arguments simultaneously
  • etree.PyErrorLog.level_map property reverted to vanilla type (int) instead of our fake enum

Internal changes

  • Some runtime tests are lxml version dependent (#34, thanks to @fabaff)
  • Adds stub check for _Element, _Comment and _ElementTree (#33, thanks to @udifuchs)
  • Following stub tests migrated to runtime: _Attrib, _ErrorLog and friends, html5lib

2024.02.09

09 Feb 10:19
433d03a

Bug fix and small changes

  • Add back HtmlProcessingInstruction element (#28, thanks to @eliotwrobson)
  • Silence pyright ⩾ 1.1.345 warning on overriding read-write property with read-only one (ObjectifiedElement.text)

Documentation

  • mypy ⩾ 1.6 does not support PEP 702, thus shouldn't be used with types-lxml

Internal changes

  • Stub test suite uses mypy 1.5.x now

2023.10.21

21 Oct 17:29
31088c5

Bug Fix

  • Types for emitted events and values in iterparse() were not optimal (issue #19, thanks to @Daverball)
  • Most html link and clean functions should be unable to process ElementTree, except Cleaner.clean_html()

Feature

  • Completed following modules, thus really having lxml fully covered (sans a few submodules that will never be implemented):
    • lxml.html.diff
    • lxml.ElementInclude
  • Declares support for Python 3.12
  • Update for upcoming lxml 5.0
    • Schematron constructor arguments
    • Some obsolete functions removed

Internal change


Here is the list of changes since the last release. Please also check the release notes for the previous release, since it contains substantial changes.

2023.3.28

28 Mar 04:10
9bb1e9b

The list of changes since last release is huge, be it visible by users or not.

Breaking changes

  • Class inheritance of html.HtmlComment and friends has changed and now deviates from source code. Within the stubs they are now 'thought' to inherit from html.HtmlElement, like the XML etree._Element counterpart. Refer to the wiki document on how and why this change was made.
  • Shelved custom parser target support (custom parser target is used when initiating XML / HTML parsers with target= argument), as current python typing system is deemed insufficient to get it working without plugins.
  • Stub package only depends on other stub packages, following behavior of typeshed distributed stubs. This means lxml is no longer pulled in when installing types-lxml.
  • etree.SmartStr reverted back to its original class name
  • etree._ErrorLog is now made a function that generates etree._ListErrorLog (despite the fact that it is a class in source code), according to actual created instance type

Significant changes / completion

  • Completed following submodules and parts, thus removing the partial status of types-lxml package:
    • lxml.etree proper:
      • XSLT related classes / functions
      • XML:ID support
      • External document and URI resolving
      • XInclude support
      • XPath and XSLT extension function registry
      • Error log and reporting, along with numerous bug fixes
      • etree.iterparse and etree.iterwalk
      • Various ElementClassLookup types
    • lxml.objectify
      • Includes all DataElement subtypes and type annotation support
    • lxml.isoschematron
  • When subclassing XML elements, now most of its methods can be inherited without overriding output element type.

Smaller changes

  • More extensive usage of Python 3.9-3.11 typing features. This is possible because types-lxml is an external stub package and doesn't affect source code. Examples:
    • Marking string constants as LiteralString (PEP 675)
    • Make type aliases more explicit (PEP 613)
    • Convenient Self when declaring methods (PEP 673)
  • Both mypy and pyright type checkers have strict mode turned on when verifying stub source
  • _Element.sourceline property becomes read-only
  • Re-added most deprecated methods in various places, with help from provisional PEP 702 support (@deprecated) in pyright
  • Incorporate more docstring from official lxml classes, in case IDEs can display them in user interface.
  • Force _XPathEvaluatorBase subclasses to make __call__ available, by explicitly declaring it as abstract method within _XPathEvaluatorBase
  • Removal of html.open_http_urllib, which is only intended as a fallback callback function for html.submit_form() without user intervention
  • libxml2 error constants become integer enum in stub
  • Warn userland usage of dummy etree.PyErrorLog.copy(), because it is only intended for smoother internal lxml error handling.
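
The three typing features listed at the top of this section (PEP 675, PEP 613 and PEP 673) can be sketched together. The names `_TagName`, `XML_NS`, `Builder` and `HtmlBuilder` are hypothetical and are not taken from the stubs. The import sits under `TYPE_CHECKING` as an assumption made purely so the sketch runs on Python versions that lack these names at runtime.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only type checkers evaluate this import, so the sketch runs on older
    # Python versions without typing_extensions installed.
    from typing_extensions import LiteralString, Self, TypeAlias

# PEP 613: mark the assignment as a type alias explicitly.
_TagName: TypeAlias = str

# PEP 675: a string constant typed as LiteralString.
XML_NS: LiteralString = "http://www.w3.org/XML/1998/namespace"


class Builder:
    """Hypothetical class, not part of lxml."""

    def __init__(self) -> None:
        self.tags: list[_TagName] = []

    # PEP 673: Self keeps the return type correct in subclasses.
    def add(self, tag: _TagName) -> Self:
        self.tags.append(tag)
        return self


class HtmlBuilder(Builder):
    pass


# Chained calls keep the HtmlBuilder type thanks to Self.
chained = HtmlBuilder().add("div").add("span")
```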

Bug fixes

  • File reading source (used in file= argument in parse() and friends) requirement relaxed
  • html.(X)HtmlParser __init__ was missing some arguments
  • Convert iter* methods of Elements and some tag cleanup functions into @overload, to better reflect their originally intended argument usage
  • etree.ElementBase and similar public base element classes lacked __init__
  • Setting of etree.DocInfo text properties now accepts bytes
  • name= argument of html.HtmlElementClassLookup() doesn't accept None
  • Concerning _Comment, _Entity, _ProcessingInstruction, and their subclasses
    • .tag attribute now returns correct value (the basic etree element factory function)
    • Users will be warned if they use these elements as they would a normal XML _Element, such as treating them as parent elements and inserting child elements into them

2023.02.11

11 Feb 22:50
db5e903