- Vulnerability: Fixed an issue where normalizing unicode too late in the process would keep disallowed tags when using specially crafted HTML. Fixed in 2.4.2.
- Fixed missing whitespace while merging adjacent tags.
- Raised the minimum lxml-html-clean version to 0.4 because of a reported vulnerability. We were already compatible, but enforcing the lower bound makes sense.
- Fixed an edge case where
br
tag attributes weren't removed if the br tag appears first. - Updated the
lxml
dependency to 5.2 and added the now-requiredlxml[html_clean]
extra.
- Avoided adding whitespace when merging tags of the same type.
- Updated the tests.
- Switched from black to the ruff formatter.
- Changed
keep_normalized_whitespace
to preserve whitespace at the tail of tags, not just between tags. - Changed the parameters of
normalize_whitespace_in_text_or_tail
to be keyword-only.
- Added a test for a type of misconfiguration.
- Changed the sanitizer configuration validation to not allow unexpected data
types in
tags
,empty
,separate
,whitespace
andattributes
.
- Raised the minimum Python version to 3.7. Added Python 3.10, 3.11.
- Raised the minimum lxml version to the current 4.9.1.
- Switched from Travis CI to GitHub actions. Added Python 3.9 to the CI matrix.
- Renamed the main branch to main.
- Switched to a declarative setup.
- Fixed a whitespace dependency in the testsuite.
- Switched to hatchling and ruff.
- Made behavior-altering arguments to
normalize_overall_whitespace
keyword-only.
1.9 (2020-01-20)
- Added Python 3.8 to the CI matrix.
- Be able to keep the
<style>
tag by adding it totags
. - Added a style check to the CI matrix.
1.8 (2019-11-21)
- Actually added support for customizing lxml's autolinking behavior using a dictionary argument.
- Stopped removing explicitly allowed attributes.
- Removed
id
from allowed attributes of<a>
tags to provide an additional layer of defense against DOM clobbering attacks. - Added an element preprocessor which assigns the
id
value to thename
attribute of anchors ifname
isn't set or empty. This should provide additional backwards compatibility making theid
removal less of a problem when using named anchors.
1.7 (2019-02-19)
- Added a system check which validates sanitizer configurations early when using Django.
- Fixed an edge case where passing in an empty allowed tags list would unexpectedly and silently not remove any tags at all (because that's the way lxml's cleaner works).
- Changed the sanitizer
tags
,empty
andseparate
options to also accept any iterable, not just sets. - Changed the
lru_cache
import in the Django module to tryfunctools
first. - Fixed the tag merging to also check tags in
empty
. This means that e.g. consecutive<hr>
tags are also merged now when using the default settings. - Made it possible to override the set of tags processed as whitespace.
The default set is
{"br"}
which preserves the current behavior of stripping breaks from the beginning or end of tags' content.
1.6 (2018-06-29)
- Fixed another edge case where a tag which is allowed to be empty was
erroneously removed if it contained not only whitespace but also a
<br>
tag.
1.5 (2018-06-01)
- Fixed a few edge whitespace normalization edge cases and a bug where removing an empty tag removed all whitespace.
- Added black for automatically formatting the Python code.
- By default, links with
target="_blank"
get an additionalrel="noopener"
attribute (Article by Mathias Bynens). If you're overriding the list of allowed attributes for anchor tags you must addrel
to your list.
1.4 (2018-03-29)
- Corrected the required lxml version in
install_requires
. - Added comments and testing for more edge cases.
- Changed the cleaner to not drop form elements; instead,
<form>
is converted to<p>
, and form elements are preserved. - Added an
is_mergeable
hook for conditionally preventing the merging of adjacent elements. - Fixed a case where paragraphs were allowed inside paragraphs (which was never the idea).
1.3 (2017-09-22)
- Fixed a case where tags with content between them were erroneously merged.
- Added a
tox.ini
file for running style checks and tests. - Replaced
REPLACEMENTS
andelement_filters
with the more generalelement_preprocessors
andelement_postprocessors
settings. - Removed the restriction that
<span>
tags are never allowed.
1.2 (2017-05-25)
- Fixed the erroneous removal of all whitespace between adjacent elements.
- Fixed a few occasions where
<br>
tags were erroneously removed. - Back to beautifulsoup4 for especially broken HTML respectively HTML with Emojis on macOS.
- Used a
<div>
instead of<anything>
to wrap the document (since beautifulsoup4 does not like custom tags too much)
1.1 (2017-05-02)
- Added
html_sanitizer.django.get_sanitizer
to provide an official way of configuring HTML sanitizers using Django settings.
1.0 (2017-05-02)
- Initial public release.