Improve multi-byte cutter/chunk (#233)
* 🐛 multi-byte cutter/chunk is not accurate enough on u16, u32 (le/be)

* 🔖 bump version 3.0.1-dev
Ousret authored Nov 12, 2022
1 parent 5ec4a27 commit 2d26aeb
Showing 3 changed files with 7 additions and 2 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,11 @@
 All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
+## [3.0.1](https://github.com/Ousret/charset_normalizer/compare/3.0.0...master) (unreleased)
+
+### Fixed
+- Multi-bytes cutter/chunk generator did not always cut correctly (PR #233)
+
 ## [3.0.0](https://github.com/Ousret/charset_normalizer/compare/2.1.1...3.0.0) (2022-10-20)
 
 ### Added
2 changes: 1 addition & 1 deletion charset_normalizer/utils.py
@@ -396,7 +396,7 @@ def cut_sequence_chunks(
 
 # multi-byte bad cutting detector and adjustment
 # not the cleanest way to perform that fix but clever enough for now.
-if is_multi_byte_decoder and i > 0 and sequences[i] >= 0x80:
+if is_multi_byte_decoder and i > 0:
 
 chunk_partial_size_chk: int = min(chunk_size, 16)

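To see why dropping the `sequences[i] >= 0x80` guard matters for UTF-16 and UTF-32, consider the minimal sketch below. It is an illustration only, not code from charset-normalizer: `payload`, `offset`, `old_guard`, and the other names are hypothetical. It assumes ASCII-range text encoded as UTF-16-LE, where every byte is below `0x80`, so a byte-value test alone cannot flag a cut that lands in the middle of a code unit.

```python
# Illustration only: hypothetical names, not the library's implementation.
# ASCII-range text in UTF-16-LE is stored as (low byte, 0x00) pairs,
# so no byte in the payload reaches 0x80.
payload = "Hello, world!".encode("utf_16_le")

# An odd offset falls in the middle of a two-byte code unit.
offset = 3

# The condition removed by the commit: it is False here, so a guard of
# this shape would skip the bad-cut adjustment for this chunk.
old_guard = payload[offset] >= 0x80

# Decoding from the misaligned offset raises no error, but pairs each
# 0x00 high byte with the following character's low byte.
misaligned = payload[offset:offset + 8].decode("utf_16_le")

# Stepping back one byte realigns the slice on a code-unit boundary.
aligned = payload[offset - 1:offset - 1 + 8].decode("utf_16_le")

print(old_guard)
print(repr(misaligned))
print(repr(aligned))
```

In this case the misaligned slice still decodes without error, only to the wrong characters, which is consistent with the commit message describing the cutter as not accurate enough on u16 and u32.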
2 changes: 1 addition & 1 deletion charset_normalizer/version.py
@@ -2,5 +2,5 @@
 Expose version
 """
 
-__version__ = "3.0.0"
+__version__ = "3.0.1-dev"
 VERSION = __version__.split(".")
