
Comparing changes

base repository: jawah/charset_normalizer
base: 2.0.7
head repository: jawah/charset_normalizer
compare: 3.0.1
Showing 81 changed files with 3,947 additions and 48,852 deletions.
  1. +8 −0 .codecov.yml
  2. +2 −0 .github/FUNDING.yml
  3. +4 −0 .github/dependabot.yml
  4. +5 −4 .github/workflows/chardet-bc.yml
  5. +0 −56 .github/workflows/codeql-analysis.yml
  6. +54 −0 .github/workflows/codeql.yml
  7. +6 −5 .github/workflows/detector-coverage.yml
  8. +4 −3 .github/workflows/integration.yml
  9. +6 −5 .github/workflows/lint.yml
  10. +40 −0 .github/workflows/mypyc-verify.yml
  11. +4 −3 .github/workflows/performance.yml
  12. +240 −8 .github/workflows/python-publish.yml
  13. +6 −5 .github/workflows/run-tests.yml
  14. +19 −0 .readthedocs.yaml
  15. +313 −0 CHANGELOG.md
  16. +0 −3 CONTRIBUTING.md
  17. +2 −2 MANIFEST.in
  18. +18 −31 README.md
  19. +4 −0 SECURITY.md
  20. +9 −6 bin/bc.py
  21. +9 −5 bin/coverage.py
  22. +13 −1 bin/performance.py
  23. +2 −2 bin/run_autofix.sh
  24. +1 −1 bin/run_checks.sh
  25. +1 −1 bin/serve.py
  26. +7 −0 build-requirements.txt
  27. +12 −14 charset_normalizer/__init__.py
  28. +210 −185 charset_normalizer/api.py
  29. +1,439 −1,243 charset_normalizer/assets/__init__.py
  30. +138 −88 charset_normalizer/cd.py
  31. +30 −23 charset_normalizer/cli/normalizer.py
  32. +41 −42 charset_normalizer/constant.py
  33. +1 −53 charset_normalizer/legacy.py
  34. +121 −90 charset_normalizer/md.py
  35. +32 −88 charset_normalizer/models.py
  36. +128 −43 charset_normalizer/utils.py
  37. +1 −1 charset_normalizer/version.py
  38. +9 −0 data/NOTICE.md
  39. +6 −0 data/sample-arabic-1.txt
  40. +6 −0 data/sample-arabic.txt
  41. +59 −0 data/sample-french-1.txt
  42. +59 −0 data/sample-french.txt
  43. +204 −0 data/sample-polish.txt
  44. +7 −0 data/sample-russian-3.txt
  45. +0 −8,519 data/sample.1.ar.srt
  46. +0 −1,769 data/sample.1.fr.srt
  47. +0 −2,011 data/sample.1.gr.srt
  48. +0 −3,905 data/sample.1.he.srt
  49. +0 −6,188 data/sample.1.hi.srt
  50. +0 −1,919 data/sample.1.ru.srt
  51. +0 −95 data/sample.1.tu.srt
  52. +0 −7,341 data/sample.2.ar.srt
  53. +0 −3,074 data/sample.3.ar.srt
  54. +0 −3,284 data/sample.4.ar.srt
  55. +0 −8,519 data/sample.5.ar.srt
  56. +26 −10 dev-requirements.txt
  57. +104 −0 docs/api.rst
  58. +5 −2 docs/{ → community}/faq.rst
  59. +43 −0 docs/community/speedup.rst
  60. +2 −2 docs/{ → community}/why_migrate.rst
  61. +15 −18 docs/conf.py
  62. +26 −9 docs/index.rst
  63. +36 −0 docs/make.bat
  64. +0 −20 docs/miscellaneous.rst
  65. +2 −2 docs/requirements.txt
  66. +3 −2 docs/{ → user}/advanced_search.rst
  67. +108 −0 docs/user/cli.rst
  68. 0 docs/{ → user}/getstarted.rst
  69. +2 −2 docs/{ → user}/handling_result.rst
  70. +46 −0 docs/user/miscellaneous.rst
  71. +4 −7 docs/{ → user}/support.rst
  72. +57 −3 setup.cfg
  73. +18 −65 setup.py
  74. +35 −21 tests/test_cli.py
  75. +34 −1 tests/test_coherence_detection.py
  76. +14 −13 tests/test_full_detection.py
  77. +12 −0 tests/test_large_payload.py
  78. +52 −0 tests/test_logging.py
  79. +1 −1 tests/test_mess_detection.py
  80. +0 −20 tests/test_normalize_fp.py
  81. +22 −19 tests/test_utils.py
8 changes: 8 additions & 0 deletions .codecov.yml
@@ -0,0 +1,8 @@
+coverage:
+  status:
+    project:
+      default:
+        target: 88%
+        threshold: null
+    patch: false
+    changes: false
2 changes: 2 additions & 0 deletions .github/FUNDING.yml
@@ -0,0 +1,2 @@
+# These are supported funding model platforms
+tidelift: pypi/charset-normalizer
4 changes: 4 additions & 0 deletions .github/dependabot.yml
@@ -9,3 +9,7 @@ updates:
     directory: "/" # Location of package manifests
     schedule:
       interval: "weekly"
+  - package-ecosystem: "github-actions"
+    directory: "/"
+    schedule:
+      interval: "weekly"
9 changes: 5 additions & 4 deletions .github/workflows/chardet-bc.yml
@@ -13,9 +13,9 @@ jobs:
         os: [ubuntu-latest]

     steps:
-    - uses: actions/checkout@v2
+    - uses: actions/checkout@v3
     - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v4
       with:
         python-version: ${{ matrix.python-version }}
     - name: Install dependencies
@@ -25,10 +25,11 @@ jobs:
         pip uninstall -y charset-normalizer
     - name: Install the package
       run: |
-        python setup.py install
+        python -m build
+        pip install ./dist/*.whl
     - name: Clone the complete dataset
       run: |
         git clone https://github.com/Ousret/char-dataset.git
     - name: BC Coverage
       run: |
-        python ./bin/bc.py --coverage 85
+        python ./bin/bc.py --coverage 80
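The BC job now requires `bin/bc.py` to reach 80 instead of 85. The script itself is not part of this diff, so what follows is only a sketch under an assumption: that the score is the percentage of dataset files for which charset_normalizer reports the same encoding as chardet, and that the job fails when it falls below `--coverage`. The names `canonical`, `agreement_percent` and `meets_threshold` are hypothetical. Encoding names are compared through `codecs.lookup` so that spellings such as `utf_8` and `utf-8` count as equal.

```python
import codecs
from typing import Mapping, Optional


def canonical(name: Optional[str]) -> Optional[str]:
    """Map an encoding name to Python's canonical codec name (None stays None)."""
    if name is None:
        return None
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Unknown to Python: fall back to a case-insensitive comparison.
        return name.lower()


def agreement_percent(
    reference: Mapping[str, Optional[str]],
    candidate: Mapping[str, Optional[str]],
) -> float:
    """Percentage of files in `reference` for which `candidate` agrees."""
    if not reference:
        raise ValueError("no reference results to compare against")
    matches = sum(
        1
        for filename, expected in reference.items()
        if canonical(candidate.get(filename)) == canonical(expected)
    )
    return 100.0 * matches / len(reference)


def meets_threshold(percent: float, minimum: float) -> bool:
    """Hypothetical gate mirroring `--coverage N`: pass when percent >= N."""
    return percent >= minimum


# Hypothetical results keyed by file name (chardet-style vs charset_normalizer-style names).
chardet_like = {"a.txt": "utf-8", "b.txt": "windows-1252", "c.txt": "big5", "d.bin": None}
normalizer_like = {"a.txt": "utf_8", "b.txt": "cp1252", "c.txt": "gb18030", "d.bin": None}
score = agreement_percent(chardet_like, normalizer_like)
print(f"{score:.1f}% agreement, passes --coverage 80: {meets_threshold(score, 80)}")
```

Three of the four hypothetical files agree here, so the sample scores 75% and would fail an 80% gate.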
56 changes: 0 additions & 56 deletions .github/workflows/codeql-analysis.yml

This file was deleted.

54 changes: 54 additions & 0 deletions .github/workflows/codeql.yml
@@ -0,0 +1,54 @@
+# For most projects, this workflow file will not need changing; you simply need
+# to commit it to your repository.
+#
+# You may wish to alter this file to override the set of languages analyzed,
+# or to provide custom queries or build logic.
+#
+# ******** NOTE ********
+# We have attempted to detect the languages in your repository. Please check
+# the `language` matrix defined below to confirm you have the correct set of
+# supported CodeQL languages.
+#
+name: "CodeQL"
+
+on:
+  push:
+    branches: [ "master", "2.1.x" ]
+  pull_request:
+    branches: [ "master", "2.1.x" ]
+  schedule:
+    - cron: '39 1 * * 6'
+
+jobs:
+  analyze:
+    name: Analyze
+    runs-on: ubuntu-latest
+    permissions:
+      actions: read
+      contents: read
+      security-events: write
+
+    strategy:
+      fail-fast: false
+      matrix:
+        language: [ 'python' ]
+
+    steps:
+    - name: Checkout repository
+      uses: actions/checkout@v3
+
+    # Initializes the CodeQL tools for scanning.
+    - name: Initialize CodeQL
+      uses: github/codeql-action/init@v2
+      with:
+        languages: ${{ matrix.language }}
+
+    # Autobuild attempts to build any compiled languages (C/C++, C#, Go, or Java).
+    # If this step fails, then you should remove it and run the build manually (see below)
+    - name: Autobuild
+      uses: github/codeql-action/autobuild@v2
+
+    - name: Perform CodeQL Analysis
+      uses: github/codeql-action/analyze@v2
+      with:
+        category: "/language:${{matrix.language}}"
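The new CodeQL workflow also runs on a schedule. In `'39 1 * * 6'` the five fields are minute, hour, day of month, month and day of week, so the scan fires every Saturday at 01:39; GitHub Actions evaluates cron schedules in UTC. The sketch below uses a hypothetical helper, `next_weekly_run`, that handles only this fixed minute/hour/weekday shape (not general cron syntax) and computes the next trigger time. It makes the weekday numbering explicit: cron counts Sunday as 0, while Python's `datetime.weekday()` counts Monday as 0.

```python
from datetime import datetime, timedelta


def next_weekly_run(
    after: datetime, cron_dow: int = 6, hour: int = 1, minute: int = 39
) -> datetime:
    """Next time strictly after `after` that matches 'minute hour * * cron_dow'.

    `after` is treated as a naive UTC timestamp.
    """
    # cron: 0 = Sunday ... 6 = Saturday; Python: 0 = Monday ... 6 = Sunday.
    target_weekday = (cron_dow - 1) % 7
    candidate = after.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(target_weekday - after.weekday()) % 7)
    if candidate <= after:
        # Same weekday but the slot has already passed: wait a full week.
        candidate += timedelta(days=7)
    return candidate


print(next_weekly_run(datetime(2022, 10, 20)))  # from a Thursday: 2022-10-22 01:39:00
```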
11 changes: 6 additions & 5 deletions .github/workflows/detector-coverage.yml
@@ -13,9 +13,9 @@ jobs:
         os: [ubuntu-latest]

     steps:
-    - uses: actions/checkout@v2
+    - uses: actions/checkout@v3
     - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v4
       with:
         python-version: ${{ matrix.python-version }}
     - name: Install dependencies
@@ -25,13 +25,14 @@ jobs:
         pip uninstall -y charset-normalizer
     - name: Install the package
       run: |
-        python setup.py install
+        python -m build
+        pip install ./dist/*.whl
     - name: Clone the complete dataset
       run: |
         git clone https://github.com/Ousret/char-dataset.git
     - name: Coverage WITH preemptive
       run: |
-        python ./bin/coverage.py --coverage 90 --with-preemptive
+        python ./bin/coverage.py --coverage 97 --with-preemptive
     - name: Coverage WITHOUT preemptive
       run: |
-        python ./bin/coverage.py --coverage 90
+        python ./bin/coverage.py --coverage 95
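Both detector-coverage thresholds are raised: to 97% with `--with-preemptive` and to 95% without. The flag presumably toggles charset_normalizer's preemptive behaviour (the `preemptive_behaviour` argument of `from_bytes`), in which an encoding declared inside the payload itself is taken into account. As a rough illustration only, and not the library's implementation, a hypothetical `declared_encoding` helper can scan the start of a payload for declarations such as `<meta charset=...>`, an XML `encoding=` attribute or a Python `coding:` comment, and keep the first name Python recognises. The regular expression and the 4096-byte search window are assumptions.

```python
import codecs
import re
from typing import Optional

# Hypothetical pattern: a keyword, 1-10 separator chars, an optional quote, then a name.
_DECLARATION = re.compile(
    rb"(?:encoding|charset|coding)[:= ]{1,10}[\"']?([a-zA-Z0-9_-]+)[\"']?",
    re.IGNORECASE,
)


def declared_encoding(payload: bytes, search_zone: int = 4096) -> Optional[str]:
    """Return the first declared encoding Python knows, as a canonical codec name."""
    head = payload[:search_zone]  # only the beginning of the payload is inspected
    for match in _DECLARATION.finditer(head):
        name = match.group(1).decode("ascii")
        try:
            return codecs.lookup(name).name
        except LookupError:
            continue  # not a codec Python knows; keep scanning
    return None


print(declared_encoding(b'<?xml version="1.0" encoding="windows-1252"?><a/>'))
```

A hint found this way is only a candidate: a declaration can be wrong, so a detector would still need to check that the bytes actually decode under it.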
7 changes: 4 additions & 3 deletions .github/workflows/integration.yml
@@ -13,9 +13,9 @@ jobs:
         os: [ubuntu-latest]

     steps:
-    - uses: actions/checkout@v2
+    - uses: actions/checkout@v3
     - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v4
       with:
         python-version: ${{ matrix.python-version }}
     - name: Install dependencies
@@ -28,7 +28,8 @@ jobs:
         pip uninstall -y charset-normalizer
     - name: Install the package
       run: |
-        python setup.py install
+        python -m build
+        pip install ./dist/*.whl
     - name: Clone the complete dataset
       run: |
         git clone https://github.com/Ousret/char-dataset.git
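Every workflow touched here replaces `python setup.py install` with a PEP 517 build followed by installing the wheel. By default, `python -m build` (from the `build` package) writes both an sdist (`.tar.gz`) and a wheel into `dist/`, and the `./dist/*.whl` glob picks only the wheel. If a stale wheel were left in `dist/`, the glob would hand all of them to pip. The hypothetical helpers below express the same selection in Python, with a guard that requires exactly one wheel.

```python
from pathlib import Path
from typing import Iterable


def select_wheel(filenames: Iterable[str]) -> str:
    """Return the only `.whl` name in `filenames`, ignoring sdists and other files."""
    wheels = sorted(name for name in filenames if name.endswith(".whl"))
    if len(wheels) != 1:
        raise RuntimeError(f"expected exactly one wheel, found {len(wheels)}: {wheels}")
    return wheels[0]


def find_wheel(dist_dir: str = "dist") -> Path:
    """Filesystem wrapper: pick the single wheel file inside `dist_dir`."""
    directory = Path(dist_dir)
    names = (entry.name for entry in directory.iterdir() if entry.is_file())
    return directory / select_wheel(names)


print(select_wheel(["charset-normalizer-3.0.1.tar.gz", "charset_normalizer-3.0.1-py3-none-any.whl"]))
```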
11 changes: 6 additions & 5 deletions .github/workflows/lint.yml
@@ -13,9 +13,9 @@ jobs:
         os: [ubuntu-latest]

     steps:
-    - uses: actions/checkout@v2
+    - uses: actions/checkout@v3
     - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v4
       with:
         python-version: ${{ matrix.python-version }}
     - name: Install dependencies
@@ -25,16 +25,17 @@ jobs:
         pip uninstall -y charset-normalizer
     - name: Install the package
       run: |
-        python setup.py install
+        python -m build
+        pip install ./dist/*.whl
     - name: Type checking (Mypy)
       run: |
-        mypy charset_normalizer
+        mypy --strict charset_normalizer
     - name: Import sorting check (isort)
       run: |
         isort --check charset_normalizer
     - name: Code format (Black)
       run: |
-        black --check --diff --target-version=py35 charset_normalizer
+        black --check --diff --target-version=py36 charset_normalizer
     - name: Style guide enforcement (Flake8)
       run: |
         flake8 charset_normalizer
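The lint job now runs `mypy --strict`, which enables, among other checks, `--disallow-untyped-defs`: every function must carry complete parameter and return annotations. The helper below is hypothetical (it is not part of charset_normalizer) and is written in the fully annotated style those checks expect, with the `Optional` return type making the failure case visible to the type checker.

```python
from typing import Optional


def try_decode(payload: bytes, encoding: str) -> Optional[str]:
    """Decode `payload` strictly; return None instead of raising on failure."""
    try:
        return payload.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # UnicodeDecodeError: the bytes are invalid for this codec.
        # LookupError: the codec name is unknown to Python.
        return None


print(try_decode(b"caf\xe9", "cp1252"), try_decode(b"caf\xe9", "utf_8"))
```

Callers must handle the `None` branch before using the result as a `str`; under `--strict`, mypy flags code that skips that check.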
40 changes: 40 additions & 0 deletions .github/workflows/mypyc-verify.yml
@@ -0,0 +1,40 @@
+name: MYPYC Run
+
+on: [push, pull_request]
+
+jobs:
+  detection_coverage:
+    runs-on: ${{ matrix.os }}
+
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: [3.6, 3.7, 3.8, 3.9, "3.10"]
+        os: [ubuntu-latest]
+
+    steps:
+    - uses: actions/checkout@v3
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v4
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Install dependencies
+      run: |
+        pip install -U pip setuptools
+        pip install -r dev-requirements.txt
+        pip uninstall -y charset-normalizer
+    - name: Install the package
+      env:
+        CHARSET_NORMALIZER_USE_MYPYC: '1'
+      run: |
+        python -m build --no-isolation
+        pip install ./dist/*.whl
+    - name: Clone the complete dataset
+      run: |
+        git clone https://github.com/Ousret/char-dataset.git
+    - name: Coverage WITH preemptive
+      run: |
+        python ./bin/coverage.py --coverage 97 --with-preemptive
+    - name: Coverage WITHOUT preemptive
+      run: |
+        python ./bin/coverage.py --coverage 95
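The new MYPYC workflow builds the wheel with `CHARSET_NORMALIZER_USE_MYPYC: '1'` set and with `--no-isolation`. That flag makes the build run in the current environment, where the dev requirements (presumably including mypy) are already installed, instead of a fresh isolated one. The workflow then runs the same coverage gates as the pure-Python job. This diff does not show how `setup.py` consumes the variable, so the sketch below assumes that only the exact value `'1'` enables compilation. It also includes a hypothetical `is_compiled` check that reports whether a module was loaded from a native extension file, which is one way to confirm that a mypyc build was actually installed.

```python
import importlib.machinery
import os
from types import ModuleType
from typing import Mapping, Optional

FLAG = "CHARSET_NORMALIZER_USE_MYPYC"


def mypyc_requested(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Assumed semantics: compilation is requested only when the flag is exactly '1'."""
    env = os.environ if environ is None else environ
    return env.get(FLAG) == "1"


def is_compiled(module: ModuleType) -> bool:
    """True if `module` was loaded from a native extension (e.g. a .so or .pyd file)."""
    path = getattr(module, "__file__", None) or ""
    return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


print(mypyc_requested({FLAG: "1"}), is_compiled(os))
```

After installing a mypyc-built wheel, `is_compiled` would presumably return True for whichever charset_normalizer modules were compiled. Which modules those are is not stated in this diff.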
7 changes: 4 additions & 3 deletions .github/workflows/performance.yml
@@ -13,9 +13,9 @@ jobs:
         os: [ubuntu-latest]

     steps:
-    - uses: actions/checkout@v2
+    - uses: actions/checkout@v3
     - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v4
       with:
         python-version: ${{ matrix.python-version }}
     - name: Install dependencies
@@ -25,7 +25,8 @@ jobs:
         pip uninstall -y charset-normalizer
     - name: Install the package
       run: |
-        python setup.py install
+        python -m build
+        pip install ./dist/*.whl
     - name: Clone the complete dataset
       run: |
         git clone https://github.com/Ousret/char-dataset.git