build(deps): replace pillow-heif with pi-heif #3571

Merged
merged 3 commits into from
Aug 27, 2024
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -1,4 +1,4 @@
## 0.15.8-dev5
## 0.15.8

### Enhancements

@@ -8,6 +8,7 @@

### Fixes

* **Replace `pillow-heif` with `pi-heif`**. Replaces `pillow-heif` with `pi-heif` due to more permissive licensing on the wheel for `pi-heif`.
* **Minify text_as_html from DOCX.** Previously `.metadata.text_as_html` for DOCX tables was "bloated" with whitespace and noise elements introduced by `tabulate` that produced over-chunking and lower "semantic density" of elements. Reduce HTML to minimum character count while preserving all text.
* **Fall back to filename extension-based file-type detection for unidentified OLE files.** Resolves a problem where a DOC file that could not be detected as such by `filetype` was incorrectly identified as a MSG file.

2 changes: 1 addition & 1 deletion requirements/base.txt
@@ -41,7 +41,7 @@ h11==0.14.0
# via httpcore
httpcore==1.0.5
# via httpx
httpx==0.27.0
httpx==0.27.2
# via unstructured-client
idna==3.8
# via
2 changes: 1 addition & 1 deletion requirements/dev.txt
@@ -354,7 +354,7 @@ wheel==0.44.0
# pip-tools
widgetsnbextension==4.0.13
# via ipywidgets
zipp==3.20.0
zipp==3.20.1
# via importlib-metadata

# The following packages are considered to be unsafe in a requirements file:
2 changes: 1 addition & 1 deletion requirements/extra-markdown.txt
@@ -8,5 +8,5 @@ importlib-metadata==8.4.0
# via markdown
markdown==3.7
# via -r ./extra-markdown.in
zipp==3.20.0
zipp==3.20.1
# via importlib-metadata
4 changes: 2 additions & 2 deletions requirements/extra-paddleocr.txt
@@ -43,7 +43,7 @@ httpcore==1.0.5
# via
# -c ./base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./base.txt
# paddlepaddle
@@ -176,5 +176,5 @@ urllib3==1.26.19
# -c ././deps/constraints.txt
# -c ./base.txt
# requests
zipp==3.20.0
zipp==3.20.1
# via importlib-resources
2 changes: 1 addition & 1 deletion requirements/extra-pdf-image.in
@@ -5,7 +5,7 @@ onnx
pdf2image
pdfminer.six
pikepdf
pillow_heif
pi_heif
pypdf
google-cloud-vision
effdet
10 changes: 5 additions & 5 deletions requirements/extra-pdf-image.txt
@@ -53,7 +53,7 @@ google-auth==2.34.0
# google-cloud-vision
google-cloud-vision==3.7.4
# via -r ./extra-pdf-image.in
googleapis-common-protos==1.63.2
googleapis-common-protos==1.64.0
# via
# google-api-core
# grpcio-status
@@ -147,6 +147,8 @@ pdfminer-six==20231228
# pdfplumber
pdfplumber==0.11.4
# via layoutparser
pi-heif==0.18.0
# via -r ./extra-pdf-image.in
pikepdf==9.2.0
# via -r ./extra-pdf-image.in
pillow==10.4.0
@@ -155,12 +157,10 @@ pillow==10.4.0
# matplotlib
# pdf2image
# pdfplumber
# pi-heif
# pikepdf
# pillow-heif
# torchvision
# unstructured-pytesseract
pillow-heif==0.18.0
# via -r ./extra-pdf-image.in
portalocker==2.10.1
# via iopath
proto-plus==1.24.0
@@ -293,5 +293,5 @@ wrapt==1.16.0
# -c ././deps/constraints.txt
# -c ./base.txt
# deprecated
zipp==3.20.0
zipp==3.20.1
# via importlib-resources
2 changes: 1 addition & 1 deletion requirements/ingest/astradb.txt
@@ -51,7 +51,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx[http2]==0.27.0
httpx[http2]==0.27.2
# via
# -c ./ingest/../base.txt
# astrapy
4 changes: 2 additions & 2 deletions requirements/ingest/chroma.txt
@@ -60,7 +60,7 @@ fsspec==2024.6.1
# via huggingface-hub
google-auth==2.34.0
# via kubernetes
googleapis-common-protos==1.63.2
googleapis-common-protos==1.64.0
# via opentelemetry-exporter-otlp-proto-grpc
grpcio==1.66.0
# via
@@ -245,7 +245,7 @@ wrapt==1.16.0
# -c ./ingest/../deps/constraints.txt
# deprecated
# opentelemetry-instrumentation
zipp==3.20.0
zipp==3.20.1
# via
# importlib-metadata
# importlib-resources
4 changes: 2 additions & 2 deletions requirements/ingest/clarifai.txt
@@ -19,7 +19,7 @@ clarifai-grpc==10.7.1
# via clarifai
contextlib2==21.6.0
# via schema
googleapis-common-protos==1.63.2
googleapis-common-protos==1.64.0
# via clarifai-grpc
grpcio==1.66.0
# via
@@ -61,7 +61,7 @@ requests==2.32.3
# via
# -c ./ingest/../base.txt
# clarifai-grpc
rich==13.7.1
rich==13.8.0
# via clarifai
schema==0.7.5
# via clarifai
2 changes: 1 addition & 1 deletion requirements/ingest/databricks-volumes.txt
@@ -15,7 +15,7 @@ charset-normalizer==3.3.2
# via
# -c ./ingest/../base.txt
# requests
databricks-sdk==0.30.0
databricks-sdk==0.31.0
# via -r ./ingest/databricks-volumes.in
google-auth==2.34.0
# via databricks-sdk
2 changes: 1 addition & 1 deletion requirements/ingest/embed-aws-bedrock.txt
@@ -62,7 +62,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./ingest/../base.txt
# langsmith
2 changes: 1 addition & 1 deletion requirements/ingest/embed-huggingface.txt
@@ -42,7 +42,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./ingest/../base.txt
# langsmith
2 changes: 1 addition & 1 deletion requirements/ingest/embed-octoai.txt
@@ -36,7 +36,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./ingest/../base.txt
# openai
2 changes: 1 addition & 1 deletion requirements/ingest/embed-openai.txt
@@ -36,7 +36,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./ingest/../base.txt
# langsmith
4 changes: 2 additions & 2 deletions requirements/ingest/embed-vertexai.txt
@@ -88,7 +88,7 @@ google-resumable-media==2.7.2
# via
# google-cloud-bigquery
# google-cloud-storage
googleapis-common-protos[grpc]==1.63.2
googleapis-common-protos[grpc]==1.64.0
# via
# google-api-core
# grpc-google-iam-v1
@@ -112,7 +112,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./ingest/../base.txt
# langchain-google-vertexai
2 changes: 1 addition & 1 deletion requirements/ingest/embed-voyageai.txt
@@ -53,7 +53,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./ingest/../base.txt
# langsmith
2 changes: 1 addition & 1 deletion requirements/ingest/gcs.txt
@@ -66,7 +66,7 @@ google-crc32c==1.5.0
# google-resumable-media
google-resumable-media==2.7.2
# via google-cloud-storage
googleapis-common-protos==1.63.2
googleapis-common-protos==1.64.0
# via google-api-core
idna==3.8
# via
2 changes: 1 addition & 1 deletion requirements/ingest/google-drive.txt
@@ -26,7 +26,7 @@ google-auth==2.34.0
# google-auth-httplib2
google-auth-httplib2==0.2.0
# via google-api-python-client
googleapis-common-protos==1.63.2
googleapis-common-protos==1.64.0
# via google-api-core
httplib2==0.22.0
# via
2 changes: 1 addition & 1 deletion requirements/ingest/notion.txt
@@ -28,7 +28,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./ingest/../base.txt
# notion-client
2 changes: 1 addition & 1 deletion requirements/ingest/qdrant.txt
@@ -39,7 +39,7 @@ httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx[http2]==0.27.0
httpx[http2]==0.27.2
# via
# -c ./ingest/../base.txt
# qdrant-client
2 changes: 1 addition & 1 deletion requirements/ingest/singlestore.txt
@@ -56,7 +56,7 @@ wheel==0.44.0
# via
# -c ./ingest/../deps/constraints.txt
# singlestoredb
zipp==3.20.0
zipp==3.20.1
# via importlib-metadata

# The following packages are considered to be unsafe in a requirements file:
61 changes: 1 addition & 60 deletions requirements/ingest/weaviate.txt
@@ -4,20 +4,12 @@
#
# pip-compile ./ingest/weaviate.in
#
annotated-types==0.7.0
# via pydantic
anyio==4.4.0
# via
# -c ./ingest/../base.txt
# httpx
authlib==1.3.2
# via weaviate-client
certifi==2024.7.4
# via
# -c ./ingest/../base.txt
# -c ./ingest/../deps/constraints.txt
# httpcore
# httpx
# requests
cffi==1.17.0
# via cryptography
@@ -27,75 +19,24 @@ charset-normalizer==3.3.2
# requests
cryptography==43.0.0
# via authlib
exceptiongroup==1.2.2
# via
# -c ./ingest/../base.txt
# anyio
grpcio==1.66.0
# via
# -c ./ingest/../deps/constraints.txt
# grpcio-health-checking
# grpcio-tools
# weaviate-client
grpcio-health-checking==1.62.3
# via weaviate-client
grpcio-tools==1.62.3
# via weaviate-client
h11==0.14.0
# via
# -c ./ingest/../base.txt
# httpcore
httpcore==1.0.5
# via
# -c ./ingest/../base.txt
# httpx
httpx==0.27.0
# via
# -c ./ingest/../base.txt
# weaviate-client
idna==3.8
# via
# -c ./ingest/../base.txt
# anyio
# httpx
# requests
protobuf==4.23.4
# via
# -c ./ingest/../deps/constraints.txt
# grpcio-health-checking
# grpcio-tools
pycparser==2.22
# via cffi
pydantic==2.8.2
# via weaviate-client
pydantic-core==2.20.1
# via pydantic
requests==2.32.3
# via
# -c ./ingest/../base.txt
# weaviate-client
sniffio==1.3.1
# via
# -c ./ingest/../base.txt
# anyio
# httpx
typing-extensions==4.12.2
# via
# -c ./ingest/../base.txt
# anyio
# pydantic
# pydantic-core
urllib3==1.26.19
# via
# -c ./ingest/../base.txt
# -c ./ingest/../deps/constraints.txt
# requests
validators==0.33.0
# via weaviate-client
weaviate-client==4.7.1
weaviate-client==3.26.7
# via
# -c ./ingest/../deps/constraints.txt
# -r ./ingest/weaviate.in

# The following packages are considered to be unsafe in a requirements file:
# setuptools
2 changes: 1 addition & 1 deletion requirements/test.txt
@@ -63,7 +63,7 @@ httpcore==1.0.5
# via
# -c ./base.txt
# httpx
httpx==0.27.0
httpx==0.27.2
# via
# -c ./base.txt
# label-studio-sdk
2 changes: 1 addition & 1 deletion unstructured/__version__.py
@@ -1 +1 @@
__version__ = "0.15.8-dev5" # pragma: no cover
__version__ = "0.15.8" # pragma: no cover
2 changes: 1 addition & 1 deletion unstructured/partition/pdf.py
@@ -15,8 +15,8 @@
from pdfminer.layout import LTChar, LTContainer, LTImage, LTItem, LTTextBox
from pdfminer.pdftypes import PDFObjRef
from pdfminer.utils import open_filename
from pi_heif import register_heif_opener
from PIL import Image as PILImage
from pillow_heif import register_heif_opener
from pypdf import PdfReader

from unstructured.chunking import add_chunking_strategy