chore(deps): update dependency unstructured to v0.15.13 #1102
This PR contains the following updates:
unstructured: `==0.15.9` -> `==0.15.13`
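Since the v0.15.13 release notes for this update list a breaking change, it may be worth checking whether any code in this repository imports the removed modules before merging. The following is a minimal sketch rather than an official migration tool: the helper name `find_removed_imports` is hypothetical, and the full module paths (`unstructured.file_utils.experimental`, `unstructured.file_utils.metadata`) are assumed from the package name plus the module names given in the notes.

```python
import ast

# Assumed full import paths of the modules removed in v0.15.13.
REMOVED_MODULES = (
    "unstructured.file_utils.experimental",
    "unstructured.file_utils.metadata",
)


def _is_removed(name: str) -> bool:
    """True if `name` is a removed module or something inside one."""
    return any(name == mod or name.startswith(mod + ".") for mod in REMOVED_MODULES)


def find_removed_imports(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, imported name) pairs that reference a removed module."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # `from pkg.mod import name` may refer to pkg.mod or to pkg.mod.name.
            candidates = [node.module] + [
                f"{node.module}.{alias.name}" for alias in node.names
            ]
        else:
            continue
        # Record at most one hit per import statement.
        match = next((c for c in candidates if _is_removed(c)), None)
        if match is not None:
            hits.append((node.lineno, match))
    return sorted(hits)
```

To audit a project, the helper could be run over the text of each `.py` file; an empty result suggests the removal does not affect that file's static imports. Dynamic imports (for example through `importlib`) are not detected.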
Release Notes
Unstructured-IO/unstructured (unstructured)
v0.15.13
BREAKING CHANGES
- `file_utils.experimental` and `file_utils.metadata` were removed. These functions were never published in the documentation, but client code that dug them out and used them could break with this removal.
Enhancements
- `pdfminer` image cleanup process. Optimized the removal of duplicated `pdfminer` images by performing the cleanup before merging elements rather than after. This reduces execution time and speeds up processing of PDF documents.
Features
Fixes
- Use `numpy.float32` for coordinates and remove intermediate variables to reduce memory usage when computing intersection areas.
- `arm64` image build. `arm64` builds are now fixed and will be available again starting with the `0.15.13` release.
v0.15.12
Enhancements
- `pdfminer` element processing. Implemented splitting of `pdfminer` elements (groups of text chunks) into smaller bounding boxes (text lines). This prevents loss of information from the object detection model and facilitates more effective removal of duplicated `pdfminer` text.
Features
Fixes
v0.15.10
Enhancements
- `pdfminer` element cleanup. Expand removal of `pdfminer` elements to include those inside all `non-pdfminer` elements, not just `tables`.
- When the `analysis` parameter of the `partition_pdf` function is set to `True`, the layouts for Object Detection, Pdfminer Extraction, and OCR, as well as the final layouts, are dumped as JSON files. The drawers now accept dict (dump) objects instead of instances of internal classes.
- Use `numpy` operations to compute IOU and sub-region membership instead of a simple loop. This speeds up deduplication of elements on pages with many elements.
Features
Fixes
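The performance notes above (vectorized IOU in v0.15.10, `numpy.float32` coordinates in v0.15.13) can be illustrated with a short sketch. This is not the library's implementation: the function name `pairwise_iou` and the `(x1, y1, x2, y2)` box layout are assumptions made for the example. It shows only the general technique of replacing a nested Python loop with broadcasting.

```python
import numpy as np


def pairwise_iou(boxes_a, boxes_b):
    """Compute IOU for every pair of boxes, given as (x1, y1, x2, y2) rows.

    Returns an array of shape (len(boxes_a), len(boxes_b)).
    """
    # float32 halves the memory of the default float64 arrays.
    a = np.asarray(boxes_a, dtype=np.float32)
    b = np.asarray(boxes_b, dtype=np.float32)

    # Broadcasting (N, 1) against (1, M) yields all N x M pairs without a loop.
    inter_w = np.minimum(a[:, None, 2], b[None, :, 2]) - np.maximum(a[:, None, 0], b[None, :, 0])
    inter_h = np.minimum(a[:, None, 3], b[None, :, 3]) - np.maximum(a[:, None, 1], b[None, :, 1])

    # Negative widths or heights mean the boxes do not overlap.
    inter = np.clip(inter_w, 0, None) * np.clip(inter_h, 0, None)

    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter

    # Avoid division by zero for degenerate (zero-area) boxes.
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
```

For pages with many elements, the full N x M matrix can itself become large, which is one reason a smaller dtype and fewer intermediate arrays matter for memory use.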
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.