chore(deps): update dependency unstructured to v0.15.13 #1102

Merged
merged 1 commit into from
Sep 25, 2024

Conversation

renovate[bot]
Contributor

@renovate renovate bot commented Sep 23, 2024

This PR contains the following updates:

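| Package      | Change                    | Age | Adoption | Passing | Confidence |
| ------------ | ------------------------- | --- | -------- | ------- | ---------- |
| unstructured | `==0.15.9` -> `==0.15.13` | age | adoption | passing | confidence |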

Release Notes

Unstructured-IO/unstructured (unstructured)

v0.15.13

Compare Source

BREAKING CHANGES
  • Remove dead experimental code. Unused code in file_utils.experimental and file_utils.metadata was removed. These functions were never published in the documentation, but if a client dug them out and used them, this removal could break client code.
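
Before upgrading, it may be worth checking whether any client code imports the removed modules. The following is a minimal sketch, not part of the release: it assumes the full module paths are `unstructured.file_utils.experimental` and `unstructured.file_utils.metadata` (inferred from the note above), and `find_removed_imports` is a hypothetical helper name.

```python
import ast

# Module paths assumed from the release note; adjust if your imports differ.
REMOVED_MODULES = (
    "unstructured.file_utils.experimental",
    "unstructured.file_utils.metadata",
)


def find_removed_imports(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, module) pairs for imports of removed modules."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Also catches `from unstructured.file_utils import metadata`.
            candidates = [node.module]
            candidates += [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        for name in candidates:
            if any(name == m or name.startswith(m + ".") for m in REMOVED_MODULES):
                hits.append((node.lineno, name))
                break  # report each import statement once
    return sorted(hits)


sample = (
    "import os\n"
    "from unstructured.file_utils import experimental\n"
    "import unstructured.file_utils.metadata as md\n"
    "from unstructured.partition.pdf import partition_pdf\n"
)
for lineno, module in find_removed_imports(sample):
    print(f"line {lineno}: imports {module}")
```

The scan is purely static, so it does not need `unstructured` installed and will not catch dynamic imports such as `importlib.import_module(...)`.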
Enhancements
  • Improve pdfminer image cleanup process. Optimized the removal of duplicated pdfminer images by performing the cleanup before merging elements, rather than after. This improvement reduces execution time and enhances overall processing speed of PDF documents.
Features
Fixes
  • Fixes high memory overhead for intersection area computation. Uses numpy.float32 for coordinates and removes intermediate variables to reduce memory usage when computing intersection areas.
  • Fixes the arm64 image build. arm64 builds are now fixed and will be available again starting with the 0.15.13 release.
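
To illustrate the idea behind the memory fix, here is a rough sketch of a pairwise intersection-area computation that keeps coordinates in `numpy.float32` and reuses buffers instead of allocating intermediates. This is not the library's implementation; the function name and the `(x1, y1, x2, y2)` box layout are assumptions made for the example.

```python
import numpy as np


def intersection_areas(boxes_a, boxes_b) -> np.ndarray:
    """Pairwise intersection areas for boxes given as rows of (x1, y1, x2, y2)."""
    # float32 halves the size of every (N, M) grid compared with float64.
    a = np.asarray(boxes_a, dtype=np.float32)
    b = np.asarray(boxes_b, dtype=np.float32)
    # Broadcast (N, 1) against (1, M) to get (N, M) overlap widths and heights.
    width = np.minimum(a[:, None, 2], b[None, :, 2]) - np.maximum(a[:, None, 0], b[None, :, 0])
    height = np.minimum(a[:, None, 3], b[None, :, 3]) - np.maximum(a[:, None, 1], b[None, :, 1])
    # Clip negative extents (no overlap) in place, then multiply in place,
    # so no additional (N, M) array is allocated for the result.
    np.clip(width, 0, None, out=width)
    np.clip(height, 0, None, out=height)
    width *= height
    return width


areas = intersection_areas([[0, 0, 2, 2]], [[1, 1, 3, 3], [5, 5, 6, 6]])
print(areas, areas.dtype)
```

The trade-off is precision: `float32` carries roughly seven significant decimal digits, which is typically sufficient for pixel or point coordinates on a page.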

v0.15.12

Compare Source

Enhancements
  • Improve pdfminer element processing. Implemented splitting of pdfminer elements (groups of text chunks) into smaller bounding boxes (text lines). This prevents loss of information from the object detection model and facilitates more effective removal of duplicated pdfminer text.
Features
Fixes
  • Fixed table accuracy metric. Table accuracy was incorrectly using column content difference in calculating row accuracy.

v0.15.10

Compare Source

Enhancements
  • Enhance pdfminer element cleanup. Expand removal of pdfminer elements to include those inside all non-pdfminer elements, not just tables.
  • Modified analysis drawing tools to dump to files and draw from dumps. If the analysis parameter of the partition_pdf function is set to True, the layouts for Object Detection, Pdfminer Extraction, and OCR, as well as the final layout, will be dumped as JSON files. The drawers now accept dict (dump) objects instead of instances of internal classes.
  • Vectorize pdfminer elements deduplication computation. Use numpy operations to compute IOU and sub-region membership instead of a simple loop. This improves the speed of deduplicating elements for pages with many elements.
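
The vectorization described in the last item can be sketched as follows. This is an illustrative comparison, not the library's code: both function names and the `(x1, y1, x2, y2)` box layout are assumptions. The loop version computes IOU one pair at a time; the vectorized version computes the full matrix with numpy broadcasting.

```python
import numpy as np


def iou_loop(boxes) -> np.ndarray:
    """Reference implementation: one Python-level iteration per box pair."""
    n = len(boxes)
    out = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        ax1, ay1, ax2, ay2 = boxes[i]
        for j in range(n):
            bx1, by1, bx2, by2 = boxes[j]
            inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
            inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
            inter = inter_w * inter_h
            union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
            out[i, j] = inter / union if union > 0 else 0.0
    return out


def iou_vectorized(boxes) -> np.ndarray:
    """Same result computed for all pairs at once via broadcasting."""
    b = np.asarray(boxes, dtype=np.float32)
    x1, y1, x2, y2 = b[:, 0], b[:, 1], b[:, 2], b[:, 3]
    inter_w = np.clip(np.minimum(x2[:, None], x2[None, :]) - np.maximum(x1[:, None], x1[None, :]), 0, None)
    inter_h = np.clip(np.minimum(y2[:, None], y2[None, :]) - np.maximum(y1[:, None], y1[None, :]), 0, None)
    inter = inter_w * inter_h
    area = (x2 - x1) * (y2 - y1)
    union = area[:, None] + area[None, :] - inter
    # Guard against division by zero for degenerate (zero-area) boxes.
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)


boxes = [[0, 0, 2, 2], [1, 1, 3, 3], [5, 5, 6, 6]]
print(np.allclose(iou_loop(boxes), iou_vectorized(boxes)))
```

The loop runs in O(n²) Python-level iterations, while the vectorized form does the same arithmetic inside numpy, which is where the speedup on element-heavy pages would come from. The cost is an n × n matrix held in memory.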
Features
Fixes

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot requested a review from a team as a code owner September 23, 2024 16:00
@renovate renovate bot added the dependencies (Pull requests that update a dependency file) and tech-debt (Not a feature, but still necessary) labels Sep 23, 2024

netlify bot commented Sep 23, 2024

Deploy Preview for leapfrogai-docs canceled.

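| Name                 | Link                                                                        |
| -------------------- | --------------------------------------------------------------------------- |
| 🔨 Latest commit     | 5a18dd4                                                                     |
| 🔍 Latest deploy log | https://app.netlify.com/sites/leapfrogai-docs/deploys/66f36c3e8a0a3700081bd181 |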

| datasource | package      | from   | to      |
| ---------- | ------------ | ------ | ------- |
| pypi       | unstructured | 0.15.9 | 0.15.13 |
@renovate renovate bot force-pushed the renovate/unstructured-0.x branch from e768b1e to 5a18dd4 Compare September 25, 2024 01:49
@justinthelaw justinthelaw merged commit 590e946 into main Sep 25, 2024
24 of 26 checks passed
@justinthelaw justinthelaw deleted the renovate/unstructured-0.x branch September 25, 2024 02:31