Enhancements
Refactor code related to embedded text extraction. The embedded text extraction code is moved from unstructured-inference to unstructured.
Features
Large improvements to the ingest process:
Support for multiprocessing and async, with limits for both.
Streamlined the process of mapping CLI invocations to the underlying code.
More granular steps introduced to give better control over the process (e.g. a dedicated step to uncompress files already in the local filesystem, and a new optional staging step before upload).
Use the Python client when calling the Unstructured API for partitioning or chunking.
Saving the final content is now a dedicated destination connector (local) set as the default if none are provided. Avoids adding new files locally if uploading elsewhere.
Leverage last modified date when deciding if new files should be downloaded and reprocessed.
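The last-modified check in the item above can be pictured with a minimal sketch. This is not the ingest code itself: `needs_download` and its parameters are hypothetical names chosen for illustration, and the sketch assumes the source reports a timezone-aware last-modified timestamp.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def needs_download(local_path: Path, remote_last_modified: Optional[datetime]) -> bool:
    """Hypothetical helper: decide whether a file should be fetched again.

    Assumes remote_last_modified is timezone-aware (or None when the
    source does not report a date).
    """
    if not local_path.exists():
        # Nothing cached locally, so the file has to be downloaded.
        return True
    if remote_last_modified is None:
        # No date available to compare against: re-download to be safe.
        return True
    local_mtime = datetime.fromtimestamp(local_path.stat().st_mtime, tz=timezone.utc)
    # Re-download only when the remote copy is newer than the local one.
    return remote_last_modified > local_mtime
```

A caller would skip both the download and the reprocessing step whenever this returns `False`.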
Add attribution to the Pinecone connector.
Add support for Python 3.12. unstructured now works with Python 3.12!
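For the ingest item on async support "with limits", one common way to cap concurrency is an `asyncio.Semaphore`. The sketch below illustrates that general technique only; `run_with_limit`, `worker`, and `max_concurrency` are hypothetical names, not part of the unstructured ingest API.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def run_with_limit(
    items: Iterable[Any],
    worker: Callable[[Any], Awaitable[Any]],
    max_concurrency: int,
) -> List[Any]:
    """Run worker(item) for every item, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item: Any) -> Any:
        # Each task waits here until one of the limited slots is free.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))
```

It would be driven with something like `asyncio.run(run_with_limit(docs, process_doc, 4))`, where `process_doc` is whatever coroutine handles a single document.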