Releases: Unstructured-IO/unstructured
Releases · Unstructured-IO/unstructured
0.3.4
0.3.4
- Python-3.7 compat
0.3.3
0.3.3
- Removes BasicConfig from logger configuration
- Adds the
partition_email
partitioning brick - Adds the
replace_mime_encodings
cleaning bricks - Small fix to HTML parsing related to processing list items with sub-tags
0.3.2
0.3.2
- Added
translate_text
brick for translating text between languages - Add an
apply
method to make it easier to apply cleaners to elements
0.3.1
0.3.1
- Added __init.py__ to
partition
0.3.0
0.3.0
- Implement staging brick for Argilla. Converts lists of
Text
elements toargilla
dataset classes. - Removing the local PDF parsing code and any dependencies and tests.
- Reorganizes the staging bricks in the unstructured.partition module
- Allow entities to be passed into the Datasaur staging brick
- Added HTML escapes to the
replace_unicode_quotes
brick - Fix bad responses in partition_pdf to raise ValueError
- Adds
partition_html
for partitioning HTML documents.
0.2.4
- Add an alternative way of importing
Final
to support google colab
0.2.3
0.2.3
- Add cleaning bricks for removing prefixes and postfixes
- Add cleaning bricks for extracting text before and after a pattern
0.2.2
0.2.2
- Add staging brick for Datasaur
0.2.1
0.2.1
- Added brick to convert an ISD dictionary to a list of elements
- Update
PDFDocument
to use thefrom_file
method - Added staging brick for CSV format for ISD (Initial Structured Data) format.
- Added staging brick for separating text into attention window size chunks for
transformers
. - Added staging brick for LabelBox.
- Added ability to upload LabelStudio predictions
- Added utility function for JSONL reading and writing
- Added staging brick for CSV format for Prodigy
- Added staging brick for Prodigy
- Added ability to upload LabelStudio annotations
- Added text_field and id_field to stage_for_label_studio signature