Commit
Merge remote-tracking branch 'origin/main' into feat/default-to-not-o…
…utput-table-cell-structure
badGarnet committed May 23, 2024
2 parents dcd7103 + 35ec21e commit dc92295
Showing 5 changed files with 391 additions and 5 deletions.
6 changes: 4 additions & 2 deletions CHANGELOG.md
@@ -1,4 +1,4 @@
-## 0.14.3-dev3
+## 0.14.3-dev4

### Enhancements

@@ -11,7 +11,9 @@

* **Turn off XML resolve entities** Sets `resolve_entities=False` for XML parsing with `lxml`
to avoid text being dynamically injected into the XML document.
-* Add the missing `form_extraction_skip_tables` argument to the `partition_pdf_or_image` call.
+* **Add backward compatibility for the deprecated pdf_infer_table_structure parameter**.
+* **Add the missing `form_extraction_skip_tables` argument to the `partition_pdf_or_image` call**.
+to avoid text being dynamically injected into the XML document.
* **Chromadb change from Add to Upsert using element_id to make idempotent**
* **Disable `table_as_cells` output by default** to reduce overhead in partition; now `table_as_cells` is only produced when the env `EXTACT_TABLE_AS_CELLS` is `true`
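The backward-compatibility entry above can be illustrated with a minimal sketch. It assumes the deprecated `pdf_infer_table_structure` flag is simply folded into `infer_table_structure`, with either flag enabling table inference. The helper name `resolve_infer_table_structure` is hypothetical and does not come from the library. The real implementation may differ.

```python
import warnings


def resolve_infer_table_structure(infer_table_structure=False, pdf_infer_table_structure=None):
    """Hypothetical helper: fold the deprecated flag into the current one."""
    if pdf_infer_table_structure is None:
        # Deprecated flag not supplied: the current flag decides on its own.
        return infer_table_structure
    warnings.warn(
        "pdf_infer_table_structure is deprecated; use infer_table_structure instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    # Assumption: either flag being truthy turns table inference on.
    return bool(infer_table_structure or pdf_infer_table_structure)
```

Under this assumption, a caller that still passes `pdf_infer_table_structure=True` gets table inference plus a `DeprecationWarning`, which matches the direction of the test change in this commit.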

2 changes: 1 addition & 1 deletion test_unstructured/partition/test_auto.py
@@ -350,7 +350,7 @@ def test_auto_partition_pdf_uses_table_extraction():
         "unstructured.partition.pdf_image.ocr.process_file_with_ocr",
     ) as mock_process_file_with_model:
         partition(filename, pdf_infer_table_structure=True, strategy=PartitionStrategy.HI_RES)
-        assert mock_process_file_with_model.call_args[1]["infer_table_structure"] is False
+        assert mock_process_file_with_model.call_args[1]["infer_table_structure"]


def test_auto_partition_pdf_with_fast_strategy(monkeypatch):