Merge branch 'refs/heads/main' into fix/3119-pdf-empty-table-cell
# Conflicts:
#	CHANGELOG.md
#	unstructured/__version__.py
christinestraub committed May 31, 2024
2 parents 76c1cb4 + 1b43102 commit ffb0dbd
Showing 2 changed files with 12 additions and 0 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -13,6 +13,7 @@
### Fixes

* **Address the issue of unrecognized tables in `UnstructuredTableTransformerModel`** When a table is not recognized, the `element.metadata.text_as_html` attribute is set to an empty string.
* **Remove root handlers in ingest logger**. Removes root handlers in ingest loggers to ensure secrets aren't accidentally exposed in Colab notebooks.
* **Fix V2 S3 Destination Connector authentication** Fixes bugs with S3 Destination Connector where the connection config was neither registered nor properly deserialized.
* **Clarified dependence on particular version of `python-docx`** Pinned `python-docx` version to ensure a particular method `unstructured` uses is included.
* **Ingest preserves original file extension** Ingest V2 introduced a change that dropped the original extension for upgraded connectors. This reverts that change.
11 changes: 11 additions & 0 deletions unstructured/ingest/logger.py
@@ -94,6 +94,15 @@ def format(self, record):
        return redact_jsons(s)


def remove_root_handlers(logger: logging.Logger) -> None:
    # NOTE(robinson) - in some environments such as Google Colab, there is a root handler
    # that does not mask secrets, meaning sensitive info such as API keys appears in logs.
    # Removing these handlers when they exist prevents this behavior.
    if logger.root.hasHandlers():
        # Iterate over a copy: removeHandler() mutates logger.root.handlers in place,
        # so looping over the live list would skip handlers.
        for handler in list(logger.root.handlers):
            logger.root.removeHandler(handler)


def ingest_log_streaming_init(level: int) -> None:
    handler = logging.StreamHandler()
    handler.name = "ingest_log_handler"
@@ -104,6 +113,7 @@ def ingest_log_streaming_init(level: int) -> None:
    if "ingest_log_handler" not in [h.name for h in logger.handlers]:
        logger.addHandler(handler)

    remove_root_handlers(logger)
    logger.setLevel(level)


@@ -116,4 +126,5 @@ def make_default_logger(level: int) -> logging.Logger:
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(level)
    remove_root_handlers(logger)
    return logger
