feat: add GLOBAL_WORKING_DIR and GLOBAL_WORKING_PROCESS_DIR config parameters #3014
@property
def STORAGE_DIR(self) -> str:
    """Path to Unstructured storage directory."""
STORAGE_DIR is a misleading name: it has permanent, or at least caching, connotations. Could this instead be TMP_STORAGE_DIR?
There are parameters STORAGE_DIR and STORAGE_TMPDIR.
Maybe UNSTRUCTURED_DIR and UNSTRUCTURED_TMPDIR, as those by default point to ~/.cache/unstructured and ~/.cache/unstructured/tmp/{gid} respectively.
Or UNSTRUCTURED_CACHE_DIR and UNSTRUCTURED_TMP_DIR?
Made the changes, waiting for the green light @cragwolfe
LGTM, just env names to be corrected.
@@ -160,7 +160,6 @@ def _try_process_document(self, doc: Path) -> Optional[list]:
    @abstractmethod
    def _process_document(self, doc: Path) -> list:
        """Should return all metadata and metrics for a single document."""
        pass
Why was this removed?
It was removed by the linter
If a docstring is added, the pass keyword is optional for functions.
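A minimal sketch of the point about docstrings: a function body that consists only of a docstring is valid Python, so an abstract method does not need a trailing pass. The class names BaseProcessor and EchoProcessor are hypothetical and are not taken from the PR.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BaseProcessor(ABC):  # hypothetical name, for illustration only
    @abstractmethod
    def _process_document(self, doc: Path) -> list:
        """Should return all metadata and metrics for a single document."""
        # No `pass` needed: the docstring alone is a valid function body.


class EchoProcessor(BaseProcessor):  # hypothetical concrete subclass
    def _process_document(self, doc: Path) -> list:
        # Trivial stand-in for real processing.
        return [doc.name]


result = EchoProcessor()._process_document(Path("example.pdf"))
```

The abstract base still cannot be instantiated, so dropping pass changes nothing about how the class behaves.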
@@ -161,7 +163,12 @@ def test_save_elements_with_output_dir_path_none():
        )

        # Verify that the images are saved in the expected directory
        expected_output_dir = os.path.join(tmpdir, "figures")
        if storage_enabled:
Personally, I'd prefer to see more use of pathlib, but I believe that's not the main goal of this PR; just a side note.
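For reference, a small sketch of what the pathlib suggestion could look like next to the os.path.join call from the test. The temporary directory and the variable names other than expected_output_dir are assumptions made for this illustration, not code from the PR.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    # os.path style, as in the test under review.
    expected_output_dir = os.path.join(tmpdir, "figures")

    # pathlib style: the / operator joins path components.
    expected_output_path = Path(tmpdir) / "figures"
    expected_output_path.mkdir()

    # Both spellings refer to the same location on disk.
    same_location = str(expected_output_path) == expected_output_dir
    created = os.path.isdir(expected_output_dir)
```

Either form works; pathlib mainly saves the repeated os.path calls once paths are joined, checked, and created in several places.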
This PR introduces GLOBAL_WORKING_DIR and GLOBAL_WORKING_PROCESS_DIR, which control where temporary files are stored during the partition flow, via tempfile.tempdir.
Edit: Renamed prefixes from STORAGE_ to UNSTRUCTURED_CACHE_
Edit 2: Renamed prefixes from UNSTRUCTURED_CACHE to GLOBAL_WORKING_DIR_
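To illustrate the standard-library mechanism the description refers to: assigning to tempfile.tempdir changes the default directory used by tempfile functions called without an explicit dir argument. This is a sketch of that mechanism only, not the PR's implementation; working_dir is a hypothetical directory created just for the demonstration.

```python
import os
import tempfile

# Hypothetical working directory, created here only for the demonstration.
working_dir = tempfile.mkdtemp(prefix="working-dir-demo-")

previous = tempfile.tempdir      # remember the current setting (often None)
tempfile.tempdir = working_dir   # redirect the default temp location
try:
    # No dir= argument: the file lands in tempfile.tempdir.
    with tempfile.NamedTemporaryFile() as tmp:
        created_in = os.path.dirname(tmp.name)
    default_dir = tempfile.gettempdir()
finally:
    tempfile.tempdir = previous  # restore so other code is unaffected
    os.rmdir(working_dir)        # directory is empty once the file is closed
```

Because tempfile.tempdir is process-wide state, the sketch restores the previous value afterwards; code that sets it permanently affects every later tempfile call in the same process.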