-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add delay to retrieval #18
Conversation
observingClouds
commented
Dec 8, 2022
- allow queue to be filled before being processed
- fixes Retrieval starts before queue is filled with all requests #10
- allow queue to be filled before being processed
e34ac53
to
7dad56f
Compare
slkspec/core.py
Outdated
@@ -152,6 +155,7 @@ def _retrieve_items(self, retrieve_files: list[tuple[str, str]]) -> None: | |||
(output_dir / out_file.name).chmod(self.file_permissions) | |||
|
|||
def _cache_files(self) -> None: | |||
time.sleep(self.sleep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind, quickly explaining why you think this is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what is going on is that the init
method of slkspec is called (e.g. by xarray) several times for one bulk request. This has two implications:
- because the queue is created at module import and the init methods just have a copy of this queue, the different calls use the same queue (which is great)
- however this means that without any delay each call is processing potentially only a part of this queue, whatever is currently in the queue. The delay makes sure that all processes had the chance to add their part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, this doesn't solve this issue. See my comment in the original issue #10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to check your comment. For now I can only say that this PR solves the issues. I need to check if this is also true for the .zmetadata
(which is probably problematic as we need the metadata to know the file structure), but certainly for all following requests.
I think the solution to this problem is saving the zarr data in a zip storage object -> #10 (comment) |
@@ -152,6 +155,7 @@ def _retrieve_items(self, retrieve_files: list[tuple[str, str]]) -> None: | |||
(output_dir / out_file.name).chmod(self.file_permissions) | |||
|
|||
def _cache_files(self) -> None: | |||
time.sleep(self.delay) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is where potential race conditions might happen - try moving the sleep method into the thread lock block. But that might end you up with the same issue as before.
@@ -92,6 +93,7 @@ def __init__( | |||
mode: str = "rb", | |||
touch: bool = True, | |||
file_permissions: int = 0o3777, | |||
delay: int = 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to make this configurable shouldn't you also add this option into the SLKFileSystem
class?