Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add delay to retrieval #18

Merged
merged 7 commits into from
Feb 5, 2023
Merged

add delay to retrieval #18

merged 7 commits into from
Feb 5, 2023

Conversation

observingClouds
Copy link
Owner

- allow queue to be filled before being processed
slkspec/core.py Outdated
@@ -152,6 +155,7 @@ def _retrieve_items(self, retrieve_files: list[tuple[str, str]]) -> None:
(output_dir / out_file.name).chmod(self.file_permissions)

def _cache_files(self) -> None:
time.sleep(self.sleep)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind, quickly explaining why you think this is necessary?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what is going on is that the init method of slkspec is called (e.g. by xarray) several times for one bulk request. This has two implications:

  • because the queue is created at module import and the init methods just have a copy of this queue, the different calls use the same queue (which is great)
  • however this means that without any delay each call is processing potentially only a part of this queue, whatever is currently in the queue. The delay makes sure that all processes had the chance to add their part.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this doesn't solve this issue. See my comment in the original issue #10

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to check your comment. For now I can only say that this PR solves the issues. I need to check if this is also true for the .zmetadata (which is probably problematic as we need the metadata to know the file structure), but certainly for all following requests.

@antarcticrainforest
Copy link
Contributor

I think the solution to this problem is saving the zarr data in a zip storage object -> #10 (comment)

@@ -152,6 +155,7 @@ def _retrieve_items(self, retrieve_files: list[tuple[str, str]]) -> None:
(output_dir / out_file.name).chmod(self.file_permissions)

def _cache_files(self) -> None:
time.sleep(self.delay)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is where potential race conditions might happen - try moving the sleep method into the thread lock block. But that might end you up with the same issue as before.

@@ -92,6 +93,7 @@ def __init__(
mode: str = "rb",
touch: bool = True,
file_permissions: int = 0o3777,
delay: int = 2,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to make this configurable shouldn't you also add this option into the SLKFileSystem class?

@observingClouds observingClouds merged commit be911f3 into main Feb 5, 2023
@observingClouds observingClouds deleted the delay branch February 5, 2023 07:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retrieval starts before queue is filled with all requests
2 participants