Limit processing of queued files to 500 per job #8

Open
observingClouds opened this issue Dec 7, 2022 · 7 comments
Comments

@observingClouds
Owner

observingClouds commented Dec 7, 2022

On the compute, shared and interactive partitions, slk retrieve is allowed to retrieve 500 files at once. Thus, if more than 500 files are requested here, the request should be split into several retrievals. @antarcticrainforest Or would you split the file list into chunks of fewer than 501 files anyway before calling this function? I'll try to add this feature ("group_files_by_tape") to the slk_helpers as soon as possible. But currently, I am mainly bound to slk testing and user support. So, let's see ;-)

Originally posted by @neumannd in #3 (comment)
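
A minimal sketch of such a split, assuming a hypothetical retrieve callable that wraps the actual slk retrieve invocation and a per-call limit of 500 files:

from typing import Callable, Sequence

RETRIEVE_FILE_LIMIT = 500  # per-call limit on compute/interactive/shared partitions

def retrieve_in_batches(
    files: Sequence[str],
    retrieve: Callable[[Sequence[str]], None],
    limit: int = RETRIEVE_FILE_LIMIT,
) -> None:
    """Issue one retrieval per batch of at most `limit` files."""
    for start in range(0, len(files), limit):
        retrieve(files[start:start + limit])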

@observingClouds
Owner Author

On the login nodes we allow slk retrieve to retrieve only one file per call. There is a StrongLink config file, /etc/stronglink.conf, which is JSON and contains an attribute "retrieve_file_limit": 1 (on login nodes) or "retrieve_file_limit": 500 (on other nodes). This file could be read to find out how many files are allowed to be retrieved per call. This number may be changed in the future if needed.

Originally posted by @neumannd in #3 (comment)

@florianziemen

I think that in the case of a retrieval of more than 500 files (or maybe rather more than 10/... tapes) we should assume the user has made a mistake, cancel the whole request, and throw an error. Otherwise we run into the problem that users might accidentally trigger loading half the HSM into the cache...
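
A minimal sketch of such a guard; the cutoff and the function name are placeholders, and whether to count files or tapes remains open:

MAX_FILES_PER_REQUEST = 500  # hypothetical cutoff, not an slk setting

def check_request_size(files: list) -> None:
    """Cancel oversized requests with an error instead of silently splitting them."""
    if len(files) > MAX_FILES_PER_REQUEST:
        raise ValueError(
            f"Refusing to retrieve {len(files)} files "
            f"(limit: {MAX_FILES_PER_REQUEST}); this looks like a mistake."
        )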

@observingClouds
Owner Author

I see that 500 files can be a massive request and may well be a mistake. However, I would argue that this limitation should be enforced at the lowest level, i.e. in slk or at least pyslk. This would ensure that the behaviour is the same across all access methods and that slkspec remains more general, so that it could also be used for a tape archive at a different institution that may have different resources. Instead of limiting by number of files, one could also think of restricting a retrieval by total size.
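
For illustration, a size-based restriction might look like the following sketch; the byte limit is an assumption, and file_sizes is a hypothetical {path: size_in_bytes} mapping whose values would have to be queried through slk/pyslk:

MAX_RETRIEVAL_BYTES = 10 * 1024**4  # assumed cutoff of 10 TiB, for illustration only

def check_total_size(file_sizes: dict) -> None:
    """Reject a retrieval whose total size exceeds the limit."""
    total = sum(file_sizes.values())
    if total > MAX_RETRIEVAL_BYTES:
        raise ValueError(
            f"Requested {total / 1024**4:.2f} TiB exceeds the "
            f"{MAX_RETRIEVAL_BYTES / 1024**4:.2f} TiB retrieval limit."
        )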

@florianziemen

Yeah, just saying that we should not try to bypass such limitations, because they are there for a reason.

@observingClouds
Owner Author

observingClouds commented Feb 9, 2023

I see where you are coming from. Retrievals are now for the most part combined into a single slk retrieve call. If slk has limitations in place, these will also apply to slkspec retrievals.

@neumannd
Collaborator

neumannd commented Feb 9, 2023

> Yeah, just saying that we should not try to bypass such limitations, because they are there for a reason.

Yes. We feared slk retrieve -R /arch . ;-)

It would be safest to read the retrieve_file_limit from this /etc/stronglink.conf file.

@neumannd
Collaborator

@observingClouds It would be reasonable to read /etc/stronglink.conf (example content):

{"host":"archive.dkrz.de","domain":"ldap","logSize":"10MB","retrieve_file_limit":500}

Then extract the value of retrieve_file_limit. Currently, it is 1 on levante login nodes and 500 on levante compute/interactive/shared nodes. This limit might be changed in the future or differ on individual nodes (e.g. a "mass-data-retrieval node" where it is set to 5000).

import json
import os

slk_conf_global = "/etc/stronglink.conf"

# `-1` == no limit
retrieve_file_limit = -1
if os.path.exists(slk_conf_global):
    with open(slk_conf_global, "r") as f:
        data = json.load(f)
    retrieve_file_limit = data.get("retrieve_file_limit", -1)
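
The parsed limit could then feed a split-or-submit decision. A minimal sketch, where files and submit_retrieval are hypothetical placeholders for the request at hand and the actual retrieval call:

if retrieve_file_limit == -1 or len(files) <= retrieve_file_limit:
    # within the limit (or no limit): a single call suffices
    submit_retrieval(files)
else:
    # split into chunks of at most `retrieve_file_limit` files
    for i in range(0, len(files), retrieve_file_limit):
        submit_retrieval(files[i:i + retrieve_file_limit])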
