update galaxy_jwd script to skip if the backend is not of type disk and if backend is removed from the object_store_conf file #1394

Merged


sanjaysrikakulam
Member

@sanjaysrikakulam sanjaysrikakulam commented Feb 5, 2025

When we run the galaxy_jwd.py script to clean up, some jobs trigger the error below because the script cannot handle user object stores or S3 storage types.

This PR skips the job id if:

  1. The backend is either commented out (disabled) or removed from the object_store_conf.xml
  2. The backend is not of type disk (this covers S3 and other non-disk backends)

Traceback (most recent call last):
  File "/usr/local/bin/galaxy_jwd", line 505, in <module>
    main()
  File "/usr/local/bin/galaxy_jwd", line 226, in main
    jwd_path = decode_path(job_id, metadata, backends)
  File "/usr/local/bin/galaxy_jwd", line 360, in decode_path
    f"Object store id '{metadata[0]}' does not exist in the "
ValueError: Object store id 'user_objects://1a9e9db0-b787-4344-842e-93b071181cb0' does not exist in the object_store_conf.xml file.

I also tested the changes with a dry run, which skips them correctly and lists the ones to be deleted.

@bgruening
Member

I'm not sure I understand why we should not find a jwd if we use an object store like S3.

@sanjaysrikakulam
Member Author

sanjaysrikakulam commented Feb 5, 2025

I'm not sure I understand why we should not find a jwd if we use an object store like S3.

The user object stores (S3 or anything else) use the object store cache directory as their JWDs (AFAIR; for the cache dir, we probably want to use a "watchdog" to remove dirs). The non-disk backends that we define in our object_store_conf.xml, however, use the JWDs we define in extra_dir. We can still handle this by changing the conditionals in this PR to check whether the extra_dir and its path exist (essentially just removing the extra disk-based conditional added in this PR). That way, our cleanup can continue even for a non-disk backend.
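A minimal sketch of the path-based check described above. The per-backend dict with `id`, `type` and `extra_dir` keys and the `can_clean` helper are hypothetical and only illustrate the idea; they are not the data structures or functions used in galaxy_jwd.

```python
import os
import tempfile


def can_clean(backend):
    """Return True if the backend has a job working directory root on disk.

    `backend` is a hypothetical dict; the backend type is deliberately
    ignored, only the presence of a usable extra_dir path matters.
    """
    extra_dir = backend.get("extra_dir")
    return bool(extra_dir) and os.path.isdir(extra_dir)


with tempfile.TemporaryDirectory() as jwd_root:
    candidates = [
        # Non-disk backend whose extra_dir exists: cleanup can continue.
        {"id": "s3_example", "type": "s3", "extra_dir": jwd_root},
        # Disk backend whose extra_dir path is gone: skip.
        {"id": "gone", "type": "disk", "extra_dir": os.path.join(jwd_root, "missing")},
        # Backend without any extra_dir defined: skip.
        {"id": "nodir", "type": "s3", "extra_dir": None},
    ]
    results = {b["id"]: can_clean(b) for b in candidates}

print(results)
```

With a check like this, the decision depends only on whether a JWD root is present, not on the backend type.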

@bgruening
Member

(AFAIR; for the cache dir, we probably want to use a "watchdog" to remove dirs

Yes! :) We should maybe check if celery is now cleaning this already. I remember discussing this at some point.

https://github.com/galaxyproject/galaxy/blob/39e38c92accbdb29aaf670c1649740a250c965b6/lib/galaxy/config/sample/object_store_conf.xml.sample#L230

S3 also has an extra dir, and I would argue we need to ensure that S3 backends use our normal JWDs. There should be no differences at the job level, independent of where the data is coming from.

Is this script also used for the CLI tool change_to_jwd? I would still like to use this CLI to jump into my JWD.

User object stores will get filtered out anyway when checking whether the object_store_id exists in the object_store_conf.xml, so the error would not appear.

@sanjaysrikakulam
Member Author

sanjaysrikakulam commented Feb 6, 2025

(AFAIR; for the cache dir, we probably want to use a "watchdog" to remove dirs

Yes! :) We should maybe check if celery is now cleaning this already. I remember discussing this at some point.

https://github.com/galaxyproject/galaxy/blob/39e38c92accbdb29aaf670c1649740a250c965b6/lib/galaxy/config/sample/object_store_conf.xml.sample#L230

S3 also has an extra dir, and I would argue we need to ensure that S3 backends use our normal JWDs. There should be no differences at the job level, independent of where the data is coming from.

Is this script also used for the CLI tool change_to_jwd? I would still like to use this CLI to jump into my JWD.

I have removed the "skipping" based on the disk type. The skip now happens only when the object_store_id of a job does not exist in our object_store_conf.xml. This applies to object stores that were removed or commented out (the Python XML parser ignores commented-out lines, so those backends are not extracted/registered when the script runs) and to user object stores.
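To illustrate the behaviour described above, here is a small sketch using Python's standard `xml.etree.ElementTree`. The sample XML is a simplified, assumed configuration, and `parse_backends` and `should_skip` are hypothetical names, not the functions in galaxy_jwd. It shows that a commented-out backend never reaches the parsed result, so its jobs are skipped in the same way as jobs on a user object store.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed example of an object_store_conf.xml; paths are made up.
SAMPLE_CONF = """\
<object_store type="distributed">
  <backends>
    <backend id="files1" type="disk" weight="1">
      <files_dir path="/data/files1"/>
      <extra_dir type="job_work" path="/data/jwd1"/>
    </backend>
    <!--
    <backend id="files_old" type="disk" weight="0">
      <extra_dir type="job_work" path="/data/jwd_old"/>
    </backend>
    -->
  </backends>
</object_store>
"""


def parse_backends(xml_text):
    """Map backend id -> job_work extra_dir path for all active backends."""
    root = ET.fromstring(xml_text)  # XML comments are dropped by default
    backends = {}
    for backend in root.iter("backend"):
        for extra_dir in backend.findall("extra_dir"):
            if extra_dir.get("type") == "job_work":
                backends[backend.get("id")] = extra_dir.get("path")
    return backends


def should_skip(object_store_id, backends):
    """Skip a job when its object store id is not a known backend."""
    return object_store_id not in backends


backends = parse_backends(SAMPLE_CONF)
for store_id in (
    "files1",
    "files_old",
    "user_objects://1a9e9db0-b787-4344-842e-93b071181cb0",
):
    action = "skip" if should_skip(store_id, backends) else "clean up"
    print(f"{store_id}: {action}")
```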

Yes, the script is also used for the CLI tool.

@bgruening
Member

@sanjaysrikakulam can you explain that tomorrow, please? I am missing something, I guess.

@sanjaysrikakulam sanjaysrikakulam merged commit 8134e62 into usegalaxy-eu:master Feb 11, 2025
1 of 2 checks passed