Cloud ObjectStore #4487
- Added `Cloud` to objectstore import.
requirements of `cloudbridge` package.
requests via a different PR.
a generic exception--a temporary solution till CloudBridge wraps the exceptions properly.
because CloudBridge internally maps to appropriate functions depending on the input type.
Should use the wheel created at [this PR](galaxyproject/starforge#139).
log.debug("Cache cleaning done. Total space freed: %s", convert_bytes(deleted_amount))
return

def _get_bucket(self, bucket_name):
@nuwang, regarding your comment on #4352: the logic here is "get the bucket; if it is not available, create it first, then return it". However, the CloudBridge logic is "get the bucket; if it is not available, return None" (see this line). So, do you want to update CloudBridge so that this logic is implemented internally?
@VJalili @afgane I think that sequence is alright. I was thinking more about the outer loop - having to loop 5 times before getting a bucket. Calling cloudbridge.create() should guarantee that the bucket is available when it returns, or it should raise an exception. Having to loop suggests that the cloudbridge code is not portable across providers (e.g. this might work in one go on openstack, and a developer would think the code is working correctly, but then things will actually fall over on AWS). We could either document this as being necessary at the interface level, or handle this internally within cloudbridge. Do you know when specifically this occurs?
I've created an issue here so we can continue the discussion: CloudVE/cloudbridge#64
@nuwang I agree. I removed the retries here, and now we rely on the first try :)
@VJalili Ok, sounds good :-) Will close the corresponding issue.
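For readers following the thread, the single-attempt logic agreed on above can be sketched roughly as follows. This is only an illustration: `FakeBucketService` and `get_or_create_bucket` are hypothetical names, and the stand-in merely assumes the behavior described in the discussion (a `get` that returns `None` for a missing bucket, and a `create` that either returns a usable bucket or raises). It is not the CloudBridge API.

```python
class FakeBucketService:
    """Hypothetical in-memory stand-in for a provider's bucket service."""

    def __init__(self):
        self._buckets = {}

    def get(self, name):
        # Mirrors the behavior described in the thread: a missing bucket yields None.
        return self._buckets.get(name)

    def create(self, name):
        # Assumed contract: the bucket is usable on return, otherwise an exception is raised.
        if name in self._buckets:
            raise ValueError("bucket %r already exists" % name)
        self._buckets[name] = {"name": name}
        return self._buckets[name]


def get_or_create_bucket(service, bucket_name):
    """Return the named bucket, creating it first if needed; no retry loop."""
    bucket = service.get(bucket_name)
    if bucket is None:
        bucket = service.create(bucket_name)
    return bucket
```

With this shape, any failure in `create` surfaces as an exception on the first try instead of being hidden behind repeated attempts.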
Looks good to me as an intermediate step that focuses on replacing AWS/ +1
This file may be loaded (e.g., for testing) but not used, so we shouldn't log a generic error about cloudbridge being unavailable at import time. If someone attempts to actually use the object store and cloudbridge isn't available, then an informative exception should be raised.
I've opened a PR to rework the conditional dependencies here (VJalili#3); see that PR for more information. I'd also like to see this line:
moved somewhere else. Unconditionally applying it as a result of importing a random module in Galaxy seems wrong, but the same line appears in s3.py, so I wouldn't hold up this PR on that, I guess.
Rework conditional dependency handling in galaxy.objectstore.cloud.
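The general pattern requested in the review (stay silent at import time, fail informatively at use time) might look roughly like the sketch below. `optional_import` and `OptionalDependencyStore` are hypothetical names chosen for illustration; this is not the code from the referenced PR.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


class OptionalDependencyStore:
    """Hypothetical object store that needs an optional package only when used."""

    def __init__(self, module_name):
        # Importing the file that defines this class never fails or logs;
        # the check happens only when someone actually constructs the store.
        self._backend = optional_import(module_name)
        if self._backend is None:
            raise RuntimeError(
                "This object store requires the '%s' package, which is not installed."
                % module_name
            )
```

A test suite can then import the module freely, and only a deployment that actually configures this store sees the error.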
@jmchilton Thanks for mentioning this. Actually, this line should be removed as we're not (explicitly) using
I've had about 2 minutes to look at this and have about 1 minute to write this comment but I wanted to get something in here just so I'll get notifications as a participant. ;) Without having looked myself, it seems? like there's a lot of duplication from S3? Can these not be refactored into a CachingObjectStore (which I could've sworn we already had as a parent of the S3ObjectStore)?
@natefoo you're right, the The
Ah, so the plan is to merge them in the future.
Actually, I would also prefer to merge the overlapping modules in favor of orthogonal functionality. But as you may read in #4314, it seems it is more "galaxy-ish" to keep S3 and Azure (even though you get the very same functionality with
In that case, can we aim for refactoring the duplication into a caching object store class and have S3, Cloud, and Azure subclass from that?
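One possible shape for such a refactoring is sketched below, purely as an illustration. `CachingObjectStoreSketch`, `DictBackedStore`, and the `_download` hook are hypothetical names, not existing Galaxy classes; the point is only that the cache logic lives once in the parent and each backend supplies its own transfer step.

```python
import os
from abc import ABC, abstractmethod


class CachingObjectStoreSketch(ABC):
    """Hypothetical base class holding the cache logic shared by all backends."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir

    def _cache_path(self, key):
        return os.path.join(self.cache_dir, key)

    def get_filename(self, key):
        """Return a local path for key, pulling from the backend on a cache miss."""
        path = self._cache_path(key)
        if not os.path.exists(path):
            self._download(key, path)
        return path

    @abstractmethod
    def _download(self, key, path):
        """Backend-specific: fetch key into the local file at path."""


class DictBackedStore(CachingObjectStoreSketch):
    """Toy subclass; a real one would talk to S3, Azure, or CloudBridge here."""

    def __init__(self, cache_dir, remote):
        super().__init__(cache_dir)
        self.remote = remote  # dict standing in for remote storage
        self.downloads = 0

    def _download(self, key, path):
        self.downloads += 1
        with open(path, "wb") as fh:
            fh.write(self.remote[key])
```

In this sketch, a second request for the same key is served from the cache directory without touching the backend.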
In testing, I'm finding that the path to the cached version of a file is actually what gets uploaded as the contents of the file, instead of the contents as expected. I set up a new IAM user, uploaded a new dataset, and observed that in my S3 bucket the contents of the uploaded file were actually something like The cached file on disk is fine, but when I delete that cached file, the file is replaced by a plain text file containing the string as mentioned above.

That's a good news / bad news thing in that the good news is that fetching the file from S3 is working, but we definitely need to fix the contents that get uploaded.

Feel free to move it back to 17.09 if you feel inclined, but because this is a fairly critical issue for this PR (and I'm not seeing this as an urgent feature for 17.09), I'm going to bump this to 18.01.
between uploading from a file vs. string).
Interesting point, thanks @dannon. It seems there was a piece of code in the original PR that differentiated between uploading from a file and from an object. The same logic was moved to this PR, but it seems I removed it in this commit, assuming it is handled internally by CloudBridge; maybe a misunderstanding. I have reverted that commit.
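To illustrate the class of bug discussed here, the sketch below uses a hypothetical `FakeRemoteObject` with two upload methods, one taking content and one taking a path. The names and the `push` helper are assumptions made for illustration and are not taken from this PR or from CloudBridge.

```python
class FakeRemoteObject:
    """Hypothetical stand-in for a remote object with two upload methods."""

    def __init__(self):
        self.content = None

    def upload(self, data):
        # Stores whatever it is given, verbatim.
        self.content = data.encode() if isinstance(data, str) else data

    def upload_from_file(self, path):
        # Reads the file at path and stores its bytes.
        with open(path, "rb") as fh:
            self.content = fh.read()


def push(remote_obj, source_file=None, from_string=None):
    """Upload from a string if one is given, otherwise from the file on disk."""
    if from_string is not None:
        remote_obj.upload(from_string)
    else:
        remote_obj.upload_from_file(source_file)
```

Calling `upload(path)` directly on such an object would store the path string itself as the remote contents, which matches the symptom reported above; routing through an explicit file-versus-string branch avoids that.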
Merged locally, resolving conflicts. Thanks for the fixes @VJalili, I confirmed that the dataset contents bug above was fixed and stuff seemed to be working pretty well to me.
Thanks @dannon.
This PR is a follow-up to #4352, which we had to close due to an issue with a rebase. You may check that PR for earlier comments and reviews.
The `Cloud` ObjectStore is a provider-agnostic cloud-based storage. In other words, the `Cloud` ObjectStore can read/write from various cloud-based storages without a provider-specific implementation; it leverages `CloudBridge`, which provides unified access to various providers.

This PR sets a base for the User-based ObjectStore we have been working on. Initially, the `Cloud` ObjectStore was part of the User-based ObjectStore (PR #4314), and it is being submitted as a separate PR following the comments on PR #4314.

To simplify the comparison with `S3ObjectStore`, the `Cloud` ObjectStore provides the very same functionality as `S3ObjectStore`, i.e., read/write to AWS S3 buckets. The only difference is that `S3ObjectStore` leverages `boto` while `Cloud` uses `CloudBridge`.