Cloud ObjectStore #4487

VJalili · 2017-08-23T19:43:57Z

This PR is a follow-up to #4352, which we had to close due to an issue with a rebase. You may check that PR for earlier comments and reviews.

The Cloud ObjectStore is a provider-agnostic, cloud-based storage backend. In other words, the Cloud ObjectStore can read from and write to various cloud storage services without any provider-specific implementation, because it relies on CloudBridge, which provides unified access to various providers.

This PR lays the groundwork for the User-based ObjectStore we have been working on. The Cloud ObjectStore was initially part of the User-based ObjectStore (PR #4314); following the comments on PR #4314, it is being submitted as a separate PR.

To simplify comparison with S3ObjectStore, the Cloud ObjectStore provides exactly the same functionality as S3ObjectStore, i.e., reading from and writing to AWS S3 buckets. The only difference is that S3ObjectStore relies on boto, while Cloud uses CloudBridge.
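To make the "provider-agnostic" idea concrete, here is a minimal sketch: the object-store logic is written against a single storage interface, and each provider plugs in behind it. `StorageProvider`, `InMemoryProvider`, and `ProviderAgnosticStore` are hypothetical names for illustration only. They are not CloudBridge's API, and the in-memory backend stands in for AWS S3 or any other provider.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class StorageProvider(ABC):
    """Hypothetical provider-neutral interface, standing in for the kind of
    unified access a library such as CloudBridge offers."""

    @abstractmethod
    def read(self, bucket: str, key: str) -> Optional[bytes]:
        ...

    @abstractmethod
    def write(self, bucket: str, key: str, data: bytes) -> None:
        ...


class InMemoryProvider(StorageProvider):
    """Toy backend used in place of AWS S3, OpenStack Swift, Azure, etc."""

    def __init__(self) -> None:
        self._buckets: Dict[str, Dict[str, bytes]] = {}

    def read(self, bucket, key):
        # Missing bucket or missing key both yield None.
        return self._buckets.get(bucket, {}).get(key)

    def write(self, bucket, key, data):
        self._buckets.setdefault(bucket, {})[key] = data


class ProviderAgnosticStore:
    """Object-store logic that only talks to StorageProvider, so it contains
    no provider-specific code."""

    def __init__(self, provider: StorageProvider, bucket: str) -> None:
        self.provider = provider
        self.bucket = bucket

    def put(self, rel_path: str, data: bytes) -> None:
        self.provider.write(self.bucket, rel_path, data)

    def get(self, rel_path: str) -> Optional[bytes]:
        return self.provider.read(self.bucket, rel_path)
```

Swapping `InMemoryProvider` for a different backend would not require changing `ProviderAgnosticStore`, which is roughly the property the Cloud ObjectStore is meant to get from CloudBridge.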

VJalili · 2017-08-28T22:27:37Z

lib/galaxy/objectstore/cloud.py

```diff
+                log.debug("Cache cleaning done. Total space freed: %s", convert_bytes(deleted_amount))
+                return
+
+    def _get_bucket(self, bucket_name):
```


@nuwang, regarding your comment on #4352: the logic here is "get the bucket; if it is not available, create it first, then return the bucket". However, the CloudBridge logic is "get the bucket; if it is not available, return None" (see this line). So, do you want to update CloudBridge and have this logic implemented internally?

nuwang · 2017-09-01

@VJalili @afgane I think that sequence is alright. I was thinking more about the outer loop - having to loop 5 times before getting a bucket. Calling cloudbridge.create() should guarantee that the bucket is available when it returns, or it should raise an exception. Having to loop suggests that the cloudbridge code is not portable across providers (e.g. this might work in one go on openstack, and a developer would think the code is working correctly, but then things will actually fall over on AWS). We could either document this as being necessary at the interface level, or handle this internally within cloudbridge. Do you know when specifically this occurs?

I've created an issue here so we can continue the discussion: CloudVE/cloudbridge#64

VJalili · 2017-09-05

@nuwang I agree. I removed the retries here, and now we rely on the first try :)

nuwang · 2017-09-05

@VJalili Ok, sounds good :-) Will close the corresponding issue.
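The bucket logic this thread settled on (get the bucket, create it if it is missing, no retry loop) can be sketched as follows. The sketch assumes a provider whose `get()` returns `None` for a missing bucket and whose `create()` either returns a usable bucket or raises. `FakeBucketService`, `BucketNotCreatedError`, and `get_bucket` are hypothetical stand-ins, not CloudBridge classes.

```python
from typing import Dict, Optional


class BucketNotCreatedError(Exception):
    """Raised by the fake provider when it cannot create a bucket."""


class FakeBucketService:
    """Hypothetical provider bucket service with the assumed contract:
    get() -> bucket or None; create() -> usable bucket, or raise."""

    def __init__(self, fail_create: bool = False) -> None:
        self._buckets: Dict[str, dict] = {}
        self._fail_create = fail_create

    def get(self, name: str) -> Optional[dict]:
        return self._buckets.get(name)

    def create(self, name: str) -> dict:
        if self._fail_create:
            raise BucketNotCreatedError(f"could not create bucket {name!r}")
        bucket = {"name": name}
        self._buckets[name] = bucket
        return bucket


def get_bucket(service: FakeBucketService, bucket_name: str) -> dict:
    """Return the bucket, creating it first if it does not exist.

    There is deliberately no retry loop: if create() fails, the exception
    propagates to the caller instead of being retried.
    """
    bucket = service.get(bucket_name)
    if bucket is None:
        bucket = service.create(bucket_name)
    return bucket
```

Without the loop, a provider that cannot hand back a ready bucket surfaces immediately as an exception, which addresses the portability concern discussed above rather than hiding it behind retries.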

afgane · 2017-08-30T18:03:31Z

Looks good to me as an intermediate step that focuses on replacing AWS/boto code with the cloud-agnostic CloudBridge code. Much of this will get redone again with the move toward the user-based object store so even though there are a few AWS-specific terms, that's fine.

+1

This file may be loaded (e.g., for testing) but not used, so we shouldn't log a generic error about CloudBridge being unavailable; if someone attempts to actually use the object store and CloudBridge isn't available, then raise an informative exception.
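One way to follow this suggestion, sketched under assumptions: resolve the optional import quietly when the module loads, and raise an informative error only when the object store is actually constructed. `optional_import`, `CloudObjectStoreSketch`, and the message text are hypothetical; the actual rework is in VJalili#3 and may differ.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Import a module if it is installed; otherwise return None.

    Nothing is logged here, because merely loading this file (e.g. in tests)
    should not report errors about a dependency nobody asked to use.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Resolved once at import time; None when CloudBridge is not installed.
cloudbridge = optional_import("cloudbridge")

# Hypothetical message text, for illustration only.
NO_CLOUDBRIDGE_ERROR_MESSAGE = (
    "Cloud ObjectStore is configured, but the CloudBridge dependency is not "
    "available. Install CloudBridge or change the object store configuration."
)


class CloudObjectStoreSketch:
    """Fails only when someone actually tries to use the object store."""

    def __init__(self, cloudbridge_module: Optional[ModuleType] = cloudbridge) -> None:
        if cloudbridge_module is None:
            raise RuntimeError(NO_CLOUDBRIDGE_ERROR_MESSAGE)
        self._cloudbridge = cloudbridge_module
```

The `cloudbridge_module` parameter is only there so the failure path can be exercised without uninstalling anything.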

jmchilton · 2017-09-05T13:41:12Z

I've opened a PR to rework the conditional dependencies here (VJalili#3) - see PR for more information.

I'd also like to see this line:

```python
logging.getLogger('boto').setLevel(logging.INFO)  # Otherwise boto is quite noisy
```

Moved somewhere else - unconditionally applying it as a result of importing a random module in Galaxy seems wrong - but the same line appears in s3.py so I wouldn't hold up this PR on that I guess.
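As a sketch of what "moved somewhere else" could look like (an assumption, not what the PR did; the line was ultimately removed), the level override can live in a function that start-up code calls explicitly, so importing a module has no global logging side effect. `configure_third_party_logging` and `NOISY_LOGGERS` are hypothetical names.

```python
import logging
from typing import Dict, Optional

# Hypothetical registry of third-party loggers whose level gets overridden.
NOISY_LOGGERS: Dict[str, int] = {"boto": logging.INFO}


def configure_third_party_logging(overrides: Optional[Dict[str, int]] = None) -> None:
    """Apply per-library log levels.

    Intended to be called once from application start-up code, rather than
    running as a side effect of importing an object store module.
    """
    levels = NOISY_LOGGERS if overrides is None else overrides
    for name, level in levels.items():
        logging.getLogger(name).setLevel(level)
```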


VJalili · 2017-09-05T18:00:39Z

> `logging.getLogger('boto').setLevel(logging.INFO)  # Otherwise boto is quite noisy`
>
> Moved somewhere else - unconditionally applying it as a result of importing a random module in Galaxy seems wrong - but the same line appears in s3.py so I wouldn't hold up this PR on that I guess.

@jmchilton Thanks for mentioning this. Actually, this line should be removed as we're not (explicitly) using boto; I removed it.

natefoo · 2017-09-05T20:36:12Z

I've had about 2 minutes to look at this and have about 1 minute to write this comment but I wanted to get something in here just so I'll get notifications as a participant. ;)

Without having looked myself, it seems? like there's a lot of duplication from S3? Can these not be refactored into a CachingObjectStore (which I could've sworn we already had as a parent of the S3ObjectStore)?

VJalili · 2017-09-05T21:49:38Z

@natefoo you're right, the Cloud class is very close to S3ObjectStore, and that is intentional :) The reason is #4314.

The Cloud module introduced in this PR will be updated in follow-up PRs to enable a per-user ObjectStore (i.e., plug-and-play your own resource, such as Amazon S3, Microsoft Azure, Google Drive, etc.).

natefoo · 2017-09-06T00:12:44Z

Ah, so the plan is to merge them in the future.

VJalili · 2017-09-06T00:20:48Z

Actually, I would also prefer to merge the overlapping modules in favor of orthogonal functionality. But as you can read in #4314, it seems more "galaxy-ish" to keep S3 and Azure (even though you get the very same functionality with Cloud) for backward compatibility. Therefore, merging cloud with s3 and azure_blob is not in the plan :)

natefoo · 2017-09-06T13:38:01Z

In that case, can we aim for refactoring the duplication into a caching object store class and have S3, Cloud, and Azure subclass from that?

dannon · 2017-09-06T13:44:53Z

@natefoo That was the plan AFAIK, we just wanted to keep the initial implementation of this simple and isolated, and make a refactoring pass after.

@VJalili Sorry for the delay getting to this, I'll test it for ya today and see how it goes.
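The refactoring proposed above, a caching base class that S3, Cloud, and Azure subclass, might look roughly like the following template-method sketch: the base class owns the local cache, and subclasses supply only the push/pull hooks. `CachingObjectStoreSketch`, `_push_to_backend`, `_pull_from_backend`, and `InMemoryBackendStore` are hypothetical names, not existing Galaxy classes.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict


class CachingObjectStoreSketch(ABC):
    """Hypothetical shared parent: owns the local cache; subclasses only move
    bytes between a cache file and their remote backend."""

    def __init__(self, cache_dir: str) -> None:
        self.cache_dir = cache_dir

    def _cache_path(self, rel_path: str) -> str:
        return os.path.join(self.cache_dir, rel_path)

    def put(self, rel_path: str, data: bytes) -> None:
        """Write to the cache first, then push the cached file to the backend."""
        path = self._cache_path(rel_path)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(data)
        self._push_to_backend(rel_path, path)

    def get(self, rel_path: str) -> bytes:
        """Serve from the cache; on a miss, pull from the backend into the cache."""
        path = self._cache_path(rel_path)
        if not os.path.exists(path):
            os.makedirs(os.path.dirname(path), exist_ok=True)
            self._pull_from_backend(rel_path, path)
        with open(path, "rb") as fh:
            return fh.read()

    # Backend-specific hooks an S3-, Cloud-, or Azure-style subclass would implement.
    @abstractmethod
    def _push_to_backend(self, rel_path: str, source_file: str) -> None: ...

    @abstractmethod
    def _pull_from_backend(self, rel_path: str, dest_file: str) -> None: ...


class InMemoryBackendStore(CachingObjectStoreSketch):
    """Toy subclass standing in for a real remote backend."""

    def __init__(self, cache_dir: str) -> None:
        super().__init__(cache_dir)
        self.remote: Dict[str, bytes] = {}

    def _push_to_backend(self, rel_path, source_file):
        with open(source_file, "rb") as fh:
            self.remote[rel_path] = fh.read()

    def _pull_from_backend(self, rel_path, dest_file):
        data = self.remote[rel_path]  # look up first so a miss leaves no empty cache file
        with open(dest_file, "wb") as fh:
            fh.write(data)
```

In this arrangement the cache handling would be written once, and each subclass would differ only in how bytes reach its backend.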

dannon · 2017-09-11T20:11:44Z

In testing, I'm finding that the path to the cached version of a file is actually what gets uploaded as the contents of the file, instead of the contents as expected.

I set up a new IAM user, uploaded a new dataset, and observed that in my S3 bucket the contents of the uploaded file were actually something like `/Users/yoplait/work/galaxy/database/object_store_cache/000/dataset_183.dat`, instead of the data expected.

The cached file on disk is fine, but when I delete that cached file, the file is replaced by a plain text file containing the string as mentioned above. That's a good news / bad news thing in that the good news is that fetching the file from S3 is working, but we definitely need to fix the contents that get uploaded.

Feel free to move it back to 17.09 if you feel inclined, but because this is a fairly critical issue for this PR (and I'm not seeing this as an urgent feature for 17.09), I'm going to bump this to 18.01.


VJalili · 2017-09-12T05:31:21Z

Interesting point, thanks @dannon. I fixed it.

It seems there was a piece of code in the original PR that differentiated between uploading from a file and uploading from an object. The same logic was carried over to this PR, but I removed it in this commit, assuming it was handled internally by CloudBridge; maybe a misunderstanding. I reverted the commit.
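The bug and its fix can be illustrated with a small sketch: when the caller hands over a local cache path, the upload has to read that file rather than send the path string as the data. `FakeRemoteObject`, `upload`, `upload_from_file`, and `push_to_remote` are hypothetical names loosely modeled on the file-vs-string distinction in the commit messages; they are not presented as CloudBridge's exact API.

```python
from typing import Optional


class FakeRemoteObject:
    """Hypothetical remote object with two distinct upload paths."""

    def __init__(self) -> None:
        self.data: Optional[bytes] = None

    def upload(self, data) -> None:
        """Store the given str/bytes itself as the object's contents."""
        self.data = data.encode() if isinstance(data, str) else data

    def upload_from_file(self, path: str) -> None:
        """Store the contents of the local file at `path`."""
        with open(path, "rb") as fh:
            self.data = fh.read()


def push_to_remote(obj: FakeRemoteObject, source_file: Optional[str] = None,
                   from_string: Optional[str] = None) -> None:
    """Dispatch explicitly on what the caller passed.

    Routing source_file through upload() would store the path string as the
    contents, which matches the symptom reported above.
    """
    if from_string is not None:
        obj.upload(from_string)
    elif source_file is not None:
        obj.upload_from_file(source_file)
    else:
        raise ValueError("either source_file or from_string is required")
```

Calling `upload()` with the cache path reproduces the reported behavior (the object's contents become the path), while `push_to_remote(obj, source_file=path)` stores the file's bytes.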

dannon · 2017-12-12T01:16:08Z

Merged locally resolving conflicts. Thanks for the fixes @VJalili, I confirmed that the dataset contents bug above was fixed and stuff seemed to be working pretty well to me.

VJalili · 2017-12-12T03:24:56Z

Thanks @dannon .

VJalili and others added 20 commits July 20, 2017 22:06

Added CloudBridge to the requirements list.

0516279

Added CloudBridge to the dependencies check.

aaad001

(1) Added Cloud to ObjectStore; (2) updated CloudBridge dependencies.

55c976c

Merge remote-tracking branch 'origin/release_17.05'

4a7045c

- Updated Cloud to account for changes in CloudBridge interface.
- Added `Cloud` to objectstore import.

e4c00dd

Updated Babel and requests packages version to meet the minimum requirements of `cloudbridge` package.

ab34fc9

Reverted the changes on package versions, as these changes are requests via a different PR.

e8541a7

Merge remote-tracking branch 'upstream/master'

cde2ff8

Changed boto import style.

7ad7d88

Updated CloudBridge to its current latest version.

9c57adb

Commented on boto import, and removed a use_reduced_redundancy config.

2589c74

Removed boto import, and replaced all S3-specific exception catches with a generic exception--a temporary solution till CloudBridge wraps the exceptions properly.

89be432

Consolidated two upload methods, i.e., upload from file and string, because CloudBridge internally maps to appropriate functions depending on the input type.

fafb019

Removed a S3-related comment, and swapped an if/else condition.

fb25495

Removed a comment section.

7727c0b

Changed indentation of two lines.

37efbd0

Update CloudBridge to its current latest

Should use the wheel created at [this PR](galaxyproject/starforge#139).

9a5016f

Add Cloud configuration to object_store_conf.xml.sample

4e17c8e

Merge remote-tracking branch 'remotes/upstream/dev' into CloudObjectStore

0ec02b5

Removed white spaces.

a878436

VJalili mentioned this pull request Aug 23, 2017

Cloud ObjectStore #4352

Closed

galaxybot added the triage label Aug 23, 2017

galaxybot added this to the 17.09 milestone Aug 23, 2017

VJalili added 3 commits August 23, 2017 13:29

Updated requirements: sqlalchemy-migrate and pbr.

eaa5931

Removed a white space between parenthesis and arguments

90c444d

Surrounded CloudBridge import in a try-catch block.

f6586a8

VJalili commented Aug 28, 2017


jgoecks added area/framework kind/feature status/review labels Aug 30, 2017

jgoecks removed the triage label Aug 30, 2017

afgane assigned dannon Aug 30, 2017

VJalili and others added 3 commits September 5, 2017 10:54

Merge pull request #3 from jmchilton/CloudObjectStore

Rework conditional dependency handling in galaxy.objectstore.cloud.

cf56d1d

Updated an error message.

6d65e90

Removed boto logging level set.

cfce6c2

Removed retries for getting a bucket.

b555837

dannon modified the milestones: 18.01, 17.09 Sep 11, 2017

Resolved a bug with uploading a dataset using Cloud (not differentiating between uploading from a file vs. string).

5977056

Removed an unused variable (i.e., mb_size) from Cloud

45ac767

VJalili mentioned this pull request Dec 7, 2017

The Roadmap #1928

Closed

dannon merged commit 45ac767 into galaxyproject:dev Dec 12, 2017

VJalili deleted the CloudObjectStore branch December 12, 2017 03:26
