-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cloud ObjectStore #4352
Cloud ObjectStore #4352
Conversation
lib/galaxy/objectstore/cloud.py
Outdated
import boto | ||
|
||
from boto.exception import S3ResponseError | ||
from boto.s3.key import Key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linter doesn't like unused imports (this line and the following), but you can ignore this by doing
from boto.s3.key import Key # noqa: F401
from boto.s3.connection import S3Connection # noqa: F401
Although I'm not sure in which scenario it would help to import unused dependencies ...
If external code needs it, it should probably import the dependencies where they are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mvdbeek Thanks, the imports are updated accordingly.
Actually, boto
import is temporary solution for handling AWS S3 exceptions until cloudbridge
implements such exceptions for all the providers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good. Clearly, this is still AWS-only even if it implements the CloudBridge interface but I guess that's just the first step.
There are a few comment lines and debug print statements that were also in the previous implementation. I think those can be removed now. Finally, I made a few comments inline, some of which repeat themselves in multiple places but the comment should apply to whole the file.
@@ -13,6 +13,7 @@ graphitesend | |||
azure-storage==0.32.0 | |||
# PyRods not in PyPI | |||
python-ldap==2.4.27 | |||
cloudbridge==0.3.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to go with the latest current release (0.3.2)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No specific reason, I updated to the current latest version (also see starforge PR #138).
lib/galaxy/objectstore/cloud.py
Outdated
from ..objectstore import convert_bytes, ObjectStore | ||
from cloudbridge.cloud.factory import CloudProviderFactory, ProviderList | ||
|
||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably, this is for backward compatibility? Either way, it'd be good to have a comment why is this necessary in addition to the CloudBridge import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boto
was only used to handle some exceptions. However, I replaced all boto
-specific exceptions with a generic exception. This is totally a temporary patch till exceptions are properly wrapped and thrown by CloudBridge
.
lib/galaxy/objectstore/cloud.py
Outdated
bucket = self.conn.object_store.create(bucket_name) | ||
log.debug("Using cloud object store with bucket '%s'", bucket.name) | ||
return bucket | ||
except S3ResponseError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not imported anywhere. Also, create
method returns S3CreateError
(http://boto.cloudhackers.com/en/latest/ref/boto.html#module-boto.exception).
More importantly, this will not work cross-provider so having an extra, general except
may be desirable?
Same comment applies to other refs to S3ResponseError
.
(xref CloudVE/cloudbridge#9 and custom/standardized CloudBridge exceptions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, and as explained in the previous comments, as a temporary patch, replaced S3
/boto
-specific exceptions with generic exceptions.
lib/galaxy/objectstore/cloud.py
Outdated
self.secret_key = a_xml.get('secret_key') | ||
b_xml = config_xml.findall('bucket')[0] | ||
self.bucket = b_xml.get('name') | ||
self.use_rr = string_as_bool(b_xml.get('use_reduced_redundancy', "False")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like we're using this option anymore (which is understandable from the multi-cloud perspective), so probably best to remove this line and relevant config.
lib/galaxy/objectstore/cloud.py
Outdated
created_obj = self.bucket.create_object(rel_path) | ||
created_obj.upload(source_file) | ||
else: | ||
self.bucket.get(rel_path).upload(source_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably keep the same code structure as in the if
case (or the other way around, as long as it's consistent).
lib/galaxy/objectstore/cloud.py
Outdated
rel_path) | ||
return True | ||
# FIXME: don't need to differenciate between uploading from a string or file, | ||
# because CloudBridge handles this internally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's there to fix? if
and else
here look the same so shouldn't it just be consolidated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. But I'm postponing function signature change for future, just to keep the changes at this PR at a minimal possible level.
lib/galaxy/objectstore/cloud.py
Outdated
if not os.path.exists(cache_dir): | ||
os.makedirs(cache_dir) | ||
|
||
# Although not really necessary to create S3 folders (because S3 has |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably remove this comment.
if entire_dir and extra_dir: | ||
shutil.rmtree(self._get_cache_path(rel_path)) | ||
results = self.bucket.list(prefix=rel_path) | ||
for key in results: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be better to replace references like this that use AWS terminology to something cloud agnostic? e.g., object
. This applies throughout the file (e.g., _key_exists
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. However, as I mentioned in other comments, I'm trying to keep Cloud
very similar to S3ObjectStore
(following the comments on the PR #4314). Using a general terminology might escalate the challenges :) But we'll surely use more general nomenclature in future updates.
@afgane Thanks for the comments, I updated the code accordingly. Few points:
Correct. This PR sets the ground for User-based ObjectStore (initially discussed at #4314); which is a user-based and provider-agnostic ObjectStore. However, following the comments at #4314, I made I'll comment inline other comments. |
It's generally not great to use the generic There are a couple of tests still failing though. How do the dependencies that are still getting built (i.e., the open PR for CloudBridge wheel) get pulled into the test env? Could that be it? |
Maybe the tests are failing because the current latest version of the |
Unit tests are failing with:
|
Refactor bug reporting into plugins / Remove bug report mako
Small GRT bugfixes
Created alphabetic and numerical sort tool for collection operations
Vcf gz datatype
- They now uniformly handle server option processing. - Reports and Tool Shed now respect many new and important options such as --skip_venv. - Reports and Tool Shed now do extra checking around Python version that Galaxy does. In addition to improving the reports and tool shed startup scripts and reducing cognative load with respect to option handling, this is an important pre-condition to us being able us to switch on uwsgi by default for all three services simultaneously without a bunch of copy and pasting.
Initial outline of allowing launch via uwsgi.
This has been rebased several times and includes a variety of fixes (some thanks to martenson).
- Implement kwalify schema to validate/define the schema. - Replace ini configuration with YAML (completely backward compatible). - Rebuild a sample configuration from the schema (config/tool_shed.yml.sample).
Fix quota function name
replace exclamation with a clock icon
…ctStore # Conflicts: # lib/galaxy/objectstore/cloud.py
don't uppercase envoverride
- Added `Cloud` to objectstore import.
requests via a different PR.
a generic exception--a temporary solution till CloudBridge wraps the exceptions properly.
because CloudBridge internally maps to appropriate functions depending on the input type.
Should use the wheel created at [this PR](galaxyproject/starforge#139).
…into CloudObjectStore
Some thing went awfully wrong with |
The
Cloud
ObjectStore is a provider-agnostic cloud-based storage. In other words, theCloud
ObjectStore can read/write from various cloud-based storages without a provider-specific implementation; it leverages onCloudBridge
which provides a unified access to various providers.This PR sets a base for the User-based ObjectStore we have been working on. Initially, the
Cloud
ObjectStore was part of the User-based ObjectStore (PR #4314), and it is being submitted as a separate PR following the comments on the PR #4314.To simplify the comparison with
S3ObjectStore
, theCloud
ObjectStore provides the very same functionality asS3ObjectStore
, i.e., read/write to AWS S3 buckets. The only difference is:S3ObjectStore
leverages onboto
whileCloud
usesCloudBridge
.pinging @bgruening @nsoranzo and @erasche