Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cloud ObjectStore #4352

Closed
wants to merge 740 commits into from
Closed

Cloud ObjectStore #4352

wants to merge 740 commits into from

Conversation

VJalili
Copy link
Member

@VJalili VJalili commented Jul 27, 2017

The Cloud ObjectStore is a provider-agnostic cloud-based storage. In other words, the Cloud ObjectStore can read/write from various cloud-based storages without a provider-specific implementation; it leverages on CloudBridge which provides a unified access to various providers.

This PR sets a base for the User-based ObjectStore we have been working on. Initially, the Cloud ObjectStore was part of the User-based ObjectStore (PR #4314), and it is being submitted as a separate PR following the comments on the PR #4314.

To simplify the comparison with S3ObjectStore, the Cloud ObjectStore provides the very same functionality as S3ObjectStore, i.e., read/write to AWS S3 buckets. The only difference is: S3ObjectStore leverages on boto while Cloud uses CloudBridge.

pinging @bgruening @nsoranzo and @erasche

@VJalili VJalili mentioned this pull request Jul 27, 2017
@galaxybot galaxybot added this to the 17.09 milestone Jul 27, 2017
import boto

from boto.exception import S3ResponseError
from boto.s3.key import Key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linter doesn't like unused imports (this line and the following), but you can ignore this by doing

from boto.s3.key import Key  # noqa: F401
from boto.s3.connection import S3Connection  # noqa: F401

Although I'm not sure in which scenario it would help to import unused dependencies ...
If external code needs it, it should probably import the dependencies where they are needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mvdbeek Thanks, the imports are updated accordingly.
Actually, boto import is temporary solution for handling AWS S3 exceptions until cloudbridge implements such exceptions for all the providers.

Copy link
Contributor

@afgane afgane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. Clearly, this is still AWS-only even if it implements the CloudBridge interface but I guess that's just the first step.
There are a few comment lines and debug print statements that were also in the previous implementation. I think those can be removed now. Finally, I made a few comments inline, some of which repeat themselves in multiple places but the comment should apply to whole the file.

@@ -13,6 +13,7 @@ graphitesend
azure-storage==0.32.0
# PyRods not in PyPI
python-ldap==2.4.27
cloudbridge==0.3.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to go with the latest current release (0.3.2)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No specific reason, I updated to the current latest version (also see starforge PR #138).

from ..objectstore import convert_bytes, ObjectStore
from cloudbridge.cloud.factory import CloudProviderFactory, ProviderList

try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably, this is for backward compatibility? Either way, it'd be good to have a comment why is this necessary in addition to the CloudBridge import.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boto was only used to handle some exceptions. However, I replaced all boto-specific exceptions with a generic exception. This is totally a temporary patch till exceptions are properly wrapped and thrown by CloudBridge.

bucket = self.conn.object_store.create(bucket_name)
log.debug("Using cloud object store with bucket '%s'", bucket.name)
return bucket
except S3ResponseError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not imported anywhere. Also, create method returns S3CreateError (http://boto.cloudhackers.com/en/latest/ref/boto.html#module-boto.exception).
More importantly, this will not work cross-provider so having an extra, general except may be desirable?
Same comment applies to other refs to S3ResponseError.
(xref CloudVE/cloudbridge#9 and custom/standardized CloudBridge exceptions).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, and as explained in the previous comments, as a temporary patch, replaced S3/boto-specific exceptions with generic exceptions.

self.secret_key = a_xml.get('secret_key')
b_xml = config_xml.findall('bucket')[0]
self.bucket = b_xml.get('name')
self.use_rr = string_as_bool(b_xml.get('use_reduced_redundancy', "False"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like we're using this option anymore (which is understandable from the multi-cloud perspective), so probably best to remove this line and relevant config.

created_obj = self.bucket.create_object(rel_path)
created_obj.upload(source_file)
else:
self.bucket.get(rel_path).upload(source_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably keep the same code structure as in the if case (or the other way around, as long as it's consistent).

rel_path)
return True
# FIXME: don't need to differenciate between uploading from a string or file,
# because CloudBridge handles this internally.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's there to fix? if and else here look the same so shouldn't it just be consolidated?

Copy link
Member Author

@VJalili VJalili Aug 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. But I'm postponing function signature change for future, just to keep the changes at this PR at a minimal possible level.

if not os.path.exists(cache_dir):
os.makedirs(cache_dir)

# Although not really necessary to create S3 folders (because S3 has
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably remove this comment.

if entire_dir and extra_dir:
shutil.rmtree(self._get_cache_path(rel_path))
results = self.bucket.list(prefix=rel_path)
for key in results:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be better to replace references like this that use AWS terminology to something cloud agnostic? e.g., object. This applies throughout the file (e.g., _key_exists).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. However, as I mentioned in other comments, I'm trying to keep Cloud very similar to S3ObjectStore (following the comments on the PR #4314). Using a general terminology might escalate the challenges :) But we'll surely use more general nomenclature in future updates.

@VJalili
Copy link
Member Author

VJalili commented Aug 2, 2017

@afgane Thanks for the comments, I updated the code accordingly. Few points:

Clearly, this is still AWS-only even if it implements the CloudBridge interface but I guess that's just the first step.

Correct. This PR sets the ground for User-based ObjectStore (initially discussed at #4314); which is a user-based and provider-agnostic ObjectStore. However, following the comments at #4314, I made Cloud with least possible changes to S3ObjectStore to (a) simplify comparison, (b) incremental and smaller impacts on the Galaxy codebase. Accordingly, you may see lots of S3-specific terminology and comments. I think we could be more provider-agnostic in future updates.

I'll comment inline other comments.

@afgane
Copy link
Contributor

afgane commented Aug 2, 2017

It's generally not great to use the generic Exception if it can be helped but given this is an intermediary step, I think it's ok.

There are a couple of tests still failing though. How do the dependencies that are still getting built (i.e., the open PR for CloudBridge wheel) get pulled into the test env? Could that be it?

@VJalili
Copy link
Member Author

VJalili commented Aug 2, 2017

Maybe the tests are failing because the current latest version of the CloudBridge is not built yet (galaxyproject/starforge#138)

@nsoranzo
Copy link
Member

nsoranzo commented Aug 7, 2017

Unit tests are failing with:

File "/home/travis/build/galaxyproject/galaxy/lib/galaxy/objectstore/cloud.py", line 25, in <module>

    from cloudbridge.cloud.factory import CloudProviderFactory, ProviderList

ImportError: No module named cloudbridge.cloud.factory

hexylena and others added 21 commits August 10, 2017 17:53
Refactor bug reporting into plugins / Remove bug report mako
Created alphabetic and numerical sort tool for collection operations
- They now uniformly handle server option processing.
- Reports and Tool Shed now respect many new and important options such as --skip_venv.
- Reports and Tool Shed now do extra checking around Python version that Galaxy does.

In addition to improving the reports and tool shed startup scripts and reducing cognative load with respect to option handling, this is an important pre-condition to us being able us to switch on uwsgi by default for all three services simultaneously without a bunch of copy and pasting.
Initial outline of allowing launch via uwsgi.
This has been rebased several times and includes a variety of fixes (some thanks to martenson).
 - Implement kwalify schema to validate/define the schema.
 - Replace ini configuration with YAML (completely backward compatible).
 - Rebuild a sample configuration from the schema (config/tool_shed.yml.sample).
guerler and others added 25 commits August 21, 2017 08:28
…ctStore

# Conflicts:
#	lib/galaxy/objectstore/cloud.py
a generic exception--a temporary solution till CloudBridge wraps the
exceptions properly.
because CloudBridge internally maps to appropriate functions depending
on the input type.
Should use the wheel created at [this PR](galaxyproject/starforge#139).
@VJalili
Copy link
Member Author

VJalili commented Aug 23, 2017

Some thing went awfully wrong with rebase. Please follow up #4487

@VJalili VJalili closed this Aug 23, 2017
@VJalili VJalili deleted the CloudObjectStore branch August 23, 2017 19:26
@VJalili VJalili mentioned this pull request Aug 28, 2017
@VJalili
Copy link
Member Author

VJalili commented Aug 28, 2017

@nuwang @afgane due to an issue with this branch due to a white-space removal rebase, please follow up on a new PR #4487

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.