User-based ObjectStore #4314
2. Added user to cloud create signature.
2. Added a function in dataset to retrieve a list of all the users its shared with.
… of the functions. - ObjectStore.Cloud calls for the configuration of a connection to the cloud-based storage providers at its first call (this has to be extended to other possible call paths).
…sed objectstore. - Replaced dataset.filename with dataset.get_file_name for what is needed for a user-based objectstore.
- previously was `Job` - now it's HDA
- previously was `Job` - now it's HDA
- added a table for it, and extended HDA table accordingly. - created migration scripts. - propagated PluggedMedia to the ObjectStore - added temporary code for defining a pluggedMedia (once a user is created), and using it (once a dataset is being uploaded). This temporary code has to be replaced with appropriate functions.
…the result of job execution.
…ocess. 2. Fixed some bugs in Objectstore.cloud.
…stances of objectstore could run. 2. cleaned-up cloud config file parser. 3. fixed a bug with contruct_path function.
…tore # Conflicts: # doc/source/lib/galaxy.objectstore.rst # doc/source/slideshow/architecture/images/objectstore.plantuml.svg # lib/galaxy/jobs/__init__.py
lib/galaxy/jobs/__init__.py
@@ -902,10 +902,16 @@ def get_special( ):
def _create_working_directory( self ):
    job = self.get_job()
    try:
        # TEMP BLOCK --- START
        pluggedMedia = None
        for pM in job.user.pluggedMedia:
This `for` loop just to set a variable is a bit awkward. I think it'd be better to add a method to the `User` class in `lib/galaxy/model/__init__.py` that defines this.

    def get_current_plugged_media(self):
        return None if not len(self.plugged_media) else self.plugged_media[-1]

and then access like:

    job.user.get_current_plugged_media()

It is really unfortunate that `plugged_media` is used as both a list and a single instance. I'd consider a name that has a more obvious, distinct singular vs. plural modality (e.g. `plugged_volume` vs. `plugged_volumes`) or use `plugged_medium` for the single instance you are fetching.
As marked, this part of the code is temporary. There will be UI elements asking the user which `plugged_media` to use, or allowing the user to set a default `plugged_media` in preferences. Until then, we're assuming that the user has only one `plugged_media`, and we're retrieving it this way.

Regarding naming: indeed, `job.user.plugged_media` is a relation to the `plugged_media` table.
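The naming suggestion in this thread can be sketched roughly as follows. This is a minimal illustration, not Galaxy's actual model code: the `User` class here is a hypothetical stand-in, and `current_plugged_medium` is an assumed name chosen to keep the singular accessor distinct from the plural `plugged_media` relation. It returns the last entry, as in the reviewer's snippet.

```python
class User:
    """Hypothetical stand-in for Galaxy's User model (illustration only)."""

    def __init__(self, plugged_media=None):
        # Plural: every medium the user has defined, in insertion order.
        self.plugged_media = list(plugged_media or [])

    @property
    def current_plugged_medium(self):
        """Singular: the most recently added medium, or None if there is none."""
        return self.plugged_media[-1] if self.plugged_media else None


user = User(["s3-bucket-a", "azure-blob-b"])
print(user.current_plugged_medium)    # azure-blob-b
print(User().current_plugged_medium)  # None
```

With an accessor like this, the calling code would not need a loop just to set a variable, and the empty case is handled in one place.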
""" | ||
Object store that stores objects as items in a Swift bucket. A local
cache exists that is used as an intermediate location for files between
Galaxy and Swift.
"""

def _configure_connection(self):
    # TODO: Replace with Cloudbridge connection.
    log.debug("Configuring Swift Connection")
Can you mark this PR as a WIP until this TODO is addressed?
If it is already addressed in the super class - can you just delete the code and the comment?
This is exciting functionality and a great direction to move in - thanks for the contribution! That said, I'd really like to see some large structural changes to this PR - in particular I'd like to see this as 3 different PRs so that I can comment on the pieces individually. The three PRs I'd like to see are:
All that said - I'm not a -1 or anything - this isn't something I'm intending to force on you. I'm just letting you know how I would approach it and how I would feel more comfortable reviewing it. Good luck!
@bgruening thanks for the reference. Indeed, having a section in user preferences where the user can define PluggedMedia (e.g., AWS S3, Azure Blob, etc.) is required to improve the usability of the user-based objectstore. Additionally, that section should allow the user to define hierarchical relations between his/her various plugged media (similar to object_store_config).
+1 on @jmchilton's suggestions. These will help in scoping/modularizing/understanding this excellent work and generating digestible PRs.
Are you suggesting to close this PR, separate cloudbridge, and create a new PR when the user-based ObjectStore is fully functional?
I didn't want to jump in on this PR right away since I feel like we've talked about this in person quite a bit, but I agree pretty much completely with what @jmchilton suggests, which closely echoes what I feel like we've talked about already. Small logical units of incremental change are vastly preferable and are much easier to understand and review/approve, or suggest improvements for. I like the three-PR breakdown John suggests -- one stylistic, with no substantial changes, for if you need/want to fix style in places prior to work. Then the cloudbridge part on existing components. Finally, new PluggableMedia changes. I'll also echo, as I've mentioned several times in person, that we should not just throw away S3ObjectStore without a dead-simple automated upgrade path where existing configurations will continue to work (this is general practice for anything changing in Galaxy configuration). The easiest solution here is probably to, instead of renaming S3ObjectStore, just leave it as-is for now and add the CloudObjectStore separately.
I also don't understand the reason for removing it. If we just leave it we have backward compatibility without needing to write any conversion scripts or whatever, which is what we usually strive for.
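To make the backward-compatibility point concrete, here is a rough sketch of leaving the existing store in place and registering the new one beside it. Everything in it is an assumption for illustration: the registry `OBJECT_STORE_TYPES`, the helper `build_object_store`, the type names `"s3"` and `"cloud"`, and the empty class bodies are hypothetical and do not reflect how Galaxy actually dispatches on object store type.

```python
class ObjectStore:
    """Hypothetical minimal base class (illustration only)."""

    def __init__(self, config):
        self.config = config


class S3ObjectStore(ObjectStore):
    """Stands in for the existing boto-based store, left untouched."""


class CloudObjectStore(ObjectStore):
    """Stands in for the new CloudBridge-based store, added under a new name."""


# Hypothetical registry: the old type name keeps resolving to the old class,
# so existing configurations continue to work without a conversion script.
OBJECT_STORE_TYPES = {
    "s3": S3ObjectStore,
    "cloud": CloudObjectStore,
}


def build_object_store(store_type, config):
    """Instantiate the store class registered for store_type."""
    try:
        store_class = OBJECT_STORE_TYPES[store_type]
    except KeyError:
        raise ValueError("Unknown object store type: %r" % store_type)
    return store_class(config)


legacy = build_object_store("s3", {"bucket": "existing-bucket"})
new = build_object_store("cloud", {"provider": "aws", "bucket": "other-bucket"})
print(type(legacy).__name__, type(new).__name__)  # S3ObjectStore CloudObjectStore
```

The design choice being illustrated is purely additive: nothing that already names the old type has to change.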
What do you think about the user-based objectstore concept this PR is introducing?
Thanks @VJalili, I'm closing this one then.
@nsoranzo , The other PRs are focused on
The |
I supposed you wanted to open a clean PR after the others are merged, but I can surely reopen this; I'll mark it WIP in the meantime.
@nsoranzo Thanks.
This PR is in accordance with the Federated user-based ObjectStore goal. The updates introduced in this PR are a mock-up of the Federated user-based ObjectStore, and can generally be grouped as:

- User-based ObjectStore. The current ObjectStore reads its configuration (e.g., provider, bucket name, access and secret keys) from `config/object_store_conf.xml` and applies it Galaxy instance-wide. In other words, the data uploaded, created, or processed by any user is written to or read from a common bucket to which the Galaxy instance has full access (and which is therefore implicitly accessible by all the users of the Galaxy instance). This PR introduces a user-based objectstore, where each user defines his/her own `PluggedMedia` (e.g., AWS S3 bucket(s), Azure Blob, etc.), and a `dataset` is read from or written to that `PluggedMedia`.
- Provider-agnostic ObjectStore. This PR replaces the `S3ObjectStore` module with the `cloud` module. The former uses `boto` and provides access to AWS S3 only, while the latter leverages `CloudBridge`, which provides an interface for unified access to various providers (including S3 and Swift).

What is expected in future related PRs:

- In this PR, only the `cloud` module is user-based; other modules such as `DistributedObjectStore` are not (a user should be able to define how to distribute his/her own data on the `PluggedMedia`s he/she has defined using a user-based `DistributedObjectStore`).
- Some UI elements will be defined to allow the user to (a) define a `PluggedMedia`, and (b) choose a `PluggedMedia` when uploading/creating a new dataset. In this PR, to define a `PluggedMedia`, one has to manually add it to the database (`PluggedMedia` table). Additionally, the code has two temporary code snippets (marked with `TEMP BLOCK`) which choose the first available `PluggedMedia` of the user.
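As a rough illustration of the difference between the instance-wide and the user-based approach described above, the following sketch picks a storage target per user. It is not the PR's implementation: the dataclasses, the `INSTANCE_STORE` constant, and `resolve_store` are hypothetical names, and the fallback to an instance-wide store for users without a `PluggedMedia` is an assumption made only for this example. Taking the first entry mirrors the temporary "first available" behavior mentioned in the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PluggedMedia:
    """Hypothetical record of one user-defined storage target."""
    provider: str
    bucket: str


@dataclass
class User:
    """Hypothetical stand-in for a Galaxy user (illustration only)."""
    name: str
    plugged_media: List[PluggedMedia] = field(default_factory=list)


# Instance-wide target, analogous to what a single shared configuration
# file would describe (assumed values).
INSTANCE_STORE = PluggedMedia(provider="aws", bucket="galaxy-shared")


def resolve_store(user: Optional[User]) -> PluggedMedia:
    """Return the user's first PluggedMedia, else the instance-wide store.

    The fallback is an assumption of this sketch, not confirmed by the PR.
    """
    if user is not None and user.plugged_media:
        return user.plugged_media[0]
    return INSTANCE_STORE


alice = User("alice", [PluggedMedia("aws", "alice-private")])
bob = User("bob")
print(resolve_store(alice).bucket)  # alice-private
print(resolve_store(bob).bucket)    # galaxy-shared
```

In this sketch, only Alice's datasets would land in a bucket she controls; a user without any `PluggedMedia` would keep the shared, instance-wide behavior.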