-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] pyDKB: storages #277
base: pyDKB-loggable-object
Are you sure you want to change the base?
Conversation
174a9b3
to
d0eb05e
Compare
I presume that the last line of the example should be Side note - I've missed the [WIP] status of the PR, so my apologies if the problems here were already noted and planned to be dealt with later, and I started pointing at them too early. |
Right, it was just a copy-paste mistake, thank you for bringing it to my attention.
It`s Ok, the main idea of creating this PR was exactly to let others see what`s going on and where the development goes. Any comments are welcome, including general ones like "what it is supposed to look like for $this or $that kind of storage?", "suggested way of usage does not look very natural, I`d expect something like $this or $that" etc. |
else: | ||
scope = _scope | ||
if scope is None: | ||
raise exceptions.StorageException("Storages scope not specified") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise exceptions.StorageException("Storages scope not specified") | |
raise exceptions.StorageException("Storage's scope not specified") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not a mistake (or, better to say, it is a different mistake): the message says that scope of storages, from which should be picked one (or maybe "storage lookup scope"?), is not specified. "Storage scope" is more like a scope of a single storage -- which wouldn`t have much sense in this situation.
Will be changed, thank you!
Sometimes we have to interact with the same storage in different scripts. When it happens we got to implement same functionality multiple times: read configuration, check that Python module is available, create client, ... This submodule is a place to implement this once and then reuse whenever it is needed. Initial module structire: ``` pyDKB |---> storages |---> exceptions | |---> StorageException | |---> NotFound | |---> client |---> Client ```
Suggested way of the `pyDKB.storages` usage is: ``` from pyDKB import storages def processA(...): rucio_client = storages.getClient('rucio', 'atlas') rucio_client.get_metadata(...) ... def processB(...): rucio_client = storages.getClient('rucio', 'atlas') rucio_client.get_metadata(...) ... ``` The client used in `processA` and `processB` (and at every call of a function) will be the same instance, initialized at first call. In other words, client classes for individual storages are "singleton" classes (as long as used via `getClient()` methods, not directly). "Scope" abstraction is introduced to keep project-specific fucntionality localised in a single submodule, preventing it from infiltration into other, more general modules. Each "scope" is nothing but a submodule of `pyDKB.storages`. It must contain pre-configured clients for the storages, used in a given project and support general access method to these clients (`getClient()` function).
It allows to get clients simply by name, not specifying the scope every now and again: ``` from pyDKB import storages storages.setScope('atlas') def processA(...): s = storages.getClient('rucio') s.get_metadata(...) ```
…class. The idea of the interface to provide common set of methods to interact with different storages, not to (re-)implement all possible methods of interaction with storages. But different storages may have specific methods, and calling them as `my_client.c.client_method_A()` will look a bit wierd: `my_client.client_method_A()` looks more natural. To avoid re-implementation of all useful methods in this way: ``` def client_methodA(self, *args, **kwargs): self.c.client_method_A(*args, **kwargs) ``` multiple inheritance will be used: ``` class MyClient(pyDKB.storages.client.Client, ParentClientClass): ... ``` In case of `ParentClientClass` having same methods as Client, by default `Client` (interface) method will be used, raising `NotImplementedError`. If `ParentClientClass` method should be used, it is to be specified explicitly.
"id" is a reserved word in Python, so it should not be used as variable or parameter name.
Original Rucio `Client` does not have `get()` method.
Currently it is nothing but a wrapper around standard `elasticsearch.Elasticsearch` class, except that its `get()` method is overridden with `Client.get()` that raises `NotImplementedError`.
It allows to pass configuration to the client as hash. `__init__()` accepts parameters in the form consistent with the `elasticsearch.Elasticsearch()` parameters, so it is not possible to pass default index name when the object is created.
Co-Authored-By: Evildoor <[email protected]>
7b74454
to
a349cb6
Compare
...to rebase PR on the current version of the Update:
First push was a bit of a mess, so to check the real diff please use this link: diff between 7b74454 and 37de231. |
a349cb6
to
37de231
Compare
There's only two possibilities: an exception was or was not thrown. Whatever exception it is, it indicates that we failed to import something we wanted to import => so we must return a value that indicates "import has failed" -- `False`.
Who knows what value has this or that attribute one wants to import, right? And `False` looks like a pretty possible one... while string "NOT IMPORTED VALUE" should be less expected and I believe it won't conflict with real attributes values (except itself, but hopefully no one will use this function to import something added as service variable for this function itself).
Rewrites #244
This PR introduces to pyDKB library new submodule storages.
The module is intended to accumulate all the functionality related to the interaction with the storages we use, including both external and internal ones. It basically provides unified way to interact with any of them: object of
Client
class, that understands basic set of methods, like:get()
request (get item by ID);custom (parametric)
query.Current modules structure is as follows:
Suggested way to use the new functionality is like this:
For storages like ES will be added following submodules:
pyDKB.storages.client.es
-- wrapper for ES client withpyDKB.storages.client.Client
interface;pyDKB.storages.atlas.chicagoES
-- Chicago ES client;pyDKB.storages.atlas.dkb.es
-- DKB ES client.