Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] pyDKB: storages #277

Draft
wants to merge 20 commits into
base: pyDKB-loggable-object
Choose a base branch
from
Draft

Conversation

mgolosova
Copy link
Collaborator

@mgolosova mgolosova commented Jul 10, 2019

Rewrites #244

This PR introduces to pyDKB library new submodule storages.

The module is intended to accumulate all the functionality related to the interaction with the storages we use, including both external and internal ones. It basically provides unified way to interact with any of them: object of Client class, that understands basic set of methods, like:

  • apply configuration (create client object(s));
  • send get() request (get item by ID);
  • send custom (parametric) query.

Current modules structure is as follows:

pyDKB
  |---> storages
          |---> exceptions // Exceptions definitions
          |
          |---> client
          |        |---> Client  // Interface class
          |
          |---> atlas
                   |---> rucio  // Interface implementation for ATLAS-specific (external) storage

Suggested way to use the new functionality is like this:

import pyDKB.storages as s

s.setScope('atlas')
rucio = s.getClient('rucio')

mdata = rucio.get('data18_13TeV.00351223.physics_Main.merge.AOD.f937_m197',
                  fields=['events', 'bytes', 'deleted'])

print mdata

For storages like ES will be added following submodules:

  • pyDKB.storages.client.es -- wrapper for ES client with pyDKB.storages.client.Client interface;
  • pyDKB.storages.atlas.chicagoES -- Chicago ES client;
  • pyDKB.storages.atlas.dkb.es -- DKB ES client.

@Evildoor
Copy link
Contributor

Suggested way to use the new functionality is like this: ...

I presume that the last line of the example should be print mdata instead of print mdata2 - no special functionality for changing names like that was implemented here, right?

Side note - I've missed the [WIP] status of the PR, so my apologies if the problems here were already noted and planned to be dealt with later, and I started pointing at them too early.

@mgolosova
Copy link
Collaborator Author

mgolosova commented Jul 11, 2019

@Evildoor,

Suggested way to use the new functionality is like this: ...

I presume that the last line of the example should be print mdata instead of print mdata2 - no special functionality for changing names like that was implemented here, right?

Right, it was just a copy-paste mistake, thank you for bringing it to my attention.
But I wonder now if there can possibly be such a functionality, that would change name of a variable outside the function... :)

I've missed the [WIP] status of the PR, so my apologies if the problems here were already noted and planned to be dealt with later, and I started pointing at them too early.

It`s Ok, the main idea of creating this PR was exactly to let others see what`s going on and where the development goes. Any comments are welcome, including general ones like "what it is supposed to look like for $this or $that kind of storage?", "suggested way of usage does not look very natural, I`d expect something like $this or $that" etc.
I am looking at this from "within" so can make very questionable decisions, and it will be very useful to see these questions now, before the development gets to its end and it becomes difficult to change some basic things.

else:
scope = _scope
if scope is None:
raise exceptions.StorageException("Storages scope not specified")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise exceptions.StorageException("Storages scope not specified")
raise exceptions.StorageException("Storage's scope not specified")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a mistake (or, better to say, it is a different mistake): the message says that scope of storages, from which should be picked one (or maybe "storage lookup scope"?), is not specified. "Storage scope" is more like a scope of a single storage -- which wouldn`t have much sense in this situation.

Will be changed, thank you!

mgolosova and others added 16 commits January 9, 2020 14:31
Sometimes we have to interact with the same storage in different
scripts. When it happens we got to implement same functionality multiple
times: read configuration, check that Python module is available, create
client, ...

This submodule is a place to implement this once and then reuse whenever
it is needed.

Initial module structire:

```
pyDKB
  |---> storages
          |---> exceptions
          |       |---> StorageException
          |       |---> NotFound
          |
          |---> client
                  |---> Client
```
Suggested way of the `pyDKB.storages` usage is:
```
from pyDKB import storages

def processA(...):
    rucio_client = storages.getClient('rucio', 'atlas')
    rucio_client.get_metadata(...)
    ...

def processB(...):
    rucio_client = storages.getClient('rucio', 'atlas')
    rucio_client.get_metadata(...)
    ...
```

The client used in `processA` and `processB` (and at every call of a
function) will be the same instance, initialized at first call. In other
words, client classes for individual storages are "singleton" classes
(as long as used via `getClient()` methods, not directly).

"Scope" abstraction is introduced to keep project-specific fucntionality
localised in a single submodule, preventing it from infiltration into
other, more general modules. Each "scope" is nothing but a submodule of
`pyDKB.storages`. It must contain pre-configured clients for the storages,
used in a given project and support general access method to these
clients (`getClient()` function).
It allows to get clients simply by name, not specifying the scope every
now and again:
```
from pyDKB import storages
storages.setScope('atlas')

def processA(...):
    s = storages.getClient('rucio')
    s.get_metadata(...)
```
…class.

The idea of the interface to provide common set of methods to interact
with different storages, not to (re-)implement all possible methods of
interaction with storages. But different storages may have specific
methods, and calling them as `my_client.c.client_method_A()` will look a
bit wierd: `my_client.client_method_A()` looks more natural.

To avoid re-implementation of all useful methods in this way:
```
  def client_methodA(self, *args, **kwargs):
    self.c.client_method_A(*args, **kwargs)
```

multiple inheritance will be used:
```
  class MyClient(pyDKB.storages.client.Client, ParentClientClass):
    ...
```

In case of `ParentClientClass` having same methods as Client, by default
`Client` (interface) method will be used, raising `NotImplementedError`.
If `ParentClientClass` method should be used, it is to be specified
explicitly.
"id" is a reserved word in Python, so it should not be used as variable
or parameter name.
Original Rucio `Client` does not have `get()` method.
Currently it is nothing but a wrapper around standard
`elasticsearch.Elasticsearch` class, except that its `get()` method is
overridden with `Client.get()` that raises `NotImplementedError`.
It allows to pass configuration to the client as hash.
`__init__()` accepts parameters in the form consistent with the
`elasticsearch.Elasticsearch()` parameters, so it is not possible to
pass default index name when the object is created.
@mgolosova
Copy link
Collaborator Author

mgolosova commented Jan 9, 2020

@mgolosova force-pushed the pyDKB-storages-2 branch from 7b74454 to a349cb6

...to rebase PR on the current version of the pyDKB (where LoggableObject is already merged).


Update:

@mgolosova force-pushed the pyDKB-storages-2 branch from a349cb6 to 37de231

First push was a bit of a mess, so to check the real diff please use this link: diff between 7b74454 and 37de231.

There's only two possibilities: an exception was or was not thrown.
Whatever exception it is, it indicates that we failed to import
something we wanted to import => so we must return a value that
indicates "import has failed" -- `False`.
Who knows what value has this or that attribute one wants to import,
right? And `False` looks like a pretty possible one... while string "NOT
IMPORTED VALUE" should be less expected and I believe it won't conflict
with real attributes values (except itself, but hopefully no one will
use this function to import something added as service variable for this
function itself).
@mgolosova mgolosova marked this pull request as draft June 2, 2020 11:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants