Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for source and key aliases #367

Merged
merged 10 commits into from
Apr 20, 2023
Merged

Add support for source and key aliases #367

merged 10 commits into from
Apr 20, 2023

Conversation

philsmt
Copy link
Contributor

@philsmt philsmt commented Jan 4, 2023

It's been requested frequently to somehow allow easier access to sources and keys without the need to deal with Karabo device names and their properties. This PR adds support for this via an alias mechanism. As discussed towards the end of last year, it is mostly separate from the existing code using literal sources and keys via an index property run.alias[...] inspired by the likes of pandas and xarray.

Aliases may refer to a source or a source-key or equivalently to a SourceData or a KeyData object. There are no pure key aliases that may be used with different sources. They can be attached to a DataCollection object via two methods:

  • DataCollection.with_aliases(*alias_defs)
  • DataCollection.only_aliases(*alias_defs, strict=False, require_all=False)

They may be passed any number of alias definitions, which may be

  • A dict mapping the alias to a str (for sources) or 2-tuple of str (for source-keys).
  • A str pointing to a JSON, YAML or TOML file with alias definitions. These should define a dict in the same way, with the additional option of using nested dictionaries for multiple keys per source. (See docstring)

With the returned DataCollection, aliases may only be used through the DataCollection.alias property.

run = open_run(proposal=1, run=2).with_aliases('aliases.yaml')
run.alias['my-source']  # Returns SourceData
run.alias['my-source', 'a-key']  # Returns KeyData
run.alias['my-source-key']  # Returns KeyData
run.alias['my-alias-to-a-missing-source']  # Raises SourceNameError
run.alias['my-bogus-alias']  # Raises AliasError
run.alias['my-source-key', 'another-key']  # Raises ValueError
run.alias.select('my-source')  # returns DataCollection
run.alias.deselect('my-source')  # returns DataCollection

Once attached, aliases are handed down to any DataCollection created from this, e.g. by selection. The aliases are merged as well when using DataCollection.union() as long as there is no conflict - in that case a ValueError is raised.

Aliases are shown in DataCollection.info() in tags < > for both sources and keys, even if a source is not shown in detail.

Documentation follows after the initial discussion 😃

@philsmt philsmt added the enhancement New feature or request label Jan 4, 2023
extra_data/reader.py Fixed Show fixed Hide fixed
extra_data/reader.py Outdated Show resolved Hide resolved
extra_data/reader.py Outdated Show resolved Hide resolved
extra_data/reader.py Outdated Show resolved Hide resolved
extra_data/reader.py Outdated Show resolved Hide resolved
@takluyver
Copy link
Member

I've commented on a few specific parts, but overall I think this is in good shape.

@JamesWrigley
Copy link
Member

I wonder if we should have support for loading aliases automatically? The reason is because device names and properties can change so people will need to load different aliases for different proposals; for example MID has gone through a couple iterations of Karabo devices that aggregate properties from multiple devices (usually motors) in a single device, and each time the naming has changed slightly.

The first way I thought of handling this was through instrument libraries (like SCS's toolbox), which could wrap open_run():

from extra_data import open_run as ed_open_run

def open_run(proposal, run_no, **kwargs):
    run = ed_open_run(proposal, run_no, **kwargs)
    
    # Check if an alias file exists at a certain location (usr/extra-data-aliases.yaml?),
    # or load a default list of aliases (e.g. for the XGM etc), or use an alias based on the date of the run.
    return run.with_aliases(foo)

So the user could run e.g.:

import mid

run = mid.open_run(1234, 56)

And then they would have their instrument specific aliases without having to call with_aliases(). But what if we supported loading aliases from a specific file by default? I think standardizing it would be quite convenient since it means you wouldn't have to use instrument-specific libraries to load the right aliases for a proposal.

@takluyver
Copy link
Member

When I discussed this with Philipp, we agreed to get explicit alias support in and then discuss this question of how to make it more convenient.

I'm against automatically loading aliases based on something like the working directory of the notebook, or the current user running it, or anything like that. I don't want to make it easy for people to write notebooks which break (or worse, appear to work but do something different) when they're moved, or when someone else runs them, or whatever. But I could see using the proposal number we pass to open_run to get aliases defined for the proposal.

@philsmt
Copy link
Contributor Author

philsmt commented Jan 17, 2023

Ah, you beat me to replying to James just barely.

Indeed the file support was initially motivated by the idea of loading such files automatically, e.g. based on location of the running code. I ended up leaving it out of this MR for the sake of easier review, but also because I didn't end up liking the idea so much anymore after trying to implement it. There were some cheap ways in terms of runtime cost to do it, but you quickly run into corner cases where it's outright confusing. Next to the point Thomas raised, some people write modules rather than notebook, should it apply here as well?

I like the idea of attaching it to the proposal rather than the file as well, though it implies there's a canonical way to do aliases across a single user group. But one could define a fixed filename in some proposal location (say <proposal>/usr/.extra-data-aliases.yaml) that is searched for when open_run is used.

@JamesWrigley
Copy link
Member

Sorry I guess wasn't clear, I never had anything in mind other than alias files belonging to proposals :)

But one could define a fixed filename in some proposal location (say <proposal>/usr/.extra-data-aliases.yaml) that is searched for when open_run is used.

Yes, that's exactly what I put in my code snippet (though I used the non-hidden usr/extra-data-aliases.yaml, I think that's better than a hidden file). Would that be too much for this PR? (I'm happy to make a follow-up PR if you like)

@takluyver
Copy link
Member

I'd suggest we do that in a follow-up PR, if that's OK with everyone. Then we needn't hold this one up with discussion of what exactly we call the file, whether we prefer one of the available formats over another, and so on. 🙂

@philsmt philsmt force-pushed the feat/alias branch 2 times, most recently from 00735e1 to ce1a731 Compare April 5, 2023 08:45
Copy link
Member

@takluyver takluyver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nitpicks, but otherwise this LGTM

extra_data/reader.py Outdated Show resolved Hide resolved
extra_data/reader.py Outdated Show resolved Hide resolved
extra_data/reader.py Outdated Show resolved Hide resolved
extra_data/tests/test_reader_mockdata.py Outdated Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
@philsmt
Copy link
Contributor Author

philsmt commented Apr 9, 2023

Thanks, I've cleaned up all the fixup commits then, ready to go from my side.

@takluyver
Copy link
Member

Thanks, LGTM

@philsmt
Copy link
Contributor Author

philsmt commented Apr 13, 2023

Would you like to have documentation as part of this MR or add it afterwards? I plan to get it done next week.

@takluyver
Copy link
Member

I'm happy enough for it to come a bit later - even after a release if needed - so long as it doesn't get forgotten.

@philsmt
Copy link
Contributor Author

philsmt commented Apr 17, 2023

I rebased on top of master to fix the merge conflict, feel free to merge.

@takluyver takluyver merged commit 808afae into master Apr 20, 2023
@takluyver takluyver added this to the 1.13 milestone May 8, 2023
@tmichela tmichela deleted the feat/alias branch October 1, 2024 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants