Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow selecting raw/proc data in components layer #468

Merged
merged 12 commits into from
Oct 16, 2024

Conversation

takluyver
Copy link
Member

Preparing for the future when raw & proc data has different names.

You can pass raw=True, False or None. The default is None, which tries to preserve the legacy behaviour of using proc if possible and raw if not.

Opening a draft PR to try the CI.

extra_data/components.py Fixed Show fixed Hide fixed
Comment on lines +1630 to +1814
self.source_to_modno = {s: (m - first_modno + 1)
for (s, m) in self.source_to_modno.items()}

Check warning

Code scanning / CodeQL

Overwriting attribute in super-class or sub-class Warning

Assignment overwrites attribute source_to_modno, which was previously defined in superclass
MultimodDetectorBase
.
# JUNGFRAU modno is expected (e.g. extra_geom) to start with 1.
self.source_to_modno = {s: (m - first_modno + 1)
for (s, m) in self.source_to_modno.items()}
self.modno_to_source = {m: s for (s, m) in self.source_to_modno.items()}

Check warning

Code scanning / CodeQL

Overwriting attribute in super-class or sub-class Warning

Assignment overwrites attribute modno_to_source, which was previously defined in superclass
MultimodDetectorBase
.
@takluyver takluyver mentioned this pull request Mar 5, 2024
@takluyver takluyver force-pushed the feat/components-raw-proc branch from ca4632f to f3e861c Compare June 18, 2024 13:02
@takluyver takluyver force-pushed the feat/components-raw-proc branch 2 times, most recently from fcc832d to ac7240a Compare September 26, 2024 12:58
@takluyver takluyver force-pushed the feat/components-raw-proc branch from ac7240a to 26fa54d Compare September 26, 2024 13:01
@takluyver takluyver marked this pull request as ready for review September 26, 2024 15:31
@TimBerberich
Copy link

TimBerberich commented Oct 4, 2024

I went over the code changes in extra_data/components.py, though I still have to run the tests.
Seems good to me, here are some minor questions:

  • Why was the regex capture group 'detname' (line 123) removed. To me it seemed clearer to call
       detector_names.add(m.group('detname'))
    than (line 234 in current version )
      detector_names.add(m.group(1))
    one could even omit the .group and simply call
       detector_names.add(m['detname'])
    is there a reason to use .group.
  • In the funciton def _find_detector_names(cls, data): (line 226), why do we use the regular expressions for raw and proc data sources instead of only using the capture group for detector names
    re_detname  = re.compile(f'({cls._det_name_pat})')
    by the name of the function it should not care about raw or proc just the detector names that are found.

@TimBerberich
Copy link

Tests all passed. And the major added test test_modern_corr_sources looks good to me.

@takluyver
Copy link
Member Author

Why was the regex capture group 'detname' (line 123) removed.

Good question, I don't know why I did that. I'll put it back.

In the funciton def _find_detector_names(cls, data): (line 226), why do we use the regular expressions for raw and proc data sources instead of only using the capture group for detector names

In this code we only care about the detector data sources - there might be other sources related to detector control, power supply, data processing etc. in the run, which might introduce extra detector names. We don't want to throw an error saying you have to pick from these 3 names if 2 of them won't work.

@takluyver takluyver force-pushed the feat/components-raw-proc branch from c813001 to 6d15a45 Compare October 15, 2024 13:51
Copy link
Contributor

@philsmt philsmt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM! This looks really neat.

source_to_modno = dict(cls._source_matches(data, pat))
if not all(cls._data_is_raw(data, s) for s in source_to_modno):
# Older corrected data used the same names as raw
raise Exception("Raw data was requested, but only proc data was found")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, but why no ValueError?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I sometimes write Exception as a placeholder when I haven't figured out what the right type is.

In this scenario, what do you think about using SourceNameError? If we had had distinct source names from the beginning I think this would be a clear choice, but it could be somewhat confusing if the relevant source names are there, just pointing to the wrong thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following discussion, I used SourceNameError when you want corrected data and only raw is there, because it can point to the new corrected source name, which will definitely be absent.

But when you want raw data and the .../DET/... source names point to corrected data, it seems confusing to say those sources are missing, so in that case I raise ValueError. Hopefully the disparity won't matter too much - the error message is probably the important thing in either case.

@takluyver
Copy link
Member Author

prnote:
Detector data classes have a new raw= parameter. In newer runs, both corrected and raw detector data can be accessed (depending on how you use open_run). Pass raw=False or True to specify which kind of data you want. The default behaviour is to use corrected data if available, and raw data if not, for compatibility with existing code. Older runs use the same source names for raw & corrected data, so only one kind will be accessible.

@takluyver takluyver added this to the 1.19 milestone Oct 16, 2024
@takluyver takluyver merged commit 479ea37 into master Oct 16, 2024
8 checks passed
@takluyver takluyver deleted the feat/components-raw-proc branch October 16, 2024 07:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants