Add concept of legacy source names #527
Just to note, as part of the name change, we'll also want #468 (or something like it) so that the components layer can pick between corrected & raw data when both are 'visible' under different names.
extra_data/reader.py
    if sd.is_legacy:
        warn(f"{source} is a legacy name for {self.legacy_sources[source]}. "
             f"Access via this name will be removed at a future data.",
nitpicking comment
-             f"Access via this name will be removed at a future data.",
+             "Access via this name will be removed at a future data.",
Also, I assume you don't mean "at a future date" here, but "for future data"?
I did originally mean date, but actually referencing future data is a good idea. We're not going to remove the legacy name from existing files, just stop adding it. Thanks!
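As a rough illustration of the warning discussed in this thread, here is a minimal standalone sketch. It is not the code from `extra_data/reader.py`: the helper name `resolve_source` and the example source names are hypothetical, and the message uses the "future data" wording agreed on above. Only the first string needs the `f` prefix, since the second has no placeholders.

```python
import warnings


def resolve_source(source, legacy_sources):
    """Return the current name for `source`, warning if it is a legacy name.

    `legacy_sources` is assumed to map legacy name -> current name
    (hypothetical helper, for illustration only).
    """
    target = legacy_sources.get(source)
    if target is None:
        # Not a legacy name, nothing to warn about.
        return source

    warnings.warn(
        f"{source} is a legacy name for {target}. "
        "Access via this name will be removed for future data.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return target
```

Note that Python hides `DeprecationWarning` by default unless it is triggered by code in `__main__`, so users calling this through a library may need `-W default` or a warnings filter to see it.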
50fc57d to 2682eba
This PR is now fully ready for review; I added tests for the new APIs. For this purpose,
    # Get FileAccess for first module.
    fa = sorted(RunDirectory(mock_modern_spb_proc_run).files,
                key=lambda fa: fa.filename)[0]
`min()` also takes a `key=` argument to do things like this. I'm not particularly requesting a change, I just remembered a neat feature.
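For reference, a small sketch of the equivalence mentioned here, using a hypothetical `FakeFile` stand-in and made-up filenames rather than real `FileAccess` objects:

```python
from collections import namedtuple

# Hypothetical stand-in for FileAccess; only the `filename` attribute matters.
FakeFile = namedtuple("FakeFile", ["filename"])

files = [
    FakeFile("CORR-R0001-AGIPD02-S00000.h5"),
    FakeFile("CORR-R0001-AGIPD00-S00000.h5"),
    FakeFile("CORR-R0001-AGIPD01-S00000.h5"),
]

# Sorting the whole list just to take the first element...
first_via_sorted = sorted(files, key=lambda fa: fa.filename)[0]

# ...picks the same element as min() with the same key, without building
# a sorted copy of the list.
first_via_min = min(files, key=lambda fa: fa.filename)
```

Both return the first minimal element in iteration order when several items share the same key, so they agree even with duplicate filenames.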
Thank you, LGTM
Just to make sure, we all agree to invalidate existing run files maps?
I'm OK with that. I don't want to invalidate that cache too frequently - that defeats the point of caching - but once in a while I don't think it's a big deal. If we wanted to get clever, we could say we invalidate it only for proc data, since it doesn't seem likely that the raw data will ever have links like this. But I suspect it's not worth the extra complexity to manage that, as the lower levels of EXtra-data don't have a raw/proc distinction so far.
For the record, I did some benchmarking, picking random files across all proposals to avoid caching. Looping over all INSTRUMENT sources, probing whether a single source is a soft link takes between 500 µs and 1 ms, or 3.5 ms on average per file.
The calibration team wants to discontinue the practice of writing corrected data under the same source name as raw data, and instead switch the type part of the source name from `DET` to `CORR`. For limited backwards compatibility, we aim to insert soft links under the old source name in corrected files. While this would work transparently in EXtra-data, some more support for this concept seems prudent, which I'd like to start with this MR.

It introduces the concept of legacy sources throughout the `FileAccess`, `DataCollection` and `SourceData` APIs. For now I limited this solely to tracking, i.e. it has no impact on any business logic when accessing files or data:

- `FileAccess` will probe the source's leaf object for being a soft link and record its target. The source is counted as a regular source otherwise. For performance reasons, I limited this to `INSTRUMENT` sources for now (simple benchmarking suggests tens of ms for this operation when cold).
- This changes `run_files_map` and will invalidate any existing cache.
- `SourceData` objects know whether they're a legacy source through a non-`None` value of `SourceData.legacy`, and `DataCollection` has a `legacy_sources: dict` property.
- When accessing the `SourceData` object of a legacy source via a `DataCollection`, a `DeprecationWarning` is emitted.
- `DataCollection.info()` tracks legacy sources separately in their own section alongside their target.

For now it does not yet touch `data='all'`, which will still kick out the raw data. I would add tests once we're happy with the design.

As part of some earlier tests, I also added support for multiple XTDF detectors in a single `DataCollection`. I'm happy to remove this, but it seems useful to have for the future (e.g. AGIPD1M and AGIPD4M in a single run).

@takluyver @tmichela @dgoeries
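
To make the naming and tracking described in this PR a bit more concrete, here is a pure-Python sketch that does not touch HDF5 or the EXtra-data API. It assumes source names have the form `domain/type/member` and that the probing step yields a mapping from each source name to its soft-link target (or `None` for a regular source). The function names and the example source name are hypothetical.

```python
def corrected_name(source):
    """Swap the type part of a source name from DET to CORR.

    Assumes names look like 'domain/type/member' (illustrative only).
    """
    domain, type_part, member = source.split("/", 2)
    if type_part != "DET":
        raise ValueError(f"{source!r} does not have a DET type part")
    return "/".join([domain, "CORR", member])


def split_sources(link_targets):
    """Separate regular sources from legacy ones.

    `link_targets` maps source name -> soft-link target, or None when the
    source is a regular (non-link) source.
    """
    regular = set()
    legacy = {}  # legacy name -> name it points to
    for name, target in link_targets.items():
        if target is None:
            regular.add(name)
        else:
            legacy[name] = target
    return regular, legacy


old = "SPB_DET_AGIPD1M-1/DET/0CH0:xtdf"  # example name, for illustration
new = corrected_name(old)
regular, legacy = split_sources({new: None, old: new})
```

Here `legacy` plays the role of a `legacy_sources`-style dict: each key is an old name that only exists as a link, and each value is the source it resolves to.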