Replace "inheritance" with "summarization" principle #65

Open
yarikoptic opened this issue Mar 19, 2024 · 25 comments
Labels: inheritance; modularity (Issues affecting modularity and composition of BIDS datasets)

@yarikoptic
Contributor

yarikoptic commented Mar 19, 2024

This is the next step following the discussion in

On a recent road trip with @effigies we briefly discussed it and so far have not seen a showstopper, but it would need more minds to analyze.

ATM, one of the problems with the inheritance principle is its unclear semantics when a value is modified down the hierarchy: the order can be unclear when there are multiple "candidate" files, it is unclear how to "remove" a value, etc.
And overall it is cumbersome for a human to "gather" the final value, since for a file down the hierarchy one needs to go through all possibly inherited files to arrive at it. But what if we take my suggestion in the aforementioned issue further:

  • retain the ability to "chain" candidates for metadata from higher to lower levels, as in the current inheritance principle
  • completely disallow overloading the value at lower (deeper in hierarchy) levels. Corollaries:
    • if present at different levels (e.g. the entire dataset and then a specific sidecar .json), a value must be identical/consistent across all levels of inheritance, or otherwise not given at any higher level
    • if a particular subject/session has a value different from the others as defined at a higher (dataset) level, we need to remove that value from the higher level and define it at the lower (e.g. subject/session) level

It would then be a (now doable) job for the validator to ensure that all metadata duplicated across levels (if any) is consistent.
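A validator check along these lines could be sketched as follows. This is a minimal, hypothetical illustration (the name `find_conflicts` is made up, not part of the BIDS validator). It assumes the sidecars applicable to one data file have already been loaded as dicts, ordered from the dataset root down to the data file:

```python
from typing import Any


def find_conflicts(levels: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Report keys whose values differ between inheritance levels.

    `levels` holds the parsed sidecars applicable to one data file,
    ordered from the dataset root down to the data file's own sidecar.
    Keys defined at a single level can never conflict.
    """
    seen: dict[str, list[Any]] = {}
    for level in levels:
        for key, value in level.items():
            seen.setdefault(key, []).append(value)
    # Under "summarization", every duplicated key must carry one value.
    return {
        key: values
        for key, values in seen.items()
        if any(v != values[0] for v in values[1:])
    }


top = {"RepetitionTime": 2.0, "FlipAngle": 90}
subject = {"FlipAngle": 90, "EchoTime": 0.03}
run = {"FlipAngle": 77}
print(find_conflicts([top, subject, run]))  # {'FlipAngle': [90, 90, 77]}
```

Note that duplicates which agree (FlipAngle at the dataset and subject levels) are fine on their own. Only the run-level 77 turns them into a reported conflict.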

As a result, we would give users the convenience that looking at a top-level metadata file provides "guaranteed" correct metadata across all subjects/sessions. That is not the case currently, since a value can be changed lower down following the order of inheritance.

  • FWIW, we already do something like that in heudiconv, where top-level task-*_bold.json files collate all values identical across subjects/sessions -- this makes it easy to see what is common (e.g. scanner ID etc.)
  • Conceptually, this is what we already have in BIDS, e.g. participants.tsv summarizes metadata across participants, and we expect it to be consistent with any other phenotypic information found in subjects/sessions.
    • Hence I think it also relates to BEP036 (Phenotypic Data Guidelines), attn @surchs @ericearl (I just now created the @bids-standard/bep036 team), where the idea revolves around being able to "segregate" metadata into the subject/session level while keeping it consistent at the top level (under the phenotype/ folder).
  • It would somewhat ease the composition proposed in #59 ("Allow composition of a BIDS dataset (dataset level) from smaller (subj or subj/ses) level"). Again -- metadata present at a higher level would remain consistent with the lower levels, which would be easier to achieve (copy) and ensure (validator).

Attn @Lestropie, as he has spent the most time improving the Inheritance principle definition, and @dorahermes, who is an active proponent and user of it: do you think such "simplification" (removal of "value overload") of inheritance would simplify and remain usable? Or maybe I am missing some common use case that such an additional "restriction" would disallow?

I think it might be worth writing a checker and applying it across all OpenNeuro datasets to see whether we run into such data "overloads". What tool/functionality already implements the inheritance principle "closest to the bible", e.g. one that would return a list of lists of .json/.tsv files in their "inherited" bundles? (specific code examples would be welcome)
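This does not answer which existing tool is closest to the spec. To make the "list of lists" concrete, though, here is a hypothetical, simplified sketch. It assumes a flat list of dataset-relative paths and implements only the core rule: same suffix and extension, sidecar entities a subset of the data file's entities, and sidecar located in the data file's directory or one of its parents. Other details of the specification are ignored, and the names `parse_name` and `inheritance_bundle` are made up here:

```python
from pathlib import PurePosixPath


def parse_name(name: str) -> tuple[dict[str, str], str, str]:
    """Split a BIDS-like filename into (entities, suffix, extension)."""
    stem, _, ext = name.partition(".")
    *pairs, suffix = stem.split("_")
    # Parts without a "-" (e.g. "dataset" in dataset_description) are not entities.
    entities = dict(p.split("-", 1) for p in pairs if "-" in p)
    return entities, suffix, "." + ext


def inheritance_bundle(data_file: str, all_files: list[str],
                       sidecar_ext: str = ".json") -> list[list[str]]:
    """Applicable sidecars for `data_file`, one list per directory level.

    Levels run from the dataset root down to the data file's directory.
    """
    data = PurePosixPath(data_file)
    data_entities, data_suffix, _ = parse_name(data.name)
    parts = data.parent.parts
    levels = [PurePosixPath(*parts[:i]) for i in range(len(parts) + 1)]
    bundle: list[list[str]] = []
    for level in levels:
        matches = []
        for candidate in sorted(all_files):
            path = PurePosixPath(candidate)
            if path.parent != level or path == data:
                continue
            entities, suffix, ext = parse_name(path.name)
            if (ext == sidecar_ext and suffix == data_suffix
                    and entities.items() <= data_entities.items()):
                matches.append(candidate)
        bundle.append(matches)
    return bundle


files = [
    "dataset_description.json",
    "task-rest_bold.json",
    "sub-01/sub-01_task-rest_bold.json",
    "sub-01/func/sub-01_task-rest_run-1_bold.nii.gz",
    "sub-01/func/sub-01_task-rest_run-1_bold.json",
    "sub-01/func/sub-01_task-rest_run-2_bold.json",
    "sub-02/func/sub-02_task-rest_run-1_bold.json",
]
for level in inheritance_bundle("sub-01/func/sub-01_task-rest_run-1_bold.nii.gz", files):
    print(level)
```

Running a checker over OpenNeuro would then amount to computing such a bundle for every data file and comparing the values of keys that appear at more than one level.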

Edit:

@yarikoptic yarikoptic moved this to Punted in BIDS 2.0 Mar 19, 2024
@yarikoptic yarikoptic removed the status in BIDS 2.0 Mar 19, 2024
@ericearl

@yarikoptic Thanks for this thoughtful issue and for the @bids-standard/bep036 team. In a month or so I can check how inheritance is currently being used in OpenNeuro, thanks to DataLad's `datalad clone ///openneuro` ability, of course!

I mostly like the idea above, except maybe I'm confused about one thing. Let's say up top there's a task-rest_bold.json with most of the parameters put out by dcm2niix. Then down below, in 20 subjects' func directories, there's a disagreement in FlipAngle or EchoTime or both between these 20 and a separate 5 subjects' task-rest bold JSONs. I believe you're saying that EVERY subject's JSON in this scenario has to have FlipAngle and EchoTime in it though only 5 subjects differ. dcm2niix doesn't care that you shouldn't duplicate fields at lower levels. So you end up with a need to filter most of your JSONs in the subjects func folders for common metadata... This might be difficult for many new users especially.

I think whether we move forward with the inheritance principle OR the summarization principle, the call is for tools to support it. If one small set of tools could be created to support either, whichever one makes it to the gate first could be most easily adopted. This is why creating a set of inheritance software tools has been on my BIDS maintainer wish list for a long time. Thoughts?

@yarikoptic
Contributor Author

Let's say up top there's a task-rest_bold.json with most of the parameters put out by dcm2niix.

To be precise: it is not dcm2niix that places/creates such a file at the top level. It is the BIDS dataset "owner" who decides to take all or some fields from the dcm2niix-produced sidecar .json for a specific .nii.gz and place that selection at the top of the dataset. So it is up to some script/user to decide which fields to copy to that file; dcm2niix is not really a "player" here.

I believe you're saying that EVERY subject's JSON in this scenario has to have FlipAngle and EchoTime in it though only 5 subjects differ.

Correct. Even if a single subject differs, such metadata should not be present at any level where it is not common to all levels below. Possible solutions:

  • remove that subject/session or scan, as it is inconsistent with the other scans for the task
  • if such "divergent" acquisitions across subjects are deemed permissible, keep such metadata only at the subject/session level (e.g. you could have task-rest_bold.json at the top level with everything common, sub-X/task-rest_bold.json with subject-specific values, and sub-X/ses-Y/task-rest_bold.json with subject/session-specific values, but a field must not be present with different values across levels).

So you end up with a need to filter most of your JSONs in the subjects func folders for common metadata... This might be difficult for many new users especially.

No -- new users just should not bother creating a top-level task-rest_bold.json with anything that is not common to all files underneath, and they would stay "BIDS compliant". Then they could make use of some tools (e.g. here is a function in heudiconv: populate_aggregated_jsons) to collect/rewrite top-level .json files with only the common metadata, or just write what they know is common (the validator would verify that no conflicting/differing values are present).
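The collect/rewrite step could look roughly like the sketch below. This is not heudiconv's `populate_aggregated_jsons`, just a hypothetical illustration of the idea: only fields present with identical values in every sidecar go to the top level, and everything else stays in the individual sidecars. The usage reuses the 20 + 5 subjects scenario from above, with made-up FlipAngle values:

```python
from typing import Any


def summarize(sidecars: list[dict[str, Any]]) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Split sidecars into (top-level summary, per-file remainders).

    A field goes into the summary only if every sidecar has it with the
    same value; all other fields stay with the individual files.
    """
    if not sidecars:
        return {}, []
    first, *rest = sidecars
    common = {
        key: value
        for key, value in first.items()
        if all(key in other and other[key] == value for other in rest)
    }
    remainders = [
        {key: value for key, value in sidecar.items() if key not in common}
        for sidecar in sidecars
    ]
    return common, remainders


# 20 subjects acquired with one flip angle, 5 with another (illustrative values).
sidecars = ([{"RepetitionTime": 2.0, "FlipAngle": 90}] * 20
            + [{"RepetitionTime": 2.0, "FlipAngle": 77}] * 5)
common, remainders = summarize(sidecars)
print(common)          # {'RepetitionTime': 2.0}
print(remainders[0])   # {'FlipAngle': 90}
print(remainders[-1])  # {'FlipAngle': 77}
```

In that scenario FlipAngle indeed stays in all 25 subject-level sidecars. The filtering is done by a tool, though, not by hand.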

Re support -- correct, tool support would be needed... BUT if I see it right, the "summarization" principle is just a more restricted case of inheritance, so in principle any tool supporting the current inheritance should work with "summarization" without any change.

A line of thought on the .tsv + .json duality...

The .tsv files we have are pretty much a case of summarization (placing values in a tabular structure within a single file) for entries where metadata could differ (e.g. age for a subject)... i.e. in participants.tsv we summarize commonalities and differences between participants. Overall we get {entities}.tsv files summarizing a (flat list of) metadata fields that typically (but not necessarily) differ between values of that entity. In {entity}-{value}_{suffix}.json files we provide what is common for that {value} (paired with the datatype {suffix}), typically when we do not have {entity}-{value}/ folder-level separation (related #54), since otherwise we would place common data and metadata under that folder.
Overall, a "gather metadata for {entity} of {value}" algorithm should load metadata from {entities}.tsv and from all applicable {entity}-{value}*.json files. Any inconsistency in values makes the "order of loading" important and thus possibly ambiguous. Inconsistencies also make it mandatory to read all the files to get the ultimate value. In the case proposed here, by contrast, the first loaded value is "good enough" since they all must be the same: the age of a participant from participants.tsv should be consistent with any other age loaded from e.g. somewhere in phenotype/ (shhh about multiple sessions etc...)
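Below is a hypothetical sketch of that "gather" step for one participant. It assumes the metadata comes from participants.tsv and from a made-up phenotype/ TSV, both given as strings for brevity. Under summarization a reader can stop at the first source that defines a key. Consistency becomes something the validator checks once, not something every reader has to resolve:

```python
import csv
import io


def tsv_row(tsv_text: str, id_column: str, id_value: str) -> dict[str, str]:
    """Return the row whose `id_column` equals `id_value` (minus that column)."""
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row[id_column] == id_value:
            return {k: v for k, v in row.items() if k != id_column}
    return {}


def first_value(key: str, sources: list[dict[str, str]]) -> str:
    """Under summarization the first source defining `key` is authoritative."""
    for source in sources:
        if key in source:
            return source[key]
    raise KeyError(key)


def is_consistent(key: str, sources: list[dict[str, str]]) -> bool:
    """The validator-side check: all sources that define `key` agree."""
    return len({source[key] for source in sources if key in source}) <= 1


participants_tsv = "participant_id\tage\tsex\nsub-01\t25\tF\nsub-02\t31\tM\n"
phenotype_tsv = "participant_id\tage\nsub-01\t25\nsub-02\t31\n"  # made-up phenotype/ file
sources = [
    tsv_row(participants_tsv, "participant_id", "sub-01"),
    tsv_row(phenotype_tsv, "participant_id", "sub-01"),
]
print(first_value("age", sources))    # 25
print(is_consistent("age", sources))  # True
```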

@Lestropie

completely disallow overloading the value at lower (deeper in hierarchy) levels.

I'm a fan.
This would:

  • Simplify the description of the inheritance principle itself
  • Simplify software that wants to read all relevant sidecar data; it could just load all relevant JSONs from any directories, in any order, into a single dictionary
  • Be permissive of having multiple applicable sidecar files within a single directory level, which I would very much like to be able to utilise in BEP016
  • Be more faithful to the prospect of creating a piece of software that would analyse a BIDS dataset, identify sidecar data that is consistent and therefore promote it up the inheritance tree, thereby deferring utilisation of the inheritance principle entirely to software.
    (It sounds like you've already got a limited instance of this in heudiconv; I'd like for there to be something that does this across the board)
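To illustrate the second point, here is a minimal sketch of reading sidecars "in any order into a single dictionary". The function is hypothetical and not taken from any existing tool. Because a duplicated key must carry the same value, every permutation of the inputs yields the same result, and an overload surfaces as an error instead of being resolved by a precedence rule:

```python
import itertools
from typing import Any


def merge_any_order(sidecars: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge all applicable sidecars into one dict, in whatever order given.

    With overloading disallowed, a key seen twice must have the same
    value, so the order of loading cannot change the result.
    """
    merged: dict[str, Any] = {}
    for sidecar in sidecars:
        for key, value in sidecar.items():
            if key in merged and merged[key] != value:
                raise ValueError(f"conflicting values for {key!r}: "
                                 f"{merged[key]!r} vs {value!r}")
            merged[key] = value
    return merged


sidecars = [
    {"TaskName": "rest", "RepetitionTime": 2.0},  # dataset level
    {"RepetitionTime": 2.0, "EchoTime": 0.03},    # subject level
    {"SliceTiming": [0.0, 1.0]},                  # run level
]
results = [merge_any_order(list(p)) for p in itertools.permutations(sidecars)]
print(all(r == results[0] for r in results))  # True
```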

do you think such "simplification" (removal of "value overload") of inheritance would simplify and remain usable?

I would perhaps pose a different question. There's a bifurcation in opinions on the inheritance principle. I've personally been pushing for making it more powerful, which required improvement to the definition of current behaviour in order to facilitate the subsequent augmentation. Others would prefer that the whole principle disappear entirely, and all metadata relevant to a data file be present in the sidecar file.

The way I would therefore look at this proposal is: if the capacity for value overloading (specifically a present value at a higher level being overridden at a lower level) were to be removed, would this sway those previously opposed to the inheritance principle toward its preservation? So that's actually a question directed not at me but at others.

@yarikoptic
Contributor Author

@Lestropie, a little birdie told me that you might be participating in the BIDS hackathon (if only virtually)? Would you be interested in working on this one? It can already be done as a PR against

similar to the WiP I just started

@Lestropie

I'm not so much aiming to participate in the hackathon and looking for a project; rather, I want to take on automating the use of inheritance and see the hackathon as a potential way to motivate starting the project and to get other people on board. I want to write it up as a proposal somewhere, but wasn't quite sure where would be best: it's not yet guaranteed that I'll be able to do the Hackathon, and what I have in mind is also not specific to BIDS 2.0. Maybe I should create an empty repository and start listing issues there.

@Lestropie

See https://github.com/Lestropie/IP-me/issues for my current intentions on the topic.

@marcelzwiers

I know it looks like the general consensus is that we should keep or improve the inheritance/summarization principle. However, I have yet to encounter a single dataset in which I found any use for this principle, while I have encountered several datasets in which it caused headaches and hard maintenance work, and led to ugly/hacky codebases. If it were up to me, I would throw out the whole principle and always store the complete metadata with the data. It costs next to nothing in terms of disk space and I think it would make everybody's life easier. TL;DR: choose the KISS principle, not the inheritance principle

@marcelzwiers

The way I see it, the inheritance principle comes down to implementing a poor man's relational database at the filesystem level

@yarikoptic
Contributor Author

@marcelzwiers the entire BIDS is "RDB on the filesystem level", so it is not surprising that pybids caches the parsed structure in a local SQL DB ;-)

Re the inheritance principle -- it is in heavy use everywhere, e.g. 20% of OpenNeuro datasets use it for top-level `*task-*_events.json` files:
$> for d in ds*; do ls -ld $d/*events.json 2>/dev/null | head -n1; done | nl 
     1	-rw-r----- 1 yoh datalad 204 Apr 27  2020 ds000006/task-livingnonlivingdecisionwithplainormirrorreversedtext_events.json
     2	-rw------- 1 yoh datalad 128 Dec  2  2022 ds000031/events.json
     3	-rw-r----- 1 yoh datalad 284 Dec  4  2018 ds000164/task-stroop_events.json
     4	-rw-r----- 1 yoh datalad 596 Dec  4  2018 ds000214/task-Cyberball_events.json
     5	-rw-r----- 1 yoh datalad 1879 Apr 27  2020 ds000217/task-picturetest_events.json
     6	-rw-r----- 1 yoh datalad 857 Apr 27  2020 ds000223/task-mag_events.json
     7	-rw-r----- 1 yoh datalad 738 Apr 27  2020 ds000249/task-genInstrAv_events.json
     8	-rw-r--r-- 1 yoh datalad 1993 Aug 11  2020 ds001415/task-maplistening_events.json
     9	-rw-r----- 1 yoh datalad 1193 Jan 25  2019 ds001499/task-5000scenes_events.json
    10	-rw-r----- 1 yoh datalad 76 Dec  5  2018 ds001553/task-checkerboard_events.json
    11	-rw-r----- 1 yoh datalad 316 Aug 13  2019 ds001590/task-loc_events.json
    12	-rw-r----- 1 yoh datalad 869 Dec  5  2018 ds001597/task-cuedMFM_events.json
    13	-rw-r----- 1 yoh datalad 567 Aug 20  2019 ds001608/task-viewclips_events.json
    14	-rw-r----- 1 yoh datalad 528 Mar 18  2019 ds001740/task-convers_events.json
    15	-rw-r----- 1 yoh datalad 231 Aug 12  2019 ds001771/task-identification_events.json
    16	-rw-r----- 1 yoh datalad 739 Feb 26  2021 ds001785/task-adapt_events.json
    17	-rw-r----- 1 yoh datalad 969 Mar  4  2021 ds001787/task-meditation_events.json
    18	-rw-r----- 1 yoh datalad 1860 Aug 14  2019 ds001810/task-attentionalblink_events.json
    19	-rw-r----- 1 yoh datalad 2641 Feb 25  2021 ds001814/task-ARC_events.json
    20	-rw-r----- 1 yoh datalad 340 Feb 25  2021 ds001838/task-Adaptation_events.json
    21	-rw-r----- 1 yoh datalad 903 Aug 20  2019 ds001840/task-viewclips_events.json
    22	-rw-r----- 1 yoh datalad 1594 Jan 31  2022 ds001848/task-ParallelAdaptation_events.json
    23	-rw-r----- 1 yoh datalad 1274 Aug 14  2019 ds001894/task-AANonWord_events.json
    24	-rw-r----- 1 yoh datalad 2831 Aug 19  2019 ds001971/task-AudioCueWalkingStudy_events.json
    25	-rw-r----- 1 yoh datalad 3110 Aug 20  2019 ds002011/task-Overlap_events.json
    26	-rw-r----- 1 yoh datalad  410 Aug 20  2019 ds002013/task-CircCon_events.json
    27	-rw-r----- 1 yoh datalad 428 Aug 20  2019 ds002033/task-ChangeDetection_events.json
    28	-rw-r----- 1 yoh datalad 1671 Dec  3  2019 ds002041/task-TD_events.json
    29	-rw-r----- 1 yoh datalad 340 Feb 25  2021 ds002116/task-Adaptation_events.json
    30	-rw-r----- 1 yoh datalad 761 Feb 25  2021 ds002158/task-main_events.json
    31	-rw-r----- 1 yoh datalad 649 Dec  3  2019 ds002185/task-odors_events.json
    32	-rw-r----- 1 yoh datalad 604 Dec  3  2019 ds002218/task-Experiment_events.json
    33	-rw-r----- 1 yoh datalad 1242 Apr 27  2020 ds002236/task-AudRhyme_events.json
    34	-rw-r--r-- 1 yoh datalad 229 Jul 16  2020 ds002351/task-LDT_events.json
    35	-rw-r----- 1 yoh datalad 925 Apr 27  2020 ds002366/task-emoregRun1_events.json
    36	-rw-r----- 1 yoh datalad 924 Apr 27  2020 ds002411/task-ProgramCategorization_events.json
    37	-rw-r----- 1 yoh datalad 798 Apr 27  2020 ds002419/task-taste1_events.json
    38	-rw-r----- 1 yoh datalad 1400 May  4  2021 ds002424/task-SLD_events.json
    39	-rw-r----- 1 yoh datalad 193 Apr 27  2020 ds002522/task-CRF_events.json
    40	-rw------- 1 yoh datalad 1360 Dec  2  2022 ds002578/events.json
    41	-rw-r----- 1 yoh datalad 1068 Apr 25  2022 ds002603/task-wm_events.json
    42	-rw------- 1 yoh datalad 925 Apr 29 10:53 ds002620/task-emoregRun1_events.json
    43	-rw-r----- 1 yoh datalad 123 Feb 25  2021 ds002634/task-ArtVoc_events.json
    44	-rw-r----- 1 yoh datalad  696 Feb 25  2021 ds002647/task-IHG_events.json
    45	-rw------- 1 yoh datalad 974 Nov 28  2023 ds002680/events.json
    46	-rw-r----- 1 yoh datalad 1381 Apr 27  2020 ds002687/task-SLD_events.json
    47	-rw------- 1 yoh datalad 349 Nov 28  2023 ds002691/events.json
    48	-rw-r----- 1 yoh datalad 3151 Apr 25  2022 ds002718/task-FaceRecognition_events.json
    49	-rw-r----- 1 yoh datalad 1596 Jan 31  2022 ds002738/task-reward_events.json
    50	-rw------- 1 yoh datalad  977 Jan 19  2023 ds002761/task-loc_events.json
    51	-rw-r----- 1 yoh datalad 772 May 13  2020 ds002776/task-motorseq_events.json
    52	-rw-r----- 1 yoh datalad  423 Mar  4  2021 ds002785/task-anticipation_acq-seq_events.json
    53	-rw-r----- 1 yoh datalad 2658 Mar  4  2021 ds002790/task-emomatching_acq-seq_events.json
    54	-rw-r----- 1 yoh datalad 1224 Feb 25  2021 ds002813/task-fintest_events.json
    55	-rw-r----- 1 yoh datalad 1898 Feb 25  2021 ds002835/task-prospection_events.json
    56	-rw-r----- 1 yoh datalad 455 Jun 18  2021 ds002843/task-itc_events.json
    57	-rw-r--r-- 1 yoh datalad 489 Jun 11  2020 ds002872/task-illusion_events.json
    58	-rw-r--r-- 1 yoh datalad 1274 Jun  9  2020 ds002879/task-AANonWord_events.json
    59	-rw-r--r-- 1 yoh datalad 2613 Jun  9  2020 ds002886/task-Syllogisms_events.json
    60	-rw------- 1 yoh datalad 4504 Nov 28  2023 ds002893/task-AuditoryVisualShift_events.json
    61	-rw-r--r-- 1 yoh datalad  804 Jun 17  2020 ds002894/task-languagelocalizer_events.json
    62	-rw-r--r-- 1 yoh datalad  804 Jul 21  2020 ds002905/task-languagelocalizer_events.json
    63	-rw-r--r-- 1 yoh datalad 1295 Jun 29  2020 ds002941/task-Mult_events.json
    64	-rw------- 1 yoh datalad 1127 Dec 19  2022 ds002989/task-DDbid_events.json
    65	-rw-r--r-- 1 yoh datalad 1082 Jul  8  2020 ds002995/task-tastemap_events.json
    66	-rw-r--r-- 1 yoh datalad 1295 Aug 14  2020 ds003028/task-Mult_events.json
    67	-rw------- 1 yoh datalad 2979 Aug 22  2022 ds003061/task-P300_events.json
    68	-rw-r--r-- 1 yoh datalad 2613 Aug 14  2020 ds003076/task-Syllogisms_events.json
    69	-rw-r--r-- 1 yoh datalad 1295 Sep  2  2020 ds003083/task-Mult_events.json
    70	-rw-r----- 1 yoh datalad 66 Oct 22  2020 ds003136/task-affect_events.json
    71	-rw-r----- 1 yoh datalad 1194 Oct 23  2020 ds003242/task-CIC_events.json
    72	-rw-r----- 1 yoh datalad 1045 Jan 31  2022 ds003340/task-foodpicture_events.json
    73	-rw-r----- 1 yoh datalad 324 Oct 23  2020 ds003342/task-grasp_events.json
    74	-rw-r----- 1 yoh datalad 333 Mar  4  2021 ds003436/task-anim_events.json
    75	-rw-r----- 1 yoh datalad 949 Feb 25  2021 ds003454/task-rapm_events.json
    76	-rw-r----- 1 yoh datalad 1625 Feb 25  2021 ds003459/task-audortho_events.json
    77	-rw-r----- 1 yoh datalad 613 May  4  2021 ds003487/task-PIT_events.json
    78	-rw-r----- 1 yoh datalad 2661 Feb 25  2021 ds003495/task-emomatching_acq-seq_events.json
    79	-rw-r----- 1 yoh datalad 1091 Jan 18  2022 ds003499/task-freq1_events.json
    80	-rw-r----- 1 yoh datalad 2561 Feb 25  2021 ds003500/task-Conj19Sel_events.json
    81	-rw-r----- 1 yoh datalad 2052 Mar  8  2021 ds003511/task-Recall_events.json
    82	-rw-r----- 1 yoh datalad 2724 Jul 22  2021 ds003550/task-RepMem1_events.json
    83	-rw-r----- 1 yoh datalad 1647 Jul 22  2021 ds003553/task-FacesHousesTE27_events.json
    84	-rw-r----- 1 yoh datalad 2283 Jul 22  2021 ds003554/task-RepYo1_events.json
    85	-rw-r--r-- 1 yoh datalad 229 Mar 18  2021 ds003569/task-LDT_events.json
    86	-rw-r----- 1 yoh datalad 816 May  4  2021 ds003574/task-game_run-1_events.json
    87	-rw-r----- 1 yoh datalad 2624 May  4  2021 ds003604/task-Gram_events.json
    88	-rw-r----- 1 yoh datalad 8689 Jun 18  2021 ds003645/task-FacePerception_events.json
    89	-rw------- 1 yoh datalad 1659 Apr 29 10:54 ds003684/task-dsp_events.json
    90	-rw-r----- 1 yoh datalad 2371 Jul 22  2021 ds003703/task-listeningToSpeech_events.json
    91	-rw-r----- 1 yoh datalad 1772 Jul 22  2021 ds003708/task-ccep_events.json
    92	-rw-r----- 1 yoh datalad 801 Jul 22  2021 ds003711/events.json
    93	-rw-r----- 1 yoh datalad  487 Jan 18  2022 ds003721/task-BI_events.json
    94	-rw-r----- 1 yoh datalad 812 Jul 22  2021 ds003722/task-MIvsRest_events.json
    95	-rw-r----- 1 yoh datalad 2036 Jan 18  2022 ds003758/task-beads_events.json
    96	-rw-r----- 1 yoh datalad 1171 Jan 18  2022 ds003772/task-changepoint_events.json
    97	-rw-r----- 1 yoh datalad 812 Jan 18  2022 ds003810/task-MIvsRest_events.json
    98	-rw-r----- 1 yoh datalad 605 Jan 18  2022 ds003812/events.json
    99	-rw------- 1 yoh datalad 1287 Aug 22  2022 ds003823/task-emotionRegulation_events.json
   100	-rw-r----- 1 yoh datalad 1678 Jan 18  2022 ds003825/task-rsvp_events.json
   101	-rw------- 1 yoh datalad 1318 Apr 29 10:53 ds003834/task-fam1back_events.json
   102	-rw------- 1 yoh datalad 703 Apr 29 10:53 ds003835/events.json
   103	-rw-r----- 1 yoh datalad 3346 Jan 18  2022 ds003846/task-PredError_events.json
   104	-rw------- 1 yoh datalad 1290 Apr 29 10:54 ds003851/task-train_events.json
   105	-rw-r----- 1 yoh datalad 3530 Jan 18  2022 ds003858/task-MID_events.json
   106	-rw-r----- 1 yoh datalad 2425 Apr 25  2022 ds003965/task-face_events.json
   107	-rw-r----- 1 yoh datalad 4770 Jan 31  2022 ds004010/task-MultisensoryDetectionTask_events.json
   108	-rw------- 1 yoh datalad 520 Jan  4 14:22 ds004012/task-auditorystimuli_events.json
   109	-rw-r----- 1 yoh datalad 894 Apr 25  2022 ds004018/task-rsvp_events.json
   110	-rw-r----- 1 yoh datalad 793 Apr 25  2022 ds004073/task-PD_events.json
   111	-rw------- 1 yoh datalad 2134 May 25  2023 ds004080/events.json
   112	-rw------- 1 yoh datalad 853 Aug 22  2022 ds004086/task-RecogConf_events.json
   113	-rw-r----- 1 yoh datalad 3473 Apr 25  2022 ds004091/task-AttendFixGazeCenterFS_events.json
   114	-rw------- 1 yoh datalad 1928 Aug 22  2022 ds004094/task-induct_events.json
   115	-rw------- 1 yoh datalad 6295 Aug 22  2022 ds004105/task-DriveRandomSound_events.json
   116	-rw------- 1 yoh datalad 5092 Aug 22  2022 ds004106/task-GuardDuty_events.json
   117	-rw------- 1 yoh datalad 3934 Aug 22  2022 ds004117/task-WorkingMemory_events.json
   118	-rw------- 1 yoh datalad 5568 Aug 22  2022 ds004118/task-Drive_events.json
   119	-rw------- 1 yoh datalad 4126 Aug 22  2022 ds004119/task-GuardDuty_events.json
   120	-rw------- 1 yoh datalad 6013 Aug 22  2022 ds004120/task-DriveWithSpeedChange_events.json
   121	-rw------- 1 yoh datalad 9492 Aug 22  2022 ds004121/task-DriveWithTaskAudio_events.json
   122	-rw------- 1 yoh datalad 4475 Aug 22  2022 ds004122/task-Drive_events.json
   123	-rw------- 1 yoh datalad 9928 Aug 22  2022 ds004123/task-DriveWithComplexity_events.json
   124	-rw------- 1 yoh datalad 766 Aug 22  2022 ds004128/task-DG_events.json
   125	-rw------- 1 yoh datalad 1227 Jan 19  2023 ds004192/task-things_events.json
   126	-rw------- 1 yoh datalad 1302 Aug 22  2022 ds004194/task-prf_events.json
   127	-rw------- 1 yoh datalad 2543 Aug 22  2022 ds004200/task-temporalscaling_events.json
   128	-rw------- 1 yoh datalad 1089 May 26  2023 ds004212/task-main_events.json
   129	-rw------- 1 yoh datalad 692 Dec  2  2022 ds004228/task-piper_events.json
   130	-rw------- 1 yoh datalad 3473 May 25  2023 ds004271/task-AttendFixGazeCenterFS_events.json
   131	-rw------- 1 yoh datalad 2242 Dec 19  2022 ds004295/task-task_events.json
   132	-rw------- 1 yoh datalad 526 Dec 19  2022 ds004302/task-speech_events.json
   133	-rw------- 1 yoh datalad 1045 May 25  2023 ds004312/task-foodpicture_events.json
   134	-rw------- 1 yoh datalad 461 Aug  9  2023 ds004341/task-semenc_events.json
   135	-rw------- 1 yoh datalad 1584 Dec 19  2022 ds004349/task-expo_events.json
   136	-rw------- 1 yoh datalad 2904 Dec 19  2022 ds004350/task-LG_events.json
   137	-rw------- 1 yoh datalad 2114 Dec 19  2022 ds004356/task-MusicvsSpeech_events.json
   138	-rw------- 1 yoh datalad 2051 May 25  2023 ds004357/task-rsvp_events.json
   139	-rw------- 1 yoh datalad 4665 Dec 19  2022 ds004362/task-motion_events.json
   140	-rw------- 1 yoh datalad 1385 Dec 19  2022 ds004367/task-rdk_events.json
   141	-rw------- 1 yoh datalad 2632 Dec 19  2022 ds004368/task-task_events.json
   142	-rw------- 1 yoh datalad  977 May 25  2023 ds004398/task-loc_events.json
   143	-rw------- 1 yoh datalad 406 May 26  2023 ds004400/events.json
   144	-rw------- 1 yoh datalad 563 May 25  2023 ds004444/task-smrbmi_events.json
   145	-rw------- 1 yoh datalad 563 May 25  2023 ds004446/task-smrbmi_events.json
   146	-rw------- 1 yoh datalad 563 May 25  2023 ds004447/task-smrbmi_events.json
   147	-rw------- 1 yoh datalad 563 May 25  2023 ds004448/task-smrbmi_events.json
   148	-rw------- 1 yoh datalad 1772 May 25  2023 ds004457/task-ccep_events.json
   149	-rw------- 1 yoh datalad 424 May 25  2023 ds004460/task-Rotation_events.json
   150	-rw------- 1 yoh datalad 2 Aug  9  2023 ds004475/task-task_events.json
   151	-rw------- 1 yoh datalad 110 May 25  2023 ds004488/task-action_events.json
   152	-rw------- 1 yoh datalad 243 May 25  2023 ds004496/task-imagenet_events.json
   153	-rw------- 1 yoh datalad 2 May 25  2023 ds004519/task-ProAntiCue_events.json
   154	-rw------- 1 yoh datalad 2 May 25  2023 ds004520/task-Retrocue_events.json
   155	-rw------- 1 yoh datalad 2 May 25  2023 ds004521/task-Postcues_events.json
   156	-rw------- 1 yoh datalad 15794 May 25  2023 ds004532/task-PST_events.json
   157	-rw------- 1 yoh datalad 1385 May 25  2023 ds004554/task-picturenaming_events.json
   158	-rw------- 1 yoh datalad 676 May 25  2023 ds004556/task-feedback_events.json
   159	-rw------- 1 yoh datalad 676 May 25  2023 ds004557/task-feedback_events.json
   160	-rw------- 1 yoh datalad 1129 Sep 23  2023 ds004562/task-adaptation_events.json
   161	-rw------- 1 yoh datalad 1854 Aug  9  2023 ds004563/task-touchdecoding_events.json
   162	-rw------- 1 yoh datalad 523 May 25  2023 ds004574/task-Oddball_events.json
   163	-rw------- 1 yoh datalad 740 May 25  2023 ds004575/task-IntervalTiming_events.json
   164	-rw------- 1 yoh datalad 740 May 25  2023 ds004579/task-IntervalTiming_events.json
   165	-rw------- 1 yoh datalad 479 May 25  2023 ds004580/task-Simon_events.json
   166	-rw------- 1 yoh datalad 2 Jun 27  2023 ds004584/task-Rest_events.json
   167	-rw------- 1 yoh datalad 2 Jun  1  2023 ds004588/task-unnamed_events.json
   168	-rw------- 1 yoh datalad 1312 Jun 27  2023 ds004592/task-gradCPTface_events.json
   169	-rw------- 1 yoh datalad 684 Aug  9  2023 ds004602/task-ERNPsychometrics_events.json
   170	-rw------- 1 yoh datalad 687 Jun 27  2023 ds004606/task-msit_events.json
   171	-rw------- 1 yoh datalad 840 Jun 27  2023 ds004609/task-msit_events.json
   172	-rw------- 1 yoh datalad 840 Nov 28  2023 ds004621/task-msit_events.json
   173	-rw------- 1 yoh datalad 937 Apr 29 10:55 ds004625/task-UnevenTerrain_events.json
   174	-rw------- 1 yoh datalad 4579 Aug  9  2023 ds004626/task-DotProbe_events.json
   175	-rw------- 1 yoh datalad 1541 Aug  9  2023 ds004635/task-resting_events.json
   176	-rw------- 1 yoh datalad 2736 Nov 28  2023 ds004636/task-ANT_events.json
   177	-rw------- 1 yoh datalad 45169 Nov 28  2023 ds004657/task-Drive_events.json
   178	-rw------- 1 yoh datalad 7484 Nov 28  2023 ds004660/task-P300_events.json
   179	-rw------- 1 yoh datalad 11772 Nov 28  2023 ds004661/task-nback_events.json
   180	-rw------- 1 yoh datalad 3180 Sep 23  2023 ds004692/task-study_events.json
   181	-rw------- 1 yoh datalad 1664 Jan  4 14:20 ds004724/task-antisaccade_events.json
   182	-rw------- 1 yoh datalad 1929 Sep 23  2023 ds004745/task-unnamed_events.json
   183	-rw------- 1 yoh datalad 706 Nov 28  2023 ds004746/task-paingen_events.json
   184	-rw------- 1 yoh datalad 412 Nov 28  2023 ds004771/task-PY_events.json
   185	-rw------- 1 yoh datalad 734 Jan  4 14:21 ds004784/task-phantom_events.json
   186	-rw------- 1 yoh datalad 1385 Nov 28  2023 ds004785/task-unnamed_events.json
   187	-rw------- 1 yoh datalad 687 Apr 29 10:57 ds004796/task-msit_events.json
   188	-rw------- 1 yoh datalad 3900 Nov 28  2023 ds004802/task-roddball_events.json
   189	-rw------- 1 yoh datalad 1287 Nov 28  2023 ds004816/task-rsvp_events.json
   190	-rw------- 1 yoh datalad 1287 Nov 28  2023 ds004817/task-rsvp_events.json
   191	-rw------- 1 yoh datalad 26908 Nov 28  2023 ds004841/task-DriveOnMission_events.json
   192	-rw------- 1 yoh datalad 28251 Nov 28  2023 ds004842/task-DriveOnMission_events.json
   193	-rw------- 1 yoh datalad 20064 Nov 28  2023 ds004843/task-VisualSituationalAwareness_events.json
   194	-rw------- 1 yoh datalad 17863 Nov 28  2023 ds004844/task-Drive_events.json
   195	-rw------- 1 yoh datalad 11772 Nov 28  2023 ds004849/task-nback_events.json
   196	-rw------- 1 yoh datalad 11772 Nov 28  2023 ds004850/task-nback_events.json
   197	-rw------- 1 yoh datalad 11772 Nov 28  2023 ds004851/task-nback_events.json
   198	-rw------- 1 yoh datalad 11772 Nov 28  2023 ds004852/task-nback_events.json
   199	-rw------- 1 yoh datalad 11772 Nov 28  2023 ds004853/task-nback_events.json
   200	-rw------- 1 yoh datalad 11772 Nov 28  2023 ds004854/task-nback_events.json
   201	-rw------- 1 yoh datalad 11772 Nov 28  2023 ds004855/task-nback_events.json
   202	-rw------- 1 yoh datalad 1681 Nov 28  2023 ds004860/task-HarmN400_events.json
   203	-rw------- 1 yoh datalad 506 Jan  4 14:22 ds004883/task-FFERN_events.json
   204	-rw------- 1 yoh datalad 329 Apr 29 10:56 ds004894/events.json
   205	-rw------- 1 yoh datalad 1664 Apr 29 10:54 ds004935/task-antisaccade_events.json
   206	-rw------- 1 yoh datalad 1560 Apr 29 10:57 ds004942/task-SpatialMemory_events.json
   207	-rw------- 1 yoh datalad 6783 Apr 29 11:00 ds005012/task-mid_events.json
   208	-rw------- 1 yoh datalad 954 Apr 29 11:01 ds005021/task-tiltillusion_events.json
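For anyone who wants to reproduce the percentage, here is a hypothetical Python equivalent of the loop above that also reports the total number of datasets. Like the shell version, it assumes the current directory holds one ds* directory per OpenNeuro dataset (e.g. a clone of the DataLad OpenNeuro superdataset with the subdatasets installed), and it only looks at top-level files:

```python
from pathlib import Path


def datasets_with_top_level_events_json(root: Path) -> tuple[list[str], int]:
    """Mirror the shell loop: which ds* datasets have a top-level *events.json?

    Returns the matching dataset names and the total number of ds*
    directories, so the fraction can be reported alongside the list.
    """
    datasets = sorted(p for p in root.glob("ds*") if p.is_dir())
    hits = [d.name for d in datasets if any(d.glob("*events.json"))]
    return hits, len(datasets)


if __name__ == "__main__":
    hits, total = datasets_with_top_level_events_json(Path("."))
    if total:
        print(f"{len(hits)} of {total} datasets ({len(hits) / total:.0%})")
```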

and as a "paradigm" it is pretty much what participants.tsv, sessions.tsv etc. are about -- summarization of metadata for the underlying data in the hierarchy.

@yarikoptic
Contributor Author

But beyond the "paradigm" applicability, I am not sure I have seen (but I never looked) it applied to .tsv files. @effigies @Lestropie are you aware of some good example uses of inheritance for .tsv files?

Moreover inheritance principle is somewhat specific for .tsv and .bval/.bvec files in that there is no "inheritance" -- lowest level in hierarchy is taken (.json - accumulates from higher levels), and that plays better with "summarization".
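The two resolution modes contrasted here can be sketched as follows. The helper names are hypothetical, and each helper takes the applicable files ordered from the dataset root down to the data file. Under "summarization", the `update` call in `resolve_json` could only ever rewrite a key with an identical value, so the JSON result no longer depends on precedence either:

```python
from typing import Any, Optional


def resolve_json(levels: list[dict[str, Any]]) -> dict[str, Any]:
    """Current JSON semantics: accumulate from the root down; deeper values win."""
    merged: dict[str, Any] = {}
    for level in levels:  # ordered root -> data file
        merged.update(level)
    return merged


def resolve_tabular(levels: list[Optional[str]]) -> Optional[str]:
    """.tsv/.bval/.bvec: no accumulation, only the deepest applicable file is used."""
    for path in reversed(levels):
        if path is not None:
            return path
    return None


print(resolve_json([{"TaskName": "rest", "EchoTime": 0.03}, {"EchoTime": 0.035}]))
# {'TaskName': 'rest', 'EchoTime': 0.035}
print(resolve_tabular(["task-rest_events.tsv", None,
                       "sub-01/func/sub-01_task-rest_run-1_events.tsv"]))
# sub-01/func/sub-01_task-rest_run-1_events.tsv
```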

@effigies

Inheritance is baked into channels.tsv and electrodes.tsv, as these are generally expected to be constant within sessions, so they have fewer entities than the data files they apply to. We are having to add entities because some dataset curators want to duplicate them for every data file, which was not previously excluded by the validator. While this is allowing curators to decrease their reliance on the inheritance principle, for tools, it increases it, as they now must look for the same file in more potential locations.

For TSV files, I think the equivalent to the summarization principle would be that there must be exactly one applicable TSV file of a given type. So you could have a channels.tsv for each data file, but that would be mutually exclusive with one for the entire session. Likewise, you could have one task-nback_events.tsv at the root level, but then that must not be overridden by a specific run.
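A sketch of the "exactly one applicable TSV file of a given type" rule, under simplifying assumptions: a candidate TSV applies to a data file if every entity it names has the same value in the data file's name, and directory placement is ignored. `parse_name` and `single_tsv` are hypothetical helpers.

```python
from __future__ import annotations


def parse_name(filename: str) -> tuple[dict[str, str], str | None]:
    """Split a BIDS-like file name into its entities and suffix (extension dropped)."""
    parts = filename.split(".", 1)[0].split("_")
    suffix = parts.pop() if "-" not in parts[-1] else None
    return dict(p.split("-", 1) for p in parts), suffix


def applicable(candidate: str, data_file: str) -> bool:
    """A candidate applies if all of its entities match the data file's entities."""
    c_ents = parse_name(candidate)[0]
    d_ents = parse_name(data_file)[0]
    return all(d_ents.get(k) == v for k, v in c_ents.items())


def single_tsv(data_file: str, suffix: str, tsv_files: list[str]) -> str:
    """Return the one applicable `<suffix>.tsv`; fail if zero or several apply."""
    matches = [f for f in tsv_files
               if parse_name(f)[1] == suffix and applicable(f, data_file)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one {suffix}.tsv for {data_file}, got {matches}")
    return matches[0]


print(single_tsv("sub-01_ses-1_task-rest_run-1_eeg.edf", "channels",
                 ["sub-01_ses-1_channels.tsv", "sub-02_ses-1_channels.tsv"]))
# sub-01_ses-1_channels.tsv
```

With both `task-nback_events.tsv` and `sub-01_task-nback_run-1_events.tsv` present, `single_tsv("sub-01_task-nback_run-1_bold.nii.gz", "events", ...)` raises, which is the "must not be overridden by a specific run" case.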

@marcelzwiers

the entire BIDS is "RDB on the filesystem level"

I don't agree with that; as I see it, BIDS is a study format, with nothing relational about it. True, for some data there are two locations for storing things, either in the subject folder or, e.g., in the participants.tsv file. But then you always choose one or the other; there is never a relation between them (like there is a relation between JSON files adhering to the inheritance principle).

And the fact that 20% of the openneuro datasets use it is a description of the situation, not an argument for its benefits :-) (it would have been trivially easy for these studies to store it all on the sub/ses level)

Moreover, the inheritance principle is somewhat special for .tsv and .bval/.bvec files in that there is no "inheritance": only the file at the lowest level in the hierarchy is taken (whereas .json values accumulate from higher levels), and that plays better with "summarization".

I agree with that, and I do support your "summarization" proposal as an improvement over the inheritance principle. It meets the goal of metadata deduplication while reducing ambiguities and overly complex schemes, e.g. when pooling data.

@Remi-Gau

it would have been trivially easy for these studies to store it all on the sub/ses level

Trivial to some but not to all: remember that some of the people who create the datasets barely know what a json file is or how to interact with it with python or matlab.

So creating (and especially updating) a single file at the root of the dataset will be a lot easier for them than having to manually edit many, many JSON files.

@marcelzwiers

Trivial to some but not to all: remember that some of the people who create the datasets barely know what a json file is or how to interact with it with python or matlab.

I actually fear for them when it comes to dealing properly with the inheritance principle. I think it would be better to have such people use tools for editing/maintaining BIDS datasets, such as CuBIDS?

@yarikoptic
Contributor Author

And the fact that 20% of the openneuro datasets use it is a description of the situation, not an argument for its benefits :-)

it was a response to your

However, I have yet to encounter a single dataset in which I have found any use for this principle

It was not an argument, although it can easily become one if expanded, e.g. "I and others, as shown by the above example, find it extremely useful". But this issue is not about that topic. If you would like to discuss the inheritance principle's cons, please chime in instead on

@yarikoptic
Contributor Author

Likewise, you could have one task-nback_events.tsv at the root level, but then that must not be overridden by a specific run.

In principle, I think this should be OK in the "summarization" formulation, as "overridden" would be replaced with "duplicated". In practice it would be tricky/impossible, since for _events.tsv there is really nothing which could constitute the "identity" of an event: unless an event row is duplicated exactly, it would be just another added event (possibly for the same onset/duration but different metadata), making it impossible to identify it and to warn the user that there might be an inconsistency, etc.
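To illustrate the point about event "identity", here is a sketch of a hypothetical merge of events.tsv rows from several levels (not something current BIDS does; TSV files are located, not merged). Rows are assumed to be already parsed into dicts; `merge_events` is a made-up name. Only byte-for-byte identical rows can be recognised as duplicates.

```python
from __future__ import annotations


def merge_events(*tables: list[dict[str, str]]) -> tuple[list[dict[str, str]], int]:
    """Naively union event rows from several events.tsv files.

    Returns the merged rows and how many exact duplicate rows were dropped.
    Rows sharing onset/duration but differing elsewhere cannot be recognised
    as "the same event", because events carry no identifier column.
    """
    seen: set[tuple[tuple[str, str], ...]] = set()
    merged: list[dict[str, str]] = []
    dropped = 0
    for table in tables:
        for row in table:
            key = tuple(sorted(row.items()))
            if key in seen:
                dropped += 1
                continue
            seen.add(key)
            merged.append(row)
    return merged, dropped


root = [{"onset": "0.0", "duration": "2.0", "trial_type": "0-back"}]
run = [{"onset": "0.0", "duration": "2.0", "trial_type": "0-back"},  # exact duplicate: dropped
       {"onset": "0.0", "duration": "2.0", "trial_type": "2-back"}]  # same onset, new metadata: kept
rows, dropped = merge_events(root, run)
print(len(rows), dropped)  # 2 1
```

The second run row may well be an inconsistency, but nothing in the data lets a tool tell it apart from a genuinely separate event.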

@dorahermes
Member

I completely agree with the above proposal as it eliminates value overloading (having a value at one level override a value at a different level).

Some notes from a BIDS curation perspective in a clinical environment. Non-technical staff does really well working with human readable files with simple rules and avoiding the use of additional software packages: either a file exists at the top, or at the individual level. This works most of the time. When we have to change a field at the individual level in hundreds of files, they often reach out to someone who can code.

One example use-case is the channels.tsv file for EEG/iEEG. This file exists for every data file and bad channels can be annotated there as they can differ across sessions and runs. The columns are the same across all subjects but include some optional user specified columns. If a channels.json file can exist only at the top level to specify these columns (that are the same across all subjects) that is convenient. The proposal described here would strongly facilitate this use case, which is extremely common for us, if I am correct.
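A sketch of the check that this use case implies: one top-level channels.json documents the optional, user-specified columns, and every per-run channels.tsv may only use columns that are either defined by the specification or described there. `SPEC_COLUMNS` is a hypothetical, deliberately incomplete stand-in for the columns the specification defines, and `undocumented_columns` is a made-up helper.

```python
from __future__ import annotations

import csv
import io
import json

# Hypothetical, deliberately incomplete set of columns already defined by the specification.
SPEC_COLUMNS = {"name", "type", "units", "status", "status_description"}


def undocumented_columns(channels_tsv: str, top_level_json: str) -> set[str]:
    """Columns in one channels.tsv described neither by the spec nor by the top-level channels.json."""
    header = next(csv.reader(io.StringIO(channels_tsv), delimiter="\t"))
    described = set(json.loads(top_level_json))
    return set(header) - SPEC_COLUMNS - described


top_json = json.dumps({"hemisphere": {"Description": "Hemisphere of the electrode"}})
run_tsv = "name\ttype\tunits\tstatus\themisphere\tsoz\nLA1\tSEEG\tuV\tgood\tL\tyes\n"
print(undocumented_columns(run_tsv, top_json))  # {'soz'}
```

Under either inheritance or summarization, the single top-level JSON would apply to every channels.tsv, so the check only needs to read one metadata file.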

@effigies

I think we may be getting off-topic (feel free to hide this comment as off-topic if you agree), but I'm confused by the following:

In practice it would be tricky/impossible, since for _events.tsv there is really nothing which could constitute the "identity" of an event: unless an event row is duplicated exactly, it would be just another added event (possibly for the same onset/duration but different metadata), making it impossible to identify it and to warn the user that there might be an inconsistency, etc.

TSV files are not merged, they are located. Unless you are proposing this change, nobody would try to merge events.tsv files found at multiple levels. Given that you say it would be tricky/impossible, I don't think you're proposing it...

Now, TSV files can be joined, but only specific ones. For example, participants.tsv, sub-*_sessions.tsv and sub-*[_ses-*]_scans.tsv can be joined on the participant_id and session_id columns in order to provide metadata for each scan file, but this isn't a merging of two files with the same suffix.
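A sketch of such a join in plain Python. It assumes the rows have already been read, and that `participant_id`/`session_id` have been attached to the sessions and scans rows from the directories those files live in (sub-*_sessions.tsv sits under the subject directory, so the subject is implied by its location). `join_scan_metadata` is a hypothetical helper.

```python
from __future__ import annotations


def join_scan_metadata(participants: list[dict], sessions: list[dict],
                       scans: list[dict]) -> list[dict]:
    """Attach participant- and session-level columns to each scan row."""
    by_participant = {row["participant_id"]: row for row in participants}
    by_session = {(row["participant_id"], row["session_id"]): row for row in sessions}
    joined = []
    for scan in scans:
        key = (scan["participant_id"], scan["session_id"])
        # Later dicts win on shared keys; here the shared keys are the join keys themselves.
        joined.append({**by_participant[scan["participant_id"]], **by_session[key], **scan})
    return joined


participants = [{"participant_id": "sub-01", "age": "34"}]
sessions = [{"participant_id": "sub-01", "session_id": "ses-1", "acq_time": "2023-01-01T09:00:00"}]
scans = [{"participant_id": "sub-01", "session_id": "ses-1", "filename": "anat/sub-01_ses-1_T1w.nii.gz"}]
print(join_scan_metadata(participants, sessions, scans)[0]["age"])  # 34
```

Each table contributes different columns, so this is a relational join across different file types rather than a resolution of same-suffix files across levels.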

@yarikoptic
Contributor Author

@dorahermes re channels.tsv -- could you elaborate more, maybe point to an example dataset? The

The columns are the same across all subjects but include some optional user specified columns.

sounds like requiring common columns provided in a top-level channels.tsv and then per-subject/session sub-*_ses-*_channels.tsv files providing additional columns... If I get it right, it would go "against" the current inheritance rule and our discussion above with @effigies on that:

TSV files are not merged, they are located. Unless you are proposing this change, nobody would try to merge events.tsv files found at multiple levels. Given that you say it would be tricky/impossible, I don't think you're proposing it...

I am "considering" or "approaching" it ;-) And as @dorahermes points out above (if I got her right), we might want to not just "append" but "extend" (more like we do for JSON, if we consider a JSON file to be a single row and a TSV file a list of such rows). Overall, I think it could be very beneficial if we could generalize the principle so it doesn't differ in handling .tsv and .json files.

Note that while we have participant_id and session_id, we only have name and not channel_id within channels.tsv.

@yarikoptic
Contributor Author

yarikoptic commented May 31, 2024

re _channels.tsv: a note that we do not enforce uniqueness of a channel's "name". Also, there are no entities for suffixes such as channel and event, thus no _id columns (whose name would be {entitylongname}_id and value {entityshortname}-{value}), which are already defined for

NB "edited" for difference in name/value

❯ grep '_id:$' objects/columns.yaml
desc_id:
participant_id:
sample_id:
session_id:

but I guess this could be generalized to any entity (context: #54).
So inheritance/summarization could easily be extended to support loading from multiple .tsv files, "appending" (rows) and/or "extending" (columns) for files with _id columns, ensuring alignment, etc.
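A sketch of what "appending" and "extending" could mean for TSV files that have an _id column, using participant_id since channels have no such column. Both helpers are hypothetical; "extending" applies the summarization rule, so a column present in both tables must agree for the same _id.

```python
from __future__ import annotations


def append_rows(key: str, *tables: list[dict[str, str]]) -> list[dict[str, str]]:
    """Stack rows from several TSVs; the same _id value may not appear twice."""
    out: list[dict[str, str]] = []
    seen: set[str] = set()
    for table in tables:
        for row in table:
            if row[key] in seen:
                raise ValueError(f"duplicate {key} {row[key]!r}")
            seen.add(row[key])
            out.append(row)
    return out


def extend_columns(key: str, base: list[dict[str, str]],
                   extra: list[dict[str, str]]) -> list[dict[str, str]]:
    """Add columns from `extra` to the rows of `base` with the same _id value."""
    extra_by_id = {row[key]: row for row in extra}
    unknown = set(extra_by_id) - {row[key] for row in base}
    if unknown:  # alignment: every extra row needs a counterpart
        raise ValueError(f"rows with no counterpart in base: {sorted(unknown)}")
    out = []
    for row in base:
        addition = extra_by_id.get(row[key], {})
        for col, value in addition.items():
            if col in row and row[col] != value:  # summarization: no overloading
                raise ValueError(f"{key}={row[key]!r}: column {col!r} disagrees")
        out.append({**row, **addition})
    return out


base = [{"participant_id": "sub-01", "age": "34"}, {"participant_id": "sub-02", "age": "29"}]
print(extend_columns("participant_id", base, [{"participant_id": "sub-02", "handedness": "left"}]))
```

Rows that receive no value for a new column (here sub-01's handedness) would have to be written out as n/a if the result were serialized back to TSV.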

@marcelzwiers

marcelzwiers commented May 31, 2024

Another issue I haven't seen much discussion on (but correct me if I'm wrong, as I also missed the previous discussion on the inheritance cons, thank you @yarikoptic), is what I would call the file collection/grouping problem. So how to deal with e.g.:

[summarize.json]
sub-01
  `anat
     |-sub-01_run-1_acq-foo_MP2RAGE.nii.gz
     |-sub-01_run-1_acq-foo_UNIT1.nii.gz
     |-sub-01_run-2_acq-foo_MP2RAGE.nii.gz
     |-sub-01_run-2_acq-foo_UNIT1.nii.gz
     |-sub-01_acq-bar_MP2RAGE.nii.gz
     `-sub-01_acq-bar_UNIT1.nii.gz

How many summarize JSON sidecars would you make? Obviously, you would not make one for each run, but would you make one for each acq value? Would it be useful to have something like an IntendedFor field in the summarize JSON (with support for wildcards, so you don't have to include an explicit list, but just the semantics, e.g. {"IntendedFor": "bids::sub-*/anat/sub-*_run-*_acq-foo_*.nii.gz"})?
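A sketch of how a tool might expand such a wildcard IntendedFor value. Wildcards in IntendedFor are only a suggestion here, so `intended_files` is hypothetical; it strips the `bids::` prefix and matches dataset-relative paths with the standard-library `fnmatch.fnmatchcase`, whose `*` (unlike a shell glob) also matches across `/`.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def intended_files(pattern: str, dataset_files: list[str]) -> list[str]:
    """Expand a wildcard "IntendedFor" value against dataset-relative paths."""
    prefix = "bids::"
    if pattern.startswith(prefix):
        pattern = pattern[len(prefix):]
    return [path for path in dataset_files if fnmatchcase(path, pattern)]


files = [
    "sub-01/anat/sub-01_run-1_acq-foo_MP2RAGE.nii.gz",
    "sub-01/anat/sub-01_run-1_acq-foo_UNIT1.nii.gz",
    "sub-01/anat/sub-01_run-2_acq-foo_MP2RAGE.nii.gz",
    "sub-01/anat/sub-01_run-2_acq-foo_UNIT1.nii.gz",
    "sub-01/anat/sub-01_acq-bar_MP2RAGE.nii.gz",
    "sub-01/anat/sub-01_acq-bar_UNIT1.nii.gz",
]
print(len(intended_files("bids::sub-*/anat/sub-*_run-*_acq-foo_*.nii.gz", files)))  # 4
```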

@Lestropie

Lestropie commented Jun 3, 2024

... I am not sure I saw (but I never looked) any application of it for .tsv files.... are you aware of some good example uses of inheritance for .tsv files?

I don't deal with a wide breadth of different BIDS data from which to generate examples, but one that always irks me is complex DWI data. It is increasingly recommended to export magnitude & phase data for DWI as it facilitates superior denoising. In the absence of inheritance, this means that the diffusion gradient table (which is currently exclusively .bvec / .bval rather than .tsv as originally requested, but could be .tsv following bids-standard/bids-specification#352) would need to be exactly duplicated across the magnitude and phase component images. Defining these data once, omitting the _part-(mag|phase) entity, to me makes far more sense. But as with other discussion here, this is purely use of the IP (here: Inheritance Principle) to avoid duplication, not to supersede.

I know that it looks like that the general consensus is that we should keep or improve the inheritance/summarization principle.

The discussion on this Issue might skew differently to community opinion. I've been told on multiple occasions that there are many who would prefer for it to be removed entirely. I don't have my finger on the pulse on exactly what those proportions might look like.

One concern I have is that a naive community poll may skew toward removal because of a) an expectation of manual curation of such and b) consideration of raw BIDS data only, whereas community opinions following a) creation of a tool for automated application and b) consideration of the complexities of derivative data may yield a different result. So I'd like to at least create a compelling case.

So how to deal with e.g.:

It all depends on the metadata contents, and, more esoterically, on whether JSON files without suffixes are permitted.
At the extreme end, I could imagine:

  • sub-01.json containing all fields applicable to all images
  • sub-01/anat/sub-01_MP2RAGE.json containing any fields consistent across all _MP2RAGE images
  • sub-01/anat/sub-01_UNIT1.json containing any fields consistent across all _UNIT1 images (eg. units?)
  • sub-01/anat/sub-01_acq-foo.json containing any fields applicable only to acq-foo (ie. how it differs from acq-bar)
  • sub-01/anat/sub-01_acq-bar.json containing any fields applicable only to acq-bar (ie. how they differ from acq-foo)
  • sub-01/anat/sub-01_run-1_acq-foo.json containing any fields applicable only to run 1 of acq-foo (ie. what differs from run 2, maybe acquisition time?)
  • sub-01/anat/sub-01_run-2_acq-foo.json containing any fields applicable only to run 2 of acq-foo (ie. what differs from run 1)
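A sketch of which of these metadata files would apply to a given data file, assuming a relaxed rule in which several metadata files may apply from the same directory level, suffix-less JSON files are allowed, and a metadata file applies if its suffix is absent or equal to the data file's and every entity it names matches. Directory levels are ignored for brevity; `parse_name` and `applicable_metadata` are hypothetical.

```python
from __future__ import annotations


def parse_name(filename: str) -> tuple[dict[str, str], str | None]:
    """Split a BIDS-like file name into its entities and (optional) suffix."""
    parts = filename.split(".", 1)[0].split("_")
    suffix = parts.pop() if "-" not in parts[-1] else None
    return dict(p.split("-", 1) for p in parts), suffix


def applicable_metadata(data_file: str, metadata_files: list[str]) -> list[str]:
    d_ents, d_suffix = parse_name(data_file)
    out = []
    for meta in metadata_files:
        m_ents, m_suffix = parse_name(meta)
        if m_suffix not in (None, d_suffix):  # suffix-less files apply to any suffix
            continue
        if all(d_ents.get(k) == v for k, v in m_ents.items()):
            out.append(meta)
    return out


metadata = [
    "sub-01.json", "sub-01_MP2RAGE.json", "sub-01_UNIT1.json",
    "sub-01_acq-foo.json", "sub-01_acq-bar.json",
    "sub-01_run-1_acq-foo.json", "sub-01_run-2_acq-foo.json",
]
print(applicable_metadata("sub-01_run-1_acq-foo_MP2RAGE.nii.gz", metadata))
# ['sub-01.json', 'sub-01_MP2RAGE.json', 'sub-01_acq-foo.json', 'sub-01_run-1_acq-foo.json']
```

Four files apply to one data file here, and none of them is "deeper" than another in a single chain; combining them with a conflict-checking merge (no overloading) is what would make the order in which they are read irrelevant.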

This actually ends up with more metadata files than there are data files. But unlike exclusively using sidecars, it is immediately discoverable exactly what it is that differs between eg. entity-linked file collections acq-foo and acq-bar, by the contents of the respectively named metadata files. These may be more obscure use cases in the context of BIDS Raw, but in my experience with trying to develop complex BIDS Derivatives I think that cases like these are going to be increasingly prevalent in time.

Regardless of my own opinion, I don't see the debate progressing in an informed way in the absence of tangible examples of what data look like with vs. without the IP, or in the absence of software to use or not use the IP (complex examples like that above I would never expect a human to manually curate). Hence why I invested some time and effort in generating an Issue list for such: https://github.com/Lestropie/IP-freely/issues.

@marcelzwiers

But unlike exclusively using sidecars, it is immediately discoverable exactly what it is that differs between eg. entity-linked file collections acq-foo and acq-bar, by the contents of the respectively named metadata files.

Yes, that's nice, but I think that this level of complexity, just to deduplicate to the bitter end, can be hard to grasp and would harm the acceptance / proper use of BIDS amongst average neuroscientists. The inheritance principle makes things much less human-readable and simple. For instance, I cannot just inspect a sidecar file anymore; I need tooling to search for data in the filetree hierarchy to get a complete view. So before deciding on a solution, I think we should clearly define who the users are that the inheritance principle tries to target? Is it the neuroscientist that manually edits/curates their BIDS data? Is it the programmer that makes BIDS-derivatives processing pipelines? And we need to consider if the benefits for one group of users really outweigh the downsides for the other users. I believe the summarize proposal of @yarikoptic is aimed as a middle ground?

@Lestropie

Fully appreciate the argument for IP abolition. There's a good reason there's no consensus on the topic.

The question of "not just inspecting a sidecar ... needing tooling to search for (meta)data in the file tree hierarchy" has a natural converse, being something like "metadata are not unique ... need tooling to determine what data in the file tree hierarchy take the same values". There's complex relationships between metadata across data files regardless of how you cut it, it's a question of what types of operations you want to best facilitate.

What's landed me on the pro-IP side is that I'm further along in attempting to standardise complex derivatives. Consider the second of the two cases above. In a BIDS raw dataset, if two data files have the same value for some metadata field, that might be interesting, or it might not be. I would personally argue that it communicates the natural hierarchical nature of the data, but agree it comes with a complexity cost if stored explicitly in such a way. But with BIDS raw, data files are generally pretty independent of one another (with the exception of entity-linked file collections, which I'll come back to). With BIDS Derivatives, it will be more common for there to be more strongly linked file collections: a "singular" computational outcome is often by necessity spread across multiple data files. Here, shared metadata across sidecars is not mere happenstance or an opportunity for storage compression: the dataset would be considered corrupt were those metadata to not be exactly identical across data files. Moreover, within a dataset containing many files in a modality directory, human discernment of what data files encode the results of that particular computation vs. encode something else becomes increasingly difficult; a metadata file containing the relevant fields that is applicable via IP only to data files encoding the outcome of that computation would clearly communicate that grouping.

This is really just the existing entity-linked file collections concept, only more strongly asserted. Enhancing the IP, particularly by removing the restriction of one applicable metadata file per filesystem level, would greatly strengthen this concept. Currently, there's no way to really "encode" an entity-linked file collection. Different data files may have more or less metadata fields that are equal or different between them, and more or less mutual vs. distinct entities, but it's all quite "fuzzy". Defining a metadata file that is applicable to multiple data files, containing only the mutually shared metadata fields, and named based on only the mutual entities, would be what defines that entity-linked file collection.
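A sketch of how a tool could derive such a collection-defining metadata file from existing per-file sidecars: the name keeps only the entities (and suffix) common to every file, and the content keeps only fields with identical values everywhere. `collection_metadata` is hypothetical; the magnitude/phase DWI example reuses the case described earlier in the thread, with made-up field values.

```python
from __future__ import annotations


def collection_metadata(sidecars: dict[str, dict]) -> tuple[str, dict]:
    """Propose name and content for a metadata file shared by a group of data files."""
    parsed = []
    for filename in sidecars:
        parts = filename.split(".", 1)[0].split("_")
        suffix = parts.pop() if "-" not in parts[-1] else None
        parsed.append((parts, suffix))
    first_entities, first_suffix = parsed[0]
    # Keep "key-value" entity strings present in every file name, in original order.
    common = [e for e in first_entities if all(e in ents for ents, _ in parsed)]
    if first_suffix is not None and all(s == first_suffix for _, s in parsed):
        common.append(first_suffix)
    docs = list(sidecars.values())
    shared = {k: v for k, v in docs[0].items()
              if all(k in doc and doc[k] == v for doc in docs[1:])}
    return "_".join(common) + ".json", shared


name, shared = collection_metadata({
    "sub-01_part-mag_dwi.json": {"EchoTime": 0.08, "PhaseEncodingDirection": "j-", "Units": "arbitrary"},
    "sub-01_part-phase_dwi.json": {"EchoTime": 0.08, "PhaseEncodingDirection": "j-", "Units": "rad"},
})
print(name, shared)
# sub-01_dwi.json {'EchoTime': 0.08, 'PhaseEncodingDirection': 'j-'}
```

The fields left out of `shared` (here `Units`) are exactly what distinguishes the members of the collection, and would stay in the per-file sidecars.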

Also, given the principle is not a novel proposal for 2.0 but has stood throughout 1.x, I think there's a need for better tooling regardless of what happens for 2.0. Any software for reading / writing BIDS data really should by now be fully IP-aware. And I think there's moreover a need for software dedicated to the IP. I think having such tooling at hand might help inform that decision making process.

I think we should clearly define who the users are that the inheritance principle tries to target? Is it the neuroscientist that manually edits/curates their BIDS data? Is it the programmer that makes BIDS-derivatives processing pipelines?

For anyone doing exclusively manual curation of a BIDS dataset, I would expect that curation to almost exclusively omit the IP. Most commonly they'll be running something like dcm2niix, which gives a NIfTI & JSON per DICOM series, followed by filesystem-level renaming. Introducing IP usage would be more manual effort and only increase likelihood of errors. So the only case where someone might manually utilise the IP is if they are forced to define all of their metadata manually. Even in this scenario, use of the IP is not compulsory: if a user understands the principle and their data, they can exploit it; if not, they can omit it.

I think longer-term the more prevalent "users" will be App developers / those who interpret the outputs from those Apps. Writing shared metadata once to one file, appropriately named, is slightly more concise in code than having a base shared dictionary and duplicating it with minor changes across multiple output metadata files, though it's a pretty subtle difference. To me it's more about communication of the relationships between data files. For a BIDS Derivatives dataset, not all data files in a modality directory are equally distinct from one another; some are more strongly related than others, and the IP is one way of communicating those relationships.

I believe the summarize proposal of @yarikoptic is aimed as a middle ground?

I think that proposing to change the name of the principle may be misleading as to the scope of that change. The proposal is only to forbid having some data file with multiple applicable metadata files where some field takes different values across such files. That I think would be an unambiguously good change, would simplify both lay and systematised descriptions of the principle, and would be more algorithmically compatible with automated approaches. But it wouldn't resolve any of the concerns you have yourself raised here.

@yarikoptic
Contributor Author

Attn @dorahermes @Lestropie and others -- if in general you consider this issue/idea good -- please upvote by 👍 . If you consider it a bad idea -- downvote with 👎 .

I would appreciate it if general discussion of IP "disadvantages" happened elsewhere, e.g. in issue #36 if you feel strongly that IP "must die". But as far as I see it, the IP is here to stay in some form, which might potentially remove some aspects (e.g. as the summarization here removes overloading), and/or be enriched with additional tooling or principles (like IntendedFor for grouping). For those I would also advise starting projects (like @Lestropie did) or other issues and cross-linking back here.

In this issue, and later in a PR against the bids-2.0 branch, I would appreciate more specific/targeted feedback or assistance with this idea. E.g.

  • I might wait for @Lestropie's tool and then would love to use it (or see someone contributing) to implement that checker of openneuro datasets.
  • maybe you have some use-cases where summarization would open opportunities?
  • maybe you even want to take a stab at formalizing it in a PR?

@yarikoptic yarikoptic added the modularity Issues affecting modularity and composition of BIDS datasets label Sep 21, 2024
yarikoptic added a commit to yarikoptic/bids-specification that referenced this issue Dec 19, 2024
Upon re-reading the current Inheritance principle formulation, nothing seems to
forbid that, and such use in general is great since it allows generalizing
common metadata across all files of that datatype.

Notes on possible side-effects from "embracing" such approach (which in
principle I think is not disallowed ATM).

- per rule 4, presence of `bold.json` forbids presence of another `_bold.json`
  (i.e. with entities) on the same level. So if further specialization, e.g. per
  task, is needed, common metadata needs to be duplicated across them
  (that is what heudiconv does ATM).

  Such restrictions could potentially be lifted if we adopt
  "summarization" refactoring of inheritance principle
  bids-standard/bids-2-devel#65
  since order would no longer matter and thus multiple files could apply.

- I think that bids-validators are fine as checked on a single
  ds000248/T1w.json in bids-examples and modified 7t_trt.

- I do not know if tools implement it, though, but since there was a precedent
  for ds000248/T1w.json - they better do ;-)
Projects
Status: Todo
Development

No branches or pull requests

7 participants