Replace "inheritance" with "summarization" principle #65
@yarikoptic Thanks for this thoughtful issue and for the @bids-standard/bep036 team. I can do the check in a month or so for how inheritance is currently being used in OpenNeuro, thanks to Datalad's

I mostly like the idea above, except maybe I'm confused about one thing. Let's say up top there's a

I think whether we move forward with the inheritance principle OR the summarization principle, the call is for tools to support either of them. If one small set of tools could be created to support both, whichever one makes it to the gate first could be most easily adopted. This is why creating a set of inheritance software tools has been on my BIDS maintainers' desirables list for a long time. Thoughts?
to be precise: it is not
Correct. Even if a single subject differs, such metadata should not be present at a level where it is not common to all levels below. Possible solutions:
No -- new users just should not bother creating top level

Re support -- correct, tool support would be needed... BUT the "summarization" principle is just a more restricted case of inheritance, if I see it right, so in principle any tool supporting the current inheritance principle should work with "summarization" without any change.

A further line of thought, on .tsv + .json duality: the .tsv files we have are pretty much a case of summarization (placing entries in a tabular structure within a single file) for entries where metadata could differ (e.g. age for a subject)... i.e. in
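To make the "restricted case of inheritance" argument concrete, here is a minimal sketch, assuming metadata are plain dicts loaded from JSON sidecars; `summarize` and `resolve` are hypothetical helper names, not any existing tool's API. `summarize` applies the rule stated above (lift a field only if every file below defines it with the same value), while `resolve` is an ordinary, unchanged inheritance reader that still reconstructs each original sidecar exactly.

```python
from typing import Any, Dict, List, Tuple

Metadata = Dict[str, Any]


def summarize(children: Dict[str, Metadata]) -> Tuple[Metadata, Dict[str, Metadata]]:
    """Split per-file metadata into (fields common to ALL files, per-file remainders).

    A field is lifted to the shared level only if every file defines it with an
    equal value; if even a single file differs or lacks it, it stays per file.
    """
    if not children:
        return {}, {}
    records = list(children.values())
    common = {
        key: value
        for key, value in records[0].items()
        if all(key in other and other[key] == value for other in records[1:])
    }
    remainders = {
        name: {key: value for key, value in record.items() if key not in common}
        for name, record in children.items()
    }
    return common, remainders


def resolve(levels: List[Metadata]) -> Metadata:
    """Plain inheritance-style reader: merge from the top level down, lower levels
    overriding higher ones. Output of summarize() never places a field at two
    levels, so the override rule is never exercised."""
    merged: Metadata = {}
    for level in levels:
        merged.update(level)
    return merged


sidecars = {
    "sub-01_task-rest_bold.json": {"TaskName": "rest", "RepetitionTime": 2.0, "EchoTime": 0.030},
    "sub-02_task-rest_bold.json": {"TaskName": "rest", "RepetitionTime": 2.0, "EchoTime": 0.035},
}
top, per_file = summarize(sidecars)
# EchoTime differs for one subject, so only TaskName and RepetitionTime are lifted.
for name, original in sidecars.items():
    assert resolve([top, per_file[name]]) == original
```

The design point is that nothing new is needed on the reading side: summarization only constrains how files are written.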
I'm a fan.
I would perhaps pose a different question. There's a bifurcation in opinions on the inheritance principle. I've personally been pushing for making it more powerful, which has required improving the definition of the current behaviour in order to facilitate the subsequent augmentation. Others would prefer that the whole principle disappear entirely, and that all metadata relevant to a data file be present in its sidecar file. The way I would therefore look at this proposal is: if the capacity for value overloading (specifically, a value present at a higher level being overridden at a lower level) were removed, would this sway those previously opposed to the inheritance principle toward its preservation? So that's actually a question directed not at me but at others.
@Lestropie, a birdie said that you might be participating in the BIDS hackathon (if only virtually)? Would you be interested in working on this one? It can already be done as a PR against similar to the WiP I just started
I'm not so much aiming to participate in the hackathon and looking for a project, as wanting to take on automating the use of inheritance, and seeing the hackathon as a potential way to motivate starting the project and to get other people on board. I want to write it up as a proposal somewhere, but wasn't quite sure where would be best: it's not yet guaranteed that I'll be able to attend the Hackathon, and what I have in mind is also not specific to BIDS 2.0. Maybe I should create an empty repository and start listing issues there.
See https://github.com/Lestropie/IP-me/issues for my current intentions on the topic.
I know it looks like the general consensus is that we should keep or improve the inheritance/summarization principle. However, I have yet to encounter a single dataset in which I found any use for this principle, but I have encountered several datasets in which it caused headaches and hard maintenance work, and led to ugly / hacky codebases. If it were up to me, I would throw out the whole principle and always store the complete metadata with the data. It costs nearly nothing in terms of disk space and I think it would make everybody's life easier. TL;DR: choose the KISS principle, not the inheritance principle.
The way I see it, the inheritance principle comes down to implementing a poor man's relational database at the filesystem level.
@marcelzwiers the entire BIDS is an "RDB on the filesystem level", so it is not surprising that pybids caches the parsed structure in a local SQL DB ;-)

Re the inheritance principle -- it is in heavy use everywhere, e.g. 20% of OpenNeuro datasets use it for `*task-*_events.json` files at the dataset top level:

$> for d in ds*; do ls -ld $d/*events.json 2>/dev/null | head -n1; done | nl
1 -rw-r----- 1 yoh datalad 204 Apr 27 2020 ds000006/task-livingnonlivingdecisionwithplainormirrorreversedtext_events.json
2 -rw------- 1 yoh datalad 128 Dec 2 2022 ds000031/events.json
3 -rw-r----- 1 yoh datalad 284 Dec 4 2018 ds000164/task-stroop_events.json
4 -rw-r----- 1 yoh datalad 596 Dec 4 2018 ds000214/task-Cyberball_events.json
5 -rw-r----- 1 yoh datalad 1879 Apr 27 2020 ds000217/task-picturetest_events.json
6 -rw-r----- 1 yoh datalad 857 Apr 27 2020 ds000223/task-mag_events.json
7 -rw-r----- 1 yoh datalad 738 Apr 27 2020 ds000249/task-genInstrAv_events.json
8 -rw-r--r-- 1 yoh datalad 1993 Aug 11 2020 ds001415/task-maplistening_events.json
9 -rw-r----- 1 yoh datalad 1193 Jan 25 2019 ds001499/task-5000scenes_events.json
10 -rw-r----- 1 yoh datalad 76 Dec 5 2018 ds001553/task-checkerboard_events.json
11 -rw-r----- 1 yoh datalad 316 Aug 13 2019 ds001590/task-loc_events.json
12 -rw-r----- 1 yoh datalad 869 Dec 5 2018 ds001597/task-cuedMFM_events.json
13 -rw-r----- 1 yoh datalad 567 Aug 20 2019 ds001608/task-viewclips_events.json
14 -rw-r----- 1 yoh datalad 528 Mar 18 2019 ds001740/task-convers_events.json
15 -rw-r----- 1 yoh datalad 231 Aug 12 2019 ds001771/task-identification_events.json
16 -rw-r----- 1 yoh datalad 739 Feb 26 2021 ds001785/task-adapt_events.json
17 -rw-r----- 1 yoh datalad 969 Mar 4 2021 ds001787/task-meditation_events.json
18 -rw-r----- 1 yoh datalad 1860 Aug 14 2019 ds001810/task-attentionalblink_events.json
19 -rw-r----- 1 yoh datalad 2641 Feb 25 2021 ds001814/task-ARC_events.json
20 -rw-r----- 1 yoh datalad 340 Feb 25 2021 ds001838/task-Adaptation_events.json
21 -rw-r----- 1 yoh datalad 903 Aug 20 2019 ds001840/task-viewclips_events.json
22 -rw-r----- 1 yoh datalad 1594 Jan 31 2022 ds001848/task-ParallelAdaptation_events.json
23 -rw-r----- 1 yoh datalad 1274 Aug 14 2019 ds001894/task-AANonWord_events.json
24 -rw-r----- 1 yoh datalad 2831 Aug 19 2019 ds001971/task-AudioCueWalkingStudy_events.json
25 -rw-r----- 1 yoh datalad 3110 Aug 20 2019 ds002011/task-Overlap_events.json
26 -rw-r----- 1 yoh datalad 410 Aug 20 2019 ds002013/task-CircCon_events.json
27 -rw-r----- 1 yoh datalad 428 Aug 20 2019 ds002033/task-ChangeDetection_events.json
28 -rw-r----- 1 yoh datalad 1671 Dec 3 2019 ds002041/task-TD_events.json
29 -rw-r----- 1 yoh datalad 340 Feb 25 2021 ds002116/task-Adaptation_events.json
30 -rw-r----- 1 yoh datalad 761 Feb 25 2021 ds002158/task-main_events.json
31 -rw-r----- 1 yoh datalad 649 Dec 3 2019 ds002185/task-odors_events.json
32 -rw-r----- 1 yoh datalad 604 Dec 3 2019 ds002218/task-Experiment_events.json
33 -rw-r----- 1 yoh datalad 1242 Apr 27 2020 ds002236/task-AudRhyme_events.json
34 -rw-r--r-- 1 yoh datalad 229 Jul 16 2020 ds002351/task-LDT_events.json
35 -rw-r----- 1 yoh datalad 925 Apr 27 2020 ds002366/task-emoregRun1_events.json
36 -rw-r----- 1 yoh datalad 924 Apr 27 2020 ds002411/task-ProgramCategorization_events.json
37 -rw-r----- 1 yoh datalad 798 Apr 27 2020 ds002419/task-taste1_events.json
38 -rw-r----- 1 yoh datalad 1400 May 4 2021 ds002424/task-SLD_events.json
39 -rw-r----- 1 yoh datalad 193 Apr 27 2020 ds002522/task-CRF_events.json
40 -rw------- 1 yoh datalad 1360 Dec 2 2022 ds002578/events.json
41 -rw-r----- 1 yoh datalad 1068 Apr 25 2022 ds002603/task-wm_events.json
42 -rw------- 1 yoh datalad 925 Apr 29 10:53 ds002620/task-emoregRun1_events.json
43 -rw-r----- 1 yoh datalad 123 Feb 25 2021 ds002634/task-ArtVoc_events.json
44 -rw-r----- 1 yoh datalad 696 Feb 25 2021 ds002647/task-IHG_events.json
45 -rw------- 1 yoh datalad 974 Nov 28 2023 ds002680/events.json
46 -rw-r----- 1 yoh datalad 1381 Apr 27 2020 ds002687/task-SLD_events.json
47 -rw------- 1 yoh datalad 349 Nov 28 2023 ds002691/events.json
48 -rw-r----- 1 yoh datalad 3151 Apr 25 2022 ds002718/task-FaceRecognition_events.json
49 -rw-r----- 1 yoh datalad 1596 Jan 31 2022 ds002738/task-reward_events.json
50 -rw------- 1 yoh datalad 977 Jan 19 2023 ds002761/task-loc_events.json
51 -rw-r----- 1 yoh datalad 772 May 13 2020 ds002776/task-motorseq_events.json
52 -rw-r----- 1 yoh datalad 423 Mar 4 2021 ds002785/task-anticipation_acq-seq_events.json
53 -rw-r----- 1 yoh datalad 2658 Mar 4 2021 ds002790/task-emomatching_acq-seq_events.json
54 -rw-r----- 1 yoh datalad 1224 Feb 25 2021 ds002813/task-fintest_events.json
55 -rw-r----- 1 yoh datalad 1898 Feb 25 2021 ds002835/task-prospection_events.json
56 -rw-r----- 1 yoh datalad 455 Jun 18 2021 ds002843/task-itc_events.json
57 -rw-r--r-- 1 yoh datalad 489 Jun 11 2020 ds002872/task-illusion_events.json
58 -rw-r--r-- 1 yoh datalad 1274 Jun 9 2020 ds002879/task-AANonWord_events.json
59 -rw-r--r-- 1 yoh datalad 2613 Jun 9 2020 ds002886/task-Syllogisms_events.json
60 -rw------- 1 yoh datalad 4504 Nov 28 2023 ds002893/task-AuditoryVisualShift_events.json
61 -rw-r--r-- 1 yoh datalad 804 Jun 17 2020 ds002894/task-languagelocalizer_events.json
62 -rw-r--r-- 1 yoh datalad 804 Jul 21 2020 ds002905/task-languagelocalizer_events.json
63 -rw-r--r-- 1 yoh datalad 1295 Jun 29 2020 ds002941/task-Mult_events.json
64 -rw------- 1 yoh datalad 1127 Dec 19 2022 ds002989/task-DDbid_events.json
65 -rw-r--r-- 1 yoh datalad 1082 Jul 8 2020 ds002995/task-tastemap_events.json
66 -rw-r--r-- 1 yoh datalad 1295 Aug 14 2020 ds003028/task-Mult_events.json
67 -rw------- 1 yoh datalad 2979 Aug 22 2022 ds003061/task-P300_events.json
68 -rw-r--r-- 1 yoh datalad 2613 Aug 14 2020 ds003076/task-Syllogisms_events.json
69 -rw-r--r-- 1 yoh datalad 1295 Sep 2 2020 ds003083/task-Mult_events.json
70 -rw-r----- 1 yoh datalad 66 Oct 22 2020 ds003136/task-affect_events.json
71 -rw-r----- 1 yoh datalad 1194 Oct 23 2020 ds003242/task-CIC_events.json
72 -rw-r----- 1 yoh datalad 1045 Jan 31 2022 ds003340/task-foodpicture_events.json
73 -rw-r----- 1 yoh datalad 324 Oct 23 2020 ds003342/task-grasp_events.json
74 -rw-r----- 1 yoh datalad 333 Mar 4 2021 ds003436/task-anim_events.json
75 -rw-r----- 1 yoh datalad 949 Feb 25 2021 ds003454/task-rapm_events.json
76 -rw-r----- 1 yoh datalad 1625 Feb 25 2021 ds003459/task-audortho_events.json
77 -rw-r----- 1 yoh datalad 613 May 4 2021 ds003487/task-PIT_events.json
78 -rw-r----- 1 yoh datalad 2661 Feb 25 2021 ds003495/task-emomatching_acq-seq_events.json
79 -rw-r----- 1 yoh datalad 1091 Jan 18 2022 ds003499/task-freq1_events.json
80 -rw-r----- 1 yoh datalad 2561 Feb 25 2021 ds003500/task-Conj19Sel_events.json
81 -rw-r----- 1 yoh datalad 2052 Mar 8 2021 ds003511/task-Recall_events.json
82 -rw-r----- 1 yoh datalad 2724 Jul 22 2021 ds003550/task-RepMem1_events.json
83 -rw-r----- 1 yoh datalad 1647 Jul 22 2021 ds003553/task-FacesHousesTE27_events.json
84 -rw-r----- 1 yoh datalad 2283 Jul 22 2021 ds003554/task-RepYo1_events.json
85 -rw-r--r-- 1 yoh datalad 229 Mar 18 2021 ds003569/task-LDT_events.json
86 -rw-r----- 1 yoh datalad 816 May 4 2021 ds003574/task-game_run-1_events.json
87 -rw-r----- 1 yoh datalad 2624 May 4 2021 ds003604/task-Gram_events.json
88 -rw-r----- 1 yoh datalad 8689 Jun 18 2021 ds003645/task-FacePerception_events.json
89 -rw------- 1 yoh datalad 1659 Apr 29 10:54 ds003684/task-dsp_events.json
90 -rw-r----- 1 yoh datalad 2371 Jul 22 2021 ds003703/task-listeningToSpeech_events.json
91 -rw-r----- 1 yoh datalad 1772 Jul 22 2021 ds003708/task-ccep_events.json
92 -rw-r----- 1 yoh datalad 801 Jul 22 2021 ds003711/events.json
93 -rw-r----- 1 yoh datalad 487 Jan 18 2022 ds003721/task-BI_events.json
94 -rw-r----- 1 yoh datalad 812 Jul 22 2021 ds003722/task-MIvsRest_events.json
95 -rw-r----- 1 yoh datalad 2036 Jan 18 2022 ds003758/task-beads_events.json
96 -rw-r----- 1 yoh datalad 1171 Jan 18 2022 ds003772/task-changepoint_events.json
97 -rw-r----- 1 yoh datalad 812 Jan 18 2022 ds003810/task-MIvsRest_events.json
98 -rw-r----- 1 yoh datalad 605 Jan 18 2022 ds003812/events.json
99 -rw------- 1 yoh datalad 1287 Aug 22 2022 ds003823/task-emotionRegulation_events.json
100 -rw-r----- 1 yoh datalad 1678 Jan 18 2022 ds003825/task-rsvp_events.json
101 -rw------- 1 yoh datalad 1318 Apr 29 10:53 ds003834/task-fam1back_events.json
102 -rw------- 1 yoh datalad 703 Apr 29 10:53 ds003835/events.json
103 -rw-r----- 1 yoh datalad 3346 Jan 18 2022 ds003846/task-PredError_events.json
104 -rw------- 1 yoh datalad 1290 Apr 29 10:54 ds003851/task-train_events.json
105 -rw-r----- 1 yoh datalad 3530 Jan 18 2022 ds003858/task-MID_events.json
106 -rw-r----- 1 yoh datalad 2425 Apr 25 2022 ds003965/task-face_events.json
107 -rw-r----- 1 yoh datalad 4770 Jan 31 2022 ds004010/task-MultisensoryDetectionTask_events.json
108 -rw------- 1 yoh datalad 520 Jan 4 14:22 ds004012/task-auditorystimuli_events.json
109 -rw-r----- 1 yoh datalad 894 Apr 25 2022 ds004018/task-rsvp_events.json
110 -rw-r----- 1 yoh datalad 793 Apr 25 2022 ds004073/task-PD_events.json
111 -rw------- 1 yoh datalad 2134 May 25 2023 ds004080/events.json
112 -rw------- 1 yoh datalad 853 Aug 22 2022 ds004086/task-RecogConf_events.json
113 -rw-r----- 1 yoh datalad 3473 Apr 25 2022 ds004091/task-AttendFixGazeCenterFS_events.json
114 -rw------- 1 yoh datalad 1928 Aug 22 2022 ds004094/task-induct_events.json
115 -rw------- 1 yoh datalad 6295 Aug 22 2022 ds004105/task-DriveRandomSound_events.json
116 -rw------- 1 yoh datalad 5092 Aug 22 2022 ds004106/task-GuardDuty_events.json
117 -rw------- 1 yoh datalad 3934 Aug 22 2022 ds004117/task-WorkingMemory_events.json
118 -rw------- 1 yoh datalad 5568 Aug 22 2022 ds004118/task-Drive_events.json
119 -rw------- 1 yoh datalad 4126 Aug 22 2022 ds004119/task-GuardDuty_events.json
120 -rw------- 1 yoh datalad 6013 Aug 22 2022 ds004120/task-DriveWithSpeedChange_events.json
121 -rw------- 1 yoh datalad 9492 Aug 22 2022 ds004121/task-DriveWithTaskAudio_events.json
122 -rw------- 1 yoh datalad 4475 Aug 22 2022 ds004122/task-Drive_events.json
123 -rw------- 1 yoh datalad 9928 Aug 22 2022 ds004123/task-DriveWithComplexity_events.json
124 -rw------- 1 yoh datalad 766 Aug 22 2022 ds004128/task-DG_events.json
125 -rw------- 1 yoh datalad 1227 Jan 19 2023 ds004192/task-things_events.json
126 -rw------- 1 yoh datalad 1302 Aug 22 2022 ds004194/task-prf_events.json
127 -rw------- 1 yoh datalad 2543 Aug 22 2022 ds004200/task-temporalscaling_events.json
128 -rw------- 1 yoh datalad 1089 May 26 2023 ds004212/task-main_events.json
129 -rw------- 1 yoh datalad 692 Dec 2 2022 ds004228/task-piper_events.json
130 -rw------- 1 yoh datalad 3473 May 25 2023 ds004271/task-AttendFixGazeCenterFS_events.json
131 -rw------- 1 yoh datalad 2242 Dec 19 2022 ds004295/task-task_events.json
132 -rw------- 1 yoh datalad 526 Dec 19 2022 ds004302/task-speech_events.json
133 -rw------- 1 yoh datalad 1045 May 25 2023 ds004312/task-foodpicture_events.json
134 -rw------- 1 yoh datalad 461 Aug 9 2023 ds004341/task-semenc_events.json
135 -rw------- 1 yoh datalad 1584 Dec 19 2022 ds004349/task-expo_events.json
136 -rw------- 1 yoh datalad 2904 Dec 19 2022 ds004350/task-LG_events.json
137 -rw------- 1 yoh datalad 2114 Dec 19 2022 ds004356/task-MusicvsSpeech_events.json
138 -rw------- 1 yoh datalad 2051 May 25 2023 ds004357/task-rsvp_events.json
139 -rw------- 1 yoh datalad 4665 Dec 19 2022 ds004362/task-motion_events.json
140 -rw------- 1 yoh datalad 1385 Dec 19 2022 ds004367/task-rdk_events.json
141 -rw------- 1 yoh datalad 2632 Dec 19 2022 ds004368/task-task_events.json
142 -rw------- 1 yoh datalad 977 May 25 2023 ds004398/task-loc_events.json
143 -rw------- 1 yoh datalad 406 May 26 2023 ds004400/events.json
144 -rw------- 1 yoh datalad 563 May 25 2023 ds004444/task-smrbmi_events.json
145 -rw------- 1 yoh datalad 563 May 25 2023 ds004446/task-smrbmi_events.json
146 -rw------- 1 yoh datalad 563 May 25 2023 ds004447/task-smrbmi_events.json
147 -rw------- 1 yoh datalad 563 May 25 2023 ds004448/task-smrbmi_events.json
148 -rw------- 1 yoh datalad 1772 May 25 2023 ds004457/task-ccep_events.json
149 -rw------- 1 yoh datalad 424 May 25 2023 ds004460/task-Rotation_events.json
150 -rw------- 1 yoh datalad 2 Aug 9 2023 ds004475/task-task_events.json
151 -rw------- 1 yoh datalad 110 May 25 2023 ds004488/task-action_events.json
152 -rw------- 1 yoh datalad 243 May 25 2023 ds004496/task-imagenet_events.json
153 -rw------- 1 yoh datalad 2 May 25 2023 ds004519/task-ProAntiCue_events.json
154 -rw------- 1 yoh datalad 2 May 25 2023 ds004520/task-Retrocue_events.json
155 -rw------- 1 yoh datalad 2 May 25 2023 ds004521/task-Postcues_events.json
156 -rw------- 1 yoh datalad 15794 May 25 2023 ds004532/task-PST_events.json
157 -rw------- 1 yoh datalad 1385 May 25 2023 ds004554/task-picturenaming_events.json
158 -rw------- 1 yoh datalad 676 May 25 2023 ds004556/task-feedback_events.json
159 -rw------- 1 yoh datalad 676 May 25 2023 ds004557/task-feedback_events.json
160 -rw------- 1 yoh datalad 1129 Sep 23 2023 ds004562/task-adaptation_events.json
161 -rw------- 1 yoh datalad 1854 Aug 9 2023 ds004563/task-touchdecoding_events.json
162 -rw------- 1 yoh datalad 523 May 25 2023 ds004574/task-Oddball_events.json
163 -rw------- 1 yoh datalad 740 May 25 2023 ds004575/task-IntervalTiming_events.json
164 -rw------- 1 yoh datalad 740 May 25 2023 ds004579/task-IntervalTiming_events.json
165 -rw------- 1 yoh datalad 479 May 25 2023 ds004580/task-Simon_events.json
166 -rw------- 1 yoh datalad 2 Jun 27 2023 ds004584/task-Rest_events.json
167 -rw------- 1 yoh datalad 2 Jun 1 2023 ds004588/task-unnamed_events.json
168 -rw------- 1 yoh datalad 1312 Jun 27 2023 ds004592/task-gradCPTface_events.json
169 -rw------- 1 yoh datalad 684 Aug 9 2023 ds004602/task-ERNPsychometrics_events.json
170 -rw------- 1 yoh datalad 687 Jun 27 2023 ds004606/task-msit_events.json
171 -rw------- 1 yoh datalad 840 Jun 27 2023 ds004609/task-msit_events.json
172 -rw------- 1 yoh datalad 840 Nov 28 2023 ds004621/task-msit_events.json
173 -rw------- 1 yoh datalad 937 Apr 29 10:55 ds004625/task-UnevenTerrain_events.json
174 -rw------- 1 yoh datalad 4579 Aug 9 2023 ds004626/task-DotProbe_events.json
175 -rw------- 1 yoh datalad 1541 Aug 9 2023 ds004635/task-resting_events.json
176 -rw------- 1 yoh datalad 2736 Nov 28 2023 ds004636/task-ANT_events.json
177 -rw------- 1 yoh datalad 45169 Nov 28 2023 ds004657/task-Drive_events.json
178 -rw------- 1 yoh datalad 7484 Nov 28 2023 ds004660/task-P300_events.json
179 -rw------- 1 yoh datalad 11772 Nov 28 2023 ds004661/task-nback_events.json
180 -rw------- 1 yoh datalad 3180 Sep 23 2023 ds004692/task-study_events.json
181 -rw------- 1 yoh datalad 1664 Jan 4 14:20 ds004724/task-antisaccade_events.json
182 -rw------- 1 yoh datalad 1929 Sep 23 2023 ds004745/task-unnamed_events.json
183 -rw------- 1 yoh datalad 706 Nov 28 2023 ds004746/task-paingen_events.json
184 -rw------- 1 yoh datalad 412 Nov 28 2023 ds004771/task-PY_events.json
185 -rw------- 1 yoh datalad 734 Jan 4 14:21 ds004784/task-phantom_events.json
186 -rw------- 1 yoh datalad 1385 Nov 28 2023 ds004785/task-unnamed_events.json
187 -rw------- 1 yoh datalad 687 Apr 29 10:57 ds004796/task-msit_events.json
188 -rw------- 1 yoh datalad 3900 Nov 28 2023 ds004802/task-roddball_events.json
189 -rw------- 1 yoh datalad 1287 Nov 28 2023 ds004816/task-rsvp_events.json
190 -rw------- 1 yoh datalad 1287 Nov 28 2023 ds004817/task-rsvp_events.json
191 -rw------- 1 yoh datalad 26908 Nov 28 2023 ds004841/task-DriveOnMission_events.json
192 -rw------- 1 yoh datalad 28251 Nov 28 2023 ds004842/task-DriveOnMission_events.json
193 -rw------- 1 yoh datalad 20064 Nov 28 2023 ds004843/task-VisualSituationalAwareness_events.json
194 -rw------- 1 yoh datalad 17863 Nov 28 2023 ds004844/task-Drive_events.json
195 -rw------- 1 yoh datalad 11772 Nov 28 2023 ds004849/task-nback_events.json
196 -rw------- 1 yoh datalad 11772 Nov 28 2023 ds004850/task-nback_events.json
197 -rw------- 1 yoh datalad 11772 Nov 28 2023 ds004851/task-nback_events.json
198 -rw------- 1 yoh datalad 11772 Nov 28 2023 ds004852/task-nback_events.json
199 -rw------- 1 yoh datalad 11772 Nov 28 2023 ds004853/task-nback_events.json
200 -rw------- 1 yoh datalad 11772 Nov 28 2023 ds004854/task-nback_events.json
201 -rw------- 1 yoh datalad 11772 Nov 28 2023 ds004855/task-nback_events.json
202 -rw------- 1 yoh datalad 1681 Nov 28 2023 ds004860/task-HarmN400_events.json
203 -rw------- 1 yoh datalad 506 Jan 4 14:22 ds004883/task-FFERN_events.json
204 -rw------- 1 yoh datalad 329 Apr 29 10:56 ds004894/events.json
205 -rw------- 1 yoh datalad 1664 Apr 29 10:54 ds004935/task-antisaccade_events.json
206 -rw------- 1 yoh datalad 1560 Apr 29 10:57 ds004942/task-SpatialMemory_events.json
207 -rw------- 1 yoh datalad 6783 Apr 29 11:00 ds005012/task-mid_events.json
208 -rw------- 1 yoh datalad 954 Apr 29 11:01 ds005021/task-tiltillusion_events.json

And as a "paradigm" it is pretty much what
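For anyone wanting the percentage rather than the raw listing, a rough Python equivalent of the shell loop above might look like this. It assumes a local directory of `ds*` dataset clones (e.g. obtained with DataLad) and, like the shell version, only looks at files directly in each dataset's top directory; the function name is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def top_level_events_json(root: Path) -> Tuple[List[Path], int]:
    """For every ds* directory under `root`, record the first *events.json file
    sitting directly in the dataset's top level (no recursion into sub-*/
    directories), and return those hits plus the number of datasets checked."""
    datasets = sorted(p for p in root.glob("ds*") if p.is_dir())
    hits: List[Path] = []
    for dataset in datasets:
        matches = sorted(dataset.glob("*events.json"))
        if matches:
            hits.append(matches[0])  # like `head -n1` in the shell loop
    return hits, len(datasets)


if __name__ == "__main__":
    found, n_datasets = top_level_events_json(Path("."))
    if n_datasets:
        share = len(found) / n_datasets
        print(f"{len(found)} of {n_datasets} datasets ({share:.0%}) have a top-level *events.json")
```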
But besides "paradigm" applicability, I am not sure I have seen (but I never looked) it applied for

Moreover, the inheritance principle is somewhat specific for .tsv and .bval/.bvec files in that there is no "inheritance" for them: the file at the lowest level in the hierarchy is taken (whereas .json accumulates from higher levels), and that plays better with "summarization".
Inheritance is baked into

For TSV files, I think the equivalent of the summarization principle would be that there must be exactly one applicable TSV file of a given type. So you could have a
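A sketch of the "located, not merged" behaviour for tabular files, assuming the candidate files are already ordered from the top of the hierarchy down to the data file's own directory (the helper name is hypothetical). Without `strict`, the lowest-level file wins, as with the current principle; `strict=True` is one possible reading of the TSV summarization rule, rejecting more than one applicable file while, as an assumption, still permitting none for optional file types.

```python
from typing import List, Optional


def locate_tabular(candidates: List[str], strict: bool = False) -> Optional[str]:
    """Pick the applicable .tsv/.bval/.bvec file from candidates ordered top -> bottom.

    Unlike JSON sidecars, which accumulate across levels, these files are never
    merged: exactly one of them is used.
      strict=False: current behaviour, the file at the lowest level applies.
      strict=True:  summarization-style rule, more than one applicable file is an error.
    """
    if strict and len(candidates) > 1:
        raise ValueError(f"expected one applicable file, found {len(candidates)}: {candidates}")
    return candidates[-1] if candidates else None
```

For example, with both a top-level `dwi.bval` and `sub-01/dwi/sub-01_dwi.bval`, the lenient mode returns the subject-level file, while the strict mode raises.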
I don't agree with that; as I see it, BIDS is a study format, with nothing relational about it. True, for some data there are two locations for storing things, either in the subject folder or e.g. in the participants.tsv file. But then you always choose one or the other; there is never a relation between them (like there is a relation between JSONs adhering to the inheritance principle). And the fact that 20% of the OpenNeuro datasets use it is a description of the situation, not an argument for its benefits :-) (it would have been trivially easy for these studies to store it all at the sub/ses level)
I agree with that, and I do support your "summarization" proposal as an improvement over the inheritance principle... It meets the goal of metadata deduplication, while reducing ambiguities and overly complex schemes, e.g. when pooling data
Trivial to some but not to all: remember that some of the people who create the datasets barely know what a JSON file is or how to interact with it from Python or MATLAB. So creating (and especially updating) a single file at the root of the dataset will be a lot easier for them than having to manually edit many many many JSON files.
I actually fear for them when it comes to dealing properly with the inheritance principle. Wouldn't it be better to have such people use tools for editing/maintaining BIDS datasets, such as CuBIDS?
it was a response to your
Not an argument, although it can easily become one if expanded, e.g. "I and others, as shown by the above example, find it extremely useful". But this issue is not about that topic. If you would like to discuss the inheritance principle's cons, please chime in instead on
In principle, I think this should be OK in the "summarization" formulation, as "overridden" would be replaced with "duplicated". In practice it would be tricky/impossible since for
I completely agree with the above proposal, as it eliminates value overloading (having a value at one level override a value at a different level). Some notes from a BIDS curation perspective in a clinical environment: non-technical staff do really well working with human-readable files with simple rules, without the use of additional software packages: either a file exists at the top level, or at the individual level. This works most of the time. When we have to change a field at the individual level in hundreds of files, they often reach out to someone who can code. One example use case is the channels.tsv file for EEG/iEEG. This file exists for every data file, and bad channels can be annotated there, as they can differ across sessions and runs. The columns are the same across all subjects but include some optional user-specified columns. If a channels.json file could exist only at the top level to describe these columns (which are the same across all subjects), that would be convenient. If I am correct, the proposal described here would strongly facilitate this use case, which is extremely common for us.
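As a rough illustration of that workflow, the sketch below checks whether a single top-level channels.json could describe every channels.tsv in a dataset: it reports, per file, any column header that the top-level JSON does not describe. File contents are passed in as strings to keep it self-contained; the helper names, and any custom column such as a hypothetical `lab_note`, are assumptions for illustration.

```python
import csv
import io
from typing import Any, Dict, List, Set


def tsv_columns(text: str) -> List[str]:
    """Return the header row of a TSV file's contents (empty list for an empty file)."""
    return next(csv.reader(io.StringIO(text), delimiter="\t"), [])


def undocumented_columns(channels_tsvs: Dict[str, str], top_json: Dict[str, Any]) -> Dict[str, Set[str]]:
    """Map each channels.tsv (path -> contents) to the columns that the single
    top-level channels.json (column name -> description) does not cover.
    Files whose columns are all described are omitted from the report."""
    described = set(top_json)
    report: Dict[str, Set[str]] = {}
    for path, text in channels_tsvs.items():
        missing = set(tsv_columns(text)) - described
        if missing:
            report[path] = missing
    return report
```

An empty report would mean one top-level channels.json suffices, while the per-file `status` values (e.g. bad channels) stay in each channels.tsv.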
I think we may be getting off-topic (feel free to hide this comment as off-topic if you agree), but I'm confused by the following:
TSV files are not merged; they are located. Unless you are proposing this change, nobody would try to merge events.tsv files found at multiple levels. Given that you say it would be tricky/impossible, I don't think you're proposing it... Now, some TSV files can be joined, but only specific ones. For example
@dorahermes re
sounds like requiring common columns to be provided at the top level
I am "considering" or "approaching" it ;-) And as @dorahermes points out above (if I got her right), we might want to not just "append" but "extend" (more like we do for JSON, if we consider a JSON file to be a single row and a TSV file a list of such rows). Overall, I think it could be very beneficial if we could generalize the principle so it does not differ in handling .tsv and .json files. Note that if we have
re

NB "edited" for difference in name/value

❯ grep '_id:$' objects/columns.yaml
desc_id:
participant_id:
sample_id:
session_id:

but I guess it could be generalized to any entity (context: #54).
Another issue I haven't seen much discussion on (but correct me if I'm wrong, as I also missed the previous discussion on the inheritance cons, thank you @yarikoptic) is what I would call the file collection/grouping problem: how to deal with, e.g.:
How many summarizing JSON sidecars would you make? Obviously, you would not make one for each run, but would you make one for each
I don't deal with a wide breadth of different BIDS data from which to generate examples, but one that always irks me is complex DWI data. It is increasingly recommended to export magnitude & phase data for DWI as it facilitates superior denoising. In the absence of inheritance, this means that the diffusion gradient table (which is currently exclusively
The discussion on this Issue might skew differently from community opinion. I've been told on multiple occasions that there are many who would prefer for it to be removed entirely. I don't have my finger on the pulse of exactly what those proportions might look like. One concern I have is that a naive community poll may skew toward removal because of a) an expectation of manual curation and b) consideration of raw BIDS data only, whereas community opinion following a) creation of a tool for automated application and b) consideration of the complexities of derivative data may yield a different result. So I'd like to at least make a compelling case.
It all depends on the metadata contents, and, more esoterically, on whether JSON files without suffixes are permitted.
This actually ends up with more metadata files than there are data files. But unlike exclusively using sidecars, it is immediately discoverable exactly what it is that differs between e.g. entity-linked file collections

Regardless of my own opinion, I don't see the debate progressing in an informed way in the absence of tangible examples of what data look like with vs. without the IP, or in the absence of software to use or not use the IP (complex examples like the one above I would never expect a human to manually curate). That is why I invested some time and effort in generating an Issue list for such: https://github.com/Lestropie/IP-freely/issues.
Yes, that's nice, but I think that this level of complexity, just to deduplicate to the bitter end, can be hard to grasp and would harm the acceptance / proper use of BIDS among average neuroscientists. The inheritance principle makes things much less human-readable and simple. For instance, I cannot just inspect a sidecar file anymore; I need tooling to search for data in the file tree hierarchy to get a complete view. So before deciding on a solution, I think we should clearly define which users the inheritance principle is trying to target. Is it the neuroscientist who manually edits/curates their BIDS data? Is it the programmer who builds BIDS-derivatives processing pipelines? And we need to consider whether the benefits for one group of users really outweigh the downsides for the other users. I believe the summarization proposal of @yarikoptic is aimed as a middle ground?
I fully appreciate the argument for IP abolition. There's a good reason there's no consensus on the topic. The question of "not just inspecting a sidecar ... needing tooling to search for (meta)data in the file tree hierarchy" has a natural converse, being something like "metadata are not unique ... need tooling to determine what data in the file tree hierarchy take the same values". There are complex relationships between metadata across data files regardless of how you cut it; it's a question of what types of operations you want to best facilitate.

What's landed me on the pro-IP side is that I'm further along in attempting to standardise complex derivatives. Consider the second of the two cases above. In a BIDS raw dataset, if two data files have the same value for some metadata field, that might be interesting, or it might not be. I would personally argue that it communicates the natural hierarchical nature of the data, but agree it comes with a complexity cost if stored explicitly in such a way. But with BIDS raw, data files are generally pretty independent of one another (with the exception of entity-linked file collections, which I'll come back to).

With BIDS Derivatives, strongly linked file collections will be more common: a "singular" computational outcome is often by necessity spread across multiple data files. Here, shared metadata across sidecars is not mere happenstance or an opportunity for storage compression: the dataset would be considered corrupt were those metadata not exactly identical across data files. Moreover, within a dataset containing many files in a modality directory, human discernment of which data files encode the results of that particular computation vs. something else becomes increasingly difficult; a metadata file containing the relevant fields, applicable via the IP only to the data files encoding the outcome of that computation, would clearly communicate that grouping.
This is really just the existing entity-linked file collections concept, only more strongly asserted. Enhancing the IP, particularly by removing the restriction of one applicable metadata file per filesystem level, would greatly strengthen this concept. Currently, there's no way to really "encode" an entity-linked file collection. Different data files may have more or fewer metadata fields that are equal or different between them, and more or fewer mutual vs. distinct entities, but it's all quite "fuzzy". Defining a metadata file that is applicable to multiple data files, containing only the mutually shared metadata fields, and named based on only the mutual entities, would be what defines that entity-linked file collection.

Also, given that the principle is not a novel proposal for 2.0 but has stood throughout 1.x, I think there's a need for better tooling regardless of what happens for 2.0. Any software for reading / writing BIDS data really should by now be fully IP-aware. Moreover, I think there's a need for software dedicated to the IP. Having such tooling at hand might help inform that decision-making process.
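The "mutual entities + mutually shared fields" idea can be sketched for the magnitude/phase DWI case mentioned earlier in the thread. This is a sketch under stated assumptions: well-formed BIDS names, extensions discarded, no validation of entity order, and hypothetical helper names. Under the inheritance principle's entity-subset matching, a file with the derived name (e.g. `sub-01_dwi.json`) would generally apply to both `part-` files.

```python
from typing import Any, Dict, List, Tuple

Entity = Tuple[str, str]


def parse_bids_name(filename: str) -> Tuple[List[Entity], str]:
    """Split e.g. 'sub-01_part-mag_dwi.nii.gz' into ([('sub', '01'), ('part', 'mag')], 'dwi').
    Simplified: assumes a well-formed name and discards the extension."""
    stem = filename.split(".", 1)[0]
    *pairs, suffix = stem.split("_")
    entities: List[Entity] = []
    for pair in pairs:
        key, value = pair.split("-", 1)
        entities.append((key, value))
    return entities, suffix


def collection_sidecar(files: Dict[str, Dict[str, Any]]) -> Tuple[str, Dict[str, Any]]:
    """Derive a shared metadata file for an entity-linked file collection:
    named after the entities common to all members, holding only the fields
    whose values are identical across all members."""
    if not files:
        raise ValueError("empty collection")
    parsed = [parse_bids_name(name) for name in files]
    suffixes = {suffix for _, suffix in parsed}
    if len(suffixes) != 1:
        raise ValueError(f"collection members must share one suffix, got {sorted(suffixes)}")
    mutual = [e for e in parsed[0][0] if all(e in entities for entities, _ in parsed[1:])]
    records = list(files.values())
    shared = {
        key: value
        for key, value in records[0].items()
        if all(key in other and other[key] == value for other in records[1:])
    }
    name = "_".join([f"{k}-{v}" for k, v in mutual] + [suffixes.pop()]) + ".json"
    return name, shared
```

For `sub-01_part-mag_dwi.nii.gz` and `sub-01_part-phase_dwi.nii.gz`, this yields `sub-01_dwi.json` holding only the fields both files share; fields that differ (such as `ImageType`) stay in the per-file sidecars.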
For anyone doing exclusively manual curation of a BIDS dataset, I would expect that curation to almost exclusively omit the IP. Most commonly they'll be running something like …

I think longer-term the more prevalent "users" will be App developers / those who interpret the outputs from those Apps. Writing shared metadata once to one appropriately named file is slightly more concise in code than having a base shared dictionary and duplicating it with minor changes across multiple output metadata files, though it's a pretty subtle difference. To me it's more about communicating the relationships between data files. For a BIDS Derivatives dataset, not all data files in a modality directory are equally distinct from one another; some are more strongly related than others, and the IP is one way of communicating those relationships.
I think that proposing to change the name of the principle may be misleading as to the scope of the change. The proposal is only to forbid a data file from having multiple applicable metadata files in which some field takes different values. That, I think, would be an unambiguously good change: it would simplify both lay and systematised descriptions of the principle, and would be more algorithmically compatible with automated approaches. But it wouldn't resolve any of the concerns you have yourself raised here.
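To make the scope of that change concrete, here is a minimal Python sketch contrasting current value overloading with the proposed restriction. It assumes the applicable sidecars for one data file have already been collected as dicts and ordered from the top level down; `override_merge` and `summarization_merge` are hypothetical names, not an existing API.

```python
def override_merge(sidecars: list[dict]) -> dict:
    """Current-style inheritance: sidecars are ordered from the top level down,
    and a value in a deeper file silently replaces the same field from above."""
    merged: dict = {}
    for sidecar in sidecars:
        merged.update(sidecar)
    return merged


def summarization_merge(sidecars: list[dict]) -> dict:
    """Proposed 'summarization': a field may be repeated in several applicable
    files, but every occurrence must carry the same value, so order is irrelevant."""
    merged: dict = {}
    for sidecar in sidecars:
        for key, value in sidecar.items():
            if key in merged and merged[key] != value:
                raise ValueError(
                    f"field {key!r} is overloaded: {merged[key]!r} vs {value!r}"
                )
            merged[key] = value
    return merged


if __name__ == "__main__":
    top = {"RepetitionTime": 2.0, "Manufacturer": "Siemens"}
    lower = {"RepetitionTime": 2.5}

    # Overloading: the result depends on which file counts as "deeper".
    print(override_merge([top, lower])["RepetitionTime"])  # 2.5
    print(override_merge([lower, top])["RepetitionTime"])  # 2.0

    # Repeating an identical value is allowed and order-independent.
    print(summarization_merge([top, {"Manufacturer": "Siemens", "EchoTime": 0.03}]))

    # A differing value is rejected instead of silently overriding.
    try:
        summarization_merge([top, lower])
    except ValueError as err:
        print(err)
```

Because a conflicting value is an error rather than an override, the merged metadata no longer depend on the order in which applicable files are visited, which is what would make it safe to allow more than one applicable file per level.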
Attn @dorahermes @Lestropie and others: if in general you consider this issue/idea good, please upvote with 👍. If you consider it a bad idea, downvote with 👎. I would appreciate it if general discussion of IP "disadvantages" happened elsewhere, e.g. in issue #36 if you feel strongly that IP "must die". But as far as I see it, the IP is to stay in some form, which might remove some aspects (e.g. value overloading, as the summarization proposed here would), and/or be enriched with additional tooling or principles (akin to IntendedFor for grouping). For those I would also advise starting projects (as @Lestropie did) or other issues and cross-linking back here. In this issue, and later in a PR against the bids-2.0 branch, I would appreciate more specific/targeted feedback or assistance with this idea. E.g.
Upon re-reading the current Inheritance principle formulation, nothing seems to forbid that, and such use is in general great since it allows generalizing common metadata across all files of that datatype. Notes on possible side-effects of "embracing" such an approach (which in principle I think is not disallowed ATM):

- Per rule 4, the presence of `bold.json` forbids the presence of another `_bold.json` (i.e. with an entity) on the same level. So if further specialization, e.g. per task, is needed, the common metadata needs to be duplicated across them (that is what heudiconv does ATM). Such restrictions could potentially be lifted if we adopt the "summarization" refactoring of the inheritance principle (bids-standard/bids-2-devel#65), since order would no longer matter and thus multiple files could apply.
- I think that bids-validators are fine, as checked on a single ds000248/T1w.json in bids-examples and a modified 7t_trt.
- I do not know if tools implement it, but since there is precedent in ds000248/T1w.json, they had better ;-)
This is a follow-up to the discussion which happened in
On a recent road-trip with @effigies we briefly discussed it and so far did not see a showstopper, but it would require more minds to analyze.
ATM one of the problems with the inheritance principle is its unclear semantics when a value is modified down the hierarchy: the order can be unclear when there are multiple "candidate" files, it is unclear how to "remove" a value, etc.
And overall, it is cumbersome for a human to "gather" the final value, since for a file down the hierarchy one needs to go through all possibly inherited files to arrive at it. But what if we take my suggestion in the aforementioned issue further:
It will be a (now doable) job for a validator to ensure that all duplicated (across levels, if any) metadata is consistent.
As a result we would provide users the convenience that looking at a top-level metadata file gives "guaranteed" correct metadata across all subjects/sessions, which is not the case currently, since values can be changed following the order of inheritance.
- top-level `task-*_bold.json` files collate all identical values across subjects/sessions, which makes it easy to see what is common (e.g. scanner ID, etc.);
- `participants.tsv` summarizes metadata across participants, and we expect it to be consistent with other phenotypic information possibly found in subjects/sessions (and in the `phenotype/` folder).

Attn @Lestropie, as he has spent the most time improving the Inheritance principle definition, and @dorahermes, who is an active proponent and user of it: do you think such a "simplification" (removal of "value overload") of inheritance would simplify it while keeping it usable? Or maybe I do not see some common use case that such an additional "restriction" would disallow?
I think it might be worth writing a checker and applying it across all OpenNeuro datasets to see if we run into such data "overloads". What would be a tool/functionality that already implements the inheritance principle "closest to the bible", e.g. one that would return a list of lists of .json/.tsv files in their "inherited" bundles? (Specific code examples would be welcome.)
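Towards such a checker and the "list of lists" of inherited bundles, here is a minimal, stdlib-only Python sketch. It is not pybids or the bids-validator, and it assumes a simplified reading of the current rules: a `.json` file applies to a data file if it sits in the dataset root or in a directory on the path to that data file, has the same suffix, and its entities are a subset of the data file's entities. Only `.json` sidecars are considered (no `.tsv`), `derivatives/` and `sourcedata/` are skipped, and all function names are hypothetical.

```python
import json
import sys
from pathlib import Path


def parse_name(name: str) -> tuple[dict[str, str], str]:
    """'sub-01_task-rest_bold.nii.gz' -> ({'sub': '01', 'task': 'rest'}, 'bold')."""
    *pairs, suffix = name.split(".", 1)[0].split("_")
    return dict(p.split("-", 1) for p in pairs if "-" in p), suffix


def applicable_sidecars(data_file: Path, root: Path) -> list[Path]:
    """JSON files applying to data_file, ordered from the dataset root down."""
    entities, suffix = parse_name(data_file.name)
    parts = data_file.parent.relative_to(root).parts
    levels = [root.joinpath(*parts[:depth]) for depth in range(len(parts) + 1)]
    bundle = []
    for level in levels:
        for candidate in sorted(level.glob(f"*{suffix}.json")):
            cand_entities, cand_suffix = parse_name(candidate.name)
            # Same suffix, and no entity that the data file lacks or contradicts.
            if cand_suffix == suffix and cand_entities.items() <= entities.items():
                bundle.append(candidate)
    return bundle


def find_overloads(bundle: list[Path]) -> dict[str, list]:
    """Fields that take more than one distinct value within one bundle."""
    seen: dict[str, list] = {}
    for path in bundle:
        for key, value in json.loads(path.read_text()).items():
            values = seen.setdefault(key, [])
            if value not in values:  # list membership, so unhashable values work
                values.append(value)
    return {k: v for k, v in seen.items() if len(v) > 1}


def scan(root: Path):
    """Yield (data_file, bundle, overloads) for every data file under root."""
    for data_file in sorted(root.rglob("sub-*")):
        rel = data_file.relative_to(root)
        if (data_file.is_dir() or data_file.name.endswith(".json")
                or rel.parts[0] in {"derivatives", "sourcedata"}):
            continue
        bundle = applicable_sidecars(data_file, root)
        yield data_file, bundle, find_overloads(bundle)


if __name__ == "__main__":
    dataset = Path(sys.argv[1])
    for data_file, bundle, overloads in scan(dataset):
        if overloads:
            print(data_file.relative_to(dataset), sorted(overloads))
```

Run against a local copy of a dataset (e.g. saved as a hypothetical `check_overloads.py` and invoked with the dataset path), it prints each data file whose applicable sidecars disagree on at least one field; the `bundle` values yielded by `scan` are the per-file lists of inherited `.json` files.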
Edit: