Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make inheritance applicability explicit for scan.json #789

Closed
yarikoptic opened this issue Apr 27, 2021 · 11 comments
Closed

make inheritance applicability explicit for scan.json #789

yarikoptic opened this issue Apr 27, 2021 · 11 comments

Comments

@yarikoptic
Copy link
Collaborator

@robertoostenveld in #538 (comment) mentions that "using the inheritance principle there can be a scans.json at the top level". Current inheritance principle formulation seems to suggest that too, but documentation nowhere mentions scans.json although it could be quite a common use case.

ATM I see only a single dataset with a (broken) scans.json

(git)smaug:/mnt/datasets/datalad/crawl/openneuro[master]git
$> ls -ls ds00*/*scans.json
4 -rw------- 1 yoh datalad 656 Oct 23  2020 ds003242/scans.json

and which is

$> cat ds003242/.bidsignore 
scans.json

possibly likely because prior versions of bids-validator choked on it.

So I wonder if we should add scans.json

yarikoptic added a commit to dbic/heudiconv that referenced this issue Apr 30, 2021
Breeding those identical files is not useful at all:

- causes some pains for datalad (git-annex) since committed under
  annex, and if published online and identical -- there might be
  thousands of urls associated with that file annex key

- BIDS is not explicit (yet) about possibility to have scans.json
  on top but it seems to follow nicely from inheritance principles.
  See bids-standard/bids-specification#789
  and references there-in

- so why waste inodes and clutter the file tree?
yarikoptic added a commit to dbic/heudiconv that referenced this issue Apr 30, 2021
Breeding those identical files is not useful at all:

- causes some pains for datalad (git-annex) since committed under
  annex, and if published online and identical -- there might be
  thousands of urls associated with that file annex key

- BIDS is not explicit (yet) about possibility to have scans.json
  on top but it seems to follow nicely from inheritance principles.
  See bids-standard/bids-specification#789
  and references there-in

- so why waste inodes and clutter the file tree?
@sappelhoff
Copy link
Member

So I wonder if we should add scans.json to https://github.com/bids-standard/bids-specification/blob/master/src/schema/top_level_files.yaml (attn raw-electrophys-ieeg)

The top level files schema file says:

This file describes files which may appear at the top level of a dataset.
This does not include information about whether these files are required or optional.
For that information, see rules/top_level_files.yaml.

It contains files like:

  • README
  • LICENSE
  • CHANGES
  • dataset_description

however, due to inheritance many other files could be placed there ... for example a task-rest_eeg.json file. This is currently not part of the schema in this way. Not sure if that's a shortcoming or a good thing ... but if we were to add scans.json there, we'd surely have to add all those other files as well for consistency - WDYT?

as an explicit example to "Inheritance principle" section

I feel like with the recent clarifications in #946 this may not be so crucial anymore. But if you have a good idea for an example, why not add it 👍

@robertoostenveld
Copy link
Collaborator

I think the same applies to sessions.json.

@VisLab
Copy link
Member

VisLab commented Apr 11, 2022

events.json too....

@sappelhoff
Copy link
Member

I agree with both of you Robert and Kay - but what do you think: Should we add those to the top_level_files schema ... or rather not, because the only time they may ever be placed there is when inheritance applies.

I can see both solutions. What's your opinion @tsalo?

@VisLab
Copy link
Member

VisLab commented Apr 11, 2022

I am not sure what putting it in the schema would look like or what the trade-offs are.

For example, a filename for one of these files can't have a sub entity or ses entity as part of its name, right?

@tsalo
Copy link
Member

tsalo commented Apr 11, 2022

I am reticent to directly encode the inheritance principle into the schema. At the moment, .json is treated as an extension like any other, rather than as a special sidecar file. If we want to encode the inheritance principle, we basically need to have a separate entry for each suffix group just for .jsons, and then we need to make just about every required entity in those entries optional.

For example, a filename for one of these files can't have a sub entity or ses entity as part of its name, right?

It could have sub and ses, as long as it appears in sub-X/ses-Y/sub-X_ses-Y_scans.<tsv|json>. I suppose it could have ses, but not sub, too.

@robertoostenveld
Copy link
Collaborator

robertoostenveld commented Apr 11, 2022

I think the user story - linked to which the implementational changes are to be refined - is "as a user I want a dataset with inheritance of the scans.json/sessions.json at the top level to pass the validator (and events.json, but I think that already works).

To refine this user story, I suggest to make a test case, see whether the validator breaks.

  • if not, add it to the validator test suite for regression testing
  • if it breaks, fix the validator (which might mean "extend the schema"), and then add to the test suite

@monique2208
Copy link
Contributor

I just came across this issue as we ran into the error with the legacy validator, which does not accept a sessions.json file at the top level. The deno validator has no problem with it. Does that mean this issue is resolved?

@yarikoptic
Copy link
Collaborator Author

FWIW since original filing the number of scans.json files among openneuro datasets grew up
$> ls -ls ds00*/*scans.json | nl
     1	4 -rw-r----- 1 yoh datalad 656 Oct 23  2020 ds003242/scans.json
     2	4 -rw-r----- 1 yoh datalad 411 Jan 18  2022 ds003812/scans.json
     3	4 -rw------- 1 yoh datalad 339 Aug 22  2022 ds004141/scans.json
     4	4 -rw------- 1 yoh datalad 339 Aug 22  2022 ds004142/scans.json
     5	4 -rw------- 1 yoh datalad 339 Dec 19  2022 ds004331/scans.json
     6	4 -rw------- 1 yoh datalad 339 May 25  2023 ds004466/scans.json
     7	4 -rw------- 1 yoh datalad 186 Jun  5 11:49 ds004636/scans.json
     8	4 -rw------- 1 yoh datalad 339 Sep 23  2023 ds004693/scans.json
     9	4 -rw------- 1 yoh datalad 339 Apr 29 10:51 ds004920/scans.json
    10	4 -rw------- 1 yoh datalad 339 Apr 29 10:54 ds004943/scans.json
    11	4 -rw------- 1 yoh datalad 339 Apr 29 10:56 ds004956/scans.json
    12	4 -rw------- 1 yoh datalad 339 Jun  5 11:27 ds005085/scans.json
    13	4 -rw------- 1 yoh datalad 339 Jun  5 11:33 ds005123/scans.json
    14	4 -rw------- 1 yoh datalad 339 Sep 23 16:09 ds005134/scans.json
    15	4 -rw------- 1 yoh datalad 339 Sep 23 16:14 ds005238/scans.json
    16	4 -rw------- 1 yoh datalad 645 Sep 23 16:07 ds005250/scans.json
    17	4 -rw------- 1 yoh datalad 339 Sep 23 16:09 ds005365/scans.json
    18	4 -rw------- 1 yoh datalad 339 Sep 23 16:10 ds005374/scans.json
    19	4 -rw------- 1 yoh datalad 339 Sep 23 16:14 ds005454/scans.json
    20	4 -rw------- 1 yoh datalad 385 Sep 23 16:14 ds005455/scans.json

and only the earliest dataset has (only) scans.json explicitly .bidsignored:

$> for f in ds00*/*scans.json ; do fs=$(dirname $f)/.bidsignore; grep -l scans.json $fs; done
ds003242/.bidsignore

BIDS validation became indeed more schema and principles driven than before. As many pointed out, I think having scans.json is just a particular case of inheritance principle and should not be explicitly listed in the schema anyhow since in general should be applicable to any entity-less {suffix}.json. I think we should resolve this issue by an explicit example. Proposing a PR

@monique2208 , what version of legacy bids-validator did you use? I searched myself into

which was presumably addressed by

but didn't try

3 years ago and theoretically (and based on aforementioned openneuro datasets) should not be the case any more even with legacy one

@effigies
Copy link
Collaborator

effigies commented Oct 7, 2024

Agreed, the issue is comprehensively resolved in the schema validator. The legacy validator can have blind spots due to hand-written regular expressions being the validation mechanism. Adding a top-level sessions.json would be accepted as a PR in the validator.

I agree this issue can be closed, as the inheritance principle doesn't need to be repeated in the spec just because the validator failed to implement it in some circumstances.

@effigies effigies closed this as completed Oct 7, 2024
@monique2208
Copy link
Contributor

We were using v1.14, and we got the following error,

1: [ERR] Files with such naming scheme are not part of BIDS specification. This error is most commonly caused by typos in file names that make them not BIDS compatible. Please consult the specification and make sure your files are named correctly. If this is not a file naming issue (for example when including files not yet covered by the BIDS specification) you should include a ".bidsignore" file in your dataset (see https://github.com/bids-standard/bids-validator#bidsignore for details). Please note that derived (processed) data should be placed in /derivatives folder and source data (such as DICOMS or behavioural logs in proprietary formats) should be placed in the /sourcedata folder. (code: 1 - NOT_INCLUDED)
		./sessions.json
			Evidence: sessions.json
	Please visit https://neurostars.org/search?q=NOT_INCLUDED for existing conversations about this issue.

We are switching to the newer validator soon, so it should not be problem any longer

yarikoptic added a commit to yarikoptic/bids-specification that referenced this issue Oct 7, 2024
effigies pushed a commit that referenced this issue Oct 25, 2024
…iple (#1945)

* Include entity-less "scans.json" into an example of inheritance principle

Clarifies #789

* Apply suggestions from code review

---------

Co-authored-by: Remi Gau <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants