IMO, YODA should clearly separate the principles from the suggestions, and should be fully decoupled from DataLad.
"Standards speak" would need to be expounded and explained to make sense to the unfamiliar, but this is what I have in mind for the formal bit. wdyt @yarikoptic?
Track all input data, code, and computational environments needed to produce analysis outputs in
version controlled datasets — and reproducibility you will achieve!
Learn control you must.
Size matters not!
- Subdataset references in a dataset are
extremely lightweight yet guarantee data identity via cryptographic hashes. Subdatasets can be
detached without losing this information, yielding massively improved storage efficiency and
reduced archive costs.
- Publicly shared data compliant with a common standard are an optimal element in a modular study
setup. From mid-2018, OpenNeuro (previously OpenFMRI) will offer DataLad datasets for direct
download.
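To illustrate the first point above (a lightweight reference that still guarantees data identity), here is a minimal sketch in Python. It is not how Git or DataLad record subdataset state; it is a simplified, Merkle-style digest, and the names `file_digest`, `tree_digest`, and `demo_tree_digest` are hypothetical. It only shows that a fixed-size hash can pin down directory content of any size, so the content itself can be dropped and later re-verified.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes, read in chunks so large files are fine."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digest(root: Path) -> str:
    """Digest of a directory, built from sorted (relative path, file digest) pairs.

    Assumption: this is an illustrative scheme, not Git's object format.
    """
    h = hashlib.sha256()
    for p in sorted(q for q in root.rglob("*") if q.is_file()):
        h.update(p.relative_to(root).as_posix().encode())
        h.update(b"\0")
        h.update(file_digest(p).encode())
        h.update(b"\n")
    return h.hexdigest()


def demo_tree_digest() -> tuple[str, str, str]:
    """Hash a ~1 MB directory twice, then again after changing one byte."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sub-01").mkdir()
        data = root / "sub-01" / "data.bin"
        data.write_bytes(b"\x00" * 1_000_000)
        before = tree_digest(root)
        again = tree_digest(root)
        # Flip the last byte: the reference must no longer match.
        data.write_bytes(b"\x00" * 999_999 + b"\x01")
        after = tree_digest(root)
    return before, again, after
```

The reference stays 64 hexadecimal characters however large the directory grows, which is why detaching the content does not lose the version information.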
Principles
*P1* Use well-defined, portable computational environments to compute analysis results
*P2* Exhaustively track ALL analysis inputs in the same version control system
as the computed results, including:
- input data
- custom analysis code/scripts
- required computational environments (e.g. as container images)
*P3* Structure study elements (data, code, environments) in modular
components to facilitate reuse within or outside the context of the
original study
Dataset Layout
Dataset structure is fully flexible to be able to accommodate domain standards (e.g. BIDS). Element
location/name can be discovered from configuration.
Required (3rd-party) code repositories can be referenced as subdatasets just like datasets with data
files. The repository state is an unambiguous version record.
Images of containerized computational environments are tracked in version control just like any
other data file. Actual storage can be local or in the cloud.
Any input data is referenced via the dataset that contains it. The dataset state provides an unambiguous
version specification for any data dependency.
DataLad can obtain required subdataset content on demand. Only content elements actually required
for an analysis are present. The directory structure is expanded recursively as needed.
Test scripts can be used to check analysis code, verify data integrity, and assess computational
reproducibility.
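As a rough idea of what the data-integrity part of such a test script could look like, here is a minimal Python sketch. It assumes a manifest that maps relative paths to recorded SHA-256 digests; that manifest format and the names `sha256_of`, `verify`, and `demo_verify` are hypothetical and not part of YODA or DataLad.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 hex digest of a (small) file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest differs."""
    problems = []
    for rel, expected in sorted(manifest.items()):
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems


def demo_verify() -> tuple[list[str], list[str]]:
    """Record digests for two inputs, then modify one and verify again."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "inputs").mkdir()
        (root / "inputs" / "a.csv").write_text("x,y\n1,2\n")
        (root / "inputs" / "b.csv").write_text("x,y\n3,4\n")
        manifest = {
            rel: sha256_of(root / rel)
            for rel in ("inputs/a.csv", "inputs/b.csv")
        }
        clean = verify(root, manifest)
        # Simulate silent corruption or an untracked edit of one input.
        (root / "inputs" / "b.csv").write_text("x,y\n3,5\n")
        tampered = verify(root, manifest)
    return clean, tampered
```

A test runner could call `verify` on the input datasets before an analysis and fail when the returned list is not empty. Checking code and computational reproducibility would need further tests that are not sketched here.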
YODA has also been proposed as a standard/best practice for ReproNim (ReproNim/repronim.org#206).
YODA IDEALS
YODA PRINCIPLES:
same version control system
YODA ASSETS:
(This part could probably be left out of the formal section and discussed in the detailed explanation)
MUST:
SHOULD:
NOTES
Original Organigram: https://f1000research.com/posters/7-1965
Top level
Principles
Dataset Layout
DataLad Handbook
https://handbook.datalad.org/en/latest/basics/101-127-yoda.html
Principles