-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Physio object generation from BIDS #4
Conversation
See my comment on the corresponding issue, I think we'll need to add more flexibility to column naming for this to be useful. |
@m-miedema it's fixed for the time being. I have implemented string literals searching, to give the physio_type property that is used in some functions. |
This looks reasonable to me! Let me know when it's ready for review :) |
Hi @maestroque ! For the addition of unit tests for those new functions, I've talked with a colleague and she recommended to just generate the structure on the fly when we run pytest (and generate random number for the timeseries if we even need to), if we are just concerned about the data structure ! If we are more interested about the content of those files and need actual real data you can take a look at that BIDS app |
Thanks @me-pic ! Sounds reasonable to me |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maestroque Nice work on that PR !
physutils/io.py
Outdated
|
||
for col in columns: | ||
col_physio_type = None | ||
if any([x in col for x in ["cardiac", "ppg", "ecg", "card"]]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maestroque Would it be possible to change for col.lower()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
from loguru import logger | ||
|
||
from physutils import physio | ||
|
||
EXPECTED = ["data", "fs", "history", "metadata"] | ||
|
||
|
||
def load_from_bids( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to add the parameter recording=None
? The reason is that according to the current version of the BIDS spec physio data with different sampling rate need to be saved under different file names using the recording
entity to differentiate them apart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And recording
should be added when you call layout.get
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, done!
physutils/io.py
Outdated
if not op.exists(bids_path): | ||
raise FileNotFoundError(f"Provided path {bids_path} does not exist") | ||
|
||
layout = BIDSLayout(bids_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if there is a reason not to do that but I think we should specify layout = BIDSLayout(bids_path, validate=False)
. I tested the current function with a BIDS dataset containing multiple physio files saved under names using different label values with the recording
entity (ex. recording-pulse and recording-breathing ), and if validate=False
is not specified, it just gives me an empty list after layout.get()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change + the addition of the recording parameter would allow people with BIDS dataset containing physio data with different sampling rate to still be able to use those functionalities
@me-pic the unit test is ready. I'll address your other comments shortly and it should be good |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we should also change the matplotlib >=3.9
line to non-constraining?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good catch !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @maestroque for integrating previous comments ! I've added some more 😅 Should be pretty straight forward to integrate, but let me know if you need help !
physutils/io.py
Outdated
|
||
for col in columns: | ||
col_physio_type = None | ||
if any([x in col.lower() for x in ["cardiac", "ppg", "ecg", "card"]]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@m-miedema Do you think it would be reasonable to add "pulse" as a valid column name for "cardiac" ? And also "o2" and "co2" for "respiratory" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although O2 and CO2 reflect respiration, I don't think they should be labelled as respiratory! This is mainly because the respiratory response function should not be used for these types of data (see e.g. the fMRI literature on end-tidal response functions), and users may become confused with putting this data in the same category as a belt measurement.
For pulse, I've never personally seen it used as a naming convention, but why not!
run_id = "01" | ||
|
||
bids_dir = pjoin( | ||
data_dir, "bids-dir", f"sub-{subject_id}", f"ses-{session_id}", "func" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should make sure that the test covers the possibility to have the recording entity mentioned in the filename. Ex. sub-006_ses-03_task-test_run-01_recording-cardiac_physio.tsv.gz
would be a valid filename according to the BIDS standard. And since we also included a recording
parameter in the load_from_bids
function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @maestroque I think my comment was not clear. The tests should cover the possibility to have filenames with the recording
entity as well as filenames that does not include that entity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good catch !
physutils/physio.py
Outdated
label = ref_physio.label if copy_label else None | ||
physio_type = ref_physio.physio_type if copy_physio_type else None | ||
computed_metrics = list(ref_physio.computed_metrics) if copy_computed_metrics else [] | ||
computed_metrics = ( | ||
dict(ref_physio.computed_metrics) if copy_computed_metrics else [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be dict(ref_physio.computed_metrics) if copy_computed_metrics else {}
instead ?
@@ -542,3 +544,55 @@ def neurokit2phys( | |||
metadata = dict(peaks=peaks) | |||
|
|||
return cls(data, fs=fs, metadata=metadata, **kwargs) | |||
|
|||
|
|||
class MRIConfig: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I missed it, but is there any test to cover that class ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is not used yet, because it would need detection of the trigger channel in order to initialize it. I can open an issue defining the next steps for it, and I'll implement it if there is time after the workflow
# The data saved are the ones after t_0 = 0s | ||
assert phys_array[col].data.size == 70000 | ||
assert phys_array[col].fs == 10000.0 | ||
assert phys_array[col].history[0][0] == "physutils.io.load_from_bids" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to delete the bids dataset at the end of the function or just to add the bids-dir
directory in the .gitignore
?
physutils/io.py
Outdated
|
||
config_file = bids_file[0].get_metadata() | ||
fs = config_file["SamplingFrequency"] | ||
t_start = config_file["StartTime"] # noqa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to define the start time here? It seems like it could be integrated into the time column checks, although its not necessary to do that in this PR. Perhaps open an issue to do a QA check and/or integrate the info into the specification of idx_0 in the event of a missing time column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I integrated it into the idx_0
definition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @maestroque ! This looks very nice and well-organized so far, I left a few comments to add to what @me-pic has included in her review. The main thing I want to understand better is the relationship between the MRIcfg and Physio objects, and whether it's relevant to extract any information from the trigger column itself (rather than the sidecar file). I don't think there's much in the MRI class that can be generated from the trigger column, other than the TR and number of scans, so while you can confirm this is correct from the sidecar, I don't think it's necessary to do so. I'm also wondering if you intend to e.g point the Physio object to its corresponding MRIcfg object in some way while you're setting them up - would this be helpful in metric calls later on?
@me-pic I integrated your comments, so it should be good from that side! @m-miedema I also addressed your comments on StartTime and n_slices. Regarding the use of the MRIConfig class and Physio, what you say is true, that was the intended use of the new class as we discussed in the past. It is pending the extraction of data from the trigger channel and the integration into the Physio object. I will open an issue about this, but I will tackle it if time allows after the workflow |
Other than that, it should be gtg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @maestroque for integrating the changes ! I've added to more comments, but once those are addressed I think we could merge 🚀
run_id = "01" | ||
|
||
bids_dir = pjoin( | ||
data_dir, "bids-dir", f"sub-{subject_id}", f"ses-{session_id}", "func" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @maestroque I think my comment was not clear. The tests should cover the possibility to have filenames with the recording
entity as well as filenames that does not include that entity.
df.to_csv(tsv_file, sep="\t", index=False, header=False, float_format="%.8e") | ||
|
||
# Compress tsv file into tsv.gz | ||
tsv_gz_file = pjoin( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand why the file is first saved as a tsp, then loaded and saved again as a compressed file. I think you can just directly save df
as a tsv.gz
since that is the required format according to the BIDS spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right, it was left as such from debugging
@me-pic I think we are now ready, let me know if you have anything else in mind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🎉
Great work, thanks @maestroque ! |
🚀 PR was released in |
Closes #3
Proposed Changes
Change Type
bugfix
(+0.0.1)minor
(+0.1.0)major
(+1.0.0)refactoring
(no version update)test
(no version update)infrastructure
(no version update)documentation
(no version update)other
Checklist before review