Implemented audio extraction, adding audio streams, displaying audio stream #4
base: main
Conversation
Events associated with the stream. TODO: should they be integer sample numbers? Timestamps? Do we support both?
As far as the input format for events, I don't really know what researchers/annotators are going to want to do. We have seen cases where the data was in the form of timestamps, probably something like `HH:MM:SS.123456`. So unless @NeuroLaunch has opinions about what (other) format(s) we should target, I'd say start with parsing `HH:MM:SS.123456`-formatted data, and we can expand to other formats later.
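For what it's worth, a minimal sketch of what parsing that format could look like (the helper name `timestamp_to_seconds` is just illustrative, not part of this PR):

```python
import datetime

def timestamp_to_seconds(ts: str) -> float:
    """Convert an 'HH:MM:SS.ffffff' timestamp string to seconds (illustrative helper)."""
    t = datetime.datetime.strptime(ts, "%H:%M:%S.%f").time()
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6

timestamp_to_seconds("01:02:03.123456")  # -> 3723.123456
```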
As far as the output format of events: MNE-Python has two ways of representing events (event arrays and `Annotations` objects). We should decide which one (or both?) we want to use when converting/syncing camera timestamps to the Raw file's time domain. @NeuroLaunch do you have an opinion here? @ashtondoane are you familiar with the two kinds of MNE event representations?
If I had to put a stake in the ground I'd probably say "use `Annotations`", but I haven't thought very hard about it yet... maybe implement that first, and if we find that we need to also implement event array support, we can add that later.
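To illustrate the difference, a rough sketch of both representations for a handful of camera events (this assumes the onsets have already been converted to seconds in the Raw file's time domain, and that `raw` is the reference recording):

```python
import numpy as np
import mne

onsets = [12.345, 67.890]  # camera event onsets, in seconds of Raw time

# Option 1: Annotations -- string descriptions attached directly to the Raw object
annot = mne.Annotations(onset=onsets, duration=0.0, description="camera_event")
raw.set_annotations(annot)

# Option 2: event array -- integer sample numbers, shape (n_events, 3)
sfreq = raw.info["sfreq"]
events = np.column_stack([
    (np.array(onsets) * sfreq).astype(int) + raw.first_samp,  # sample index
    np.zeros(len(onsets), dtype=int),                         # previous value (unused here)
    np.ones(len(onsets), dtype=int),                          # event ID
])
```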
I am not familiar with the MNE representations; I will have to read the documentation. I'll begin with annotations, as @NeuroLaunch also mentioned this as a possibility, and we can adjust later if necessary.
Not clear to me that this has actually been addressed, as nothing is done with `events` in the code; unresolving.
Co-authored-by: Daniel McCloy <[email protected]>
…e extracted file in correct location.
…ference MEG. Implemented display of all pulse channels (could be updated to be more user friendly).
Let me know if anything else needs to change here.
See comments below regarding the StreamSync API.
self.streams = []
"""Initialize StreamSync object with 'Raw' MEG associated with it.

reference_object: str  TODO: is str the best method for this, or should this be pathlib obj?
Initially I thought `reference_object` should be an object in memory, not a path to a file (whether `str` or `Path`). I still lean that way, on the assumption that the user is very likely to also be at least a bit familiar with MNE-Python (and thus know how to load a Raw file).
But I guess there's a case to be made that if the `add_stream` method takes a file path, then maybe the `StreamSync` constructor should also take in a file path. After reflection I'd say let's support both. The way to annotate that is `mne.io.Raw | path-like`, and we write the code so that `Raw` and `str` and `Path` will all work:
if isinstance(reference_obj, str):
    reference_obj = Path(reference_obj)
if isinstance(reference_obj, Path):
    reference_obj = mne.io.read_raw(reference_obj, ...)
# from now on we can safely assume it's a Raw object
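In context, the constructor might then end up looking something like this (names and docstring wording are only a sketch, not the final API):

```python
from pathlib import Path

import mne

class StreamSync:
    def __init__(self, reference_object, pulse_channel):
        """Initialize with a reference recording: a Raw object or a path to one."""
        if isinstance(reference_object, str):
            reference_object = Path(reference_object)
        if isinstance(reference_object, Path):
            reference_object = mne.io.read_raw(reference_object)
        # from now on we can safely assume it's a Raw object
        self.ref_stream = reference_object.get_data(picks=[pulse_channel])
        self.streams = []
```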
#Check type and value of pulse_channel, and ensure reference object has such a channel.
if not pulse_channel:
    raise TypeError("pulse_channel is None. Please provide a channel name of type str.")
if type(pulse_channel) is not str:
    raise TypeError("pulse_chanel parameter must be of type str.")
if raw[pulse_channel] is None:
    raise ValueError('pulse_channel does not exist in refrence_object.')
Several points of feedback here:
- MNE-Python lets you pick channels by index (integer) or by name (string); in principle we don't need to restrict users to just strings.
- Even if we did want to restrict to strings, the preferred way to type check would be `if not isinstance(pulse_channel, str)` rather than `if type(pulse_channel) is not str`.
- All of these failure modes will be handled by MNE-Python already if the user passes an invalid channel selector. It should be enough to just do `self.ref_stream = reference_object.get_data(picks=[pulse_channel])`. If you want you could try/except that line and provide a nicer error message than what MNE-Python provides when the channel is not found, if you think it would help the user substantially (see the sketch below).
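For the last point, a sketch of what that try/except could look like (the exact exception type MNE raises for an unknown pick is worth double-checking):

```python
try:
    self.ref_stream = reference_object.get_data(picks=[pulse_channel])
except ValueError as err:
    raise ValueError(
        f"pulse_channel {pulse_channel!r} was not found in reference_object; "
        "pass a channel name (str) or channel index (int) that exists in the Raw object."
    ) from err
```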
raise ValueError('pulse_channel does not exist in refrence_object.')

self.raw = mne.io.read_raw_fif(reference_object, preload=False, allow_maxshield=True)
Here you are loading in (a second time) the same file you already loaded into the variable `raw`. Assuming that's a mistake?
Setting that aside: what is the motivation for keeping a reference to the Raw object as part of the StreamSync object? I think all we need is `sfreq` and a NumPy array of the pulse channel data (or am I forgetting something)?
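In other words, something like this might be all the constructor needs to keep around (a sketch, assuming `reference_object` is already a Raw instance):

```python
# Store only what syncing needs, instead of the whole Raw object
self.sfreq = reference_object.info["sfreq"]
self.ref_stream = reference_object.get_data(picks=[pulse_channel])[0]  # 1-D pulse channel
```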
# Check provided reference_object for type and existence.
if not reference_object:
    raise TypeError("reference_object is None. Please provide a path.")
if type(reference_object) is not str:
    raise TypeError("reference_object must be a file path of type str.")
ref_path_obj = pathlib.Path(reference_object)
if not ref_path_obj.exists():
    raise OSError("reference_object file path does not exist.")
if not ref_path_obj.suffix == ".fif":
    raise ValueError("Provided reference object does not point to a .fif file.")
I think this is probably overkill. MNE-Python will already provide informative error messages if the path doesn't point to a valid Raw file, or if the thing that is passed as a filename isn't in fact path-like.
self.raw = mne.io.read_raw_fif(reference_object, preload=False, allow_maxshield=True)
self.ref_stream = raw[pulse_channel]
`raw[pulse_channel]` returns a tuple of two arrays: `(data, times)`. Do we need the times? If not, we can do `raw.get_data(picks=[pulse_channel])`.
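Side by side, for reference (assuming `raw` and `pulse_channel` as above):

```python
data, times = raw[pulse_channel]            # tuple of (data, times) arrays
data = raw.get_data(picks=[pulse_channel])  # data only, shape (1, n_times)
```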
audio_file = p.with_stem(f"{p.stem}_16_bit").with_suffix(".wav").name
if not p.exists():
    raise ValueError('Path provided cannot be found.')
if not overwrite and pathlib.PurePath.joinpath(pathlib.Path(output_dir), pathlib.Path(audio_file)).exists():
Suggested change:
- if not overwrite and pathlib.PurePath.joinpath(pathlib.Path(output_dir), pathlib.Path(audio_file)).exists():
+ if not overwrite and (pathlib.Path(output_dir) / audio_file).exists():
'-i', path_to_video,
'-map', '0:a',  # audio only (per DM)
# '-af', 'highpass=f=0.1',
'-acodec', 'pcm_s16le',
`-acodec` is being passed twice, with different values. Is that intended?
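If only 16-bit PCM output is wanted, the argument list might be reduced to something like this (a sketch with placeholder path variables, using only standard ffmpeg flags):

```python
import subprocess

cmd = [
    "ffmpeg",
    "-i", str(path_to_video),   # input video (placeholder variable)
    "-map", "0:a",              # audio streams only
    "-acodec", "pcm_s16le",     # 16-bit PCM, specified exactly once
    str(audio_output_path),     # e.g. "<stem>_16_bit.wav" (placeholder)
]
subprocess.run(cmd, check=True)
```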
My notes on this commit:
I decided not to implement glob/regex matching in the audio extraction function. To ensure that the function has a consistent effect (one created file per call), I believe that pattern matching should be done externally, with the function then run in a for loop, as in the sketch below. It currently raises errors, which may need to be adjusted for bulk extraction, since an unhandled error would halt the entire process. I'm open to notes and improvements for this one.
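For example, bulk extraction could be handled by the caller along these lines (`extract_audio` stands in for the single-file extraction function in this PR; the actual name and signature may differ):

```python
from pathlib import Path

video_dir = Path("/path/to/videos")  # placeholder
for video in sorted(video_dir.glob("*.mp4")):
    try:
        extract_audio(video, output_dir=video_dir / "audio", overwrite=False)
    except (OSError, ValueError) as err:
        # log and continue so one bad file doesn't halt the whole batch
        print(f"Skipping {video.name}: {err}")
```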
We discussed the event tables being modified for true timing relative to the MEG; the exact formatting of this is important for me to continue implementing this class.