Implemented audio extraction, adding audio streams, displaying audio stream #4
base: main
Conversation
Events associated with the stream. TODO: should they be integer sample numbers? Timestamps? Do we support both?
As far as the input format for events, I don't really know what researchers/annotators are going to want to do. We have seen cases where the data was in the form of timestamps, probably something like `HH:MM:SS.123456`. So unless @NeuroLaunch has opinions about what (other) format(s) we should target, I'd say start with parsing `HH:MM:SS.123456`-formatted data, and we can expand to other formats later.
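For what it's worth, a minimal sketch of what parsing that format could look like (the helper name `timestamp_to_seconds` is just illustrative, not part of this PR):

```python
import datetime

def timestamp_to_seconds(ts: str) -> float:
    """Convert an 'HH:MM:SS.ffffff' timestamp string to seconds (illustrative helper)."""
    t = datetime.datetime.strptime(ts, "%H:%M:%S.%f").time()
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6

timestamp_to_seconds("01:02:03.123456")  # -> 3723.123456
```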
As far as the output format of events: MNE-Python has two ways of representing events (event arrays and `Annotations` objects). We should decide which one (or both?) we want to use when converting/syncing camera timestamps to the Raw file's time domain. @NeuroLaunch do you have an opinion here? @ashtondoane are you familiar with the two kinds of MNE event representations?
If I had to put a stake in the ground I'd probably say "use `Annotations`", but I haven't thought very hard about it yet... maybe implement that first, and if we find that we need to also implement event array support, we can add that later.
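To illustrate the difference, a rough sketch of both representations for a handful of camera events (this assumes the onsets have already been converted to seconds in the Raw file's time domain, and that `raw` is the reference recording):

```python
import numpy as np
import mne

onsets = [12.345, 67.890]  # camera event onsets, in seconds of Raw time

# Option 1: Annotations -- string descriptions attached directly to the Raw object
annot = mne.Annotations(onset=onsets, duration=0.0, description="camera_event")
raw.set_annotations(annot)

# Option 2: event array -- integer sample numbers, shape (n_events, 3)
sfreq = raw.info["sfreq"]
events = np.column_stack([
    (np.array(onsets) * sfreq).astype(int) + raw.first_samp,  # sample index
    np.zeros(len(onsets), dtype=int),                         # previous value (unused here)
    np.ones(len(onsets), dtype=int),                          # event ID
])
```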
I am not familiar with the MNE representations; I will have to read the documentation. I'll begin with annotations, as @NeuroLaunch also mentioned this as a possibility, and we can adjust later if necessary.
Not clear to me that this has actually been addressed, as nothing is done with `events` in the code; unresolving.
Co-authored-by: Daniel McCloy <[email protected]>
…e extracted file in correct location.
…ference MEG. Implemented display of all pulse channels (could be updated to be more user friendly).
Let me know if anything else needs to change here.
See comments below regarding the StreamSync API.
self.streams = []
"""Initialize StreamSync object with 'Raw' MEG associated with it.

reference_object: str  TODO: is str the best method for this, or should this be pathlib obj?
Initially I thought `reference_object` should be an object in memory, not a path to a file (whether `str` or `Path`). I still lean that way, on the assumption that the user is very likely to also be at least a bit familiar with MNE-Python (and thus know how to load a Raw file).
But I guess there's a case to be made that if the `add_stream` method takes a file path, then maybe the `StreamSync` constructor should also take in a file path. After reflection I'd say let's support both. The way to annotate that is `mne.io.Raw | path-like`, and we write the code so that `Raw` and `str` and `Path` will all work:
if isinstance(reference_obj, str):
    reference_obj = Path(reference_obj)
if isinstance(reference_obj, Path):
    reference_obj = mne.io.read_raw(reference_obj, ...)
# from now on we can safely assume it's a Raw object
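In context, the constructor might then end up looking something like this (names and docstring wording are only a sketch, not the final API):

```python
from pathlib import Path

import mne

class StreamSync:
    def __init__(self, reference_object, pulse_channel):
        """Initialize with a reference recording: a Raw object or a path to one."""
        if isinstance(reference_object, str):
            reference_object = Path(reference_object)
        if isinstance(reference_object, Path):
            reference_object = mne.io.read_raw(reference_object)
        # from now on we can safely assume it's a Raw object
        self.ref_stream = reference_object.get_data(picks=[pulse_channel])
        self.streams = []
```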
#Check type and value of pulse_channel, and ensure reference object has such a channel.
if not pulse_channel:
    raise TypeError("pulse_channel is None. Please provide a channel name of type str.")
if type(pulse_channel) is not str:
    raise TypeError("pulse_chanel parameter must be of type str.")
if raw[pulse_channel] is None:
    raise ValueError('pulse_channel does not exist in refrence_object.')
Several points of feedback here:
- MNE-Python lets you pick channels by index (integer) or by name (string); in principle we don't need to restrict users to just strings.
- Even if we did want to restrict to strings, the preferred way to type check would be `if not isinstance(pulse_channel, str)` rather than `if type(pulse_channel) is not str`.
- All of these failure modes will be handled by MNE-Python already if the user passes an invalid channel selector. It should be enough to just do `self.ref_stream = reference_object.get_data(picks=[pulse_channel])`. If you want you could try/except that line and provide a nicer error message than what MNE-Python provides when the channel is not found, if you think it would help the user substantially (see the sketch below).
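For the last point, a sketch of what that try/except could look like (the exact exception type MNE raises for an unknown pick is worth double-checking):

```python
try:
    self.ref_stream = reference_object.get_data(picks=[pulse_channel])
except ValueError as err:
    raise ValueError(
        f"pulse_channel {pulse_channel!r} was not found in reference_object; "
        "pass a channel name (str) or channel index (int) that exists in the Raw object."
    ) from err
```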
raise ValueError('pulse_channel does not exist in refrence_object.')

self.raw = mne.io.read_raw_fif(reference_object, preload=False, allow_maxshield=True)
Here you are loading in (a second time) the same file you already loaded into the variable `raw`. Assuming that's a mistake?
Setting that aside: what is the motivation for keeping a reference to the Raw object as part of the StreamSync object? I think all we need is `sfreq` and a NumPy array of the pulse channel data (or am I forgetting something)?
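In other words, something like this might be all the constructor needs to keep around (a sketch, assuming `reference_object` is already a Raw instance):

```python
# Store only what syncing needs, instead of the whole Raw object
self.sfreq = reference_object.info["sfreq"]
self.ref_stream = reference_object.get_data(picks=[pulse_channel])[0]  # 1-D pulse channel
```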
# Check provided reference_object for type and existence.
if not reference_object:
    raise TypeError("reference_object is None. Please provide a path.")
if type(reference_object) is not str:
    raise TypeError("reference_object must be a file path of type str.")
ref_path_obj = pathlib.Path(reference_object)
if not ref_path_obj.exists():
    raise OSError("reference_object file path does not exist.")
if not ref_path_obj.suffix == ".fif":
    raise ValueError("Provided reference object does not point to a .fif file.")
I think this is probably overkill. MNE-Python will already provide informative error messages if the path doesn't point to a valid Raw file, or if the thing that is passed as a filename isn't in fact path-like.
self.raw = mne.io.read_raw_fif(reference_object, preload=False, allow_maxshield=True)
self.ref_stream = raw[pulse_channel]
`raw[pulse_channel]` returns a tuple of two arrays: `(data, times)`. Do we need the times? If not, we can do `raw.get_data(picks=[pulse_channel])`.
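Side by side, for reference (assuming `raw` and `pulse_channel` as above):

```python
data, times = raw[pulse_channel]            # tuple of (data, times) arrays
data = raw.get_data(picks=[pulse_channel])  # data only, shape (1, n_times)
```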
audio_file = p.with_stem(f"{p.stem}_16_bit").with_suffix(".wav").name
if not p.exists():
    raise ValueError('Path provided cannot be found.')
if not overwrite and pathlib.PurePath.joinpath(pathlib.Path(output_dir), pathlib.Path(audio_file)).exists():
Suggested change:
- if not overwrite and pathlib.PurePath.joinpath(pathlib.Path(output_dir), pathlib.Path(audio_file)).exists():
+ if not overwrite and (pathlib.Path(output_dir) / audio_file).exists():
'-i', path_to_video,
'-map', '0:a',  # audio only (per DM)
# '-af', 'highpass=f=0.1',
'-acodec', 'pcm_s16le',
`-acodec` is being passed twice, with different values. Is that intended?
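If only 16-bit PCM output is wanted, the argument list might be reduced to something like this (a sketch with placeholder path variables, using only standard ffmpeg flags):

```python
import subprocess

cmd = [
    "ffmpeg",
    "-i", str(path_to_video),   # input video (placeholder variable)
    "-map", "0:a",              # audio streams only
    "-acodec", "pcm_s16le",     # 16-bit PCM, specified exactly once
    str(audio_output_path),     # e.g. "<stem>_16_bit.wav" (placeholder)
]
subprocess.run(cmd, check=True)
```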
My notes on this commit:
I decided not to implement glob/regex matching in the audio extraction function. To ensure that the function has a consistent effect (one created file per call), I believe that pattern matching should be done externally, with the function then run in a for loop, as in the sketch below. It currently raises errors, which may need to be adjusted for bulk extraction, since an unhandled error would halt the entire process. I'm open to notes and improvements for this one.
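For example, bulk extraction could be handled by the caller along these lines (`extract_audio` stands in for the single-file extraction function in this PR; the actual name and signature may differ):

```python
from pathlib import Path

video_dir = Path("/path/to/videos")  # placeholder
for video in sorted(video_dir.glob("*.mp4")):
    try:
        extract_audio(video, output_dir=video_dir / "audio", overwrite=False)
    except (OSError, ValueError) as err:
        # log and continue so one bad file doesn't halt the whole batch
        print(f"Skipping {video.name}: {err}")
```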
We discussed the event tables being modified for true timing relative to the MEG; the exact formatting of this is important for me to continue implementing this class.