-
There are also other “beyond core FAVE” things I think we could try to develop under the FAVE umbrella. I’m also committed to getting out an audio cleanup tool, and possibly a linguist-friendly front end to ASR. I’m pretty committed to the principle that any tools or development we do should focus on local computation. So, no tools that upload data for cloud processing.
-
All that sounds very good!
-
What would the difference be between FAVE + FastTrack and pure FastTrack? FastTrack already has evaluation for up to 24 candidates as opposed to FAVE's four. Would FAVE run FastTrack multiple times with different parameters and apply its Special Sauce to choose the best of the best?
-
I'm still unclear on the scope of fave-recode. My understanding of the general process from alignment to analysis is:
Where in this process is fave-recode intervening? My second point, and this goes toward the modularization point in the OP, is that we will want to provide hooks at various points that make it easy for modules to intervene. Wherever fave-recode intervenes, it shouldn't be hard-coded into this functional chain. At each step in the process, fave-extract should check for any modules that want to run on the output, execute them, and then pass their combined output into the next stage. This will make other modules easier to maintain and extend by removing tight coupling between them.
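To make the hook idea concrete, here's a minimal sketch of a stage-based pipeline with registration points after each stage. All class, stage, and hook names here are hypothetical illustrations, not actual fave-extract APIs:

```python
# Hypothetical sketch of a stage pipeline with module hooks.
# Names ("Pipeline", "align", etc.) are illustrative only.

class Pipeline:
    def __init__(self):
        self.hooks = {}  # stage name -> list of callables

    def register(self, stage, hook):
        """Ask for `hook` to be called with the output of `stage`."""
        self.hooks.setdefault(stage, []).append(hook)

    def run(self, data, stages):
        """Run (name, function) stages in order, applying hooks after each."""
        for name, func in stages:
            data = func(data)
            for hook in self.hooks.get(name, []):
                data = hook(data)  # hooks may transform the stage output
        return data

# Example: a recoding module registers itself after alignment,
# without the alignment stage knowing anything about it.
pipe = Pipeline()
pipe.register("align", lambda tg: tg + " (recoded)")
result = pipe.run("textgrid", [("align", lambda tg: tg + " aligned")])
```

The point is only that each stage stays ignorant of its consumers; modules couple to the stage name, not to each other.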
-
I've mocked up what I think the flow of inputs, outputs, and operations should be. As I'm working through fave_recode, I think any inputs that aren't in a specialized format (i.e. textgrids or wav files) should be yaml.

```mermaid
flowchart TD
tg --> aligned_textgrid("aligned_textgrid()")
recode_scheme --> fave_recode("fave_recode()")
target_labels --> target_selector
target_heuristics --> target_selector
subgraph pre_processing
aligned_textgrid --> atg
atg --> fave_recode
fave_recode --> ratg
ratg --> target_selector("target_selector()")
target_selector --> targets
end
fave_recode -.-> saved_recoded_tg?
targets --> fasttrackpy("fasttrackpy()")
wav --> fasttrackpy
point_heuristic --> point_selector("point_selector()")
subgraph audio_processing
fasttrackpy --> track_candidates
track_candidates --> point_selector
point_selector --> winning_points
winning_points -.-> point_selector
end
winning_points --> fave_output("fave_output()")
speaker_info --> fave_output
subgraph outputting
fave_output
end
fave_output --> measured_points
fave_output --> measured_tracks
fave_output --> logs
```
-
I thought I’d lay out my general thoughts about where FAVE is going.
FAVE-classic & new-FAVE
FAVE in its current version needs to be maintained and available both for researchers’ workflow, and for documentation continuity for published research methods. But its methods and code base are no longer up-to-date best practice, and nearly impossible to update & maintain. Moreover, “off-label” usage (i.e. use outside of North American English) continues to grow. I want to support these researchers better, but can’t within the current FAVE infrastructure.
The kinds of changes I’m considering are substantial enough that I think it would be confusing to simply version bump FAVE. What it will be called is up for discussion, but I think “new-FAVE” is nicely evocative.
Development Plan
Modularization
Any new-FAVE package will be installable as a standalone program, but behind the scenes it should call upon distinct modules which are themselves independently useful. I hope this will make maintenance & development of separate FAVE components more flexible, since all we need to ensure is that each module remains consistent in its inputs and outputs; otherwise each module can be maintained internally.
✅ alignedTextGrid
The necessary components of alignedTextGrid are already in place for use as the textgrid navigation module for a new-FAVE. Right now I’m also working on expanding its features to be useful for people doing PoLaR annotation, but that’s not strictly necessary for a vowel extraction procedure.
🛠️ fave-recode
Work on fave-recode is underway. The goal is to create a sufficiently expressive schema to recode the labels in the output of a forced aligner. FAVE-classic does this internally right now like so:
The ARPABET symbols and old Plotnik codes are hard coded into the FAVE code, and the recoded values are not stored or reflected in the textgrids in any way.
I’m taking it as a given that researchers may want to do some re-labeling of vowel classes from the output of a forced aligner, but instead of hard coding this into the code, fave-recode will operate on a recoding schema, currently formatted as a yaml file. e.g. Relabeling pre-lateral “UW” to its own allophone would look like this:
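Since the schema is still being worked out, the following is only a hypothetical sketch of what such a rule might look like; the key names and matching syntax here are my guesses, not the actual fave-recode format:

```yaml
# Hypothetical recoding rule: pre-lateral UW gets its own allophone label.
# All keys ("rules", "conditions", "attribute", etc.) are illustrative.
rules:
  - rule: pre-lateral-uw
    conditions:
      - attribute: label
        relation: "=="
        value: UW
      - attribute: following.label
        relation: "=="
        value: L
    return: UW_L
```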
I’ve been writing up the yaml files for the recoding FAVE does in order to explore how expressive I need the schema to be. But any language specific aspects of the recoding can be captured in these YAML files, rather than in the code base itself.
Formant Estimation
For formant tracking / estimation, I want to move to a python implementation of FastTrack (preliminary implementation here). The benefits of this will be
I’ve already been in conversation with Santiago Barreda about the plan, but I still think this part should be handled with care to ensure proper attribution & citational practices. There are other issues to consider within pure FastTrackPy, like how it should operate all on its own, and what its mathematical implementation should be (numpy arrays? torch tensors? google jax?)
Measurement point heuristics
Even with formant tracks being foregrounded, researchers will still want single point measurements, and may want to define vowel class by vowel class measurement points. I imagine this could be defined in a yaml file as well. Perhaps something like:
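As a purely hypothetical sketch of what per-vowel-class measurement-point definitions could look like in yaml (every key and value below is an illustrative guess, not a settled schema):

```yaml
# Hypothetical measurement-point heuristics, one entry per vowel class.
# Keys and values are illustrative only.
heuristics:
  AY:
    point: max_F1          # measure /ay/ at the F1 maximum
  UW:
    point: prop_duration
    prop: 0.33             # measure /uw/ at one third of its duration
  default:
    point: prop_duration
    prop: 0.5              # everything else at the temporal midpoint
```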
I’ve done no preliminary development on this.
Just FastTrack?
A big question for me is whether the new formant extraction should just do one pass of FastTrack formant tracking, or should include some form of FAVE-like iteration over candidate sets of measurements within-speaker.
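For context on what "FAVE-like iteration" means here: FAVE-classic's remeasurement step picks, for each token, the candidate measurement closest to the speaker's vowel-class distribution by Mahalanobis distance. A simplified numpy sketch of that selection step (the function name and array layout are my own, not FAVE's actual code):

```python
import numpy as np

def pick_candidate(candidates, class_mean, class_cov):
    """Return the (F1, F2) candidate with the smallest Mahalanobis
    distance to the speaker's vowel-class distribution.

    candidates: (n, 2) array of candidate (F1, F2) measurements
    class_mean: (2,) mean of the vowel class for this speaker
    class_cov:  (2, 2) covariance of the vowel class
    """
    inv_cov = np.linalg.inv(class_cov)
    diffs = candidates - class_mean
    # squared Mahalanobis distance for each candidate
    d2 = np.einsum("ij,jk,ik->i", diffs, inv_cov, diffs)
    return candidates[np.argmin(d2)]

# Toy example: three candidates, class centered at (500, 1500)
cands = np.array([[700.0, 1200.0], [510.0, 1480.0], [300.0, 2200.0]])
mean = np.array([500.0, 1500.0])
cov = np.array([[2500.0, 0.0], [0.0, 10000.0]])
best = pick_candidate(cands, mean, cov)
# best is the candidate nearest the class mean: [510., 1480.]
```

Whether new-FAVE keeps this kind of within-speaker loop on top of FastTrack's own candidate evaluation is exactly the open question.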
Prioritization
To get to a minimum viable tool, I think the priorities are the completion of fave-recode and beginning work on new-FAVE-extract. FastTrackPy actually has all the necessary components right now to be used on a developmental basis.
Then, we need to do some comparison of the results from new-FAVE-extract to FAVE-classic. Some degree of mismatch is to be expected, but it should be quantified.
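One simple way to quantify that mismatch would be an RMS difference per formant over tokens measured by both tools. A sketch, assuming both extractors' outputs can be lined up as paired (F1, F2) arrays (the function name is hypothetical):

```python
import numpy as np

def formant_mismatch(classic, new):
    """RMS difference (Hz) between paired F1/F2 measurements from
    FAVE-classic and a new extractor, per formant.

    classic, new: (n, 2) arrays of (F1, F2) for the same n tokens.
    Returns a (2,) array: (rms_F1, rms_F2).
    """
    diff = np.asarray(new) - np.asarray(classic)
    return np.sqrt((diff ** 2).mean(axis=0))

# Toy example: two tokens measured by both tools
classic = np.array([[500.0, 1500.0], [600.0, 1100.0]])
new = np.array([[510.0, 1490.0], [590.0, 1110.0]])
rms = formant_mismatch(classic, new)
# rms == [10., 10.]
```

In practice this would presumably be broken down by vowel class and speaker, since mismatch is unlikely to be uniform.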