-
There are also other “beyond core FAVE” things I think we could try to develop under the FAVE umbrella. I’m also committed to getting out an audio cleanup tool, and possibly a linguist-friendly front end to ASR. I’m pretty committed to the principle that any tools or development we do should focus on local computation. So, no tools that upload data for cloud processing.
-
All that sounds very good!
-
What would the difference be between FAVE + FastTrack and pure FastTrack? FastTrack already has evaluation for up to 24 candidates as opposed to FAVE's four. Would FAVE run FastTrack multiple times with different parameters and apply its Special Sauce to choose the best of the best?
-
I'm still unclear on the scope of fave-recode. My understanding of the general process from alignment to analysis is:
Where in this process is fave-recode intervening? My second point, and this goes toward the modularization point in the OP, is that we will want to provide hooks at various points that make it easy for modules to intervene. Wherever fave-recode intervenes, it shouldn't be hard-coded into this functional chain. At each step in the process, fave-extract should check for any modules that want to run on the output, execute them, and then pass their combined output into the next stage. This will make other modules easier to maintain and extend by removing tight coupling between them.
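To make the hook idea concrete, here's a minimal sketch of a stage-based pipeline with registration points after each stage. All class, stage, and hook names here are hypothetical illustrations, not actual fave-extract APIs:

```python
# Hypothetical sketch of a stage pipeline with module hooks.
# Names ("Pipeline", "align", etc.) are illustrative only.

class Pipeline:
    def __init__(self):
        self.hooks = {}  # stage name -> list of callables

    def register(self, stage, hook):
        """Ask for `hook` to be called with the output of `stage`."""
        self.hooks.setdefault(stage, []).append(hook)

    def run(self, data, stages):
        """Run (name, function) stages in order, applying hooks after each."""
        for name, func in stages:
            data = func(data)
            for hook in self.hooks.get(name, []):
                data = hook(data)  # hooks may transform the stage output
        return data

# Example: a recoding module registers itself after alignment,
# without the alignment stage knowing anything about it.
pipe = Pipeline()
pipe.register("align", lambda tg: tg + " (recoded)")
result = pipe.run("textgrid", [("align", lambda tg: tg + " aligned")])
```

The point is only that each stage stays ignorant of its consumers; modules couple to the stage name, not to each other.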
-
I've mocked up what I think the flow of inputs, outputs, and operations should be. As I'm working through fave_recode, I think any inputs that aren't in a specialized format (i.e. textgrids or wav files) should be yaml.

```mermaid
flowchart TD
tg --> aligned_textgrid("aligned_textgrid()")
recode_scheme --> fave_recode("fave_recode()")
target_labels --> target_selector
target_heuristics --> target_selector
subgraph pre_processing
aligned_textgrid --> atg
atg --> fave_recode
fave_recode --> ratg
ratg --> target_selector("target_selector()")
target_selector --> targets
end
fave_recode -.-> saved_recoded_tg?
targets --> fasttrackpy("fasttrackpy()")
wav --> fasttrackpy
point_heuristic --> point_selector("point_selector()")
subgraph audio_processing
fasttrackpy --> track_candidates
track_candidates --> point_selector
point_selector --> winning_points
winning_points -.-> point_selector
end
winning_points --> fave_output("fave_output()")
speaker_info --> fave_output
subgraph outputting
fave_output
end
fave_output --> measured_points
fave_output --> measured_tracks
fave_output --> logs
```
-
I thought I’d lay out my general thoughts about where FAVE is going.
FAVE-classic & new-FAVE
FAVE in its current version needs to be maintained and available both for researchers’ workflow, and for documentation continuity for published research methods. But its methods and code base are no longer up-to-date best practice, and nearly impossible to update & maintain. Moreover, “off-label” usage (i.e. use outside of North American English) continues to grow. I want to support these researchers better, but can’t within the current FAVE infrastructure.
The kinds of changes I’m considering are substantial enough that I think it would be confusing to simply version bump FAVE. What it will be called is up for discussion, but I think “new-FAVE” is nicely evocative.
Development Plan
Modularization
Any new-FAVE package will be installable as a standalone program, but behind the scenes it should call upon distinct modules which are themselves independently useful. I hope this will make maintenance & development of separate FAVE components more flexible, since all we need to ensure is that each module remains consistent in its inputs and outputs; otherwise each module can be maintained internally.
✅ alignedTextGrid
The necessary components of alignedTextGrid are already in place for use as the textgrid navigation module for a new-FAVE. Right now I’m also working on expanding its features to be useful for people doing PoLaR annotation, but that’s not strictly necessary for a vowel extraction procedure.
🛠️ fave-recode
Work on fave-recode is underway. The goal is to create a sufficiently expressive schema to recode the labels in the output of a forced aligner. FAVE-classic does this internally right now like so:
The ARPABET symbols and old Plotnik codes are hard coded into the FAVE code, and the recoded values are not stored or reflected in the textgrids in any way.
I’m taking it as a given that researchers may want to do some re-labeling of vowel classes from the output of a forced aligner, but instead of hard coding this into the code, fave-recode will operate on a recoding schema, currently formatted as a yaml file. e.g. Relabeling pre-lateral “UW” to its own allophone would look like this:
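Since the schema is still being worked out, the following is only a hypothetical sketch of what such a rule might look like; the key names and matching syntax here are my guesses, not the actual fave-recode format:

```yaml
# Hypothetical recoding rule: pre-lateral UW gets its own allophone label.
# All keys ("rules", "conditions", "attribute", etc.) are illustrative.
rules:
  - rule: pre-lateral-uw
    conditions:
      - attribute: label
        relation: "=="
        value: UW
      - attribute: following.label
        relation: "=="
        value: L
    return: UW_L
```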
I’ve been writing up the yaml files for the recoding FAVE does in order to explore how expressive I need the schema to be. But any language specific aspects of the recoding can be captured in these YAML files, rather than in the code base itself.
Formant Estimation
For formant tracking / estimation, I want to move to a python implementation of FastTrack (preliminary implementation here). The benefits of this will be
I’ve already been in conversation with Santiago Barreda about the plan, but I still think this part should be handled with care to ensure proper attribution & citational practices. There are other issues to consider within pure FastTrackPy, like how it should operate all on its own, and what its mathematical implementation should be (numpy arrays? torch tensors? google jax?)
Measurement point heuristics
Even with formant tracks being foregrounded, researchers will still want single point measurements, and may want to define vowel class by vowel class measurement points. I imagine this could be defined in a yaml file as well. Perhaps something like:
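As a purely hypothetical sketch of what per-vowel-class measurement-point definitions could look like in yaml (every key and value below is an illustrative guess, not a settled schema):

```yaml
# Hypothetical measurement-point heuristics, one entry per vowel class.
# Keys and values are illustrative only.
heuristics:
  AY:
    point: max_F1          # measure /ay/ at the F1 maximum
  UW:
    point: prop_duration
    prop: 0.33             # measure /uw/ at one third of its duration
  default:
    point: prop_duration
    prop: 0.5              # everything else at the temporal midpoint
```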
I’ve done no preliminary development on this.
Just FastTrack?
A big question for me is whether the new formant extraction should just do one pass of FastTrack formant tracking, or should include some form of FAVE-like iteration over candidate sets of measurements within-speaker.
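For context on what "FAVE-like iteration" means here: FAVE-classic's remeasurement step picks, for each token, the candidate measurement closest to the speaker's vowel-class distribution by Mahalanobis distance. A simplified numpy sketch of that selection step (the function name and array layout are my own, not FAVE's actual code):

```python
import numpy as np

def pick_candidate(candidates, class_mean, class_cov):
    """Return the (F1, F2) candidate with the smallest Mahalanobis
    distance to the speaker's vowel-class distribution.

    candidates: (n, 2) array of candidate (F1, F2) measurements
    class_mean: (2,) mean of the vowel class for this speaker
    class_cov:  (2, 2) covariance of the vowel class
    """
    inv_cov = np.linalg.inv(class_cov)
    diffs = candidates - class_mean
    # squared Mahalanobis distance for each candidate
    d2 = np.einsum("ij,jk,ik->i", diffs, inv_cov, diffs)
    return candidates[np.argmin(d2)]

# Toy example: three candidates, class centered at (500, 1500)
cands = np.array([[700.0, 1200.0], [510.0, 1480.0], [300.0, 2200.0]])
mean = np.array([500.0, 1500.0])
cov = np.array([[2500.0, 0.0], [0.0, 10000.0]])
best = pick_candidate(cands, mean, cov)
# best is the candidate nearest the class mean: [510., 1480.]
```

Whether new-FAVE keeps this kind of within-speaker loop on top of FastTrack's own candidate evaluation is exactly the open question.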
Prioritization
To get to a minimum viable tool, I think the priorities are the completion of fave-recode and beginning work on new-FAVE-extract. FastTrackPy actually has all the necessary components right now to be used on a developmental basis.
Then, we need to do some comparison of the results from new-FAVE-extract to FAVE-classic. Some degree of mismatch is to be expected, but it should be quantified.
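One simple way to quantify that mismatch would be an RMS difference per formant over tokens measured by both tools. A sketch, assuming both extractors' outputs can be lined up as paired (F1, F2) arrays (the function name is hypothetical):

```python
import numpy as np

def formant_mismatch(classic, new):
    """RMS difference (Hz) between paired F1/F2 measurements from
    FAVE-classic and a new extractor, per formant.

    classic, new: (n, 2) arrays of (F1, F2) for the same n tokens.
    Returns a (2,) array: (rms_F1, rms_F2).
    """
    diff = np.asarray(new) - np.asarray(classic)
    return np.sqrt((diff ** 2).mean(axis=0))

# Toy example: two tokens measured by both tools
classic = np.array([[500.0, 1500.0], [600.0, 1100.0]])
new = np.array([[510.0, 1490.0], [590.0, 1110.0]])
rms = formant_mismatch(classic, new)
# rms == [10., 10.]
```

In practice this would presumably be broken down by vowel class and speaker, since mismatch is unlikely to be uniform.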