[Feature Request]: SequenceInterval splitting #171

Open
1 task done
JoFrhwld opened this issue Feb 28, 2024 · 3 comments

Comments

@JoFrhwld
Member

What feature would you like added?

Right now, it is only possible to fuse intervals leftwards or rightwards; there is no way to split an interval. Some thoughts on how a SequenceInterval.split() method could work:

Splitting

On timestamps

interval.split(at_times = [2.31, 2.35])

This should add new interval boundaries at the given times (with the interval's start and end time implicit). This example would result in 3 intervals.
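As a rough illustration of the boundary logic only, here is a minimal sketch that works on plain start/end numbers rather than on a real SequenceInterval. The helper name `split_at_times` and the choice to reject times outside the interval are assumptions for the sake of the example, not part of any existing API.

```python
def split_at_times(start, end, at_times):
    """Return (start, end) pairs for the pieces of one interval.

    Hypothetical helper: the interval's own start and end are implicit,
    so n split times yield n + 1 pieces.
    """
    cuts = sorted(at_times)
    for t in cuts:
        # Assumption: a split time must fall strictly inside the interval.
        if not start < t < end:
            raise ValueError(f"split time {t} is outside ({start}, {end})")
    edges = [start, *cuts, end]
    # Pair each edge with the next one to form the new intervals.
    return list(zip(edges[:-1], edges[1:]))


pieces = split_at_times(2.0, 3.0, [2.31, 2.35])
print(pieces)  # [(2.0, 2.31), (2.31, 2.35), (2.35, 3.0)]
```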

On percentage time

interval.split(at_proportion = [0.2, 0.7])

This would place boundaries at 20% and 70% of the duration of the interval, resulting in 3 intervals.
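One way to think about this is that proportions are just converted to absolute times before splitting. The sketch below assumes a hypothetical helper `proportions_to_times` and assumes proportions must lie strictly between 0 and 1.

```python
def proportions_to_times(start, end, at_proportion):
    """Convert proportions of an interval's duration to absolute times.

    Hypothetical helper, shown only to illustrate the arithmetic.
    """
    duration = end - start
    times = []
    for p in sorted(at_proportion):
        # Assumption: 0 and 1 would duplicate the existing boundaries.
        if not 0 < p < 1:
            raise ValueError(f"proportion {p} is not strictly between 0 and 1")
        times.append(start + p * duration)
    return times


times = proportions_to_times(2.0, 4.0, [0.25, 0.75])
print(times)  # [2.5, 3.5]
```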

On the subset

interval.split(on_subset = True)

This should, perhaps, be the default behavior. This would split the interval into sub-intervals based on the timestamps of its subset intervals.
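A sketch of how the split times could be derived from the subset, assuming the subset intervals are available as plain (start, end) pairs. The name `subset_cut_times` is made up for this example; only boundaries that fall strictly inside the parent interval become split points.

```python
def subset_cut_times(start, end, subset):
    """Collect the interior boundaries of the subset intervals.

    Hypothetical helper: `subset` is a list of (start, end) pairs.
    """
    edges = set()
    for sub_start, sub_end in subset:
        edges.update((sub_start, sub_end))
    # Drop the parent's own start and end, which are implicit.
    return sorted(t for t in edges if start < t < end)


phones = [(1.0, 1.2), (1.2, 1.5), (1.5, 1.9)]
cuts = subset_cut_times(1.0, 1.9, phones)
print(cuts)  # [1.2, 1.5]
```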

Labelling

Explicit labels

interval.split(
  at_proportion = [0.2, 0.7],
  labels = ["a", "b", "c"]
)
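With explicit labels, the number of labels presumably has to match the number of resulting intervals (one more than the number of split points). A minimal sketch of that check, using a hypothetical `pair_labels` helper that works in proportion space:

```python
def pair_labels(at_proportion, labels):
    """Attach one label to each piece produced by the split.

    Hypothetical helper: returns (start_prop, end_prop, label) triples.
    """
    n_pieces = len(at_proportion) + 1
    if len(labels) != n_pieces:
        raise ValueError(f"expected {n_pieces} labels, got {len(labels)}")
    edges = [0.0, *sorted(at_proportion), 1.0]
    return list(zip(edges[:-1], edges[1:], labels))


labelled = pair_labels([0.2, 0.7], ["a", "b", "c"])
print(labelled)  # [(0.0, 0.2, 'a'), (0.2, 0.7, 'b'), (0.7, 1.0, 'c')]
```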

Label Fun

def label_sequential(label, sequence_len):
  # number each copy of the original label, e.g. "a-0", "a-1", "a-2"
  labels = [
    f"{label}-{num}"
    for num in range(sequence_len)
  ]
  return labels

def label_rep(label, sequence_len):
  # repeat the original label once per new interval
  return [label] * sequence_len

def label_blank(label, sequence_len):
  # give every new interval an empty label
  return [""] * sequence_len


interval.split(
  at_proportion = [0.2, 0.7],
  label_fun = label_sequential
)
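To make the calling convention concrete, here is a sketch of how split() might choose between explicit labels and a label function. Everything here is an assumption: the `resolve_labels` name, the rule that the two arguments are mutually exclusive, and the fallback of repeating the original label.

```python
def label_sequential(label, sequence_len):
    # Number each copy of the original label, starting from 0.
    return [f"{label}-{num}" for num in range(sequence_len)]


def resolve_labels(label, n_pieces, labels=None, label_fun=None):
    """Pick the labels for the new intervals (hypothetical helper)."""
    if labels is not None and label_fun is not None:
        raise ValueError("pass either labels or label_fun, not both")
    if labels is not None:
        out = list(labels)
    elif label_fun is not None:
        # The label function receives the original label and the piece count.
        out = list(label_fun(label, n_pieces))
    else:
        # Assumed default: every piece keeps the original label.
        out = [label] * n_pieces
    if len(out) != n_pieces:
        raise ValueError(f"expected {n_pieces} labels, got {len(out)}")
    return out


print(resolve_labels("AY1", 3, label_fun=label_sequential))
# ['AY1-0', 'AY1-1', 'AY1-2']
```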

What would the use case be for this feature?

When creating new sub-interval tiers based on analytical landmarks, e.g.:

  • Splitting an initial phrase interval into sub-phrases based on the presence of silences in the aligned word-tier.
  • Splitting a word interval into morphemic components, based on timing in the phone tier.
  • Splitting a phone interval into onset, target, offset components, based on acoustic measurements.
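For the first use case, the split times could come straight from the silent words inside the phrase. This is only a sketch: words are plain (start, end, label) triples, `silence_cut_times` is a hypothetical name, and treating "" and "sp" as silence labels is an assumption about the aligner's output.

```python
def silence_cut_times(words, silence_labels=("", "sp")):
    """Find split times at the edges of silences inside a phrase.

    Hypothetical helper: `words` is a time-ordered list of
    (start, end, label) triples covering the whole phrase.
    """
    phrase_start = words[0][0]
    phrase_end = words[-1][1]
    cuts = set()
    for start, end, label in words:
        if label in silence_labels:
            cuts.update((start, end))
    # Silences at the phrase edges add no new interior boundary.
    return sorted(t for t in cuts if phrase_start < t < phrase_end)


words = [
    (0.0, 0.4, "the"),
    (0.4, 0.9, "dog"),
    (0.9, 1.3, ""),
    (1.3, 1.8, "barked"),
]
cuts = silence_cut_times(words)
print(cuts)  # [0.9, 1.3]
```

The resulting times could then be passed as at_times, giving a sub-phrase, a silence, and another sub-phrase.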

Would you like to help add this feature?

Yes, and I will submit a pull request soon.

Code of Conduct

  • I agree to follow this project's Code of Conduct
JoFrhwld added the enhancement (New feature or request) label Feb 28, 2024
JoFrhwld added this to the v0.7 milestone Feb 28, 2024
@chrisbrickhouse
Member

@JoFrhwld Before you get deep in the weeds on this, I already have some code that does something similar in a project I've yet to push. I'm at a different machine today though, so I'll have to get the demo to you later this evening.

@chrisbrickhouse
Member

See the TextGrid class in this repo for some example code. It's very quick and dirty, so there are probably ways to optimize the splitting algorithm for things like phrases, small pauses, priority tier groups, etc, but it does a pretty good job of isolating phrases from a word/phone grid.

@JoFrhwld
Member Author

JoFrhwld commented Mar 1, 2024

That's real cool! I don't think I'd implement anything as particular as logic for splitting a phrase into subphrases. More like just convenience functions for any given kind of splitting.
