[Help]: more duration predictor details & '50' meaning of target len in text2semantic of MaskGCT #346

dobby-seo · 2024-11-13T08:41:04Z

Problem Overview

I want to understand whether the duration predictor is meant to produce durations that are used as training data. I'm also curious about the number '50' in the rule-based duration calculation in text2semantic.

Steps Taken

1. Duration predictor
The paper says: "In this paper, we also train a flow matching [45] based duration prediction model to predict the total duration conditioned on the text and prompt speech duration, leveraging in-context learning. More details can be found in Appendix A.5."

Is the duration predictor used to generate durations that serve as training input for the text-to-semantic model? Or is a given, approximated duration simply used for evaluation?

2. Meaning of '50' in the target_len calculation in text2semantic of MaskGCT
Below is a snippet of text2semantic from MaskGCT.

@torch.no_grad()
def text2semantic(
    self,
    prompt_speech,
    prompt_text,
    prompt_language,
    target_text,
    target_language,
    target_len=None,
    n_timesteps=50,
    cfg=2.5,
    rescale_cfg=0.75,
):
    prompt_phone_id = g2p_(prompt_text, prompt_language)[1]
    target_phone_id = g2p_(target_text, target_language)[1]

    if target_len is None:
        target_len = int(
            (len(prompt_speech) * len(target_phone_id) / len(prompt_phone_id))
            / 16000
            * 50
        )
    else:
        target_len = int(target_len * 50)

I looked for this value in the paper but could not find it.
I'm curious what the constant '50' represents. My guess is that it is the minimum number of frames needed to utter one phoneme. Please let me know what this number means 🥲
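To make the question concrete, the rule-based branch of the snippet can be written out with named quantities. This is a minimal sketch, not the MaskGCT code: the function name and parameter names are hypothetical, and it assumes that 16000 is the prompt's sample rate in Hz and that 50 is a number of frames per second. Neither assumption is confirmed in the paper text quoted here. Under that reading, the constant would convert seconds into frames rather than set a per-phoneme minimum.

```python
def rule_based_target_len(
    num_prompt_samples: int,
    num_prompt_phones: int,
    num_target_phones: int,
    sample_rate: int = 16000,      # assumption: prompt audio is sampled at 16 kHz
    frames_per_second: int = 50,   # assumption: '50' is a frame rate, not a per-phoneme minimum
) -> int:
    """Mirror the arithmetic of the target_len=None branch, with units spelled out."""
    # Scale the prompt length (in samples) by the target/prompt phoneme ratio.
    target_samples = num_prompt_samples * num_target_phones / num_prompt_phones
    # samples -> seconds -> frames, truncated to an integer like the original.
    return int(target_samples / sample_rate * frames_per_second)


# A 3-second prompt (48000 samples) with 30 phonemes and a 60-phoneme target:
# estimated target duration is 6 seconds, i.e. 300 frames under these assumptions.
print(rule_based_target_len(48000, 30, 60))  # 300
```

If this reading is right, the else branch (`int(target_len * 50)`) would take `target_len` in seconds and convert it to frames in the same way.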
