Some questions #12

Open

YoungSeng opened this issue Sep 5, 2023 · 0 comments

Owner

YoungSeng commented Sep 5, 2023

The paper says the motion is segmented every 0.5 s, but I can't find this logic in the code. Where is it implemented?

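Each match is generated over a sliding window of 4 codes (the Gesture VQ-VAE has a downsample rate of 8, so 4 codes correspond to 32 frames; BEAT is 60 fps, so that is roughly 0.5 s).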
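The arithmetic above can be checked with a short sketch. The function name `window_seconds` and its parameter names are made up for illustration and do not come from the repository; only the values (4 codes, downsample rate 8, 60 fps) are taken from the reply.

```python
def window_seconds(num_codes: int, downsample_rate: int = 8, fps: float = 60.0) -> float:
    """Duration in seconds covered by `num_codes` VQ-VAE codes.

    Hypothetical helper: each code is assumed to cover `downsample_rate` frames.
    """
    frames = num_codes * downsample_rate
    return frames / fps


# 4 codes -> 32 frames at 60 fps, i.e. slightly over half a second.
print(f"{window_seconds(4):.3f} s")
# 32 codes -> 256 frames at 60 fps, i.e. a bit more than 4 s.
print(f"{window_seconds(32):.3f} s")
```

So "0.5 s" and "4 s" are rounded figures: the exact windows are 32 and 256 frames.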

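The motion VQ-VAE operates on 4 s segments. My understanding is that each 4 s of motion is matched against 4 s of speech. Is that correct? In my tests, speech shorter than 4 s cannot generate any motion.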

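The database in the gesture library is stored in segments of 32 codes each. And yes, I did not write padding code for inputs shorter than 4 s. In fact, for inputs longer than 4 s, the final segment that falls short of 4 s does not seem to be handled either; it is simply dropped.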
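A minimal sketch of the behaviour described above, plus one possible workaround. This is not the repository's code: `split_codes`, `segment_len`, and `pad_last` are hypothetical names, and padding by repeating the last code is only an assumption about what a fix could look like.

```python
from typing import List


def split_codes(codes: List[int], segment_len: int = 32, pad_last: bool = False) -> List[List[int]]:
    """Split a code sequence into fixed-length segments.

    With pad_last=False the trailing remainder is dropped, which mirrors the
    behaviour described in the reply. With pad_last=True the remainder is
    padded by repeating its last code (an assumed workaround).
    """
    # Full segments only: stop before any start index that lacks segment_len codes.
    segments = [
        codes[i:i + segment_len]
        for i in range(0, len(codes) - segment_len + 1, segment_len)
    ]
    remainder = len(codes) % segment_len
    if pad_last and remainder:
        tail = codes[len(codes) - remainder:]
        tail = tail + [tail[-1]] * (segment_len - remainder)
        segments.append(tail)
    return segments


short_input = list(range(20))  # fewer than 32 codes, i.e. shorter than about 4 s
print(len(split_codes(short_input)))                 # no segment survives
print(len(split_codes(short_input, pad_last=True)))  # one padded segment
```

Under the default setting, an input shorter than one segment yields no segments at all, which is consistent with the observation that speech under 4 s produces no motion.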

Levenshtein distance is not enabled by default in the code. Why is that?

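Sorry for the confusion. Yes: when generating from your own audio, text information is not always available, so for ease of testing the default uses the speech audio. The code is quite messy and is provided for reference only.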
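For readers unfamiliar with the metric, here is a generic sketch of Levenshtein (edit) distance using the standard dynamic-programming recurrence. It is not taken from the repository, and the function name `levenshtein` is illustrative. It also shows why the metric needs discrete sequences such as text or phoneme tokens, which raw audio alone does not provide.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute x with y, or keep if equal
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```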
