
Basic information about Chinese #34

Open

Adorablepet opened this issue May 21, 2020 · 12 comments

@Adorablepet

Thanks for sharing your code. I ran a Chinese audio file with your demo, and the lips were not synchronized. Is there any solution? Do you plan to train the model on a Chinese lip dataset? Thanks.

@Adorablepet
Author

@lelechen63 In lrw_data.py, what is the difference between the generating_landmark_lips function and the generating_demo_landmark_lips function? One landmark_path is landmark1d, the other is landmark3d. But when training the atnet model, it uses self.lmark_root_path = '../dataset/landmark1d'. I hope you can explain this. Thanks.

@Adorablepet
Author

@lelechen63 Is it correct to understand that these two functions are two different methods for extracting landmarks, and that demo.py uses landmark1d?

@Adorablepet
Author

@lelechen63 I am a bit confused about the landmarks. Does this parameter distinguish between training and testing? Is the PCA the same? U_lrw1.npy belongs to the training set; does the test set also have a U_lrw1_test.npy? When I looked at the source code, I found that both training and testing use U_lrw1.npy. Thanks.

@lelechen63
Owner

> Thanks for sharing your code. I ran a Chinese audio file with your demo, and the lips were not synchronized. Is there any solution? Do you plan to train the model on a Chinese lip dataset? Thanks.

The released model is trained on English, but it can be tested on any other language. The reason is that we take the audio input in 0.04-second windows, which are too short to be sensitive to linguistic (semantic) information such as the language.
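For reference, a minimal sketch of what a 0.04-second window means in practice, assuming 16 kHz audio, 25 fps video, and python_speech_features; the slicing is illustrative, not the repo's exact code:

```python
import numpy as np
from python_speech_features import mfcc

sample_rate = 16000                     # assumed audio sample rate
fps = 25                                # assumed video frame rate
samples_per_frame = sample_rate // fps  # 640 samples = 0.04 s of audio per video frame

audio = np.random.randn(sample_rate)    # stand-in for 1 s of audio

# With the library defaults (winlen=0.025, winstep=0.01) there are
# 100 MFCC steps per second, i.e. 4 MFCC frames per 0.04 s video frame.
features = mfcc(audio, samplerate=sample_rate)

t = 10                                  # video frame index
window = features[t * 4 : t * 4 + 4]    # the 4 MFCC frames covering frame t
print(window.shape)                     # (4, 13)
```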

@lelechen63
Owner

> @lelechen63 In lrw_data.py, what is the difference between the generating_landmark_lips function and the generating_demo_landmark_lips function? One landmark_path is landmark1d, the other is landmark3d. But when training the atnet model, it uses self.lmark_root_path = '../dataset/landmark1d'. I hope you can explain this. Thanks.

I will clean up the code again this month and will notify you once I finish. The main landmark processing has two steps: 1. align the image using an affine transformation; 2. detect the landmarks. The original code had a third step, normalizing the landmarks, but we actually do not need it.
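For anyone reading along, a minimal sketch of that two-step pipeline, assuming dlib's 68-point predictor; the template coordinates, output size, and model path are illustrative assumptions, not the repo's actual values:

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model file

def detect_landmarks(image):
    """Return a (68, 2) array of landmarks for the first detected face."""
    rect = detector(image, 1)[0]
    shape = predictor(image, rect)
    return np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float32)

def align_face(image, size=256):
    """Step 1: affine-warp the face so eye corners and nose tip hit a fixed template."""
    lmk = detect_landmarks(image)
    src = lmk[[36, 45, 30]]                                 # outer eye corners + nose tip
    dst = np.float32([[70, 100], [186, 100], [128, 160]])   # assumed template points
    M = cv2.getAffineTransform(src, dst)
    return cv2.warpAffine(image, M, (size, size))

def process(image):
    """Step 2: detect landmarks on the aligned face (no normalization step)."""
    aligned = align_face(image)
    return detect_landmarks(aligned)
```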

@lelechen63
Owner

> @lelechen63 I am a bit confused about the landmarks. Does this parameter distinguish between training and testing? Is the PCA the same? U_lrw1.npy belongs to the training set; does the test set also have a U_lrw1_test.npy? When I looked at the source code, I found that both training and testing use U_lrw1.npy. Thanks.

The PCA for train and test is the same. The PCA parameters are extracted from the training set and can be used for any videos, including the test set or videos in the wild.
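To make that concrete, a minimal sketch of fitting the PCA basis on the training set only and then reusing it everywhere, assuming 68-point landmarks and an illustrative component count; the file names other than U_lrw1.npy are hypothetical:

```python
import numpy as np

# train_lmarks: (N, 68, 2) training-set landmarks, flattened to (N, 136)
train_lmarks = np.load("train_landmarks.npy").reshape(-1, 136)  # hypothetical file
mean = train_lmarks.mean(axis=0)

# Principal directions are computed from the training set only.
_, _, Vt = np.linalg.svd(train_lmarks - mean, full_matrices=False)
U = Vt[:20].T                  # keep 20 components (illustrative), shape (136, 20)
np.save("U_lrw1.npy", U)

# At test time (or on in-the-wild videos) the *same* basis is loaded:
U = np.load("U_lrw1.npy")

def project(lmark):
    """Project (68, 2) landmarks from any video onto the training PCA basis."""
    return (lmark.reshape(-1) - mean) @ U   # 20-dim PCA coefficients

def reconstruct(coeffs):
    """Map PCA coefficients back to (68, 2) landmark coordinates."""
    return (U @ coeffs + mean).reshape(68, 2)
```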

@Adorablepet
Author

> Thanks for sharing your code. I ran a Chinese audio file with your demo, and the lips were not synchronized. Is there any solution? Do you plan to train the model on a Chinese lip dataset? Thanks.
>
> The released model is trained on English, but it can be tested on any other language. The reason is that we take the audio input in 0.04-second windows, which are too short to be sensitive to linguistic (semantic) information such as the language.

Regarding your answer, can I understand it this way: when the audio and the lips do not match, it has nothing to do with the training language and is instead related to the model itself?

@Adorablepet
Author

@lelechen63 Could you release the training parameters for the AT-net and VG-net? Otherwise, it is difficult for us to reproduce the results in the paper. Thanks.

@Adorablepet
Author

@lelechen63 What are the meanings of new_16_full_gt_train.pkl and region_16_wrap_gt_train2.pkl? Can you explain? lrw_data.py is not very clear. Thanks.

@liangzz1991

> @lelechen63 What are the meanings of new_16_full_gt_train.pkl and region_16_wrap_gt_train2.pkl? Can you explain? lrw_data.py is not very clear. Thanks.

Seconded. @lelechen63

@Adorablepet
Author

> Thanks for sharing your code. I ran a Chinese audio file with your demo, and the lips were not synchronized. Is there any solution? Do you plan to train the model on a Chinese lip dataset? Thanks.
>
> The released model is trained on English, but it can be tested on any other language. The reason is that we take the audio input in 0.04-second windows, which are too short to be sensitive to linguistic (semantic) information such as the language.

What does the 0.04 refer to? The winlen or the winstep of `mfcc`?

@Owen-Fish

Why is face normalization not needed? From my point of view, individual face shapes differ, and they also contain rotations (roll, yaw, pitch). None of these parameters is relevant to the audio input, so I'm wondering why normalization is not needed. Hope for your reply ^
