
Basic information about Chinese #34

Open

Adorablepet opened this issue May 21, 2020 · 12 comments

@Adorablepet

Thanks for sharing your code. I ran a Chinese audio file with your demo, and the lips were not synchronized. Is there any solution? Do you plan to train the model on a Chinese lip dataset? Thanks.

@Adorablepet
Author

@lelechen63 In lrw_data.py, what is the difference between the generating_landmark_lips function and the generating_demo_landmark_lips function? One landmark_path is landmark1d, the other is landmark3d. But when training the atnet model, it uses self.lmark_root_path = '../dataset/landmark1d'. I hope you can explain this. Thanks.

@Adorablepet
Author

@lelechen63 Is it correct to understand that these two functions are two different methods for extracting landmarks, and that demo.py uses landmark1d?

@Adorablepet
Author

@lelechen63 I am a bit confused about the landmarks. Does this parameter distinguish between training and testing? Is the PCA the same? U_lrw1.npy belongs to the training set; does the test set also have a U_lrw1_test.npy? When I looked at the source code, I found that both training and testing use U_lrw1.npy. Thanks.

@lelechen63
Owner

> Thanks for sharing your code. I ran a Chinese audio file with your demo, and the lips were not synchronized. Is there any solution? Do you plan to train the model on a Chinese lip dataset? Thanks.

The released model is trained on English, but it can be tested on any other language. The reason is that we take the audio input in 0.04-second windows, which are too short to be sensitive to linguistic (semantic) information such as the language.
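For reference, a minimal sketch of what a 0.04-second window means in practice, assuming 16 kHz audio, 25 fps video, and python_speech_features; the slicing is illustrative, not the repo's exact code:

```python
import numpy as np
from python_speech_features import mfcc

sample_rate = 16000                     # assumed audio sample rate
fps = 25                                # assumed video frame rate
samples_per_frame = sample_rate // fps  # 640 samples = 0.04 s of audio per video frame

audio = np.random.randn(sample_rate)    # stand-in for 1 s of audio

# With the library defaults (winlen=0.025, winstep=0.01) there are
# 100 MFCC steps per second, i.e. 4 MFCC frames per 0.04 s video frame.
features = mfcc(audio, samplerate=sample_rate)

t = 10                                  # video frame index
window = features[t * 4 : t * 4 + 4]    # the 4 MFCC frames covering frame t
print(window.shape)                     # (4, 13)
```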

@lelechen63
Owner

> @lelechen63 In lrw_data.py, what is the difference between the generating_landmark_lips function and the generating_demo_landmark_lips function? One landmark_path is landmark1d, the other is landmark3d. But when training the atnet model, it uses self.lmark_root_path = '../dataset/landmark1d'. I hope you can explain this. Thanks.

I will clean up the code again this month and will notify you once I finish. The main landmark processing has two steps: 1. align the image using an affine transformation; 2. detect the landmarks. The original code had a third step, normalizing the landmarks, but we actually do not need it.
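For anyone reading along, a minimal sketch of that two-step pipeline, assuming dlib's 68-point predictor; the template coordinates, output size, and model path are illustrative assumptions, not the repo's actual values:

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model file

def detect_landmarks(image):
    """Return a (68, 2) array of landmarks for the first detected face."""
    rect = detector(image, 1)[0]
    shape = predictor(image, rect)
    return np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float32)

def align_face(image, size=256):
    """Step 1: affine-warp the face so eye corners and nose tip hit a fixed template."""
    lmk = detect_landmarks(image)
    src = lmk[[36, 45, 30]]                                 # outer eye corners + nose tip
    dst = np.float32([[70, 100], [186, 100], [128, 160]])   # assumed template points
    M = cv2.getAffineTransform(src, dst)
    return cv2.warpAffine(image, M, (size, size))

def process(image):
    """Step 2: detect landmarks on the aligned face (no normalization step)."""
    aligned = align_face(image)
    return detect_landmarks(aligned)
```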

@lelechen63
Owner

> @lelechen63 I am a bit confused about the landmarks. Does this parameter distinguish between training and testing? Is the PCA the same? U_lrw1.npy belongs to the training set; does the test set also have a U_lrw1_test.npy? When I looked at the source code, I found that both training and testing use U_lrw1.npy. Thanks.

The PCA for train and test is the same. The PCA parameters are extracted from the training set and can be used for any videos, including the test set or videos in the wild.
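To make that concrete, a minimal sketch of fitting the PCA basis on the training set only and then reusing it everywhere, assuming 68-point landmarks and an illustrative component count; the file names other than U_lrw1.npy are hypothetical:

```python
import numpy as np

# train_lmarks: (N, 68, 2) training-set landmarks, flattened to (N, 136)
train_lmarks = np.load("train_landmarks.npy").reshape(-1, 136)  # hypothetical file
mean = train_lmarks.mean(axis=0)

# Principal directions are computed from the training set only.
_, _, Vt = np.linalg.svd(train_lmarks - mean, full_matrices=False)
U = Vt[:20].T                  # keep 20 components (illustrative), shape (136, 20)
np.save("U_lrw1.npy", U)

# At test time (or on in-the-wild videos) the *same* basis is loaded:
U = np.load("U_lrw1.npy")

def project(lmark):
    """Project (68, 2) landmarks from any video onto the training PCA basis."""
    return (lmark.reshape(-1) - mean) @ U   # 20-dim PCA coefficients

def reconstruct(coeffs):
    """Map PCA coefficients back to (68, 2) landmark coordinates."""
    return (U @ coeffs + mean).reshape(68, 2)
```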

@Adorablepet
Author

> Thanks for sharing your code. I ran a Chinese audio file with your demo, and the lips were not synchronized. Is there any solution? Do you plan to train the model on a Chinese lip dataset? Thanks.
>
> The released model is trained on English, but it can be tested on any other language. The reason is that we take the audio input in 0.04-second windows, which are too short to be sensitive to linguistic (semantic) information such as the language.

Regarding your answer, can I understand it this way: when the audio and the lips do not match, it has nothing to do with the training language and is instead related to the model itself?

@Adorablepet
Author

@lelechen63 Could you release the training parameters for the AT-net and VG-net? Otherwise, it is difficult for us to reproduce the results in the paper. Thanks.

@Adorablepet
Author

@lelechen63 What are the meanings of new_16_full_gt_train.pkl and region_16_wrap_gt_train2.pkl? Can you explain? lrw_data.py is not very clear. Thanks.

@liangzz1991

> @lelechen63 What are the meanings of new_16_full_gt_train.pkl and region_16_wrap_gt_train2.pkl? Can you explain? lrw_data.py is not very clear. Thanks.

Seconded. @lelechen63

@Adorablepet
Author

> Thanks for sharing your code. I ran a Chinese audio file with your demo, and the lips were not synchronized. Is there any solution? Do you plan to train the model on a Chinese lip dataset? Thanks.
>
> The released model is trained on English, but it can be tested on any other language. The reason is that we take the audio input in 0.04-second windows, which are too short to be sensitive to linguistic (semantic) information such as the language.

What does the 0.04 refer to? The winlen or the winstep of `mfcc`?

@Owen-Fish

Why is face normalization not needed? From my point of view, individual face shapes differ, and they also contain rotations (roll, yaw, pitch). None of these parameters is relevant to the audio input, so I'm wondering why normalization is not needed. Hope for your reply ^
