
Confused about "normLmarks" function #27

Open
tlatlbtle opened this issue Nov 8, 2019 · 4 comments
Comments

@tlatlbtle

Many thanks for this repo. I am trying to reimplement your training process, but I am stuck on data preprocessing.

Specifically, I am confused about the "normLmarks" function.

  1. When there is only one face per frame ( len(lmarks.shape) == 2 ), will "normLmarks" always produce the same output? I marked the related lines in your code with "#". It seems @ssinha89 also found this issue in "How is head pose taken into account for VGnet" #17 (comment).

  2. Could you tell me more about the meaning of "init_params", "params" and "predicted"? What do "S" and "SK" mean here? I know you use "procrustes" to align each frame's landmarks to the mean face, but I am confused about the process after that. Or could you point me to related papers on how this is done?

import numpy as np
from copy import deepcopy
from scipy.spatial import procrustes

# MSK, S, SK, and ms_img are module-level globals loaded elsewhere in the repo.
def normLmarks(lmarks):
    norm_list = []
    idx = -1
    max_openness = 0.2  # unused
    # Fixed "open mouth" offset in the 100-D parameter space.
    mouthParams = np.zeros((1, 100))
    mouthParams[:, 1] = -0.06
    tmp = deepcopy(MSK)
    tmp[:, 48*2:] += np.dot(mouthParams, SK)[0, :, 48*2:]  # perturb mouth landmarks (48-67)
    open_mouth_params = np.reshape(np.dot(S, tmp[0, :] - MSK[0, :]), (1, 100))

    if len(lmarks.shape) == 2:
        lmarks = lmarks.reshape(1, 68, 2)
    for i in range(lmarks.shape[0]):
        # Align each frame to the mean face ms_img.
        mtx1, mtx2, disparity = procrustes(ms_img, lmarks[i, :, :])
        mtx1 = np.reshape(mtx1, [1, 136])
        mtx2 = np.reshape(mtx2, [1, 136])
        norm_list.append(mtx2[0, :])
    pred_seq = []
    init_params = np.reshape(np.dot(S, norm_list[idx] - mtx1[0, :]), (1, 100))
    for i in range(lmarks.shape[0]):
        params = np.reshape(np.dot(S, norm_list[i] - mtx1[0, :]), (1, 100)) - init_params - open_mouth_params
######## "params" will always be equal to (-open_mouth_params) ########
        predicted = np.dot(params, SK)[0, :, :] + MSK
        pred_seq.append(predicted[0, :])
    return np.array(pred_seq), np.array(norm_list), 1
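For a single frame the marked cancellation can be checked directly: norm_list has exactly one entry, idx = -1 points at that same entry, so init_params equals the per-frame projection and only -open_mouth_params survives. A minimal numpy sketch of that algebra, using random stand-ins for the repo's S, mtx1, and normalized frame (shapes match normLmarks, values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((100, 136))      # stand-in for the 136 -> 100 projection
norm_frame = rng.standard_normal(136)    # stand-in for mtx2 of the only frame
mtx1 = rng.standard_normal(136)          # stand-in for the aligned mean shape
open_mouth_params = rng.standard_normal((1, 100))

norm_list = [norm_frame]                 # single-frame case
idx = -1                                 # indexes that same single frame

init_params = np.reshape(S @ (norm_list[idx] - mtx1), (1, 100))
params = np.reshape(S @ (norm_list[0] - mtx1), (1, 100)) - init_params - open_mouth_params

# The frame-dependent term cancels exactly, regardless of the input:
assert np.allclose(params, -open_mouth_params)
```

So for single-frame input the output depends only on the constant open-mouth offset, which is exactly the behavior the "#" marker flags.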
@lelechen63
Owner


Please refer to https://github.com/eeskimez/Talking-Face-Landmarks-from-Speech for the audio-to-landmark part
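For context on the names asked about above: the linked repo compresses landmark trajectories with PCA, so a plausible reading (an assumption, not confirmed in this thread) is that S projects a centered 136-D landmark vector onto 100 principal components, SK is the component basis used to reconstruct, and MSK is the mean shape. A minimal numpy sketch of the project/reconstruct round trip those names would imply:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical PCA fit on flattened 68x2 landmark frames (136-D),
# keeping 100 components as in normLmarks.
frames = rng.standard_normal((500, 136))
mean_shape = frames.mean(axis=0)              # analogue of MSK
centered = frames - mean_shape
# Principal axes via SVD of the centered data; rows of Vt are orthonormal.
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
basis = Vt[:100]                              # analogue of SK (100 x 136)

delta = frames[0] - mean_shape
params = basis @ delta                        # analogue of "params": PCA coefficients
reconstructed = params @ basis + mean_shape   # analogue of "predicted": back to landmark space
```

With 100 of 136 components the reconstruction is only approximate, but re-projecting it recovers the same coefficients because the basis rows are orthonormal.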

@tlatlbtle
Author


Thanks, I got the information.

@hot-dog

hot-dog commented May 25, 2020

@wjbKimberly @lelechen63 Hi there, I also ran into the same problem. I want to train ATNet on my own dataset; the landmarks are preprocessed with code extracted from demo.py, and the preprocessed landmark data all comes out identical. Is this normal? If not, did you solve it? Could you suggest where it goes wrong? Thank you!
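A quick way to confirm what is described here, without touching the repo, is to check whether the preprocessing maps two different landmark arrays to the same output. The helper below is hypothetical; constant_norm stands in for a normalizer that, like the single-frame path of normLmarks, discards its input:

```python
import numpy as np

def outputs_identical(norm_fn, a, b, atol=1e-8):
    """True if a preprocessing function maps two different landmark
    arrays to numerically identical outputs (a sign the input is lost)."""
    return np.allclose(norm_fn(a), norm_fn(b), atol=atol)

# Hypothetical stand-in: like the reported single-frame behavior of
# normLmarks, it returns a constant regardless of the input.
def constant_norm(lmarks):
    return np.zeros((1, 136))

a = np.random.default_rng(2).standard_normal((68, 2))
b = np.random.default_rng(3).standard_normal((68, 2))
print(outputs_identical(constant_norm, a, b))  # prints True: input was discarded
```

Running the same check against the real preprocessing on two clearly different faces would distinguish "normal" from "the input is being thrown away."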

@liangzz1991


@hot-dog I find that 'example_landmark' never changes in demo.py even when the input template image changes, as if it were a fixed mean value that never needs updating. What should I use during training? It also does not match the figure in the paper. Confused...
