Hi, I was going through the gesticulator codebase and using the GRU for speech feature encoding. I noticed that before the curr_speech input is passed to the GRU, the first dimension is the batch size and the second dimension is the temporal size. So in my opinion the batch_first=True flag should be set when initializing the GRU layer. Please let me know if this is the case. Thank you for sharing your awesome work :)
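To illustrate the shape issue, here is a minimal PyTorch sketch with made-up dimensions (the actual feature sizes in gesticulator will differ). By default, `nn.GRU` expects input shaped `(seq_len, batch, features)`; with `batch_first=True` it interprets dimension 0 as the batch, matching a `(batch, time, features)` tensor:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration only (not taken from gesticulator).
batch_size, seq_len, feat_dim, hidden_dim = 4, 10, 26, 64

# A batch-first speech tensor: (batch, time, features)
curr_speech = torch.randn(batch_size, seq_len, feat_dim)

# batch_first=True makes the GRU read dim 0 as the batch axis.
gru = nn.GRU(input_size=feat_dim, hidden_size=hidden_dim, batch_first=True)
output, hidden = gru(curr_speech)

# output: (batch, seq_len, hidden_dim); hidden: (num_layers, batch, hidden_dim)
print(output.shape)  # torch.Size([4, 10, 64])
print(hidden.shape)  # torch.Size([1, 4, 64])
```

Note that without `batch_first=True` the same call would not raise an error: the GRU would silently treat the batch axis as time, which is why training can run but converge poorly.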
Hi @ra1995. Thank you for raising your concern. It has been more than 3 years since I developed this model, so I don't remember exactly how I was doing things. But after a brief look at the code, I agree with you: it does seem that batch_size was the first dimension, which seems common to me. Since the code did not break, I assume this was probably the default behavior in the PyTorch version used ... but I am not sure.
Does this cause you an issue?
Yes, the model was not converging correctly on my custom dataset without the batch_first argument. After making the necessary changes, it's performing much better.
Oh, that's very interesting! @ra1995, can you please make a PR with these changes? ( I could do it myself, but if you make a pull request - you will have the credit for finding this)