About the constructing of latent mask #27

Open
zhuhz22 opened this issue Oct 2, 2024 · 1 comment
Comments

@zhuhz22

zhuhz22 commented Oct 2, 2024

Hi, excellent work! A question here:
I noticed that at line 1082 of svd_interpolate_single_img_traj.py, the mask is reshaped from (576, 1024) to (72, 8, 128, 8), transposed, and then reshaped to (72, 128, 64):

mask_erosion = mask_erosion.reshape(72,8,128,8).transpose(0,2,1,3).reshape(72,128,64)

As far as I understand, this reshaped mask is then used in latent space for Eq. (14) and the equations that follow. But why does the mask line up with the latents encoded by the VAE? In pixel space, the known regions of the warped images correspond to the entries of the original mask that equal 1. How can we be sure that this correspondence still holds after the images are encoded into latent space?
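To make the index mapping of that reshape concrete, here is a minimal NumPy sketch. It assumes `mask_erosion` is a NumPy array of shape (576, 1024); the `mask` array below is a hypothetical stand-in in which every pixel stores its own flat index, so the origin of each output entry can be checked directly.

```python
import numpy as np

H, W, P = 576, 1024, 8  # pixel height, width, and the 8x8 block size from the reshape

# Hypothetical stand-in for mask_erosion: each pixel holds its own flat index.
mask = np.arange(H * W).reshape(H, W)

# Same sequence of operations as the line quoted from the script.
patches = (
    mask.reshape(H // P, P, W // P, P)
    .transpose(0, 2, 1, 3)
    .reshape(H // P, W // P, P * P)
)

# Entry [i, j, k] comes from pixel row 8*i + k // 8, column 8*j + k % 8.
i, j, k = 5, 17, 42
assert patches[i, j, k] == mask[P * i + k // P, P * j + k % P]

# The 64 entries of cell (i, j) are exactly one 8x8 pixel block, in row-major order.
block = mask[P * i:P * (i + 1), P * j:P * (j + 1)]
assert np.array_equal(patches[i, j], block.ravel())
```

So each of the 72 x 128 cells carries the 64 mask values of the 8x8 pixel block at the same grid position. The sketch does not show how the script uses these 64 values afterwards.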

@mengyou2
Collaborator

mengyou2 commented Oct 3, 2024

Thanks for your interest in our project. The VAE uses a CNN to map the image from pixel space to latent space, and convolutional layers are known for capturing spatial structure. The latent representation therefore retains a lower-resolution version of the spatial structure of the original image: the encoder preserves the spatial relationships between different parts of the image, even though the resolution is reduced.
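The locality argument can be illustrated with a toy example. The sketch below is not the project's VAE: `toy_encode` is a hypothetical stand-in that applies a single stride-8 convolution with a uniform 8x8 kernel (i.e. average pooling), chosen only because it downsamples by the same factor of 8. It shows that changing one 8x8 pixel block changes only the latent cell at the matching grid position.

```python
import numpy as np

P = 8  # assumed downsampling factor, matching the 8x8 blocks of the mask reshape


def toy_encode(img: np.ndarray) -> np.ndarray:
    """Hypothetical encoder: stride-8 convolution with a uniform 8x8 kernel."""
    h, w = img.shape
    return img.reshape(h // P, P, w // P, P).mean(axis=(1, 3))


img = np.zeros((576, 1024))
base = toy_encode(img)  # shape (72, 128)

# Perturb exactly the pixel block that belongs to latent cell (10, 20).
perturbed = img.copy()
perturbed[80:88, 160:168] = 1.0

diff = toy_encode(perturbed) - base
changed = np.argwhere(diff != 0)

# Only the latent cell at the same grid position reacts.
assert changed.tolist() == [[10, 20]]
```

A real VAE encoder stacks several layers, so each latent cell sees a neighborhood wider than a single 8x8 block. The pixel-to-latent correspondence is therefore approximate rather than exact, especially near mask boundaries.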
