About the constructing of latent mask #27

zhuhz22 · 2024-10-02T07:58:39Z

Hi, excellent work! A question here:
I noticed that at svd_interpolate_single_img_traj.py:1082, the mask is reshaped directly from (576, 1024) to (72, 8, 128, 8) and then to (72, 128, 64):

mask_erosion = mask_erosion.reshape(72,8,128,8).transpose(0,2,1,3).reshape(72,128,64)

As far as I understand, this reshaped mask is then used in latent space, for Eq. (14) and the equations that follow. But why should the mask line up with the latents produced by the VAE? In pixel space, the known regions of the warped images correspond to the entries of the original mask that equal 1. How can we be sure that this correspondence still holds after the images are encoded into latent space?

mengyou2 · 2024-10-03T10:03:52Z

Thanks for your interest in our project. The VAE uses a CNN to map the image from pixel space to latent space, and convolutional layers preserve spatial structure. The latent representation is therefore a lower-resolution version of the original image's spatial layout: the encoder keeps the spatial relationships between different parts of the image, even though the resolution is reduced.
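
To make the spatial correspondence concrete, here is a minimal NumPy sketch of what the reshape quoted in the question does. It assumes `mask_erosion` is a NumPy array of shape (576, 1024) and uses a random stand-in for it. The names `blocks` and `latent_mask`, and the final `min` reduction, are illustrative assumptions and are not taken from the repository.

```python
import numpy as np

H, W, F = 576, 1024, 8      # pixel resolution and the 8x factor implied by the reshape
h, w = H // F, W // F       # 72 x 128 latent grid

# Random binary stand-in for mask_erosion (assumption: 1 = known pixel).
rng = np.random.default_rng(0)
mask = (rng.random((H, W)) > 0.3).astype(np.float32)

# Same reshape as in the question, written with named sizes.
blocks = mask.reshape(h, F, w, F).transpose(0, 2, 1, 3).reshape(h, w, F * F)

# Latent cell (i, j) now holds exactly the 8x8 pixel block
# mask[8i:8i+8, 8j:8j+8], flattened row by row.
i, j = 5, 17
patch = mask[F * i:F * (i + 1), F * j:F * (j + 1)]
assert np.array_equal(blocks[i, j], patch.reshape(-1))

# Hypothetical reduction (not confirmed by the source): mark a latent cell
# as known only if all 64 pixels in its block are known.
latent_mask = blocks.min(axis=-1)   # shape (72, 128)
```

The reshape therefore only regroups pixels by 8x8 block. It lines up with the latents to the extent that each latent location is dominated by its own block; a convolutional encoder's receptive field is wider than 8x8, so the match is approximate near mask boundaries.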
