Info regarding velocity implementation #31

Open
arstek131 opened this issue Jun 8, 2024 · 7 comments

Comments

@arstek131

Hi, thank you for your nice work.
I have two main questions about the concept of velocity in your paper and its implementation.

  1. Could you elaborate on the time-dependent mean? $\tilde{\mu}(t) = \mu + \frac{l}{2\pi} \cdot \sin\left( 2\pi \frac{t - \tau}{l} \right) \cdot v$
    Why did you model it with sin()? What is the reason behind this choice?
    Also, could you explain $v = \left. \frac{d\tilde{\mu}(t)}{dt} \right|_{t=\tau}$ in more detail? I understand that it is the instantaneous velocity, but how is it interpreted in the code? What is its unit of measure?

  2. Regarding the code implementation: in train.py, at each iteration you compute the velocity like this:
    v = gaussians.get_inst_velocity
    Then you pass it to the render function:
    render_pkg = render(viewpoint_cam, gaussians, args, background, env_map=env_map, other=other, time_shift=time_shift, is_training=True)
    Once rendering is complete, you get the rendered velocity as:
    feature = render_pkg['feature'] / alpha.clamp_min(EPS)
    v_map = feature[1:]

v_map is a torch tensor with 3 channels, and I suppose each channel describes the instantaneous velocity of that point in the x, y, and z directions respectively.
To what range of values is v_map normalized? What is its unit of measure?

Thanks

@Fumore

Fumore commented Jun 9, 2024

Hi, sorry for the confusion.
1. We use sin() because this kind of periodic function can model both dynamic and static points well (when $\beta$ is small, the point moves linearly and fades away, while when $\beta$ is large, it tends to stay static around $\mu$). The unit of $v$ is $m/s$ and the unit of $l$ is $s$. We parameterize $v$ as gaussians._velocity. The name gaussians.get_inst_velocity is confusing: it actually returns $\bar{v}$, not the instantaneous velocity at a certain time.
2. Your understanding of v_map is right. It is normalized by the accumulated opacity alpha, which is a dimensionless quantity.
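To make the role of $v$ concrete, here is a minimal sketch (not the repository's code) that evaluates the periodic mean from the question and checks numerically that its slope at $t = \tau$ equals $v$. The function names and the sample values are made up for illustration; only the formula itself comes from this thread.

```python
import math

def periodic_mean(mu, v, t, tau, l):
    """Evaluate mu~(t) = mu + l/(2*pi) * sin(2*pi*(t - tau)/l) * v per component."""
    scale = l / (2.0 * math.pi) * math.sin(2.0 * math.pi * (t - tau) / l)
    return [m + scale * vi for m, vi in zip(mu, v)]

def finite_diff_velocity(mu, v, tau, l, h=1e-6):
    """Central finite difference of mu~(t) at t = tau."""
    ahead = periodic_mean(mu, v, tau + h, tau, l)
    behind = periodic_mean(mu, v, tau - h, tau, l)
    return [(a - b) / (2.0 * h) for a, b in zip(ahead, behind)]

# Illustrative values only (hypothetical, not taken from any dataset).
mu = [1.0, 2.0, 3.0]   # position at t = tau, in metres
v = [0.5, -1.0, 2.0]   # velocity at t = tau, in m/s
tau, l = 0.3, 0.8      # peak time and cycle length, in seconds

slope = finite_diff_velocity(mu, v, tau, l)

# The point never moves further from mu than l/(2*pi) * |v| along each axis.
max_offset = [l / (2.0 * math.pi) * abs(vi) for vi in v]
```

Since $\frac{d\tilde{\mu}(t)}{dt} = \cos\left(2\pi \frac{t - \tau}{l}\right) \cdot v$, the slope at $t = \tau$ is exactly $v$, and the units work out as metres per second when $l$ and $t$ are in seconds.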

@arstek131
Author

arstek131 commented Jun 9, 2024

Hi, thank you for the clarifications!
So if I get it right, gaussians.get_inst_velocity is $\bar{v}$ (the average velocity), which the paper defines as $\bar{v} = v \cdot \exp(-\frac{\rho}{2})$, while $v$ is gaussians._velocity, which the paper defines as the instantaneous velocity $v = \left. \frac{d\tilde{\mu}(t)}{dt} \right|_{t=\tau}$.

So v_map represents the rendered average velocity and not the instantaneous one?

Debugging the code, I see that gaussians._velocity is a tensor of torch.Size([2146010, 3]) (which I think holds the x, y, z velocity of each Gaussian point).
For each frame in the scene I have the ground-truth velocity (instantaneous velocity) of the objects, represented as a torch tensor of shape (H, W) where each pixel has a velocity value (basically, I have the velocity map).

Do you have any suggestions about which velocity from the model I should use, and how? My goal is to supervise the predicted velocity with the ground-truth one I have.
If you feel more comfortable, you can PM me.
Thanks!

@Fumore

Fumore commented Jun 11, 2024

Okay, I think using the velocity map that is used in temporal smoothing is more reasonable, i.e. the map of $\bar{v}$, because we actually use $\bar{v}$ as an estimated 3D scene flow for self-supervision (temporal smoothing).
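As a rough illustration of the relation quoted earlier in this thread, $\bar{v} = v \cdot \exp(-\frac{\rho}{2})$, the sketch below scales a per-point velocity into its average counterpart. The function name is hypothetical and $\rho$ is simply taken as an input here; how the repository actually computes it is not shown in this thread.

```python
import math

def average_velocity(v, rho):
    """Scale the velocity v (m/s) by exp(-rho / 2) to get v_bar (m/s)."""
    factor = math.exp(-rho / 2.0)
    return [vi * factor for vi in v]

# Illustrative values only: rho = 0 leaves v unchanged, larger rho shrinks it.
v = [0.5, -1.0, 2.0]
v_bar_same = average_velocity(v, 0.0)
v_bar_damped = average_velocity(v, 2.0)
```

The scaling factor is dimensionless, so $\bar{v}$ keeps the same unit as $v$.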

@arstek131
Author

Great, thanks for your reply.

When the velocity and other features are passed to the rasterizer, what do the per-pixel values of the rasterized velocity image (v_map) mean? As far as I understand, they don't represent velocity values in $m/s$, so how should I interpret them?

Thanks

@Fumore

Fumore commented Jun 15, 2024

Why wouldn't v_map indicate velocity in $m/s$ (in each channel)? Roughly speaking, each pixel represents the expectation of the velocity along the corresponding ray, using the alpha blending weights as the probability distribution.
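A minimal sketch of that interpretation for a single ray, assuming standard front-to-back alpha compositing. This is an illustration in plain Python, not the actual rasterizer; the function name, the EPS value, and the sample inputs are assumptions.

```python
EPS = 1e-6  # assumed small constant guarding against division by zero

def blend_velocity(alphas, velocities):
    """Alpha-blend per-Gaussian velocities along one ray, sorted front to back.

    Returns the velocity normalized by accumulated opacity, and that opacity.
    """
    transmittance = 1.0
    acc_alpha = 0.0
    acc_v = [0.0, 0.0, 0.0]
    for a, vel in zip(alphas, velocities):
        w = transmittance * a  # blending weight of this Gaussian
        acc_alpha += w
        acc_v = [s + w * c for s, c in zip(acc_v, vel)]
        transmittance *= 1.0 - a
    norm = max(acc_alpha, EPS)  # plays the role of alpha.clamp_min(EPS)
    return [s / norm for s in acc_v], acc_alpha

# Two Gaussians on the ray, both with velocities in m/s (illustrative values).
v_pixel, alpha = blend_velocity(
    [0.5, 0.5],
    [[2.0, 0.0, 0.0], [0.0, 4.0, 0.0]],
)
```

Because the weights are dimensionless and the division by the accumulated opacity makes them sum to one, the result is a weighted average of the input velocities and stays in the same unit.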

@arstek131
Author

OK, but how should I interpret this pixel representation? For example, is it possible to recover the velocity, in $m/s$, from the rendered v_map? If yes, how?

@Fumore

Fumore commented Jun 16, 2024

For example, you could project the objects' velocities and their masks onto the camera images to get a GT v_map label, use the depth map to back-project v_map into 3D space as a point cloud, or directly supervise the PVG points.
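One way the first option could look, as a hedged sketch: compare the per-pixel speed of the rendered v_map against a ground-truth speed map, only inside an object mask. This assumes the (H, W) ground truth holds a scalar speed in m/s, which is not confirmed in this thread, and the function name is hypothetical. Plain Python lists are used to keep the example self-contained; in training, the same computation would be written with tensor operations so that gradients can flow.

```python
import math

def masked_speed_l1(v_map, gt_speed, mask):
    """Mean absolute error between rendered speed and GT speed over masked pixels.

    v_map: 3 x H x W nested lists (x, y, z velocity channels, m/s).
    gt_speed, mask: H x W nested lists (assumed scalar speed in m/s, and booleans).
    """
    total, count = 0.0, 0
    for row in range(len(gt_speed)):
        for col in range(len(gt_speed[0])):
            if not mask[row][col]:
                continue  # pixel has no ground-truth object
            speed = math.sqrt(sum(v_map[c][row][col] ** 2 for c in range(3)))
            total += abs(speed - gt_speed[row][col])
            count += 1
    return total / max(count, 1)  # avoid dividing by zero on an empty mask

# A 1 x 2 image with illustrative values.
v_map = [[[3.0, 0.0]], [[4.0, 0.0]], [[0.0, 0.0]]]
gt_speed = [[5.0, 1.0]]
loss_object_only = masked_speed_l1(v_map, gt_speed, [[True, False]])
loss_all_pixels = masked_speed_l1(v_map, gt_speed, [[True, True]])
```

If the ground truth instead provides a 3D velocity vector per object, the comparison could be done per channel rather than on the magnitude.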

2 participants