about the coordinate system #9
Hi @hdzmtsssw , In the PyTorch3D NDC coordinate system, "+X points left, and +Y points up and +Z points out from the image plane". In COLMAP, "the X axis points to the right, the Y axis to the bottom, and the Z axis to the front as seen from the image." Because the two conventions differ (PyTorch3D multiplies row vectors on the right, while COLMAP multiplies column vectors on the left), we need to transpose the rotation matrix. I will provide a transform function when we release the evaluation code. If you want to do it earlier, you can try something like below (I have not had time to check it carefully, so please be careful):
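The original snippet was not preserved in this thread. As a rough reconstruction (my own sketch, not the author's code), a COLMAP-to-PyTorch3D pose conversion along the lines described above might look like:

```python
import numpy as np

def colmap_to_pytorch3d(R_colmap, t_colmap):
    """Convert a COLMAP world-to-camera pose (X_cam = R X + t, x right / y down / z forward)
    to the PyTorch3D convention (X_cam = X R + T, x left / y up / z forward)."""
    flip = np.diag([-1.0, -1.0, 1.0])  # negate the x and y axes
    R = flip @ R_colmap                # re-express the rotation in PyTorch3D axes
    T = flip @ t_colmap                # apply the same axis flip to the translation
    return R.T, T                      # transpose: column-vector -> row-vector convention
```

A quick sanity check is that the returned rotation still has determinant +1 (the two axis flips cancel out).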
Ground truth pose is not necessary; we just use it to show how to compute the error metric. Actually, the model coordinates in NDC are not "compressed". For a given set of images, one cannot recover the real metric "scale", so different methods simply use some normalised units.
@jytime Thanks for your fast reply! I plan to use your method to replace Colmap in obtaining poses for NeRF training. Do I need to provide the ground truth pose from Colmap to obtain the correct scale and an aligned version? If so, which coordinate system should the ground truth pose follow (the PyTorch3D NDC coordinate?) and which transform function should it use (the same code you provided?)? Lastly, could you please let me know when the script for NeRF training will be released?
Hi @hdzmtsssw , The result looks a bit weird. Forward-facing images should be okay. Did you use GGS for this result? Or can you share some of the images?
Hi @jytime , Unfortunately, the dataset is confidential, so I cannot share the images. The camera array was fan-shaped and captured four people standing in front of a green screen, with varying intrinsics between the cameras. In another scene captured using the same inward-facing camera array as before, the transformed predict_cameras appear to be outward-facing. I also tried the LLFF forward-facing dataset, such as the fern scene, which appears to be correct (?), but it seems that it is not aligned. I attempted to align it, but the result does not match Colmap.
Hi @hdzmtsssw , I guess the inward-facing vs. outward-facing problem comes from the coordinate transform, e.g., "transpose the rotation matrix or not" or the "coordinate xyz direction". I would suggest checking whether the coordinate transform code works well. Regarding "not aligned": how did you conduct the camera alignment? I am a bit confused. For example, we can see that the scales of Fig. 2 and Fig. 3 are quite different. There are multiple options for alignment, e.g., (1) use "corresponding_cameras_alignment", or (2) a simple and fast way: force the first camera in each camera set to be the origin, and use the second camera to compute the alignment matrix. The second option may be quite inaccurate but can give you a quick sense of how the cameras look. Please be aware that the input and target cameras must stay in the same coordinate system. By the way, if you use the "ground truth" camera poses from the LLFF dataset, please note that LLFF has its own coordinate system. If you could provide a minimal, reproducible example of the code on LLFF, I'd be happy to take a look.
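Option (2) can be sketched in a few lines of numpy (my own illustration, assuming 4x4 camera-to-world matrices as input; the helper name is made up):

```python
import numpy as np

def quick_align(c2w):
    """Re-express a set of (N, 4, 4) camera-to-world poses so that the first camera
    is the origin, then normalise scale by the first-to-second camera distance."""
    rel = np.linalg.inv(c2w[0])[None] @ c2w   # first camera becomes the identity
    scale = np.linalg.norm(rel[1][:3, 3])     # distance from camera 0 to camera 1
    rel[:, :3, 3] /= scale                    # normalise all camera centers by it
    return rel
```

Applying this to both the predicted and the ground-truth camera sets puts them in a roughly comparable frame; as noted above, it can be quite inaccurate, but it is enough for a quick visual check.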
Hi @jytime , thanks for your reply. Here is the example code.
# transform.py: get gt_cameras.npz (left, up, forward) from NeRF poses (right, up, backward)
import numpy as np
pose = np.load('cams_meta.npy')
bottom = np.tile(np.array([0., 0., 0., 1.]).reshape(1, 1, -1), (pose.shape[0], 1, 1))
K = pose[:, 12:21].reshape(-1, 3, 3)
focal = np.concatenate([K[:, 0:1, 0], K[:, 1:2, 1]], -1)
c2w = pose[:, :12].reshape(-1, 3, 4) # (x, y, z): (right, up, backward)
c2w_new = np.concatenate([-c2w[:, :, 0:1], c2w[:, :, 1:2], -c2w[:, :, 2:3], c2w[:, :, 3:]], 2) # transform to (-x, y, -z): (left, up, forward), which LLFF used
c2w_new = np.concatenate([c2w_new, bottom], -2)
w2c = np.linalg.inv(c2w_new)
T = w2c[:, :3, 3]
R = np.transpose(w2c[:, :3, :3], (0, 2, 1)) # right mul to left mul?
np.savez('gt_cameras.npz', gtR=R, gtT=T, gtFL=focal)
# demo.py: get pred_cameras w2c (I think so)
import os
import numpy as np
R = pred_cameras.R.cpu().numpy()
T = pred_cameras.T.unsqueeze(-1).cpu().numpy()
poses = np.concatenate([R, T], axis=-1)
np.save(os.path.join(folder_path, "pred_cameras.npy"), poses)
# then use corresponding_cameras_alignment to align Fern. For visualization, transform to c2w:
pose = np.load('pred_cameras.npy')
R = np.transpose(pose[:, :3, :3], (0, 2, 1)) # left mul to right mul?
pose[:, :3, :3] = R
bottom = np.tile(np.array([0., 0., 0., 1.]).reshape(1, 1, -1), (pose.shape[0], 1, 1))
w2c = np.concatenate([pose, bottom], -2)
c2w = np.linalg.inv(w2c)
c2w_new = np.concatenate([-c2w[:, :, 0:1], c2w[:, :, 1:2], -c2w[:, :, 2:3], c2w[:, :, 3:]], 2) # transform to (x, y, z)
...visualization...
Hi @hdzmtsssw , After a quick glance, I am a bit confused about the source of 'cams_meta.npy'. Does it contain the cameras from COLMAP, LLFF, or some other source? Based on the comment in your code, the poses use (right, up, backward). However, COLMAP uses (right, down, forward), and LLFF uses (down, right, backward); you can find the corresponding information discussed in the LLFF repository. Could this be the source of the problem?
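For reference, the LLFF conversion just permutes and negates rotation columns. A hedged sketch (assuming the rotation columns store the down/right/backward axes, as described in the LLFF repository; the function name is my own):

```python
import numpy as np

def llff_to_right_up_back(c2w):
    """Convert a 3x4 LLFF camera-to-world matrix whose rotation columns are
    [down, right, backwards] into the common [right, up, backwards] layout."""
    R, t = c2w[:3, :3], c2w[:3, 3:]
    # new columns: x = old right, y = -down (= up), z = old backwards
    R_new = np.stack([R[:, 1], -R[:, 0], R[:, 2]], axis=1)
    return np.concatenate([R_new, t], axis=1)
```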
Hi @jytime ,
Hi @hdzmtsssw , It looks like the main problem is that the motion for fern is very small, so we need a higher-accuracy alignment. You can try something like the following in your own code:
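The snippet referenced here was not preserved in the thread. In its place, here is a numpy stand-in for what corresponding_cameras_alignment does on camera centers, i.e. a least-squares similarity (scale + rotation + translation) fit in the style of Umeyama (my own sketch, not the author's code):

```python
import numpy as np

def align_camera_centers(src, tgt):
    """Fit scale s, rotation R, translation t minimising ||tgt - (s * R src + t)||
    over corresponding (N, 3) camera centers."""
    mu_s, mu_t = src.mean(0), tgt.mean(0)
    xs, xt = src - mu_s, tgt - mu_t               # center both point sets
    U, S, Vt = np.linalg.svd(xt.T @ xs)           # SVD of the cross-covariance
    d = np.sign(np.linalg.det(U @ Vt))            # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt
    s = (S * np.diag(D)).sum() / (xs ** 2).sum()  # optimal scale
    t = mu_t - s * R @ mu_s
    return s, R, t                                # aligned = s * (R @ src.T).T + t
```

This only aligns camera centers; aligning the rotations as well (as corresponding_cameras_alignment can) needs the full extrinsics, but for visualising whether two small-motion camera sets match, center alignment is usually enough.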
Or you can use my code to reproduce the visualisation above, which uses visdom. Please note that, in any case, the alignment cannot be perfect, so the visualisation can only give you a sense of what the structure looks like.
Hi there, thanks for your great work!
I'm a bit confused about the coordinate system. Could you please explain how to transform the coordinate system to Colmap style? It seems that the model coordinates in NDC are compressed. Also, I was wondering whether the ground truth pose is necessary and, if so, which coordinate system it should follow. Additionally, what is the purpose of corresponding_cameras_alignment?
0-29: [pred_cameras.R, pred_cameras.T]

30-59: colmap