about the coordinate system #9

hdzmtsssw · 2023-08-01T07:23:55Z

Hi there, thanks for your great work!
I'm a bit confused about the coordinate system. Could you please explain how to transform the coordinate system to Colmap style? It seems that the model coordinates in NDC are compressed. Also, I was wondering if the ground truth pose is necessary and, If so, which coordinate system it should follow. Additionally, what's the purpose of corresponding_cameras_alignment?

0-29: [pred_cameras.R, pred_cameras.T]
30-59: colmap

The text was updated successfully, but these errors were encountered:

jytime · 2023-08-01T14:08:05Z

Hi @hdzmtsssw ,

In the PyTorch3D NDC coordinate, "+X points left, and +Y points up and +Z points out from the image plane". In COLMAP, "X axis points to the right, the Y axis to the bottom, and the Z axis to the front as seen from the image." Because PyTorch3D conducts multiplication by left, we need to transpose the rotation matrix.

I will provide a transform function when we release the evaluation code. If you want to do it early, you can try something like below (I do not have time to carefully check it now so please be careful):

R_pytorch3d[:, :, :2] *= -1 
T_pytorch3d[:, :2] *= -1
R_colmap = R_pytorch3d.transpose(-2, -1)      # swap the last two dimensions, i.e., 3x3
T_colmap = T_pytorch3d

Ground truth pose is not necessary. We just use it to show how to compute the error metric.

Actually the model coordinate in NDC was not "compressed". For a given set of images, people cannot get its real "scale". So different methods just use some normalised units.

corresponding_cameras_alignment is necessary if you want to compare two sets of cameras but don't know the scale of the scene. Generally, it estimates a single similarity transformation between two sets of cameras cameras_src and cameras_tgt and returns an aligned version of cameras_src. In other words, this function will try to align a set of cameras to the target one by predicting a global rotation, a global translation, and a scale.

hdzmtsssw · 2023-08-02T06:45:45Z

@jytime Thanks for your fast reply!
I applied the transformation you mentioned to the pred_cameras, and it looks good on the sample scene(apple). However, it didn't produce good results on my own dataset, which consists of 30 forward-facing images(3840 × 2160). Does your method not support forward-facing scenes?

I plan to use your method to replace Colmap in obtaining poses for NeRF training. Do I need to provide the ground truth pose from Colmap to obtain the correct scale and aligned version? If so, which coordinate system should the ground truth pose follow(PyTorch3D NDC coordinate?) and which transform function should it use(the same code you provided?)?

Lastly, could you please let me know when the script for NeRF training will be released?

Figure 1. apple - transformed

Figure2. custom data - transformed

Figure 3. custom data - predict_cameras output(0-29) and colmap(20-59)

jytime · 2023-08-02T13:39:56Z

Hi @hdzmtsssw ,

The result looks a bit weird. Forward-facing images should be okay. Did you use GGS for this result? Or can you share some of the images?

hdzmtsssw · 2023-08-04T03:04:01Z

Hi, @jytime
I used the default configuration, so I use GGS for this result.

Unfortunately, the dataset is confidential, so I cannot share the images. The camera array was fan-shaped and captured four people standing in front of a green screen, with varying intrinsics between the cameras.

In another scene captured using the same inward-facing camera array as before, the transformed predict_cameras appear to be outward-facing.

I also tried the LLFF forward-facing dataset, such as the fern scene, which appears to be correct(?), but it seems that it is not aligned. I attempted to align it, but the result does not match with Colmap.

Fig 1. custom data2 - transformed

Fig 2. fern - transformed

Fig 3. fern - colmap

jytime · 2023-08-04T14:11:58Z

Hi @hdzmtsssw ,

I guess the inward-facing and outward-facing problem comes from the coordinate transform, e.g., "transpose the rotation matrix or not" or "coordinate xyz direction ". I would suggest to check if the coordinate transform code works well.

Regarding "not aligned", how did you conduct camera alignment? I am a bit confused. For example, we can see the scales of Fig 2 and Fig3 are quite different.

There may be multiple solutions for alignment, e.g., (1) use "corresponding_cameras_alignment" (2) a simple and fast way force the first camera in each camera set to be the origin, and use the second camera to compute the alignment matrix. The second solution may be quite inaccurate but can give you a quick insight on how they look like. But please be aware that the input and target cameras must stay in the same coordinate system.

By the way, if you use the "ground truth" camera poses from LLFF dataset, please note that LLFF has its own coordinate system.

If you could provide a minimal, reproducible example of the code on LLFF, I'd be happy to take a look.

hdzmtsssw · 2023-08-08T09:28:51Z

Hi @jytime , thanks for your reply.

Here is the example code.
ref:https://github.com/Fyusion/LLFF/blob/c6e27b1ee59cb18f054ccb0f87a90214dbe70482/llff/poses/pose_utils.py#L51C29-L51C34

# transform.py  :get gt_cameras.npy(left, up, forward) from NeRF poses(right, up, backward)
pose = np.load('cams_meta.npy')
bottom = np.tile(np.array([0., 0., 0., 1.]).reshape(1, 1, -1), (pose.shape[0], 1, 1))
K = pose[:, 12:21].reshape(-1, 3, 3)
focal = np.concatenate([K[:, 0:1, 0], K[:, 1:2, 1]], -1)
c2w = pose[:, :12].reshape(-1, 3, 4) # (x, y, z): (right, up, backward)
c2w_new = np.concatenate([-c2w[:, :, 0:1], c2w[:, :, 1:2], -c2w[:, :, 2:3], c2w[:, :, 3:]], 2) # transform to (-x, y, -z): (left, up, forward), which LLFF used
c2w_new = np.concatenate([c2w_new, bottom], -2)
w2c = np.linalg.inv(c2w_new)
T = w2c[:, :3, 3]
R = np.transpose(w2c[:, :3, :3], (0, 2, 1)) # right mul to left mul?
np.savez('gt_cameras.npz', gtR=R, gtT=T, gtFL=focal)

# demo.py    :get pred_cameras w2c(I think so)
R = pred_cameras.R.cpu().numpy()
T = pred_cameras.T.unsqueeze(-1).cpu().numpy()
poses = np.concatenate([R, T], axis=-1)
np.save(os.path.join(folder_path, "pred_cameras.npy"), poses)

# then use corresponding_cameras_alignment to align

Fern: For samples/apple: the absolute rotation error is 13.783582 degrees.

For visualization, transform to c2w:

pose = np.load('pred_cameras.npy"')
R = np.transpose(pose[:, :3, :3], (0, 2, 1)) # left mul to right mul?
pose[:, :3, :3] = R
bottom = np.tile(np.array([0., 0., 0., 1.]).reshape(1, 1, -1), (pose.shape[0], 1, 1))
w2c = np.concatenate([pose, bottom], -2)
c2w = np.linalg.inv(c2w_new)
c2w_new = np.concatenate([-c2w[:, :, 0:1], c2w[:, :, 1:2], -c2w[:, :, 2:3], c2w[:, :, 3:]], 2) # transform to (x, y, z)
...visualization...

jytime · 2023-08-08T14:47:13Z

Hi @hdzmtsssw ,

After a quick glance, I am a bit confused about the source of 'cams_meta.npy'. Does this contain the cameras from COLMAP, LLFF, or any other source? Based on the comment,
c2w = pose[:, :12].reshape(-1, 3, 4) # (x, y, z): (right, up, backward)
it looks you assume it is "right, up, backward".

However, for COLMAP, it uses "right, down, forwards". For LLFF, it uses "down, right, backwards". You can find the corresponding information as discussed in LLFF.

Would this be a source of the problem?

hdzmtsssw · 2023-08-08T15:48:17Z

Hi, @jytime ,
The coordinate system in 'cams_meta.npy' is the Vanilla NeRF coordinate system ("OpenGL coordinate system"). It is ok and can be well transformed from COLMAP or LLFF.
ref: https://github.com/bmild/nerf#already-have-poses

jytime · 2023-08-09T16:07:17Z

Hi @hdzmtsssw ,

It looks the main problem is that the motion for fern is very small, so we need alignment with higher accuracy. You can try on your own code like:

    pred_cameras_aligned = corresponding_cameras_alignment(
        cameras_src=pred_cameras,
        cameras_tgt=colmap_cameras,
        estimate_scale=True,
        mode= "extrinsics",
        eps= 1e-9,
    )

Or you can use my code to reproduce the visualisation above, which uses visdom.

Please note that in any case, the alignment cannot be perfect. So the visualisation can only give you a sense of how the structure is.

jytime closed this as completed Aug 21, 2023

jytime mentioned this issue Sep 4, 2023

how to convert the pose in the real world to ndc coordinate? #11

Closed

jytime mentioned this issue Oct 20, 2023

RealEstate10k annotation #19

Closed

wooni-github mentioned this issue Dec 15, 2023

conversion to colmap (nerf dataset) #27

Closed

jytime mentioned this issue Dec 19, 2023

RT export questions #30

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

about the coordinate system #9

about the coordinate system #9

hdzmtsssw commented Aug 1, 2023

jytime commented Aug 1, 2023 •

edited

Loading

hdzmtsssw commented Aug 2, 2023

jytime commented Aug 2, 2023

hdzmtsssw commented Aug 4, 2023

jytime commented Aug 4, 2023 •

edited

Loading

hdzmtsssw commented Aug 8, 2023

jytime commented Aug 8, 2023

hdzmtsssw commented Aug 8, 2023 •

edited

Loading

jytime commented Aug 9, 2023 •

edited

Loading

about the coordinate system #9

about the coordinate system #9

Comments

hdzmtsssw commented Aug 1, 2023

jytime commented Aug 1, 2023 • edited Loading

hdzmtsssw commented Aug 2, 2023

jytime commented Aug 2, 2023

hdzmtsssw commented Aug 4, 2023

jytime commented Aug 4, 2023 • edited Loading

hdzmtsssw commented Aug 8, 2023

jytime commented Aug 8, 2023

hdzmtsssw commented Aug 8, 2023 • edited Loading

jytime commented Aug 9, 2023 • edited Loading

jytime commented Aug 1, 2023 •

edited

Loading

jytime commented Aug 4, 2023 •

edited

Loading

hdzmtsssw commented Aug 8, 2023 •

edited

Loading

jytime commented Aug 9, 2023 •

edited

Loading