
How to simply infer two image inputs #3

Closed
QiuJunning opened this issue May 14, 2024 · 4 comments

Comments


QiuJunning commented May 14, 2024

Amazing work! Can you provide an example of simply inferring two images and visualizing it? Thanks!

MickeyLLG (Collaborator) commented May 14, 2024

Thanks for your interest in our work. Since this work only provides 3D joint predictions without a scale or position vector, it is hard to directly draw 2D joints onto an arbitrary image.
If you need to run inference on arbitrary dual-view input, please follow these steps:

  1. Refer to L350-L354 of adapt_detnet_dual.py for the forward process. Here you may need to replace "clr" with your own input images.
  2. Run inference twice, once per image, i.e., call results = model(clr) twice as we do in adapt_detnet_dual.py L397-L402.
  3. The predicted 3D joints are stored in the variable results['xyz'], as shown in L353.
  4. Align the predicted 3D joints with ground-truth 3D joints. You may need to know the positions of at least two joints for alignment, as this work does not predict the 3D translation vector and scale of a hand. Please refer to L419.
  5. Merge/average the predictions from dual views following L68-L74 of visualization.py.
  6. Finally, you can project the 3D joints onto the image plane following L77-L95 of visualization.py. In this step, camera parameters are necessary for projection.
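
In case it helps, here is a rough sketch of how these steps could be strung together. Only the model interface (a dict whose 'xyz' entry holds the [1, 21, 3] root-relative joints) comes from the description above; the preprocessing, the two-joint alignment helper, the joint indices, the file names, and the intrinsics K are illustrative assumptions, not code from this repo.

```python
# Rough sketch only -- paths, joint indices, GT joints, and intrinsics K are assumptions.
import cv2
import numpy as np
import torch

def preprocess(path):
    """Crop/resize the hand region to 128x128 and normalize to [0, 1]."""
    img = cv2.imread(path)                                        # HxWx3, uint8 (BGR; convert if the dataloader expects RGB)
    img = cv2.resize(img, (128, 128)).astype(np.float32) / 255.0
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)    # [1, 3, 128, 128]

def predict(model, clr):
    """Steps 1-3: forward pass; `model` is the trained detnet loaded as in adapt_detnet_dual.py."""
    with torch.no_grad():
        results = model(clr)
    return results['xyz'][0].cpu().numpy()                        # [21, 3], no scale/translation

def align(xyz, wrist_gt, ref_gt, wrist_idx=0, ref_idx=9):
    """Step 4: recover scale and translation from two known joints (indices are assumptions)."""
    scale = np.linalg.norm(ref_gt - wrist_gt) / (np.linalg.norm(xyz[ref_idx] - xyz[wrist_idx]) + 1e-8)
    return (xyz - xyz[wrist_idx]) * scale + wrist_gt

def project(xyz_cam, K):
    """Step 6: pinhole projection; xyz_cam must be in that camera's frame, K is its 3x3 intrinsics."""
    uv = (K @ xyz_cam.T).T
    return uv[:, :2] / uv[:, 2:3]

# Usage (all inputs hypothetical):
# xyz1 = align(predict(model, preprocess("view1.jpg")), wrist_gt, ref_gt)
# xyz2 = align(predict(model, preprocess("view2.jpg")), wrist_gt, ref_gt)
# xyz  = (xyz1 + xyz2) / 2.0          # step 5: merge/average the two views
# uv1  = project(xyz, K1)             # step 6: project onto view 1's image plane
```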

Feel free to leave a comment whenever you have any further concerns :)

QiuJunning (Author)

Thank you for your reply! I ran into the following questions during my attempt and would appreciate further answers:

  1. I want to know what preprocessing needs to be done on my input image, such as resizing, normalization, etc. So far I have found that resizing to (128, 128) lets the image be passed to the model.
  2. I want to visualize the 3D joints. After resizing to (128, 128), the output xyz shape is torch.Size([1, 21, 3]). As I understand it, the first joint is the wrist and the remaining ones are the keypoints from thumb to little finger in groups of 4, right? Is it necessary to handle left and right hands differently here?
    I'm still reading further into the code, so please forgive me if these seem like stupid questions.

MickeyLLG (Collaborator)

  1. I think all you need to do is crop the hand region to 128*128 and normalize it to 0-1, i.e., img = img / 255., as shown in L316-L330 of dataloader.
  2. In this work, we flip left hands to right.
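
For reference, a minimal sketch of that preprocessing, assuming you already have a hand bounding box and know the handedness (the bounding-box variables and the flip helper are illustrative, not code from the dataloader):

```python
import cv2
import numpy as np
import torch

def prepare_input(img_bgr, box, is_left_hand):
    """Crop the hand region, resize to 128x128, normalize to [0, 1], and mirror left hands."""
    x0, y0, x1, y1 = box                        # assumed hand bounding box in pixels
    crop = cv2.resize(img_bgr[y0:y1, x0:x1], (128, 128))
    if is_left_hand:
        crop = cv2.flip(crop, 1)                # flip left hands to right, as noted above
    crop = crop.astype(np.float32) / 255.0      # img = img / 255.
    return torch.from_numpy(crop).permute(2, 0, 1).unsqueeze(0)  # [1, 3, 128, 128]
```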

QiuJunning (Author)

thanks!!!
