
How to simply infer two image inputs #3

Closed
QiuJunning opened this issue May 14, 2024 · 4 comments

Comments


QiuJunning commented May 14, 2024

Amazing work! Can you provide an example of simply inferring two images and visualizing it? Thanks!

MickeyLLG (Collaborator) commented May 14, 2024

Thanks for your interest in our work. Since this work only provides 3D joint predictions without a scale or position vector, it is hard to directly draw 2D joints onto an arbitrary image.
If you need to run inference on arbitrary dual-view input, please follow these steps:

  1. Refer to L350-L354 of adapt_detnet_dual.py for the forward process. Here you may need to replace "clr" with your own input images.
  2. Run inference twice, once per image, i.e., call results = model(clr) twice as we do in adapt_detnet_dual.py L397-L402.
  3. The predicted 3D joints are stored in the variable results['xyz'], as shown in L353.
  4. Align the predicted 3D joints with ground-truth 3D joints. You may need to know the positions of at least two joints for alignment, as this work does not predict the 3D translation vector and scale of a hand. Please refer to L419.
  5. Merge/average the predictions from dual views following L68-L74 of visualization.py.
  6. Finally, you can project the 3D joints onto the image plane following L77-L95 of visualization.py. In this step, camera parameters are necessary for projection.
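
In case it helps, here is a rough sketch of how these steps could be strung together. Only the model interface (a dict whose 'xyz' entry holds the [1, 21, 3] root-relative joints) comes from the description above; the preprocessing, the two-joint alignment helper, the joint indices, the file names, and the intrinsics K are illustrative assumptions, not code from this repo.

```python
# Rough sketch only -- paths, joint indices, GT joints, and intrinsics K are assumptions.
import cv2
import numpy as np
import torch

def preprocess(path):
    """Crop/resize the hand region to 128x128 and normalize to [0, 1]."""
    img = cv2.imread(path)                                        # HxWx3, uint8 (BGR; convert if the dataloader expects RGB)
    img = cv2.resize(img, (128, 128)).astype(np.float32) / 255.0
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)    # [1, 3, 128, 128]

def predict(model, clr):
    """Steps 1-3: forward pass; `model` is the trained detnet loaded as in adapt_detnet_dual.py."""
    with torch.no_grad():
        results = model(clr)
    return results['xyz'][0].cpu().numpy()                        # [21, 3], no scale/translation

def align(xyz, wrist_gt, ref_gt, wrist_idx=0, ref_idx=9):
    """Step 4: recover scale and translation from two known joints (indices are assumptions)."""
    scale = np.linalg.norm(ref_gt - wrist_gt) / (np.linalg.norm(xyz[ref_idx] - xyz[wrist_idx]) + 1e-8)
    return (xyz - xyz[wrist_idx]) * scale + wrist_gt

def project(xyz_cam, K):
    """Step 6: pinhole projection; xyz_cam must be in that camera's frame, K is its 3x3 intrinsics."""
    uv = (K @ xyz_cam.T).T
    return uv[:, :2] / uv[:, 2:3]

# Usage (all inputs hypothetical):
# xyz1 = align(predict(model, preprocess("view1.jpg")), wrist_gt, ref_gt)
# xyz2 = align(predict(model, preprocess("view2.jpg")), wrist_gt, ref_gt)
# xyz  = (xyz1 + xyz2) / 2.0          # step 5: merge/average the two views
# uv1  = project(xyz, K1)             # step 6: project onto view 1's image plane
```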

Feel free to leave a comment whenever you have any further concerns :)

QiuJunning (Author)

Thank you for your reply! I ran into the following questions during my attempt and would appreciate further answers:

  1. I want to know what preprocessing needs to be done on my input image, such as resizing, normalization, etc. So far I have found that resizing to (128, 128) lets the image be passed to the model.
  2. I want to visualize the 3D joints. After resizing to (128, 128), the output xyz shape is torch.Size([1, 21, 3]). As I understand it, the first joint is the wrist and the remaining ones are the keypoints from thumb to little finger in groups of 4, right? Is it necessary to handle left and right hands differently here?
    I'm still reading further into the code, so please forgive me if these seem like stupid questions.

MickeyLLG (Collaborator)

  1. I think all you need to do is crop the hand region to 128*128 and normalize it to 0-1, i.e., img = img / 255., as shown in L316-L330 of dataloader.
  2. In this work, we flip left hands to right.
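
For reference, a minimal sketch of that preprocessing, assuming you already have a hand bounding box and know the handedness (the bounding-box variables and the flip helper are illustrative, not code from the dataloader):

```python
import cv2
import numpy as np
import torch

def prepare_input(img_bgr, box, is_left_hand):
    """Crop the hand region, resize to 128x128, normalize to [0, 1], and mirror left hands."""
    x0, y0, x1, y1 = box                        # assumed hand bounding box in pixels
    crop = cv2.resize(img_bgr[y0:y1, x0:x1], (128, 128))
    if is_left_hand:
        crop = cv2.flip(crop, 1)                # flip left hands to right, as noted above
    crop = crop.astype(np.float32) / 255.0      # img = img / 255.
    return torch.from_numpy(crop).permute(2, 0, 1).unsqueeze(0)  # [1, 3, 128, 128]
```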

QiuJunning (Author)

thanks!!!
