
3D trajectory in the wild #145

Closed

slava-smirnov opened this issue Jul 16, 2020 · 13 comments

Comments

@slava-smirnov

slava-smirnov commented Jul 16, 2020

Again, thank you @dariopavllo and team for the outstanding work! Thank you all for the great discussions that have resolved so many of the community's issues.

It seems a lot of people have now figured out how to run it in the wild with decent efficiency.

The major practical bottleneck is that the predicted skeleton is pinned by its hip to the world origin, so the skeleton never moves across the scene, which limits its practicality a lot.

To make the skeleton move across the scene, you need your own 3D trajectory model (as @dariopavllo has pointed out many times). And to train your own trajectory model for the hip, you need the original Human3.6M dataset. If you don't have it (the original Human3.6M owners no longer seem to respond to requests), that option is off the table.

So I was wondering if:

  • anyone has solved the 3D trajectory issue somehow,

  • or has their own pretrained 3D trajectory model to share,

  • or if the original contributors are planning to add a pretrained 3D trajectory model in the near future.

That would benefit us all a lot! Keep up the great work!

@chiragpandav

chiragpandav commented Jul 25, 2020

#147

I posted that issue! Do you want to do the same?

Somehow I managed to get "h36m.zip", but I don't know how to generate the 3D rendering with the moving grid.

Can you tell me how exactly we can train the model so that it generates the "3D rendering with moving grid"? Or does training have nothing to do with my expected output? @dariopavllo

If you want that data, I can share a link!

@slava-smirnov
Author

Hey @chiragp007! I'm not training that model myself at the moment, but generally speaking, more data usually helps others come up with solutions.

@DA-fromindia

DA-fromindia commented Jul 25, 2020

> Hey @chiragp007! I'm not training that model myself at the moment, but generally speaking, more data usually helps others come up with solutions.

#147 (as per this question)
@slava-smirnov, what do you mean? If we want the 3D projection with a moving grid, do we have to train a model, or can we do it without that? Or is it not possible for a custom video?

@chiragp007 @slava-smirnov I am having the same issue you guys are facing!

@dariopavllo, waiting for your thoughts on issues #145 and #147.

@dariopavllo
Contributor

Hi all,

As mentioned in other comments, the pretrained models do not contain the trajectory model.

The trajectory model is only enabled when you train the semi-supervised model, but with some changes to the code it is also possible to add it to the normal training routine.

I'll try to look into this and see if I can find/retrain a pretrained model, but I can't make any promises.

A thing to keep in mind is that the trajectory estimation will be very unreliable if the subject is far from the camera or if the camera intrinsic parameters differ too much from those of Human3.6m. Another consideration is that the prediction is in camera space (not world space), so if the camera placement is not similar to that of Human3.6m, you might need to rotate the poses to align them to the floor.
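For reference, here is a minimal numpy sketch of the kind of camera-to-world rotation described above; the quaternion value and array shapes are placeholders (in practice you would take the orientation from the Human3.6M camera metadata, or use the repo's own camera utilities if you are working inside the codebase):

```python
import numpy as np

def quaternion_rotate(q, v):
    """Rotate 3D points v (..., 3) by a unit quaternion q = (w, x, y, z)."""
    qvec = q[1:]
    uv = np.cross(qvec, v)
    uuv = np.cross(qvec, uv)
    return v + 2.0 * (q[0] * uv + uuv)

# poses_cam: (T, J, 3) predicted joint positions in camera space (placeholder data).
poses_cam = np.zeros((100, 17, 3), dtype=np.float32)
# Orientation quaternion of the reference camera (placeholder identity rotation).
cam_orientation = np.array([1.0, 0.0, 0.0, 0.0], dtype=np.float32)

poses_world = quaternion_rotate(cam_orientation, poses_cam)
poses_world[..., 2] -= poses_world[..., 2].min()  # rest the lowest point on the floor
```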

@dariopavllo
Contributor

Hi,

I added a pull request to add support for the trajectory model in the visualization script: #149
It also adds support for Detectron2 and simplifies the dataset setup instructions (since the preprocessed dataset is no longer available). This should be merged into master soon, but in the meantime you can use that branch.

As for the actual pretrained models, I trained two 243-frame models (one symmetric, one causal) on Detectron poses, which include the trajectory model:
https://drive.google.com/file/d/1kJKDjdpFcg7cXr3x_hV3lYL0Tm3ImsFY/view?usp=sharing

This should be enough to get you started, but you should keep in mind the considerations in my previous post. Anyway, I tried it on a YouTube video and it works decently:
[demo GIF]
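For anyone assembling the outputs manually: conceptually, the trajectory model predicts the per-frame global root (hip) position, which is simply added back onto the root-relative pose. A minimal sketch with placeholder shapes and names (not the repo's exact variables):

```python
import numpy as np

# pose_pred: (T, J, 3) root-relative joint positions from the pose model (placeholder).
# traj_pred: (T, 1, 3) predicted global hip positions from the trajectory model (placeholder).
pose_pred = np.zeros((100, 17, 3), dtype=np.float32)
traj_pred = np.zeros((100, 1, 3), dtype=np.float32)

# Broadcasting adds the hip translation to every joint in each frame, so the
# skeleton moves across the scene instead of staying pinned at the origin.
global_pose = pose_pred + traj_pred
```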

@DA-fromindia

DA-fromindia commented Aug 2, 2020

Hello @dariopavllo,
Thanks for this, it looks great!
I have one question: can I apply these models to any YouTube video, or only to the video used during training?

@dariopavllo
Contributor

It can be used on any video (the model is trained on Human3.6m with COCO keypoints), but it might work better on some videos than others.

@DA-fromindia

DA-fromindia commented Aug 2, 2020

OK, thanks for this!
@dariopavllo

@nathan60107

nathan60107 commented Aug 5, 2020

@dariopavllo
I was looking for the Detectron2 setup and found this issue. Thanks for the update.

I am trying to train the 3D pose model from scratch, and I want to run inference on YouTube videos.
So my model has to accept 2D keypoints from Detectron2. I wonder what parameters or setup I should use.
That is, how do I train a model using Human3.6M with COCO metadata?
Or how can I convert 2D keypoints in COCO format to the H36M format?

I ask because prepare_data_2d_h36m always converts the data to the H36M format.
Or should I just run inference with Detectron2 on all the Human3.6M videos?

@dariopavllo
Contributor

@nathan60107 If you want to train a new model from scratch, using your own detections instead of the ones we provide (which have been obtained using Detectron 1), the answer is yes. You have to infer new detections by running Detectron on all Human3.6m videos. However, I don't really see why you would do that -- the pretrained model should work just fine for inference. You don't need to train a new model to infer poses from YouTube.
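If you do go down that route, a rough Detectron2 sketch for extracting COCO keypoints from a single video frame might look like this (the config choice, score threshold, and frame path are illustrative, not part of this repo):

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Keypoints/keypoint_rcnn_R_101_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Keypoints/keypoint_rcnn_R_101_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7  # illustrative threshold
predictor = DefaultPredictor(cfg)

frame = cv2.imread("frame_000001.jpg")  # hypothetical frame extracted from a video
outputs = predictor(frame)
instances = outputs["instances"].to("cpu")

if len(instances) > 0:
    # pred_keypoints: (num_people, 17, 3) with x, y, score in COCO keypoint order
    keypoints = instances.pred_keypoints.numpy()
    best_person = keypoints[instances.scores.numpy().argmax()]  # most confident detection
```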

@slava-smirnov
Author

slava-smirnov commented Aug 7, 2020

Hey @dariopavllo!

First and foremost, thank you for providing the trajectory model and updating run.py to support it. I can confirm the non-causal one runs fine with no huge impact on performance. Tremendous job!

I've got a couple of questions buzzing in my mind. Would increasing the number of epochs for the trajectory model improve its quality? What I'm seeing is that the movement itself is decent, but not as good as the 3D keypoint detection. Could you possibly elaborate on that?

@nathan60107

nathan60107 commented Aug 10, 2020

> @nathan60107 If you want to train a new model from scratch, using your own detections instead of the ones we provide (which have been obtained using Detectron 1), the answer is yes. You have to infer new detections by running Detectron on all Human3.6m videos. However, I don't really see why you would do that -- the pretrained model should work just fine for inference. You don't need to train a new model to infer poses from YouTube.

@dariopavllo

Thank you for the reply.
I'm preparing to predict dog poses. Before that, I have to make sure I can reproduce your results. I can now use Detectron2 + my model to run inference on human videos from YouTube.

I have to tell you there is a bug: if you train supervised (non-semi) and save the checkpoint, run.py creates the dict key "model_traj" and gives it the value None. When you then run inference with that model, it sees the "model_traj" key and tries to read its value, which is None, so an error occurs.
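A hypothetical guard for that crash (the checkpoint keys and variable names follow the description above, so treat this as a sketch rather than the actual fix in run.py):

```python
import torch

def load_checkpoint(chk_filename, model_pos, model_traj):
    """Restore pose weights, and trajectory weights only if they exist.

    Supervised (non-semi) training saves 'model_traj' as None, so blindly
    calling load_state_dict on it crashes at inference time.
    """
    checkpoint = torch.load(chk_filename, map_location=lambda storage, loc: storage)
    model_pos.load_state_dict(checkpoint['model_pos'])
    if checkpoint.get('model_traj') is not None:
        model_traj.load_state_dict(checkpoint['model_traj'])
        return model_pos, model_traj
    return model_pos, None  # fall back to pose-only inference
```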

First, I wonder if there is a name for this paper's network, like Mask R-CNN or FPN, which would make it easy to let others know which network one is talking about.

One more question I want to ask: which format do you use in the paper and now? You said here that you didn't train Mask R-CNN on COCO keypoints, so that means you trained Mask R-CNN on the Human3.6M dataset, right? What format do you use for the Human3.6M 2D keypoints? There are two possible options:

  1. You feed the video into Detectron 1, Detectron 1 outputs 2D keypoints in H36M format, you feed those H36M-format 2D keypoints into VideoPose3D, and you get the 3D pose in H36M format.
  2. You feed the video into Detectron 1, Detectron 1 outputs 2D keypoints in COCO format, you feed those COCO-format 2D keypoints into VideoPose3D, and you get the 3D pose in H36M format.

Which one is true in the paper? Which one is true for the repository from two years ago? And which one is true for the repository you updated about ten days ago?

It seems the results in the paper use COCO 2D keypoints (see here).
But the pretrained VideoPose3D model (pretrained_h36m_cpn.bin) you released two years ago accepts H36M 2D keypoints as input.
And the pretrained VideoPose3D model you released about ten days ago accepts COCO 2D keypoints as input.
So I am very confused about the formats, although I can run inference on YouTube videos now. Can you explain which format and which network is used in each case?
And if you use the COCO format, how can you project the 3D pose to 2D keypoints to compute the semi-supervised loss? The joint formats are totally different.

@JACKHAHA363

JACKHAHA363 commented Feb 8, 2021

What is the physical unit of the saved predictions? I think that after line 767 in run.py the predictions being saved are actually in Euclidean space, so what would be the physical length of euclid(hip_right, knee_right) in meters? (I am using the default camera parameters.)
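One way to sanity-check the units is to measure a bone length directly from the saved array; an adult thigh is roughly 0.4-0.5 m, so a metric output should land in that range. A small sketch (the file name and joint indices are assumptions, so verify them against the skeleton definition in the repo):

```python
import numpy as np

predictions = np.load("output_3d.npy")  # hypothetical saved (T, 17, 3) predictions
R_HIP, R_KNEE = 1, 2                    # assumed joint indices for the H36M skeleton

thigh = np.linalg.norm(predictions[:, R_KNEE] - predictions[:, R_HIP], axis=-1)
print("mean thigh length:", thigh.mean())  # ~0.4-0.5 if the predictions are in meters
```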
