
Data Preprocessing Human3.6M + Adaptation for different skeleton #3

StevRamos opened this issue Jan 11, 2022 · 8 comments

@StevRamos

How did you preprocess the Human3.6M dataset? I would like to replicate npy and pkl files that you provide. Do you have a code of these? Thanks in advance!

@DegardinBruno
Owner

Hi @StevRamos, thanks for your interest in Kinetic-GAN!
For consistency, we used the same data as previous methods. The authors of SA-GCN ("Structure-Aware Human-Action Generation") provided us with their data, which had also been obtained by the other methods. Their GitHub: https://github.com/PingYu-iris/SA-GCN

We just rearranged it to be easier to use!
Let me know if you have any further questions.

@StevRamos
Author

Thanks for the prompt response! I will review it.

I would like to use your model to generate new sign language videos (for data augmentation purposes). The problem is that my dataset is a set of videos. I have recently learned a little about GNNs, so as I understand it, each node has features. It would be great if you could tell me whether it is possible to obtain (replicate) these features for the nodes of each video in my dataset (sign language videos), or whether I need other tools to make that possible, and what these features represent.

You did amazing work! Thanks for making the code public!

@DegardinBruno
Owner

DegardinBruno commented Jan 11, 2022

Thank you very much! By the way, the content/shape of each dataset is N x C x T x V (x M), where N is the number of samples, C the number of coordinates, T the number of temporal instances (frames), and V the number of joints. M is usually 1 if there is a fifth dimension.
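For illustration only, here is a minimal sketch of that layout (the sizes below are hypothetical, not the ones from our datasets):

```python
import numpy as np

# Hypothetical sizes, only to illustrate the N x C x T x V (x M) layout
N, C, T, V, M = 100, 3, 64, 25, 1  # samples, coordinates (x, y, z), frames, joints, skeletons per sample

data = np.zeros((N, C, T, V, M), dtype=np.float32)

# e.g. the x-coordinate of joint 10 at frame 5 of the first sample (first skeleton)
x = data[0, 0, 5, 10, 0]
```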

Yes, absolutely, great idea! You can even build your own conditional model with Kinetic-GAN to generate specific words and letters; you just need to extract the 2D or 3D hand pose estimations first.
After that, you will need to define/change its adjacency matrix (a V x V matrix, where V is the number of joints in the hand; connected joints are 1, otherwise 0) by changing the connected joints in the data (check the graph_ntu.py file); see the sketch below.
Then, you define/change the upsampling and downsampling path (also in graph_ntu.py). There are some comments there showing how you can visualise the upsampling paths just by running that code!
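As a rough sketch of how such an adjacency matrix could be built from a list of connected joints (the 21-joint hand layout and joint indices below are hypothetical, not the ones used in graph_ntu.py):

```python
import numpy as np

# Hypothetical 21-joint hand skeleton, only for illustration
V = 21
neighbor_base = [
    (0, 1), (1, 2), (2, 3), (3, 4),         # thumb
    (0, 5), (5, 6), (6, 7), (7, 8),         # index finger
    (0, 9), (9, 10), (10, 11), (11, 12),    # middle finger
    (0, 13), (13, 14), (14, 15), (15, 16),  # ring finger
    (0, 17), (17, 18), (18, 19), (19, 20),  # little finger
]

A = np.zeros((V, V))
for i, j in neighbor_base:
    A[i, j] = 1
    A[j, i] = 1  # undirected graph: connected joints are 1, otherwise 0
```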

@StevRamos
Author

Thank you very much, @DegardinBruno. That helps me a lot! So the information I need is the coordinates of each joint (at each timestep). I will get into the code. I think it is promising!

Just to clarify, I have some questions.

  1. Should all the frames in the video have the same number of joints?
  2. What do you mean by local and global movement?
  3. What does the dimension resolution level L (in the paper) mean? (I think you refer to it in this issue as M.)

Again, thanks in advance!

@DegardinBruno
Owner

DegardinBruno commented Jan 12, 2022

  1. Should all the frames in the video have the same number of joints?

Yes, at this point, Kinetic-GAN only supports a fixed number of joints through all frames.

  2. What do you mean by local and global movement?

Check our video at 0:27. In local movement, the skeleton is normalized to a root joint; global movement, on the other hand, describes the skeleton moving freely without constraints.
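As a minimal sketch of what that local normalization could look like (assuming the N x C x T x V layout above; the helper name and root index are hypothetical, not from the repository):

```python
import numpy as np

def to_local(data, root=0):
    """Express every joint relative to the root joint, removing the global trajectory.

    data: array of shape (N, C, T, V); root: index of the root joint.
    """
    return data - data[:, :, :, root:root + 1]  # broadcast the root joint over all V joints
```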

  3. What does the dimension resolution level L (in the paper) mean? (I think you refer to it in this issue as M.)

As you can see in Figure 4 of the paper, we define our upsampling path with four levels, where level 1 is a single point from the latent space and level 4 is the complete skeleton of the respective dataset.

M represents something different! In NTU RGB+D, some data samples contain 2 skeletons; that's where M comes from. However, Kinetic-GAN does not yet support action interaction between two skeletons.

@DegardinBruno DegardinBruno changed the title Data Preprocessing - Human3.6M Data Preprocessing Human3.6M + Adaptation for different skeleton Jan 12, 2022
@DegardinBruno DegardinBruno added the question Further information is requested label Jan 13, 2022
@StevRamos
Author

Hi @DegardinBruno, I have been using your model, as I told you months ago. It worked! But now I would like to use it with a different graph structure, and this time I got an error. Basically, it is caused by the assertion (assert len(self.center) == self.lvls). That's why I want to understand the idea behind the algorithm shown below:

```python
for _ in range(self.lvls - 1):
    stay  = []
    start = 1
    while True:
        remove = []
        for i in G:
            if len(G.edges(i)) == start and i not in stay:
                lost = []
                for j, k in G.edges(i):
                    stay.append(k)
                    lost.append(k)
                recon = [(l, m) for l in lost for m in lost if l != m]
                G.add_edges_from(recon)
                remove.append(i)

        if start > 10: break  # Remove as maximum as possible

        G.remove_nodes_from(remove)

        cycle = nx.cycle_basis(G)  # Check if there is a cycle in order to downsample it
        if len(cycle) > 0:
            if len(cycle[0]) == len(G):
                last = [x for x in G if x not in stay]
                G.remove_nodes_from(last)

        start += 1

    map_i = np.array([[i, x] for i, x in enumerate(G)])  # Keep track graph indices
    self.map.append(map_i)

    mapping = {}  # Change mapping labels
    for i, x in enumerate(G):
        mapping[int(x)] = i
        if int(x) == self.center[-1]:
            self.center.append(i)
```
If you could explain the idea to me with pseudo-code, I would appreciate it very much. Thanks in advance!

Stev

@DegardinBruno
Owner

Hey @StevRamos, great!!

It would be best if you changed neighbor_base to the connections of your skeleton structure.
Uncomment the lines before the assertions to visualise your graph levels!

If you could explain the idea to me with pseudo-code, I would appreciate it very much.

We are basically removing low-degree nodes and their edges, keeping at least one parent node in the graph for the next level, because you can't just remove edges arbitrarily or the graph becomes inconsistent.
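A simplified sketch of that idea (my own paraphrase, not the exact graph_ntu.py code; the cycle-basis check is omitted and the function name is hypothetical):

```python
import networkx as nx

def downsample_level(G, center):
    """Remove low-degree nodes, keep their neighbours as parents for the next level,
    and reconnect those neighbours so the reduced graph stays consistent."""
    stay, start = set(), 1
    while start <= 10:                      # raise the degree threshold step by step
        remove = []
        for i in list(G.nodes):
            if len(G.edges(i)) == start and i not in stay:
                parents = [k for _, k in G.edges(i)]
                stay.update(parents)        # parents survive into the next level
                G.add_edges_from((l, m) for l in parents for m in parents if l != m)
                remove.append(i)
        G.remove_nodes_from(remove)
        start += 1

    new_index = {x: i for i, x in enumerate(G)}  # relabel surviving joints 0..V'-1
    new_center = new_index.get(center)           # None if the centre joint was dropped (cf. your assertion error)
    return nx.relabel_nodes(G, new_index), new_center
```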

@hendrikTpl

Hi @DegardinBruno, thanks for providing this code. By the way, I am working on human interaction generation. As you said, interaction is not yet supported; could you please guide me and provide some notes on how to make it possible? I have recently been working on HIR (recognition only), and now I want to use your code and model to generate skeleton data (for data augmentation) on a small dataset. Your help would be much appreciated. Thanks!
