
Results on Breakfast Dataset #5

Closed
giulio93 opened this issue Sep 27, 2019 · 10 comments

@giulio93

giulio93 commented Sep 27, 2019

Hi, thank you so much for sharing this useful project!
I ran the code as-is, and I obtained these results:
image

I have a couple of questions:

  1. How many epochs did you use to train the RNN and CNN models?
  2. I tried training the CNN model for 20 epochs, but the results were poor, so I increased it to 300 epochs. Was it the same for you?
  3. The models are trained only with ground truth, and use GT or decoded output data for prediction. Did you try training the models on the decoded output directly?

I also tried using the weakly supervised decoded output from the new Viterbi decoder paper. I filled the two folders obs0.2 and obs0.3 with the new decoded data for the respective observation percentages.
I obtained better prediction results with the decoded output data, even though I trained the decoder in weakly supervised rather than fully supervised mode. Does that make sense to you?

image

@yabufarha
Owner

Hi,
Happy that you found the code useful.

1-2. We trained the CNN model for 20 epochs, but the batch size was 64. The default values should work well for the RNN model.

  3. We trained the model only with ground-truth labels. Nevertheless, I don't expect much difference if you train with the decoded outputs, since these outputs are at around 100% accuracy for the training set.

Regarding the results with nn-viterbi, I think you should look at the accuracy of the observed part (only the decoded output). If the accuracy of the observations is higher for nn-viterbi, then I would expect better results for the anticipation; otherwise, I would expect the anticipation accuracy to be lower.
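As a rough illustration of what I mean (just a sketch, not code from this repository; the per-frame labels and action names are made up), you could measure the frame-wise accuracy of the observed prefix separately from the anticipated remainder:

```python
# Minimal sketch: compare frame-wise accuracy of the observed part vs. the
# anticipated part for one video. Labels are hypothetical per-frame strings.

def framewise_accuracy(predicted, ground_truth):
    """Fraction of frames whose predicted label matches the ground truth."""
    assert len(predicted) == len(ground_truth)
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return correct / len(ground_truth)

def observed_vs_anticipated_accuracy(predicted, ground_truth, obs_ratio=0.2):
    """Accuracy over the observed prefix and over the anticipated remainder."""
    n_obs = int(len(ground_truth) * obs_ratio)
    obs_acc = framewise_accuracy(predicted[:n_obs], ground_truth[:n_obs])
    ant_acc = framewise_accuracy(predicted[n_obs:], ground_truth[n_obs:])
    return obs_acc, ant_acc

# Hypothetical per-frame labels for a 100-frame video:
pred = ["pour_milk"] * 30 + ["stir_milk"] * 70
gt = ["pour_milk"] * 25 + ["stir_milk"] * 75
print(observed_vs_anticipated_accuracy(pred, gt, obs_ratio=0.2))
```

If the observed-part accuracy turns out higher with the nn-viterbi decoding, better anticipation numbers would not be surprising.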

@giulio93
Author

Thanks! batch_size=64 works well!

@RomeroBarata

Hi Yazan,

Sorry for reopening this issue, but I'm having a similar problem in reproducing the results for the RNN model. I trained the RNN models using the default values in the source code, as you recommend above, but my performance for the 20% observation is roughly 5% below the results reported and for the 30% observation is something around 2% below. Is there anything I might be doing wrong or missing while training the RNN model?

Kind regards,
Romero

@yabufarha
Owner

Hi Romero,

Kindly note that the results in the paper are the average over multiple splits. If you don't have the data for all the splits, you can find them in the following link:
https://uni-bonn.sciebo.de/s/XQb7bnhnflSJdAE
The default parameters should be enough to reproduce the results on Breakfast with the RNN (+/- noise). For the CNN and 50salads, you might need to change some of the hyper-parameters as described in the paper.

I hope this helps.

Best,
Yazan

@RomeroBarata

Hi Yazan,

I'm aware that the results are the average over the four splits on the Breakfast dataset, so I trained four models (train on data from split 02, 03, and 04, and evaluate on split 01, ...) and averaged their results. However, my average results are as I mentioned above, and I've trained all the RNN models with the default parameters. The experiments I'm doing are using the ground-truth data as input to the model, both during training and during evaluation.

Kind regards,
Romero

@yabufarha
Owner

The default parameters work well in my case. It seems that @giulio93 got comparable results (except for the 20% observation and 10% prediction). I'm not sure what the problem is in your case. Can you maybe check the convergence on the training set and see if you need to train for more epochs?

@giulio93
Author

giulio93 commented Nov 21, 2019

Hi,
I proceeded as follows:
I trained and tested a model for each split, then took the mean over all 4 splits.
These are my results:
image

If you do not achieve similar results, please consider running the prediction on the test set more than once, since in the RNN training procedure random cuts are taken between actions in order to create training examples; averaging over several runs (see the sketch below) reduces this variance.
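For example, the averaging looks roughly like this (just a sketch; the accuracy values are placeholders, not my actual results):

```python
# Sketch: average an accuracy metric over the 4 Breakfast splits, with several
# prediction runs per split to smooth out the variance from the random cuts.
import statistics

# Hypothetical per-run accuracies: scores[split] = repeated runs on that
# split's test set (placeholder values, not real results).
scores = {
    1: [0.181, 0.176, 0.184],
    2: [0.172, 0.175, 0.170],
    3: [0.168, 0.171, 0.169],
    4: [0.179, 0.182, 0.177],
}

per_split_mean = {s: statistics.mean(runs) for s, runs in scores.items()}
overall = statistics.mean(per_split_mean.values())
print(per_split_mean)
print(f"mean over the 4 splits: {overall:.3f}")
```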

@RomeroBarata this is not clear to me:
"(train on data from split 02, 03, and 04, and evaluate on split 01, ...)" — why are you training on splits 2, 3, and 4 and evaluating on 1?
You need to train and test within each split, i.e. (1 to 1, 2 to 2, 3 to 3, 4 to 4).

I'm collecting experiments in a fork of this project, feel free to explore:
https://github.com/giulio93/anticipating-activities

@RomeroBarata

Hi guys,

No worries, I'll check everything again. Thank you for all the clarifications!

@giulio93, when you evaluate any machine learning model you should never train and test on the same data. Anything seen during training, the model learns well (sometimes too well -> overfitting), and if you test on that same data you will get a very optimistic result that does not hold in practice (when you deploy the model on a truly unseen test set). Thus, the correct way of training and testing is to train models on three splits of the data and test on the remaining one.
Thanks for the link, I'll have a look at your fork!

Kind regards,
Romero

@giulio93
Author

giulio93 commented Nov 22, 2019

@RomeroBarata yeah man, I'd like to think I know the difference between training and evaluation... You need to read the paper carefully. The dataset contains four splits, and each split contains a TRAIN SET and a TEST SET.
This is also known as k-fold cross-validation.
Download the data from the link that Yazan posted here: https://uni-bonn.sciebo.de/s/XQb7bnhnflSJdAE.
Inside you will find two folders, one per dataset, and inside each folder there are files named
Train.SplitX.bundle and Test.SplitX.bundle. So you train a model for each split using its Train and Test files. The train and test files of the same split contain different videos.
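For illustration, the per-split protocol looks roughly like this (just a sketch, assuming each bundle file simply lists one video name per line; the dataset folder name and the training/evaluation steps are placeholders for the project's own scripts):

```python
# Sketch of the per-split protocol: for each split, train on its Train bundle
# and evaluate on its Test bundle. Assumes one video name per line per file.
import os

def read_bundle(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

data_dir = "breakfast"  # hypothetical folder holding the downloaded bundles
for split in range(1, 5):
    train_videos = read_bundle(os.path.join(data_dir, f"Train.Split{split}.bundle"))
    test_videos = read_bundle(os.path.join(data_dir, f"Test.Split{split}.bundle"))
    # ... train the model on train_videos, then predict/evaluate on test_videos
    print(f"split {split}: {len(train_videos)} train / {len(test_videos)} test videos")
```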

In Breakfast, the splits are made as follows:
image

In 50Salads, the splits are made as follows:
image

Hope this helps!

@RomeroBarata

Sorry @giulio93, I didn't mean to lecture about k-fold cross validation, I was just trying to clarify the previous misunderstanding. As you mentioned, even though the splits provided by the author are named split0X.train and split0X.test, they point to the correct files. Anyway, I'll check everything again and rerun the experiments. Thank you guys for the help!
