
Results on Breakfast Dataset #5

Closed
giulio93 opened this issue Sep 27, 2019 · 10 comments

@giulio93

giulio93 commented Sep 27, 2019

Hi, thank you so much for sharing this useful project!
I ran the code as-is, and I obtained these results:
image

I have a couple of questions:

  1. How many epochs did you use to train the RNN and CNN models?
  2. I tried training the CNN model for 20 epochs, but the results were poor, so I increased it to 300 epochs. Was it the same for you?
  3. The models are trained only with ground truth, and use GT or decoded output data for prediction. Did you try training the models on the decoded output directly?

I also tried using the weakly supervised decoded output from the new Viterbi decoder paper. I filled the two folders obs0.2 and obs0.3 with the new decoded data for the respective observation percentages.
I obtained better prediction results with the decoded output data, even though I trained the decoder in weakly supervised rather than fully supervised mode. Does that make sense to you?

image

@yabufarha
Owner

Hi,
Happy that you found the code useful.

1-2. We trained the CNN model for 20 epochs, but the batch size was 64. The default values should work well for the RNN model.

  3. We trained the model only with ground-truth labels. Nevertheless, I don't expect much difference if you train with the decoded outputs, since these outputs are at around 100% accuracy for the training set.

Regarding the results with nn-viterbi, I think you should look at the accuracy of the observed part (only the decoded output). If the accuracy of the observations is higher for nn-viterbi, then I would expect better results for the anticipation; otherwise, I would expect the anticipation accuracy to be lower.
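As a rough illustration of what I mean (just a sketch, not code from this repository; the per-frame labels and action names are made up), you could measure the frame-wise accuracy of the observed prefix separately from the anticipated remainder:

```python
# Minimal sketch: compare frame-wise accuracy of the observed part vs. the
# anticipated part for one video. Labels are hypothetical per-frame strings.

def framewise_accuracy(predicted, ground_truth):
    """Fraction of frames whose predicted label matches the ground truth."""
    assert len(predicted) == len(ground_truth)
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return correct / len(ground_truth)

def observed_vs_anticipated_accuracy(predicted, ground_truth, obs_ratio=0.2):
    """Accuracy over the observed prefix and over the anticipated remainder."""
    n_obs = int(len(ground_truth) * obs_ratio)
    obs_acc = framewise_accuracy(predicted[:n_obs], ground_truth[:n_obs])
    ant_acc = framewise_accuracy(predicted[n_obs:], ground_truth[n_obs:])
    return obs_acc, ant_acc

# Hypothetical per-frame labels for a 100-frame video:
pred = ["pour_milk"] * 30 + ["stir_milk"] * 70
gt = ["pour_milk"] * 25 + ["stir_milk"] * 75
print(observed_vs_anticipated_accuracy(pred, gt, obs_ratio=0.2))
```

If the observed-part accuracy turns out higher with the nn-viterbi decoding, better anticipation numbers would not be surprising.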

@giulio93
Author

Thanks! batch_size=64 works well!

@RomeroBarata

Hi Yazan,

Sorry for reopening this issue, but I'm having a similar problem in reproducing the results for the RNN model. I trained the RNN models using the default values in the source code, as you recommend above, but my performance for the 20% observation is roughly 5% below the results reported and for the 30% observation is something around 2% below. Is there anything I might be doing wrong or missing while training the RNN model?

Kind regards,
Romero

@yabufarha
Owner

Hi Romero,

Kindly note that the results in the paper are the average over multiple splits. If you don't have the data for all the splits, you can find them in the following link:
https://uni-bonn.sciebo.de/s/XQb7bnhnflSJdAE
The default parameters should be enough to reproduce the results on Breakfast with the RNN (+/- noise). For the CNN and 50salads, you might need to change some of the hyper-parameters as described in the paper.

I hope this helps.

Best,
Yazan

@RomeroBarata

Hi Yazan,

I'm aware that the results are the average over the four splits on the Breakfast dataset, so I trained four models (train on data from split 02, 03, and 04, and evaluate on split 01, ...) and averaged their results. However, my average results are as I mentioned above, and I've trained all the RNN models with the default parameters. The experiments I'm doing are using the ground-truth data as input to the model, both during training and during evaluation.

Kind regards,
Romero

@yabufarha
Owner

The default parameters work well in my case. It seems that @giulio93 got comparable results (except for the 20% observation and 10% prediction). I'm not sure what the problem is in your case. Can you maybe check the convergence on the training set and see if you need to train for more epochs?

@giulio93
Author

giulio93 commented Nov 21, 2019

Hi,
I proceeded as follows:
I trained and tested a model for each split, then took the mean over all 4 splits.
These are my results:
image

If you do not achieve similar results, please consider running the prediction on the test set more than once, since in the RNN training procedure random cuts are taken between actions in order to create training examples; averaging over several runs (see the sketch below) reduces this variance.
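For example, the averaging looks roughly like this (just a sketch; the accuracy values are placeholders, not my actual results):

```python
# Sketch: average an accuracy metric over the 4 Breakfast splits, with several
# prediction runs per split to smooth out the variance from the random cuts.
import statistics

# Hypothetical per-run accuracies: scores[split] = repeated runs on that
# split's test set (placeholder values, not real results).
scores = {
    1: [0.181, 0.176, 0.184],
    2: [0.172, 0.175, 0.170],
    3: [0.168, 0.171, 0.169],
    4: [0.179, 0.182, 0.177],
}

per_split_mean = {s: statistics.mean(runs) for s, runs in scores.items()}
overall = statistics.mean(per_split_mean.values())
print(per_split_mean)
print(f"mean over the 4 splits: {overall:.3f}")
```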

@RomeroBarata this is not clear to me:
"(train on data from split 02, 03, and 04, and evaluate on split 01, ...)" — why are you training on splits 2, 3, and 4 and evaluating on 1?
You need to train and test within each split, i.e. (1 to 1, 2 to 2, 3 to 3, 4 to 4).

I'm collecting experiments in a fork of this project, feel free to explore:
https://github.com/giulio93/anticipating-activities

@RomeroBarata

Hi guys,

No worries, I'll check everything again. Thank you for all the clarifications!

@giulio93, when you evaluate any machine learning model you should never train and test on the same data. Anything seen during training, the model learns well (sometimes too well -> overfitting), and if you test on that same data you will get a very optimistic result that does not hold in practice (when you deploy the model on a truly unseen test set). Thus, the correct way of training and testing is to train models on three splits of the data and test on the remaining one.
Thanks for the link, I'll have a look at your fork!

Kind regards,
Romero

@giulio93
Author

giulio93 commented Nov 22, 2019

@RomeroBarata yeah man, I'd like to think I know the difference between training and evaluation... You need to read the paper carefully. The dataset contains four splits, and each split contains a TRAIN SET and a TEST SET.
This is also known as k-fold cross-validation.
Download the data from the link that Yazan posted here: https://uni-bonn.sciebo.de/s/XQb7bnhnflSJdAE.
Inside you will find two folders, one per dataset, and inside each folder there are files named
Train.SplitX.bundle and Test.SplitX.bundle. So you train a model for each split using its Train and Test files. The train and test files of the same split contain different videos.
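For illustration, the per-split protocol looks roughly like this (just a sketch, assuming each bundle file simply lists one video name per line; the dataset folder name and the training/evaluation steps are placeholders for the project's own scripts):

```python
# Sketch of the per-split protocol: for each split, train on its Train bundle
# and evaluate on its Test bundle. Assumes one video name per line per file.
import os

def read_bundle(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

data_dir = "breakfast"  # hypothetical folder holding the downloaded bundles
for split in range(1, 5):
    train_videos = read_bundle(os.path.join(data_dir, f"Train.Split{split}.bundle"))
    test_videos = read_bundle(os.path.join(data_dir, f"Test.Split{split}.bundle"))
    # ... train the model on train_videos, then predict/evaluate on test_videos
    print(f"split {split}: {len(train_videos)} train / {len(test_videos)} test videos")
```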

In Breakfast, the splits are made as follows:
image

In 50Salads, the splits are made as follows:
image

Hope this helps!

@RomeroBarata

Sorry @giulio93, I didn't mean to lecture about k-fold cross validation, I was just trying to clarify the previous misunderstanding. As you mentioned, even though the splits provided by the author are named split0X.train and split0X.test, they point to the correct files. Anyway, I'll check everything again and rerun the experiments. Thank you guys for the help!
