Poor finetuning results #9
Hi, @wj-son. I would like to ask some questions for more details:
Both single-GPU and multi-GPU configurations gave poor finetuning performance. All the code is the same except for the dataset path. In my case, I pretrained the model with shuffled residual frames as view_2 and then finetuned the model with residual frames.
Then I want to ask:
If all the mentioned things are normal, then maybe we need to discuss further solutions.
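As background for the view_2 setup quoted above, here is a minimal sketch of how residual frames and a shuffled variant could be computed from a clip tensor. The (C, T, H, W) layout and the shuffling strategy are assumptions for illustration, not necessarily the repo's exact preprocessing.

```python
import torch

def residual_frames(clip: torch.Tensor) -> torch.Tensor:
    """Frame differences of a clip shaped (C, T, H, W); output has T-1 steps."""
    return clip[:, 1:] - clip[:, :-1]

def shuffled_residual_frames(clip: torch.Tensor) -> torch.Tensor:
    """One possible reading of 'shuffled residual frames': shuffle the
    temporal order first, then take frame differences."""
    perm = torch.randperm(clip.shape[1])
    return residual_frames(clip[:, perm])

clip = torch.rand(3, 16, 112, 112)       # dummy 16-frame RGB clip
view_2 = shuffled_residual_frames(clip)  # pretraining view described above
finetune_input = residual_frames(clip)   # residual frames used for finetuning
```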
1-result
(xmar) prml@ai02:~/wj/source_code/IIC$ python retrieve_clips.py --ckpt=./ckpt/intraneg_shuffle_r3d_res_0211/ckpt_epoch_240.pth --dataset=ucf101 --merge=True
This result shows somewhat different performance compared to the paper.
Oh, that is weird. For the finetuning part, I think that if the accuracy is high on the validation as well as the training set, it should not be only 6% on the testing set. If you use the test mode of
For the video retrieval part, the performance is also low, which is also strange. I have tried different random seeds, but my results are around the reported one; 23.8% is about 10% lower than normal performance. If everything remains the same, with the code, the data, and the training settings, I cannot imagine why you got poor performance. I am running one experiment with
Hi, @wj-son I trained the model again recently with
which are higher than those reported in our paper. And for fine-tuning on the UCF101 dataset, the result is
I still do not know why the performance in your case is worse.
Sorry for the late reply, and thank you for your meaningful and laborious work.
Train: [240/240][590/596] BT 0.809 (0.816) DT 0.000 (0.004) loss 1.284 (1.066) 1_p 54.761 (53.873) 2_p 54.624 (53.557)
{'cl': 16, 'model': 'r3d', 'id': 'r3d', 'dataset': 'ucf101', 'feature_dir': 'features/ucf101/r3d', 'gpu': 0, 'ckpt': 'ckpt/intraneg_repeat_r3d_res_0216/ckpt_epoch_240.pth', 'bs': 8, 'workers': 8, 'extract': True, 'modality': 'res', 'merge': True}
Train epoch: [150/150][ 548/ 547] Loss 4.0295 (0.1402) Acc 0.000 (0.974) lr: 0.001
Thank you for your help. Sorry to bother you.
This time, your reported results are better than the previous ones. However, the performance is still very low. Because you mentioned that you used the same code as the repo, I would like to ask:
If 1 and 2 are fine, then 3. What kind of experimental environment do you use? Here are the detailed settings in my case. If possible, you can try my settings.
Train epoch: [150/150][ 548/ 547] Loss 4.6004 (0.0370) Acc 0.000 (0.996) lr: 0.001
That is weird. You mentioned
Because I have fixed all possible random seeds, the performance should be the same in the same experimental environment. Therefore, if "the same" means exactly the same results, then the model is not correctly loaded, so please check the model loading part again carefully. To eliminate the possibility that the SSL pre-trained model is the problem, you can also try to load my provided model weights for finetuning.
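To make that check concrete, here is a hedged sketch of verifying that the SSL checkpoint is actually loaded; torchvision's r3d_18 is used as a stand-in backbone and the checkpoint path is hypothetical, so the repo's own model and key names may differ.

```python
import torch
from torchvision.models.video import r3d_18  # stand-in for the repo's R3D backbone

model = r3d_18(num_classes=101)

ckpt = torch.load('ckpt/ssl_pretrained.pth', map_location='cpu')  # hypothetical path
state_dict = ckpt.get('model', ckpt)  # some checkpoints nest the weights under a key

# The parameter checksum should change if any weights were actually overwritten.
before = sum(p.detach().abs().sum().item() for p in model.parameters())
result = model.load_state_dict(state_dict, strict=False)
after = sum(p.detach().abs().sum().item() for p in model.parameters())

print('param checksum before/after:', before, after)
print('missing keys:', result.missing_keys)        # e.g. only the new classifier head
print('unexpected keys:', result.unexpected_keys)  # e.g. keys with a "module." prefix
```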
I do the same thing (#2) for loading the pretrained checkpoint, but the finetuning training result was somewhat different: at the beginning of training the loss decreased a little more slowly, but the final loss and accuracy were almost the same as before.
Train epoch: [150/150][ 548/ 547] Loss 4.0295 (0.1402) Acc 0.000 (0.974) lr: 0.001
In addition to the above, I also wonder which epoch of the pretrained (SSL) weights you have been using.
This is weird. Last time I ran your code, I got the same results as the paper. However, this time I'm getting the same poor results. I've tried rolling your code back a few commits, but it still does not work. The retrieval results are as below, all with default settings.
@wj-son Hi, sorry for the late reply. According to the logs, training and validation look fine. Have you tried using the training dataset for testing, to check whether the poor performance is caused by the data? Even with overfitting, performance on the testing dataset should not be that poor compared to the training dataset.
@wuchlei Thank you for your report. If you did not change the code and all settings are the same, it is really strange to get such different results. For retrieval, 27.6% top-1 is not that bad, and it might be caused by using RGB as the retrieval modality. Anyway, I will run the experiment again to check, and I will report my newest results here later.
@BestJuly I used RGB and Res for retrieval (the default settings). So I think this is not the reason. |
So I've confirmed the problem is caused by the PyTorch version. Everything is fine with PyTorch 1.3.
Hi, @wuchlei. Could you provide the PyTorch version where the performance is poor? Then I can test it in my experimental environment, and I can also add some information to the readme.md to mention this. Thank you in advance.
@BestJuly 1.7.0 |
@BestJuly 1.7.0, the same version as in my environment.
Regarding this finding, I am trying to set up the environment (CUDA, cuDNN, and so on) with Docker and reproduce the results.
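Since the issue seems tied to the framework version, a small snippet like the one below (standard PyTorch attributes only) can be used to record exactly which versions each environment is running:

```python
import torch

print('torch :', torch.__version__)
print('cuda  :', torch.version.cuda)
print('cudnn :', torch.backends.cudnn.version())
print('gpu   :', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'cpu only')
```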
Hi @BestJuly, have you found the problem? |
Hi, @wuchlei @wj-son I used the same code and trained with torch 1.3 and torch 1.7.
The results are similar to what you have found. I want to use
Therefore, for the data and testing part, I think there is no problem, and the problem lies in the training part. I used
So far, I have not explored the deeper differences. Therefore, for the current findings, I will keep using
I ran four scratch-training experiments (torch 1.3 vs. torch 1.7, SGD & Adam) for video recognition. The results are shown in the table below:
* Note that the learning rate schedule is the same (initial lr = 0.001, MultiStepLR milestones = [40, 80]), but we did not tune the training settings for the best performance.
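For reference, a sketch of the optimizer and schedule described in the note above (initial lr 0.001, MultiStepLR milestones [40, 80]); the backbone, momentum, and the 150-epoch loop are placeholders rather than the repo's exact values.

```python
import torch
from torchvision.models.video import r3d_18  # stand-in for the repo's R3D backbone

model = r3d_18(num_classes=101)

# The two optimizer variants compared in the scratch-training experiments.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)  # or:
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40, 80], gamma=0.1)

for epoch in range(150):
    # ... one training epoch over the UCF101 loader goes here ...
    scheduler.step()
```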
I should have fixed all the random seeds in the code, and I also tested
Another finding is that I used
to generate input data manually. However, a weird thing is that for both PyTorch 1.3 and 1.7, the input data are
However, after several epochs, the performance starts to differ. I guess there are some differences deep in the
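For the seed-fixing mentioned above, a typical way to pin the usual sources of randomness looks like the sketch below; the exact calls in the repo may differ, and the seed value here is arbitrary.

```python
import random
import numpy as np
import torch

def set_seed(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for reproducible cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(0)
```

Note that even with all seeds fixed, PyTorch does not guarantee bit-identical results across different releases or hardware, which is consistent with identical inputs diverging only after several epochs of training.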
Thanks for your great work.
I finetuned the pretrained model on UCF101 train split 1, but the evaluation results show only about 6.5% accuracy.
I thought that was caused by multi-GPU training and the procedure for loading checkpoints, but despite the change, the result was the same.
I only changed the original code for the dataset path and wrapped the model with torch.nn.DataParallel().
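On the DataParallel point: a common pitfall is the `module.` prefix that `torch.nn.DataParallel` adds to parameter names, which can make `load_state_dict(strict=False)` silently load nothing. A hedged sketch of aligning the key names, with a hypothetical checkpoint path and torchvision's r3d_18 as a stand-in backbone:

```python
import torch
from torchvision.models.video import r3d_18  # stand-in for the repo's R3D backbone

model = torch.nn.DataParallel(r3d_18(num_classes=101))

ckpt = torch.load('ckpt/ckpt_epoch_240.pth', map_location='cpu')  # hypothetical path
state_dict = ckpt.get('model', ckpt)

# If the checkpoint was saved from a plain (non-DataParallel) model, add the
# 'module.' prefix so the keys match; strip it instead in the opposite case.
if not next(iter(state_dict)).startswith('module.'):
    state_dict = {'module.' + k: v for k, v in state_dict.items()}

print(model.load_state_dict(state_dict, strict=False))
```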