
Benchmarks #10

Open
ruotianluo opened this issue Aug 4, 2017 · 71 comments

@ruotianluo
Owner

ruotianluo commented Aug 4, 2017

Cross entropy loss (CIDEr score on the validation set, without beam search, after 25 epochs):
fc 0.92
att2in 0.95
att2in2 0.99
topdown 1.01

(Self-critical training code is in https://github.com/ruotianluo/self-critical.pytorch)
Self-critical training (self-critical started after 25 epochs; suggestion: don't start self-critical training too late):
att2in 1.12
topdown 1.12

Test split (beam size 5):
cross entropy:
topdown: 1.07

self-critical:
topdown:
Bleu_1: 0.779 Bleu_2: 0.615 Bleu_3: 0.467 Bleu_4: 0.347 METEOR: 0.269 ROUGE_L: 0.561 CIDEr: 1.143
att2in2:
Bleu_1: 0.777 Bleu_2: 0.613 Bleu_3: 0.465 Bleu_4: 0.347 METEOR: 0.267 ROUGE_L: 0.560 CIDEr: 1.156
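
For reference, a minimal sketch of the kind of evaluation command that produces test-split numbers like these, assuming the pretrained model and infos files from the README and the default preprocessed COCO data (the paths and image count below are placeholders, not the exact settings used):

python eval.py --model model-best.pth --infos_path infos-best.pkl --num_images 5000 --split test --language_eval 1 --beam_size 5 --dump_images 0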

@SJTUzhanglj

Is there any code or options showing how to train any of these models (topdown, etc.) with the self-critical algorithm? @ruotianluo

@ruotianluo
Owner Author

It's in my other repository: https://github.com/ruotianluo/self-critical.pytorch
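
Roughly, self-critical training there is a two-stage recipe: train with cross entropy first, then restart from the best checkpoint with the RL objective switched on. A sketch of the second stage is below; the option names and values are assumptions based on that repo's README and opts.py, so verify them there:

python train.py --id td_rl --caption_model topdown --start_from log_td --checkpoint_path log_td_rl --learning_rate 5e-5 --self_critical_after 25 --language_eval 1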

@miracle24

Did you fine-tune the CNN when training the model with cross entropy loss?

@ruotianluo
Owner Author

No.

@miracle24

Wow, that's surprising. I cannot reach that high a score without fine-tuning when training my own captioning model under cross entropy loss. Most papers I have read fine-tune the CNN when training with cross entropy loss. Are there any tips for training with cross entropy?

@ruotianluo
Owner Author

Fine-tuning is actually worse. It's about how the features are extracted; check the Self-critical Sequence Training paper.

@miracle24

I think they mean they did not do fine-tuning when training the model under the RL loss, but they did not mention whether they fine-tuned the CNN when training under cross entropy loss.

@miracle24

I fine-tuned the CNN under cross entropy loss, as in neuraltalk2 (Lua version), and got CIDEr 0.91 on the validation set without beam search. Then I trained the self-critical model without fine-tuning, starting from the best pretrained model, and finally got a CIDEr score close to the result in the self-critical paper.

@ruotianluo
Owner Author

They didn't fine-tune in either phase. And fine-tuning may not work as well for attention-based models.

@miracle24

I did not train the attention-based model, but I will try. Thank you for your code; I will start learning PyTorch with it.

@ahkarami

ahkarami commented Oct 7, 2017

Dear @ruotianluo,
Thank you for your fantastic code. Would you please tell me all of the parameters you used to run train.py? (In fact, I followed the guidance in the README, but when I tested the trained model I got the same result, i.e. the same caption, for all of my different test images.) It is worth noting that I used --language_eval 0; could that parameter have caused these results?

@ruotianluo
Owner Author

Can you try downloading the pretrained model and evaluating it on your test images? That would help me narrow down the problem.

@ahkarami

ahkarami commented Oct 7, 2017

Yes, I can download the pre-trained models and use them. The results from the pre-trained models were good; however, the results from the models I trained myself were the same for all images. Something seems to be wrong with my training parameters, since the trained model produces the same caption for every image.

@ruotianluo
Owner Author

You should be able to reproduce my result by following my instructions, so this is really weird.
Anyway, which options are unclear to you (most of the options are explained in opts.py)?

@ahkarami

ahkarami commented Oct 8, 2017

Thank you very much for your help. The problem has been solved. In fact, I had trained your code on another, synthetic data set, and that is where the error occurred. When I used your code on the MS-COCO data set, training had no problems.
As another question, would you please tell me appropriate values for the training parameters? I mean values for parameters such as beam_size, rnn_size, num_layers, rnn_type, learning_rate, learning_rate_decay_every, and scheduled_sampling_start.

@ruotianluo
Owner Author

ruotianluo commented Oct 8, 2017

@ahkarami was the previous problem related to my code?
I think good values vary from dataset to dataset. Beam size could be 5. The numbers I use are the same as in the README.
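
To make that concrete, here is a sketch of a cross-entropy training command with those options written out. The values are only illustrative defaults along the lines of opts.py and the README, not tuned recommendations, and the data paths are placeholders:

python train.py --id fc --caption_model fc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --rnn_size 512 --num_layers 1 --rnn_type lstm --learning_rate 5e-4 --learning_rate_decay_every 3 --scheduled_sampling_start 0 --batch_size 10 --language_eval 1 --val_images_use 5000

(beam_size mainly matters at evaluation time; the periodic validation during training typically uses greedy decoding.)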

@ahkarami

ahkarami commented Oct 8, 2017

Dear @ruotianluo,
No, the previous problem was related to my data set; your code is correct. In fact, my data set contains many repeated words, and the sentence lengths vary from ~15 up to 90 words. I changed the prepro_labels.py parameters to --max_length 50 and --word_count_threshold 2; after about 40 epochs the produced captions are no longer identical for every image, but the results are still poor. I think my parameters for training and label pre-processing are still not appropriate.

@xyy19920105

Hi @ruotianluo,
Thank you for your code and benchmarks. Did you test the adaptive attention model with your code? Could you post the adaptive attention results?
Thank you again.

@ruotianluo
Owner Author

Actually no. I didn't spend much time on that model.

@xyy19920105

Thanks for your reply.
Do you think the adaptive attention model is not good enough as a baseline?

@ruotianluo
Owner Author

It's good; I just couldn't get it to work well.

@dmitriy-serdyuk
Contributor

Could you clarify which features are used for the results above? ResNet-152? And does fc stand for ShowTell?

@ruotianluo
Owner Author

@dmitriy-serdyuk It's using ResNet-101, and FC stands for the FC model in the Self-critical Sequence Training paper, which can be regarded as a variant of ShowTell.

@chynphh

chynphh commented Mar 1, 2018

Thank you for your fantastic code. I am a beginner, and it has helped me a lot.
I have a question about the LSTMCore class in FCModel.py. Why don't you use the official LSTM module and run it step by step, or use LSTMCell with a dropout layer added on top? Is there any difference between your code and those?

@ruotianluo
Owner Author

@chynphh

chynphh commented Mar 1, 2018

OK, I got it. But why did you make this change? Is there any paper or research about this?

@ruotianluo
Owner Author

Self-critical Sequence Training for Image Captioning
https://arxiv.org/abs/1612.00563

@chynphh

chynphh commented Mar 1, 2018

Thank you very much!

@YuanEZhou

YuanEZhou commented Oct 24, 2018

opt.id = 'topdown'
opt.caption_model = 'topdown'
opt.rnn_size = 1000
opt.input_encoding_size = 1000

opt.batch_size = 100
Other configurations follow this repository.

Cross_entropy loss:
(results screenshot: ce_wo_constrain)

Cross_entropy + self-critical: slightly better than the result reported in the original paper.
(results screenshot: ce sc argmax)

@jamiechoi1995

(quoting @YuanEZhou's topdown configuration and results above)

@YuanEZhou which features did you use, the default ResNet-101 features or the bottom-up features?

@YuanEZhou

bottom up feature

@jamiechoi1995

bottom up feature

@YuanEZhou may I ask how you used these features? I have a similar question in this issue: ruotianluo/self-critical.pytorch#66

Did you modify the code to incorporate bounding box information, or did you just use the default options?

@YuanEZhou

@jamiechoi1995 I use the default options.
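
For anyone else wondering: with the default options, using the bottom-up features generally just amounts to pointing the input directories at the preprocessed bottom-up files instead of the ResNet ones. A sketch along the lines of the configuration quoted above, where the cocobu_* directory names are assumptions based on the self-critical repo's preprocessing scripts:

python train.py --id topdown_bu --caption_model topdown --input_json data/cocotalk.json --input_label_h5 data/cocotalk_label.h5 --input_fc_dir data/cocobu_fc --input_att_dir data/cocobu_att --rnn_size 1000 --input_encoding_size 1000 --batch_size 100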

@jamiechoi1995

Adaptive Attention model
learning rate 1e-4
batch size 32
trained for 100 epochs
I used the code in the self-critical repo.

{'CIDEr': 1.0295328576254532, 'Bleu_4': 0.32367107232015596, 'Bleu_3': 0.4308636494026319, 'Bleu_2': 0.5710839754137301, 'Bleu_1': 0.7375622419883233, 'ROUGE_L': 0.5415854013591195, 'METEOR': 0.2603669044858015, 'SPICE': 0.19360318734522747}
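
In case it helps reproduction, a sketch of the kind of command that matches these hyperparameters; the adaatt model key and the data paths are assumptions here, so check opts.py in the self-critical repo for the exact name:

python train.py --id adaatt --caption_model adaatt --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --learning_rate 1e-4 --batch_size 32 --max_epochs 100 --language_eval 1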

@fawazsammani

fawazsammani commented Mar 11, 2019

@YuanEZhou can you please share the results.json file you got from the coco-caption code, the one that includes all the image IDs with their predictions for the validation images? I urgently need it. Your help is highly appreciated.

@YuanEZhou

Hi @fawazsammani , I am sorry that I have lost the file.

@2033329616

When I use the att2in2 pre-trained model to evaluate on the COCO dataset, the decoder always outputs similar sentences and the metrics are very bad. Why?
(error screenshots: wrong_info4, wrong_info3)

@fawazsammani

fawazsammani commented Mar 15, 2019

@2033329616 Maybe the mistake is in your images. Yesterday I ran the att2in2 model on the COCO Karpathy-split validation images; you can run the output through coco-caption and see that the results are identical to the ones posted. (I've already pre-processed the file to include the image IDs for evaluation purposes, so you can just run the coco-caption code on it directly.)
att2in2_results.zip
Regards

@YuanEZhou

@2033329616 You need to download the pretrained ResNet model from the link in this project.

@2033329616

@fawazsammani @YuanEZhou Thanks for your replies. I downloaded "att2in2_results.zip" and ran the coco metrics code, and it gives a good result. I have also used the pretrained att2in2 model from this project and tested it on the Karpathy test split of COCO, but I can't get the correct result: the output sentences stay the same no matter how I change the image or the fc and att features. I have no idea how to solve this problem.

@akashprakas

Is there a pretrained model in which self-attention was used?

@kakazl

kakazl commented May 24, 2019

(quoting @2033329616's comment above about identical output sentences)

I am facing the same problem. Have you solved it yet?

@fawazsammani

Hi @2033329616 and @kakazl. I'm not sure exactly what the problem is in your case; maybe you used different settings? This is the command I run (in a pytorch-0.4 / Python 2 environment):
python eval.py --model '/data/att2in2/model-best.pth' --infos_path '/data/att2in2/infos_a2i2-best.pkl' --image_folder '/captiondata' --num_images -1 --beam_size 3 --dump_path 1
Make sure you place all the images in the folder 'captiondata', or create a new folder and change the name in the command. Hope that helps.

@sssilence

(quoting @fawazsammani's eval.py command above)

Sorry, when I run:
python eval.py --model 'self_cirtical/att2in2/model-best.pth' --infos_path 'self_cirtical/att2in2/infos_a2i2-best.pkl' --image_folder 'data/coco/images/val2014/' --num_images 10
I always get the error TypeError: 'int' object is not callable, in AttModel.py at line 165, batch_size = fc_feats.size(0).
I don't know why. Thank you!


@fawazsammani

@sssilence are you using Python 2 or 3? I just ran it again and it works. According to your error, your fc_feats is an integer. Are you sure you extracted the features correctly and didn't modify anything in the code?

@sssilence

(quoting @fawazsammani's question above)

Yeah, I used Python 2. I didn't modify anything in the code, and I used ResNet-101 to extract the features. Then I changed one line in eval_utils.py: tmp = [torch.from_numpy(_).cuda() if _ is not None else _ for _ in tmp]. After that I can run python eval.py, but I still can't run python train.py successfully.
Besides, when eval.py finishes, I only get this:
cp "data/coco/images/val2014/COCO_val2014_000000316715.jpg" vis/imgs/img40508.jpg
image 4: a group of traffic lights on a city street
cp "data/coco/images/val2014/COCO_val2014_000000278350.jpg" vis/imgs/img40509.jpg
image 5: a man standing in the water with a frisbee
cp "data/coco/images/val2014/COCO_val2014_000000557573.jpg" vis/imgs/img40510.jpg
image 6: a close up of a flower in a street
evaluating validation preformance... 5/40504 (0.000000)
loss: 0.0
There is nothing in eval_results and there are no scores.

@Sun-WeiZhen

Dear @ruotianluo,
Thank you for your fantastic code. Could you please help me with the following problem? I have downloaded the pretrained models as described in the README, but when I try to evaluate I only get:
usage: eval.py [-h] --model MODEL [--cnn_model CNN_MODEL] --infos_path
INFOS_PATH [--batch_size BATCH_SIZE] [--num_images NUM_IMAGES]
[--language_eval LANGUAGE_EVAL] [--dump_images DUMP_IMAGES]
[--dump_json DUMP_JSON] [--dump_path DUMP_PATH]
[--sample_max SAMPLE_MAX] [--beam_size BEAM_SIZE]
[--temperature TEMPERATURE] [--image_folder IMAGE_FOLDER]
[--image_root IMAGE_ROOT] [--input_fc_dir INPUT_FC_DIR]
[--input_att_dir INPUT_ATT_DIR]
[--input_label_h5 INPUT_LABEL_H5] [--input_json INPUT_JSON]
[--split SPLIT] [--coco_json COCO_JSON] [--id ID]
eval.py: error: unrecognized arguments: python eval.py
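
The "unrecognized arguments: python eval.py" part of that error means the text "python eval.py" was itself passed to the script as extra arguments (for example, the command was typed twice, or run from inside a Python prompt instead of the system shell). A minimal invocation from the shell would look something like this; only --model and --infos_path are required according to the usage message above, and the paths here are placeholders for wherever the downloaded files live:

python eval.py --model pretrained/model-best.pth --infos_path pretrained/infos-best.pkl --image_folder path/to/images --num_images 10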

@AnupKumarGupta

Hi everyone. Thanks and kudos for this great repository. I am just a newbie, and this repo has helped me a lot. I want to reproduce the results of ShowAndTell and ShowAttendAndTell. I provided the path to the model as mle/fc/model-best.pth, but an exception is raised: Exception: Caption model not supported: newfc.

I changed the name of caption_model from newfc to fc, but I encounter an error yet again. Any help will be highly appreciated.

@dmitriy-serdyuk it's using res101. and FC stands for the FC model in self critical sequence training paper which can be regarded as a variant of showtell.

@Mollylulu

(screenshot of the error)
Hello, I downloaded the resnet101 folder and moved the model.pth & infos.pkl files into the directory where eval.py is. When I run the eval command as directed, it reports the error shown in the screenshot above. Could you help me figure out where I'm making a mistake?

@ruotianluo
Owner Author

@Willowlululu I guess you are using Python 3? This repo only supports Python 2. Try self-critical.pytorch instead.

@anuragrpatil

Hi @ruotianluo, thank you for the great repo! I was wondering, is there a pretrained transformer model in the drive link?

@ruotianluo
Owner Author

There is; check out the self-critical.pytorch repo's model zoo.

@anuragrpatil

@ruotianluo Thank you for the quick response! To check my understanding: fc_nsc, fc_rl and att2in2 are from the self-critical paper, and updown is from the Anderson et al. paper. Apologies if I am missing anything here.

(screenshot of the model zoo folders)

@ruotianluo
Owner Author

@ydyrx-ldm

@jamiechoi1995

(quoting the Adaptive Attention results posted above: learning rate 1e-4, batch size 32, 100 epochs, CIDEr ≈ 1.03)

Hi, I also want to use the Adaptive Attention model. What was your training command at that time? Looking forward to your answer.
