
the retrieval loss doesn't converge well #11

Open
qq283215389 opened this issue Feb 26, 2019 · 11 comments

@qq283215389

Hello, Luo,
When I pretrain the VSEFCModel, the vse_loss doesn't converge well; it stays around 51.2. Are there mistakes in my experiments? What was your vse_loss when you pretrained the VSEFCModel?

@ruotianluo
Owner

That's very common in the first several epochs. Try training it a little longer, or just restart the training.
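
For context, the vse_loss being discussed is a max-margin contrastive loss between image and caption embeddings. Here is a minimal sketch of the standard formulation (PyTorch; the margin value and hard-negative mining are assumptions in the style of VSE++, not necessarily this repo's exact implementation):

```python
import torch

def vse_loss(im, s, margin=0.2, max_violation=True):
    """Max-margin contrastive loss between image and sentence embeddings.

    im, s: L2-normalized embeddings of matching pairs, shape (batch, dim);
    row i of `im` matches row i of `s`. With max_violation=True this is the
    VSE++ hard-negative variant; the repo's actual loss may differ.
    """
    scores = im @ s.t()                   # (batch, batch) cosine similarities
    pos = scores.diag().view(-1, 1)       # similarity of the true pairs

    cost_s = (margin + scores - pos).clamp(min=0)        # image -> caption
    cost_im = (margin + scores - pos.t()).clamp(min=0)   # caption -> image

    # Mask out the positives on the diagonal so they don't contribute.
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_s = cost_s.masked_fill(mask, 0)
    cost_im = cost_im.masked_fill(mask, 0)

    if max_violation:                     # keep only the hardest negative
        cost_s = cost_s.max(dim=1)[0]
        cost_im = cost_im.max(dim=0)[0]

    return cost_s.sum() + cost_im.sum()
```

Incidentally, under the assumed settings of batch size 128 and margin 0.2, fully collapsed embeddings (every image-caption pair scoring the same) would report exactly 2 × 128 × 0.2 = 51.2 with the hard-negative variant, so a plateau at precisely that value can indicate early embedding collapse rather than slow convergence, which is consistent with restarting the training helping.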

@qq283215389
Author

OK, thanks a lot. What about the other VSE model (VSEAttModel) and the "pair loss", whose results aren't shown in your paper "Discriminability Objective for Training Descriptive Captions" (CVPR 2018)?

@ruotianluo
Owner

Pair loss performs worse, and VSEAttModel gives worse results too.

@qq283215389
Author

Thanks! If the retrieval model performs better (like the one in the paper "Stacked Cross Attention for Image-Text Matching"), can we get a better result for the captioning model?

@ruotianluo
Owner

I think it's very likely.

@qq283215389
Author

Hello, Luo,
Here is my result from pre-training the retrieval model after running "run_fc_con.sh"; there is still a gap from the retrieval result presented in your paper.
Result (R@1, R@5, R@10, median rank, mean rank):
Average i2t Recall: 53.9
Image to text: 29.9, 59.2, 72.6, 4.0, 19.6
Average t2i Recall: 42.3
Text to image: 20.6, 46.5, 59.8, 7.0, 40.8
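
For anyone comparing numbers: here is a sketch of how these metrics fall out of an image-caption similarity matrix. It is simplified to one caption per image (the actual COCO protocol uses five captions per image) and is not this repo's evaluation code:

```python
import numpy as np

def i2t_metrics(sims):
    """R@1/R@5/R@10, median and mean rank from an image-by-caption
    similarity matrix, assuming caption i is the match for image i."""
    n = sims.shape[0]
    ranks = np.empty(n)
    for i in range(n):
        order = np.argsort(sims[i])[::-1]        # caption indices, best first
        ranks[i] = np.where(order == i)[0][0]    # 0-based rank of the match
    r1, r5, r10 = (100.0 * np.mean(ranks < k) for k in (1, 5, 10))
    return r1, r5, r10, np.median(ranks) + 1, ranks.mean() + 1
```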

@ruotianluo
Owner

Did you download my pretrained model? Does it perform better, i.e. match what's reported in the paper?
https://drive.google.com/open?id=1oQ_O-O2KoSQv1xdBPKaIOGt-VW0gS-42
These are my training curves, to give you a hint.

@qq283215389
Author

I might have found the problem: I used a size of 7x7 for the COCO fc features. I think you used 14x14?

@ruotianluo
Owner

The fc feature doesn't have spatial dimensions; it's a vector.
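
To make the distinction concrete, here is an illustrative sketch (not this repo's preprocessing script) of how fc and att features are typically extracted from a ResNet-101. The 7x7 vs. 14x14 sizes apply only to the spatial att features:

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Keep everything up to (and including) the last conv block; drop avgpool + fc.
resnet = models.resnet101(weights=models.ResNet101_Weights.DEFAULT)
backbone = nn.Sequential(*list(resnet.children())[:-2])
backbone.eval()

img = torch.randn(1, 3, 224, 224)          # one preprocessed image
with torch.no_grad():
    att_feat = backbone(img)               # (1, 2048, 7, 7): spatial "att" feature map
    fc_feat = att_feat.mean(dim=(2, 3))    # (1, 2048): pooled "fc" feature, no spatial dims

# A 448x448 input would give a 14x14 att map; the fc feature is 2048-d either way.
```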

@qq283215389
Author

I found that other papers use Karpathy's split for COCO, while your paper uses Rama's split. Are their test data the same? How can you compare your result with the result in self-critical?

@ruotianluo
Owner

The splits are different. The self-critical number is from my own implementation run on Rama's split. We use Rama's split because we need to compare our results to Rama's.
