Results on YouTube #3
The checkpoint in this repo is different from the one used for YouTube-VOS evaluation. However, the provided checkpoint should also give numbers similar to those reported in our paper, with minor degradation (about 1-2 points lower Overall score). For YouTube-VOS, there are some differences compared to DAVIS:
Great! It surprised me that using DAVIS videos for training degrades the performance on YouTube-VOS. Thank you for sharing. I will retest it with points 1 and 2 that you mentioned. Thanks.
@seoungwugoh Can you please tell me how to test the pretrained model on YouTube-VOS? I tried to use the YouTube-VOS dataset instead of DAVIS17; however, I seem to get empty masks as output.
Getting an empty mask seems to be due to bugs in the code.
@seoungwugoh In the case where some objects start to appear in the middle of the video and the current mask is overwritten with the new objects, will the overwritten mask still include the old objects?
@siyueyu Yes, we overwrite only the pixels belonging to the new object. Other pixels remain the same.
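A minimal sketch of this overwrite step, assuming the masks are kept as integer label maps (0 = background); the function and variable names are illustrative, not the repository's:

```python
import numpy as np

def overwrite_with_new_objects(current_mask, new_object_mask):
    """Merge a newly appearing object into the current prediction.

    current_mask:    (H, W) integer label map predicted so far
    new_object_mask: (H, W) integer label map that is non-zero only where
                     the newly appearing object is annotated

    Only the pixels of the new object are overwritten; pixels of existing
    objects and background are kept as they are.
    """
    merged = current_mask.copy()
    new_pixels = new_object_mask > 0
    merged[new_pixels] = new_object_mask[new_pixels]
    return merged
```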
@seoungwugoh I can now get the correct masks as predictions; however, I keep getting an out-of-memory error when I test on YouTube-VOS. I am using all the validation frames instead of every 5 frames. The GPU I am using is a GTX 1080. Do you recommend any particular configuration for YouTube-VOS? I even played with the mem_every parameter, but I still get out-of-memory issues.
For YouTube-VOS, some videos are quite long (> 150 frames), which often causes OOM. GPU memory is mostly consumed by a large matrix inner product during memory reading. We used a V100 GPU, which has 16 GB of memory, and setting a larger mem_every parameter for such videos works well. To drastically reduce memory consumption, you can consider using no intermediate memory frames (an infinite mem_every). Another extreme solution is to move that inner-product part to the CPU, if you can afford the additional computation time.
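As a hedged illustration of that last suggestion, here is a sketch of a memory read where only the large affinity inner product is computed on the CPU; the tensor shapes and names are assumptions for illustration, not the repository's actual code:

```python
import torch

def memory_read_on_cpu(mem_keys, mem_values, query_keys):
    """Illustrative memory read with the large affinity matrix on the CPU.

    mem_keys:    (C_k, T*H*W) keys of the memory frames
    mem_values:  (C_v, T*H*W) values of the memory frames
    query_keys:  (C_k, H*W)   keys of the current (query) frame

    The (T*H*W, H*W) affinity matrix dominates GPU memory, so only this
    step is computed on the CPU; the result is moved back afterwards.
    """
    mk, mv, qk = mem_keys.cpu(), mem_values.cpu(), query_keys.cpu()

    affinity = mk.t() @ qk                      # (T*H*W, H*W) inner product
    affinity = torch.softmax(affinity, dim=0)   # normalise over memory locations
    read = mv @ affinity                        # (C_v, H*W) retrieved values

    return read.to(query_keys.device)
```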
@seoungwugoh Thanks for the suggestion. I ran it without any intermediate frames and could obtain results. However, I see that it doesn't consider the masks of objects that start to appear after the first frame; I get no predictions for those objects. Looking at your suggestion above in this thread, you mention that "Some objects start to appear in the middle of video. In that case, we overwrite current mask with the new objects." I have already modified dataset.py. Is this already implemented in the uploaded code? If not, can you point out where we need to incorporate those changes? Thank you.
@seoungwugoh Also, to add to what I mentioned above, I get a score of 69.4 (compared to 78.4 in the paper) on the YouTube-VOS validation set using the pre-trained model. Since I used no intermediate memory frames, I guess by default it takes only the first and the previous frame.
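For reference, a small sketch of how memory-frame indices could be chosen as a function of mem_every (names and behaviour are illustrative assumptions, not the repository's code); with no intermediate memory frames, only the first and the previous frame remain:

```python
def select_memory_frames(t, mem_every):
    """Indices of frames kept in memory when segmenting frame t.

    The first frame (with its given mask) and the immediately previous
    frame are always kept; intermediate frames are added every
    `mem_every` frames. Passing mem_every=None mimics an infinite
    interval, i.e. only the first and the previous frame are used.
    """
    indices = {0, t - 1}
    if mem_every is not None:
        indices.update(range(0, t, mem_every))
    return sorted(indices)

print(select_memory_frames(12, 5))     # [0, 5, 10, 11]
print(select_memory_frames(12, None))  # [0, 11]
```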
@seoungwugoh Hi, I'm trying to finetune your model.
@sourabhswain The code in this repository does not contain functionality for evaluating on YouTube-VOS; you will need to implement it yourself, but it should not be too difficult. To get a number similar to the paper, you should estimate masks for objects that start to appear in the middle of the video. @npmhung We turned off BatchNorm for both pre-training and main training. In other words, we use the mean and variance learned from ImageNet. This can be done simply by setting model.eval() during training.
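A minimal sketch of freezing the BatchNorm layers while the rest of the network trains, in the spirit of the model.eval() trick mentioned above; keeping only the BatchNorm modules in eval mode (and optionally freezing their affine parameters) is an assumption about the common way to do this, not necessarily the exact code used here:

```python
import torch.nn as nn

def freeze_batchnorm(model):
    """Keep BatchNorm layers in eval mode so their running mean/variance
    (inherited from the ImageNet-pretrained backbone) are never updated,
    while the rest of the model still trains normally."""
    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d):
            module.eval()                    # use stored running statistics
            for p in module.parameters():
                p.requires_grad = False      # also freeze the affine parameters

# model.train()            # enables training mode for the whole network
# freeze_batchnorm(model)  # then switch the BatchNorm layers back to eval mode
```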
@seoungwugoh Is it possible for you to also provide the checkpoint used for YouTube-VOS evaluation (I'm OK without the code)? Thanks a lot!
@seoungwugoh I made the changes specific to YouTube-VOS and now I get a score of 74.17. It's still a bit off from the score mentioned in the paper (78.4). Could that be just due to the different pretrained model you uploaded here, or do you use different hyperparameters for YouTube-VOS?
@sourabhswain It would be due to the different weights. The number in the paper (78.4) is measured using weights trained for YouTube-VOS. Unfortunately, we have no plan to upload the weights for YouTube-VOS testing.
Hi @seoungwugoh, you mentioned that when objects start to appear in the middle of the video, you overwrite the current mask, so only the previous mask is affected and the first-frame mask remains unchanged. However, the objects that appear later cannot refer to the first frame for a ground-truth mask (since the "first frame" for them is not the first frame of the video). Can this hurt the performance, or do you have any workaround for this? Thank you!
Hi, I tested your released code and model on YouTube-VOS, but I cannot get the accuracy reported in the paper. Did you test this code on YouTube-VOS?