It is taking a while to run, so I'll probably check the results sometime during the weekend.
The initial results of this approach are fairly poor. I think the reason is that many of the RefCOCO text prompts involve spatial relations, such as "the man to the left of the ...". CLIP cannot contextualize a local region within the rest of the image, so it has little to go on when a prompt depends on where an object sits relative to others.
Hello, I also use the CLIP model to classify the masks from SAM, and I find the performance is poor as well. Increasing the input image size of the CLIP model may improve the recognition accuracy for each mask.
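For anyone trying the same thing, here is a minimal sketch of the two steps around the CLIP call: cropping each SAM mask to a padded bounding box, and ranking masks against a text prompt by cosine similarity. This is not the exact code used by anyone in this thread. The function names `crop_mask_region` and `rank_masks`, the `pad_ratio` parameter, and the background-blanking option are all hypothetical, and the embeddings are assumed to come from whatever CLIP image and text encoders you already use (the crop would be resized by that model's own preprocessing).

```python
import numpy as np


def crop_mask_region(image, mask, pad_ratio=0.1, blank_background=True):
    """Crop `image` (H, W, 3) to the padded bounding box of boolean `mask` (H, W).

    `pad_ratio` (hypothetical knob) keeps a margin of surrounding pixels,
    expressed as a fraction of the box height and width.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask is empty")

    # Tight bounding box around the mask (end indices are exclusive).
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1

    # Grow the box by the padding, clamped to the image borders.
    pad_y = int(round((y1 - y0) * pad_ratio))
    pad_x = int(round((x1 - x0) * pad_ratio))
    y0, y1 = max(0, y0 - pad_y), min(image.shape[0], y1 + pad_y)
    x0, x1 = max(0, x0 - pad_x), min(image.shape[1], x1 + pad_x)

    crop = image[y0:y1, x0:x1].copy()
    if blank_background:
        # Zero out pixels that fall outside the mask inside the crop.
        crop[~mask[y0:y1, x0:x1]] = 0
    return crop


def rank_masks(mask_embeddings, text_embedding):
    """Rank masks by cosine similarity to one text embedding, best first.

    `mask_embeddings` is (N, D) and `text_embedding` is (D,); both are
    assumed to be outputs of a CLIP image/text encoder pair.
    """
    m = mask_embeddings / np.linalg.norm(mask_embeddings, axis=1, keepdims=True)
    t = text_embedding / np.linalg.norm(text_embedding)
    scores = m @ t
    return np.argsort(-scores), scores
```

Note that a crop like this still discards most of the scene, so it does not address the spatial-relation problem mentioned above; a larger `pad_ratio` only keeps a bit more of the surroundings.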