image-text pre-trained #7

Tzx11 · 2024-03-11T09:23:57Z

Nice work! I have two questions.
After reading the paper, my understanding is that the prior consists of an image encoder and a text encoder. So do the image-text pre-trained weights contain only the image encoder and text encoder weights?
And how do I load the image-text pre-trained weights into models for other medical downstream tasks?

QtacierP · 2024-03-11T09:56:31Z

In this repository, I have made two sets of pre-trained weights available. The first set contains only the vision-encoder weights, based on the ResNet50 architecture. It is the standard ResNet50 from the TorchVision library, except for the last fully connected (FC) layer. These weights can be downloaded from this link, and they can be used directly for fine-tuning on downstream tasks.
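As a rough illustration of the fine-tuning route described above, here is a minimal sketch of loading a ResNet50 backbone checkpoint into the TorchVision model. It is not taken from the repository: the helper names, the checkpoint path, the optional `state_dict` wrapper, and the `module.` key prefix are all assumptions about how the file might be laid out, so check the actual keys before relying on it.

```python
from collections import OrderedDict


def prepare_backbone_state_dict(state_dict, strip_prefix="", drop_prefixes=("fc.",)):
    """Return a copy of state_dict suitable for a torchvision ResNet50 backbone.

    strip_prefix: hypothetical key prefix (e.g. "module.") to remove if present.
    drop_prefixes: keys starting with any of these are skipped; "fc." drops the
    final classification layer, which the released weights are said to omit.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        if strip_prefix and key.startswith(strip_prefix):
            key = key[len(strip_prefix):]
        if key.startswith(tuple(drop_prefixes)):
            continue
        cleaned[key] = value
    return cleaned


def load_pretrained_resnet50(checkpoint_path, num_classes):
    """Build a ResNet50, load backbone weights, and attach a new FC head.

    checkpoint_path is a placeholder for wherever the downloaded file is saved.
    """
    # Imported lazily so the key-cleaning helper above works without PyTorch.
    import torch
    import torchvision

    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    # Assumption: some checkpoints wrap the weights under a "state_dict" key.
    if isinstance(checkpoint, dict) and "state_dict" in checkpoint:
        checkpoint = checkpoint["state_dict"]

    model = torchvision.models.resnet50(weights=None)
    # strict=False tolerates the missing FC layer; inspect the report to
    # confirm that nothing else was skipped.
    result = model.load_state_dict(
        prepare_backbone_state_dict(checkpoint), strict=False
    )
    print("missing keys:", result.missing_keys)
    print("unexpected keys:", result.unexpected_keys)

    # Replace the head with one sized for the downstream task.
    model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
    return model
```

If the only missing keys reported are `fc.weight` and `fc.bias`, the backbone loaded as expected and the model is ready for fine-tuning.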

The second set of weights is for the text-image joint representation and is hosted on Google Drive. Such weights are typically used for zero-shot tasks. To support this, the repository includes scripts and code for zero-shot classification.
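For readers unfamiliar with how joint text-image weights are used for zero-shot classification, the scoring step can be sketched as below. This is a generic CLIP-style illustration, not the repository's script: it assumes the image and each class prompt have already been encoded into vectors, and the function names and the `temperature` value are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def zero_shot_probs(image_emb, text_embs, temperature=0.07):
    """Return class probabilities from one image embedding and one text
    embedding per class, using cosine similarity followed by softmax.

    temperature is an assumed value; a trained model usually ships its own.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(cosine / temperature)

    # Numerically stable softmax over the class logits.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The predicted class is the one whose prompt embedding has the highest probability; no task-specific training is involved, which is why only the joint weights are needed.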
