This is the public repository for the paper "TongueSAM: An Universal Tongue Segmentation Model Based on SAM with Zero-Shot". The paper is available at https://arxiv.org/abs/2308.06444.
Tongue segmentation is the first step in automated TCM tongue diagnosis, and its quality significantly affects the diagnostic results. Numerous deep-learning-based methods have achieved promising results, but most of them perform poorly on tongue images that differ from the training set. To address this issue, this paper proposes a universal tongue segmentation model named TongueSAM, based on SAM (Segment Anything Model). SAM is a large-scale pretrained interactive segmentation model known for its strong zero-shot generalization capability. Applying SAM to tongue segmentation makes it possible to segment various types of tongue images in a zero-shot manner. In this study, a Prompt Generator based on object detection is integrated into SAM to provide an end-to-end automated tongue segmentation method. Experiments demonstrate that TongueSAM performs well across various tongue segmentation datasets, particularly in the zero-shot setting. TongueSAM can be applied directly to other datasets without fine-tuning. As far as we know, this is the first application of a large-scale pretrained model to tongue segmentation.
TongueSAM consists primarily of two components: SAM and the Prompt Generator. For a given tongue image, TongueSAM first uses the pretrained Image Encoder in SAM to compute an image embedding. Meanwhile, the Prompt Generator produces a bounding box prompt from the tongue image. Finally, the image embedding and the prompt are jointly fed into the Mask Decoder to generate the segmentation result. The entire segmentation process is end-to-end and does not require any additional manual prompts.
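The data flow described above can be sketched as a small function. This is only an illustrative sketch, not the code in this repository: `segment_tongue` and its three callable parameters are hypothetical names, and the box layout `(x_min, y_min, x_max, y_max)` is an assumption.

```python
from typing import Any, Callable, Sequence

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Sequence[float]


def segment_tongue(
    image: Any,
    image_encoder: Callable[[Any], Any],
    prompt_generator: Callable[[Any], Box],
    mask_decoder: Callable[[Any, Box], Any],
) -> Any:
    """Run the three stages in order and return the decoder output.

    All three callables are placeholders for the SAM Image Encoder,
    the detection-based Prompt Generator, and the SAM Mask Decoder.
    """
    embedding = image_encoder(image)     # encode the image once
    box = prompt_generator(image)        # automatic box prompt, no user input
    return mask_decoder(embedding, box)  # embedding + prompt -> mask
```

Because the box comes from the Prompt Generator rather than from a user click or drag, the caller only has to supply the image.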
In our experiments, we used three tongue image segmentation datasets: TongueSet1, TongueSet2 (BioHit), and TongueSet3. TongueSet1 cannot be released at the moment due to privacy concerns. TongueSet2 is already publicly available. We are releasing TongueSet3 here.
TongueSet3 is a dataset we compiled by selecting 1000 tongue images from the web and manually segmenting them with the Labelme tool. It covers a wide range of tongue images from various sources, including images captured with mobile devices and from non-standard angles. To our knowledge, this is the first publicly available tongue image segmentation dataset collected in a free (uncontrolled) environment. The original images vary in size, so to ensure input consistency we resized each tongue image to 400 × 400 pixels. In the released files, the "img" folder contains the input tongue images and the "gt" folder contains our manually annotated ground truth segmentations. Note that the images in the "gt" folder may appear completely black when opened in an image viewer. This is expected: pixels with a value of [1, 1, 1] represent the tongue region, and pixels with a value of [0, 0, 0] represent the background.
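To inspect a ground truth mask visually, the 0/1 labels can be rescaled to 0/255. The following is a minimal sketch assuming the mask has already been loaded into a NumPy array (for example with Pillow); the helper names `mask_to_visible` and `tongue_fraction` are illustrative and not part of this repository.

```python
import numpy as np


def mask_to_visible(gt: np.ndarray) -> np.ndarray:
    """Map label 1 (tongue) to 255 and label 0 (background) to 0."""
    return (gt > 0).astype(np.uint8) * 255


def tongue_fraction(gt: np.ndarray) -> float:
    """Fraction of pixels labelled as tongue (first channel if 3-channel)."""
    labels = gt[..., 0] if gt.ndim == 3 else gt
    return float((labels > 0).mean())


# Synthetic 4x4 mask with a 2x2 tongue region, standing in for a real file.
gt = np.zeros((4, 4, 3), dtype=np.uint8)
gt[1:3, 1:3] = 1
visible = mask_to_visible(gt)
```

Saving `visible` as an image gives a white tongue region on a black background, which is easier to check by eye than the raw 0/1 labels.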
1.Zero-Shot Segmentation
The most important capability of TongueSAM is zero-shot segmentation. To make the model easy to use, we fine-tuned TongueSAM on the three datasets described in the paper and released the resulting pre-trained model. Users can segment tongue images directly with TongueSAM in a few steps:
Download the pre-trained weights: TongueSAM
Put tonguesam.pth into the ./pretrained_model/ folder.
Place the tongue image files that need to be segmented into the ./data/test_in/ folder.
Run ./python.py
The segmented tongue images will be located in the ./data/test_out/ folder.
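Before running the segmentation script, it can help to confirm that the folder layout matches the steps above. The following is a hedged sketch of such a check; `check_layout` is a hypothetical helper that is not included in this repository, and the set of accepted image extensions is an assumption.

```python
from pathlib import Path
from typing import List

# Assumed input formats; adjust to whatever your images actually use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def check_layout(root: Path) -> List[Path]:
    """Validate the expected folders and return the images in data/test_in."""
    weights = root / "pretrained_model" / "tonguesam.pth"
    if not weights.is_file():
        raise FileNotFoundError(f"missing weights: {weights}")

    in_dir = root / "data" / "test_in"
    if not in_dir.is_dir():
        raise FileNotFoundError(f"missing input folder: {in_dir}")

    images = sorted(
        p for p in in_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    if not images:
        raise FileNotFoundError(f"no images found in {in_dir}")

    # Creating the output folder up front is harmless if it already exists.
    (root / "data" / "test_out").mkdir(parents=True, exist_ok=True)
    return images
```

Calling `check_layout(Path("."))` from the repository root raises a descriptive error when the weights or input images are missing, and otherwise returns the list of images that will be processed.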
2.Fine-tune
If you wish to further fine-tune the model, please follow these steps:
To train the YOLOX-based Prompt Generator, please refer to the following guidelines: YOLOX
Replace the pre-trained model file ./segment/yolox.pth with your trained model.
Run split.py twice, setting src_folder first to your img_data folder and then to your gt_data folder.
Run pre_tongue.py, setting img_path and gt_path to your processed image and ground truth folder paths, respectively. For other parameter settings, refer to MedSAM.
Run ./train.py. For details, please refer to the following guidelines: MedSAM
The project is based on YOLOX and MedSAM, and we appreciate their contributions.
This project is licensed under the MIT License.