Welcome to the official repository for CT-CHAT, a cutting-edge visual-language chat model designed specifically for 3D chest CT volumes. CT-CHAT provides an open-source codebase and pre-trained models, utilizing CT-CLIP and a VQA (Visual Question Answering) dataset adapted from CT-RATE, making it accessible to researchers worldwide. The VQA dataset and model weights are available via the HuggingFace repository.
Before you get started, ensure that your environment meets the following requirements (a quick sanity-check script is sketched after the lists):
- Python version: > 3.12.4
- Necessary dependencies: Install CT-CLIP’s dependencies by following the instructions in the CT-CLIP repository.
- Additional libraries: Ensure that the following libraries are installed:
- PyTorch v2.4.0
- CUDA v12.4
- SciPy v1.14.0
- Torchvision v0.19.0
- Scikit-learn v1.2.2
- Pandas v2.2.2
- Transformers v4.44.0
- NumPy v1.26.4
For training:
- Small models: Minimum of 2 A100 GPUs with 80GB VRAM.
- Large models (Llama 3.1 70B): Minimum of 4 A100 GPUs.
For inference:
- Large models: At least 2 A100 GPUs.
- Smaller models: 1 A100 GPU.
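As a quick sanity check before launching training or inference, something along the lines of the sketch below can confirm that the key library versions and visible GPU memory match the requirements above. This is an illustrative snippet, not part of the CT-CHAT codebase.

```python
# Illustrative environment check: prints key library versions and the VRAM
# of each visible GPU (A100 80GB cards are recommended above).
import torch
import transformers
import numpy as np

print(f"PyTorch:      {torch.__version__} (CUDA {torch.version.cuda})")
print(f"Transformers: {transformers.__version__}")
print(f"NumPy:        {np.__version__}")

assert torch.cuda.is_available(), "CUDA is required for CT-CHAT training/inference."
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GB VRAM")
```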
To train the model, follow the provided training scripts. Before training, the training data must be passed through the image encoder to generate embeddings; use the provided Encoder Script as a reference for encoding a single image. Note that this differs from the latent-saving process in CT-CLIP: here the encoder outputs must be saved before the latent projection. Finally, update the training scripts with the path to the saved encodings and any other required configurations.
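A minimal sketch of this pre-encoding step is shown below. The `load_ctclip_image_encoder` helper, tensor layout, and file paths are placeholders; the actual encoder construction and preprocessing should be taken from the provided Encoder Script.

```python
# Sketch of pre-computing an image-encoder embedding for one CT volume
# (hypothetical helper and paths; see the provided Encoder Script for the
# actual CT-CLIP usage and preprocessing).
import numpy as np
import torch

# Placeholder: however you instantiate CT-CLIP's vision tower in your setup.
image_encoder = load_ctclip_image_encoder("path_to_ctclip_checkpoint")  # hypothetical helper
image_encoder.eval().cuda()

volume = torch.from_numpy(np.load("path_to_preprocessed_volume.npy")).float()
volume = volume.unsqueeze(0).unsqueeze(0).cuda()  # add batch/channel dims (layout depends on the encoder)

with torch.no_grad():
    # Important: save the encoder output *before* the latent projection head,
    # which differs from the latent-saving procedure in CT-CLIP.
    embedding = image_encoder(volume)

np.save("path_to_saved_encodings/volume_0001.npy", embedding.cpu().numpy())
```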
For inference, refer to the serve scripts. For CLI-based inference, the validation data must first be encoded in the same way as the training data. After encoding, adjust the required paths in the CT-CHAT validation scripts. Once the latent embeddings have been computed, expect roughly 5-10 tokens/s for the Llama 3.1 70B model on 4 A100 GPUs and 10-20 tokens/s for the Llama 3.1 8B model on 2 A100 GPUs.
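These rates translate directly into wall-clock time per answer; for example, a 256-token response at 5-10 tokens/s takes roughly 25-50 seconds. To measure throughput on your own hardware, you can time a single `generate` call as in the sketch below; it assumes a `model` and `tokenizer` already loaded with the Transformers library and only times text generation, not the CT embedding pipeline.

```python
# Rough tokens/s probe for an already-loaded HF model and tokenizer
# (assumed to exist); results vary with context length, batch size, and GPUs.
import time
import torch

prompt = "Describe the findings in this chest CT."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.time()
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
elapsed = time.time() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.1f}s -> {new_tokens / elapsed:.1f} tokens/s")
```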
For GUI-based inference, launch the controller, the Gradio web server, and a model worker by running the following commands (each in its own terminal):
python -m llava.serve.controller --host 0.0.0.0 --port 10000
python -m llava.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path "path_to_model" --model-base "path_to_model"
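Once all three processes are running, you can optionally verify that the model worker has registered with the controller. The snippet below assumes the controller exposes the FastChat-style `/list_models` endpoint used by upstream LLaVA; if your version differs, check `llava/serve/controller.py`.

```python
# Optional check that the model worker registered with the controller.
# Assumes a FastChat-style controller endpoint, as in upstream LLaVA.
import requests

controller_url = "http://localhost:10000"
resp = requests.post(f"{controller_url}/list_models", timeout=10)
print("Registered models:", resp.json().get("models", []))
```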
We offer pre-trained models for several LLMs, trained on the VQA dataset described in our paper. You can download them from the links below:
- CT-CHAT (Llama 3.1 70B): Download Here
- CT-CHAT (Llama 3.1 8B): Download Here
- CT-CHAT (Vicuna): Download Here
- CT-CHAT (Mistral): Download Here
The VQA dataset was derived from the CT-RATE data using the Llama 3.1 70B model with the scripts provided here. Short-answer questions were sampled from the RadGenome Chest CT dataset. The dataset is available in the CT-RATE HuggingFace repository.
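For convenience, individual VQA files can also be fetched programmatically once their location is known. In the sketch below, both the repository ID and the filename are placeholders to be replaced with the actual values listed on the HuggingFace page.

```python
# Hypothetical example of downloading a VQA file from the CT-RATE HF repo;
# replace repo_id and filename with the actual values from HuggingFace.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="<ct-rate-hf-repo>",        # placeholder repository ID
    filename="<vqa_train_split>.csv",   # placeholder filename
    repo_type="dataset",
)
print("Downloaded to:", path)
```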
If you use CT-CHAT, CT-CLIP, or our CT-RATE dataset in your research, please cite our paper.
We are committed to fostering innovation and collaboration in the research community. To this end, all elements of CT-RATE, CT-CLIP, and CT-CHAT are released under a Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA) license. This licensing framework ensures that our contributions can be freely used for non-commercial research purposes, while also encouraging contributions and modifications, provided that the original work is properly cited and any derivative works are shared under similar terms.
We would like to express our sincere gratitude to the following works, whose contributions were invaluable to our research. Our VQA dataset includes a subset of data from RadGenome Chest CT. Additionally, our CT-CHAT model is a 3D adaptation of the LLaVA model for CT volumes, and it uses the CT-ViT architecture, introduced as part of GenerateCT, as its vision encoder. We are deeply appreciative of these researchers for their outstanding open-source contributions. If you use our VQA data or CT-CHAT model in your work, we kindly ask that you also cite the related works to acknowledge their impact.