CogVLM2 Autocaptioning Tools

python src/analyze.py path/to/directory "Describe the image"

Welcome to the CogVLM2 Autocaptioning Tools repository! This project provides tools for autocaptioning images with the CogVLM2 vision-language model.

✅ Chat Mode ✅ Caption Mode ✅ FastAPI Application

Introduction

CogVLM2 is an open-source vision-language model (VLM) whose performance approaches that of GPT-4V. This repository sets up the necessary environment and some tools to leverage the CogVLM2 model. The model was created and released by the Knowledge Engineering Group (KEG) & Data Mining (THUDM) at Tsinghua University: https://huggingface.co/THUDM.

Setup

(TESTED ON UBUNTU 22.04 | CUDA 12.1 | Torch 2.3.0+cu121 w/ Xformers)
Windows is untested so far, but it should work fine. If you need it, let me know and I'll open a pull request to test it properly.

Follow the steps below to set up the project:

Option 1: Using Setup Scripts

Linux/Mac:

Download and Run the Shell Script:

wget https://raw.githubusercontent.com/C0nsumption/Consume-CogVLM2/main/setup/setup.sh
chmod +x setup.sh
./setup.sh

Windows:

Download and Run the Batch Script:

 curl -o setup.bat https://raw.githubusercontent.com/C0nsumption/Consume-CogVLM2/main/setup/setup.bat
 setup.bat

Option 2: Manual Installation

Clone this Repo and Navigate to the Project Directory:

git clone https://github.com/C0nsumption/Consume-CogVLM2.git
cd Consume-CogVLM2

Set Up a Virtual Environment:

python -m venv venv
source venv/bin/activate  # For Linux/Mac
venv\Scripts\activate  # For Windows

Initialize Git LFS (make sure it is installed first; ask ChatGPT if you need help installing it):
```
git lfs install
```

Clone the Model Repository:

git clone https://huggingface.co/THUDM/cogvlm2-llama3-chat-19B-int4
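
A common failure at this step is cloning before Git LFS is active, which leaves small text pointer files where the model weights should be. The sketch below is one way to check for that. It assumes the weights are stored as `.safetensors` files inside the cloned directory (an assumption; adjust the pattern to whatever the model repository actually contains). `is_lfs_pointer` and `find_lfs_pointers` are hypothetical helpers, not part of this repository.

```python
from pathlib import Path

# Git LFS pointer files are small text files that begin with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` looks like an un-downloaded Git LFS pointer."""
    with path.open("rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(model_dir: str, pattern: str = "*.safetensors") -> list[Path]:
    """List files under `model_dir` matching `pattern` that are still pointers."""
    # The default pattern is an assumption about how the weights are stored.
    return sorted(
        p for p in Path(model_dir).rglob(pattern)
        if p.is_file() and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    # Directory name assumed to match the clone command above.
    model_dir = "cogvlm2-llama3-chat-19B-int4"
    if not Path(model_dir).is_dir():
        print(f"{model_dir} not found; run this from the directory you cloned into.")
    else:
        leftovers = find_lfs_pointers(model_dir)
        if leftovers:
            print("Still pointers (try `git lfs pull` inside the model directory):")
            for p in leftovers:
                print("  ", p)
        else:
            print("No LFS pointer files found.")
```

If any files are reported, running `git lfs install` followed by `git lfs pull` inside the model directory should fetch the real weights.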

Install Dependencies:

pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121

pip install -r requirements.txt
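
Before running the tests, it can help to confirm that the pinned PyTorch packages are the ones actually installed in the active virtual environment. The following is a minimal sketch using only the standard library; `check_pins` is a hypothetical helper, and the version numbers are copied from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the pip install command above.
EXPECTED = {"torch": "2.3.0", "torchvision": "0.18.0", "torchaudio": "2.3.0"}


def check_pins(expected: dict[str, str]) -> dict[str, str]:
    """Map each package name to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        # Wheels from the PyTorch index carry a local suffix such as "+cu121",
        # so compare only the part before the "+".
        base = found.split("+", 1)[0]
        report[name] = "ok" if base == wanted else f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    for name, status in check_pins(EXPECTED).items():
        print(f"{name}: {status}")
```

This only checks installed versions; it does not verify that CUDA is usable on the machine.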

Run Tests:
```
python test/test.py
```

Usage

After setting up the environment, you can start using the CogVLM2 autocaptioning tools. Detailed usage instructions and examples can be found in the Usage Guide.
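
As a rough illustration, the command shown at the top of this README (`python src/analyze.py path/to/directory "Describe the image"`) can be scripted to caption several directories in one go. This is a sketch only: the argument order is taken from that command, `build_command` and `caption_directories` are hypothetical helpers, and what `src/analyze.py` prints or writes is not described here.

```python
import subprocess
import sys


def build_command(directory: str, prompt: str = "Describe the image") -> list[str]:
    """Build the argv for one run of src/analyze.py on a directory."""
    # Uses the current interpreter so the active virtual environment is respected.
    return [sys.executable, "src/analyze.py", directory, prompt]


def caption_directories(directories: list[str], prompt: str = "Describe the image") -> None:
    """Run the analyzer on each directory in turn, stopping on the first failure."""
    for directory in directories:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(build_command(directory, prompt), check=True)
```

Run from the repository root with the virtual environment active, a call such as `caption_directories(["path/to/directory"])` would invoke the script once per listed directory.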

Contributing

I welcome contributions from the community! If you'd like to contribute, please fork the repository and submit a pull request. For major changes, please open an issue first to discuss what you would like to change.

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Feel free to reach out if you have any questions or need further assistance! But give me time, I'm very busy:
ａｃｃｅｌｅｒａｔｉｎｇ 🫡
