Consume-CogVLM2 Logo

CogVLM2 Autocaptioning Tools

Welcome to the CogVLM2 Autocaptioning Tools repository! This project sets up tools for autocaptioning using the state-of-the-art CogVLM2 model.

✅ Chat Mode     ✅ Caption Mode     ✅ FastAPI Application

Table of Contents

  1. Introduction
  2. Setup
  3. Usage
  4. Contributing
  5. License

Introduction

CogVLM2 is an open-source VLM with performance approaching GPT-4V. This repository sets up the necessary environment and some tools to leverage the power of the CogVLM2 model. The model was created and released by the Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University (THUDM): https://huggingface.co/THUDM.
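
If you want to poke at the model directly in Python, it loads through the standard Hugging Face transformers remote-code path. This is a minimal sketch, assuming the int4 checkpoint cloned in Setup step 4 below and the pinned torch install; the repo's own scripts wrap this for you:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Local path to the int4 checkpoint cloned in Setup step 4
    # (assumes you run this from the repo root).
    MODEL_PATH = "cogvlm2-llama3-chat-19B-int4"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        torch_dtype=torch.bfloat16,  # int4 weights still compute in bf16
        trust_remote_code=True,
        low_cpu_mem_usage=True,
    ).eval()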

Setup

(TESTED ON UBUNTU 22.04 | CUDA 12.1 | Torch 2.3.0+cu121 w/ Xformers)
For Windows, let me know and I'll make a pull request once it's actually tested, but it should work fine.

Follow the steps below to set up the project:

Option 1: Using Setup Scripts

Linux/Mac:

  1. Download and Run the Shell Script:
    wget https://raw.githubusercontent.com/C0nsumption/Consume-CogVLM2/main/setup/setup.sh
    chmod +x setup.sh
    ./setup.sh

Windows:

  1. Download and Run the Batch Script:
     curl -o setup.bat https://raw.githubusercontent.com/C0nsumption/Consume-CogVLM2/main/setup/setup.bat
     setup.bat

Option 2: Manual Installation

  1. Clone this Repo and Navigate to the Project Directory:

    git clone https://github.com/C0nsumption/Consume-CogVLM2.git
    cd Consume-CogVLM2
  2. Set Up a Virtual Environment:

    python -m venv venv
    source venv/bin/activate  # For Linux/Mac
    venv\Scripts\activate  # For Windows
  3. Initialize Git LFS (make sure you have it installed first; ask ChatGPT if you need help):

    git lfs install
  4. Clone the Model Repository:

    git clone https://huggingface.co/THUDM/cogvlm2-llama3-chat-19B-int4
  5. Install Dependencies:

    pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121
    
    pip install -r requirements.txt
  6. Run Tests:

    python test/test.py

Usage

After setting up the environment, you can start using the CogVLM2 autocaptioning tools. Detailed usage instructions and examples can be found in the Usage Guide.
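
For example, batch captioning a folder of images looks like this (the script path and prompt below come from the repo's own snippet; exact arguments may differ, so defer to the Usage Guide):

    python src/analyze.py path/to/directory "Describe the image"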

Contributing

I welcome contributions from the community! If you'd like to contribute, please fork the repository and submit a pull request. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.


Feel free to reach out if you have any questions or need further assistance! But give me time, very busy:
accelerating 🫡