This project implements close-to-real-time image captioning system and two types of adversarial attacks: object concealment and object replacement. It utilizes various open-source AI models, including YOLO-world for object detection, Segment Anything Model 2 (SAM 2) for image segmentation, Stable Diffusion for image generation, and LLaVA for image captioning.
- Close-to-real-time Image Captioning: Captures images and provides captions describing the scene.
- Speech Integration: Includes speech-to-text (Whisper) and text-to-speech (Piper TTS) for audio interaction.
- Object Concealment Attack: Strategically adds noise to target objects, concealing them from the captioning model.
- Object Replacement Attack: Substitutes target objects with fabricated alternatives, generating inaccurate captions.
- Python 3.7 or higher
- CUDA-enabled GPU (recommended)
- Picovoice Account (for wake word detection)
-
Clone the repository:
git clone https://github.com/your-username/Hey-Connect.git cd Hey-Connect
-
Install dependencies:
pip install -r requirements.txt
-
Set up Picovoice:
- Create a free account on Picovoice.
- Follow the Porcupine Web quick start guide (https://picovoice.ai/docs/quick-start/porcupine-web/) to obtain your Access Key
- Place the required files in the
static
folder of the project. - Replace
${ACCESS_KEY}
inquery-based_interaction_use_case.js
with your actual Access Key. - Train two wake word models, one for "start audio" and another for "stop audio". download them as base64 and replace the placeholders in
query-based_interaction_use_case.js
.
-
Run the Flask API:
python app.py
-
The API will be accessible at
https://localhost:4000
(using SSL).
-
Open either
passive_description_use_case.html
orquery-based_interaction_use_case.html
in a web browser.- Passive Description: Provides continuous captions of the video feed.
- Query-Based Interaction: Allows users to ask questions about the scene using the wake words.
-
Run the attack scripts from the command line:
-
Object Concealment:
python object_concealment_attack.py
- Enter the image path, target object class name, and noise density (0.0-1.0) as prompted.
-
Object Replacement:
python object_replacement_attack.py
- Enter the image path, target object class name, and a text prompt for the replacement object.
-
-
The modified images will be saved as "modified_image.png".
Object Concealment:
python object_concealment_attack.py
Enter the path to the image: image.jpg
Enter the target object class name: person
Enter the noise density (0.0 to 1.0): 0.7
Object Replacement:
python object_replacement_attack.py
Enter the path to the image: image.jpg
Enter the target object class name: dog
Enter the prompt for the replacement object: cat
- Ensure that your webcam and microphone are properly configured and accessible.
- The attack scripts require a Good GPU with CUDA-enabled for optimal performance.
- You can experiment with different prompts for the object replacement attack to achieve various results.
This project utilizes the following open-source libraries and models:
- YOLO-World
- Ultralytics
- Segment Anything Model 2 (SAM 2)
- Stable Diffusion
- LLaVA
- Whisper
- Piper TTS
- Picovoice