ui-screenshot-to-prompt is an AI-powered tool that analyzes UI screenshots and generates detailed prompts for AI coding assistants. It uses computer vision and natural language processing to break a UI down into its components, analyze its design patterns, and produce a comprehensive description for reproducing the design. It is especially useful with Bolt.new, v0, and other emerging AI app-building SaaS tools.
Key features:
- Smart image splitting and component detection
- OCR for text extraction
- UI element classification (buttons, text fields, checkboxes, etc.)
- Individual component analysis
- Overall design pattern analysis
- Gradio web interface for ease of use
The tool offers two splitting modes for analyzing UI images:
- Easy Mode
  - Grid-based splitting of the image
  - Automatically determines the optimal grid size from the image dimensions and aspect ratio (up to 3x3)
  - Creates detailed spatial annotations that describe component placement and hierarchy within the layout
- Advanced Mode (Experimental)
  - Smart component detection using computer vision techniques
  - Identifies UI elements such as buttons, text fields, and checkboxes
  - Includes a visualization of the detected components
  - Uses configurable minimum dimensions for component detection
  - Note: this mode is still experimental and may need further tuning to give good results
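Easy Mode's grid selection can be pictured with the sketch below. It is a minimal illustration, not the project's code: the function names (choose_grid, grid_boxes, describe_cell) and the heuristic of roughly 600 px per cell are assumptions; only the 3x3 cap comes from the description above. Deriving rows from the height and columns from the width means a wide desktop screenshot gets more columns and a tall mobile screenshot gets more rows, which is one simple way to account for aspect ratio.

```python
import math

MAX_GRID = 3  # the README states a maximum of 3x3

# Hypothetical position words used for the spatial annotations.
_V = {1: ["full"], 2: ["top", "bottom"], 3: ["top", "middle", "bottom"]}
_H = {1: ["full"], 2: ["left", "right"], 3: ["left", "center", "right"]}


def choose_grid(width: int, height: int, target_cell: int = 600) -> tuple[int, int]:
    """Hypothetical heuristic: aim for cells of about target_cell pixels,
    capped at MAX_GRID rows and MAX_GRID columns."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    cols = min(MAX_GRID, max(1, math.ceil(width / target_cell)))
    rows = min(MAX_GRID, max(1, math.ceil(height / target_cell)))
    return rows, cols


def grid_boxes(width: int, height: int, rows: int, cols: int) -> list[tuple[int, int, int, int]]:
    """Split the image into rows x cols boxes as (left, top, right, bottom)."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            boxes.append((
                c * width // cols,
                r * height // rows,
                (c + 1) * width // cols,
                (r + 1) * height // rows,
            ))
    return boxes


def describe_cell(row: int, col: int, rows: int, cols: int) -> str:
    """Turn a grid cell into a spatial label such as 'top-left'."""
    parts = [p for p in (_V[rows][row], _H[cols][col]) if p != "full"]
    return "-".join(parts) or "full image"
```

For a 1920x1080 screenshot, choose_grid returns (2, 3), and each of the six crops can then be sent for analysis together with a label such as "top-left".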
Each detected component is analyzed for:
- Component type classification
- Position and dimensions
- Confidence score for detection
- Location
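The per-component attributes listed above map naturally onto a small record type. The sketch below is hypothetical: DetectedComponent, filter_components, the 20 px default minimums, and the 3x3 location labels are illustrative assumptions rather than the project's actual data model. It shows how a detector's raw bounding boxes could be filtered by minimum dimensions (as in Advanced Mode) and given a coarse location.

```python
from dataclasses import dataclass


@dataclass
class DetectedComponent:
    kind: str          # component type, e.g. "button", "text_field", "checkbox"
    x: int             # left edge in pixels
    y: int             # top edge in pixels
    width: int         # width in pixels
    height: int        # height in pixels
    confidence: float  # detection confidence score in [0, 1]

    def location(self, image_w: int, image_h: int) -> str:
        """Coarse location of the component's centre on a 3x3 grid."""
        cx = self.x + self.width / 2
        cy = self.y + self.height / 2
        v = ("top", "middle", "bottom")[min(2, int(3 * cy / image_h))]
        h = ("left", "center", "right")[min(2, int(3 * cx / image_w))]
        return f"{v}-{h}"


def filter_components(components, min_width: int = 20, min_height: int = 20):
    """Drop detections smaller than the configured minimum dimensions."""
    return [c for c in components if c.width >= min_width and c.height >= min_height]
```

Filtering on size before calling a vision model avoids spending API requests on specks and stray edges that a contour-based detector tends to pick up.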
The tool requires:
- OpenAI API
  - Used for vision analysis: GPT-4o for the overall analysis and GPT-4o-mini for individual components
  - Required for component and design analysis
- Anthropic or OpenRouter API
  - Used to create detailed super prompts via Claude
  - Recommended for the most accurate results
- Python 3.10+
- Rust (needed to build the tokenizers dependency)
- Poetry (for dependency management)
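Before installing, the local prerequisites can be checked with a short script. This is a hypothetical helper, not part of the repository: it treats the presence of the cargo binary as a proxy for a Rust toolchain and only checks the tools listed above, not the API keys. The version and lookup function are parameters so the check is easy to test.

```python
import shutil
import sys


def missing_prerequisites(version=sys.version_info[:2], which=shutil.which) -> list[str]:
    """Return the prerequisites from the list above that appear to be missing.

    Hypothetical helper: 'cargo' stands in for a Rust toolchain.
    """
    missing = []
    if tuple(version) < (3, 10):
        missing.append("Python 3.10+")
    if which("cargo") is None:
        missing.append("Rust (cargo)")
    if which("poetry") is None:
        missing.append("Poetry")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("All prerequisites found" if not problems else "Missing: " + ", ".join(problems))
```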
Installation:
- Clone the repository:
git clone https://github.com/s-smits/ui-screenshot-to-prompt.git
cd ui-screenshot-to-prompt
- Install required system dependencies:
For Unix-based systems (macOS/Linux):
# macOS (using Homebrew)
brew install rust
# Linux (Ubuntu/Debian)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- Install Poetry if you haven't already:
curl -sSL https://install.python-poetry.org | python3 -
- Install dependencies:
poetry install
- Set up environment variables:
  - Rename the .env.example file to .env:
mv .env.example .env
  - Open the .env file and replace the placeholder values with your actual API keys and URL:
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
# OR
OPENROUTER_API_KEY=your_openrouter_api_key
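The .env rules above (an OpenAI key is always needed, plus either an Anthropic or an OpenRouter key) can be validated before launching. The sketch below is an illustration under stated assumptions: parse_env and pick_super_prompt_provider are hypothetical names, the parser uses only the standard library and is not how the project itself loads .env, and preferring Anthropic when both keys are set is an arbitrary choice.

```python
import os
from typing import Mapping


def parse_env(text: str) -> dict[str, str]:
    """Minimal KEY=value parser: skips blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip()
    return values


def pick_super_prompt_provider(env: Mapping[str, str] = os.environ) -> str:
    """Check the required keys and return which super-prompt provider to use."""
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is required for vision analysis")
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OPENROUTER_API_KEY"):
        return "openrouter"
    raise RuntimeError("Set ANTHROPIC_API_KEY or OPENROUTER_API_KEY")
```

Failing fast here gives a clear error message instead of an authentication failure partway through an analysis.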
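- Activate the Poetry environment (on Poetry 2.x, poetry shell is only available through the poetry-plugin-shell plugin; alternatively, prefix the next command with poetry run):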
poetry shell
- Run the Gradio interface:
python src/ui-screenshot-to-prompt/main.py
- Open the URL shown in the terminal in your web browser to access the Gradio interface.
- Upload an image of a UI design, and the tool will generate a detailed prompt for reproducing the design.
You can adjust various parameters in the config.py file, such as:
- System prompts
- Vision analysis prompts
- Super prompt template
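As an illustration of what a super prompt template might look like, the hypothetical config.py fragment below uses str.format placeholders. The names SUPER_PROMPT_TEMPLATE and build_super_prompt, the wording, and the placeholders are assumptions for this sketch; the project's actual template will differ.

```python
# Hypothetical config.py fragment: names and wording are illustrative only.
SUPER_PROMPT_TEMPLATE = """You are an expert front-end developer.
Recreate the following UI as faithfully as possible.

Overall design analysis:
{overall_analysis}

Component details:
{component_details}
"""


def build_super_prompt(overall_analysis: str, component_notes: list[str]) -> str:
    """Fill the template with the overall analysis and one bullet per component."""
    details = "\n".join(f"- {note}" for note in component_notes)
    return SUPER_PROMPT_TEMPLATE.format(
        overall_analysis=overall_analysis.strip(),
        component_details=details or "- (no components detected)",
    )
```

Keeping the template as a single module-level string makes it easy to tweak the prompt without touching the analysis code.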
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments:
- OpenAI for GPT models
- OpenRouter for API access
- Gradio for the web interface
- Tesseract OCR for text extraction