- Image Captioning: Uses Salesforce's BLIP image captioning model, which was trained on large-scale image-caption datasets, to generate a contextually relevant caption for each uploaded image, so users can understand its content without relying solely on sight.
- Text-to-Speech Synthesis: Uses Microsoft's SpeechT5 model to convert the generated captions into natural-sounding, expressive speech.
- Multiple Input Options: Accepts images uploaded from a local device or referenced by the URL of an image hosted online.
- Real-time Processing: Runs image captioning and text-to-speech synthesis in real time, so results are returned quickly.
- User-friendly Interface: Built with the Streamlit framework, the application provides clear instructions, intuitive image upload options, and visually appealing output for an accessible user experience.
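The captioning and speech steps listed above can be sketched with the Hugging Face `transformers` library. This is a minimal illustration and not necessarily how `app.py` is written. The checkpoint names (`Salesforce/blip-image-captioning-base`, `microsoft/speecht5_tts`, `microsoft/speecht5_hifigan`), the helper names, the output file name, and the all-zeros speaker embedding are assumptions made for the example. It also assumes `transformers`, `torch`, `Pillow`, `requests`, and `soundfile` are installed.

```python
from io import BytesIO
from urllib.parse import urlparse


def is_url(source: str) -> bool:
    """Return True when `source` looks like an http(s) URL rather than a local path."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def load_image(source: str):
    """Open an image from a URL or a local file path as an RGB PIL image."""
    from PIL import Image  # imported lazily so is_url() needs only the stdlib

    if is_url(source):
        import requests

        response = requests.get(source, timeout=30)
        response.raise_for_status()
        return Image.open(BytesIO(response.content)).convert("RGB")
    return Image.open(source).convert("RGB")


def caption_image(image, checkpoint="Salesforce/blip-image-captioning-base"):
    """Generate a caption with BLIP (the checkpoint name is an assumption)."""
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(checkpoint)
    model = BlipForConditionalGeneration.from_pretrained(checkpoint)
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)


def synthesize_speech(text, out_path="caption.wav"):
    """Convert text to a 16 kHz WAV file with SpeechT5 and its HiFi-GAN vocoder."""
    import soundfile as sf
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    inputs = processor(text=text, return_tensors="pt")
    # SpeechT5 expects a 512-dimensional speaker embedding. Zeros are a
    # placeholder (assumption); a real x-vector gives a more natural voice.
    speaker_embeddings = torch.zeros((1, 512))
    speech = model.generate_speech(
        inputs["input_ids"], speaker_embeddings, vocoder=vocoder
    )
    sf.write(out_path, speech.numpy(), samplerate=16000)
    return out_path


def main(source: str) -> None:
    """Caption the image at `source` and write the spoken caption to disk."""
    caption = caption_image(load_image(source))
    print("Caption:", caption)
    print("Audio written to:", synthesize_speech(caption))
```

Calling `main("photo.jpg")` or `main("https://example.com/photo.jpg")` (both hypothetical inputs) downloads the model weights on first use, which can take a while. Loading the models once and reusing them, instead of reloading on every call as this sketch does, would reduce latency and memory churn.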
Try the Image Captioning and Text-to-Speech application online by visiting the deployed Streamlit app:
Note: If the application runs out of memory during use, reboot the app to free up resources.
To run the Image Captioning and Text-to-Speech application locally, follow these steps:
- Clone the repository:
    git clone https://github.com/your-username/your-repo.git
- Install the required dependencies using pip:
    pip install -r requirements.txt
- Run the application:
    streamlit run app.py
The application will be accessible in your web browser at http://localhost:8501.
Contributions, bug reports, and feature requests are welcome! If you encounter any issues or have suggestions for improvements, please open an issue on the GitHub repository. You can also reach out to the project maintainer, Alim Tleuliyev, at [email protected] for further assistance or inquiries.