DataSpeakGPT is an advanced and feature-rich text processing and Optical Character Recognition (OCR) suite powered by the cutting-edge GPT-3.5 Turbo language model. This comprehensive toolset consists of two robust scripts, FileReaderGPT.py
and OcrGPT.py
, designed to handle a diverse range of file formats and significantly enhance text recognition accuracy.
- CSV, JSON, PDF, Text: FileReaderGPT.py supports reading and processing files in these formats, providing a versatile solution for different data structures and content types.
- Natural Language Processing: Leveraging the GPT-3.5 Turbo language model, the script offers sophisticated natural language processing capabilities for user interactions and content analysis.
- PyPDF2 and pdfplumber Integration: Seamless integration with PyPDF2 and pdfplumber for efficient PDF text extraction, ensuring reliable and accurate processing of PDF documents.
- Optimized Processing: For large text files, FileReaderGPT.py dynamically chunks the content, optimizing interactions with GPT-3.5 Turbo and enhancing performance.
- EasyOCR Library: OcrGPT.py utilizes the EasyOCR library for accurate Optical Character Recognition from images, supporting multiple languages for enhanced versatility.
- Grammar and Word Fixing: After OCR, GPT-3.5 Turbo is employed to fix grammar issues and improve the recognized text, ensuring the highest quality output.
- Language Selection: OcrGPT.py supports OCR in multiple languages, providing flexibility for users working with diverse linguistic content.
- Real-time GPT-3.5 Turbo Responses: Users can interactively experience real-time responses from GPT-3.5 Turbo, providing an engaging and dynamic user experience.
-
FileReaderGPT.py:
- Run the script.
- Enter the path of the file you want to process.
- Experience intelligent file content analysis and receive improvement suggestions.
-
OcrGPT.py:
- Run the script.
- Enter the path of the image you want to perform OCR on.
- Witness accurate text extraction and GPT-3.5 Turbo-powered text enhancement.
-
Installation:
- Ensure Python and required dependencies are installed.
- Clone the repository:
git clone <repository-url>
- Navigate to the project directory:
cd DataSpeakGPT
- Install necessary dependencies:
pip install -r requirements.txt
-
Examples:
- FileReaderGPT.py:
python FileReaderGPT.py
- OcrGPT.py:
python OcrGPT.py
- FileReaderGPT.py:
Contributions are welcome! Please follow the contribution guidelines.
This project is licensed under the MIT License.
DataSpeakGPT empowers users with advanced text processing and OCR capabilities, seamlessly integrating GPT-3.5 Turbo for unparalleled natural language understanding. Elevate your data transformation and refinement processes with this comprehensive suite.
Explore the limitless possibilities of DataSpeakGPT and transform your data into refined, polished information effortlessly.