Skip to content

mshojaei77/DataSpeakGPT

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

DataSpeakGPT

Overview

DataSpeakGPT is an advanced and feature-rich text processing and Optical Character Recognition (OCR) suite powered by the cutting-edge GPT-3.5 Turbo language model. This comprehensive toolset consists of two robust scripts, FileReaderGPT.py and OcrGPT.py, designed to handle a diverse range of file formats and significantly enhance text recognition accuracy.

Features

1. FileReaderGPT.py

File Format Support

  • CSV, JSON, PDF, Text: FileReaderGPT.py supports reading and processing files in these formats, providing a versatile solution for different data structures and content types.

GPT-3.5 Turbo Integration

  • Natural Language Processing: Leveraging the GPT-3.5 Turbo language model, the script offers sophisticated natural language processing capabilities for user interactions and content analysis.

PDF Extraction

  • PyPDF2 and pdfplumber Integration: Seamless integration with PyPDF2 and pdfplumber for efficient PDF text extraction, ensuring reliable and accurate processing of PDF documents.

Dynamic Chunking

  • Optimized Processing: For large text files, FileReaderGPT.py dynamically chunks the content, optimizing interactions with GPT-3.5 Turbo and enhancing performance.

2. OcrGPT.py

Optical Character Recognition (OCR)

  • EasyOCR Library: OcrGPT.py utilizes the EasyOCR library for accurate Optical Character Recognition from images, supporting multiple languages for enhanced versatility.

GPT-3.5 Turbo Text Enhancement

  • Grammar and Word Fixing: After OCR, GPT-3.5 Turbo is employed to fix grammar issues and improve the recognized text, ensuring the highest quality output.

Multilingual Support

  • Language Selection: OcrGPT.py supports OCR in multiple languages, providing flexibility for users working with diverse linguistic content.

Interactive User Experience

  • Real-time GPT-3.5 Turbo Responses: Users can interactively experience real-time responses from GPT-3.5 Turbo, providing an engaging and dynamic user experience.

Usage

  1. FileReaderGPT.py:

    • Run the script.
    • Enter the path of the file you want to process.
    • Experience intelligent file content analysis and receive improvement suggestions.
  2. OcrGPT.py:

    • Run the script.
    • Enter the path of the image you want to perform OCR on.
    • Witness accurate text extraction and GPT-3.5 Turbo-powered text enhancement.

Getting Started

  1. Installation:

    • Ensure Python and required dependencies are installed.
    • Clone the repository: git clone <repository-url>
    • Navigate to the project directory: cd DataSpeakGPT
    • Install necessary dependencies: pip install -r requirements.txt
  2. Examples:

    • FileReaderGPT.py: python FileReaderGPT.py
    • OcrGPT.py: python OcrGPT.py

Contributing

Contributions are welcome! Please follow the contribution guidelines.

License

This project is licensed under the MIT License.


DataSpeakGPT empowers users with advanced text processing and OCR capabilities, seamlessly integrating GPT-3.5 Turbo for unparalleled natural language understanding. Elevate your data transformation and refinement processes with this comprehensive suite.

Explore the limitless possibilities of DataSpeakGPT and transform your data into refined, polished information effortlessly.

About

Read files and images and retrieve data for LLM

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages