This Streamlit application processes financial documents such as salary slips, profit and loss statements, and transaction histories. It uses OCR (Optical Character Recognition) to extract their text, then analyzes the extracted data and generates visualizations from it.
Features:
- Text extraction with a choice of PaddleOCR or EasyOCR.
- Visualization of financial data as bar charts and pie charts.
- Answers to user queries about the extracted data, generated with the Together API.
- Processing of multiple financial documents in a single run.
- Comparative analysis of financial data.
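The README does not describe how extracted text becomes chart data. As a minimal sketch, assuming the OCR engine returns a list of text lines laid out as "Label: amount", the lines could be turned into the numbers behind a bar chart and the percentages behind a pie chart like this. `parse_amounts`, `pie_shares`, and the line pattern are hypothetical and are not taken from app.py.

```python
import re

# Hypothetical line layout: "<label>: <amount>", e.g. "Basic Salary: 45,000.00".
# Real OCR output is noisier (misread digits, merged columns); this sketch ignores that.
AMOUNT_LINE = re.compile(r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<amount>\d[\d,]*(?:\.\d+)?)\s*$")


def parse_amounts(ocr_lines):
    """Map each label to a float amount, summing repeats and skipping non-numeric lines."""
    amounts = {}
    for line in ocr_lines:
        match = AMOUNT_LINE.match(line)
        if match:
            label = match.group("label").strip()
            value = float(match.group("amount").replace(",", ""))
            amounts[label] = amounts.get(label, 0.0) + value
    return amounts


def pie_shares(amounts):
    """Return each label's percentage of the total, as a pie chart would display it."""
    total = sum(amounts.values())
    if total == 0:
        # Avoid dividing by zero when nothing was parsed or all amounts are zero.
        return {label: 0.0 for label in amounts}
    return {label: 100.0 * value / total for label, value in amounts.items()}


lines = ["Employee: A. Kumar", "Basic Salary: 30,000", "HRA: 12,000.50", "Bonus: 7,999.50"]
amounts = parse_amounts(lines)
print(amounts)  # {'Basic Salary': 30000.0, 'HRA': 12000.5, 'Bonus': 7999.5}
print(pie_shares(amounts))
```

The `amounts` dictionary maps directly onto bar heights, while `pie_shares` normalizes the same values to percentages; the line "Employee: A. Kumar" is skipped because its value is not a number.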
Prerequisites:
- Python 3.8 or higher
- AWS credentials for accessing the S3 bucket
- Together API key for AI-based query processing
Installation:
- Clone this repository:
git clone https://github.com/AdwaitSalankar/OCR-of-Bank-Statements.git
- Navigate to the project directory:
cd OCR-of-Bank-Statements
- Install dependencies:
pip install -r requirements.txt
- Run the Streamlit app:
streamlit run app.py
Usage:
- Use the sidebar to select the OCR engine, document type, and visualization type.
- Enter the number of images to process and provide a query to analyze the extracted data.
- Review the generated charts and the answer to your query in the app.
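The README does not show how the query is sent to the Together API. As a hedged sketch of the preparation step only, assuming the OCR text of each processed image is available as a string, the texts and the user's query could be combined into one prompt before being passed to whichever Together client the app uses. `build_query_prompt` and its `max_chars_per_doc` cap are hypothetical.

```python
def build_query_prompt(document_texts, query, max_chars_per_doc=4000):
    """Combine OCR text from several documents and a user query into one prompt.

    max_chars_per_doc is an assumed cap that keeps the prompt within a model's
    context window; the real app may handle long documents differently.
    """
    sections = []
    for index, text in enumerate(document_texts, start=1):
        # Number each document so the model can refer to it in comparative answers.
        trimmed = text.strip()[:max_chars_per_doc]
        sections.append(f"Document {index}:\n{trimmed}")
    context = "\n\n".join(sections)
    return (
        "Answer the question using only the financial documents below.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )


prompt = build_query_prompt(
    ["Net Pay: 42,000", "Net Pay: 45,500"],
    "By how much did net pay change between the two slips?",
)
print(prompt)
```

Labeling each document by position is one simple way to support comparative questions across several uploaded images, since the answer can cite "Document 1" and "Document 2" explicitly.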
Project structure:
- app.py: Main application code.
- fonts/: Contains font files used by PaddleOCR.
- requirements.txt: Python dependencies.
Notes:
- Configure your AWS credentials so the app can access the S3 bucket.
- Replace the Together API key in the code with your own key.
- Place the font files that PaddleOCR requires in the fonts/ folder.
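Rather than editing the key into the source, one option is to read it from an environment variable. The sketch below assumes a variable named `TOGETHER_API_KEY` (an assumed name for this app) and also checks the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` variables; both helper functions are hypothetical and not part of app.py.

```python
import os


def get_together_api_key(env_var="TOGETHER_API_KEY"):
    """Read the Together API key from the environment instead of hard-coding it.

    The variable name TOGETHER_API_KEY is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Together API key.")
    return key


def aws_env_credentials_present():
    """Report whether the standard AWS access-key environment variables are set.

    AWS SDKs such as boto3 can also read credentials from ~/.aws/credentials or an
    attached IAM role, so False here does not necessarily mean S3 access will fail.
    """
    return all(os.environ.get(name) for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"))
```

With this approach the key is exported in the shell before running `streamlit run app.py`, and the value returned by `get_together_api_key()` is used wherever the code currently holds the literal key, which keeps the secret out of version control.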