A demonstration project showing how to use the PyZerox library to process PDF documents with GPT-4 models.

Features:
- PDF document processing
- Integration with GPT-4 models
- Asynchronous operation
- Configurable page selection
- Support for custom system prompts

Requirements:
- Python 3.7+
- OpenAI API key
- Internet connection for accessing remote PDF files

Installation:
- Clone the repository:
  ```bash
  git clone https://github.com/felipefontoura/pyzerox-demo.git
  cd pyzerox-demo
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Set your OpenAI API key in a `.env` file.
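
The key only helps if it reaches the process environment before `zerox` runs. If your script does not load `.env` itself, the sketch below is a minimal, stdlib-only loader under stated assumptions: the file holds simple `KEY=VALUE` lines, and the variable is named `OPENAI_API_KEY` (the name OpenAI clients conventionally read). `load_env_file` is a hypothetical helper, not part of PyZerox; `load_dotenv()` from `python-dotenv` is the common off-the-shelf alternative.

```python
import os
from pathlib import Path


def load_env_file(path=".env", override=False):
    """Hypothetical minimal .env loader: copies KEY=VALUE pairs into os.environ.

    Returns the pairs parsed from the file. Existing environment variables
    are left alone unless override=True.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded  # no .env file: nothing to load
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        # Accept an optional shell-style "export " prefix
        if line.startswith("export "):
            line = line[len("export "):].strip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
            value = value[1:-1]
        if override or key not in os.environ:
            os.environ[key] = value
        loaded[key] = value
    return loaded
```

Calling `load_env_file()` once before `asyncio.run(main())` would make `OPENAI_API_KEY` visible to the library, assuming the `.env` file sits in the working directory.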
The main script demonstrates how to process a PDF file using PyZerox:
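```python
import asyncio

from pyzerox import zerox


async def main():
    # Convert the PDF with a GPT model; the OpenAI API key must
    # already be available in the environment.
    result = await zerox(
        file_path="your_pdf_url",  # URL or local path to the PDF
        model="gpt-4o-mini",
        output_dir="./tmp",  # directory for output files
    )
    return result


result = asyncio.run(main())
```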
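
Because `zerox` is a coroutine, several PDFs can be processed from one event loop. A minimal sketch, assuming you want to cap how many documents are in flight at once (for example, to stay within API rate limits); `process_many` and its `limit` parameter are hypothetical names, and the `process` callable is injected so the helper can be exercised without network access.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def process_many(
    paths: Iterable[str],
    process: Callable[[str], Awaitable[Any]],
    limit: int = 2,
) -> List[Any]:
    """Hypothetical helper: run `process` on each path, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(path: str) -> Any:
        # The semaphore caps how many documents are processed concurrently
        async with semaphore:
            return await process(path)

    # gather returns results in the same order as the input paths
    return await asyncio.gather(*(run_one(p) for p in paths))
```

With PyZerox, `process` could be `lambda path: zerox(file_path=path, model="gpt-4o-mini", output_dir="./tmp")`, awaited inside `main()` in place of the single call.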
The following parameters can be configured:
- `file_path`: URL or local path to the PDF file
- `model`: the GPT model to use (default: `"gpt-4o-mini"`)
- `output_dir`: directory for output files
- `custom_system_prompt`: optional custom system prompt
- `select_pages`: optional page selection (`None` processes all pages)
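
The parameters above can be gathered in one place before calling `zerox`. A minimal sketch: `build_zerox_kwargs` is a hypothetical helper, and treating page numbers as 1-based positive integers is an assumption to verify against the PyZerox version you install. Optional parameters left as `None` are omitted, so `zerox` falls back to its own defaults (for `select_pages`, all pages).

```python
from typing import Any, Dict, Iterable, Optional, Union


def build_zerox_kwargs(
    file_path: str,
    model: str = "gpt-4o-mini",
    output_dir: str = "./tmp",
    custom_system_prompt: Optional[str] = None,
    select_pages: Optional[Union[int, Iterable[int]]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect the documented parameters as zerox() keyword arguments."""
    if not file_path:
        raise ValueError("file_path must be a URL or a local path")
    kwargs: Dict[str, Any] = {
        "file_path": file_path,
        "model": model,
        "output_dir": output_dir,
    }
    if custom_system_prompt is not None:
        kwargs["custom_system_prompt"] = custom_system_prompt
    if select_pages is not None:
        # Normalise to a sorted list of unique page numbers (1-based: an assumption)
        pages = [select_pages] if isinstance(select_pages, int) else list(select_pages)
        if any(not isinstance(p, int) or p < 1 for p in pages):
            raise ValueError("page numbers must be positive integers")
        kwargs["select_pages"] = sorted(set(pages))
    return kwargs
```

For example, `await zerox(**build_zerox_kwargs("your_pdf_url", select_pages=[1, 3]))` would request only pages 1 and 3.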

Dependencies:
- py-zerox==0.0.7

Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
This project is licensed under the MIT License.