Add pdf file to parse in /ignore folder then run:
docker build -t ciprs-reader .
docker run --rm -v /$(pwd):/usr/src/app ciprs-reader python ciprs-reader.py ignore/cypress-example.pdf
Example output:
[
{
"General": {
"County": "DURHAM",
"File No": "00GR000000"
},
"Case Information": {
"Case Status": "DISPOSED",
"Offense Date": "2018-01-01T20:00:00"
},
"Defendant": {
"Date of Birth/Estimated Age": "1990-01-01",
"Name": "DOE,JON,BOJACK",
"Race": "WHITE",
"Sex": "MALE"
},
"District Court Offense Information": [
{
"Records": [
{
"Action": "CHARGED",
"Description": "SPEEDING(70 mph in a 50 mph zone)",
"Severity": "TRAFFIC",
"Law": "20-141(J1)"
}
],
"Disposed On": "2010-01-01",
"Disposition Method": "DISMISSAL WITHOUT LEAVE BY DA"
}
],
"Superior Court Offense Information": [],
}
]
Pre-requisites:
Mac
brew cask install pdftotext
Ubuntu
sudo apt-get install -y poppler-utils
wget --no-check-certificate https://dl.xpdfreader.com/xpdf-tools-linux-4.04.tar.gz \
&& tar -xvf xpdf-tools-linux-4.04.tar.gz \
&& cp xpdf-tools-linux-4.04/bin64/pdftotext /usr/local/bin/pdftotext-4
Setup:
pip install -r requirements.txt
pip install -e .
Read CIPRS PDF:
python ciprs_reader.py ./cypress-example.pdf
Run Jupyter:
jupyter-lab
Run tests:
pytest --pylint
Code for Durham