enhancement: Add timeout limit to document parsing job. DS4SD#270
Testing:

(.venv) mario@Abhisheks-MacBook-Air docling % docling https://arxiv.org/pdf/2206.01062 --document-timeout=100.123
INFO:docling.document_converter:Going to convert document batch...
Fetching 9 files: 100%|█████████████████████████████████████████████| 9/9 [00:00<00:00, 27513.66it/s]
INFO:docling.pipeline.base_pipeline:Processing document 2206.01062v1.pdf
INFO:docling.document_converter:Finished converting document 2206.01062v1.pdf in 23.67 sec.
INFO:docling.cli.main:writing Markdown output to 2206.01062v1.md
INFO:docling.cli.main:Processed 1 docs, of which 0 failed
INFO:docling.cli.main:All documents were converted in 23.68 seconds.

(.venv) mario@Abhisheks-MacBook-Air docling % docling https://arxiv.org/pdf/2206.01062 --document-timeout=5.4567
INFO:docling.document_converter:Going to convert document batch...
Fetching 9 files: 100%|█████████████████████████████████████████████| 9/9 [00:00<00:00, 50805.84it/s]
INFO:docling.pipeline.base_pipeline:Processing document 2206.01062v1.pdf
WARNING:docling.pipeline.base_pipeline:Document processing time (6.477 seconds) exceeded the specified timeout of 5.457 seconds
INFO:docling.document_converter:Finished converting document 2206.01062v1.pdf in 10.65 sec.
WARNING:docling.cli.main:Document /var/folders/d7/dsfkllxs0xs8x2t4fcjknj4c0000gn/T/tmp9v8ng4n3/2206.01062v1.pdf failed to convert.
INFO:docling.cli.main:Processed 1 docs, of which 1 failed
INFO:docling.cli.main:All documents were converted in 10.65 seconds.

(.venv) mario@Abhisheks-MacBook-Air docling % docling https://arxiv.org/pdf/2206.01062
INFO:docling.document_converter:Going to convert document batch...
Fetching 9 files: 100%|█████████████████████████████████████████████| 9/9 [00:00<00:00, 85792.58it/s]
INFO:docling.pipeline.base_pipeline:Processing document 2206.01062v1.pdf
INFO:docling.document_converter:Finished converting document 2206.01062v1.pdf in 21.84 sec.
INFO:docling.cli.main:writing Markdown output to 2206.01062v1.md
INFO:docling.cli.main:Processed 1 docs, of which 0 failed
INFO:docling.cli.main:All documents were converted in 21.85 seconds.

(.venv) mario@Abhisheks-MacBook-Air docling % docling
Usage: docling [OPTIONS] source

╭─ Arguments ────────────────────────────────────────────────────────────────────────────────╮
│ * input_sources  source  PDF files to convert. Can be local file / directory paths or URL. │
│                          [default: None]                                                   │
│                          [required]                                                        │
╰────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────────╮
│ --from              [docx|pptx|html|image|pd  Specify input formats to                     │
│                     f|asciidoc|md]            convert from. Defaults to                    │
│                                               all formats.                                 │
│                                               [default: None]                              │
│ --to                [md|json|text|doctags]    Specify output formats.                      │
│                                               Defaults to Markdown.                        │
│                                               [default: None]                              │
│ --ocr               --no-ocr                  If enabled, the bitmap                       │
│                                               content will be processed                    │
│                                               using OCR.                                   │
│                                               [default: ocr]                               │
│ --force-ocr         --no-force-ocr            Replace any existing text                    │
│                                               with OCR generated text                      │
│                                               over the full content.                       │
│                                               [default: no-force-ocr]                      │
│ --ocr-engine        [easyocr|tesseract_cli|t  The OCR engine to use.                       │
│                     esseract]                 [default: easyocr]                           │
│ --pdf-backend       [pypdfium2|dlparse_v1|dl  The PDF backend to use.                      │
│                     parse_v2]                 [default: dlparse_v1]                        │
│ --table-mode        [fast|accurate]           The mode to use in the                       │
│                                               table structure model.                       │
│                                               [default: fast]                              │
│ --artifacts-path    PATH                      If provided, the location                    │
│                                               of the model artifacts.                      │
│                                               [default: None]                              │
│ --abort-on-error    --no-abort-on-error       If enabled, the bitmap                       │
│                                               content will be processed                    │
│                                               using OCR.                                   │
│                                               [default:                                    │
│                                               no-abort-on-error]                           │
│ --output            PATH                      Output directory where                       │
│                                               results are saved.                           │
│                                               [default: .]                                 │
│ --version                                     Show version information.                    │
│ --document-timeout  FLOAT                     The timeout for processing                   │
│                                               each document, in seconds.                   │
│                                               [default: None]                              │
│ --help                                        Show this message and                        │
│                                               exit.                                        │
╰────────────────────────────────────────────────────────────────────────────────────────────╯
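The second run shows how an exceeded budget surfaces: the pipeline logs a warning, the conversion still finishes, and the document is reported as failed, with the measured time (6.477 s) overshooting the limit (5.457 s). One design that produces this pattern is a cooperative check between page batches. The Python sketch below assumes that design; `process_document`, `ConversionResult`, `ConversionStatus` and the page-batch layout are hypothetical stand-ins, not docling's actual API, and the injectable `clock` exists only to make the example deterministic.

```python
import logging
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Iterable, List, Optional

_log = logging.getLogger(__name__)


class ConversionStatus(Enum):
    # Hypothetical, simplified status values for this sketch only.
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class ConversionResult:
    status: ConversionStatus = ConversionStatus.SUCCESS
    processed_pages: List[int] = field(default_factory=list)


def process_document(
    page_batches: Iterable[List[int]],
    process_batch: Callable[[List[int]], None],
    document_timeout: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
) -> ConversionResult:
    """Run page batches until done or until the per-document budget is spent.

    The check is cooperative: it runs only between batches, so a batch that is
    already in progress is never interrupted and the total time can overshoot
    the budget by up to one batch.
    """
    result = ConversionResult()
    start = clock()
    for batch in page_batches:
        process_batch(batch)
        result.processed_pages.extend(batch)
        elapsed = clock() - start
        # None means "no limit", matching the CLI's [default: None].
        if document_timeout is not None and elapsed > document_timeout:
            _log.warning(
                "Document processing time (%.3f seconds) exceeded the "
                "specified timeout of %.3f seconds",
                elapsed,
                document_timeout,
            )
            # Remaining batches are skipped and the document is reported as failed.
            result.status = ConversionStatus.FAILURE
            break
    return result


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Fake clock readings: start, after batch 1, after batch 2.
    fake_times = iter([0.0, 3.0, 6.477])
    res = process_document(
        [[1, 2], [3, 4], [5, 6]],
        process_batch=lambda batch: None,  # stand-in for layout/OCR/table work
        document_timeout=5.4567,
        clock=lambda: next(fake_times),
    )
    print(res.status, res.processed_pages)
```

Checking elapsed time between batches keeps the implementation simple and avoids killing a model call mid-page, at the cost of the overshoot seen in the log; a hard deadline would need a separate worker process or thread that can be abandoned.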