Commit

Added note on grobid concurrency configuration to README.
elshimone committed Dec 3, 2023
1 parent b0e5aa9 commit 36397dd
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -73,6 +73,8 @@ necessary for PDF files.
- [GROBID install instructions](https://grobid.readthedocs.io/en/latest/Install-Grobid/)
- [GROBID start service](https://grobid.readthedocs.io/en/latest/Grobid-service/)

Note that the default concurrency setting for the GROBID service is 10. Depending on the number of CPUs in your system, paperetl may exhaust the GROBID engine pool, resulting in a 503 Service Unavailable error response when parsing PDFs. You can avoid this by increasing the concurrency setting in the GROBID configuration file, as described in this [section](https://grobid.readthedocs.io/en/latest/Configuration/#service-configuration) of the documentation.
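
As a rough sketch of that change, the fragment below shows the relevant keys as they appear in recent GROBID releases. The file location (`grobid-home/config/grobid.yaml`) and the value `20` are assumptions for illustration only; check the linked documentation for your GROBID version and pick a value suited to your hardware.

```yaml
# grobid-home/config/grobid.yaml (assumed location; verify for your GROBID version)
grobid:
  # Maximum number of parallel requests the GROBID server will process.
  # The shipped default is 10; 20 is an illustrative value, not a recommendation.
  concurrency: 20
  # Seconds a request waits for a free engine before GROBID answers with 503.
  poolMaxWait: 1
```

Restart the GROBID service after editing the file so the new setting takes effect.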

### Docker

A Dockerfile with commands to install paperetl, all dependencies and scripts is available in this repository.
