From 36397dd7e0cf4ed56042730d8f1dc13538702aef Mon Sep 17 00:00:00 2001
From: Simon Epstein
Date: Sun, 3 Dec 2023 15:44:50 +0000
Subject: [PATCH] Added note on grobid concurrency configuration to README.

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 51c4d45..94142e7 100644
--- a/README.md
+++ b/README.md
@@ -73,6 +73,8 @@ necessary for PDF files.
 - [GROBID install instructions](https://grobid.readthedocs.io/en/latest/Install-Grobid/)
 - [GROBID start service](https://grobid.readthedocs.io/en/latest/Grobid-service/)
 
+Note that the default concurrency setting for the GROBID service is 10. Depending on the number of CPUs in your system, paperetl may exhaust the GROBID engine pool, which results in a 503 Service Unavailable error response when parsing PDFs. You can avoid this by increasing the concurrency setting in the GROBID configuration file, as described in the [service configuration](https://grobid.readthedocs.io/en/latest/Configuration/#service-configuration) section of the GROBID documentation.
+
 ### Docker
 
 A Dockerfile with commands to install paperetl, all dependencies and scripts is available in this repository.