
Indexing CPU thread limit? Parsing multiple files, reads only 8 files at a time #1774

Open
JervenBolleman opened this issue Feb 6, 2025 · 9 comments

Comments

@JervenBolleman

When giving the qlever indexer 600+ files to load, only 8 files are read at a time (notice the 8 zcat processes, and up to 1600% CPU usage for the IndexBuilderMain executable in top, averaging closer to 1400%). Is there a built-in limitation? The machine this is observed on has many more idle CPU cores.

@hannahbast
Member

@JervenBolleman To verify, I just ran qlever index with the standard Qleverfile for UniProt (which reads 677 input streams) on an otherwise idle machine with 32 logical cores. Then all cores are busy and ps -ef | grep "gzip -cd" shows over 600 processes.

The average parsing speed is 4.3 M/s. What average speed is reported on your machine?

@JervenBolleman
Author

JervenBolleman commented Feb 6, 2025 via email

@joka921
Member

joka921 commented Feb 17, 2025

Hi @JervenBolleman,
If your setup allows you to compile QLever from scratch, and therefore to change the source code, you can experiment with the following constants (typically by increasing them). In the future we can also expose these so they are less hardcoded, but currently you are the only person with a larger machine than ours who has explicitly contacted us about this.

In src/index/ConstantsIndexBuilding.h:

  • NUM_PARALLEL_PARSER_THREADS = 8, which is where the limit of 8 files comes from. The question, however, is whether those 8 zcat processes are also busy (each at 100%).
  • NUM_PARALLEL_ITEM_MAPS = 10 (probably also relevant).
  • QUEUE_SIZE_[BEFORE|AFTER]_PARALLEL_PARSING = 10

I think we can also hack on this together when we meet soon.

@JervenBolleman
Author

JervenBolleman commented Feb 17, 2025

@joka921 thanks, I will set NUM_PARALLEL_PARSER_THREADS to 1/3 of the total number of CPUs, and the others to 1/2, and see how that goes. I am using const auto processor_count = std::thread::hardware_concurrency(); to get the number of cores.
That does not play nicely with constexpr, so I am trying hard-coded values first.

A first experiment suggests that it does not really help. I am wondering about my own I/O settings and whether there might be a problem there.
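
For context on why this does not play nicely with constexpr: std::thread::hardware_concurrency() is a runtime query, so its result cannot initialize a constexpr constant. Below is a minimal sketch, not QLever source code; the constant names mirror those in src/index/ConstantsIndexBuilding.h, and the 64-core count and the 1/3 and 1/2 fractions are just assumptions taken from this comment.

// sketch.cpp -- illustrative only, not QLever source code.
#include <algorithm>
#include <cstddef>
#include <thread>

// This does NOT compile: hardware_concurrency() is evaluated at runtime,
// so its result cannot initialize a constexpr constant.
//   constexpr std::size_t NUM_PARALLEL_PARSER_THREADS =
//       std::thread::hardware_concurrency() / 3;   // error

// Option 1: hard-code the values for the target machine
// (64 logical cores is an assumption, not a measured number).
constexpr inline std::size_t NUM_PARALLEL_PARSER_THREADS = 64 / 3;  // ~1/3 of CPUs
constexpr inline std::size_t NUM_PARALLEL_ITEM_MAPS = 64 / 2;       // ~1/2 of CPUs

// Option 2: give up constexpr and compute the value once at startup.
inline const std::size_t numParserThreadsAtRuntime =
    std::max<std::size_t>(1, std::thread::hardware_concurrency() / 3);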

@aindlq

aindlq commented Feb 17, 2025

@joka921 +1 for that. I manually increased the constants that you specified and it helped to reduce the index build time: on relatively small data I got down from 764 s to 676 s.

@hannahbast
Member

@JervenBolleman @aindlq Just for the record: I am currently building an index for what I assume is the same dataset (the complete UniProt RDF dump from 2025-02-05), and the average parsing speed is 4.1 M/s, on a Ryzen 9 9950X (16 cores) with 4 NVMe drives in a RAID 0, using the standard settings for the constants listed by @joka921.

@aindlq

aindlq commented Feb 17, 2025

@hannahbast

with image from docker hub:

INFO: Triples parsed: 667,236,386 [average speed 2.2 M/s, last batch 2.1 M/s, fastest 2.6 M/s, slowest 2.0 M/s]

with custom build:

INFO: Triples parsed: 667,236,386 [average speed 3.2 M/s, last batch 3.0 M/s, fastest 5.2 M/s, slowest 2.6 M/s]

I just more or less randomly increased the constants that @joka921 recommended. I am parsing N-Triples from stdin in parallel, and the fs cache is dropped before each index run.

$ lscpu

  Model name:             Intel(R) Xeon(R) Gold 5412U
    CPU family:           6
    Model:                143
    Thread(s) per core:   2
    Core(s) per socket:   24

diff --git a/src/index/ConstantsIndexBuilding.h b/src/index/ConstantsIndexBuilding.h
index d7c18029..13b01a6e 100644
--- a/src/index/ConstantsIndexBuilding.h
+++ b/src/index/ConstantsIndexBuilding.h
@@ -66,21 +66,21 @@ constexpr inline std::string_view QLEVER_INTERNAL_INDEX_INFIX = ".internal";
 // unique elements of the vocabulary are identified via hash maps. Typically, 6
 // is a good value. On systems with very few CPUs, a lower value might be
 // beneficial.
-constexpr inline size_t NUM_PARALLEL_ITEM_MAPS = 10;
+constexpr inline size_t NUM_PARALLEL_ITEM_MAPS = 24;
 
 // The number of threads that are parsing in parallel, when the parallel Turtle
 // parser is used.
-constexpr inline size_t NUM_PARALLEL_PARSER_THREADS = 8;
+constexpr inline size_t NUM_PARALLEL_PARSER_THREADS = 24;
 
 // Increasing the following two constants increases the RAM usage without much
 // benefit to the performance.
 
 // The number of unparsed blocks of triples, that may wait for parsing at the
 // same time
-constexpr inline size_t QUEUE_SIZE_BEFORE_PARALLEL_PARSING = 10;
+constexpr inline size_t QUEUE_SIZE_BEFORE_PARALLEL_PARSING = 24;
 // The number of parsed blocks of triples, that may wait for parsing at the same
 // time
-constexpr inline size_t QUEUE_SIZE_AFTER_PARALLEL_PARSING = 10;
+constexpr inline size_t QUEUE_SIZE_AFTER_PARALLEL_PARSING = 24;
 

@hannahbast
Member

@aindlq That is very valuable feedback, thanks! It would of course be great if QLever could figure that out itself given the available resources. It is not the highest-priority item on our list, but we will look into it eventually.
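
For illustration, such auto-detection could look roughly like the following sketch. This is purely hypothetical: it assumes the constants become runtime values, the 1/3 and 1/2 fractions follow the heuristic discussed above, and the clamping bounds are invented, not QLever defaults.

// auto_tune_sketch.cpp -- hypothetical auto-sizing, not part of QLever.
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <thread>

struct IndexBuildConcurrency {
  std::size_t parserThreads;
  std::size_t itemMaps;
};

// Derive the parallelism from the number of logical cores; the lower/upper
// bounds are arbitrary so that very small and very large machines both
// stay within a sane range.
IndexBuildConcurrency detectConcurrency() {
  const std::size_t cores = std::max(1u, std::thread::hardware_concurrency());
  return {std::clamp<std::size_t>(cores / 3, 8, 64),
          std::clamp<std::size_t>(cores / 2, 10, 64)};
}

int main() {
  const auto c = detectConcurrency();
  std::printf("parser threads: %zu, item maps: %zu\n", c.parserThreads,
              c.itemMaps);
}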

@JervenBolleman
Author

@hannahbast I am indexing on an original Epyc Zen 1 machine. The difference in parsing speed is equivalent to the difference in CPU frequency. It seems like there is a bottleneck operation somewhere in there.
