Indexing CPU thread limit? Parsing multiple files, reads only 8 files at a time #1774
Comments
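@JervenBolleman To verify, I just ran qlever index with the standard Qleverfile for UniProt (which reads 677 input streams) on an otherwise idle machine with 32 logical cores. Then all cores are busy and ps -ef | grep "gzip -cd" shows over 600 processes. The average speed of the parsing is 4.3 M/s. Which average speed is reported on your machine? |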
1.2 M/s, and only 8 cores are busy feeding the FIFOs (zcat), with the IndexBuilderMain CPU% in top fluctuating between 1200% and 2000%.
On Thu, Feb 6, 2025 at 10:38 PM Hannah Bast wrote:
@JervenBolleman To verify, I just ran qlever index with the standard Qleverfile for UniProt (which reads 677 input streams) on an otherwise idle machine with 32 logical cores. Then all cores are busy and ps -ef | grep "gzip -cd" shows over 600 processes. The average speed of the parsing is 4.3 M/s. Which average speed is reported on your machine?
--
Jerven Bolleman
|
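For anyone who wants to repeat the check above (how many decompressor processes are actually running during the index build), here is a minimal Python sketch. It only wraps the `ps -ef` call already mentioned in this thread; the function names `count_processes` and `running_decompressors` are made up for illustration and are not part of QLever.

```python
import subprocess


def count_processes(ps_output: str, pattern: str) -> int:
    """Count the lines of `ps -ef` output that contain `pattern`."""
    return sum(1 for line in ps_output.splitlines() if pattern in line)


def running_decompressors(pattern: str = "gzip -cd") -> int:
    """Run `ps -ef` and count processes whose line contains `pattern`.

    Hypothetical helper: use "zcat" as the pattern if the input streams
    are fed by zcat instead of gzip -cd.
    """
    out = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
    return count_processes(out, pattern)
```

Calling `running_decompressors()` while the index build is running should give roughly the number of input streams being read at that moment, which makes it easy to compare "over 600" against "8".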
Hi @JervenBolleman In
I think we can also hack around when we meet soon. |
@joka921 thanks, I will set NUM_PARALLEL_PARSER_THREADS to a third of the total number of CPUs, and the others to half, and see how that goes. Using First experiment: it does not seem to really help. I am wondering about my own IO settings and whether there might be a problem there. |
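The split described in the comment above (a third of the CPUs for NUM_PARALLEL_PARSER_THREADS, half for the other constants) can be sketched as follows. This is only the arithmetic, under the assumption that integer division is acceptable; the key name "others" is a placeholder because the remaining constants are not named in this thread, and how the values get applied to a QLever build is not shown here.

```python
import os


def suggested_thread_counts(total_cpus: int) -> dict:
    """Return the 1/3 and 1/2 split, never going below one thread."""
    if total_cpus < 1:
        raise ValueError("total_cpus must be at least 1")
    return {
        # Constant name taken from the thread.
        "NUM_PARALLEL_PARSER_THREADS": max(1, total_cpus // 3),
        # Placeholder key: the other constants are not named here.
        "others": max(1, total_cpus // 2),
    }


def suggested_for_this_machine() -> dict:
    """Apply the same split to the local logical CPU count."""
    return suggested_thread_counts(os.cpu_count() or 1)
```

For a machine with 32 logical cores this gives 10 parser threads and 16 for the others.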
@joka921 +1 for that. I tried manually increasing the constants that you specified and it reduced the index build time: on relatively small data I got down from 764 s to 676 s. |
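To put the two timings above in proportion, a small sketch of the relative reduction in build time. The function name is made up for illustration; the numbers are the ones reported in the comment.

```python
def relative_reduction(before_s: float, after_s: float) -> float:
    """Fraction of the original build time that was saved."""
    if before_s <= 0:
        raise ValueError("before_s must be positive")
    return (before_s - after_s) / before_s


# Timings from the comment above: 764 s before, 676 s after.
saved = relative_reduction(764, 676)
print(f"{saved:.1%}")
```

That is a saving of 88 s, or about 11.5% of the original build time.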
@JervenBolleman @aindlq Just for the record: I am currently building an index for what I assume is the same dataset (complete UniProt RDF dump from 2025-02-05), and the average parsing speed is 4.1 M/s, on a Ryzen 9 9950X (16 cores) with 4 NVMe in a RAID0 and using the standard settings regarding the constants listed by @joka921. |
with image from docker hub:
with custom build:
I more or less randomly increased the constants that @joka921 recommended. N-Triples are parsed from stdin in parallel. The fs cache is dropped before each index run.
|
@aindlq That is very valuable feedback, thanks! It would of course be great if QLever could figure that out itself given the resources. Not the highest-priority item on our list, but we will look into that eventually. |
@hannahbast I am indexing on an original Epyc Zen 1 machine. The difference in parsing speed is equivalent to the difference in CPU frequency. It seems like there is a bottleneck operation somewhere in there. |
When giving the qlever indexer 600+ files to load, only 8 files are read at a time (note the 8 zcat processes, and up to 1600% CPU usage for the IndexBuilderMain executable, averaging around 1400% in top). Is there a built-in limitation? The machine this is observed on has many more idle CPU cores.