NQ parsing: IndexBuilderMain "merging partial vocabularies" takes a very long time #1468
Comments
@Stiksels Thanks for reporting this.
@joka921 It eventually did work; merging the partial vocabularies took 3+ hours.
Here is the log:
Some additional info: I'm running on an older MacBook Pro model:
In our cloud/K8s setup (amd64), the index build for the compressed N-Quads file took ~2 h in total (faster than for the compressed N-Triples file). I'll add a download link for the file shortly.
@Stiksels Can you provide a link to the NQ file?
Hi @hannahbast, here is the download link (exp. 12h):
Following up on this: I ran … (log: uit-activiteiten-full-nq.index-log (1).txt). I'm not sure whether supporting older devices is necessary or a high priority. I've put in a request for a new laptop 😂
@Stiksels It's not necessarily about the age of the computer, but about the version of the compiler and maybe the operating system. Merging the vocabularies involves many files and many threads. It seems that with older compilers and/or older operating systems, the generated machine code does something wildly suboptimal. We haven't figured out exactly what yet.
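For context, "merging partial vocabularies" is, in general terms, a k-way merge: each partial vocabulary is a sorted list of words (IRIs, literals) built from one chunk of the input, and the merge combines them into one sorted, deduplicated global vocabulary while recording, for each partial, which global ID each local word received. The Python sketch below shows only that generic technique under simplifying assumptions (all partials are already sorted and fit in memory; the function name merge_partial_vocabularies is hypothetical). It is not QLever's actual implementation, which is written in C++, works on many files, and uses many threads.

```python
import heapq
from typing import Dict, List, Tuple


def merge_partial_vocabularies(
    partials: List[List[str]],
) -> Tuple[List[str], List[Dict[int, int]]]:
    """Hypothetical k-way merge of sorted partial vocabularies.

    Returns the sorted, deduplicated global vocabulary and, for each
    partial vocabulary, a map from local word index to global word ID.
    Assumes every partial list is already sorted.
    """
    # Tag each word with its partial ID and local index. Building lists
    # (not lazy generators) keeps pid bound correctly per partial.
    streams = [
        [(word, pid, local) for local, word in enumerate(words)]
        for pid, words in enumerate(partials)
    ]
    global_words: List[str] = []
    local_to_global: List[Dict[int, int]] = [{} for _ in partials]
    # heapq.merge lazily yields items from all sorted inputs in global order.
    for word, pid, local in heapq.merge(*streams):
        # Equal words from different partials arrive consecutively,
        # so a word is new exactly when it differs from the last one kept.
        if not global_words or global_words[-1] != word:
            global_words.append(word)
        local_to_global[pid][local] = len(global_words) - 1
    return global_words, local_to_global


words, id_maps = merge_partial_vocabularies(
    [["apple", "cherry"], ["banana", "cherry", "date"]]
)
print(words)    # ['apple', 'banana', 'cherry', 'date']
print(id_maps)  # [{0: 0, 1: 2}, {0: 1, 1: 2, 2: 3}]
```

Because heapq.merge keeps only one pending item per input in its heap, the merge step itself needs memory proportional to the number of partials rather than to the total vocabulary size; in this sketch, though, the inputs and outputs are still plain in-memory lists.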
I tried the link and got:
@hannahbast Can you try again with this new link (expires at 22:35 Brussels time)?
Update on the issue above: the difference in indexing performance between the Intel Mac and the Apple silicon Mac was due to the QLever version used (0.5.3 vs. any later version).
Docker image: with version 0.5.3, and with the command in index.py manually overwritten (index.py, def execute):
Mac silicon:
Mac Intel:
Also, I'm working in a virtual environment with Python 3.12.6.
With the latest Docker image:
Steps to reproduce:
This seems to have been a resource allocation issue: after significantly increasing the memory limit in Docker Desktop (and making sure there were no conflicts with running servers), indexing now runs stably for multiple datasets.
Issue description
Trying to build an index for a zipped N-Quads file (~2M named graphs, ~140M triples). The process has been stuck on "Merging partial vocabularies" for over 2 hours now...
Logs