Slow importing of Crossref full metadata dump in LMDB #88

steppo83 · 2023-03-16T13:55:08Z

Hello,
I'm trying to use biblio-glutton inside a pod in Kubernetes, but importing the Crossref full metadata dump is very slow.

The pod has this setup:

    resources:
      limits:
        cpu: '2'
        memory: 8Gi
      requests:
        cpu: 250m
        memory: 64Mi

The biblio-glutton configuration uses the default settings.
Is there something I'm missing? How can I speed up the import?

Thanks!
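One detail worth checking in this spec is the gap between requests and limits. Kubernetes schedules a pod based on its requests, so with 64Mi of memory requested the node only has to set aside that much, and under CPU contention a 250m request only guarantees about a quarter of a core. The Python sketch below converts the quantities and prints each limit/request ratio. The `parse_quantity` helper is hypothetical and hand-rolled: it covers only the common suffixes and is not the Kubernetes `resource.Quantity` implementation.

```python
# Hypothetical helper: converts the Kubernetes quantity strings used in the
# pod spec above into plain numbers (cores for CPU, bytes for memory).
# Only common suffixes are handled; this is not the full Quantity grammar.
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL = {"m": 1e-3, "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}


def parse_quantity(q: str) -> float:
    q = q.strip()
    # Check two-letter binary suffixes first so "Mi" is not read as "M".
    for suffix, factor in BINARY.items():
        if q.endswith(suffix):
            return float(q[: -len(suffix)]) * factor
    for suffix, factor in DECIMAL.items():
        if q.endswith(suffix):
            return float(q[: -len(suffix)]) * factor
    return float(q)  # bare number, e.g. cpu: '2'


pod = {
    "limits": {"cpu": "2", "memory": "8Gi"},
    "requests": {"cpu": "250m", "memory": "64Mi"},
}

for resource in ("cpu", "memory"):
    limit = parse_quantity(pod["limits"][resource])
    request = parse_quantity(pod["requests"][resource])
    print(f"{resource}: request={request:g} limit={limit:g} "
          f"limit/request={limit / request:g}")
```

For this pod the memory limit is 128 times the request and the CPU limit 8 times the request. Raising the requests closer to the limits (a pod in the Guaranteed QoS class sets them equal) is one way to rule out the pod landing on a node that cannot actually provide that headroom.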

karatekaneen · 2023-03-24T09:45:55Z

Sounds like a Kubernetes issue to me. We are running Glutton in K8s and have never seen indexing this slow.
What kind of disks do you have backing the PV?
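If the answer is unclear, a rough first measurement can be taken from inside the pod. The Python sketch below sequentially writes a scratch file into a directory on the PV, forces it to the device with `fsync`, deletes it, and reports MiB/s. The function name and scratch file name are hypothetical, and sequential writes are the optimistic case: LMDB B-tree inserts touch pages more scattered across the file, so a dedicated tool such as `fio` with a random-write job gives a more representative figure.

```python
import os
import time


def measure_write_throughput(directory: str, total_mib: int = 256,
                             block_kib: int = 1024) -> float:
    """Sequentially write about `total_mib` MiB to a scratch file in
    `directory`, fsync it, delete it, and return the throughput in MiB/s."""
    block = os.urandom(block_kib * 1024)  # incompressible payload
    n_blocks = max(1, (total_mib * 1024) // block_kib)
    path = os.path.join(directory, "disk_probe.tmp")  # hypothetical scratch file
    try:
        start = time.perf_counter()
        with open(path, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # flush to the storage device, not just the page cache
        elapsed = time.perf_counter() - start
    finally:
        if os.path.exists(path):
            os.remove(path)
    written_mib = n_blocks * block_kib / 1024
    return written_mib / max(elapsed, 1e-9)
```

Calling `measure_write_throughput("/path/on/the/pv")` (placeholder path) on the PV mount and again on a node-local directory makes it easy to see whether the volume is the bottleneck.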

karatekaneen · 2023-05-31T07:21:54Z

I actually encountered slow indexing myself when running the version from #90 with the 2023 dump. Nothing else in our environment has changed, and we usually index about 5-7k records/s. Now it runs at less than 1k/s:

crossrefLookup
             count = 62658487
         mean rate = 1013.66 events/second
     1-minute rate = 927.42 events/second
     5-minute rate = 836.58 events/second
    15-minute rate = 800.44 events/second
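To put the slowdown in perspective, the report can be turned into wall-clock time. The output looks like a Dropwizard Metrics meter, whose mean rate is the count divided by the time since the meter started, so count / mean rate approximates how long the import had been running; treat that as an assumption about how biblio-glutton reports this counter. The Python sketch below does the arithmetic and compares it with the usual 5-7k/s; `elapsed_hours` is a hypothetical helper.

```python
def elapsed_hours(count: int, rate_per_s: float) -> float:
    """Hours needed to process `count` records at a constant `rate_per_s`."""
    return count / rate_per_s / 3600


count = 62_658_487   # crossrefLookup count from the report above
mean_rate = 1013.66  # reported mean rate, events/second

print(f"at the mean rate: {elapsed_hours(count, mean_rate):.1f} h")
for usual in (5_000, 7_000):
    print(f"at {usual}/s: {elapsed_hours(count, usual):.1f} h "
          f"(slowdown x{usual / mean_rate:.1f})")
```

This works out to roughly 17 hours for the 62.7 million records so far, versus about 2.5 to 3.5 hours at the usual rates, i.e. a slowdown of roughly 5-7x.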

lfoppiano · 2024-09-13T09:34:33Z

As discussed in this thread, the first suspect is disk throughput, and the second is RAM.
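LMDB memory-maps its data files, so reads and writes go through the kernel page cache, and in a container that cache is charged to the container's memory cgroup. A quick check of the RAM suspicion is therefore to compare the container's memory limit with the size of the LMDB data. The Python sketch below reads the cgroup v2 `memory.max` file, falls back to cgroup v1 `memory.limit_in_bytes`, and sums the apparent file sizes under a directory. The function names are hypothetical, and the LMDB directory is a placeholder for wherever the databases actually live on the volume.

```python
import os
from pathlib import Path
from typing import Optional


def cgroup_memory_limit() -> Optional[int]:
    """Container memory limit in bytes, or None if unlimited or unknown."""
    candidates = [
        Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
        Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
    ]
    for path in candidates:
        try:
            raw = path.read_text().strip()
        except OSError:
            continue
        if raw == "max":  # cgroup v2: no limit set
            return None
        try:
            value = int(raw)
        except ValueError:
            continue
        # cgroup v1 reports a huge sentinel value when no limit is set
        return None if value >= 2**62 else value
    return None


def directory_size(directory: str) -> int:
    """Sum of apparent sizes of regular files under `directory`."""
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def report(lmdb_dir: str) -> None:
    gib = 2**30
    size = directory_size(lmdb_dir)
    limit = cgroup_memory_limit()
    print(f"LMDB data under {lmdb_dir}: {size / gib:.2f} GiB")
    if limit is None:
        print("no cgroup memory limit detected")
    else:
        print(f"memory limit: {limit / gib:.2f} GiB ({size / limit:.1f}x the limit)")
```

As a rough rule, once the data grows well past what the container can keep cached, more of each insert turns into disk I/O, which ties the RAM suspect back to the disk-throughput one.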

lfoppiano closed this as completed Sep 13, 2024
