Slow importing of Crossref full metadata dump in LMDB #88

Closed
steppo83 opened this issue Mar 16, 2023 · 3 comments

Comments

@steppo83

Hello,
I'm trying to use biblio-glutton inside a pod in Kubernetes, but importing the Crossref full metadata dump is very slow:
image

The pod has this setup:
resources:
  limits:
    cpu: '2'
    memory: 8Gi
  requests:
    cpu: 250m
    memory: 64Mi

image

The config of biblio-glutton has the default settings.
Is there something that I'm missing? How can I speed up the import?

Thanks!

@karatekaneen
Contributor

Sounds like a Kubernetes issue to me. We are running Glutton in K8s and haven't encountered indexing this slow.
What kind of disks do you have backing the PV?

@karatekaneen
Contributor

I actually encountered slow indexing myself when running the version from #90 with the 2023 dump. Nothing in our environment has changed except those two things, and we usually index at about 5-7k/s. It is now running at less than 1k/s:

crossrefLookup
             count = 62658487
         mean rate = 1013.66 events/second
     1-minute rate = 927.42 events/second
     5-minute rate = 836.58 events/second
    15-minute rate = 800.44 events/second
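
To put those numbers in perspective, here is a rough Python sketch that parses a meter block like the one above and estimates how long the import has been running. It assumes the mean rate is simply the count divided by the time since the meter was created, which is how Dropwizard-style meters behave; whether this reporter follows that convention is an assumption on my part. The helper names `parse_meter` and `hours_at_rate` are made up for illustration.

```python
import re

# Excerpt of the meter output pasted above.
METER_OUTPUT = """\
crossrefLookup
             count = 62658487
         mean rate = 1013.66 events/second
     1-minute rate = 927.42 events/second
"""


def parse_meter(text):
    """Extract the event count and the mean rate from a meter block."""
    count = int(re.search(r"count\s*=\s*(\d+)", text).group(1))
    mean_rate = float(re.search(r"mean rate\s*=\s*([\d.]+)", text).group(1))
    return count, mean_rate


def hours_at_rate(count, rate):
    """Hours needed to process `count` events at `rate` events/second."""
    return count / rate / 3600


count, mean_rate = parse_meter(METER_OUTPUT)

# Assumption: mean rate = count / elapsed time, so this is the elapsed time.
print(f"elapsed so far: {hours_at_rate(count, mean_rate):.1f} h")

# The same number of records at the rates we usually see.
for usual_rate in (5000, 7000):
    print(f"at {usual_rate}/s: {hours_at_rate(count, usual_rate):.1f} h")
```

Under that assumption the run above has taken about 17 hours to reach 62.6M records, against roughly 2.5 to 3.5 hours at our usual 5-7k/s.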

@lfoppiano
Collaborator

As discussed in this thread, the first suspect is disk throughput, and the second is the amount of available RAM.
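
For a quick first check of the disk, something like the following Python sketch can be run inside the pod against the directory where the volume is mounted. It only measures sequential write throughput with an fsync at the end, which is a crude proxy: LMDB works through a memory-mapped file with a lot of random access, so a dedicated tool such as fio will give a more faithful picture. The function name `write_throughput_mb_s` and the default sizes are made up for illustration.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, total_mb=256, block_kb=1024):
    """Write total_mb of random data into `directory`, fsync, return MB/s."""
    block = os.urandom(block_kb * 1024)
    n_blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            # Force the data to the device so the page cache does not
            # hide a slow disk.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    written_mb = n_blocks * block_kb / 1024
    return written_mb / elapsed


# Replace the directory with the mount path of the volume holding the
# LMDB data; the system temp dir is used here only so the sketch runs as is.
rate = write_throughput_mb_s(tempfile.gettempdir(), total_mb=16)
print(f"sequential write: {rate:.0f} MB/s")
```

If the figure on the persistent volume is far below what the same test gives on local storage, the disk is the likely bottleneck.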
