UrsaDB process gets killed on big set of files #222

Closed
1 of 4 tasks
xor3r opened this issue Jul 20, 2020 · 3 comments

xor3r commented Jul 20, 2020

Environment information

  • Mquery version (from the /status page): 1.2.0
  • Ursadb version (from the /status page): 1.4.2+afe5144
  • Installation method:
    • Generic docker-compose
    • Dev docker-compose
    • Native (from source)
    • Other (please explain)

Reproduction Steps
I have successfully installed mquery on a bare-metal machine with the following config:
OS: Ubuntu 18.04
CPU: 4 cores, 8 threads
RAM: 4GB
Storage: ~1TB

It works smoothly on small sets of files, but when I tested it on ~16,000 PE samples (~25GB), the UrsaDB process got killed after some time (I will attach a screenshot below).

Expected behaviour

Mquery successfully indexes a 25GB set of samples and allows running queries on it.

Actual behaviour

The UrsaDB process gets killed some time after pressing the "reindex" button, so it is impossible to index such a set of samples.

Screenshots

image

Additional context

Maybe the problem is with the configuration of the machine itself (not enough RAM, etc.). If so, I would be grateful if you could point out the likely cause, or list the minimal requirements for running mquery properly.
I have also read about the utils.index method and will test it, but it would be great to be able to use the standard way of reindexing.

@xor3r
Author

xor3r commented Jul 21, 2020

Update: indexing the files in batches with utils.index also ends with the UrsaDB process getting killed.

@msm-code
Contributor

Sorry for not responding to this issue earlier. I didn't know how to reproduce it (and forgot about it later), but that's not an excuse for ignoring it. I was also not active in this project for some time.

Some thoughts:

  • I (and other people) routinely index millions of files with mquery/ursadb, so it's certainly possible.
  • How much RAM do you have? It's possible that you can trim down ursadb's configuration a bit (by default it assumes quite a lot of RAM is available)
  • I use the utils.index script for most of my indexing needs (it's nice for large datasets because, in contrast to the raw indexing method, which is transactional, it can be stopped in the middle and resumed)
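To illustrate the "stopped in the middle and resumed" idea from the last point, here is a minimal sketch of batch-wise indexing with a progress file. It is not the actual utils.index implementation; `pending_batches`, `index_resumable`, the state file, and the `index_batch` callback are all hypothetical names used only for this example.

```python
from pathlib import Path
from typing import Callable, Iterator, List, Set


def pending_batches(
    files: List[str], done: Set[str], batch_size: int
) -> Iterator[List[str]]:
    """Yield batches of files that have not been indexed yet."""
    todo = [f for f in files if f not in done]
    for start in range(0, len(todo), batch_size):
        yield todo[start:start + batch_size]


def index_resumable(
    files: List[str],
    state_file: Path,
    batch_size: int,
    index_batch: Callable[[List[str]], None],
) -> None:
    """Index files batch by batch, recording finished files in state_file.

    If the process dies part-way through, calling this again skips every
    file already listed in state_file.
    """
    done: Set[str] = set()
    if state_file.exists():
        done = set(state_file.read_text().splitlines())
    for batch in pending_batches(files, done, batch_size):
        # Hypothetical callback: sends one batch to the database.
        index_batch(batch)
        # Only record the batch after it was indexed successfully.
        with state_file.open("a") as fh:
            fh.write("\n".join(batch) + "\n")
```

Smaller batches mean less work is lost when the process is killed, at the cost of producing more datasets that have to be merged later.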

I realise it's probably not important for you anymore. In this case, if the problem turns out to be non-reproducible, I think I'll have to close the issue unresolved.

@msm-cert
Member

> How much RAM do you have? It's possible that you can trim down ursadb's configuration a bit (by default it assumes quite a lot of RAM is available)

After 1.5 years I've noticed you specified 4GB of RAM, which is not a lot.

But you can make it work by limiting ursadb a bit. See https://cert-polska.github.io/ursadb/limits.html

As documented, you can make merging work with 4GB of RAM by tweaking merge_max_datasets. However, the memory consumed during indexing is not currently configurable. The only thing that affects it is the number of indexes in a dataset (gram3, hash4, text4, wide8), so you may drop one or two of them. Indexing could be made more memory-efficient at the cost of speed, but working in low-memory environments is not a goal of this project.

If you still have problems with it, please create an issue in the https://github.com/CERT-Polska/ursadb repository
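A rough sketch of what those two adjustments could look like as ursadb commands, built here as plain strings in Python. The command syntax (`index "<path>" with [...]` and `config set "<key>" <value>`) is written from memory and should be checked against the ursadb documentation; the path `/mnt/samples` and the value `5` are made-up placeholders, not recommendations.

```python
from typing import Iterable

# Index types named in the comment above.
KNOWN_INDEX_TYPES = ("gram3", "hash4", "text4", "wide8")


def index_command(path: str, index_types: Iterable[str]) -> str:
    """Build an index command that uses only the given index types."""
    types = list(index_types)
    unknown = [t for t in types if t not in KNOWN_INDEX_TYPES]
    if not types or unknown:
        raise ValueError(f"invalid index types: {unknown or 'none given'}")
    # Assumed syntax; verify against the ursadb docs before use.
    return f'index "{path}" with [{", ".join(types)}];'


def config_command(key: str, value: int) -> str:
    """Build a config command for a numeric setting."""
    # Assumed syntax; verify against the ursadb docs before use.
    return f'config set "{key}" {value};'


if __name__ == "__main__":
    # Drop two of the four index types to reduce memory use while indexing.
    print(index_command("/mnt/samples", ["gram3", "text4"]))
    # Placeholder value; pick one based on the limits page linked above.
    print(config_command("merge_max_datasets", 5))
```

Keep in mind that dropping an index type is a trade-off: queries that would have relied on the dropped index type can no longer use it.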

@msm-cert msm-cert added this to the v1.5.0 milestone Sep 29, 2024