very slow on big files compared to grep #864
Comments
This isn't enough information to act on. Fixing performance bugs requires that it can be reproduced. Please find a way to reproduce the problem on an open dataset (or find a way to get me your dataset). Also, I see that you are on Windows. Almost all of my benchmarking has been done on Linux. What very little benchmarking I've done in Windows suggests that performance can be greatly impacted by active virus scanners. The high variability between your runs is also quite suspicious. Are you sure you aren't just measuring disk bandwidth?
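One way to get a reproducible case without sharing the original log is to generate a synthetic file. The following is only a sketch under stated assumptions: `make_log` is a hypothetical helper, the line contents are invented, and placing the marker on every 100th line is an arbitrary choice that does not reflect the real log.

```shell
#!/bin/sh
# Hypothetical helper (not part of ripgrep or grep): write a deterministic
# synthetic log with $2 lines to the file named by $1.
# Assumption: every 100th line carries a marker resembling the reported
# search string; all other lines are filler text.
make_log() {
    out=$1
    lines=$2
    awk -v n="$lines" 'BEGIN {
        for (i = 1; i <= n; i++) {
            if (i % 100 == 0)
                printf "line %d ScheduledTopUp.TimeStamp=1518%06d\n", i, i
            else
                printf "line %d some unrelated log text\n", i
        }
    }' > "$out"
}
```

A possible usage, assuming enough free disk space: run `make_log synthetic.log 100000000`, then time `grep -c 'ScheduledTopUp.TimeStamp=1518' synthetic.log` and `rg -c 'ScheduledTopUp.TimeStamp=1518' synthetic.log` several times each. Repeating each command helps separate search speed from disk reads, since later runs are more likely to be served from the OS file cache.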
Hi, I will try to find such a data set. By the way, I installed rg on a Debian machine, and there the times are comparable:
$ time grep 'ScheduledTopUp.TimeStamp=1518' server.log.2018-02-14 -c
real 2m20.756s
$ time rg -j 4 -a 'ScheduledTopUp.TimeStamp=1518' server.log.2018-02-14 -c
real 2m30.704s
So the issue seems to be Windows-related.
Something is very clearly amiss. You're searching a single file. ripgrep doesn't benefit from parallelism when searching a single file, so the fact that it's faster suggests something is going wrong. One possible explanation is that ripgrep is actually searching your entire CWD, even though that would definitely be a bug given the command you're running.
What version of ripgrep are you using?
$ rg --version
ripgrep 0.8.1 (rev c8e9f25)
-SIMD -AVX
What operating system are you using ripgrep on?
CYGWIN_NT-6.1 spdm1247 2.10.0(0.325/5/3) 2018-02-02 15:16 x86_64 Cygwin
Describe your question, feature request, or bug.
rg is about 4x slower than grep for a similar search.
I have an 11 GB log file and just want to search for a simple string and count the occurrences.
$ time rg -j 4 -a 'ScheduledTopUp.TimeStamp=1518' server.log.2018-02-14 -c
37930
real 0m46.510s
user 0m0.000s
sys 0m0.000s
$ time grep 'ScheduledTopUp.TimeStamp=1518' server.log.2018-02-14 -c
37930
real 0m13.145s
user 0m9.282s
sys 0m3.806s
I tried other settings too, but saw no improvement:
$ time rg -j 4 'ScheduledTopUp.TimeStamp=1518' server.log.2018-02-14 --mmap -c
37930
real 2m21.926s
user 0m0.156s
sys 0m0.452s
$ time rg 'ScheduledTopUp.TimeStamp=1518' server.log.2018-02-14 -c
37930
real 3m10.727s
user 0m0.234s
sys 0m0.405s
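To relate these timings to disk bandwidth, the wall-clock times can be converted to rough throughput. This is a back-of-the-envelope sketch: `throughput` is a hypothetical helper, and it assumes the file is exactly 11 GB, which the report gives only as an approximate size.

```shell
#!/bin/sh
# Hypothetical helper: print throughput in GB/s, rounded to two decimals,
# given a size in GB ($1) and an elapsed wall-clock time in seconds ($2).
throughput() {
    awk -v gb="$1" -v s="$2" 'BEGIN { printf "%.2f\n", gb / s }'
}

# Wall-clock ("real") times taken from the runs above; 11 GB is approximate.
throughput 11 13.145   # grep run      -> 0.84
throughput 11 46.510   # rg -j 4 -a    -> 0.24
```

If figures like these sit near or above what the drive can sustain, the run was probably served partly from the file cache, which would make the measurements hard to compare across runs.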