cache reuse for small subset of files #1064
Comments
I rechecked exactly how this works and I don't really see a big problem in the algorithm (at least with the delete outdated results option disabled).
A few things can be improved (I'm working on it):
This looks like a bug, but I cannot reproduce the problem with the current master branch. Can you reproduce it with e.g. #1072 (which should use the new cache file format)?
I guess my main point in the prior message is that the algorithm should probably pre-filter the list of files by directory prefix before doing any IO. A substring check on the CPU is much, much faster than asking the filesystem about a specific file. It really makes a big difference, especially in my specific case, where the cache has many files outside of the directory that was specified for a specific run. To put it into context, the user specifies a list of directories

But yes, I see you have spent a lot of time thinking about it in #1072, so I will try it out and see if I still experience the above issue. Thank you for looking into this.
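To illustrate the pre-filtering idea, here is a minimal sketch in Rust. It is not czkawka's actual code: the names `is_under_any` and `prefilter_cache_entries` are hypothetical, and it assumes the cache can be viewed as a plain list of paths. The point is only that the prefix check is a pure in-memory comparison, so entries outside the requested directories never reach the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Returns true when `path` lies inside at least one of `included_dirs`.
/// `Path::starts_with` compares whole path components, so
/// `/data/photos2/a.jpg` does not match the directory `/data/photos`.
fn is_under_any(path: &Path, included_dirs: &[PathBuf]) -> bool {
    included_dirs.iter().any(|dir| path.starts_with(dir))
}

/// Hypothetical helper: keep only the cache entries that live inside the
/// directories requested for this run. No filesystem access happens here.
fn prefilter_cache_entries(cached: Vec<PathBuf>, included_dirs: &[PathBuf]) -> Vec<PathBuf> {
    cached
        .into_iter()
        .filter(|path| is_under_any(path, included_dirs))
        .collect()
}
```

Any existence check would then run only on the (usually much smaller) filtered list.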
After building from `loading_saving`, it runs in only a few seconds, so I think you have fixed the issue :)
With #1064, loading ~100000 cached results from an SSD (in duplicate files mode), including testing whether all files exist, takes less than a second, so I'm not sure what exactly is/was the problem (I'm talking about the 12 hours of cache processing).
I noticed that czkawka is taking >12 hours on a small directory so I popped open strace to investigate. It looks like despite only referencing one directory with no subdirectories nor hard/soft links inside of it, czkawka is checking that every file in the cache exists:
When I override the HOME environment variable it only takes 2 seconds to run:
Perhaps it would be better to run `statx` on a path only if it is a subpath of one of the defined directories. Additionally, czkawka stats each file in the cache even if `--image_delete_outdated_cache_entries` is false in `~/.config/czkawka/czkawka_gui_config_4.txt`.
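A rough sketch of the combined behaviour suggested here, again in Rust and again not taken from czkawka's source: `retain_usable_entries` and its `delete_outdated` parameter are hypothetical stand-ins for whatever the real loader and setting look like. The existence check is passed in as a closure so that the sketch stays independent of the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical loader step. Entries outside `included_dirs` are dropped by a
/// cheap prefix comparison; the (expensive) existence check runs only on the
/// remaining entries, and only when `delete_outdated` is enabled.
fn retain_usable_entries(
    cached: Vec<PathBuf>,
    included_dirs: &[PathBuf],
    delete_outdated: bool,
    exists: impl Fn(&Path) -> bool,
) -> Vec<PathBuf> {
    cached
        .into_iter()
        // In-memory check first: no syscall for paths outside the scan.
        .filter(|p| included_dirs.iter().any(|d| p.starts_with(d)))
        // Short-circuit: `exists` is never called when the option is off.
        .filter(|p| !delete_outdated || exists(p.as_path()))
        .collect()
}
```

In real use the closure could simply be `|p| p.exists()`, which is where a `statx`-style call would end up happening.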