Optimize large (~500K to 1M+) number of paths #7
Comments
Hey, thanks for filing the note! I've had mirror work with ~80k files (iirc), and 200k is in the ballpark, but both 80k and 200k are getting up there in terms of where mirror can start taking longer (generally/hopefully on the initial sync), using more memory (I typically run with ~2-4gb of RAM iirc for that many files).
That is interesting. I'm not really sure what may cause that, but your guess about the inotify limits being too low on the server side would probably explain that. I'm not sure what the failure mode for "past inotify limit" is; e.g. if the kernel would throw some sort of error that then watchman could report back to mirror that "btw, we're not actually getting notifications for this". Oh, but right, you said watchman itself is working fine. So it must be something within mirror. That is curious. You could try … You could also run …
Yeah, that makes sense. I've thought of potentially building this into mirror, where it assumes 1st level of directories is a large number of projects, some "active" and some "inactive", and it only fully syncs "active" projects (so for the inactive projects, it can avoid both using inotify limits and also JVM heap usage for the tree of paths/mod times). That would add some more bookkeeping internally, which would be doable, but I haven't figured out the best way to determine which projects are active. The easiest thing would probably be to wait until a write happens. But seems like ideally it could also watch for reads, but those don't go through inotify, and would require something like FUSE, to tell when the client is accessing which directories. Anyway, not sure I have anything too helpful; try the debug flags and check the heap and let me know if that works.
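To make the "wait until a write happens" idea a bit more concrete, here is a rough Python sketch of the bookkeeping. It is not how mirror works today; the `ActiveProjects` class and its method names are made up for illustration, and it assumes write events arrive as paths relative to the sync root with `/` separators.

```python
from pathlib import PurePosixPath


class ActiveProjects:
    """Tracks which first-level directories ("projects") have seen a write.

    Hypothetical helper, not part of mirror.
    """

    def __init__(self):
        self._active = set()

    @staticmethod
    def project_of(relative_path):
        # The project is the first path component, e.g. "projA" for
        # "projA/src/Main.java". Assumes a path relative to the sync root.
        parts = PurePosixPath(relative_path).parts
        return parts[0] if parts else None

    def record_write(self, relative_path):
        """Mark the path's project active; return True if newly activated."""
        project = self.project_of(relative_path)
        if project is None or project in self._active:
            return False
        self._active.add(project)
        return True

    def is_active(self, relative_path):
        """Return True if the path's project has already seen a write."""
        return self.project_of(relative_path) in self._active
```

A caller could start a full sync (and full watching) for a project the first time `record_write` returns True, and skip the path tree for everything else. Reads would still go unnoticed, as noted above.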
It's possibly related to #9 (on the server side), which I'd rather look at instead because it's affecting even small folders (so definitely not the inotify limits this time). Thanks for the quick response and detailed writeup! I think having some higher-level management of folders would be really cool in the long-term. I already wrote some small bash scripts around starting up individual mirror processes in the background and creating lockfiles etc. to ensure no clashes - would be great if that kind of stuff could eventually be integrated into the main tool (or I could release a higher-level tool which manages the various mirror processes). I think if you can help me out with #8 then I'd be able to spend some time digging into this + issues like #9 and hopefully work out what's going on.
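For anyone wanting the same lockfile approach, here is a minimal Python sketch of the idea. It is not the bash scripts mentioned above; the function names are illustrative, and `command` is whatever mirror invocation you already use for one project folder (none is assumed here). Unlike the background setup described, this version runs the command in the foreground.

```python
import os
import subprocess


def acquire_lock(lock_path):
    """Atomically create lock_path; return False if it already exists."""
    try:
        # O_CREAT | O_EXCL fails if the file exists, so two callers
        # cannot both think they created the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(str(os.getpid()))  # record the owner for debugging
    return True


def release_lock(lock_path):
    """Remove the lockfile, ignoring the case where it is already gone."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass


def run_exclusive(lock_path, command):
    """Run command (an argument list) unless the lock is already held.

    Returns the command's exit code, or None if the lock was taken.
    """
    if not acquire_lock(lock_path):
        return None
    try:
        return subprocess.run(command).returncode
    finally:
        release_lock(lock_path)
```

One caveat with this design: if the process is killed hard, the lockfile is left behind and has to be cleaned up by hand (the PID written into it helps to check whether the owner is still alive).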
Hi there! I just stumbled upon this project today and it seems like exactly what I need to replace my very slow SSHFS setup, so thank you very much for all your work!
I've been playing around with it and setting things up in my environment, where (similarly to your examples) I have a folder containing various code projects, and each of these code projects contains quite a few files. After seeing things work in each project folder individually, my plan was to effectively 'mount' all of them at once and let mirror handle the syncing. I started off with an initial rsync which pulled in most of the data.
What I'm finding is that for a folder with all my projects (mirror reports that the server has 219717 paths), syncing appears to only work one way: my client can make a change and have it reflected on the server, but not the other way round. If I restart the client or server then things do get back in sync during the initial sync that occurs.
So I'm wondering if this is related to the inotify limits that you mention in the readme. Unfortunately I'm in an environment on the server where I can't change those limits. Interestingly though, watchman itself seems to detect the changes that mirror isn't responding to: I set up a trivial trigger to echo files that are changed, and I see them in the watchman log. I'm unsure if there's a way to access more verbose logs from mirror, so at this point I'm at a bit of a dead end. I took a look at some of the source code but couldn't work out where to start without access to a debugger, and my experience debugging Java code is a little lacking :(
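A quick way to sanity-check the inotify theory without changing any limits is to compare the number of directories under the synced folder with the per-user watch limit. The sketch below assumes Linux, where the limit is readable at `/proc/sys/fs/inotify/max_user_watches`, and assumes roughly one watch per directory (which, as far as I understand, is how inotify-based watchers work). The function names are just for illustration.

```python
import os
from pathlib import Path

# Linux exposes the per-user inotify watch limit here.
INOTIFY_LIMIT_FILE = Path("/proc/sys/fs/inotify/max_user_watches")


def count_directories(root):
    """Count root plus every directory beneath it (symlinks not followed)."""
    # os.walk yields exactly one tuple per directory it visits.
    return sum(1 for _dirpath, _dirnames, _filenames in os.walk(root))


def read_watch_limit(limit_file=INOTIFY_LIMIT_FILE):
    """Return the inotify watch limit, or None if it cannot be read."""
    try:
        return int(Path(limit_file).read_text().strip())
    except (OSError, ValueError):
        return None


def watches_fit(root, limit_file=INOTIFY_LIMIT_FILE):
    """Return True/False if the directory count fits the limit, else None."""
    limit = read_watch_limit(limit_file)
    if limit is None:
        return None
    return count_directories(root) <= limit
```

The limit is shared by everything running as the same user, so a True result here only means this one tree would fit on its own.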
My workaround for now will likely be to spawn individual clients for each of my project folders as required, as that seems to avoid this problem. But if there is a way to have the single code/ folder picked up from one client, it would make managing those processes a little easier for sure.