Feature request: cache large directory listings #184
Could you give more details about how/where you're running this, where your bucket is, what type of bucket it is, and what you see if you run gcsfuse with …
CentOS 7 Google image. All servers and the bucket are in asia-east1. I don't mind the first ls being slow, but the second ls is no faster. In our system, once a file is created it is not deleted for months.

WARNING: gcsfuse invoked as root. This will cause all files to be owned by
Opening bucket...
I assume you're doing …
Just repeated 'ls' on the root of the bucket. Interestingly, most of the slowness occurs in directories with no files, just subdirectories. This particular root directory has about 200 subdirectories. I cannot be sure what other programs will execute; I would assume proftpd may do an ls -l behind the scenes for a directory listing.
From what I can tell, proftpd is doing the equivalent of ls -l on each file in a directory individually, one by one, resulting in very bad performance. I will attempt to work around that if I can. Regardless, it still seems as though nothing ever gets cached; it's as if the stat cache setting is totally ignored.
Assuming the debug output above covers more than one … What evidence makes you think that proftpd is at fault here? An alternate explanation: the number of …
I really shouldn't have mentioned proftpd; I used it as a reference for how unusable the nonexistent caching makes things. I'm fine with the first ls being slow; that's to be expected. The debug output is a simple 'ls' repeated on the same directory. Yes, I can see that the cache is being recorded by the program, but what's the point of the cached 'ls' being just as slow on every single hit, no faster than an uncached 'ls'?
Here we can see that the access times are exactly the same for the non-cached and cached ls:

############ NON-CACHED ls (first go) ############

############ CACHED ls, one second after the first ls ############
Again, you can see that fewer requests are made in the second case. The wall clock time is the same only because your listing is so huge: the stat calls are performed in parallel with the follow-up list requests, and because you've got so many follow-up requests to make, the stat calls complete long before you're done listing. In other cases (such as a small listing, or opening a file) the stat cache makes a big wall-time difference. Unfortunately I can't help you with large listings being slow in GCS; that's just how the API works (and the backend implementation too, I suspect). The best that could be done here is caching the listing itself, but that is problematic:
So at the very least I think such a cache would need to be optional and off by default. I haven't heard any other complaints about this, so I'm not sure it's worth the effort (and the risk of a rarely-used feature that's likely to rot).
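To make the trade-off concrete, here is a minimal sketch (hypothetical, not gcsfuse code) of the kind of optional TTL-bounded listing cache being discussed. The class and names are invented for illustration; the point is that a stale listing can be served for up to `ttl` seconds, which is exactly the consistency risk that argues for keeping it off by default.

```python
import time


class ListingCache:
    """Hypothetical TTL-bounded cache for directory listings.

    An entry expires `ttl` seconds after it was stored, so a stale
    listing is served for at most that long.
    """

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # dir path -> (timestamp, listing)

    def get(self, path, now=None):
        """Return the cached listing, or None if absent or expired."""
        now = time.monotonic() if now is None else now
        entry = self._entries.get(path)
        if entry is None:
            return None
        stamp, listing = entry
        if now - stamp > self.ttl:
            del self._entries[path]  # expired; force a fresh list request
            return None
        return listing

    def put(self, path, listing, now=None):
        """Store a freshly fetched listing with the current timestamp."""
        now = time.monotonic() if now is None else now
        self._entries[path] = (now, listing)


cache = ListingCache(ttl_seconds=300)  # 5m, matching --stat-cache-ttl 5m
cache.put("/bucket/root", ["a/", "b/", "file.txt"], now=0.0)
print(cache.get("/bucket/root", now=1.0))    # fresh hit: the cached listing
print(cache.get("/bucket/root", now=301.0))  # past the TTL: None
```

A second `ls` within the TTL would be answered from memory with no GCS requests at all, but any object created or deleted in that window would be invisible until the entry expires.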
jacobsa, a few key points I take issue with: 100 to 200 folders is not really that big a directory.
Options are great when given. And no, most people will not complain; they will just give up and move on. Thanks for your help.
I didn't intend to say this will never happen; just that it's not a clear win. Thank you for your report, which is a data point in its favor.
Currently at 204,059; I would hope to go to at least 500,000.
Problem solved: s3fs seems to operate in the manner I require, i.e. directories are cached enough so as not to cause timeouts in the programs using them.
+1
Some more words: one of the most widely used pieces of web software, WordPress, does mkdir -p everywhere. Any alternatives you see, or things I have missed?
Any plans on touching this, @jacobsa?
No, I still think that this is not a particularly obvious win, due to the issues I mentioned earlier.
There are two issues which need to be addressed here:

A simple 'ls' listing of a directory can take 30 seconds or more to complete.
Most programs give up before the directory listing is available, such as an FTP server listing a folder in the bucket.
I am currently using these mount options:
--stat-cache-ttl 5m
--type-cache-ttl 5m
They seem to have zero effect; i.e. an 'ls' of the same directory in quick succession results in the same 30-second wait time.
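For reference, a typical invocation with those flags might look like the following; the bucket name and mountpoint are placeholders, not values from this report:

```shell
# Hypothetical example: my-bucket and /mnt/gcs are placeholder names.
gcsfuse --stat-cache-ttl 5m --type-cache-ttl 5m my-bucket /mnt/gcs
```

Note that these flags control the stat and type caches (per-object metadata), not a cache of directory listings themselves, which is the feature this issue requests.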
All subdirectories seem equally slow regardless of content; listing empty folders is slightly quicker.
gcsfuse was installed from the Google yum repo.