
[Bug]: Mac OS VFS: when fileproviderextdatabase.realm becomes large, Finder becomes unresponsive #7326

Open
marcotrevisan opened this issue Oct 16, 2024 · 8 comments

Comments

@marcotrevisan

Bug description

The FileProviderExt database at ~/Library/Group Containers/com.nextcloud.desktopclient/FileProviderExt/Database/fileproviderextdatabase.realm seems to grow very large.
The size itself may be normal, but it comes with steady 100% CPU usage (one core) by the process, so either there is a "leak" that makes the database grow too big, or the queries need more indexes / optimization.
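For reference, a minimal sketch of what an indexed lookup could look like on the Realm side; the ItemMetadata model and its fields here are illustrative assumptions, not the client's actual schema. In Realm Swift, marking a property as indexed speeds up equality-style queries on it at the cost of a slightly larger file and slower writes:

```swift
import RealmSwift

// Illustrative model only; the real FileProviderExt schema may differ.
class ItemMetadata: Object {
    @Persisted(primaryKey: true) var ocId: String    // hypothetical identifier field
    @Persisted(indexed: true) var serverUrl: String  // index speeds up per-folder lookups
    @Persisted var fileName: String
    @Persisted var isMaterialized: Bool              // hypothetical flag
    @Persisted var lastUsedDate: Date                // hypothetical, useful for cache-style pruning
}

// A per-folder enumeration query that benefits from the index on serverUrl.
func itemsInFolder(_ realm: Realm, folderUrl: String) -> Results<ItemMetadata> {
    realm.objects(ItemMetadata.self).where { $0.serverUrl == folderUrl }
}
```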

Steps to reproduce

In our organization, Finder (especially with client 3.14.1) becomes very slow in the NC folders soon after the first installation and configuration of the client: sometimes it takes a minute to open a folder or a file, always accompanied by steadily high (100% of one core) CPU usage by the FileProviderExt process.
Among the files opened by the process, I found ~/Library/Group Containers/com.nextcloud.desktopclient/FileProviderExt/Database/fileproviderextdatabase.realm, which had grown quite large (over 100 MB).

So I tried this:

  1. Disable Virtual File Sync in settings
  2. Quit the client
  3. Clear the caches in ~/Library
  4. Remove the whole ~/Library/Group Containers/com.nextcloud.desktopclient folder
  5. Reboot macOS
  6. Re-Enable Virtual File Sync in settings

This made Finder very responsive again, and the FileProviderExt process no longer eats up 100% CPU. Unfortunately, I expect the database to grow large again soon.
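For completeness, a minimal sketch of the destructive part of the reset (step 4), assuming the paths reported in this issue; it should only run after quitting the client and disabling Virtual File Sync, since it wipes the extension's local state:

```swift
import Foundation

// Removes FileProviderExt's group container so its database is rebuilt from
// scratch. Destructive: run only with the client quit and Virtual File Sync
// disabled. The path is the one reported in this issue.
let fm = FileManager.default
let container = fm.homeDirectoryForCurrentUser
    .appendingPathComponent("Library/Group Containers/com.nextcloud.desktopclient")

do {
    if fm.fileExists(atPath: container.path) {
        try fm.removeItem(at: container)
        print("Removed \(container.path)")
    }
} catch {
    print("Could not remove container: \(error)")
}
```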

Expected behavior

I'd expect a much smaller database to be kept locally, or, if that is not possible, queries that do not suffer as much from the database size.
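Related to the size aspect: Realm files do not shrink on their own when records are deleted, so even after a cleanup the .realm file keeps its size unless it is compacted. A hedged sketch of opting into compaction with Realm Swift (the thresholds are illustrative, not anything the client actually does today):

```swift
import RealmSwift

// Open the database with compaction enabled. Realm compacts the file on launch
// when the closure returns true; the 50 MB / 50% thresholds are illustrative.
func openCompactedRealm(at fileURL: URL) throws -> Realm {
    var config = Realm.Configuration(fileURL: fileURL)
    config.shouldCompactOnLaunch = { totalBytes, usedBytes in
        let fiftyMB = 50 * 1024 * 1024
        return totalBytes > fiftyMB && Double(usedBytes) / Double(totalBytes) < 0.5
    }
    return try Realm(configuration: config)
}
```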

Which files are affected by this bug

The NC local share

Operating system

macOS

Which version of the operating system you are running.

Sonoma 14.x and Sequoia 15.0.1

Package

Official macOS 12+ universal pkg

Nextcloud Server version

29.0.8

Nextcloud Desktop Client version

3.14.1

Is this bug present after an update or on a fresh install?

Updated to a major version (ex. 3.3.6 to 3.4.0)

Are you using the Nextcloud Server Encryption module?

Encryption is Disabled

Are you using an external user-backend?

  • Default internal user-backend
  • LDAP/ Active Directory
  • SSO - SAML
  • Other

Nextcloud Server logs

No response

Additional info

No response

@marcotrevisan
Author

@claucambra I can send you a zip of my local database if you like.

@budachst

I am experiencing the same issue. I don't know what is stored in that database, but the trouble seems to start as soon as I begin browsing folders on my Mac, which then change their icons from "not loaded" to "waiting for upload". As far as I can tell, the state doesn't change afterwards.

Maybe a spindump of FileProviderExt would be a better thing to upload. I also regard this as a serious issue.

@marcotrevisan
Author

marcotrevisan commented Oct 16, 2024

I've just installed Realm Studio and opened my ~100MB database.

  • ItemMetadata counts 111k entries. At first glance (I'm no RQL expert at all) I can't see duplicate rows; rather, the data seems to span a large part of the file share, including items I don't work on and presumably never will.
  • LocalFileMetadata counts 61 entries; I guess these are the materialised files I am actually working on.

I would like to share a thought about ItemMetadata (in case the current queries are considered fit for purpose and no bug is found): if the purpose of ItemMetadata is to avoid redundant IO, I'm not sure it is helpful to keep track of every item processed since the beginning of the client's lifetime, because at some point that becomes slower than starting the database from scratch. I would consider using it more as a cache, i.e. keeping only the items "touched" in the last X days and regularly removing the older ones, except for materialized items, to keep the count of ItemMetadata entries as small and useful as possible (a rough sketch follows below).
The filesystem can change while the client is offline, so ItemMetadata can accumulate large amounts of outdated entries. The cost of storing them can therefore be higher than re-discovering them, especially for folders that have not been "touched" for a long time.
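To make that concrete, a rough sketch of such a pruning pass, assuming hypothetical lastUsedDate and isMaterialized properties on ItemMetadata (as in the illustrative model earlier in this issue); the real schema may well differ:

```swift
import RealmSwift

// Cache-style cleanup: drop ItemMetadata rows that are not materialised and
// have not been touched for `days` days. Property names are assumptions.
func pruneStaleMetadata(in realm: Realm, olderThanDays days: Int = 30) throws {
    let cutoff = Calendar.current.date(byAdding: .day, value: -days, to: Date())!
    let stale = realm.objects(ItemMetadata.self)
        .where { $0.isMaterialized == false && $0.lastUsedDate < cutoff }
    try realm.write {
        realm.delete(stale)
    }
}
```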

Thanks !

@budachst

Did the same after you mentioned it. My 67 MB database has approx. 98k metadata entries, although I only browsed through some folders in Finder and didn't even open more than one file. I wonder how big this DB would get for an account with multiple groupfolders, each containing a high number of files/folders.

I also noticed that while FileProviderExt was working, it generated quite a large amount of download traffic from the NC server.

@budachst

budachst commented Oct 16, 2024

So… I just wanted to see how this works, so I decided to wipe FileProviderExt and start over with the quick synchronization feature disabled. I guessed that the algorithm was indiscriminately synchronizing perhaps 10% of the account's file metadata, and I was curious what would happen if I disabled that. It turned out that this time FileProviderExt tried to synchronize all files' metadata and then silently crashed somewhere along the way…

There was no notification; I only noticed the crash because I was monitoring the FileProviderExt process itself…

Also… the Realm DB now contains approx. 45k entries and has grown to 27 MB, which works out to roughly 600 bytes per entry. How big is this DB supposed to grow? As stated earlier, we plan to make heavy use of groupfolders, each holding thousands of files. Either the DB format needs to be optimized a lot, or this will probably not work for larger environments, which would be a real problem for us.
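For what it's worth, a quick back-of-envelope check of those numbers (27 MB, ~45k entries), plus a rough extrapolation to a large deployment; these are only estimates based on the figures reported above:

```swift
// Back-of-envelope estimate from the reported figures (illustrative only).
let databaseBytes = 27.0 * 1_000_000           // ~27 MB reported
let entryCount = 45_000.0                      // ~45k ItemMetadata entries reported
let bytesPerEntry = databaseBytes / entryCount // ≈ 600 bytes per entry

// Rough extrapolation to an account with a million items across groupfolders:
let projectedMB = 1_000_000 * bytesPerEntry / 1_000_000   // ≈ 600 MB
print(bytesPerEntry, projectedMB)
```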

@marcotrevisan
Author

marcotrevisan commented Oct 16, 2024

My preferred strategy would be to keep as little knowledge locally as possible; this also gives stronger guarantees of correctness.
Ideally, only materialized items matter, and even those could be wiped after X days of inactivity.
My 2 cents.

Regards!

@budachst

This is surely done to work around the performance shortcomings of WebDAV, but to me it also looks too aggressive. There should at least be an option to cap the allowed database size, or, better yet, a choice between different strategies for handling this metadata.

Also, there seems to be a bug in FileProviderExt that causes it to loop when it enters a folder with lots of files… I just tried to open a folder containing 3583 files; the folder contents are never displayed, while the corresponding FileProviderExt process stays busy at 99% CPU. I verified this on two clients and took a spindump on one, which I have attached as well.

Spindump_FilesProviderExt.zip

@marcotrevisan
Author

Thanks for your last message @budachst; from a user's perspective it seems you have pinned down the possible root cause of the performance issue more precisely. In our shared file structure I don't think we have a folder with that many files at a single level, but we certainly reach the order of a hundred in many folders.

This may explain why resetting the database gives an immediate benefit (i.e. the provider won't sync those folders until they are discovered again).
