Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.0.3] Waiting for sync forever - not syncing [E2EE app bug] #2347

Closed
MegaV0lt opened this issue Aug 31, 2020 · 44 comments · Fixed by #2572
Closed

[3.0.3] Waiting for sync forever - not syncing [E2EE app bug] #2347

MegaV0lt opened this issue Aug 31, 2020 · 44 comments · Fixed by #2572

Comments

@MegaV0lt
Copy link

How to use GitHub

  • Please use the 👍 reaction to show that you are affected by the same issue.
  • Please don't comment if you have no relevant information to add. It's just extra noise for everyone subscribed to this issue.
  • Subscribe to receive notifications on status change and new comments.

Expected behaviour

Sync shortly after start

Actual behaviour

Tell us what happens instead

Sync folders are waitiung for starting sync. nothing happens

image

Steps to reproduce

  1. start nc client
  2. open nsettings

Client configuration

Client version: 3.0.1

Operating system: Linux Mint 20 Cinnamon

OS language: German

Qt version used by client package (Linux only, see also Settings dialog):

Client package (From Nextcloud or distro) (Linux only):

Installation path of client:

Server configuration

Nextcloud version: 19

Storage backend (external storage):

Logs

Please use Gist (https://gist.github.com/) or a similar code paster for longer
logs.

  1. Client logfile: https://www.dropbox.com/s/cphwg6cwzaktfoy/nc_log.txt?dl=0

Output of nextcloud --logdebug --logwindow or nextcloud --logdebug --logfile log.txt
(On Windows using cmd.exe, you might need to first cd into the Nextcloud directory)
(See also https://docs.nextcloud.com/desktop/2.3/troubleshooting.html#log-files)

  1. Web server error log:

  2. Server logfile: nextcloud log (data/nextcloud.log):

@er-vin
Copy link
Member

er-vin commented Aug 31, 2020

I thought this could be a duplicate of #2254 but the logs don't show that "do not enqueue" pattern. What's odd is that from the logs I'd say it's syncing mostly fine.

It complains about a symlink and a file which keeps changing all the time but apart from that proceed. Of course there are a few ignored files but they don't seem to block the sync either.

The most likely issue is the "504 Gateway Timeout" which appear several times. Those seem to indicate something server side though, that might indeed make the sync fail in such cases but nothing we can do on the client side then.

@MegaV0lt
Copy link
Author

MegaV0lt commented Aug 31, 2020

In the meantime i got a synced state, but takes very long to check/sync. I am local here on my homenetwork.
Edit. Later changes to lokan or remote files do not get synced at all

@er-vin
Copy link
Member

er-vin commented Aug 31, 2020

Yeah I guess that's related to those timeouts I'm seeing. There's apparently something going on with PROPFIND calls, they take forever to complete and sometimes they don't complete at all.

@bovender
Copy link

I have a very similar issue with a Nextcloud server v. 19.0.2 built from the official Docker image and a desktop client v. 3.0.1 running on KDE neon v. 5.19.

One thing that I noticed is high CPU load in the server container that remains high even after I terminated the desktop client. When I rebooted the whole server, the CPU was mostly idle (I have ~ 20 Docker containers with various services on this machine, but only few human users). However, as soon as I started the desktop client again, CPU load on the server was back up. The server logs (Nextcloud, Postgres, Redis, OpenLDAP) did not contain anything that seemed remotely useful.

This is the current output of a htop command that was issued inside the Nextcloud container:

image

It strikes me that I can trigger this behavior on the server by starting the desktop client. Hopefully this information helps to identify the issue.

@MegaV0lt
Copy link
Author

MegaV0lt commented Sep 2, 2020

I checked. Here too higher load. Only one desktop client is connected: MySQL only with NC you can see when Desktop was running:
grafik

@bovender
Copy link

bovender commented Sep 3, 2020

I watched htop and the Nextcloud Docker container logs simultaneously for a while.

When I SIGKILL the one Apache process that uses 50 % of CPU time, the Desktop client starts synchronizing data (several PUT requests in the log).

Then nothing happens for a few seconds.

Then a PROPFIND request appears in the logs, and CPU usage jumps up to 50 % again.

So as already noted by @er-vin, evidently something is foul about those PROPFIND requests.

@MegaV0lt
Copy link
Author

MegaV0lt commented Sep 8, 2020

The Client is unusable. Short after start it gets exclamation and is not syncing any changes
grafik

@ulugris
Copy link

ulugris commented Sep 9, 2020

I experienced the same problem as @bovender after updating my Windows client from 2.6.5 to 3.0.1. Everytime I launched the client, synchronization got stuck, and the server CPU usage increased abnormally, remaining high after the client was closed.

Our Nextcloud server configuration is different, though: it runs on Apache with PHP-FPM. So in my case the high CPU usage was not caused by 'apache2', but by a bunch of 'php-fpm' and 'smbclient' processes (we have an external SMB storage, although it is excluded from client sync). Restarting the 'php7.4-fpm' service got the CPU load back to normal.

Reverting to 2.6.5 made everything work as expected again, so I will keep this version until the problem gets fixed.

@bovender
Copy link

bovender commented Sep 9, 2020

I can confirm that reverting the desktop client to a 2.x version fixes the problem. The client synced immediately, and CPU usage on the server dropped below 1% and does not spike again.


On my KDE neon system, I had to remove nextcloud-client, nextcloud-desktop and libcloudproviders0 and then install the client as well as the dependencies with explicit version tagging, e.g.

sudo apt install libcloudproviders0=2.6.2-1build1 nextcloud-desktop=2.6.2-1build1 dolphin-nextcloud=2.6.2-1build1

(Edit: In order to find out which versions are available on a given Debian-derived system, issue apt-policy cache nextcloud-desktop.)

@eibex
Copy link

eibex commented Sep 11, 2020

I am experiencing the same issue. Installed E2EE today and the Nextcloud Windows client became incredibly slow, coupled with high CPU usage on the server when the client is "Waiting...".

The desktop client also cannot encrypt folders (empty or not) as it always says to wait for the folder to sync before encrypting it. It manages to download encrypted folders from other clients just fine though.

The client was working fine before enabling E2EE. I am not having any issues with the iOS client, so I doubt this is a server-side problem. The server logs do not show anything.

EDIT: I'm on 3.0.1 and 19.0.3
EDIT2: As mentioned in https://github.com/nextcloud/end_to_end_encryption/issues/191, PHP-FPM and the database (PostgreSQL in my case) start using 100% of the CPU until the Nextcloud desktop sync is over (over 2 minutes even if no files were changed).

@kruthoff
Copy link

kruthoff commented Sep 26, 2020

I had the same issues after client version 2.6.3. Sync was really slow, stuck, broke, ... It was just unusable for me. A change in 2.6.4 made me curious: #1823 See changelog.

And boom - setting the environment variable OWNCLOUD_HTTP2_ENABLED=1 changed the sync client to work as expected. Fast, reliable and without a single hickup! Even 3.0.2 works well with this setting. I didn't look into #1823 yet, but I'm happy that my sync works again. It works with HTTP/2.0 but doesn't with HTTP/1.1.

@bovender
Copy link

Hm, unfortunately that did not work for me...

@aluaces
Copy link

aluaces commented Sep 28, 2020 via email

@eibex
Copy link

eibex commented Sep 28, 2020

I believe it is related to E2EE as it pegs the CPU to 100% with database related work (and it didn't do so before I enabled it).

However, I don't think it's the server module itself as those scans do not start with the iOS app. They are only started with the desktop client.

EDIT: This issue still exists on version 3.0.2

@tacruc
Copy link

tacruc commented Oct 2, 2020

I think I had the same issue, at least the same behavior, disabling e2e encryption app on the server and restating the client makes it sync fast again.
Could it be related to: nextcloud/end_to_end_encryption#189

@q-wertz
Copy link
Contributor

q-wertz commented Oct 2, 2020

I'm connected to a Nextcloud Server 19 instance and to a 18 one. For me the problems only seem to appear on the newer one. Old one seems to be running fine...

@bovender
Copy link

bovender commented Oct 2, 2020

@tacruc I also disabled the E2EE app, upgraded the desktop client to 3.0.2, and now the server CPU is idling... Thanks!

edit: typo

@MegaV0lt MegaV0lt changed the title [3.0.1] Waiting for sync forever - not syncing [3.0.1] Waiting for sync forever - not syncing [E2EE app bug] Oct 2, 2020
@tacruc tacruc changed the title [3.0.1] Waiting for sync forever - not syncing [E2EE app bug] [3.0.2] Waiting for sync forever - not syncing [E2EE app bug] Oct 3, 2020
@bcutter
Copy link

bcutter commented Oct 3, 2020

I can confirm this issue is related to E2EE but unfortunately there´s absolutely no progress on that side for weeks. Promoting it as "production ready" and leaving users with stressed CPUs isn´t the "usual" understanding of production ready.

Anyway, this issue still persists with latest desktop client v3.0.2 and current Nextcloud v19.0.3.1.
See https://github.com/nextcloud/end_to_end_encryption/issues/191 or nextcloud/end_to_end_encryption#189

@bcutter
Copy link

bcutter commented Oct 3, 2020

I had the same issues after client version 2.6.3. Sync was really slow, stuck, broke, ... It was just unusable for me. A change in 2.6.4 made me curious: #1823 See changelog.

And boom - setting the environment variable OWNCLOUD_HTTP2_ENABLED=1 changed the sync client to work as expected. Fast, reliable and without a single hickup! Even 3.0.2 works well with this setting. I didn't look into #1823 yet, but I'm happy that my sync works again. It works with HTTP/2.0 but doesn't with HTTP/1.1.

@kruthoff: Can you explain where you set this env variable?
Server side in config.php, Apache/nginx, ...? Client side where exactly?

@eibex
Copy link

eibex commented Oct 3, 2020

I don't think the issue is on the E2EE module itself or the issue would be replicated with other clients. iOS doesn't have this issue. It's limited to the desktop client when E2EE is enabled.

@kruthoff
Copy link

kruthoff commented Oct 3, 2020

I had the same issues after client version 2.6.3. Sync was really slow, stuck, broke, ... It was just unusable for me. A change in 2.6.4 made me curious: #1823 See changelog.
And boom - setting the environment variable OWNCLOUD_HTTP2_ENABLED=1 changed the sync client to work as expected. Fast, reliable and without a single hickup! Even 3.0.2 works well with this setting. I didn't look into #1823 yet, but I'm happy that my sync works again. It works with HTTP/2.0 but doesn't with HTTP/1.1.

@kruthoff: Can you explain where you set this env variable?
Server side in config.php, Apache/nginx, ...? Client side where exactly?

I just set this env variable when I start the client. This fixed my unreliable sync issues. It is not related to the slow E2E issue.

@er-vin
Copy link
Member

er-vin commented Oct 5, 2020

I don't think the issue is on the E2EE module itself or the issue would be replicated with other clients. iOS doesn't have this issue. It's limited to the desktop client when E2EE is enabled.

I understand why you would make such an assumption. And at the same time it could be a wrong conclusion to draw. The thing is the mobile clients use much less of the DAV API than the desktop client (because in one case you're in a "closed world" as seen through the app while in the other case you're in an "open world" seen through the file manager outside of the application).

In particular I still suspect the PROPFIND call, we need to make one with a deeper depth than the mobile clients and it seems that for some reason it's generating a large server side CPU consumption during a SQL call.

@bcutter
Copy link

bcutter commented Oct 16, 2020

I still have no idea what the hell is going on when syncing. PHP-FPM and MYSQLD are freaking out. I see plenty of SELECT statements on nc_filecache with left joining nc_filecache_extended looking for a certain path_hash in a specific storage.

I have no idea if that's normal during sync or really E2EE related.

Guys how can we make progress here. Does nobody care cause nobody uses E2EE?

@as-com
Copy link

as-com commented Oct 16, 2020

See here: #2552

@bcutter
Copy link

bcutter commented Oct 16, 2020

See here: #2552

Well that one was closed as duplicate of this one here. Is it a endless circle thing game or...?

@epikurist
Copy link

Just to be clear #2552 specifically indentified a PROPFIND call with infinite depth as the issues of this bug:

PROPFIND https://[redacted]/remote.php/webdav HTTP/1.1
Host: [redacted]
OCS-APIREQUEST: true
Depth: infinity
Authorization: Basic [redacted]
User-Agent: Mozilla/5.0 (Windows) mirall/3.0.1stable-Win64 (build 20200828) (Nextcloud)
Accept: */*
Content-Type: text/xml; charset=utf-8
X-Request-ID: cf545295-140b-4634-b90d-cb93dfb9fbb0
Cookie: oc_sessionPassphrase=[redacted]; __Host-nc_sameSiteCookielax=true; __Host-nc_sameSiteCookiestrict=true; ocnwrqh103sw=[redacted]
Content-Length: 114
Connection: Keep-Alive
Accept-Encoding: gzip, deflate
Accept-Language: en-US,*

<d:propfind xmlns:d="DAV:"> <d:prop xmlns:nc="http://nextcloud.org/ns"> <nc:is-encrypted/> </d:prop> </d:propfind>

@tacruc
Copy link

tacruc commented Nov 2, 2020

I was quite exited about 3.0.3 as I hoped it would solve this. Unfortunately it didn't.
I'm not shure which information to post.
I have two doubts with the client log. First I found my End2End memonic in it. Second I have a second account in the clients and the log is a mix out of both accounts.

@er-vin
Copy link
Member

er-vin commented Nov 3, 2020

So we got two type of people regarding the logs: the ones which edit them to hide too sensitive information. And the ones who provide them as is via a secure file drop to us.

Actually we got a third type of people who post as is right away, obviously I advise against it.

@alex6480
Copy link

alex6480 commented Dec 3, 2020

How do I post logs with secure drop? My logs are littered with sensitive data so it would be impossile to remove it all.

I'm getting this issue on the newest server 20.2 and the newest client 3.0.3.
I can see in the log that there are timeouts, but I do not know what is causing them. In the log it says it's trying to propfind every single folder I have access to

2020-12-03 23:30:09:898 [ info nextcloud.sync.accessmanager ]:	6 "PROPFIND" "" has X-Request-ID ""
2020-12-03 23:30:09:497 [ info nextcloud.sync.networkjob ]:	OCC::GetFolderEncryptStatusJob created for "" + "remote.php/webdav" "OCC::ClientSideEncryption"

After exactly one minute it stops and Nextcloud just says waiting.
It says the following in the log when it timeouts and stops.

2020-12-03 23:51:52:055 [ warning nextcloud.sync.networkjob ]:	Network job timeout QUrl("https://example.com/remote.php/dav/files/user/")
2020-12-03 23:51:52:055 [ info nextcloud.sync.credentials.webflow ]:	request finished
2020-12-03 23:51:52:055 [ warning nextcloud.sync.networkjob ]:	QNetworkReply::OperationCanceledError "Connection timed out" QVariant(Invalid)
2020-12-03 23:51:52:055 [ warning nextcloud.sync.credentials.webflow ]:	QNetworkReply::OperationCanceledError
2020-12-03 23:51:52:055 [ warning nextcloud.sync.credentials.webflow ]:	"Operation canceled"
2020-12-03 23:51:52:055 [ info nextcloud.sync.networkjob.propfind ]:	PROPFIND of QUrl("") FINISHED WITH STATUS "OperationCanceledError Connection timed out"
2020-12-03 23:51:52:055 [ warning nextcloud.sync.networkjob.propfind ]:	*not* successful, http result code is 0 ""
2020-12-03 23:51:52:055 [ warning nextcloud.sync.credentials.webflow ]:	QNetworkReply::OperationCanceledError
2020-12-03 23:51:52:055 [ warning nextcloud.sync.credentials.webflow ]:	"Operation canceled"
2020-12-03 23:51:52:072 [ warning default ]:	QIODevice::read (QNetworkReplyHttpImpl): device not open
2020-12-03 23:51:52:073 [ info nextcloud.gui.account.state ]:	AccountState connection status change:  OCC::ConnectionValidator::Connected -> OCC::ConnectionValidator::Timeout
2020-12-03 23:51:52:075 [ info nextcloud.gui.account.state ]:	AccountState state change:  "Connected" -> "Network error"
2020-12-03 23:51:52:076 [ info nextcloud.gui.socketapi ]:	Sending SocketAPI message --> "UNREGISTER_PATH:" to QLocalSocket(0x12600942af0)
2020-12-03 23:51:52:129 [ info nextcloud.gui.socketapi ]:	Sending SocketAPI message --> "UNREGISTER_PATH:" to QLocalSocket(0x12600942f50)
2020-12-03 23:51:52:192 [ debug nextcloud.gui.account.settings ]	[ OCC::AccountSettings::showConnectionLabel ]:	"No connection to Nextcloud at <a href=\"\"></a>.\nConnection timed out"

billede

Something weird is that my server timeout is set to 300 s. I even tested it by adding sleep(1000000) to the top of index.php and I timed it to 300 seconds before I got a timeout.

I can provide full logs, if you tell me how I can do it without sharing them publicly.

@bcutter
Copy link

bcutter commented Dec 4, 2020

@alex6480 Here You can find instructions regarding a secure private file drop option:

#1269 (comment)

I really appreciate your work providing logs. Hopefully @er-vin can have a look at it and find the root cause.

@alex6480
Copy link

alex6480 commented Dec 4, 2020

I have now provided server and client logs using the file drop.

@bcutter
Copy link

bcutter commented Dec 5, 2020

So let's wait, hope and pray... 😃

@MegaV0lt MegaV0lt changed the title [3.0.2] Waiting for sync forever - not syncing [E2EE app bug] [3.0.3] Waiting for sync forever - not syncing [E2EE app bug] Dec 6, 2020
@MegaV0lt
Copy link
Author

MegaV0lt commented Dec 6, 2020

Bug still existst. Have to disable e2eeshit

@bcutter
Copy link

bcutter commented Dec 6, 2020

Bug still existst. Have to disable e2eeshit

Right. Can someone reopen this issue? It's not fixed/closed.

@er-vin
Copy link
Member

er-vin commented Dec 7, 2020

Bug still existst. Have to disable e2eeshit

Right. Can someone reopen this issue? It's not fixed/closed.

Got superseded by #2593 at that point.

As for logs no need to anymore, we got everything we need and have a pretty good understanding of what happens now. Fix will take time to deliver though, I have a branch getting there but won't land for 3.2.0 only.

@tacruc
Copy link

tacruc commented Dec 16, 2020

Just as an update: Still a problem on 3.1.0 and NC 20.0.4.

@bcutter
Copy link

bcutter commented Dec 16, 2020

Just as an update: Still a problem on 3.1.0 and NC 20.0.4.

Of course it is - and will be for every new version constellation, cause nothing happened so far. Unfortunately I can't support because of lacking dev skills :-(

@davidhildenbrand
Copy link

Also running into this issue with Linux and MacOS clients. Took me 2 hours to figure out that the e2e app resulted in this issue after an upgrade to nc20 from nc19. e2e seems to be unusable in this constellation.

@bcutter
Copy link

bcutter commented Dec 30, 2020

This one is still closed, see #2593

Damn it's so frustrating.

@tinel-c
Copy link

tinel-c commented Jan 11, 2021

Same issue here.

Nextcloud 20.0.4
Nexcloud windows 10 client 3.1.1
End-to-End Encription app 1.6.2

Disabling the e2e app and restarting the client application worked. The client stopped waiting and started the syncronization.

@nshtg
Copy link

nshtg commented Feb 2, 2021

This issue is still present. Disabling E2E fixes the problem.

Nextcloud 20.0.6, Linux and macOS Clients on 3.1.2.

@bcutter
Copy link

bcutter commented Feb 2, 2021

This one is still closed, see #2593. Only few weeks left until fix will make it to the stable desktop client...

@Dzeri96
Copy link

Dzeri96 commented Aug 9, 2021

Still an issue on Nextcloud 21.0.3 and linux client 3.1.1. I do have E2EE enabled.

@eibex
Copy link

eibex commented Aug 9, 2021

@Dzeri96 it was fixed in 3.2.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.