File-Event propagation stopping (worse in current stable/edge) #2417

Closed
EugenMayer opened this issue Jan 12, 2018 · 53 comments

@EugenMayer

I maintain http://docker-sync.io, where we try to work around some of the side effects of host mounts in d4m (Docker on OSX in general), specifically the performance.

We offer different strategies (rsync, unison) and, for about a year now, the OSXFS-based "native_osx" strategy. It relies on the FS events propagated by d4m; please see the chart here: https://github.com/EugenMayer/docker-sync/wiki/8.-Strategies#native_osx-osx

a) It is known that propagation stops working after some time, even before mac42: EugenMayer/docker-sync#410. The only way we can fix that is by restarting d4m completely. Are there any better ways to do that?

b) (the actual reason for this issue, but related to a) Since 42+ (42 itself works well) things have gotten a lot worse: FS event propagation stops after just a few operations or after some time, see EugenMayer/docker-sync#517

People are downgrading to mac42 (EugenMayer/docker-sync#517 (comment)) and seem to get far better results.

Referring to your statement:

Volumes shared from the host are mounted remotely using a protocol that is quite similar to NFS, extended with support for transporting filesystem events. 

a) Can we "restart" that service when it gets stuck, without restarting all containers?
b) Can we fix FS event propagation?
c) Can we somehow detect when FS propagation is stuck (to trigger a)?

Thanks
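
One possible way to approach question c) is a canary file: something on the host rewrites a small file in the shared directory at intervals, and a process in the container compares what `stat()` reports with what its file watcher has delivered. The Python sketch below shows only that comparison logic. It is not part of docker-sync or d4m; the `PropagationMonitor` class, its method names, and the 5-second grace period are hypothetical, and it assumes that `stat()` on the mount still reflects host changes after events have stopped.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropagationMonitor:
    """Hypothetical helper: flags a canary change that never produced an event."""

    grace_seconds: float = 5.0             # assumed tolerance, tune as needed
    acked_mtime: Optional[float] = None    # mtime seen when the last event arrived
    pending_mtime: Optional[float] = None  # newer mtime seen by polling, no event yet
    pending_since: float = 0.0             # local time when that mtime was first seen

    def record_event(self, observed_mtime: float) -> None:
        # Call this from the watcher callback (inotify-based, for example),
        # passing the canary's mtime at the moment the event arrived.
        self.acked_mtime = observed_mtime
        self.pending_mtime = None

    def poll(self, observed_mtime: float, now: float) -> bool:
        # Returns True when the canary changed on disk but no event
        # arrived within the grace period.
        if observed_mtime == self.acked_mtime:
            self.pending_mtime = None
            return False
        if observed_mtime != self.pending_mtime:
            # First time this change is seen: start the timer.
            self.pending_mtime = observed_mtime
            self.pending_since = now
            return False
        return (now - self.pending_since) > self.grace_seconds


def canary_mtime(path: str) -> float:
    # stat() reads through the mount directly and does not depend on events.
    return os.stat(path).st_mtime
```

In use, the container-side process would call `poll(canary_mtime(path), time.time())` on a timer and treat a `True` result as the signal to trigger a). Only the container's own clock is compared against itself, so clock drift between host and VM does not affect the result.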

@djs55
Contributor

djs55 commented Jan 12, 2018

Thanks for the ticket -- I'm hoping to investigate event propagation soon.

@pkyeck

pkyeck commented Jan 15, 2018

I don't get any inotify events anymore - is this related?

@EugenMayer
Author

@pkyeck yes, that's exactly the point. Those inotify events are not triggering, and since we use unison to sync, unison is never triggered (and the usual node tooling watch tasks may fail the same way).
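
Where inotify events never arrive, one fallback that does not depend on them at all is polling: periodically snapshot the tree and diff the result. The Python sketch below is illustrative only and is not what docker-sync or unison do; `take_snapshot` and `changed_paths` are hypothetical helper names, and walking a large tree this way costs CPU and I/O on every pass.

```python
import os
from typing import Dict, Set, Tuple

# path -> (modification time, size in bytes)
Snapshot = Dict[str, Tuple[float, int]]


def take_snapshot(root: str) -> Snapshot:
    """Record mtime and size for every regular file under root."""
    snapshot: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                # The file vanished between walk() and stat(); skip it.
                continue
            snapshot[path] = (st.st_mtime, st.st_size)
    return snapshot


def changed_paths(before: Snapshot, after: Snapshot) -> Set[str]:
    """Paths that were added, modified, or removed between two snapshots."""
    added_or_modified = {p for p, meta in after.items() if before.get(p) != meta}
    removed = set(before) - set(after)
    return added_or_modified | removed
```

A watch task would call `take_snapshot` on a timer and rebuild whenever `changed_paths` returns a non-empty set.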

@adiq

adiq commented Mar 1, 2018

Any updates on this one? I guess this is a serious problem that should be resolved as soon as possible. If there is no time, the mount protocol should be reverted to the old one.

@EugenMayer
Author

An update here: mac55 with the fallback to raw did not help with this issue. This makes it clear that these issues have more to do with the new Linux distro from 42+ and maybe with specific ulimit/sysctl values that are now too limiting?

At the least, I wanted to share that mac55 is no fix for this one, since the issue occurs on High Sierra and Sierra, with APFS and without.

@nazar

nazar commented Apr 17, 2018

Any updates on this issue? This is a blocker for those of us developing inside docker containers on macOS: we cannot upgrade beyond 17.09 due to the broken inotify issues reported here.

@baohx2000

baohx2000 commented Apr 17, 2018

I think many people have switched to using the unison method instead. I've been using it with the latest docker CE (60) and it has worked fine AFAICT.
UPDATE: disregard - apparently switching to unison can cause another set of issues. :(

@EugenMayer
Author

@baohx2000 that is by no means a solution; it trades one set of issues for another, and with unison usually for far worse ones. The reason is obviously the file watcher implemented for unison (unox).

@baohx2000

D'oh! I must have missed that!

@nazar

nazar commented Apr 17, 2018

Thanks for the info @baohx2000 - I did try using the unison strategy via the excellent docker-sync but the experience was less than ideal: it took a while to install unox on macos and CPU usage was much higher - so much so that it wasn't a usable solution for us.

Currently the perfect solution in terms of ease of setup and IO performance is using docker 17.09-mac-42 with docker-sync with the native_osx strategy. The only issue is that it breaks when updating docker beyond docker 17.09-mac-42 and thus isn't a solution I can recommend to our development team.

edit: clarification that we had issues with unox and macos file system change monitoring leading to high CPU and not with unison.

@EugenMayer
Author

@nazar @baohx2000 to clarify, that CPU usage does not come from unison but from unox, the file watcher that triggers unison. You need it to run unison on OSX, and that is exactly why native_osx was created: to avoid any file watching on OSX, since all the watchers perform fairly badly due to the FS layer having awful events.

@jak

jak commented Apr 26, 2018

We're really struggling with this. Is there a known combination that works, using anything newer than 17.09?

Is there anything we can do to help provide additional data? We have a couple of Macs we can test on.

@djs55
Contributor

djs55 commented Apr 26, 2018

I discovered one possible way the event stream from osxfs could fail -- the fix is in the latest edge 18.05.0-ce-rc1-mac63 released today. If you get a chance to try it let me know if it's any better. Thanks for your patience!

@jak

jak commented Apr 26, 2018

@djs55 Thanks. Installed on two developer macbooks and first impressions from both devs are that it's better, i.e. no fs events have stopped. I will report back after some more intensive use.

@jak

jak commented Apr 30, 2018

@djs55 Unfortunately I have a report from a dev on the team of the fs events stopping again, after a few hours' use on the mac63 release. A restart of the d4m app hasn't helped. This is a new touchbar model with APFS (though earlier comments indicate this isn't a factor), with a qcow image.

Truly appreciate the effort.

@wesselvdv

@djs55 I have been running this for a whole day on my mid-2015 MBP. No issues yet, running with a .raw image though.

@fdanielsen

I just want to chime in here guys with a reference to #2375. Since getting my new MBP running with APFS, I haven't felt the need for docker-sync with respect to speed, but I have a somewhat weird issue of some containers seemingly receiving notification of changes to files and others not. Eg. my Python containers notice changes to Python files, while my Webpack devserver container doesn't notice changes to any source files (JS, SCSS mainly). I wouldn't doubt that's because of different container images (and OSes) or possibly different file system watch strategies, but…

In the comments to the aforementioned issue there was an interesting note about starting a separate container with docker run -v source-dir:container-dir … mounting the same directory into the container as used in Kubernetes. I tried that (in my case together with a compose file stack), and suddenly file system events were propagating into the Webpack devserver container too. Nothing but starting that extra container separately "solved" it. Quite annoying, but better than manually triggering file changes within the container.

Also running .raw image, on macOS 10.13.4 with APFS and Docker 18.05.0-ce-rc1-mac63 (24246).
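
The workaround described above, starting one extra container that mounts the same host directory, could be scripted roughly as follows. This is a sketch under assumptions: the `alpine` image, the container name, and the `tail -f /dev/null` keep-alive command are placeholders not taken from the comment, and the thread does not explain why the extra mount helps.

```python
import subprocess
from typing import List


def keepalive_container_command(
    source_dir: str,
    container_dir: str,
    image: str = "alpine",              # placeholder image, an assumption
    name: str = "fs-events-keepalive",  # placeholder container name
) -> List[str]:
    """Build a `docker run` invocation that only holds the mount open."""
    return [
        "docker", "run",
        "--detach",  # run in the background
        "--rm",      # remove the container once it stops
        "--name", name,
        "--volume", f"{source_dir}:{container_dir}",
        image,
        "tail", "-f", "/dev/null",  # idle forever so the mount stays active
    ]


def start_keepalive_container(source_dir: str, container_dir: str) -> None:
    # Raises CalledProcessError if docker exits with a non-zero status.
    subprocess.run(keepalive_container_command(source_dir, container_dir), check=True)
```

Calling `start_keepalive_container("/path/to/project", "/app")` before bringing up the compose stack would mirror the order described in the comment; both paths here are examples.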

@michalkleiner

@fdanielsen interesting. What file editor are you using to edit the js and scss files? There's a thing with Sublime and maybe Atom (and potentially other file editors) where they do not save files immediately and therefore the event can't occur. Not sure how a separate container would fix this though.

@adiq

adiq commented May 1, 2018

@michalkleiner I guess the editor does not matter.

I’ll try @fdanielsen workaround when my fs-events dies (hope it won't happen) and let you know the results.

@fdanielsen

@michalkleiner I'm using vim. I read in Webpack's documentation that the backupcopy setting might affect its ability to pick up changes, and tried setting it to backupcopy=yes as suggested but without anything happening. I also of course use the exact same editor to edit my Python files.

@anveo

anveo commented May 2, 2018

We are experiencing the same issue as @fdanielsen. Most projects appear to propagate filesystem changes fine, but a large ember project (~100k files in node_modules) consistently fails to pick up file changes. Recently a few developers have also reported issues with a smaller create-react-app based project. The ember app is not running alongside it in that case. Edge build 18.05.0-ce-rc1-mac63 doesn't appear to have fixed the problem.

@adiq

adiq commented May 10, 2018

@djs55 I am using the edge version and haven't experienced any issues with docker-sync yet. I'll try to post another update on this issue after a longer period of time. I do not use the dummy-container workaround.

@EugenMayer
Author

@djs55 we have done quite some testing on docker-sync with edge63 by now. The results are basically:

a) it's better than anything after mac42 stable, but it's worse than mac42 (significantly)
b) qcow-based mac63 edge is useless: entirely broken as far as FS events are concerned
c) mac63 edge + raw is kind of something, but I would by no means advise upgrading from 42 to edge63; it's really still a lot worse

You can see the test results and scenarios here: EugenMayer/docker-sync#517 (comment). Several people have been testing it.

Hope that helps to at least mark the progress. I am really looking forward to seeing progress here :)

@jak

jak commented May 10, 2018

mac63+raw without docker-sync (using :cached etc) has been "good enough" for developers in my team to use it continuously, so I'd encourage anyone reading to try it and see if it's stable enough for your own use.

I can confirm that our experience echoes that mac63+qcow is unusable for fsevents.

@Kocicak

Kocicak commented May 15, 2018

Have you tried an NFS share? The setup is similar to docker-sync (using an extra volume), but synchronization is done by NFS in macOS. This link helps with setting it up: https://medium.com/@sean.handley/how-to-set-up-docker-for-mac-with-native-nfs-145151458adc

Our benchmark shows a significant performance boost compared to :cached (NFS 40 ms vs. cached 150 ms response time), but it was done on a very small PHP project. We haven't tried it on larger-scale projects.

@djs55
Contributor

djs55 commented Aug 14, 2018

@michalkleiner For bug fixes/improvements we try to update the release notes (e.g. https://docs.docker.com/docker-for-mac/edge-release-notes/) with links to the issues. However I think this isn't 100% reliable at the moment -- we need to work on our internal processes to make this a bit better.

Having said that, when we make a major bump to stable we usually base it on the last released edge, so all the fixes already in edge should make it across.

@nazar

nazar commented Aug 17, 2018

@djs55 - I updated to Version 18.06.0-ce-mac70 (26399) earlier this week. I'm on Sierra 10.12.6, using a qcow2 file, and I've had zero issues with syncing via docker-sync.

Thank you for the fix in stable - it's very much appreciated 👍

@docker-robott
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale comment.
Stale issues will be closed after an additional 30d of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle stale

@EugenMayer
Author

Please reopen this issue; it has by no means been fixed or resolved, thanks.

@michalkleiner

/remove-lifecycle stale

@michalkleiner

/lifecycle frozen

@EugenMayer
Author

Still closed / frozen?

@esetnik
Contributor

esetnik commented Feb 11, 2019

I'm still experiencing this issue as well. It should be reopened.

@nazar

nazar commented Feb 11, 2019

Same here. File propagation works well until a large number of files are updated on the HOST - i.e. I have to restart docker on High Sierra anytime I run yarn install in the project root.

@petkoneo

We encountered this issue and came to the conclusion that the problem lies with the Docker dmg file and the install process from the Docker website. If you are on a Mac, install Docker with: brew cask install docker. It will work afterwards. (For us it did.)

@michalkleiner

@petkoneo are you suggesting that via brew the installation is somehow different to the dmg installation and one doesn't have file events propagation issues compared to the other? Can you reliably replicate that?

@petkoneo

Yes, but it is only a suggestion. So far, we have been able to resolve this problem on 2 computers. As far as I know the installation should be the same, but it still worked for us. (We were surprised as well.) Maybe it just needs a proper re-install.

@marbon87

Reinstalling docker using brew cask didn't make a difference for me. Same problem :(

@daveisfera

C18223FD-44DC-4ACD-B197-B660FC33917F/20191128104037

@daveisfera

C18223FD-44DC-4ACD-B197-B660FC33917F/20200430200341

@michalkleiner

@daveisfera what are these hashes for?

@daveisfera

They're the IDs from when I uploaded the diagnostics while this error was happening.

@daveisfera

C18223FD-44DC-4ACD-B197-B660FC33917F/20200616212522

@docker-robott
Collaborator

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle locked

@docker docker locked and limited conversation to collaborators Jul 16, 2020