Speed issues under Virtualbox #769
Thanks for the bug report. Would you mind giving us a bit more information?
With this info I'll try to reproduce it here. In 0.11 we've tried to improve the CPU usage of Scope, and we've also found various performance issues in Docker recently (moby/moby#17720) that could be affecting this. |
Sure, thanks! Guest OS is Ubuntu 15.04, Vagrantfile.txt Scope is version 0.11.0. weavescopelog.txt - I booted up the VM, started scope with sudo, let it fire up its web interface, confirmed the VM is unresponsive, then proceeded with halting the weavescope container from the GUI. I see two instances of Issue was present in 0.10.0 as well. |
Thanks @neo21670, I'll look at this immediately |
Possibly a dupe of #715? |
From the logs it looks like the CPU isn't busy, and I cannot reproduce this on my Vagrant VM with Docker 1.9.1 (but Ubuntu 14.04). I'm going to upgrade to a later Ubuntu and try again. |
Seems to be a good idea. I'm setting up a trusty VM right now. |
Yeah, 14.04 LTS is also affected by the issue. |
Another report of something similar #699. My slow internet is still downloading 15.04. |
Is there anything in |
I've run an almost identical Ubuntu 15.04 VirtualBox VM, with Docker 1.9.1 and a bunch of containers, and I can't get this to reproduce. I believe there is a problem though (3 reports of this), so would you mind trying a few things for me:
And let me know what effect they have. Also, are you running the weave network too? Just checking. Final question: any sign of the OOM killer in dmesg? |
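For readers who want to answer the dmesg question quickly, here is a minimal Python sketch that scans a dmesg dump for lines that look like OOM-killer activity. The helper name `find_oom_lines` and the two matched substrings are assumptions made for illustration; kernel message wording varies between versions, so an empty result is a hint rather than proof.

```python
import re

# Substrings the Linux kernel commonly logs when the OOM killer runs.
# Assumption: wording differs across kernel versions, so this is best-effort.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory: Kill", re.IGNORECASE)


def find_oom_lines(dmesg_text):
    """Return the lines of a dmesg dump that look like OOM-killer activity."""
    return [line for line in dmesg_text.splitlines() if OOM_PATTERN.search(line)]
```

One way to use it is to save the output of `dmesg` to a file inside the guest, read that file as text, and pass the contents to `find_oom_lines`.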
Not sure it's relevant, but I found my VirtualBox VM with 2 vCPUs was significantly slower than with 1. May be anecdotal. |
On
I haven't seen anything related to OOM or memory at all.
And finally: no, haven't introduced Weave Net to my stack yet. Also, here's a
|
Also, I tried to fire up the same VM with 4 cores, no result. Same VM, double the RAM, 1 core: same issue. |
I tried the same procedure on a VMWare based VM (Ubuntu 14.04 LTS, 1.5GB, 2 cores) - no issues experienced. I guess I'll try to set up a Virtualbox 5 based Vagrant env to see if the issue persists. |
Virtualbox 4.3.34 - same issue. It might be interesting that I chose to force shut down (Close -> Power Off) the VM through the VBox GUI. (So far I've been using |
Thanks for all the info! Unfortunately this doesn't point to anything obvious. I'm running VirtualBox 4.3 myself; interesting that VMware doesn't experience the issue. When the issue occurs, what does I'll speak with my colleagues and see if they have any suggestions. |
I can reproduce this with Scope
The VM has 4GB of RAM reserved and can use 2 CPUs, i.e:
The problem surfaces when spawning Scope while running around 30 containers. If I spawn Scope while running fewer containers (20), the system freezes for a while (~2 minutes) but then recovers. I am running Docker 1.8.3. I will gather more information next week. |
This has high chances of being a duplicate of #812 |
@neo21670 If you are still interested in this, could you please try to reproduce with |
Sure, I'll check that in a day, thanks for the heads up. On this being the duplicate of #812, I doubt if that could be it. I'd like to emphasize that neither the host, nor the guest running docker experiences high CPU usage. |
@neo21670 Did you get around to running the test? |
I can now reproduce the problem systematically while running the Kubernetes cluster I mentioned above. The memory/CPU consumption of the probe is only at 1.5%/20%. So, the resource usage shouldn't be triggering the problem, as @neo21670 mentioned. As surprising as it may be, doing
before launching Scope makes the problem go away. As soon as I restore the original file and restart Scope, the problem comes back. My VM is using VirtualBox's DNS proxy in NAT mode with the host resolver, which has caused us application-level problems in the past. In this case, it renders the NAT network interface of the VM unusable. More specifically, it won't allow making any external connections from the VM and it will freeze ssh sessions started with However, all the other interfaces are unaffected. In particular, ssh sessions and connections to the Scope UI made through a host-only network interface won't be affected, which is why I believe @neo21670 observes that
I don't know enough about VirtualBox to see how its NAT DNS proxy can cause the NAT interface to fail, and there don't seem to be any upstream tickets covering the issue, but I am pretty sure this is the cause. @neo21670 Can you confirm? I also don't know why Scope triggers the problem when the number of containers is >20; my guess is it has something to do with the reverse DNS lookups from Scope. Maybe we are hammering the VirtualBox DNS proxy. I will investigate, but in any case I don't think it should render the NAT interface unusable. |
It seems that VirtualBox's DNS proxy resolver in NAT mode chokes on IPv6 DNS lookups, which also causes the NAT interface to become unresponsive. I have managed to reproduce the problem by simply doing:
Without launching scope at all. Every time an IPv6 query is made to VirtualBox's proxy resolver, the whole NAT interface is unresponsive for a while. We can see the impact of it by pinging
|
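To make the A versus AAAA difference described above measurable, here is a small Python sketch that times one IPv4 and one IPv6 lookup of the same name. It is not the command used in the comment above; the function names `time_lookup` and `compare_families` are hypothetical, and it assumes `socket.getaddrinfo` goes through the guest's system resolver, which in this setup would be VirtualBox's DNS proxy.

```python
import socket
import time


def time_lookup(host, family, resolve=socket.getaddrinfo):
    """Return (seconds_elapsed, error_or_None) for one lookup of `host`.

    `family` is socket.AF_INET (A records) or socket.AF_INET6 (AAAA records).
    `resolve` is injectable so the timing logic can be exercised offline.
    """
    start = time.monotonic()
    error = None
    try:
        resolve(host, None, family)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        error = exc
    return time.monotonic() - start, error


def compare_families(host, resolve=socket.getaddrinfo):
    """Time an IPv4 lookup, then an IPv6 lookup, of the same name."""
    return {
        "A": time_lookup(host, socket.AF_INET, resolve),
        "AAAA": time_lookup(host, socket.AF_INET6, resolve),
    }
```

Calling `compare_families("scope.weave.local")` inside an affected guest would be expected to show the AAAA entry taking far longer than the A entry, or failing, if the diagnosis in this thread applies to that setup.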
Any way we can work around this? Scope does a bunch of reverse lookups.
|
@tomwilkie I am waiting for @neo21670 to confirm, but in my setup it's not the reverse DNS lookups that cause it but simply the IPv6 lookups on the .local domain (see above).
I believe that disabling AAAA record lookups for scope targets would circumvent this. But I am not sure we should, since Virtualbox is clearly at fault. I just confirmed that It can be easily reproduced by doing ...
... with this Vagrantfile:
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/ubuntu-15.04-amd64"
  config.vm.box_url = "https://cloud-images.ubuntu.com/vagrant/vivid/current/vivid-server-cloudimg-amd64-vagrant-disk1.box"
  config.vm.provider :virtualbox do |vb|
    vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
  end
end
The problem seems to be caused by OSX's (my host) DNS resolver timing out when looking up AAAA records in the .local domain.
This shouldn't cause the NAT network to collapse though. I will just stop digging and report it upstream. |
Upstream bug report: https://www.virtualbox.org/ticket/15074 |
I will close this in a few days if there's no more activity. @neo21670 We would appreciate it if you could confirm that our diagnosis is correct. |
Hi, and sorry for the delay, I've been quite busy. First, I pulled up a fresh install of everything (Virtualbox, Vagrant, Docker, etc.) and tried my usual workflow, which failed as my original report said. Next, I commented out the Thanks for your time on investigating this issue! |
Thank you for confirming, @neo21670. What Vagrant box are you using? And are you running OSX? It would help us understand how common this problem is. I am closing this now since it's a VirtualBox bug, but we may add a workaround if the upstream bug is not fixed soon and a lot of people are affected. In the meantime, feel free to comment on https://www.virtualbox.org/ticket/15074 |
I'm using the official Ubuntu Trusty and Vivid boxes, usually the latest ones. Same goes for my OS: OS X, latest releases. When it comes to Virtualbox, I'm usually a bit more conservative about upgrading, but I always review the changelog and upgrade when I'm affected. |
The upstream ticket doesn't show any progress. However, I don't think we do any |
It would still be possible for a user to trigger this by providing a .local address as a Scope peer; but yes, by default our scope.weave.local lookups will only look for A records: https://github.com/weaveworks/scope/blob/master/probe/appclient/resolver.go#L65 |
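As a rough illustration of what "only look for A records" means in practice, here is a Python analogue. It is not Scope's Go code from the linked resolver.go; the function name `resolve_ipv4_only` is hypothetical, and the claim that no AAAA query is sent relies on the assumption that the system resolver honours the requested address family.

```python
import socket


def resolve_ipv4_only(host, port=None):
    """Resolve `host` to a sorted list of unique IPv4 addresses.

    Passing family=AF_INET asks the system resolver for IPv4 results only,
    so no AAAA query should be needed for this call.
    """
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is an (address, port) pair.
    return sorted({sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos})
```

A caller that passes a .local peer name through a dual-stack lookup instead (family left unspecified) could still trigger the AAAA queries discussed in this thread.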
Great, that's what I meant. |
This was marked as fixed upstream https://www.virtualbox.org/ticket/15074 |
I'm trying to set up Scope under a Virtualbox 4 based environment. The box is set up by a basic vagrantfile, 2 interfaces, 1.5 gigs of RAM, 2 cores with no throttling. Actual host is a recent Core i7, virtualization is supported and on, etc. (Basically multiple VMs can run without a hitch.)
After setting up Scope and launching it with
sudo scope launch
, the whole VM slows to a crawl. The Scope web UI runs fine and reacts in an instant, and attaching to a container's shell is fluid and instantaneous; anything else, however, is virtually paused. SSH shells do not time out, but they are unable to respond. Neither the host nor the guest shows excess CPU usage based on top output. Requests to a basic web service running in a test container complete only after several seconds of delay. Upon stopping the scope container from the GUI, the system remains unusable for about 30-35 seconds, then everything returns to normal. CPU usage during the exit shows no change.
RAM usage of the whole system+scope+test container setup stops around 300 megs of active RAM. Logs show no relevant information - actually nothing at all, besides the normal output of the web service.