Operational troubleshooting
Frequent operational problems and errors.
The usual cause is that the MySQL engine attempts a reverse DNS lookup for every external connection (especially with external MySQL servers) while the DNS servers are down or unable to perform reverse lookups.
To solve this issue, first check that appropriate primary and secondary DNS servers are listed in /etc/resolv.conf.
In addition, the MySQL engine can be configured to skip reverse lookups, which can also slightly improve performance on external MySQL queries. Add the following to the [mysqld] section of /etc/mysql/my.cnf:
#Skip name resolve
skip-name-resolve
Remember that this change only takes effect after the MySQL engine is restarted.
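The DNS check described above can be sketched in a few lines of Python. This is only an illustration: the function name `nameservers` is hypothetical and the parsing covers just the `nameserver` keyword of resolv.conf-style text.

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Lines starting with '#' or ';' are comments in resolv.conf
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers
```

Feeding it the contents of /etc/resolv.conf and checking that at least two addresses come back corresponds to the "primary and secondary" check; whether those servers actually answer reverse lookups still has to be verified separately.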
(v0.6) I get this error when creating a Spirent VM: 'Action create on VM *** failed: : HD type not yet supported by XEN agent'
The Spirent VM template is not yet distributed in this version. Once it is, this error will no longer appear.
Stalled VMs can be identified by an endless "loading" animation in the status cell.
Step 1
If the VM stalled after being created, open a terminal on the server where the VM was created and search for the folder that holds the VM's physical files:
cd /opt/ofelia/oxa
find cache/vms/ -name "<vm_name>.conf" && find remote/vms/ -name "<vm_name>.conf"
-
If there are different machines with the same name in different folders:
- enter each VM configuration file and check the uuid (<vm_uuid>) of the VM:
vim <find_path_N>/<vm_name>.conf
- then check whether a VM with that uuid (<vm_uuid>) exists in the VT AM. Open a terminal on the OCF machine and type the following:
cd /opt/ofelia/vt_manager/src/python/vt_manager/
python manage.py shell
>>> from models import VirtualMachine
>>> VirtualMachine.objects.get(uuid="<vm_uuid>")
-
If there is only one resulting folder after the search -- or you know the VM uuid and have checked that it corresponds to the data inside the <vm_name>.conf file, remove the 3 physical files for the VM:
cd <find_path>
rm <vm_name>.conf <vm_name>.img <vm_name>_swap.img
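As a dry run before removing anything, the search and the list of files from Step 1 can be reproduced in Python. This is a minimal sketch: `find_vm_confs` and `vm_files` are hypothetical helper names, and the three file suffixes are the ones used in the `rm` command above.

```python
import os


def find_vm_confs(roots, vm_name):
    """Return every path named <vm_name>.conf found under the given folders."""
    target = vm_name + ".conf"
    matches = []
    for root in roots:
        for folder, _dirs, files in os.walk(root):
            if target in files:
                matches.append(os.path.join(folder, target))
    return sorted(matches)


def vm_files(conf_path):
    """Return the three physical files that belong to one VM configuration."""
    base = conf_path[:-len(".conf")]
    return [base + ".conf", base + ".img", base + "_swap.img"]
```

Run from /opt/ofelia/oxa, a call such as `find_vm_confs(["cache/vms", "remote/vms"], "<vm_name>")` mirrors the two `find` commands; more than one result means the uuid check is needed before deleting anything.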
Step 2
If you find VMs that are stalled at VT AM, get the ID for the VM and delete it:
-
Go to the server page and scroll down to the VM list. Right click on the VM name and select "Inspect Element"
-
A frame with HTML code will appear. Write down the number in the code, which will look similar to: id="tr_vm1299"
-
Open a terminal in the OCF machine and type the following:
cd /opt/ofelia/vt_manager/src/python/vt_manager/tests/
python deleteVM.py <vm_id>
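The number to pass as <vm_id> is the numeric part of the element id. As an illustration only, it can be extracted from the copied HTML with a regular expression; `extract_vm_id` is a hypothetical helper and assumes the id always has the form `tr_vm` followed by digits, as in the example above.

```python
import re


def extract_vm_id(html_fragment):
    """Return the number in id="tr_vm<number>", or None if it is not present."""
    match = re.search(r'id="tr_vm(\d+)"', html_fragment)
    return int(match.group(1)) if match else None
```

For the example element, `extract_vm_id('<tr id="tr_vm1299">')` gives 1299, which is the value deleteVM.py expects.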
Step 3
If you find VMs that are stalled at Expedient, get the ID for the Expedient cached VM and delete it:
-
Go to slice detail page and scroll down to the VM list. Right click on the VM name and select "Inspect Element"
-
A frame with HTML code will appear. Write down the number in the code, which will look similar to: id="tr_vm1299"
-
Open a terminal in the OCF machine and type the following:
cd /opt/ofelia/expedient/src/python/vt_plugin/tests/ # OCF < 0.5
cd /opt/ofelia/expedient/src/python/plugins/vt_plugin/tests/ # OCF = 0.5
python deleteVM.py <vm_id>
Step 4
After deleting the VM, free the addresses associated with its interface(s).
-
Look for the ranges sections within the VT AM GUI and write down each IP and MAC address related to the VM you just deleted
-
Open a terminal in the OCF machine and type the following:
cd /opt/ofelia/vt_manager/src/python/vt_manager/
python manage.py shell
>>> from models import Ip4Slot
>>> Ip4Slot.objects.get(ip="<vm_ip>").delete()
>>> from models import MacSlot
>>> MacSlot.objects.get(mac="<vm_mac_i>").delete() # Repeat N times (N = #MACs(VM))
If you got an error similar to this:
DatabaseError: (1146, "Table 'expedient.vt_plugin_xmlrpcserverproxy' doesn't exist")
it may be that the plugin models are not properly synchronized when the manage.py syncdb command is run during installation.
To solve it, type the following:
# uncomment lines no. 182, 188, 189, 190, that is:
# ('openflow.plugin', 'vt_plugin', 'vt_plugin.communication', 'openflow.dummyom')
vim /opt/ofelia/expedient/src/python/expedient/clearinghouse/defaultsettings/django.py
service apache2 restart
python manage.py syncdb
(v0.3) I get this error before getting a fatal error: "AttributeError: 'module' object has no attribute 'XMLField'"
The XMLField class in Django has been deprecated as of version 1.3. Please install Django 1.2.7 as follows:
gpg --keyserver pgp.mit.edu --recv-key 0x8C8B2AE1
gpg --verify Django-1.2.7.checksum.txt
wget http://www.djangoproject.com/m/releases/1.2/Django-1.2.7.tar.gz
tar xzvf Django-1.2.7.tar.gz
cd Django-1.2.7/
python setup.py install
More info at https://docs.djangoproject.com/en/dev/topics/install/
If this is the first time you install OCF, you probably do not have the pyPElib library. To fix this, run the following in a shell:
/usr/bin/apt-get -y install python-pyparsing
/usr/bin/wget http://pypelib.googlecode.com/files/pypelib_latest_all.deb
/usr/bin/dpkg -i pypelib_latest_all.deb
rm pypelib_latest_all.deb
I try to add an Openflow Aggregate Manager and get this error: 'user X is not a clearinghouse user'.
That means that user 'X' was not set in the clearinghouse. Please take a look at the Configuring connection with Expedient section at ofam-configuration.
To resolve the missing dependency on the python-pyparsing library, install it before running the OFVER script using apt-get install python-pyparsing, and make sure that your default Python path points to python-2.6.
(v0.3) I get this error: 'django.core.exceptions.ImproperlyConfigured: settings.DATABASES is improperly configured. Please supply the ENGINE value. Check settings documentation for more details.'
Check that the configuration file at optin_manager/src/python/openflow/optin_manager/localsettings.py is properly configured. If it already is, it may be that your Python version does not have the pypelib module. In that case, create a symbolic link in your current version's folder (e.g. 2.7) that points to the pypelib library in the 2.6 folder:
ln -s /usr/lib/python2.6/pypelib/ /usr/lib/python2.7/pypelib
No, they don't. Requests made through the Expedient plugin can be found in the menu Administrate Flowspace -> Add rule of the Optin Manager Web UI.
This is due to an uncommented line in the file vt_manager/src/python/vt_manager/mySettings.py.
Please make sure the following line is commented:
ES_DIR: [networking, policyEngine, users, ...] in SRC_DIR/python/vt_manager/views/templates/theme_name as needed.
When trying to access the GUI I find this in the Apache VM AM's log: 'ImportError: No module named pypelib.persistence.backends.django'
This happens because the pyPElib library is not installed for your default Python version. To correct this, check your Python version with python -V and then create a symbolic link from that version's library folder to the pyPElib library. For example, if you use Python 2.7 and pyPElib is installed in Python 2.6's folder:
ln -s /usr/lib/python2.6/pypelib/ /usr/lib/python2.7/pypelib
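To confirm whether a given interpreter can see the library, before or after creating the link, a small import check can be run with that interpreter. This is a hedged sketch; `can_import` is a hypothetical helper, and it only detects import failures. Modules that depend on Django settings may raise other exceptions when imported outside the application.

```python
import importlib


def can_import(module_name):
    """Return True if the module can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True
```

Calling `can_import("pypelib")` with the Python version that Apache uses shows whether the symbolic link had the intended effect.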
If your VMs are not being created and your VT AM log shows this:
XMLRPC Client error: can't connect to method send at https://***:9229/ [Errno 111] Connection refused
then make sure that the server in which you try to create your VM has its agent up and running:
ps aux | grep "OfeliaAgent" | grep -v "grep"
and if it is not, start it with
service oxad start
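A "Connection refused" error can also be checked from the VT AM side by testing whether the agent's port accepts TCP connections. The sketch below is an illustration with a hypothetical helper name, `port_is_open`; port 9229 is taken from the log line above and may differ in your installation.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        connection = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable or timed out
        return False
    connection.close()
    return True
```

For example, `port_is_open("<server_ip>", 9229)` returning False while the agent process is running would point to a firewall or addressing problem rather than a stopped agent.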
The communication between VM AM and Expedient (and also between VM AM and agent) is fully asynchronous. Make sure there are no firewall rules between these three components and that the VTAM_IP and VTAM_PORT settings in the mySettings.py file are correctly set.
Have a look at the Manuals for more details on configuration.
OXA (Ofelia XEN Agent) and XEN server
If users experience this error during VM creation:
Action create on VM test failed: : [Errno 39] Directory not empty: '/tmp/oxa/hdtest_3382/'
It is most likely due to a wrong server configuration, especially of the /etc/modules file. Please review the XEN installation manual and note that the loop module must have max_loop=64 (default value) or higher.
root@node04:~# cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
# Parameters can be specified after the module name.
loop max_loop=64
8021q
Remember that changes in /etc/modules will not take effect until you reboot the system.
If Expedient shows something similar to:
Action create on VM <X> failed: : Could not clone image to working directory project:<Y>, slice:<Z>, name:<X>
and the Agent error log shows something similar to:
Error: Device 51713 (vbd) could not be connected. Failed to find an unused loop device
this is because all the loop devices (/dev/loop* files) are in use. More info here and here.
You may choose between:
- Stopping running VMs that are no longer needed in Xen, to free their loop devices
- Increasing the max_loop value (default=64) to allow more VMs to run at a time. Refer to this section.
If your OXA log (/opt/ofelia/oxa/log/error.log) complains about a host it cannot route to:
error: [Errno 113] No route to host
then check that the variables VTAM_IP, VTAM_PORT, XMLRPC_USER and XMLRPC_PASS in the file /opt/ofelia/oxa/repository/vt_manager/src/python/agent/mySettings.py are correct. You can test them by pinging the VT AM from the OXA:
~# python
>>> import xmlrpclib
>>> server = xmlrpclib.Server("https://<XMLRPC_USER>:<XMLRPC_PASS>@<VTAM_IP>:<VTAM_PORT>/xmlrpc/plugin")
>>> server.ping("test")
'test'
XEN server networking: interconnection between two VMs in the same server through data-path is possible, without any OF rule.
Although not desired, this is a known limitation of the current XEN server network configuration. It is due to the fact that XEN bridges (one per physical interface) are shared among the VMs and are normal Linux bridges (learning switches).
There are plans to deploy openvswitch to do some L2 filtering and enable OF in those bridges, as well as to prevent spoofing, but this is still under discussion inside OFELIA.
If you detect an error similar to TypeError: shutdown() takes exactly 0 arguments (1 given), then you are probably using Python 2.7. You may need to fix this known bug in the werkzeug library. For that, open the file /usr/lib/python2.7/SocketServer.py and add the following under the shutdown_request method of the TCPServer class (line ~465):
try:
    request.shutdown(socket.SHUT_WR)
except socket.error:
    pass
except TypeError: # << add this
    request.shutdown() # << add this
The host machine may be processing large bursts of data, which are sent as part of the periodic monitoring data exchange between islands. As a direct effect, Apache2 processes produce high peaks of CPU and virtual memory consumption. Limiting the latter solves this problem.
To do so, open /etc/default/apache2 and add the following:
# Set maximum virtual memory to your preferred size (in kilobytes)
# We recommend around 50% of your host's memory (e.g. 1 GB)
ulimit -v 1048576
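The value passed to `ulimit -v` is expressed in kilobytes in common shells, so 1048576 corresponds to 1 GB, which is half of a host with 2 GB of memory. As a rough sketch, the 50% figure can be derived from the `MemTotal` line of /proc/meminfo, which is also reported in kB; `half_memory_kb` is a hypothetical helper name.

```python
def half_memory_kb(meminfo_text):
    """Return 50% of MemTotal, in kB, from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            total_kb = int(line.split()[1])
            return total_kb // 2
    raise ValueError("MemTotal not found")
```

The number returned for the contents of /proc/meminfo can be used directly as the argument of `ulimit -v`; treat the 50% ratio as the starting point recommended above and adjust it to your host.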
If this did not work, use xm destroy and then xm create <vm_config_file> on the affected machine.