Compute node configuration

Antonio López Gracia edited this page Aug 4, 2015 · 53 revisions

#Introduction#

This article contains the general guidelines to configure a compute node for NFV based on a 64-bit Linux system (e.g. RHEL7.1, RHEL7.0, CentOS 7.1, Ubuntu Server 14.04) with KVM, qemu and libvirt.

This article applies to Linux systems in general and tries to gather all the configuration steps. These steps have not been thoroughly tested in all Linux distros, and there is no guarantee that they will be 100% accurate.

For a detailed step-by-step installation procedure for a specific distro, follow the instructions in these links:

Note: Openmano, and particularly openvimd module which interacts with NFV Infrastructure, has been tested with servers based on Xeon E5-based Intel processors with Ivy Bridge architecture, and with Intel X520 NICs based on Intel 82599 controller. No tests have been carried out with Intel Core i3, i5 and i7 families, so there are no guarantees that the integration will be seamless.

The configuration that must be applied to the compute node is the following:

  • Install virtualization packages (kvm, qemu, libvirt, etc.)
  • Use a kernel that supports huge page TLB cache in IOMMU
  • Enable IOMMU
  • Enable 1G hugepages, and reserve enough hugepages for running the VNFs
  • Isolate CPUs so that the host OS is restricted to run on the first core of each NUMA node.
  • Enable SR-IOV
  • Enable all processor virtualization features in the BIOS
  • Enable hyperthreading in the BIOS (optional)
  • Deactivate KSM
  • Pre-provision Linux bridges
  • Additional configuration to allow access from openvim, including the configuration to access the image repository and the creation of appropriate folders for image on-boarding

A full description of this configuration is detailed below.

#BIOS setup#

  • Ensure that virtualization options are active. If they are active, the following command should give an output:

    egrep "(vmx|svm)" /proc/cpuinfo

  • It is also recommended to activate hyperthreading. If it is active, the following command should give an output:

    egrep ht /proc/cpuinfo
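
As a minimal sketch, the virtualization check above can be wrapped in a small shell function that reports the result explicitly instead of relying on empty output. The function name check_virt_flags and its optional file argument are illustrative assumptions, not part of openvim.

```shell
# Report whether a cpuinfo-style file lists hardware virtualization flags
# (vmx for Intel VT-x, svm for AMD-V).
# The file argument is an assumption for testability; default is /proc/cpuinfo.
check_virt_flags() {
    cpuinfo="${1:-/proc/cpuinfo}"
    if grep -E -q '^flags.*[[:space:]](vmx|svm)([[:space:]]|$)' "$cpuinfo"; then
        echo "virtualization: enabled"
    else
        echo "virtualization: NOT detected (check BIOS settings)"
        return 1
    fi
}
```

Calling check_virt_flags without arguments inspects the running host.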

#Installation of virtualization packages#

  • Install the following packages in your host OS: qemu-kvm libvirt-bin bridge-utils virt-viewer virt-manager
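
The package names listed above follow the Ubuntu naming. The sketch below prints a package list per distro family. The "rhel" list is an assumption (libvirt instead of libvirt-bin) and the helper name virt_packages is hypothetical, so check the names against your distro's repositories before installing.

```shell
# Print the virtualization package list for a distro family.
# "debian" uses the names from this article; the "rhel" names are an assumption.
virt_packages() {
    case "$1" in
        debian) echo "qemu-kvm libvirt-bin bridge-utils virt-viewer virt-manager" ;;
        rhel)   echo "qemu-kvm libvirt bridge-utils virt-viewer virt-manager" ;;
        *)      echo "unknown distro family: $1" >&2; return 1 ;;
    esac
}

# Example (run as root):
#   apt-get install -y $(virt_packages debian)    # Ubuntu Server 14.04
#   yum install -y $(virt_packages rhel)          # RHEL/CentOS 7
```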

#IOMMU TLB cache support#

  • Use a kernel with support for huge page TLB cache in IOMMU, for example RHEL7.1, Ubuntu 14.04, or a vanilla kernel 3.14 or higher. If your kernel lacks this support, you should update it. For instance, you can use the following kernel for RHEL7.0 (not needed for RHEL7.1):

      wget http://people.redhat.com/~mtosatti/qemu-kvm-take5/kernel-3.10.0-123.el7gig2.x86_64.rpm
      rpm -Uvh kernel-3.10.0-123.el7gig2.x86_64.rpm --oldpackage
    

#Enabling IOMMU#

  • Enable IOMMU, by adding the following to the grub command line

      intel_iommu=on 
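
Several steps in this article add parameters to the grub command line. One possible way to do that, sketched here, is to append the parameter to GRUB_CMDLINE_LINUX in the grub defaults file. The helper name add_kernel_param and the default path /etc/default/grub are assumptions, and the sed expression expects the variable to be written on one line with double quotes.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX in a grub defaults file,
# unless it is already present. The file argument defaults to
# /etc/default/grub (an assumption about where your distro keeps it).
add_kernel_param() {
    param="$1"
    grub_file="${2:-/etc/default/grub}"
    if grep -q "^GRUB_CMDLINE_LINUX=.*[\" ]${param}[\" ]" "$grub_file"; then
        return 0    # already configured, nothing to do
    fi
    sed -i "s|^GRUB_CMDLINE_LINUX=\"\(.*\)\"|GRUB_CMDLINE_LINUX=\"\1 ${param}\"|" "$grub_file"
}

# Example (run as root):
#   add_kernel_param intel_iommu=on
```

After editing the file, the grub configuration has to be regenerated (typically grub2-mkconfig -o /boot/grub2/grub.cfg on RHEL/CentOS 7, or update-grub on Ubuntu) and the host rebooted. The active parameters can then be checked in /proc/cmdline.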
    

#Enabling 1G hugepages#

  • Enable 1G hugepages, by adding the following to the grub command line:

      default_hugepagesz=1G hugepagesz=1G

  • There are several options to indicate the memory to reserve:
    • At boot, by adding hugepages=24 to the grub command line (reserves 24 GB).

    • With a hugetlb-gigantic-pages.service for modern kernels. You need to create a configuration file /usr/lib/systemd/system/hugetlb-gigantic-pages.service with this content:

        [Unit]
        Description=HugeTLB Gigantic Pages Reservation
        DefaultDependencies=no
        Before=dev-hugepages.mount
        ConditionPathExists=/sys/devices/system/node
        ConditionKernelCommandLine=hugepagesz=1G
        
        [Service]
        Type=oneshot
        RemainAfterExit=yes
        ExecStart=/usr/lib/systemd/hugetlb-reserve-pages
        
        [Install]
        WantedBy=sysinit.target
      

      Then set the hugepages on each NUMA node. For instance, in a system with 2 NUMA nodes, to reserve 4GB for the host OS (2GB on each NUMA node) and all remaining memory for hugepages:

      totalmem=`dmidecode --type 17|grep Size |grep MB |gawk '{suma+=$2} END {print suma/1024}'`
      hugepages=$(($totalmem-4))
      echo $((hugepages/2)) > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
      echo $((hugepages/2)) > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
      

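      Copy the last two lines into the /usr/lib/systemd/hugetlb-reserve-pages file for automatic execution after boot. Since that file does not define the hugepages variable, write the computed number in place of $((hugepages/2)), make the file executable, and enable the service (systemctl enable hugetlb-gigantic-pages).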
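
The arithmetic in the example above can be generalised to any number of NUMA nodes. The following is only a sketch: the function names hugepages_per_node and reserve_hugepages are hypothetical, and the sysfs base directory is passed as an argument so that the logic can be tried on a scratch directory first.

```shell
# Compute how many 1G hugepages to reserve on each NUMA node, splitting
# evenly whatever is left after keeping host_gb for the host OS.
hugepages_per_node() {
    total_gb="$1"; host_gb="$2"; nodes="$3"
    echo $(( (total_gb - host_gb) / nodes ))
}

# Write the reservation for every node found under a sysfs-like base
# directory. The base argument defaults to /sys/devices/system/node.
reserve_hugepages() {
    pages="$1"
    base="${2:-/sys/devices/system/node}"
    for node in "$base"/node[0-9]*; do
        echo "$pages" > "$node/hugepages/hugepages-1048576kB/nr_hugepages"
    done
}

# Example (run as root) for a host with 64 GB and 2 NUMA nodes:
#   reserve_hugepages "$(hugepages_per_node 64 4 2)"
```

With 64 GB of memory, 4 GB kept for the host and 2 NUMA nodes, this reserves 30 pages of 1 GB on each node.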

#CPU isolation#

  • Isolate CPUs so that the host OS is restricted to run on the first core of each NUMA node, by adding the isolcpus field to the grub command line. For instance:

      isolcpus=1-9,11-19,21-29,31-39
    

    The exact CPU numbers might differ depending on the CPU numbers presented by the host OS. In the previous example, CPUs 0, 10, 20 and 30 are excluded because CPU 0 and its sibling 20 correspond to the first core of NUMA node 0, and CPU 10 and its sibling 30 correspond to the first core of NUMA node 1.

    Running this awk script suggests the value to use in your compute node:

      gawk 'BEGIN{pre=-2;} ($1=="processor"){pro=$3;} ($1=="core" && $4!=0){ if (pre+1==pro){endrange="-" pro} else{cpus=cpus endrange sep pro; sep=","; endrange="";}; pre=pro;} END{printf("isolcpus=%s\n",cpus endrange);}' /proc/cpuinfo
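
To double-check an isolcpus value before rebooting, it can help to list the CPUs that remain available to the host OS. The sketch below does this for a given list and CPU count. The function names expand_cpu_list and host_cpus are hypothetical helpers, not part of openvim.

```shell
# Expand a CPU list such as "1-3,5" into one CPU number per line.
expand_cpu_list() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        seq "$first" "${last:-$first}"
    done
}

# Print the CPUs left to the host OS: every CPU in 0..ncpus-1 that is
# not in the isolcpus list.
host_cpus() {
    isolated="$1"; ncpus="$2"
    seq 0 $((ncpus - 1)) \
        | grep -v -x -F "$(expand_cpu_list "$isolated")" \
        | tr '\n' ' ' | sed 's/ $//'
}

# Example, matching the 40-CPU host described above:
#   host_cpus "1-9,11-19,21-29,31-39" 40    # prints: 0 10 20 30
```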
    

#Deactivating KSM#

KSM enables the kernel to examine two or more already running programs and compare their memory. If any memory regions or pages are identical, KSM reduces multiple identical memory pages to a single page. This page is then marked copy on write. If the contents of the page are modified by a guest virtual machine, a new page is created for that guest virtual machine.

KSM has a performance overhead which may be too large for certain environments or host physical machine systems.

KSM can be deactivated by stopping the ksmtuned and the ksm services. Stopping the services deactivates KSM, but the change does not persist after a reboot.

    # service ksmtuned stop
    Stopping ksmtuned:                                         [  OK  ]
    # service ksm stop
    Stopping ksm:                                              [  OK  ]

Persistently deactivate KSM with the chkconfig command. To turn off the services, run the following commands:

    # chkconfig ksm off
    # chkconfig ksmtuned off

Check RHEL 7 - THE KSM TUNING SERVICE for more information.
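
A quick way to confirm the result is to read the KSM control file in sysfs. The sketch below translates its value into a readable state. The function name ksm_state is hypothetical, and the file argument exists only so the function can be tried on a scratch file.

```shell
# Translate the value of the KSM control file into a readable state.
# 0 = stopped, 1 = running, 2 = stopped with all merged pages unmerged.
ksm_state() {
    run_file="${1:-/sys/kernel/mm/ksm/run}"
    case "$(cat "$run_file")" in
        0) echo "stopped" ;;
        1) echo "running" ;;
        2) echo "stopped, pages unmerged" ;;
        *) echo "unknown" ;;
    esac
}

# On systemd-based hosts such as RHEL 7, the equivalent of the service and
# chkconfig commands would typically be (assumption, run as root):
#   systemctl stop ksmtuned ksm
#   systemctl disable ksmtuned ksm
```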

#Enabling SR-IOV#

We assume that you are using Intel X520 NICs (based on Intel 82599 controller). In case you are using other NICs, the configuration might be different.

  • Configure 8 virtual functions on each 10G network interface. A larger number can be configured if desired. (This step is provisional, because it does not always work for all NICs.)

      for iface in `ifconfig -a | grep ": " | cut -f 1 -d":" | grep -v -e "_" -e "\." -e "lo" -e "virbr" -e "tap"`
      do
          driver=`ethtool -i $iface| awk '($0~"driver"){print $2}'`
          if [ "$driver" = "i40e" -o "$driver" = "ixgbe" ]
          then
              # PCI address of the physical function (PF)
              pci=`ethtool -i $iface | awk '($0~"bus-info"){print $2}'`
              # Create 8 SR-IOV VFs per PF (reset to 0 first, then set the new count)
              echo 0 >  /sys/bus/pci/devices/$pci/sriov_numvfs
              echo 8 >  /sys/bus/pci/devices/$pci/sriov_numvfs
          fi
      done
    
  • For Niantic X520 NICs, the max_vfs parameter must be set to work around a bug in the ixgbe driver when VFs are managed through the sysfs interface:

      echo "options ixgbe max_vfs=8" >> /etc/modprobe.d/ixgbe.conf
    
  • Blacklist the ixgbevf module by adding the following to the grub command line. This must be done only after this host has been added to openvim, not before. The driver is blacklisted because it prevents the VLAN tag of broadcast packets from being properly removed when they are received by an SR-IOV port.

      modprobe.blacklist=ixgbevf
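
To verify that the virtual functions were actually created, the sysfs counters of each physical function can be read back. This is a minimal sketch: the function name vf_status is hypothetical, and the PCI address in the comments is only an example.

```shell
# Print "configured/total" virtual functions for one PCI device directory,
# for example /sys/bus/pci/devices/0000:04:00.0 (a hypothetical address).
vf_status() {
    dev_dir="$1"
    echo "$(cat "$dev_dir/sriov_numvfs")/$(cat "$dev_dir/sriov_totalvfs")"
}

# Example: report every SR-IOV capable device on the host.
#   for f in /sys/bus/pci/devices/*/sriov_numvfs; do
#       dev=$(dirname "$f"); echo "$(basename "$dev"): $(vf_status "$dev")"
#   done
```

The created VFs should also be visible as "Virtual Function" entries in the output of lspci.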
    

#Pre-provision of Linux bridges#

Openmano relies on Linux bridges to interconnect VMs when there are no high performance requirements for I/O. This is the case of control plane VNF interfaces that are expected to carry a small amount of traffic.

A set of Linux bridges must be pre-provisioned on every host. Every Linux bridge must be attached to a physical host interface with a specific VLAN. In addition, an external switch must be used to interconnect those physical host interfaces. Bear in mind that the host interfaces used for data plane VM interfaces will be different from the host interfaces used for control plane VM interfaces.

For example, in RHEL7.0, to create a bridge associated with the physical "em1" interface, two files per bridge must be added to the /etc/sysconfig/network-scripts folder:

  • File with name ifcfg-virbrManX with the content:

       DEVICE=virbrManX
       TYPE=Bridge
       ONBOOT=yes
       DELAY=0
       NM_CONTROLLED=no
       USERCTL=no
    
  • File with name ifcfg-em1.200X (uses VLAN tag 200X), with the content:

       DEVICE=em1.200X
       ONBOOT=yes
       NM_CONTROLLED=no
       USERCTL=no
       VLAN=yes
       BOOTPROTO=none
       BRIDGE=virbrManX
    

    The name of the bridge and the VLAN tag can be different. In case you use a different name for the bridge, you should take it into account in 'openvimd.cfg'.
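
Since several bridges are usually needed, the two files can be generated by a small function. This is a sketch under assumptions: the function name write_bridge_cfg is hypothetical, the output directory is a parameter so the result can be inspected before copying it into place, and the files use the "ifcfg-" name prefix that the network scripts normally expect.

```shell
# Write the two ifcfg files for one management bridge into out_dir.
# Arguments: bridge name, physical interface, VLAN tag, output directory.
# The "ifcfg-" file name prefix is an assumption based on the usual
# network-scripts convention.
write_bridge_cfg() {
    bridge="$1"; iface="$2"; vlan="$3"
    out_dir="${4:-/etc/sysconfig/network-scripts}"
    cat > "$out_dir/ifcfg-$bridge" <<EOF
DEVICE=$bridge
TYPE=Bridge
ONBOOT=yes
DELAY=0
NM_CONTROLLED=no
USERCTL=no
EOF
    cat > "$out_dir/ifcfg-$iface.$vlan" <<EOF
DEVICE=$iface.$vlan
ONBOOT=yes
NM_CONTROLLED=no
USERCTL=no
VLAN=yes
BOOTPROTO=none
BRIDGE=$bridge
EOF
}

# Example: bridge virbrMan1 on em1 with VLAN tag 2001, written to /tmp
#   write_bridge_cfg virbrMan1 em1 2001 /tmp
```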

#Additional configuration to allow access from openvim#

  • Uncomment the following lines of /etc/libvirt/libvirtd.conf to allow external connection to libvirtd:

      unix_sock_group = "libvirt"
      unix_sock_rw_perms = "0770"
      unix_sock_dir = "/var/run/libvirt"
      auth_unix_rw = "none"
    
  • Create and configure a user for openvim access:

    • A new user must be created to access the compute node from openvim. The user must belong to group libvirt

        #creates a new user 
        useradd -m -G libvirt <user>
        #or modifies an existing user
        usermod -a -G libvirt <user>
      
    • Allow the user to obtain root privileges without a password, for example by granting this to all members of group libvirt:

        sudo visudo # add the line:   %libvirt ALL=(ALL) NOPASSWD: ALL
      
  • Copy the ssh key of openvim into compute node. From the machine where OPENVIM is running (not from the compute node), run:

      ssh-keygen  #generates the ssh keys, needed only if not done before
      ssh-copy-id <user>@<compute host>
    

    After that, ensure that you can log in from openvim to the compute host without a password prompt:

    ssh <user>@<compute host>
    
  • Configure access to image repository

    The way that openvim deals with images is a bit different from other CMS. Instead of copying the images when doing the on-boarding, openvim assumes that images are locally accessible on each compute node on a local folder, identical for all compute nodes. This does not mean that the images are forced to be copied on each compute node disk.

    Typically this is done by storing all images in a remote shared location, accessible by all compute nodes through a NAS file system, and mounting the shared folder via NFS on a local folder with an identical path on each compute node.

    VNF descriptors contain image paths pointing to a location on that folder. When doing the on-boarding, the image will be copied from the image path (accessible through NFS) to the on-boarding folder, whose configuration is described next.

  • Create a local folder for image on-boarding and grant access from openvim:

    A local folder for image on-boarding must be created on each compute node (the default configuration assumes that the folder is /opt/VNF/images). This folder must be created on a disk with enough space to store the images of the active VMs. For instance, if "/home" has more disk space than "/", the folder should be created under "/home", and a soft link to it can be created anywhere else. As an example, this is what our script for automatic installation in RHEL7.0 does:

    mkdir -p /home/<user>/VNF_images
    rm -f /opt/VNF/images
    mkdir -p /opt/VNF/
    ln -s /home/<user>/VNF_images /opt/VNF/images
    chown -R <user> /opt/VNF
    

    In addition, on a system with SELinux, access to that folder must be granted to the libvirt group.

    # SElinux management
    semanage fcontext -a -t virt_image_t "/home/<user>/VNF_images(/.*)?"
    cat /etc/selinux/targeted/contexts/files/file_contexts.local |grep virt_image
    restorecon -R -v /home/<user>/VNF_images
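
Once the steps in this section are done, the access can be checked from the machine where openvim runs. The sketch below is only illustrative: the function names are hypothetical, and the qemu+ssh connection URI is an assumption about how libvirtd is reached, so adapt it to your setup.

```shell
# Build a libvirt connection URI for a compute node reached over ssh.
libvirt_uri() {
    echo "qemu+ssh://$1@$2/system"
}

# Run from the openvim machine: checks key-based ssh, password-less sudo
# and access to libvirtd. Arguments: <user> <compute host>.
check_compute_node() {
    user="$1"; host="$2"
    ssh -o BatchMode=yes "$user@$host" true \
        || { echo "ssh key access failed"; return 1; }
    ssh -o BatchMode=yes "$user@$host" sudo -n true \
        || { echo "password-less sudo failed"; return 1; }
    virsh -c "$(libvirt_uri "$user" "$host")" list --all > /dev/null \
        || { echo "libvirt access failed"; return 1; }
    echo "compute node $host is reachable"
}
```

BatchMode makes ssh fail instead of prompting for a password, so a missing key shows up as an error rather than as a prompt.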
    

#Compute node configuration in special cases#

##Datacenter with different types of compute nodes##

In a datacenter with different types of compute nodes, it might happen that compute nodes use different interface naming schemes. In that case, you can take the most used interface naming scheme as the default one, and make an additional configuration in the compute nodes that do not follow the default naming scheme.

In order to do that, create a hostinfo.yaml file inside the local image folder (typically /opt/VNF/images). It contains entries of the form:

openvim-expected-name: local-iface-name

For example, if openvim contains a network using macvtap to the physical interface em1 (macvtap:em1) but in this compute node the interface is called eth1, create a local-image-folder/hostinfo.yaml file with this content:

em1: eth1
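
When several interfaces differ, the entries can be appended one by one. This is a minimal sketch: the function name add_iface_mapping is hypothetical, and the folder argument defaults to the typical image folder mentioned above.

```shell
# Append one "openvim-expected-name: local-iface-name" entry to
# hostinfo.yaml in the image folder (default /opt/VNF/images).
add_iface_mapping() {
    expected="$1"; local_name="$2"
    folder="${3:-/opt/VNF/images}"
    echo "$expected: $local_name" >> "$folder/hostinfo.yaml"
}

# Example: openvim expects em1, but this node calls the interface eth1
#   add_iface_mapping em1 eth1
```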

##Configure compute node in 'developer' mode##

In order to test a VNF, it is not really required to have a full NFV environment with 10G data plane interfaces and Openflow switches. If the VNF is able to run with virtio interfaces, you can configure a compute node in a simpler way and use the 'developer mode' in openvim. In that mode, during the instantiation phase, VMs are deployed without hugepages and with all data plane interfaces changed to virtio interfaces. Note that the openmano descriptors remain unchanged; openvim performs the translation during the instantiation phase.

The configuration of a compute node to be used in 'developer mode' removes the configuration that is not needed for testing purposes, that is:

  • IOMMU configuration is not required since no passthrough or SR-IOV interfaces will be used
  • Huge pages configuration is unnecessary. All memory will be assigned in 4KB pages, allowing oversubscription (as in traditional clouds).
  • No configuration of data plane interfaces (e.g. SR-IOV) is required.

A VNF developer will typically use the developer mode to test a VNF on their own computer. Although part of the configuration is not required, the rest of the compute node configuration is still necessary. To prepare your own computer, or a separate one, as a compute node for development purposes, you can use the configure-compute-node-develop.sh script.

In order to execute the script, just run this command:

sudo ./configure-compute-node-develop.sh <user> <iface>