# Tips for Large Deployments with CF Networking and Silk Release

Some users have larger deployments than we regularly test with. We have heard of large deployments with 500-1000 diego cells. These deployments have specific considerations that smaller deployments don't need to worry about.

Please submit a PR or create an issue if you have come across other large deployment considerations.

## Problem 0: Default overlay IP CIDR block too small when there are 250+ diego cells

### Symptoms

The silk daemon on some diego cells fails because it cannot get a lease.

### Solution

Increase the size of the `silk-controller.network` CIDR in the silk controller spec. With the default `/16` overlay network and a `/24` subnet lease per cell, only about 255 leases are available, which is why this problem shows up around 250 cells.
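As a sketch (not a drop-in change), an ops file along these lines widens the overlay network. The instance group name and property path are assumptions based on a cf-deployment-style manifest, so adjust them to match your deployment:

```yaml
# Sketch: widen the overlay network so more cells can obtain a /24 lease.
# Assumes silk-controller is colocated on the diego-api instance group;
# the new CIDR must not overlap any network your deployment already uses.
- type: replace
  path: /instance_groups/name=diego-api/jobs/name=silk-controller/properties/network?
  value: 10.240.0.0/12
```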

## Problem 1: Silk Daemon uses too much CPU

### Symptoms

The silk daemon begins using too much CPU on the cells. This causes the app health checks to fail, which causes apps to be evacuated from the cell.

### Reason

The silk daemon is deployed on every cell. It is in charge of getting the IP leases for every other cell from the silk controller. The silk daemon calls out to the silk controller every 5 seconds (by default) to get updated lease information. Every time it gets new information, the silk daemon makes Linux system calls to set up the networking. This is relatively slow and gets expensive when there are a lot of cells with new leases, which causes the silk daemons to use a lot of CPU.

### Solution

Change the property `lease_poll_interval_seconds` on the silk-daemon job to be greater than 5 seconds. This will cause the silk-daemon to poll the silk-controller less frequently and thus make Linux system calls less frequently. However, increasing this property means that when a cell gets a new lease (this happens when a cell is rolled, recreated, or for whatever reason doesn't renew its lease properly), it will take longer for the other cells to learn how to route container-to-container traffic to it. To start with, we suggest setting this property to 300 seconds (5 minutes). Then you can tweak accordingly.
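For example, an ops file along these lines raises the interval to 5 minutes. The instance group name is an assumption (silk-daemon typically runs on the diego-cell instance group), so adjust the path to match your manifest:

```yaml
# Sketch: poll the silk-controller every 300 seconds instead of every 5.
- type: replace
  path: /instance_groups/name=diego-cell/jobs/name=silk-daemon/properties/lease_poll_interval_seconds?
  value: 300
```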

## Problem 2: ARP Cache on diego-cell not large enough

Github issue

### Symptoms

Silk daemon fails to converge leases. Errors in the silk-daemon logs might look like this:

```json
{
   "timestamp": "TIME",
   "source": "cfnetworking.silk-daemon",
   "message": "cfnetworking.silk-daemon.poll-cycle",
   "log_level": 2,
   "data": {
      "error": "converge leases: del neigh with ip/hwaddr 10.255.21.2 : no such file or directory"
   }
}
```

Also kernel logs might look like this:

```
neighbour: arp_cache: neighbor table overflow
```

### Reason

The ARP cache on the diego cell is not large enough to handle the number of entries the silk-daemon is trying to write.

### Solution

Increase the ARP cache size on the diego cells.

1. Look at the current size of your ARP cache.

   - ssh onto a diego-cell and become root
   - inspect the following kernel variables:

     ```
     sysctl net.ipv4.neigh.default.gc_thresh1
     sysctl net.ipv4.neigh.default.gc_thresh2
     sysctl net.ipv4.neigh.default.gc_thresh3
     ```
2. Manually increase the ARP cache size on the cell. This is good for fixing the issue in the moment, but isn't a good long-term solution because the values will be reset when the cell is recreated.

   - set new, larger values for the kernel variables. These sizes were used successfully for a deployment of ~800 cells:

     ```
     sudo sysctl -w net.ipv4.neigh.default.gc_thresh3=8192
     sudo sysctl -w net.ipv4.neigh.default.gc_thresh2=4096
     sudo sysctl -w net.ipv4.neigh.default.gc_thresh1=2048
     ```
3. For a more permanent solution, set these variables by adding the sysctl job from os-conf-release to the diego-cell instance group. A conf file will be autogenerated at /etc/sysctl.d/71-bosh-os-conf-sysctl.conf.

   - the manifest changes will look similar to this (an ops-file variant is sketched after this list):

     ```yaml
     instance_groups:
     - name: diego-cell
       jobs:
       - name: sysctl
         properties:
           sysctl:
           - net.ipv4.neigh.default.gc_thresh3=8192
           - net.ipv4.neigh.default.gc_thresh2=4096
           - net.ipv4.neigh.default.gc_thresh1=2048
         release: os-conf

     ...

     releases:
     - name: "os-conf"
       version: "20.0.0"
       url: "https://bosh.io/d/github.com/cloudfoundry/os-conf-release?v=20.0.0"
       sha1: "a60187f038d45e2886db9df82b72a9ab5fdcc49d"
     ```
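If you manage your deployment with ops files rather than by editing the manifest directly, the same change can be sketched like this. The release version and sha1 are copied from the example above, and the instance group name may differ in your deployment:

```yaml
# Sketch: add os-conf-release and its sysctl job to the diego-cell instance group.
- type: replace
  path: /releases/name=os-conf?
  value:
    name: os-conf
    version: "20.0.0"
    url: https://bosh.io/d/github.com/cloudfoundry/os-conf-release?v=20.0.0
    sha1: a60187f038d45e2886db9df82b72a9ab5fdcc49d
- type: replace
  path: /instance_groups/name=diego-cell/jobs/-
  value:
    name: sysctl
    release: os-conf
    properties:
      sysctl:
      - net.ipv4.neigh.default.gc_thresh3=8192
      - net.ipv4.neigh.default.gc_thresh2=4096
      - net.ipv4.neigh.default.gc_thresh1=2048
```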

## Problem 3: Reaching the Upper Limit of Network Policies

To our knowledge, no one has actually run into this problem, even in the largest of deployments. However, our team is asked about this, so it seems important to cover it.

### Summary

The quick answer is that you are limited to 65,535 apps used in network policies. Since each policy uses at most two apps, this works out to at least 32,767 network policies.

### Reason

Container networking policies are implemented using Linux packet marks. Each source and destination app in a networking policy is assigned a mark at policy creation time. If the source or destination app already has a mark assigned to it from a different policy, then the app reuses that mark and does not get a new one. The overlay network for container networking uses VXLAN, which limits the marks to 16 bits. With 16 bits there are 2^16 (or 65,536) distinct values for marks. The first mark is reserved and not given to apps, which leaves 65,535 marks available for apps.

### Scenario 1 - policies with no overlapping apps

Let's imagine that there are 65,535 different apps. A user could create 32,767 network policies from appA --> appB, where each appA and appB is only ever used in ONE network policy. Each of the 32,767 policies includes two apps (the source and the destination) and each of those apps needs a mark. This would result in 65,534 marks, which effectively reaches the upper limit on network policies.
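Worked out, the arithmetic in this scenario is:

$$32{,}767 \text{ policies} \times 2 \text{ apps per policy} = 65{,}534 \text{ marks}$$

which is one short of the 65,535 marks available.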

### Scenario 2 - policies with overlapping apps

Let's imagine that there are 5 apps. Let's say a user wants every app to be able to talk to every app, including itself. This would result in 25 network policies. However, this would only use up 5 marks (one per app), leaving 65,530 marks available for other apps. This scenario shows that the more the policies "overlap" in apps, the more policies you can have.