
Deploying bootstrap node failing with error "Jinja variable 'None' has no attribute 'get'" #2137

Closed
ShashiKanthGitHub opened this issue Dec 14, 2019 · 18 comments
Labels: kind:bug (Something isn't working), priority:low (Low priority issues that can be safely postponed if needed)

Comments

@ShashiKanthGitHub

ShashiKanthGitHub commented Dec 14, 2019

Component:

Deploying bootstrap node (version 2.4.2)

What happened:

bootstrap.sh fails at the step "Deploying bootstrap node (this may take a while)" with the following error:

==
Command: crictl exec -i 3ff605264fb242cea6adf0e9dd9ad4df378fa8b41f836d578fd4359bd0eee696 salt-run --state-output=mixed state.orchestrate metalk8s.orchestrate.bootstrap saltenv=metalk8s-2.4.2-dev pillar={   'bootstrap_id': 'metalk8snode1.swinfra.net' }

Output:

<< BEGIN >>
[ERROR   ] {u'ret': {u'metalk8snode1.swinfra.net': [u"Rendering SLS 'metalk8s-2.4.2-dev:metalk8s.kubernetes.sa.advertised' failed: Jinja variable 'None' has no attribute 'get'", u'Rendering SLS \'metalk8s-2.4.2-dev:metalk8s.kubernetes.apiserver.cryptconfig\' failed: Jinja error: argument of type \'NoneType\' is not iterable\nTraceback (most recent call last):\n  File "/usr/lib/python2.7/site-packages/salt/utils/templates.py", line 393, in render_jinja_tmpl\n    output = template.render(**decoded_context)\n  File "/usr/lib/python2.7/site-packages/jinja2/environment.py", line 969, in render\n    return self.environment.handle_exception(exc_info, True)\n  File "/usr/lib/python2.7/site-packages/jinja2/environment.py", line 742, in handle_exception\n    reraise(exc_type, exc_value, tb)\n  File "<template>", line 14, in top-level template code\nTypeError: argument of type \'NoneType\' is not iterable\n\n; line 14\n\n---\n[...]\n  - .installed\n\n{% set encryption_source_path = \'/etc/metalk8s/crypt/apiserver.key\' %}\n{% set encryption_k8s_path = \'/etc/kubernetes/encryption.conf\' %}\n\n{% if \'apiserver_key\' not in pillar.metalk8s.private %}    <======================\n\n{% set encryption_key = salt[\'random.get_str\'](32) | base64_encode %}\n\nCreate encryption configuration from scratch:\n  file.managed:\n[...]\n---']}, u'out': u'highstate'}
metalk8snode1.swinfra.net_master:
  Name: saltutil.sync_all - Function: salt.runner - Result: Changed Started: - 13:50:09.204248 Duration: 4322.749 ms
  Name: saltutil.sync_roster - Function: salt.runner - Result: Changed Started: - 13:50:13.527237 Duration: 3798.591 ms
  Name: metalk8s_saltutil.sync_auth - Function: salt.runner - Result: Changed Started: - 13:50:17.326087 Duration: 3787.038 ms
  Name: saltutil.sync_all - Function: salt.function - Result: Changed Started: - 13:50:21.113388 Duration: 806.679 ms
  Name: Deploy CA role on bootstrap minion - Function: salt.state - Result: Changed Started: - 13:50:21.920804 Duration: 15822.66 ms
----------
          ID: Bring bootstrap minion to highstate
    Function: salt.state
      Result: False
     Comment: Run failed on minions: metalk8snode1.swinfra.net
     Started: 13:50:37.744644
    Duration: 12979.987 ms
     Changes:   
              metalk8snode1.swinfra.net:
                  Data failed to compile:
              ----------
                  Rendering SLS 'metalk8s-2.4.2-dev:metalk8s.kubernetes.sa.advertised' failed: Jinja variable 'None' has no attribute 'get'
              ----------
                  Rendering SLS 'metalk8s-2.4.2-dev:metalk8s.kubernetes.apiserver.cryptconfig' failed: Jinja error: argument of type 'NoneType' is not iterable
              Traceback (most recent call last):
                File "/usr/lib/python2.7/site-packages/salt/utils/templates.py", line 393, in render_jinja_tmpl
                  output = template.render(**decoded_context)
                File "/usr/lib/python2.7/site-packages/jinja2/environment.py", line 969, in render
                  return self.environment.handle_exception(exc_info, True)
                File "/usr/lib/python2.7/site-packages/jinja2/environment.py", line 742, in handle_exception
                  reraise(exc_type, exc_value, tb)
                File "<template>", line 14, in top-level template code
              TypeError: argument of type 'NoneType' is not iterable
              
              ; line 14
              
              ---
              [...]
                - .installed
              
              {% set encryption_source_path = '/etc/metalk8s/crypt/apiserver.key' %}
              {% set encryption_k8s_path = '/etc/kubernetes/encryption.conf' %}
              
              {% if 'apiserver_key' not in pillar.metalk8s.private %}    <======================
              
              {% set encryption_key = salt['random.get_str'](32) | base64_encode %}
              
              Create encryption configuration from scratch:
                file.managed:
              [...]
    ==

What was expected:

The /srv/scality/metalk8s-2.4.2-dev/bootstrap.sh script should complete successfully.

Steps to reproduce

N/A.

Resolution proposal (optional):


Workaround:

  1. Install kubectl on the bootstrap node:
yum install -y kubectl
  2. Add the master role on the bootstrap node (replace <node_name> with the bootstrap node name):
kubectl --kubeconfig=/etc/kubernetes/admin.conf label node <node_name> node-role.kubernetes.io/master=""
  3. Launch the bootstrap process again.
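Optionally, you can confirm the label was applied before re-running bootstrap (a standard kubectl check, not part of the original workaround):

kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes --show-labels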

Note by @NicolasT: edited report to put command output in a literal block for readability
Note by @TeddyAndrieux: added a workaround

@ShashiKanthGitHub added the kind:bug and moonshot labels on Dec 14, 2019
@ShashiKanthGitHub
Author

Please propose a solution for this problem. Thanks in advance.

@NicolasT
Contributor

From 2.4.2-dev, we know you're not using a released version of MetalK8s but a development snapshot of the development/2.4 branch. Could you let us know the exact version (e.g., as found in product.txt on the resulting ISO) you're using? Mainly looking for the value of the GIT variable.

@NicolasT
Contributor

One thing you could definitely try is simply re-running the bootstrap script, which could succeed.

Not sure where this error is coming from, though... It appears the metalk8s value in the Pillar has no private field. Someone in @scality/metalk8s may have an idea why that could be the case?

@ShashiKanthGitHub
Author

ShashiKanthGitHub commented Dec 15, 2019

# cat /srv/scality/metalk8s-2.4.2-dev/product.txt 
NAME=MetalK8s
VERSION=2.4.2-dev
SHORT_VERSION=2.4
GIT=2.4.1-49-g198a747
DEVELOPMENT_RELEASE=1
BUILD_TIMESTAMP=2019-11-09T02:18:56Z
BUILD_HOST=localhost.localdomain

I actually had a successful install of the same ISO a month ago, but due to Zenko installation failures I had to re-install the cluster. This time, I don't understand why I am getting this error.

I ran the bootstrap script several times, and it fails with the same error each time.

@gdemonet
Contributor

From your logs, Deploy CA role worked, which means the private key data should exist.

The error you get is not that the metalk8s:private pillar was not set, but that it was explicitly set to None, which the code doesn't handle properly yet.
Something is wrong with the default value we set in the metalk8s_private ext_pillar:

# salt/_pillar/metalk8s_private.py L59-68
    private_data = {'private': None}

    if "master" in node_info["roles"]:
        data = {}
        data.update(_read_sa_private_key())
        data.update(_read_apiserver_key())
        private_data['private'] = data
        __utils__['pillar_utils.promote_errors'](private_data, 'private')

    result = {"metalk8s": private_data}

Now, there should be no reason for this problem to occur during bootstrap: we force the Bootstrap node's roles to include master during metalk8s.orchestrate.bootstrap, so the private pillar should at least contain some data. So, not sure exactly why, but something prevents the metalk8s_private ext_pillar from being computed/updated for your Bootstrap minion.
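For reference, both rendering errors are exactly what plain Python raises when this value is None; a minimal standalone illustration (not MetalK8s code):

# What the two failing Jinja expressions boil down to when
# pillar.metalk8s.private is None instead of a dict
private = None

try:
    private.get('apiserver_key')     # pillar.metalk8s.private.get(...)
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'get'

try:
    'apiserver_key' not in private   # {% if 'apiserver_key' not in ... %}
except TypeError as exc:
    print(exc)  # argument of type 'NoneType' is not iterable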

What was the state of the platform before the initial Bootstrap, @shashscality?

@ShashiKanthGitHub
Author

I followed very straightforward steps.

  1. Built the MetalK8s ISO as per the documentation.
  2. Freshly created 5 VMs with CentOS 7.6, with SELinux and the firewall disabled.
  3. Mounted the MetalK8s ISO on the 1st VM (the bootstrap node).
  4. Created the file /etc/metalk8s/bootstrap.yaml as per the documentation and edited the required parameters.
  5. Ran the bootstrap.sh script, which failed with the given error.

I have attached the complete bootstrap.log file.
bootstrap.log

@NicolasT
Contributor

Can you provide the output of

sudo salt-call pillar.items metalk8s:nodes metalk8s:private

(executed on the bootstrap node after the bootstrap script failed)?

Feel free to remove the 'secret' strings from the metalk8s:private value (if anything is in there, of course...)

@ShashiKanthGitHub
Author

ShashiKanthGitHub commented Dec 15, 2019

[root@metalk8snode1 ~]# salt-call pillar.items metalk8s:nodes metalk8s:private
local:
    ----------
    metalk8s:nodes:
        ----------
        metalk8snode1.swinfra.net:
            ----------
            roles:
                - ca
            version:
                None
    metalk8s:private:
        None

@NicolasT
Contributor

That's not at all what we expect while bootstrapping: the roles of the node should include master, etcd, infra and bootstrap as well, not just ca.
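For comparison, a healthy bootstrap should report something like this instead (a sketch based on the roles listed above; the exact version value will vary):

local:
    ----------
    metalk8s:nodes:
        ----------
        metalk8snode1.swinfra.net:
            ----------
            roles:
                - bootstrap
                - master
                - etcd
                - infra
                - ca
            version:
                2.4.2-dev

and metalk8s:private would then be populated with the private key data rather than None.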

We'll need to look into what's causing this...

Also, please use literal blocks when providing command output and such 😉

@NicolasT
Contributor

@shashscality We've been looking into this, and we tried to reproduce the issue, but have been unable to.

Is there any other information you could provide related to your environment? Anything which could help our investigation? If we can think of any information that would help, we'll ask 😄

@ShashiKanthGitHub
Author

@NicolasT I tried a fresh installation again and ended up with the same error. I don't understand where I am making a mistake. When I run the bootstrap.sh script for the very first time, I get some package-related errors, and I tried installing those packages manually with the "yum install" command. Once all the package-related errors disappear, the next invocation of the bootstrap.sh script fails with the reported error.

I have attached the bootstrap.log file of my attempt today.
bootstrap.log

@NicolasT
Contributor

Hello @shashscality,

Thanks for getting back. It looks like initially you had some misconfiguration in /etc/metalk8s/bootstrap.yaml (missing the apiServer configuration, which is no longer required in the latest versions of the development/2.4 branch, but is required in the version you're likely using), then indeed some repositories were not available (we expect the 'standard' CentOS/RHEL repositories to be available on the system, either online or through some local mirror).

Finally, the same error appears again, though it's still unclear why that's the case. One thing that could help debugging is looking into the Salt master logs, which you can find in /var/log/containers/salt-master-NODE_ID_kube-system_salt-master-RANDOM_UUID.log (note the placeholders in the filename, I can't know what those will be on your system).

Meanwhile, we're still looking into what could be going on. Are you sure the ISO you're using has been built in a pristine environment?

@ShashiKanthGitHub
Author

@NicolasT I have attached salt master logs. Please let me know your observations.
salt-master.log.tar.gz

@ShashiKanthGitHub
Author

I tried with 2.4.1 too, and it ended with the same error. Can someone please help me solve this issue?
I have attached all the logs (/var/log/pods/, /var/log/metalk8s/, /var/log/containers/, /var/log/messages).
logs.tar.gz

@NicolasT
Contributor

NicolasT commented Jan 8, 2020

Hello @shashscality! I looked into those logs but could not see anything that provides more insight 🤔 Asking the team to take another look as well...

Were you able to make any progress on your side?

@ShashiKanthGitHub
Author

@NicolasT, I was able to make some progress: I could install MetalK8s on a 5-node cluster. If I build the ISO image on the bootstrap node itself, I observe that bootstrap.sh goes fine, but if I build the ISO image on some other node with the same OS and configuration, it fails with the reported error.

But anyway, I have many different problems now after the installation. Could you please guide me on the points below?

  1. The default Pod and Service networks used in MetalK8s (10.233.0.0/16 and 10.96.0.0/12) conflict with my office network. I can't find in the documentation how to change them or where I need to declare them.

  2. After the installation, some of the Pods are in ContainerCreating or Unknown states. The common error reported is "Failed create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "be9ccbb2e199b67d39b02925148d34ca64035fe8ce03b19a57fe2219f63773b4": error getting ClusterInformation: Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: proxyconnect tcp: dial tcp: lookup http on <My_Gateway_IP>:53: no such host". I don't understand how to solve this error.

@NicolasT
Contributor

NicolasT commented Jan 10, 2020

I was able to make some progress: I could install MetalK8s on a 5-node cluster.

That's good news!

If I build the ISO image on the bootstrap node itself, I observe that bootstrap.sh goes fine, but if I build the ISO image on some other node with the same OS and configuration, it fails with the reported error.

That's really strange and rather unexpected: we run builds on various developer machines (including Fedora, Ubuntu, Arch Linux, ...) as well as in CI (CentOS 7), and as far as I'm aware the build system works fine on all of them. Is there anything particular about your setup?

In any case, Scality customers and partners can receive pre-built ISOs which are fully tested, validated and supported by Scality. As such, if you're either a customer or a partner, please reach out to your Scality representative to get access to these builds!

But anyway, I have many different problems now after the installation. Could you please guide me on the points below?

  1. The default Pod and Service networks used in MetalK8s (10.233.0.0/16 and 10.96.0.0/12) conflict with my office network. I can't find in the documentation how to change them or where I need to declare them.

These can be defined in the BootstrapConfiguration in /etc/metalk8s/bootstrap.yaml, under the networks section (similar to where controlPlane and workloadPlane are defined): the relevant keys are pods and services. Indeed, this is missing from the documentation. The values of these keys should be in the ipaddress/cidr format, so the defaults are 10.233.0.0/16 for pods and 10.96.0.0/12 for services.
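For example, a sketch of that section (the controlPlane and workloadPlane values below are placeholders for your own networks; pods and services show the defaults you would override):

# /etc/metalk8s/bootstrap.yaml (excerpt)
networks:
  controlPlane: 192.168.1.0/24   # placeholder: your control plane CIDR
  workloadPlane: 192.168.2.0/24  # placeholder: your workload plane CIDR
  pods: 10.233.0.0/16            # default Pod network; change to avoid conflicts
  services: 10.96.0.0/12         # default Service network; change to avoid conflicts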

  2. After the installation, some of the Pods are in ContainerCreating or Unknown states. The common error reported is "Failed create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "be9ccbb2e199b67d39b02925148d34ca64035fe8ce03b19a57fe2219f63773b4": error getting ClusterInformation: Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: proxyconnect tcp: dial tcp: lookup http on <My_Gateway_IP>:53: no such host". I don't understand how to solve this error.

Since you have overlap between the service network as used by MetalK8s and a 'real' network, this may be the cause of this issue: Calico is trying to access the Kubernetes API server through its Service address (10.96.0.1, by default), and this fails. As such, Calico can't do its job, and since the network can't be configured by CNI (Calico), containers aren't started properly.

@thomasdanan added the priority:low label on Apr 6, 2020
@TeddyAndrieux
Collaborator

Just added a workaround in the ticket description.

FYI: you get this error if the first run of the bootstrap deployment failed after the salt-master was deployed with the external pillar declared in it, but before the master role was set on the bootstrap node.
The apiserver Salt states then fail because private from the ext_pillar is None:

# salt/_pillar/metalk8s_private.py L59-68
    private_data = {'private': None}

    if "master" in node_info["roles"]:
        data = {}
        data.update(_read_sa_private_key())
        data.update(_read_apiserver_key())
        private_data['private'] = data
        __utils__['pillar_utils.promote_errors'](private_data, 'private')

    result = {"metalk8s": private_data}

So, as a workaround, manually adding the master role label fixes this issue.
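For illustration, one way the ext_pillar could be hardened so the Jinja membership tests degrade gracefully; a sketch only (an assumption, not the fix actually shipped in MetalK8s):

# Hypothetical hardening sketch, NOT the actual MetalK8s change:
# defaulting 'private' to an empty dict means Jinja tests like
# {% if 'apiserver_key' not in pillar.metalk8s.private %} no longer
# raise "argument of type 'NoneType' is not iterable".
    private_data = {'private': {}}

    if "master" in node_info["roles"]:
        data = {}
        data.update(_read_sa_private_key())
        data.update(_read_apiserver_key())
        private_data['private'] = data
        __utils__['pillar_utils.promote_errors'](private_data, 'private')

    result = {"metalk8s": private_data}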
