OKD 4.6: Missing /etc/resolv.conf when using static ip configuration #380

Closed
timbrd opened this issue Nov 20, 2020 · 38 comments

@timbrd

timbrd commented Nov 20, 2020

Hello,

I am currently testing the okd nightly release 4.6.0-0.okd-2020-11-18-085718.
Since I would like to use a static IP configuration, I added the following kernel arguments (kargs) to the bootstrap node:

ip=10.0.54.101::10.0.54.1:255.255.255.0:okd-infratest-bootstrap:ens192:none nameserver=10.0.54.98
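The `ip=` karg above follows the dracut form `ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<client_hostname>:<interface>:<autoconf>`, with the DNS server passed separately as `nameserver=`. A minimal Python sketch that splits such a command line into named fields (the function names are illustrative and not part of any OKD or dracut tool; bracketed IPv6 addresses are not handled):

```python
# Sketch: split a dracut-style static ip= karg into named fields.
# Field order assumed from dracut.cmdline(7); IPv6 in brackets is not handled.
FIELDS = ["client_ip", "peer", "gateway", "netmask", "hostname", "interface", "autoconf"]


def parse_ip_karg(arg: str) -> dict:
    if not arg.startswith("ip="):
        raise ValueError("not an ip= argument")
    parts = arg[len("ip="):].split(":")
    if len(parts) < len(FIELDS):
        raise ValueError(f"expected at least {len(FIELDS)} fields, got {len(parts)}")
    fields = dict(zip(FIELDS, parts))
    # Any further optional fields are returned raw and not interpreted.
    fields["extra"] = parts[len(FIELDS):]
    return fields


def parse_cmdline(cmdline: str) -> dict:
    """Collect the ip= and nameserver= arguments from a kernel command line."""
    result = {"ip": None, "nameservers": []}
    for token in cmdline.split():
        if token.startswith("ip="):
            result["ip"] = parse_ip_karg(token)
        elif token.startswith("nameserver="):
            result["nameservers"].append(token.split("=", 1)[1])
    return result


if __name__ == "__main__":
    line = ("ip=10.0.54.101::10.0.54.1:255.255.255.0:okd-infratest-bootstrap:ens192:none "
            "nameserver=10.0.54.98")
    print(parse_cmdline(line))
```

The empty second field is the unused peer address; `none` means no DHCP or other autoconfiguration on `ens192`.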

Everything worked fine at first, but after the first reboot systemd-resolved no longer started:

[root@okd-infratest-bootstrap ~]# systemctl status systemd-resolved
● systemd-resolved.service - Network Name Resolution
     Loaded: loaded (/usr/lib/systemd/system/systemd-resolved.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/systemd-resolved.service.d
             └─disabled.conf
     Active: inactive (dead)
  Condition: start condition failed at Fri 2020-11-20 12:11:46 UTC; 11min ago
             └─ ConditionPathExists=/enoent was not met
       Docs: man:systemd-resolved.service(8)
             https://www.freedesktop.org/wiki/Software/systemd/resolved
             https://www.freedesktop.org/wiki/Software/systemd/writing-network-configuration-managers
             https://www.freedesktop.org/wiki/Software/systemd/writing-resolver-clients

Nov 20 12:11:45 okd-infratest-bootstrap systemd[1]: Condition check resulted in Network Name Resolution being skipped.
Nov 20 12:11:46 okd-infratest-bootstrap systemd[1]: Condition check resulted in Network Name Resolution being skipped.

The /run/systemd/resolve directory, which /etc/resolv.conf links to, is also missing:

[root@okd-infratest-bootstrap ~]# ls -l /etc/resolv.conf
lrwxrwxrwx. 1 root root 39 Nov 20 12:11 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf

[root@okd-infratest-bootstrap ~]# ls -l /run/systemd/resolve
ls: cannot access '/run/systemd/resolve': No such file or directory
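A quick way to tell a dangling symlink apart from a regular or missing file: `os.path.islink` inspects the link itself, while `os.path.exists` follows it and returns False when the target is gone. A small Python sketch (the `classify_resolv_conf` helper is hypothetical):

```python
import os
import tempfile


def classify_resolv_conf(path: str = "/etc/resolv.conf") -> str:
    """Return 'symlink', 'dangling-symlink', 'regular', 'other' or 'missing'."""
    if os.path.islink(path):
        # exists() follows the link, so it is False when the target is missing.
        return "symlink" if os.path.exists(path) else "dangling-symlink"
    if os.path.isfile(path):
        return "regular"
    if os.path.exists(path):
        return "other"  # e.g. a directory
    return "missing"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        link = os.path.join(d, "resolv.conf")
        # Relative target, like ../run/systemd/resolve/stub-resolv.conf on the host.
        os.symlink("run/stub-resolv.conf", link)
        print(classify_resolv_conf(link))  # dangling-symlink
        os.mkdir(os.path.join(d, "run"))
        open(os.path.join(d, "run", "stub-resolv.conf"), "w").close()
        print(classify_resolv_conf(link))  # symlink
    print(classify_resolv_conf())  # state of the machine running this script
```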

Name resolution itself still works (the bootkube service can download the required container images), but the containers expect the host's resolv.conf to be mounted:

Nov 20 12:27:42 okd-infratest-bootstrap podman[243792]: 2020-11-20 12:27:42.797734916 +0000 UTC m=+0.135549293 container create 772351e721c044e494e8e062930c5aaba735e3efc500a4a136aeb024c02f426a (image=registry.svc.ci.openshift.org/origin/release@sha256:17d23274c22e8c8f75f77e5cf1ca00259a929bec2c0f81b47267cf06efd74775, name=nostalgic_sanderson)
Nov 20 12:27:42 okd-infratest-bootstrap bootkube.sh[243792]: Error: error creating resolv.conf for container 772351e721c044e494e8e062930c5aaba735e3efc500a4a136aeb024c02f426a: lstat /etc/../run/systemd/resolve: no such file or directory
Nov 20 12:27:42 okd-infratest-bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=127/n/a
Nov 20 12:27:42 okd-infratest-bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.
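The odd-looking `/etc/../run/systemd/resolve` path in the podman error appears to be the symlink's relative target joined onto `/etc`, the directory that contains the link. A Python sketch of that path arithmetic (it only illustrates the paths involved and is not podman's code):

```python
import os


def symlink_target_as_seen(link: str, target: str) -> tuple:
    """Join a relative symlink target onto the link's directory.

    Returns (joined, normalized): joined keeps the '..' component, the same
    shape as the path in the podman error; normalized is where it points.
    """
    joined = os.path.join(os.path.dirname(link), target)
    # normpath is purely lexical; fine here because /etc is not itself a symlink.
    return joined, os.path.normpath(joined)


if __name__ == "__main__":
    joined, normalized = symlink_target_as_seen(
        "/etc/resolv.conf", "../run/systemd/resolve/stub-resolv.conf")
    print(joined)      # /etc/../run/systemd/resolve/stub-resolv.conf
    print(normalized)  # /run/systemd/resolve/stub-resolv.conf
```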

I'm not sure if this is a bug or if I am missing something.

Thanks
Tim

@timbrd timbrd changed the title Missing /etc/resolv.conf when using static ip configuration OKD 4.6: Missing /etc/resolv.conf when using static ip configuration Nov 20, 2020
@vrutkovs vrutkovs added the triage/needs-information Indicates an issue needs more information in order to work on it. label Nov 20, 2020
@vrutkovs
Member

systemd-resolved is expected to be disabled (it breaks hostname resolution later).

Please provide a log bundle.

@timbrd
Author

timbrd commented Nov 20, 2020

I wasn't able to use oc gather bootstrap, because it somehow could not connect to the nodes via SSH. Instead, I tried to gather all the logs manually.

The bootkube.service logs are attached. All the container logs are empty, and the kubelet and crio logs do not contain anything interesting:

[root@ocp-infratest-bootstrap ~]# journalctl -b -f -u kubelet.service -u crio.service
-- Logs begin at Fri 2020-11-20 13:39:03 UTC. --
Nov 20 15:33:51 ocp-infratest-bootstrap hyperkube[1534]: I1120 15:33:51.154846    1534 kubelet_node_status.go:526] Recording NodeHasNoDiskPressure event message for node ocp-infratest-bootstrap
Nov 20 15:33:51 ocp-infratest-bootstrap hyperkube[1534]: I1120 15:33:51.154868    1534 kubelet_node_status.go:526] Recording NodeHasSufficientPID event message for node ocp-infratest-bootstrap
Nov 20 15:34:01 ocp-infratest-bootstrap hyperkube[1534]: I1120 15:34:01.172314    1534 kubelet_node_status.go:334] Setting node annotation to enable volume controller attach/detach
Nov 20 15:34:01 ocp-infratest-bootstrap hyperkube[1534]: I1120 15:34:01.206871    1534 kubelet_node_status.go:526] Recording NodeHasSufficientMemory event message for node ocp-infratest-bootstrap
Nov 20 15:34:01 ocp-infratest-bootstrap hyperkube[1534]: I1120 15:34:01.207231    1534 kubelet_node_status.go:526] Recording NodeHasNoDiskPressure event message for node ocp-infratest-bootstrap
Nov 20 15:34:01 ocp-infratest-bootstrap hyperkube[1534]: I1120 15:34:01.207262    1534 kubelet_node_status.go:526] Recording NodeHasSufficientPID event message for node ocp-infratest-bootstrap
Nov 20 15:34:11 ocp-infratest-bootstrap hyperkube[1534]: I1120 15:34:11.225452    1534 kubelet_node_status.go:334] Setting node annotation to enable volume controller attach/detach
Nov 20 15:34:11 ocp-infratest-bootstrap hyperkube[1534]: I1120 15:34:11.257014    1534 kubelet_node_status.go:526] Recording NodeHasSufficientMemory event message for node ocp-infratest-bootstrap
Nov 20 15:34:11 ocp-infratest-bootstrap hyperkube[1534]: I1120 15:34:11.257063    1534 kubelet_node_status.go:526] Recording NodeHasNoDiskPressure event message for node ocp-infratest-bootstrap
Nov 20 15:34:11 ocp-infratest-bootstrap hyperkube[1534]: I1120 15:34:11.257076    1534 kubelet_node_status.go:526] Recording NodeHasSufficientPID event message for node ocp-infratest-bootstrap

It seems to break at the very beginning of the bootstrap process.

@vrutkovs
Member

That's odd. Could you attach the whole journalctl output for this boot? I expected coreos-migrate-to-systemd-resolved to be disabled and NetworkManager to set up the DNS nameserver from the kernel args.

Perhaps some of these settings are overwritten on the target system?

@timbrd
Author

timbrd commented Nov 20, 2020

I have attached the whole journalctl output: boot.log

By the way, I recreated the cluster from scratch beforehand and checked whether the settings you mentioned differ, but they don't:

[root@ocp-infratest-bootstrap /]# cat etc/systemd/system/systemd-resolved.service.d/disabled.conf
[Unit]
ConditionPathExists=/enoent

[root@ocp-infratest-bootstrap /]# cat etc/NetworkManager/conf.d/dns.conf
[main]
dns=default

[root@ocp-infratest-bootstrap /]# cat etc/systemd/system/coreos-migrate-to-systemd-resolved.service.d/disabled.conf
[Unit]
ConditionPathExists=/enoent
[root@ocp-infratest-bootstrap /]# systemctl status coreos-migrate-to-systemd-resolved
● coreos-migrate-to-systemd-resolved.service - CoreOS Migrate to Systemd Resolved
     Loaded: loaded (/usr/lib/systemd/system/coreos-migrate-to-systemd-resolved.service; disabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/coreos-migrate-to-systemd-resolved.service.d
             └─disabled.conf
     Active: inactive (dead)
       Docs: https://github.com/coreos/fedora-coreos-tracker/issues/646
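The `disabled.conf` drop-ins work because `ConditionPathExists=` is checked before a unit starts, and `/enoent` is a path that should never exist, so the condition always fails and the unit is skipped (hence "Condition check resulted in Network Name Resolution being skipped" in the first log). A simplified Python model of that check, assuming a leading `!` negates it as described in systemd.unit(5); trigger conditions (`|`) and resets via empty assignments are deliberately ignored:

```python
import os


def condition_path_exists(value: str) -> bool:
    """Evaluate one ConditionPathExists= value (a leading '!' negates it)."""
    negate = value.startswith("!")
    path = value[1:] if negate else value
    return os.path.exists(path) != negate


def unit_would_start(unit_text: str) -> bool:
    """Return False if any ConditionPathExists= line in unit_text fails.

    Simplification: every condition must hold; '|' trigger conditions and
    empty assignments that reset earlier conditions are not modelled.
    """
    for raw in unit_text.splitlines():
        line = raw.strip()
        if line.startswith("ConditionPathExists="):
            if not condition_path_exists(line.split("=", 1)[1]):
                return False
    return True


if __name__ == "__main__":
    drop_in = "[Unit]\nConditionPathExists=/enoent\n"
    # /enoent should not exist, so the unit is skipped.
    print(unit_would_start(drop_in))
```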

@LorbusChris
Contributor

Nov 20 12:27:42 okd-infratest-bootstrap bootkube.sh[243792]: Error: error creating resolv.conf for container 772351e721c044e494e8e062930c5aaba735e3efc500a4a136aeb024c02f426a: lstat /etc/../run/systemd/resolve: no such file or directory

I think this might be a podman issue. Maybe @haircommander can help.

@haircommander

I may not be the best person to help, but anyone on the podman side who could would need to know where this bootkube.sh file lives and what podman is doing (or is expected to be doing) with this resolv.conf.

@vrutkovs
Member

vrutkovs commented Nov 20, 2020

Oh, systemd-resolved still runs on the first boot, before the release image is downloaded:

Nov 20 17:46:59 ocp-infratest-bootstrap systemd-resolved[1346]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 10.0.54.98.

It sets up the /etc/resolv.conf symlink, which is broken on the second boot: systemd-resolved is disabled and no other service creates /run/systemd/resolve.

I don't think it has anything to do with podman.

@vrutkovs vrutkovs removed the triage/needs-information Indicates an issue needs more information in order to work on it. label Nov 21, 2020
@vrutkovs
Member

@timbrd could you give https://amd64.origin.releases.ci.openshift.org/releasestream/4.6.0-0.okd/release/4.6.0-0.okd-2020-11-21-155444 a try? okd-machine-os should now remove all traces of systemd-resolved. It works with DHCP (on AWS and GCP), so hopefully it works with kernel args as well.

@timbrd
Author

timbrd commented Nov 21, 2020

Unfortunately, it's still the same error:

Nov 21 16:51:20 ocp-infratest-bootstrap bootkube.sh[166195]: Rendering Cluster Version Operator Manifests...
Nov 21 16:51:20 ocp-infratest-bootstrap podman[169194]: 2020-11-21 16:51:20.623665491 +0000 UTC m=+0.104853428 container create a595bee1975ab494b5d9fbc681ed4f4718f75d685524ba346ca13effc5e79a39 (image=registry.svc.ci.ope>
Nov 21 16:51:20 ocp-infratest-bootstrap bootkube.sh[169194]: Error: error creating resolv.conf for container a595bee1975ab494b5d9fbc681ed4f4718f75d685524ba346ca13effc5e79a39: lstat /etc/../run/systemd/resolve: no such f>
Nov 21 16:51:20 ocp-infratest-bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=127/n/a
Nov 21 16:51:20 ocp-infratest-bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.

I extracted the new installer and used it to generate the Ignition configs.
How can I check whether your changes have made it onto my machine? I think /usr/lib/tmpfiles.d/etc.conf didn't exist before, right?

[root@ocp-infratest-bootstrap ~]# cat /usr/lib/tmpfiles.d/etc.conf
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.
# See tmpfiles.d(5) for details
L /etc/os-release - - - - ../usr/lib/os-release
L+ /etc/mtab - - - - ../proc/self/mounts
C! /etc/nsswitch.conf - - - -
C! /etc/pam.d - - - -
C! /etc/issue - - - -

@vrutkovs
Member

I think /usr/lib/tmpfiles.d/etc.conf didn't exist before, right?

That's odd, it should have existed: the previous version also had

L! /etc/resolv.conf - - - - ../run/systemd/resolve/stub-resolv.conf

which created the borked symlink
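In tmpfiles.d syntax the columns are Type, Path, Mode, User, Group, Age and Argument. For `L` (symlink) lines the argument is the link target; a `!` modifier marks a line that is only applied when systemd-tmpfiles runs with `--boot`, and `+` (as in `L+ /etc/mtab`) replaces whatever already exists at the path. A minimal Python parser for lines like these (a sketch, not systemd-tmpfiles' own parsing; the names are hypothetical):

```python
from typing import List, NamedTuple


class TmpfilesEntry(NamedTuple):
    kind: str       # type letter: 'L' creates a symlink, 'C' copies a file
    modifiers: str  # '!' = only applied with --boot, '+' = replace what exists
    path: str
    argument: str   # for 'L' lines, the symlink target ('' if omitted)


def parse_tmpfiles(text: str) -> List[TmpfilesEntry]:
    """Parse tmpfiles.d lines: Type Path Mode User Group Age Argument."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # The argument is the last column and may itself contain spaces.
        fields = line.split(None, 6)
        if len(fields) < 2:
            continue  # malformed line; skipped in this sketch
        type_field, path = fields[0], fields[1]
        argument = fields[6] if len(fields) > 6 else ""
        if argument == "-":
            argument = ""
        entries.append(TmpfilesEntry(type_field[0], type_field[1:], path, argument))
    return entries


if __name__ == "__main__":
    old_line = "L! /etc/resolv.conf - - - - ../run/systemd/resolve/stub-resolv.conf"
    for entry in parse_tmpfiles(old_line):
        if entry.path == "/etc/resolv.conf":
            print(entry)
```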

@timbrd
Author

timbrd commented Nov 21, 2020

I think /usr/lib/tmpfiles.d/etc.conf didn't exist before, right?

That's odd, it should have existed: the previous version also had

L! /etc/resolv.conf - - - - ../run/systemd/resolve/stub-resolv.conf

which created the borked symlink

Okay, but wouldn't just removing the symlink still leave the problem that prevents podman from starting any containers? If I understand the linked issue correctly, podman expects either the file /etc/resolv.conf or the stub configuration in /run/systemd/resolve/stub-resolv.conf to exist.

@timbrd
Author

timbrd commented Nov 21, 2020

After creating a valid /etc/resolv.conf, podman is able to start the containers.
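The manual workaround of creating a valid /etc/resolv.conf can be sketched in Python as below, assuming the nameserver from the `nameserver=` karg. It writes a temporary file in the same directory and renames it over the path, so a dangling symlink is replaced rather than written through. This is only an illustration of the hand-made fix, not an official one, and NetworkManager may rewrite the file later:

```python
import os
import tempfile
from typing import Sequence


def write_static_resolv_conf(path: str, nameservers: Sequence[str],
                             search: Sequence[str] = ()) -> None:
    """Replace path (e.g. a dangling symlink) with a regular resolv.conf file."""
    lines = []
    if search:
        lines.append("search " + " ".join(search))
    lines += [f"nameserver {ns}" for ns in nameservers]
    directory = os.path.dirname(path) or "."
    # Write into the same directory so the final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".resolv.conf.")
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(lines) + "\n")
    os.chmod(tmp, 0o644)
    # rename(2) replaces the symlink itself instead of following it.
    os.replace(tmp, path)


if __name__ == "__main__":
    # Demo in a scratch directory; on the real host this would need root
    # and would target /etc/resolv.conf.
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "resolv.conf")
        os.symlink("../run/systemd/resolve/stub-resolv.conf", target)
        write_static_resolv_conf(target, ["10.0.54.98"])
        print(os.path.islink(target))  # False
        with open(target) as f:
            print(f.read(), end="")  # nameserver 10.0.54.98
```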

@vrutkovs
Member

vrutkovs commented Nov 22, 2020

IIUC here's what's happening:

  • the initial FCOS boots with systemd-resolved enabled
  • this creates the /etc/resolv.conf symlink
  • the NM configuration from the kernel arguments is written
  • however, NM is not managing resolv.conf, so it's being ignored. The nameserver is written to /run/systemd/resolve/stub-resolv.conf instead
  • the bootstrap node pivots and boots into the updated deployment, which has systemd-resolved disabled (it doesn't play nice with CoreDNS later in the deployment)
  • NM is now managing resolv.conf, but it can't write the nameserver, as resolv.conf is a symlink to a missing file
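The key detail in this sequence is that `/run` is a tmpfs that starts empty on every boot, while the symlink in `/etc` persists. A toy Python model of the two boots in a scratch directory (not the real boot process; `root/etc` and `root/run` merely stand in for `/etc` and `/run`):

```python
import os
import shutil
import tempfile
from typing import List


def simulate_boots(root: str) -> List[bool]:
    """Toy model of the two bootstrap boots inside a scratch directory.

    Returns, per boot, whether etc/resolv.conf resolves to an existing file.
    """
    etc = os.path.join(root, "etc")
    run = os.path.join(root, "run")
    resolve_dir = os.path.join(run, "systemd", "resolve")
    link = os.path.join(etc, "resolv.conf")
    os.makedirs(etc, exist_ok=True)
    results = []

    # Boot 1: systemd-resolved runs and writes its stub file under /run, and
    # /etc/resolv.conf points at it through a relative symlink.
    os.makedirs(resolve_dir, exist_ok=True)
    with open(os.path.join(resolve_dir, "stub-resolv.conf"), "w") as f:
        f.write("nameserver 127.0.0.53\n")
    if not os.path.lexists(link):
        os.symlink("../run/systemd/resolve/stub-resolv.conf", link)
    results.append(os.path.exists(link))

    # Boot 2: /run starts empty and systemd-resolved is disabled, so nothing
    # recreates the stub file, but the symlink in /etc survives and dangles.
    shutil.rmtree(run)
    os.makedirs(run)
    results.append(os.path.exists(link))
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(simulate_boots(d))  # [True, False]
```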

Could you test two different payloads? Use the same installer and run export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=.. before generating the Ignition files:

imageContentSources:
- mirrors:
  - quay.io/vrutkovs/okd-release
  source: registry.build01.ci.openshift.org/ci-op-m107l8cy/release
- mirrors:
  - quay.io/vrutkovs/okd-release
  source: registry.build01.ci.openshift.org/ci-op-m107l8cy/stable

imageContentSources:
- mirrors:
  - quay.io/vrutkovs/okd-release
  source: registry.build01.ci.openshift.org/ci-op-rnq12cmv/release
- mirrors:
  - quay.io/vrutkovs/okd-release
  source: registry.build01.ci.openshift.org/ci-op-rnq12cmv/stable
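Roughly, each entry means: when an image whose repository matches `source` is pulled by digest, the same digest is looked up under each mirror first and the source is tried last (the order seen in the "Mirrors also failed" errors further down). As far as I know, OpenShift only applies these mirrors to pulls by digest. A simplified Python model of the rewrite (the helper is hypothetical, not the container tooling's code):

```python
from typing import List


def mirror_candidates(image: str, sources: List[dict]) -> List[str]:
    """Return the pull specs tried for a digest reference, mirrors first.

    If the repository part of image equals a source (or sits below it), the
    source prefix is swapped for each mirror; the original is tried last.
    """
    repo, sep, digest = image.partition("@")
    if not sep:
        raise ValueError("only digest references (repo@sha256:...) are mirrored")
    candidates = []
    for entry in sources:
        source = entry["source"]
        # The '/' check keeps '.../stable' from matching '.../stablex'.
        if repo == source or repo.startswith(source + "/"):
            suffix = repo[len(source):]
            candidates += [f"{m}{suffix}@{digest}" for m in entry["mirrors"]]
    candidates.append(image)
    return candidates


if __name__ == "__main__":
    sources = [
        {"mirrors": ["quay.io/vrutkovs/okd-release"],
         "source": "registry.build01.ci.openshift.org/ci-op-m107l8cy/release"},
        {"mirrors": ["quay.io/vrutkovs/okd-release"],
         "source": "registry.build01.ci.openshift.org/ci-op-m107l8cy/stable"},
    ]
    image = ("registry.build01.ci.openshift.org/ci-op-m107l8cy/stable@sha256:"
             "3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1")
    for spec in mirror_candidates(image, sources):
        print(spec)
```

With the `ci-op-m107l8cy` entries, the digest starting with `3d348d29` is tried as `quay.io/vrutkovs/okd-release@sha256:3d348d29...`, which is exactly the mirror reported as "manifest unknown" in the later logs: that digest was never pushed to the quay.io repository.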

Unfortunately, we don't have a good CI system set up to try out bare metal UPI without DHCP, so I'm hesitant to merge openshift/okd-machine-os#14 or #15 just yet.

@timbrd
Copy link
Author

timbrd commented Nov 22, 2020

Thanks for the update.
I'll probably be able to try both solutions tomorrow. The machines are behind a corporate proxy and registry.build01.ci.openshift.org has to be added to the whitelist first.

@vrutkovs
Member

I'll probably be able to try both solutions tomorrow.

Thanks!

The machines are behind a corporate proxy and registry.build01.ci.openshift.org has to be added to the whitelist first.

The images are mirrored to quay.io, so it's actually best to leave registry.build01 blocked, so that the quay mirror is used.

@timbrd
Author

timbrd commented Nov 22, 2020

The machines are behind a corporate proxy and registry.build01.ci.openshift.org has to be added to the whitelist first.

The images are mirrored to quay.io, so it's actually best to leave registry.build01 blocked, so that the quay mirror is used.

Okay, what do I have to change for using the quay.io mirror?

[root@ocp-infratest-bootstrap ~]# journalctl -f -b -u release-image
-- Logs begin at Sun 2020-11-22 10:54:02 UTC, end at Sun 2020-11-22 12:49:07 UTC. --
[...]
Nov 22 12:50:03 ocp-infratest-bootstrap systemd[1]: Starting Download the OpenShift Release Image...
Nov 22 12:50:03 ocp-infratest-bootstrap release-image-download.sh[262003]: Pulling quay.io/vrutkovs/okd-release:4.6-bug-380...
Nov 22 12:50:05 ocp-infratest-bootstrap podman[262004]: 2020-11-22 12:50:05.506702006 +0000 UTC m=+1.582405611 image pull
Nov 22 12:50:05 ocp-infratest-bootstrap release-image-download.sh[262004]: 52573a14928aa980b6be3ebddb37337650e9ba6053ad455565cee7adc6c73cc7
Nov 22 12:50:05 ocp-infratest-bootstrap podman[262088]: 2020-11-22 12:50:05.717209002 +0000 UTC m=+0.092684828 container create 0b7e0e524dfb6d33fba84857483a02f8d1f7f8a94e7b46946104d270d7c873fe (image=quay.io/vrutkovs/okd-release:4.6-bug-380, name=zen_lovelace)
Nov 22 12:50:05 ocp-infratest-bootstrap podman[262088]: 2020-11-22 12:50:05.808012524 +0000 UTC m=+0.183488361 container init 0b7e0e524dfb6d33fba84857483a02f8d1f7f8a94e7b46946104d270d7c873fe (image=quay.io/vrutkovs/okd-release:4.6-bug-380, name=zen_lovelace)
Nov 22 12:50:05 ocp-infratest-bootstrap podman[262088]: 2020-11-22 12:50:05.826475134 +0000 UTC m=+0.201950985 container start 0b7e0e524dfb6d33fba84857483a02f8d1f7f8a94e7b46946104d270d7c873fe (image=quay.io/vrutkovs/okd-release:4.6-bug-380, name=zen_lovelace)
Nov 22 12:50:05 ocp-infratest-bootstrap podman[262088]: 2020-11-22 12:50:05.826589994 +0000 UTC m=+0.202065885 container attach 0b7e0e524dfb6d33fba84857483a02f8d1f7f8a94e7b46946104d270d7c873fe (image=quay.io/vrutkovs/okd-release:4.6-bug-380, name=zen_lovelace)
Nov 22 12:50:05 ocp-infratest-bootstrap podman[262088]: 2020-11-22 12:50:05.880495227 +0000 UTC m=+0.255971130 container died 0b7e0e524dfb6d33fba84857483a02f8d1f7f8a94e7b46946104d270d7c873fe (image=quay.io/vrutkovs/okd-release:4.6-bug-380, name=zen_lovelace)
Nov 22 12:50:05 ocp-infratest-bootstrap podman[262088]: 2020-11-22 12:50:05.931372361 +0000 UTC m=+0.306848182 container remove 0b7e0e524dfb6d33fba84857483a02f8d1f7f8a94e7b46946104d270d7c873fe (image=quay.io/vrutkovs/okd-release:4.6-bug-380, name=zen_lovelace)
Nov 22 12:50:06 ocp-infratest-bootstrap podman[262211]: 2020-11-22 12:50:06.045583625 +0000 UTC m=+0.092726366 container create a2a6f296294bd07e193dd2f939df9051f91a76fc1f6ea1b56b1e5f355249211e (image=quay.io/vrutkovs/okd-release:4.6-bug-380, name=admiring_burnell)
Nov 22 12:50:06 ocp-infratest-bootstrap podman[262211]: 2020-11-22 12:50:06.127476344 +0000 UTC m=+0.174619092 container init a2a6f296294bd07e193dd2f939df9051f91a76fc1f6ea1b56b1e5f355249211e (image=quay.io/vrutkovs/okd-release:4.6-bug-380, name=admiring_burnell)
Nov 22 12:50:06 ocp-infratest-bootstrap podman[262211]: 2020-11-22 12:50:06.145586631 +0000 UTC m=+0.192729377 container start a2a6f296294bd07e193dd2f939df9051f91a76fc1f6ea1b56b1e5f355249211e (image=quay.io/vrutkovs/okd-release:4.6-bug-380, name=admiring_burnell)
Nov 22 12:50:06 ocp-infratest-bootstrap podman[262211]: 2020-11-22 12:50:06.145703277 +0000 UTC m=+0.192846181 container attach a2a6f296294bd07e193dd2f939df9051f91a76fc1f6ea1b56b1e5f355249211e (image=quay.io/vrutkovs/okd-release:4.6-bug-380, name=admiring_burnell)
Nov 22 12:50:06 ocp-infratest-bootstrap podman[262211]: 2020-11-22 12:50:06.197649006 +0000 UTC m=+0.244791823 container died a2a6f296294bd07e193dd2f939df9051f91a76fc1f6ea1b56b1e5f355249211e (image=quay.io/vrutkovs/okd-release:4.6-bug-380, name=admiring_burnell)
Nov 22 12:50:06 ocp-infratest-bootstrap podman[262211]: 2020-11-22 12:50:06.252906668 +0000 UTC m=+0.300049448 container remove a2a6f296294bd07e193dd2f939df9051f91a76fc1f6ea1b56b1e5f355249211e (image=quay.io/vrutkovs/okd-release:4.6-bug-380, name=admiring_burnell)
Nov 22 12:50:07 ocp-infratest-bootstrap release-image-download.sh[262354]: Error: unable to pull registry.build01.ci.openshift.org/ci-op-m107l8cy/stable@sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1: Error parsing image configuration: Error fetching blob: invalid status code from registry 403 (Forbidden)
Nov 22 12:50:07 ocp-infratest-bootstrap systemd[1]: release-image.service: Main process exited, code=exited, status=125/n/a
Nov 22 12:50:07 ocp-infratest-bootstrap systemd[1]: release-image.service: Failed with result 'exit-code'.
Nov 22 12:50:07 ocp-infratest-bootstrap systemd[1]: Failed to start Download the OpenShift Release Image.

@timbrd
Author

timbrd commented Nov 22, 2020

Nov 22 14:05:01 ocp-infratest-bootstrap release-image-download.sh[29926]: Error: unable to pull registry.build01.ci.openshift.org/ci-op-m107l8cy/stable@sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1: Error initializing source docker://registry.build01.ci.openshift.org/ci-op-m107l8cy/stable@sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1: (Mirrors also failed: [quay.io/vrutkovs/okd-release@sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1: Error reading manifest sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1 in quay.io/vrutkovs/okd-release: manifest unknown: manifest unknown]): registry.build01.ci.openshift.org/ci-op-m107l8cy/stable@sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1: Error reading manifest sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1 in registry.build01.ci.openshift.org/ci-op-m107l8cy/stable: unauthorized: authentication required
Nov 22 14:05:01 ocp-infratest-bootstrap systemd[1]: release-image.service: Main process exited, code=exited, status=125/n/a
Nov 22 14:05:01 ocp-infratest-bootstrap systemd[1]: release-image.service: Failed with result 'exit-code'.
Nov 22 14:05:01 ocp-infratest-bootstrap systemd[1]: Failed to start Download the OpenShift Release Image.

According to quay.io, the hash should be 89b10e23c9ca983f9cf65b9b63eb4a0f276edc6886da3bf8b2bc6084e756d17f and not 3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1, right?

@vrutkovs
Member

vrutkovs commented Nov 22, 2020

Oh, oops, yeah. Try these digests:
quay.io/vrutkovs/okd-release:4.6-bug-380 -> quay.io/vrutkovs/okd-release@sha256:89b10e23c9ca983f9cf65b9b63eb4a0f276edc6886da3bf8b2bc6084e756d17f

quay.io/vrutkovs/okd-release:4.6-bug-380-systemd-resolved -> quay.io/vrutkovs/okd-release@sha256:80a81f79518ce8237ccbe7ed7fc2e682949f78115aec76449c497748a3cbff07
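Pinning a tag to the digest it currently resolves to amounts to replacing `:tag` with `@sha256:...` in the last path component. A small Python sketch (the `pin_to_digest` helper is hypothetical; it only checks the digest's format, not that the digest exists in the registry):

```python
import re

SHA256_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def pin_to_digest(tagged: str, digest: str) -> str:
    """Rewrite repo:tag (or repo@old-digest) as repo@digest.

    Only the last path component is searched for ':' or '@', so a registry
    host with a port, such as localhost:5000/repo:tag, keeps its port.
    """
    if not SHA256_DIGEST.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    head, slash, last = tagged.rpartition("/")
    name = last.split("@", 1)[0].split(":", 1)[0]
    return f"{head}{slash}{name}@{digest}"


if __name__ == "__main__":
    print(pin_to_digest(
        "quay.io/vrutkovs/okd-release:4.6-bug-380",
        "sha256:89b10e23c9ca983f9cf65b9b63eb4a0f276edc6886da3bf8b2bc6084e756d17f"))
```

This only pins the release image itself. As the following comments show, image references embedded inside the payload (such as the machine-config-operator image) keep whatever digests they were built with.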

@timbrd
Author

timbrd commented Nov 22, 2020

Hm, it still doesn't work. The release-image service says it would download the image with the correct digest, but podman then tries to download the old one.

[root@ocp-infratest-bootstrap ~]# journalctl -f -b -u release-image
-- Logs begin at Sun 2020-11-22 19:41:51 UTC. --
Nov 22 19:42:00 ocp-infratest-bootstrap systemd[1]: Starting Download the OpenShift Release Image...
Nov 22 19:42:00 ocp-infratest-bootstrap release-image-download.sh[1410]: Pulling quay.io/vrutkovs/okd-release@sha256:89b10e23c9ca983f9cf65b9b63eb4a0f276edc6886da3bf8b2bc6084e756d17f...
Nov 22 19:42:01 ocp-infratest-bootstrap podman[1413]: 2020-11-22 19:42:01.422056501 +0000 UTC m=+0.727591337 system refresh
Nov 22 19:43:12 ocp-infratest-bootstrap podman[1413]: 2020-11-22 19:43:12.021398349 +0000 UTC m=+71.326933101 image pull
Nov 22 19:43:12 ocp-infratest-bootstrap release-image-download.sh[1413]: 52573a14928aa980b6be3ebddb37337650e9ba6053ad455565cee7adc6c73cc7
Nov 22 19:43:12 ocp-infratest-bootstrap podman[1742]: 2020-11-22 19:43:12.262734518 +0000 UTC m=+0.106524232 container create e73184d0e9592887cb6bfbfc6d772df5ea77b094b33066b24982668903defc97 (image=quay.io/vrutkovs/okd-release@sha256:89b10e23c9ca983f9cf65b9b63eb4a0f276edc6886da3bf8b2bc6084e756d17f, name=jovial_kirch)
Nov 22 19:43:12 ocp-infratest-bootstrap podman[1742]: 2020-11-22 19:43:12.461869801 +0000 UTC m=+0.305659542 container init e73184d0e9592887cb6bfbfc6d772df5ea77b094b33066b24982668903defc97 (image=quay.io/vrutkovs/okd-release@sha256:89b10e23c9ca983f9cf65b9b63eb4a0f276edc6886da3bf8b2bc6084e756d17f, name=jovial_kirch)
Nov 22 19:43:12 ocp-infratest-bootstrap podman[1742]: 2020-11-22 19:43:12.480770495 +0000 UTC m=+0.324560168 container start e73184d0e9592887cb6bfbfc6d772df5ea77b094b33066b24982668903defc97 (image=quay.io/vrutkovs/okd-release@sha256:89b10e23c9ca983f9cf65b9b63eb4a0f276edc6886da3bf8b2bc6084e756d17f, name=jovial_kirch)
Nov 22 19:43:12 ocp-infratest-bootstrap podman[1742]: 2020-11-22 19:43:12.480894686 +0000 UTC m=+0.324684432 container attach e73184d0e9592887cb6bfbfc6d772df5ea77b094b33066b24982668903defc97 (image=quay.io/vrutkovs/okd-release@sha256:89b10e23c9ca983f9cf65b9b63eb4a0f276edc6886da3bf8b2bc6084e756d17f, name=jovial_kirch)
Nov 22 19:43:12 ocp-infratest-bootstrap podman[1742]: 2020-11-22 19:43:12.504693482 +0000 UTC m=+0.348483144 container died e73184d0e9592887cb6bfbfc6d772df5ea77b094b33066b24982668903defc97 (image=quay.io/vrutkovs/okd-release@sha256:89b10e23c9ca983f9cf65b9b63eb4a0f276edc6886da3bf8b2bc6084e756d17f, name=jovial_kirch)
Nov 22 19:43:13 ocp-infratest-bootstrap podman[1742]: 2020-11-22 19:43:13.296075674 +0000 UTC m=+1.139865356 container remove e73184d0e9592887cb6bfbfc6d772df5ea77b094b33066b24982668903defc97 (image=quay.io/vrutkovs/okd-release@sha256:89b10e23c9ca983f9cf65b9b63eb4a0f276edc6886da3bf8b2bc6084e756d17f, name=jovial_kirch)
Nov 22 19:43:13 ocp-infratest-bootstrap podman[1888]: 2020-11-22 19:43:13.430066389 +0000 UTC m=+0.109280399 container create 1d14282d6d10ed0e198a0864dffed6e72736082db510650284c1aa789c805517 (image=quay.io/vrutkovs/okd-release@sha256:89b10e23c9ca983f9cf65b9b63eb4a0f276edc6886da3bf8b2bc6084e756d17f, name=awesome_antonelli)
Nov 22 19:43:13 ocp-infratest-bootstrap podman[1888]: 2020-11-22 19:43:13.53200961 +0000 UTC m=+0.211223663 container init 1d14282d6d10ed0e198a0864dffed6e72736082db510650284c1aa789c805517 (image=quay.io/vrutkovs/okd-release@sha256:89b10e23c9ca983f9cf65b9b63eb4a0f276edc6886da3bf8b2bc6084e756d17f, name=awesome_antonelli)
Nov 22 19:43:13 ocp-infratest-bootstrap podman[1888]: 2020-11-22 19:43:13.551688593 +0000 UTC m=+0.230902620 container start 1d14282d6d10ed0e198a0864dffed6e72736082db510650284c1aa789c805517 (image=quay.io/vrutkovs/okd-release@sha256:89b10e23c9ca983f9cf65b9b63eb4a0f276edc6886da3bf8b2bc6084e756d17f, name=awesome_antonelli)
Nov 22 19:43:13 ocp-infratest-bootstrap podman[1888]: 2020-11-22 19:43:13.552002645 +0000 UTC m=+0.231216681 container attach 1d14282d6d10ed0e198a0864dffed6e72736082db510650284c1aa789c805517 (image=quay.io/vrutkovs/okd-release@sha256:89b10e23c9ca983f9cf65b9b63eb4a0f276edc6886da3bf8b2bc6084e756d17f, name=awesome_antonelli)
Nov 22 19:43:13 ocp-infratest-bootstrap podman[1888]: 2020-11-22 19:43:13.600941886 +0000 UTC m=+0.280155999 container died 1d14282d6d10ed0e198a0864dffed6e72736082db510650284c1aa789c805517 (image=quay.io/vrutkovs/okd-release@sha256:89b10e23c9ca983f9cf65b9b63eb4a0f276edc6886da3bf8b2bc6084e756d17f, name=awesome_antonelli)
Nov 22 19:43:13 ocp-infratest-bootstrap podman[1888]: 2020-11-22 19:43:13.655492022 +0000 UTC m=+0.334706082 container remove 1d14282d6d10ed0e198a0864dffed6e72736082db510650284c1aa789c805517 (image=quay.io/vrutkovs/okd-release@sha256:89b10e23c9ca983f9cf65b9b63eb4a0f276edc6886da3bf8b2bc6084e756d17f, name=awesome_antonelli)
Nov 22 19:43:16 ocp-infratest-bootstrap release-image-download.sh[2021]: Error: unable to pull registry.build01.ci.openshift.org/ci-op-m107l8cy/stable@sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1: Error initializing source docker://registry.build01.ci.openshift.org/ci-op-m107l8cy/stable@sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1: (Mirrors also failed: [quay.io/vrutkovs/okd-release@sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1: Error reading manifest sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1 in quay.io/vrutkovs/okd-release: manifest unknown: manifest unknown]): registry.build01.ci.openshift.org/ci-op-m107l8cy/stable@sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1: Error reading manifest sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1 in registry.build01.ci.openshift.org/ci-op-m107l8cy/stable: unauthorized: authentication required
Nov 22 19:43:16 ocp-infratest-bootstrap systemd[1]: release-image.service: Main process exited, code=exited, status=125/n/a
Nov 22 19:43:16 ocp-infratest-bootstrap systemd[1]: release-image.service: Failed with result 'exit-code'.
Nov 22 19:43:16 ocp-infratest-bootstrap systemd[1]: Failed to start Download the OpenShift Release Image.

Edit: It seems I also have to override MACHINE_CONFIG_OPERATOR_IMAGE. In /sysroot/ostree/deploy/fedora-coreos/var/usrlocal/bin/release-image-download.sh, the MCO image uses the incorrect digest:

[root@ocp-infratest-bootstrap ~]# /sysroot/ostree/deploy/fedora-coreos/var/usrlocal/bin/release-image-download.sh
[...]
+ echo 'ADD systemd.unified_cgroup_hierarchy=0'
+ echo 'DELETE mitigations=auto,nosmt'
+ mkdir --parents bin/
+ podman run --quiet --net=host --entrypoint=cat registry.build01.ci.openshift.org/ci-op-m107l8cy/stable@sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1 /usr/bin/machine-config-daemon
Error: unable to pull registry.build01.ci.openshift.org/ci-op-m107l8cy/stable@sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1: Error initializing source docker://registry.build01.ci.openshift.org/ci-op-m107l8cy/stable@sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1: (Mirrors also failed: [quay.io/vrutkovs/okd-release@sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1: Error reading manifest sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1 in quay.io/vrutkovs/okd-release: manifest unknown: manifest unknown]): registry.build01.ci.openshift.org/ci-op-m107l8cy/stable@sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1: Error reading manifest sha256:3d348d299bbbbd266961a789f157f28d9b3b15a0c2bc54278b30549ec47f33f1 in registry.build01.ci.openshift.org/ci-op-m107l8cy/stable: unauthorized: authentication required

Is there a way to override the MCO image variable?

@vrutkovs
Member

Hrm, okay, I think we'll go with openshift/okd-machine-os#15 sooner or later anyway.

@vrutkovs
Member

@timbrd
Author

timbrd commented Nov 22, 2020

The stub listener no longer exists (though the broken symlink is still created), and the containers still do not start.

[root@ocp-infratest-bootstrap ~]# ls -l /run/systemd/resolve/
total 4
drwx------. 2 systemd-resolve systemd-resolve  60 Nov 22 21:17 netif
-rw-r--r--. 1 systemd-resolve systemd-resolve 599 Nov 22 21:17 resolv.conf

[root@ocp-infratest-bootstrap ~]# ls -l /etc/resolv.conf
lrwxrwxrwx. 1 root root 39 Nov 22 21:17 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf
[root@ocp-infratest-bootstrap ~]# journalctl -b -u bootkube.service
[...]
Nov 22 21:23:35 ocp-infratest-bootstrap bootkube.sh[85483]: Rendering Cluster Version Operator Manifests...
Nov 22 21:23:35 ocp-infratest-bootstrap podman[88479]: 2020-11-22 21:23:35.483789395 +0000 UTC m=+0.103059691 container create 15c194ca0ac78f4d98ded427da2a61d8bd3ceb1bfad84f09949658aa7a59f1fd (image=registry.svc.ci.openshift.org/origin/release@sha256:deac9acdbb23ff0b8823fc47b925a382a51922851f8372a4498b5424afff33cb, name=tender_lovelace)
Nov 22 21:23:35 ocp-infratest-bootstrap bootkube.sh[88479]: Error: error creating resolv.conf for container 15c194ca0ac78f4d98ded427da2a61d8bd3ceb1bfad84f09949658aa7a59f1fd: lstat /etc/../run/systemd/resolve/stub-resolv.conf: no such file or directory
Nov 22 21:23:35 ocp-infratest-bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=127/n/a
Nov 22 21:23:35 ocp-infratest-bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.

@vrutkovs
Member

Hmm, it's probably one of the FCOS services creating the broken symlink.

@timbrd
Author

timbrd commented Nov 23, 2020

Hmm, it's probably one of the FCOS services creating the broken symlink.

Is there anything I can do or test?

@vrutkovs
Member

I'm not sure why the DHCP case (on AWS) shows entirely different results. On the latest 4.6 nightly I get this on the bootstrap node:

[core@ip-10-0-27-80 ~]$ ls -la /etc/resolv.conf 
-rw-r--r--. 1 root root 84 Nov 23 09:34 /etc/resolv.conf
[core@ip-10-0-27-80 ~]$ cat /etc/resolv.conf
# Generated by NetworkManager
search us-east-2.compute.internal
nameserver 10.0.0.2
[core@ip-10-0-27-80 ~]$ systemctl status systemd-resolved
● systemd-resolved.service - Network Name Resolution
     Loaded: loaded (/usr/lib/systemd/system/systemd-resolved.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2020-11-23 09:34:57 UTC; 3min 0s ago
       Docs: man:systemd-resolved.service(8)
             https://www.freedesktop.org/wiki/Software/systemd/resolved
             https://www.freedesktop.org/wiki/Software/systemd/writing-network-configuration-managers
             https://www.freedesktop.org/wiki/Software/systemd/writing-resolver-clients
   Main PID: 684 (systemd-resolve)
     Status: "Processing requests..."
      Tasks: 1 (limit: 9193)
     Memory: 10.4M
     CGroup: /system.slice/systemd-resolved.service
             └─684 /usr/lib/systemd/systemd-resolved

Nov 23 09:34:57 fedora systemd-resolved[684]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
Nov 23 09:34:57 fedora systemd-resolved[684]: Negative trust anchors: 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 19.172.in-ad>
Nov 23 09:34:57 fedora systemd-resolved[684]: Using system hostname 'fedora'.
Nov 23 09:34:57 fedora systemd-resolved[684]: Failed to symlink /run/systemd/resolve/stub-resolv.conf: Permission denied
Nov 23 09:34:57 fedora systemd[1]: Started Network Name Resolution.
Nov 23 09:34:57 ip-10-0-27-80 systemd-resolved[684]: System hostname changed to 'ip-10-0-27-80'.
Nov 23 09:34:57 ip-10-0-27-80 systemd-resolved[684]: Failed to symlink /run/systemd/resolve/stub-resolv.conf: Permission denied
Nov 23 09:34:57 ip-10-0-27-80 systemd-resolved[684]: Failed to symlink /run/systemd/resolve/stub-resolv.conf: Permission denied
Nov 23 09:34:58 ip-10-0-27-80 systemd-resolved[684]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 10.0.0.2.
Nov 23 09:35:03 ip-10-0-27-80 systemd-resolved[684]: Using degraded feature set UDP instead of UDP+EDNS0 for DNS server 10.0.0.2.
[core@ip-10-0-27-80 ~]$ ls -la /run/systemd/resolve/
total 4
drwxr-xr-x.  3 systemd-resolve systemd-resolve  80 Nov 23 09:34 .
drwxr-xr-x. 20 root            root            480 Nov 23 09:34 ..
drwx------.  2 systemd-resolve systemd-resolve  60 Nov 23 09:34 netif
-rw-r--r--.  1 systemd-resolve systemd-resolve 651 Nov 23 09:34 resolv.conf
[core@ip-10-0-27-80 ~]$ sudo systemd-resolve --statistics
DNSSEC supported by current servers: no

Transactions             
Current Transactions: 0  
  Total Transactions: 686
                         
Cache                    
  Current Cache Size: 7  
          Cache Hits: 607
        Cache Misses: 82 
                         
DNSSEC Verdicts          
              Secure: 0  
            Insecure: 0  
               Bogus: 0  
       Indeterminate: 0 

Could you run the same commands on your install?

@timbrd
Author

timbrd commented Nov 23, 2020

[root@ocp-infratest-bootstrap ~]# ls -la /etc/resolv.conf
lrwxrwxrwx. 1 root root 39 Nov 22 21:17 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf

[root@ocp-infratest-bootstrap ~]# cat /etc/resolv.conf
cat: /etc/resolv.conf: No such file or directory

[root@ocp-infratest-bootstrap ~]# systemctl status systemd-resolved
● systemd-resolved.service - Network Name Resolution
     Loaded: loaded (/usr/lib/systemd/system/systemd-resolved.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2020-11-22 21:17:36 UTC; 12h ago
       Docs: man:systemd-resolved.service(8)
             https://www.freedesktop.org/wiki/Software/systemd/resolved
             https://www.freedesktop.org/wiki/Software/systemd/writing-network-configuration-managers
             https://www.freedesktop.org/wiki/Software/systemd/writing-resolver-clients
   Main PID: 894 (systemd-resolve)
     Status: "Processing requests..."
      Tasks: 1 (limit: 9458)
     Memory: 11.0M
     CGroup: /system.slice/systemd-resolved.service
             └─894 /usr/lib/systemd/systemd-resolved

Nov 22 21:17:36 ocp-infratest-bootstrap systemd[1]: Starting Network Name Resolution...
Nov 22 21:17:36 ocp-infratest-bootstrap systemd-resolved[894]: Positive Trust Anchors:
Nov 22 21:17:36 ocp-infratest-bootstrap systemd-resolved[894]: . IN DS 20326 8 2 e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
Nov 22 21:17:36 ocp-infratest-bootstrap systemd-resolved[894]: Negative trust anchors: 10.in-addr.arpa 16.172.in-addr.arpa 17.172.in-addr.arpa 18.172.in-addr.arpa 19.172.in-addr.arpa 20.172.in-addr.a>
Nov 22 21:17:36 ocp-infratest-bootstrap systemd-resolved[894]: Using system hostname 'ocp-infratest-bootstrap'.
Nov 22 21:17:36 ocp-infratest-bootstrap systemd-resolved[894]: Failed to symlink /run/systemd/resolve/stub-resolv.conf: Permission denied
Nov 22 21:17:36 ocp-infratest-bootstrap systemd[1]: Started Network Name Resolution.
Nov 22 21:17:36 ocp-infratest-bootstrap systemd-resolved[894]: Failed to symlink /run/systemd/resolve/stub-resolv.conf: Permission denied
Nov 22 21:17:36 ocp-infratest-bootstrap systemd-resolved[894]: Failed to symlink /run/systemd/resolve/stub-resolv.conf: Permission denied

[root@ocp-infratest-bootstrap ~]# ls -la /run/systemd/resolve/
total 4
drwxr-xr-x.  3 systemd-resolve systemd-resolve  80 Nov 22 21:17 .
drwxr-xr-x. 20 root            root            480 Nov 22 21:17 ..
drwx------.  2 systemd-resolve systemd-resolve  60 Nov 22 21:17 netif
-rw-r--r--.  1 systemd-resolve systemd-resolve 599 Nov 22 21:17 resolv.conf

[root@ocp-infratest-bootstrap ~]# systemd-resolve --statistics
DNSSEC supported by current servers: no

Transactions
Current Transactions: 0
  Total Transactions: 58

Cache
  Current Cache Size: 2
          Cache Hits: 10
        Cache Misses: 48

DNSSEC Verdicts
              Secure: 0
            Insecure: 0
               Bogus: 0
       Indeterminate: 0
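The `ls` output shows that `/etc/resolv.conf` is a relative link (`../run/systemd/resolve/stub-resolv.conf`), while `/run/systemd/resolve/` contains only `resolv.conf` and `netif`. A relative link target is resolved against the directory that holds the link. The minimal sketch below does that resolution lexically and reports which resolved-managed files are absent. Two assumptions apply. First, `os.path.normpath` is only equivalent to real path resolution when no component of the link's directory is itself a symlink, which holds for `/etc`. Second, the `EXPECTED` file list is taken from the listings in this thread and is not exhaustive.

```python
import os

# Files expected in the systemd-resolved runtime directory, going by the
# listings in this thread (assumption, not an exhaustive list)
EXPECTED = ("resolv.conf", "stub-resolv.conf")


def absolute_link_target(link_path: str, target: str) -> str:
    """Resolve a symlink target against the directory holding the link.

    Purely lexical (os.path.normpath): fine as long as no component of the
    link's directory is itself a symlink.
    """
    if os.path.isabs(target):
        return os.path.normpath(target)
    return os.path.normpath(os.path.join(os.path.dirname(link_path), target))


def missing_files(directory: str) -> list:
    """Return the expected resolved files that are absent from directory."""
    return [name for name in EXPECTED if not os.path.exists(os.path.join(directory, name))]


if __name__ == "__main__":
    print(absolute_link_target("/etc/resolv.conf", "../run/systemd/resolve/stub-resolv.conf"))
    print(missing_files("/run/systemd/resolve"))
```

The first call prints `/run/systemd/resolve/stub-resolv.conf`. Given the directory listing above, the second call would report `['stub-resolv.conf']` on this bootstrap node. That matches the "Failed to symlink /run/systemd/resolve/stub-resolv.conf: Permission denied" messages from systemd-resolved.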

@vrutkovs
Member

Odd. Do you use the F32 FCOS initial image (current stable) or F33 from testing/next?

@timbrd
Author

timbrd commented Nov 23, 2020

> Odd. Do you use the F32 FCOS initial image (current stable) or F33 from testing/next?

It is Fedora CoreOS 33 from testing:

[core@ocp-infratest-bootstrap ~]$ cat /etc/os-release
NAME=Fedora
VERSION="33.20201121.10.0 (CoreOS)"
ID=fedora
VERSION_ID=33
VERSION_CODENAME=""
PLATFORM_ID="platform:f33"
PRETTY_NAME="Fedora CoreOS 33.20201121.10.0"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:33"
HOME_URL="https://getfedora.org/coreos/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora-coreos/"
SUPPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
BUG_REPORT_URL="https://github.com/coreos/fedora-coreos-tracker/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=33
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=33
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="CoreOS"
VARIANT_ID=coreos
OSTREE_VERSION='33.20201121.10.0'
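`/etc/os-release` is a list of shell-style `KEY=value` assignments. The fields that matter for this question (`VERSION_ID`, `OSTREE_VERSION`) can therefore be extracted with `shlex` doing the unquoting. The minimal sketch below assumes the simple one-assignment-per-line layout shown above. `parse_os_release` is a hypothetical helper.

```python
import shlex


def parse_os_release(text: str) -> dict:
    """Parse os-release content into a dict, unquoting shell-style values."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split handles both "double" and 'single' quoted values;
        # VERSION_CODENAME="" becomes a single empty token
        fields[key] = " ".join(shlex.split(raw))
    return fields
```

Fed the file above, this yields `VERSION_ID` = `33` and `OSTREE_VERSION` = `33.20201121.10.0`. Those two values are what identify this as an F33 image rather than the F32 stable one.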

@vrutkovs
Member

Aha, okay, that's not what we're using for OKD 4.6 yet (but it will be soon). Does it work with the current FCOS stable (it's still F32-based)?

I think FCOS shouldn't make the symlink (systemd-resolved should do this automatically), so we might need a fix for coreos-migrate-to-systemd-resolved instead.

@timbrd
Author

timbrd commented Nov 23, 2020

Oh, I thought OKD 4.6 needs FCOS 33. I have checked the latest okd 4.6 build for the current fcos image, which was 33.20201121.10:
https://origin-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.6.0-0.okd/release/4.6.0-0.okd-2020-11-18-085718

Which exact fcos 32 release should I use then? Is 32.20200629.3, the fcos release used by the latest stable okd 4.5, still a valid release?

@vrutkovs
Copy link
Member

> Oh, I thought OKD 4.6 needs FCOS 33

The initial FCOS image doesn't matter (well, except in this case :) ); machines are updated to the machine-os-content image (which is F33-based now).

> Which exact fcos 32 release should I use then?

We're testing with 32.20201104.3.0 from https://getfedora.org/en/coreos?stream=stable
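Fedora CoreOS version strings encode the Fedora major release, a build date, a stream number and a revision. That makes it straightforward to tell an F32 stable image from an F33 testing or next image. The minimal sketch below assumes the four-field `major.date.stream.revision` layout, with stream `1` = next, `2` = testing and `3` = stable. This mapping is consistent with the versions quoted in this thread. Any other stream number is deliberately left unnamed. `parse_fcos_version` and `FcosVersion` are hypothetical names.

```python
from typing import NamedTuple, Optional

# Stream numbers consistent with the versions quoted in this thread
# (assumption: only the three production streams are mapped)
STREAMS = {1: "next", 2: "testing", 3: "stable"}


class FcosVersion(NamedTuple):
    fedora_major: int
    build_date: str  # YYYYMMDD
    stream: int
    revision: int

    @property
    def stream_name(self) -> Optional[str]:
        # None for stream numbers this sketch does not know about
        return STREAMS.get(self.stream)


def parse_fcos_version(version: str) -> FcosVersion:
    """Split e.g. '32.20201104.3.0' into its four dot-separated fields."""
    major, date, stream, revision = version.split(".")  # ValueError if not 4 fields
    return FcosVersion(int(major), date, int(stream), int(revision))
```

For example, `parse_fcos_version("32.20201104.3.0")` gives `fedora_major` 32 and `stream_name` `"stable"`. `fedora_major` is the field that separates the recommended F32 image from the F33 one that showed the broken symlink. A truncated string such as `33.20201121.10` raises `ValueError` instead of being guessed at.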

@timbrd
Author

timbrd commented Nov 26, 2020

Sorry for the delay.
Using fcos 32 and the okd 4.6.0-0-2020-11-22-200916 testing release, I was able to start the cluster initialization.
The installation does not succeed since the master nodes are not able to start the sdn pods for some reason. But this is clearly not an issue with systemd-resolved.

@vrutkovs
Copy link
Member

vrutkovs commented Dec 4, 2020

Checking whether recent fixes to okd-machine-os have resolved this when starting with Fedora 33

@vrutkovs vrutkovs reopened this Dec 4, 2020
@timbrd
Copy link
Author

timbrd commented Dec 5, 2020

> Checking whether recent fixes to okd-machine-os have resolved this when starting with Fedora 33

Which fcos 33 version should be tested? Do the latest testing or next stream releases (33.20201201.2.0 or 33.20201130.1.0) include the fixes you mentioned?

@vrutkovs
Copy link
Member

vrutkovs commented Dec 5, 2020

> 33.20201201.2.0 or 33.20201130.1.0

Could you give any of these a try on latest 4.6 or 4.7 nightlies?

@danielchristianschroeter
Copy link

danielchristianschroeter commented Dec 25, 2020

I tested with FCOS 33.20201214.2.0 and 4.6.0-0.okd-2020-12-12-135354 (bare metal) with the following coreos-installer parameters:
sudo coreos-installer install /dev/sda --insecure-ignition --copy-network --ignition-url http://***/list/***-okd/***/bootstrap.ign --append-karg="ip=10.1.232.57::10.1.232.1:255.255.255.0:k8s-bootstrap-1-01.okd.***:ens160:none:10.1.231.85:10.1.231.5"
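The `ip=` kernel argument is positional and colon-separated. It follows the dracut syntax `ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<client_hostname>:<interface>:<autoconf>[:[<dns1>][:<dns2>]]`. The minimal sketch below splits such a value to confirm which DNS servers it carries. Its assumptions are as follows. It handles only IPv4 addresses, because IPv6 addresses in `ip=` are bracketed and would need extra handling. It treats the trailing fields as DNS servers only when they parse as IP addresses, because dracut also accepts an `<mtu>:<macaddr>` form in that position. `parse_ip_karg` is a hypothetical helper.

```python
import ipaddress
from typing import Dict, List


def _is_ip(value: str) -> bool:
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def parse_ip_karg(karg: str) -> Dict[str, object]:
    """Parse a static dracut-style ip= argument (IPv4 only; hypothetical helper)."""
    value = karg[len("ip="):] if karg.startswith("ip=") else karg
    fields = value.split(":")
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 fields, got {len(fields)}")
    names = ["client_ip", "peer", "gateway", "netmask", "hostname", "interface", "autoconf"]
    parsed: Dict[str, object] = dict(zip(names, fields[:7]))
    # Trailing fields are either <dns1>:<dns2> or <mtu>:<macaddr> (a MAC also
    # contains colons), so keep only the ones that look like IP addresses.
    dns: List[str] = [f for f in fields[7:] if f and _is_ip(f)]
    parsed["dns"] = dns
    return parsed
```

Applied to the argument above, `dns` comes out as `['10.1.231.85', '10.1.231.5']`. These are the same nameservers that later appear in the regenerated `/etc/resolv.conf`, so the karg itself was not the problem.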

install-config.yaml

apiVersion: v1
baseDomain: ***
proxy:
  httpProxy: http://***-proxy-01.***:3128
  httpsProxy: http://***-proxy-01.***:3128
  noProxy: localhost,127.0.0.0/8,::1/128,***,***,okd.***,10.1.232.0/24
additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
***
  -----END CERTIFICATE-----
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: okd
networking:
  clusterNetwork:
  - cidr: 10.200.0.0/16
    hostPrefix: 21
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: '***' 
sshKey: 'ssh-ed25519 AAAA***'

/etc/resolv.conf is gone after the bootstrap machine reboots.

[core@k8s-bootstrap-1-01 ~]$ cat /etc/resolv.conf 
cat: /etc/resolv.conf: No such file or directory

After I manually added the nameservers to /etc/resolv.conf again, the bootstrap process seems to continue:

Dez 25 19:53:27 k8s-bootstrap-1-01.okd.*** release-image-download.sh[758]: Pull failed. Retrying quay.io/openshift/okd@sha256:01948f4c6bdd85cdd212eb40d96527a53d6382c4489d7da57522864178620a2c...
Dez 25 19:53:27 k8s-bootstrap-1-01.okd.*** release-image-download.sh[535281]: Error: Error initializing source docker://quay.io/openshift/okd@sha256:01948f4c6bdd85cdd212eb40d96527a53d6382c4489d7da57522864178620a2c: error pinging docker registry quay.io: Get "https://quay.io/v2/": proxyconnect tcp: dial tcp: lookup ***-proxy-01.*** on [::1]:53: read udp [::1]:38212->[::1]:53: read: connection refused
Dez 25 19:53:27 k8s-bootstrap-1-01.okd.*** release-image-download.sh[758]: Pull failed. Retrying quay.io/openshift/okd@sha256:01948f4c6bdd85cdd212eb40d96527a53d6382c4489d7da57522864178620a2c...

Dez 25 19:53:29 k8s-bootstrap-1-01.okd.*** podman[535327]: 2020-12-25 19:53:29.171957528 +0000 UTC m=+1.291813627 image pull  
Dez 25 19:53:29 k8s-bootstrap-1-01.okd.*** release-image-download.sh[535327]: 9b89d802f81dc4c465222c4f0389a527b3973c6c9dde5238b443e6c2e92b2109
Dez 25 19:53:29 k8s-bootstrap-1-01.okd.*** systemd[1]: Finished Download the OpenShift Release Image.
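The retries before the successful pull show the proxy host being looked up on `[::1]:53`, and that connection was refused. This fits a missing `/etc/resolv.conf`. Go's standard resolver, which podman uses, falls back to `127.0.0.1:53` and `[::1]:53` when it finds no nameserver configuration, and no resolver was listening there. The minimal Python sketch below mirrors only that fallback. It uses a deliberately simplified parser that looks at nothing but `nameserver` lines. `read_resolv_conf` and `effective_nameservers` are hypothetical helpers and do not reproduce any resolver exactly.

```python
from typing import List, Optional

# Fallback used by Go's net package when no nameserver is configured
GO_DEFAULT_NS = ["127.0.0.1:53", "[::1]:53"]


def read_resolv_conf(path: str = "/etc/resolv.conf") -> Optional[str]:
    """Return the file contents, or None if it cannot be found."""
    try:
        with open(path) as fh:
            return fh.read()
    except FileNotFoundError:  # also raised for a dangling symlink
        return None


def effective_nameservers(resolv_conf: Optional[str]) -> List[str]:
    """Return host:port nameservers from resolv.conf text, or Go-style defaults."""
    servers: List[str] = []
    for line in (resolv_conf or "").splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            addr = parts[1]
            # IPv6 literals need brackets before a port can be appended
            servers.append(f"[{addr}]:53" if ":" in addr else f"{addr}:53")
    return servers or list(GO_DEFAULT_NS)
```

With the file missing, `effective_nameservers(read_resolv_conf())` falls back to the two loopback addresses. With the `resolv.conf` shown below, it returns `['10.1.231.85:53', '10.1.231.5:53']`.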

Update: I also tested the installation with version 4.6.0-0.okd-2020-12-21-142926 (https://origin-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.6.0-0.okd/release/4.6.0-0.okd-2020-12-21-142926). In this release the issue seems to be resolved. The /etc/resolv.conf is still available after the bootstrap reboots.

[core@k8s-bootstrap-1-01 ~]$ cat /etc/resolv.conf 
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 10.1.231.85
nameserver 10.1.231.5

@vrutkovs
Copy link
Member

vrutkovs commented Jan 7, 2021

> The /etc/resolv.conf is still available after the bootstrap reboots.

Perfect, thank you. Closing this.


5 participants