OKD 4.6: Missing /etc/resolv.conf when using static ip configuration #380
Comments
systemd-resolved is expected to be disabled (it breaks hostname resolution later). Please provide the log bundle.
I wasn't able to use […]. You can find the bootkube.service logs attached. All the container logs are empty, and the kubelet and crio logs do not contain any interesting information.
It seems to break at the very beginning of the bootstrap process.
That's odd. Could you attach the whole journalctl output for this boot? I expected […]. Perhaps some of these settings are overwritten on the target system?
I have attached the whole journalctl output: boot.log. By the way, I had recreated the cluster from scratch beforehand and checked whether the settings you mentioned differ, but they don't.
I think this might be a podman issue. Maybe @haircommander can help.
I may not be the best person to help, but anyone on the podman side who could help would need to know where this bootkube.sh file lives and what podman is doing (or is expected to do) with this resolv.conf.
Oh, it sets […]. I don't think it has anything to do with podman.
@timbrd, could you give https://amd64.origin.releases.ci.openshift.org/releasestream/4.6.0-0.okd/release/4.6.0-0.okd-2020-11-21-155444 a try? okd-machine-os should now remove all traces of systemd-resolved. It works with DHCP (on AWS and GCP), so hopefully it works with kernel args as well.
Unfortunately, still the same error.
I extracted the new installer and used it to generate the Ignition configs.
That's odd, it should have existed; the previous version also had […], which created the borked symlink.
Okay, but just removing the symlink doesn't solve the problem that prevents podman from starting any containers, does it? If I understand the linked issue correctly, podman expects either the file […]
After creating a valid /etc/resolv.conf, podman is able to start the containers.
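The manual workaround mentioned here (replacing the dangling /etc/resolv.conf symlink with a real file that lists the nameservers) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the change that eventually resolved the issue in later OKD 4.6 nightlies; the helper name `write_static_resolv_conf` is hypothetical, and the nameserver addresses in the usage example come from documentation IP ranges rather than from this report.

```python
import os
import tempfile


def write_static_resolv_conf(nameservers, path="/etc/resolv.conf"):
    """Hypothetical helper: replace `path` with a regular file of nameserver lines.

    Whatever currently sits at `path` (including a dangling symlink) is
    replaced atomically: the content goes to a temp file in the same
    directory, which is then renamed over `path` with os.replace().
    """
    if not nameservers:
        raise ValueError("at least one nameserver is required")
    content = "".join(f"nameserver {ns}\n" for ns in nameservers)
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".resolv.conf.")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
        os.chmod(tmp_path, 0o644)
        # rename() replaces the directory entry itself: a symlink at `path`
        # is overwritten, not followed, so a broken link is not a problem.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return content
```

On a real node this would have to run as root, e.g. `write_static_resolv_conf(["192.0.2.53"])`. If a service recreates the broken symlink on a later boot (as suspected further down in this thread), the file would need to be rewritten again.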
IIUC, here's what's happening:
Could you test two different payloads (using the same installer with […])?
Unfortunately, we don't have a good CI setup for trying out bare-metal UPI without DHCP, so I'm hesitant to merge openshift/okd-machine-os#14 or #15 just yet.
Thanks for the update.
Thanks!
The images are mirrored to quay.io, so it's actually best to leave registry.build01 blocked so that the quay.io mirror is used.
Okay, what do I have to change to use the quay.io mirror?
Add […]
According to quay.io, the hash should be […]
Oh, oops, yeah. Try the digests for these:
Hm, it still doesn't work. The release-image service says it would download the image with the correct digest, but podman then tries to download the old one.
Edit: It seems I also have to override […]
Is there a way to override the MCO image variable?
Hrm, okay, I think we'll go with openshift/okd-machine-os#15 sooner or later anyway.
https://amd64.origin.releases.ci.openshift.org/releasestream/4.6.0-0.okd/release/4.6.0-0.okd-2020-11-22-200916 includes this fix; could you give it a try?
The stub-listener doesn't exist anymore (though the broken symlink is still created), and the containers still do not start.
Hmm, it's probably one of the FCOS services creating the broken symlink.
Is there anything I can do or test?
Not sure why the DHCP case (on AWS) shows entirely different results. On the latest 4.6 nightly I get these on the bootstrap node:
Could you run the same commands on your install?
Odd. Do you use the F32 FCOS initial image (current stable) or F33 from testing/next?
It is Fedora CoreOS 33 from testing.
Aha, okay, that's not what we're using for OKD 4.6 yet (though it will land soon). Does it work with the current FCOS stable (it's still F32-based)? I think FCOS shouldn't create the symlink (systemd-resolved should do this automatically), so we might need a fix for coreos-migrate-to-systemd-resolved instead.
Oh, I thought OKD 4.6 needs FCOS 33. I checked the latest OKD 4.6 build for its current FCOS image, which was 33.20201121.10. Which exact FCOS 32 release should I use then? Is 32.20200629.3, the FCOS release used by the latest stable OKD 4.5, still a valid release?
The initial FCOS image doesn't matter (well, except in this case :) ); machines are updated to the machine-os-content image (which is F33-based now).
We're testing with […]
Sorry for the delay.
Checking […] recent fixes to okd-machine-os have resolved this when starting with Fedora 33.
Which FCOS 33 version should be tested? Do the latest testing or next stream releases (33.20201201.2.0 or 33.20201130.1.0) include the fixes you mentioned?
Could you give any of these a try on the latest 4.6 or 4.7 nightlies?
I tested with FCOS 33.20201214.2.0 and 4.6.0-0.okd-2020-12-12-135354 (bare metal) using the following coreos-installer parameters and install-config.yaml:
/etc/resolv.conf is gone after the bootstrap machine reboots.
After I manually added the nameservers back to resolv.conf, the bootstrap process seems to continue.
Update: I also tested the installation with version 4.6.0-0.okd-2020-12-21-142926 (https://origin-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.6.0-0.okd/release/4.6.0-0.okd-2020-12-21-142926). In this release the issue seems to be resolved: /etc/resolv.conf is still present after the bootstrap node reboots.
Perfect, thank you. Closing this.
Hello,
I am currently testing the OKD nightly release 4.6.0-0.okd-2020-11-18-085718.
Since I would like to use static IP configuration, I have added the following kargs to the bootstrap node:
Everything worked fine at first, but after the first reboot, systemd-resolved no longer started:
The /run/systemd/resolve directory, which resolv.conf links to, is also missing. Name resolution still works (the bootkube service can download the required container images), but the containers expect the host's resolv.conf to be mounted.
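To check for the state described above, i.e. /etc/resolv.conf pointing into a /run/systemd/resolve directory that no longer exists, a minimal Python sketch like the following could be used. It is not part of OKD or FCOS tooling; `describe_resolv_conf` is a hypothetical helper, and the path is a parameter so it can be tried against a scratch directory instead of the live host.

```python
import os


def describe_resolv_conf(path: str = "/etc/resolv.conf") -> str:
    """Hypothetical helper: report what kind of entry `path` is."""
    if os.path.islink(path):
        target = os.readlink(path)
        # os.path.exists() follows the link, so it returns False when the
        # link target (e.g. a file under /run/systemd/resolve) is gone.
        state = "symlink" if os.path.exists(path) else "dangling symlink"
        return f"{state} -> {target}"
    if os.path.isfile(path):
        return "regular file"
    if os.path.exists(path):
        return "other (not a regular file)"
    return "missing"


if __name__ == "__main__":
    print(describe_resolv_conf())
```

On an affected bootstrap node this would be expected to report a dangling symlink pointing into /run/systemd/resolve; a node where the containers start normally should report either a regular file or a symlink whose target exists.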
I'm not sure if this is a bug or if I am missing something.
Thanks
Tim