Exoscale build: Resolv.conf not created at first boot #415

Closed
pierre-emmanuelJ opened this issue Mar 9, 2020 · 11 comments

@pierre-emmanuelJ

pierre-emmanuelJ commented Mar 9, 2020

Hello, I'm using a build of FCOS for the Exoscale cloud provider:
#384

/etc/resolv.conf is not created at first boot, but if I reboot the instance, resolv.conf is created.
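
For anyone trying to reproduce this, a minimal Python sketch of the check is below. It only tests whether the file exists and lists its `nameserver` entries; the function names are made up for illustration and are not part of any FCOS tooling.

```python
from __future__ import annotations

from pathlib import Path


def nameservers(resolv_conf_text: str) -> list[str]:
    """Return the addresses from 'nameserver' lines, ignoring comments."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # resolv.conf comments start with '#' or ';'.
        if not line or line.startswith(("#", ";")):
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


def check_resolv_conf(path: str = "/etc/resolv.conf") -> list[str]:
    """Return configured nameservers, or an empty list if the file is missing."""
    p = Path(path)
    # exists() follows symlinks, so a dangling symlink also counts as missing.
    if not p.exists():
        return []
    return nameservers(p.read_text())
```

Calling `check_resolv_conf()` right after first boot and again after a reboot would show the difference described above: an empty list when the file is absent, and the DHCP-provided resolvers once it has been written.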

Network Manager log output on first boot (journalctl -u NetworkManager.service)

-- Logs begin at Mon 2020-03-09 14:19:08 UTC, end at Mon 2020-03-09 15:10:00 UTC. --
Mar 09 14:19:19 fcos systemd[1]: Starting Network Manager...
Mar 09 14:19:19 fcos NetworkManager[974]: <info>  [1583763559.7661] NetworkManager (version 1.20.10-1.fc31) is starting... (for the first time)
Mar 09 14:19:19 fcos NetworkManager[974]: <info>  [1583763559.7665] Read config: /etc/NetworkManager/NetworkManager.conf (lib: 10-disable-default-plugins.conf, 20-client-id-from-mac.conf) (run: 10-dracut-dhclient.conf)
Mar 09 14:19:19 fcos NetworkManager[974]: <info>  [1583763559.8137] bus-manager: acquired D-Bus service "org.freedesktop.NetworkManager"
Mar 09 14:19:19 fcos systemd[1]: Started Network Manager.
Mar 09 14:19:19 fcos NetworkManager[974]: <info>  [1583763559.8301] manager[0x55e34978a130]: monitoring kernel firmware directory '/lib/firmware'.
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2339] hostname: hostname: using hostnamed
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2340] hostname: hostname changed from (none) to "fcos"
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2342] dns-mgr[0x55e34976f240]: init: dns=default,systemd-resolved rc-manager=symlink
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2409] manager[0x55e34978a130]: rfkill: Wi-Fi hardware radio set enabled
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2409] manager[0x55e34978a130]: rfkill: WWAN hardware radio set enabled
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2459] manager: rfkill: Wi-Fi enabled by radio killswitch; enabled by state file
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2460] manager: rfkill: WWAN enabled by radio killswitch; enabled by state file
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2460] manager: Networking is enabled by state file
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2500] dhcp-init: Using DHCP client 'dhclient'
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2501] settings: Loaded settings plugin: keyfile (internal)
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2525] device (lo): carrier: link connected
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2527] manager: (lo): new Generic device (/org/freedesktop/NetworkManager/Devices/1)
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2533] device (eth0): carrier: link connected
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2584] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2613] settings: (eth0): created default wired connection 'Wired connection 1'
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2677] device (eth0): state change: unmanaged -> unavailable (reason 'connection-assumed', sys-iface-state: 'external')
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2703] device (eth0): state change: unavailable -> disconnected (reason 'connection-assumed', sys-iface-state: 'external')
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2737] device (eth0): Activation: starting connection 'eth0' (d6928523-083c-4b24-a50f-e0a20332cf34)
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2818] device (eth0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'external')
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2847] device (eth0): state change: prepare -> config (reason 'none', sys-iface-state: 'external')
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2857] device (eth0): state change: config -> ip-config (reason 'none', sys-iface-state: 'external')
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2879] device (eth0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'external')
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2918] device (eth0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'external')
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2925] device (eth0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'external')
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2937] manager: NetworkManager state is now CONNECTED_LOCAL
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2959] device (eth0): Activation: successful, device activated.
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2973] manager: NetworkManager state is now CONNECTED_GLOBAL
Mar 09 14:19:20 fcos NetworkManager[974]: <info>  [1583763560.2998] manager: startup complete

log after reboot:

-- Reboot --
Mar 09 16:23:10 fcos systemd[1]: Starting Network Manager...
Mar 09 16:23:10 fcos NetworkManager[695]: <info>  [1583770990.6587] NetworkManager (version 1.20.10-1.fc31) is starting... (for the first time)
Mar 09 16:23:10 fcos NetworkManager[695]: <info>  [1583770990.6590] Read config: /etc/NetworkManager/NetworkManager.conf (lib: 10-disable-default-plugins.conf, 20-client-id-from-mac.conf)
Mar 09 16:23:10 fcos systemd[1]: Started Network Manager.
Mar 09 16:23:10 fcos NetworkManager[695]: <info>  [1583770990.7027] bus-manager: acquired D-Bus service "org.freedesktop.NetworkManager"
Mar 09 16:23:10 fcos NetworkManager[695]: <info>  [1583770990.7163] manager[0x5634a10d60d0]: monitoring kernel firmware directory '/lib/firmware'.
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0550] hostname: hostname: using hostnamed
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0551] hostname: hostname changed from (none) to "fcos"
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0553] dns-mgr[0x5634a10bc240]: init: dns=default,systemd-resolved rc-manager=symlink
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0616] manager[0x5634a10d60d0]: rfkill: Wi-Fi hardware radio set enabled
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0616] manager[0x5634a10d60d0]: rfkill: WWAN hardware radio set enabled
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0688] manager: rfkill: Wi-Fi enabled by radio killswitch; enabled by state file
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0690] manager: rfkill: WWAN enabled by radio killswitch; enabled by state file
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0696] manager: Networking is enabled by state file
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0702] dhcp-init: Using DHCP client 'internal'
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0712] settings: Loaded settings plugin: keyfile (internal)
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0740] device (lo): carrier: link connected
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0748] manager: (lo): new Generic device (/org/freedesktop/NetworkManager/Devices/1)
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0785] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0860] settings: (eth0): created default wired connection 'Wired connection 1'
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0865] device (eth0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0873] device (eth0): carrier: link connected
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0914] device (eth0): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0922] policy: auto-activating connection 'Wired connection 1' (563282ca-6ff1-3cdf-a0c0-25481e2ebc70)
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0952] device (eth0): Activation: starting connection 'Wired connection 1' (563282ca-6ff1-3cdf-a0c0-25481e2ebc70)
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0953] device (eth0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0977] manager: NetworkManager state is now CONNECTING
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.0980] device (eth0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.1062] device (eth0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.1072] dhcp4 (eth0): activation: beginning transaction (timeout in 45 seconds)
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.1209] dhcp4 (eth0): state changed unknown -> bound
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.1219] device (eth0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.1265] device (eth0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.1268] device (eth0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.1273] manager: NetworkManager state is now CONNECTED_LOCAL
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.1283] manager: NetworkManager state is now CONNECTED_SITE
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.1284] policy: set 'Wired connection 1' (eth0) as default for IPv4 routing and DNS
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.1327] device (eth0): Activation: successful, device activated.
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.1350] manager: NetworkManager state is now CONNECTED_GLOBAL
Mar 09 16:23:11 fcos NetworkManager[695]: <info>  [1583770991.1353] manager: startup complete
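
The two logs differ in a few places that are easy to miss by eye: the DHCP client in use, the `sys-iface-state` of eth0, whether a `dhcp4` transaction is logged, and whether NetworkManager logs that it set the connection as default for DNS. A rough Python sketch for pulling those out of saved `journalctl` output follows; the regular expressions are written against the lines shown above and are an assumption about other NetworkManager versions.

```python
import re

DHCP_CLIENT = re.compile(r"dhcp-init: Using DHCP client '([^']+)'")
IFACE_STATE = re.compile(
    r"device \(eth0\): state change: .*sys-iface-state: '([^']+)'"
)


def summarize(journal_text: str) -> dict:
    """Pull out the NetworkManager details that differ between the two boots."""
    client = DHCP_CLIENT.search(journal_text)
    states = IFACE_STATE.findall(journal_text)
    return {
        "dhcp_client": client.group(1) if client else None,
        # sys-iface-state on the last eth0 state change that was logged.
        "final_iface_state": states[-1] if states else None,
        # True only if NetworkManager itself started a DHCPv4 transaction.
        "dhcp4_transaction": "dhcp4 (eth0): activation: beginning transaction"
        in journal_text,
        # True only if NetworkManager reports taking over DNS for the link.
        "set_as_dns_default": "as default for IPv4 routing and DNS"
        in journal_text,
    }
```

Applied to the excerpts above, the first boot gives `dhclient` / `external` with no DHCPv4 transaction and no DNS default, while the reboot gives `internal` / `managed` with both present.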

Do you think this could be a race condition with the DHCP server (option 6)?

Thank you 😃

@jlebon
Member

jlebon commented Mar 9, 2020

Hmm, for comparison, do you have this issue with the traditional Fedora cloud image? (I'm assuming cloud-init already supports Exoscale.)

@pierre-emmanuelJ
Author

Yes, cloud-init works.
I've just tested with the raw image:
https://download.fedoraproject.org/pub/fedora/linux/releases/31/Cloud/x86_64/images/Fedora-Cloud-Base-31-1.9.x86_64.raw.xz
I converted it to qcow2 and registered it as a custom cloud image in Exoscale.

In conclusion:
I don't have the issue there: resolv.conf is created, so I can resolve a domain name directly at first boot.

[fedora@Ftest ~]$ ping getfedora.org
PING getfedora.org (209.132.190.2) 56(84) bytes of data.
64 bytes from proxy13-rdu02.fedoraproject.org (209.132.190.2): icmp_seq=1 ttl=49 time=103 ms

@lucab
Contributor

lucab commented Mar 10, 2020

I have a strong feeling this is the same issue we are seeing on Azure: #356.
If so, the common root-cause fix would be #394.

I'm less sure about why we don't see this more often, though. That is:

  • why we don't experience this on other DHCP-based platforms (e.g. AWS)
  • why this only happens on first-boot (i.e. to the best of my knowledge, we always bring up network via DHCP in initramfs, not only on first boot)

@darkmuggle @dustymabe any clues on the two doubts above?

@jlebon
Member

jlebon commented Mar 10, 2020

Ahh thanks, totally forgot about #356!

why this only happens on first-boot (i.e. to the best of my knowledge, we always bring up network via DHCP in initramfs, not only on first boot)

Actually, we do not; see https://github.com/coreos/coreos-assembler/blob/378fb9d7670a32b1c16835739e82c15bb0e8f6aa/src/grub.cfg#L43-L58.
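
One way to confirm this on a running instance is to look at the kernel command line of the current boot. The Python sketch below assumes that the first-boot entry adds dracut-style arguments such as `ignition.firstboot`, `rd.neednet=1` and `ip=dhcp`; the thread does not quote the linked lines, so treat that argument list as an assumption and adjust it to whatever the grub.cfg actually sets.

```python
# Arguments assumed (not confirmed by this thread) to request networking
# in the initramfs; 'nameserver' is the dracut argument for a static resolver.
NETWORK_KARGS = ("ignition.firstboot", "rd.neednet", "ip", "nameserver")


def initramfs_network_args(cmdline: str) -> dict:
    """Collect network-related kernel arguments from a command line string."""
    found = {}
    for token in cmdline.split():
        # 'key=value' arguments split at the first '='; bare flags get ''.
        key, _, value = token.partition("=")
        if key in NETWORK_KARGS:
            found.setdefault(key, []).append(value)
    return found
```

On a real machine the input would be the contents of `/proc/cmdline`. An empty result on a regular boot, compared with a non-empty one on first boot, would match the behaviour described here.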

@darkmuggle
Contributor

Does anyone have an Exoscale account for debugging? At the very least I would like to see the journal and the dhcp server responses.

FWIW, I don't think that putting the NM in the Initramfs will fix this. I suspect that the issue is that we're getting N DHCP responses and only the last response is lacking a resolver.
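
To make that hypothesis concrete, here is a toy Python model of it. It is not how NetworkManager or dhclient process leases; it only assumes that each response replaces the DNS state wholesale, and the addresses are documentation placeholders.

```python
DNS_OPTION = 6  # DHCP option 6: Domain Name Server (RFC 2132)


def resolvers_last_wins(responses):
    """Return the resolvers left after applying each response in order.

    Each response is a dict mapping DHCP option numbers to values. Under the
    assumed behaviour, a response without option 6 clears earlier resolvers.
    """
    current = []
    for options in responses:
        current = list(options.get(DNS_OPTION, []))
    return current


# Two responses carrying the same resolver, followed by one without option 6.
with_dns = {DNS_OPTION: ["192.0.2.53"]}
without_dns = {}
print(resolvers_last_wins([with_dns, with_dns, without_dns]))
```

With the responses in that order the final list is empty, which is the situation that would leave no resolver to write out. Capturing the actual DHCP responses on Exoscale would be needed to tell whether this is what happens.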

@dustymabe
Member

FWIW, I don't think that putting the NM in the Initramfs will fix this. I suspect that the issue is that we're getting N DHCP responses and only the last response is lacking a resolver.

i.e. like I described in #393 (comment) ?

@darkmuggle
Contributor

FWIW, I don't think that putting the NM in the Initramfs will fix this. I suspect that the issue is that we're getting N DHCP responses and only the last response is lacking a resolver.

i.e. like I described in #393 (comment) ?

Yup :)

@pierre-emmanuelJ
Author

Yes, we can provide you with an Exoscale dev account. PM me (pierro777) on freenode.

@dadux

dadux commented Mar 25, 2020

why we don't experience this on other DHCP-based platforms (e.g. AWS)

I've just run into the problem on AWS. No resolv.conf on first boot.

Version : fedora-coreos-31.20200310.2.0-hvm

@dustymabe
Member

@dadux I just launched the latest stable (31.20200223.3.0) and the latest testing (31.20200310.2.0) images in AWS and I don't see this problem. Can you open a new issue where we can debug it further?

@dustymabe
Member

We are now using NetworkManager in the initramfs and also propagating network information from the initramfs (kargs) when appropriate, which we think fixes this issue.

See #394 (comment) and the preceding discussion for more details.
