hostname changes updating a node to latest stable #1385
Comments
When I change it with |
Hello @till, I just tested with a Flatcar instance on OpenStack and QEMU and I can't reproduce:
The hostname set in the Ignition configuration matches the actual hostname. Is it possible that you have a third-party tool that manipulates the hostname on these nodes? EDIT: I tried the upgrade path from 3510.2.2 to 3815.2.0 and it works too. |
Hello, this might be an issue with how coreos-cloudinit handles the metadata coming from OpenStack. |
Would coreos-cloudinit run if the instance is provisioned with ignition? @till can you upload a full journalctl from the relevant boots? And be aware that FQDN hostnames are not recommended or well supported; the hostname really should be the first component of the FQDN. |
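To make the short-name recommendation concrete, here is a minimal sketch that takes the first component of an FQDN. The function name `short_hostname` is hypothetical and is not part of Flatcar or coreos-cloudinit; it only illustrates the split.

```python
def short_hostname(fqdn: str) -> str:
    """Return the first DNS label of a name (the host part of an FQDN)."""
    # Drop a trailing root dot if present, then keep everything before the first dot.
    return fqdn.rstrip(".").split(".", 1)[0]


print(short_hostname("node-001.docker"))
```

With the name used in this issue, the short form would be `node-001`, and the full name would then typically live in /etc/hosts.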
|
For the record, this has worked forever. I don't remember what we did before Flatcar, but we've been using this since 2020. ;) On to the logs: I couldn't find anything (quickly) with
I looked at the user data (configdrive too):
"storage": {
"files": [
{
"overwrite": true,
"path": "/etc/hostname",
"contents": {
"compression": "",
"source": "data:,node-001.docker"
},
"mode": 420
},
Is there anything more specific that I can share? |
Indeed cloudinit seems to be the culprit. I don't follow why we would want cloudinit to run on every boot on a system provisioned with ignition. @ader1990? @gabriel-samfira? |
@till, can you check if "/media/configdrive/openstack/latest/meta_data.json" contains The last change to coreos-cloudinit was this one https://github.com/flatcar/coreos-cloudinit/pull/19/files + flatcar/scripts@8f44cbf, but this change should not produce this behaviour. It looks like a change in the service ordering: either the ignition service now runs before coreos-cloudinit, or coreos-cloudinit now runs at every boot, which did not happen before. I will need to reproduce the behaviour first to take a better look. It would be helpful to have the "/media/configdrive/openstack/latest/meta_data.json" content for reproduction. Thanks. |
@ader1990 it contains two variants:
{
"uuid": "ABC",
"meta": {
"group": "customer",
"label": "docker",
"role": "worker",
"origin": "foo"
},
"admin_pass": "ABC",
"hostname": "node-001-docker",
"name": "node-001.docker",
"launch_index": 0,
"availability_zone": "nova",
"random_seed": "ABC",
"project_id": "ABC",
"devices": [],
"dedicated_cpus": []
}
From your PR I don't immediately see how I also looked at Gophercloud to see if there's something that sets a default derived from the name, but I don't see anything that "resembles" hostname in a cursory search: https://pkg.go.dev/github.com/gophercloud/gophercloud/openstack/compute/v2/servers#CreateOpts But then again, why would it work as expected on an older release and start doing that now? |
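For anyone wanting to check their own nodes, a small sketch of the comparison being discussed: the metadata "hostname" field versus the value written to /etc/hostname through Ignition. The JSON below is trimmed from the meta_data.json posted above; `hostname_mismatch` is a hypothetical helper name, not something coreos-cloudinit provides.

```python
import json

# Trimmed copy of the meta_data.json from this thread; only the two relevant keys are kept.
META_DATA = '{"hostname": "node-001-docker", "name": "node-001.docker"}'


def hostname_mismatch(meta_json: str, configured: str) -> bool:
    """Return True when the metadata "hostname" differs from the configured hostname."""
    meta = json.loads(meta_json)
    return meta.get("hostname") != configured


# "node-001.docker" is the value the Ignition config writes to /etc/hostname.
print(hostname_mismatch(META_DATA, "node-001.docker"))
```

On a real node the JSON would be read from /media/configdrive/openstack/latest/meta_data.json instead of the inline string. Here the "name" field matches the Ignition value while the "hostname" field does not.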
That's correct, Cloudinit is not supposed to run if the system is provisioned with Ignition: https://github.com/flatcar/init/blob/7e30bf5baa1abc5113024f2238d9c235aedaf62e/systemd/system/enable-oem-cloudinit.service#L8-L10 |
It's openstack metadata that is doing the translation. The change to flatcar is that we now apply the metadata hostname on every boot. The question is whether this is intentional...
This is the unit doing the applying: https://github.com/flatcar/coreos-cloudinit/blob/flatcar-master/units/user-configdrive.service |
Yes, of course. What I meant to add is, I couldn't find anything obvious how to set the |
The metadata service also responds with the broken hostname. |
Normally it shouldn't need to be run on every boot, unless we rely on it to apply networking info (including hostname). Also, if I remember correctly, if it detects anything other than cloud-init userdata, it should do nothing. If we're talking about OpenStack, we enabled it there to allow flatcar to deal with cloud-init style metadata. The idea was to allow better compatibility. The old kops issue comes to mind. Most tools target cloud-init, more so in private clouds. I can debug this tomorrow.
Interesting. I need to see the order of precedence. If I remember correctly, user defined hostnames via userdata should take precedence over metadata. Although, metadata value being |
Is there anything I should add to ignition to force my hostname? I think so far I only write to |
Hi @till First and foremost, my apologies for the inconvenience caused by this. Give me a couple of days to track down this issue. I remember bits and pieces from a while ago regarding setting the hostname (there were some issues when setting an FQDN as a hostname, as opposed to using short-form hostnames and adding the FQDN in /etc/hosts). I also remember we had coreos-cloudinit run if we detected cloud-init metadata, but on OpenStack we may have enabled it to always run. I need to track down that discussion. In the meantime, a few questions:
curl http://169.254.169.254/latest/meta-data/hostname && echo
openstack --os-compute-api-version 2.90 server set --hostname node-001 <YOUR_VM_ID> @jepio @pothos I think it may be worth starting a discussion in regards to enabling a better way to toggle the use of coreos-cloudinit. Perhaps only start it when cloud-init specific metadata is present, or some other hint. |
The output is:
This is also rather interesting:
It seems like it'll use I am not sure how to follow the |
The openstack metadata service will default to the instance name if no hostname is explicitly set when you A really dirty trick that you don't need to do, but would probably work is to also replace your current metadata with:
#cloud-init
hostname: node-001.docker
But I am not sure if anything would break for your instance. Perhaps try it on a test VM if you have one available. There is a high chance that the short form of that hostname will be set. In any case, I will investigate in the following days. I've set up a local OpenStack to test on my side. |
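To illustrate how an instance name can end up as a different hostname in the metadata, here is a simplified sketch. This is an assumption inferred only from the two values seen in this thread ("name": "node-001.docker" and "hostname": "node-001-docker"); it is not OpenStack's actual code, and `derive_hostname` is a hypothetical name.

```python
import re


def derive_hostname(instance_name: str) -> str:
    """Turn an instance name into a hostname-like value.

    Assumption for illustration: dots, underscores and spaces are
    replaced with dashes and the result is lowercased.
    """
    return re.sub(r"[ _.]", "-", instance_name).lower()


print(derive_hostname("node-001.docker"))
```

Under that assumption the instance name from this issue maps to the dashed hostname reported by the metadata, which would explain why setting the hostname explicitly at creation time avoids the translation.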
Another really ugly trick (me ducks for cover) can be done if you're not currently using config drive. In theory, from ignition you can write For a long term fix I am looking at the Will also add a kill switch like cloud-init has for situations in which we know we need to disable |
@gabriel-samfira thanks for looking into it. I made a PR to Gophercloud to set the hostname going forward. And I'll see what else I can do in terms of user-data/ignition. I think my only question/concern right now is, how do I fix nodes that will exhibit this problem when we upgrade? I can't rebuild everything. I am almost sure that OpenStack won't allow me to "patch" the hostname field after an instance is launched. |
@till clearly a proper fix won't have you manually patching things. These are just debug steps that help us get a sense of where this issue is happening and why. I managed to boot a flatcar In that case, the flatcar ~ # cat /etc/.ignition-result.json
{
"provisioningBootID": "e2ca2859-7a42-4fb1-9d99-6de31e48c8d7",
"provisioningDate": "2024-03-06T09:46:13Z",
"userConfigProvided": false
}
Notice the According to the existing enable-oem-cloudinit.service config, So both conditions (is first boot and I then created a butane config with the following content:
variant: flatcar
version: 1.0.0
passwd:
users:
- name: core
ssh_authorized_keys:
- ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQC2oT7j/+elHY9U2ibgk2RYJgCvqIwewYKJTtHslTQFDWlHLeDam93BBOFlQJm9/wKX/qjC8d26qyzjeeeVf2EEAztp+jQfEq9OU+EtgQUi589jxtVmaWuYED8KVNbzLuP79SrBtEZD4xqgmnNotPhRshh3L6eYj4XzLWDUuOD6kzNdsJA2QOKeMOIFpBN6urKJHRHYD+oUPUX1w5QMv1W1Srlffl4m5uE+0eJYAMr02980PG4+jS4bzM170wYdWwUI0pSZsEDC8Fn7jef6QARU2CgHJYlaTem+KWSXislOUTaCpR0uhakP1ezebW20yuuc3bdRNgSlZi9B7zAPALGZpOshVqwF+KmLDi6XiFwG+NnwAFa6zaQfhOxhw/rF5Jk/wVjHIHkNNvYewycZPbKui0E3QrdVtR908N3VsPtLhMQ59BEMl3xlURSi0fiOU3UjnwmOkOoFDy/WT8qk//gFD93tUxlf4eKXDgNfME3zNz8nVi2uCPvG5NT/P/VWR8NMqW6tZcmWyswM/GgL6Y84JQ3ESZq/7WvAetdc1gVIDQJ2ejYbSHBcQpWvkocsiuMTCwiEvQ0sr+UE5jmecQvLPUyXOhuMhw43CwxnLk1ZSeYeCorxbskyqIXH71o8zhbPoPiEbwgB+i9WEoq02u7c8CmCmO8Y9aOnh8MzTKxIgQ== [email protected] and booted a new server with the transpiled ignition config. The server came up and the flatcar ~ # cat /etc/.ignition-result.json
{
"provisioningBootID": "7db0409f-cc73-4334-94af-55bb4d2cdc2c",
"provisioningDate": "2024-03-06T12:56:16Z",
"userConfigProvided": true
}
In this case, the
Mar 06 13:03:47 flatcar.novalocal systemd[1]: enable-oem-cloudinit.service: Skipped due to 'exec-condition'.
Mar 06 13:03:47 flatcar.novalocal systemd[1]: Condition check resulted in enable-oem-cloudinit.service - Enable cloudinit being skipped.
So your case is interesting. By all accounts, if you've configured those instances with ignition, your To debug this a bit further, I have a few questions/requests (if possible):
systemctl status oem-cloudinit.service On a system that is not upgraded and on a system after it was upgraded.
|
As a second set of tests, I will try to replicate your scenario. Boot a |
@gabriel-samfira are you booting with a /media/configdrive/? There appear to be additional services that launch coreos-cloudinit (like user-configdrive.service) |
@jepio in my case, I'm using the standard Openstack metadata service. So coreos-cloudinit enablement seems to be properly handled. When ignition configures the system, coreos-cloudinit is skipped. Will try with config drive as well. I hadn't considered different behavior in case of config drive. @till are you using config drive? |
@gabriel-samfira Yes, using config drive and ignition. |
And there we have it:
Mar 06 17:27:59 localhost systemd[1]: Starting user-configdrive.service - Load cloud-config from /media/configdrive...
Mar 06 17:27:59 localhost coreos-cloudinit[1235]: 2024/03/06 17:27:59 Checking availability of "cloud-drive"
Mar 06 17:27:59 localhost coreos-cloudinit[1235]: 2024/03/06 17:27:59 Fetching meta-data from datasource of type "cloud-drive"
Mar 06 17:27:59 localhost coreos-cloudinit[1235]: 2024/03/06 17:27:59 Attempting to read from "/media/configdrive/openstack/latest/meta_data.json"
Mar 06 17:27:59 localhost coreos-cloudinit[1235]: 2024/03/06 17:27:59 Fetching user-data from datasource of type "cloud-drive"
Mar 06 17:27:59 localhost coreos-cloudinit[1235]: 2024/03/06 17:27:59 Attempting to read from "/media/configdrive/openstack/latest/user_data"
You were right @jepio. I think we can add the same: ExecCondition=/usr/bin/jq -e '.userConfigProvided == false' condition in the |
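For readers less familiar with jq, a rough Python equivalent of the check `jq -e '.userConfigProvided == false'`, applied to the two /etc/.ignition-result.json samples quoted earlier. `cloudinit_should_run` is a hypothetical name used only for this sketch; the real decision is made by systemd evaluating the ExecCondition.

```python
import json

# The two /etc/.ignition-result.json samples posted earlier in this thread.
NO_USER_CONFIG = """{
  "provisioningBootID": "e2ca2859-7a42-4fb1-9d99-6de31e48c8d7",
  "provisioningDate": "2024-03-06T09:46:13Z",
  "userConfigProvided": false
}"""
WITH_USER_CONFIG = """{
  "provisioningBootID": "7db0409f-cc73-4334-94af-55bb4d2cdc2c",
  "provisioningDate": "2024-03-06T12:56:16Z",
  "userConfigProvided": true
}"""


def cloudinit_should_run(ignition_result: str) -> bool:
    """True only when userConfigProvided is literally false.

    A missing key does not count as false, which matches how the jq
    expression compares null against false.
    """
    result = json.loads(ignition_result)
    return result.get("userConfigProvided") is False


print(cloudinit_should_run(NO_USER_CONFIG))
print(cloudinit_should_run(WITH_USER_CONFIG))
```

The point of the condition is that a node provisioned with a user-supplied Ignition config gets the unit skipped, while a node booted without one still has cloud-init style data applied.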
Tried to modify the released image to test this. Long story short: hooray for |
This seems to work: gabriel-samfira/coreos-cloudinit@3bbda2f @jepio should I create a PR with this? |
I think it would make sense, but I defer to @pothos he might have more overview. |
But this could also be a drop-in installed through the ebuild. |
This would also need |
@pothos Added this PR: Testing it today |
I wonder if we should always skip this unit for openstack when we already have |
Sorry to interject here, but is there anything I can do to override this behavior in the meantime? It prevents us from updating, since that would break existing installations. |
You can mask |
@pothos I can confirm that masking works. 👍🏼 I can test the other, could you or someone else provide an ignition example how to include this? I tried fiddling with the drop in, but wasn't able to for some reason. |
This is how it would look with Butane YAML
|
Will be fixed by flatcar/scripts#1790 - I think we can also backport this |
For a backport to Beta/Stable we need backport branches, will look into that on Tuesday. |
Done, should be part of the next round of releases |
Description
The hostname of a node changes after updating to the latest stable release.
Impact
Configs are broken.
Environment and steps to reproduce
Here is part of our butane config when we initially create a new node:
Instance boots an older version of Flatcar Linux:
Flatcar Container Linux by Kinvolk stable 3510.2.2 for Openstack
Hostname is correct/as expected:
Then I download updates (update_engine_client -check_for_updates) and eventually reboot into the latest stable:
Flatcar Container Linux by Kinvolk stable 3815.2.0 for Openstack
Now the hostname is changed:
And /etc/hostname is changed as well.
Expected behavior
Hostname doesn't change, /etc/hostname doesn't change.
Additional information
Please add any information here that does not fit the above format.