Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zgenhostid: accept hostid arguments equal to zero. #11189

Closed
wants to merge 2 commits into from

Conversation

ericonr
Copy link
Contributor

@ericonr ericonr commented Nov 10, 2020

Motivation and Context

A common usage pattern for zgenhostid, including in the ZFS dracut
module, is running it as:

zgenhostid $(hostid)

However, zgenhostid only accepted hostid arguments greater than 0, which
meant that, when the output of hostid(1) was "00000000", zgenhostid
would error out, even though 0 is a possible return value for the
gethostid(3) function used by hostid(1):

  • On current musl libc, gethostid(3) is a stub that always returns 0.
  • On glibc, gethostid(3) will return 0 if /etc/hostid exists but is
    smaller than 4 bytes.

Fixes #11174

I should note that the current mmp_set_hostid test in tests/zfs-tests/tests/functional/mmp/mmp.kshlib is simply broken on musl-based systems, due to using the hostid(1) utility. An alternative to this PR is extending zgenhostid to read /etc/hostid by itself instead of relying on the external hostid utility, since zfs already avoids using libc's gethostid(3) in libspl.

Description

In these cases, it makes more sense for zgenhostid to treat a value of 0
as other parts of the zfs codebase do, meaning that a hostid value
couldn't be determined; therefore, it should attempt to generate a
random value to write into /etc/hostid.

How Has This Been Tested?

I ran it as zgenhostid -o /tmp/hostid 00000000 and /tmp/hostid was now populated with random values.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

@behlendorf
Copy link
Contributor

@gyakovlev would you mind looking at this change to zgenhostid. Given the described use case this sounds pretty reasonable to me.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Nov 10, 2020
A common usage pattern for zgenhostid, including in the ZFS dracut
module, is running it as:

  zgenhostid $(hostid)

However, zgenhostid only accepted hostid arguments greater than 0, which
meant that, when the output of hostid(1) was "00000000", zgenhostid
would error out, even though 0 is a possible return value for the
gethostid(3) function used by hostid(1):

- On current musl libc, gethostid(3) is a stub that always returns 0.
- On glibc, gethostid(3) will return 0 if /etc/hostid exists but is
  smaller than 4 bytes.

In these cases, it makes more sense for zgenhostid to treat a value of 0
as other parts of the zfs codebase do, meaning that a hostid value
couldn't be determined; therefore, it should attempt to generate a
random value to write into /etc/hostid.

The manpage and usage output have been updated to reflect this.

Whitespace has also been fixed in the usage output.

Closes openzfs#11174

Signed-off-by: Érico Rolim <[email protected]>
@ericonr
Copy link
Contributor Author

ericonr commented Nov 11, 2020

@behlendorf I updated the PR to include a couple small fixes in the usage function, both to document the new behavior and fix an issue with whitespace. I hope that's okay to include in this PR.

Copy link
Contributor

@gyakovlev gyakovlev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks,
it makes sense.
code and doc changes look good to me. tested briefly. works as expected if hostid binary returns all zeroes and it's output passed to zgenhostid, it properly generates random one.

I only tested zgenhostid generation on musl/glibc, but not actual dracut.
if that's enough to fix musl, it should be ok.

documentation mentions just 0, so maybe docs should mention that user needs to pass 00000000 or 0x00000000 explicitly?
Sure, previous sentence mentions length requirements, but providing just 0 as example value may be a bit confusing.

@ericonr
Copy link
Contributor Author

ericonr commented Nov 11, 2020

documentation mentions just 0, so maybe docs should mention that user needs to pass 00000000 or 0x00000000 explicitly?
Sure, previous sentence mentions length requirements, but providing just 0 as example value may be a bit confusing.

The other values are shown as 1 and 2^32-1, so I thought it would be best to stick with the same style. Not using 0 feels inconsistent imho, but I can change it.

@ericonr
Copy link
Contributor Author

ericonr commented Nov 11, 2020

I have tested dracut as well by replacing the binary in /usr/bin, and it worked fine in that it didn't show any error messages.

However, as far as I can tell, most musl users should be advised to manually copy the hexadecimal representation of /etc/hostid to spl_hostid= in the kernel command line; or possibly the dracut module should be changed to simply copy /etc/hostid to the initramfs if it exists; and use zgenhostid when it doesn't.

@ahesford
Copy link
Contributor

ahesford commented Nov 11, 2020

I think the change to zgenhostid is reasonable because, when run on a live system, it allows the user to run zgenhostid $(hostid) and write a valid /etc/hostid even on musl systems where hostid prints zeros. My understanding is that ZFS (or SPL) looks for spl_hostid on the kernel command line first and, if that is not found, falls back to directly reading /etc/hostid rather than calling gethostid(3). (If there is no /etc/hostid, ZFS will just assume a zero host ID.)

There is still an outstanding issue in the dracut module-setup.sh. On musl, indiscrimininately running zgenhostid $(hostid) in dracut will always produce the wrong behavior.

On musl, assuming the user doesn't force spl_hostid, before the proposed change:

  • No /etc/hostid => ZFS on the running system assumes a zero host ID, zgenhostid during dracut setup errors out and does not write /etc/hostid to the initramfs, so ZFS in the initramfs will assume a zero host ID. System boots, but dracut is noisy.
  • With /etc/hostid => ZFS on the running system assumes stored host ID, zgenhostid during dracut setup errors out and does not write /etc/hostid to the initramfs, so ZFS in the initramfs will assume a zero host ID. System will not boot, and dracut is noisy.

After the proposed change:

  • No /etc/hostid => ZFS on the running system assumes a zero host ID, zgenhostid during dracut setup generates a random /etc/hostid in the initramfs, so ZFS in the initramfs will assume a different, non-zero host ID. System will not boot.
  • With /etc/hostid => ZFS on the running system assumes stored host ID, zgenhostid during dracut setup generates a random /etc/hostid in the initramfs, so ZFS in the initramfs will assume a different host ID. System will not boot.

The right thing to do in the dracut module is

if [ -f "${dracutsysrootdir}/etc/hostid" ]; then
    cp "${dracutsysrootdir}/etc/hostid" "${initdir}/etc/hostid"
else
    HOSTID="$(hostid)"
    if [ "${HOSTID}" != "00000000" ]; then
        zgenhostid -o "${initdir}/etc/hostid" "${HOSTID}"
    fi
fi

With glibc, this should always result in the initramfs containing an /etc/hostid that matches that of the running system. On musl, this ensures /etc/hostid matches if there is one, and otherwise omits the file from the initramfs so the fall-through case of zero host ID will consistently happen in the initramfs and the running system.

EDIT: Use $dracutsysrootdir in example to respect the possibility of a dracut --sysroot (even though this doesn't actually work in practice).

@gyakovlev
Copy link
Contributor

documentation mentions just 0, so maybe docs should mention that user needs to pass 00000000 or 0x00000000 explicitly?
Sure, previous sentence mentions length requirements, but providing just 0 as example value may be a bit confusing.

The other values are shown as 1 and 2^32-1, so I thought it would be best to stick with the same style. Not using 0 feels inconsistent imho, but I can change it.

maybe you are right. let's leave as is.

@behlendorf
Copy link
Contributor

@ahesford thanks for the clear explanation. Then it sounds like we need to include your proposed dracut change in this PR to avoid breaking the "No /etc/hostid" case, which while noisy, does work. @ahesford @ericonr what do you think about updating this PR accordingly.

@ahesford
Copy link
Contributor

As @ericonr is the musl user here, I'll defer to him to make sure my string comparison is sane, tweak anything necessary and incorporate here.

@ericonr
Copy link
Contributor Author

ericonr commented Nov 11, 2020

Sounds good, I will update the PR ASAP.

@ahesford
Copy link
Contributor

FYI, it looks like the proper way to cp "${dracutsysrootdir}/etc/hostid" "${initdir}/etc/hostid" in module-setup.sh is actually inst /etc/hostid.

@ericonr
Copy link
Contributor Author

ericonr commented Nov 13, 2020

PR updated with @ahesford's suggestions.

inst @sysconfdir@/hostid
type mark_hostonly >/dev/null 2>&1 && mark_hostonly @sysconfdir@/hostid
elif HOSTID="$(hostid 2>/dev/null)" && [ "${HOSTID}" != "00000000" ]; then
zgenhostid -o "${initdir}@sysconfdir@/hostid" "${hostid}"
Copy link
Contributor

@gyakovlev gyakovlev Nov 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ${hostid} should be ${HOSTID} here.

also I think it'd be better to check for zeroes first and resort to copying only if that condition is false and hostid file exists, so behaviour for non-musl systems does not change, besides marking hostonly.

consider alternative code (also added some extra quoting and changed -e to -f test )

HOSTID="$(hostid)"
if [ "${HOSTID}" != "00000000" ]; then
    zgenhostid -o "${initdir}/@sysconfdir@/hostid" "${HOSTID}"
    type mark_hostonly >/dev/null 2>&1 && mark_hostonly @sysconfdir@/hostid
else
	# probably musl stub gethostid() returned 0
	if [ -f "@sysconfdir@/hostid" ]; then
		inst "@sysconfdir@/hostid"
        type mark_hostonly >/dev/null 2>&1 && mark_hostonly "@sysconfdir@/hostid"
    fi
fi

Copy link
Contributor

@gyakovlev gyakovlev Nov 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if file exists, hostid should already return it's value, but we benefit from zgenhostid's error checking and not blindly copying the file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that ${hostid} should be ${HOSTID}. I also think that moving from -e to -f in the test is a good choice, although I opted for -e because the immediately preceding tests also use -e.

I think we should reduce the dependency on hostid, not elevate it. ZFS does not use the same mechanism for defining the hostid as the hostid(1) executable; it favors spl_hostid on the kernel command line and falls back to directly reading /etc/hostid (unless spl is configured to look elsewhere with another command-line parameter, but if that happens, this whole block is broken anyway). If /etc/hostid is directly a source of truth for ZFS, let's copy that truth without relying on an intermediary.

From a functional standpoint, the difference only matters if there exists a possibility that hostid(1) prints something other than the contents of /etc/hostid. I don't know whether this possibility exists.

If the hostid check is elevated, I still advocate the construct

if HOSTID="$(hostid 2>/dev/null)" && [ "${HOSTID}" != "00000000" ]

because it's probably a good idea to consider any stdout from a hostid that returns an error code as invalid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should reduce the dependency on hostid, not elevate it.

I strongly agree with this point: the dracut module uses hostid(1) as if its behavior were guaranteed to be the same as what the kernel module (which reads spl_hostid from the kernel command line or the /etc/hostid file) and the ZFS user space tools (which read /etc/hostid directly as well) do. hostid(1) is an interface to libc's gethostid(3), which, as a non-standard function, can be implemented in all sorts of ways. Moving to treating the hostid as close to what the other parts of ZFS do sounds to me like the best way forward.

From a functional standpoint, the difference only matters if there exists a possibility that hostid(1) prints something other than the contents of /etc/hostid. I don't know whether this possibility exists.

This possibility does exist on glibc, where, if /etc/hostid can't be opened, gethostid(3) twiddles the current IP address into an uint32_t and returns that. This is a value that is not at all guaranteed to stay the same, and is miles away from what ZFS treats as the system's hostid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everybody happy with the updated dracut change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current state reflects my intended fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it's ok.
in gentoo we have another initrd generator which supports zfs (genkernel) and it copies hostid file just fine, instead of generating it.

On systems with musl libc, hostid(1) always prints "00000000", which
will cause improper behavior when the 90zfs module is configured in a
dracut initramfs. Work around this by copying the host /etc/hostid if
the file exists, and otherwise only write /etc/hostid if hostid(1)
returns something meaningful. This avoids zgenhostid creating a random
/etc/hostid for the initramfs, which could lead to errors when trying to
import the pool if spl_hostid isn't defined in the kernel command line.

Furthermore, tag the /etc/hostid file as hostonly, since it is system
specific and shouldn't be taken into account when trying to use an
initramfs generated in one system to boot into a different system.

Co-authored-by: Andrew J. Hesford <[email protected]>
Signed-off-by: Érico Rolim <[email protected]>
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Nov 14, 2020
behlendorf pushed a commit that referenced this pull request Nov 15, 2020
On systems with musl libc, hostid(1) always prints "00000000", which
will cause improper behavior when the 90zfs module is configured in a
dracut initramfs. Work around this by copying the host /etc/hostid if
the file exists, and otherwise only write /etc/hostid if hostid(1)
returns something meaningful. This avoids zgenhostid creating a random
/etc/hostid for the initramfs, which could lead to errors when trying to
import the pool if spl_hostid isn't defined in the kernel command line.

Furthermore, tag the /etc/hostid file as hostonly, since it is system
specific and shouldn't be taken into account when trying to use an
initramfs generated in one system to boot into a different system.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Georgy Yakovlev <[email protected]>
Co-authored-by: Andrew J. Hesford <[email protected]>
Signed-off-by: Érico Rolim <[email protected]>
Closes #11174
Closes #11189
behlendorf pushed a commit that referenced this pull request Nov 17, 2020
A common usage pattern for zgenhostid, including in the ZFS dracut
module, is running it as:

  zgenhostid $(hostid)

However, zgenhostid only accepted hostid arguments greater than 0, which
meant that, when the output of hostid(1) was "00000000", zgenhostid
would error out, even though 0 is a possible return value for the
gethostid(3) function used by hostid(1):

- On current musl libc, gethostid(3) is a stub that always returns 0.
- On glibc, gethostid(3) will return 0 if /etc/hostid exists but is
  smaller than 4 bytes.

In these cases, it makes more sense for zgenhostid to treat a value of 0
as other parts of the zfs codebase do, meaning that a hostid value
couldn't be determined; therefore, it should attempt to generate a
random value to write into /etc/hostid.

The manpage and usage output have been updated to reflect this.

Whitespace has also been fixed in the usage output.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Georgy Yakovlev <[email protected]>
Reviewed-by: Andrew J. Hesford <[email protected]>
Signed-off-by: Érico Rolim <[email protected]>
Closes #11174
Closes #11189
behlendorf pushed a commit that referenced this pull request Nov 17, 2020
On systems with musl libc, hostid(1) always prints "00000000", which
will cause improper behavior when the 90zfs module is configured in a
dracut initramfs. Work around this by copying the host /etc/hostid if
the file exists, and otherwise only write /etc/hostid if hostid(1)
returns something meaningful. This avoids zgenhostid creating a random
/etc/hostid for the initramfs, which could lead to errors when trying to
import the pool if spl_hostid isn't defined in the kernel command line.

Furthermore, tag the /etc/hostid file as hostonly, since it is system
specific and shouldn't be taken into account when trying to use an
initramfs generated in one system to boot into a different system.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Georgy Yakovlev <[email protected]>
Co-authored-by: Andrew J. Hesford <[email protected]>
Signed-off-by: Érico Rolim <[email protected]>
Closes #11174
Closes #11189
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
A common usage pattern for zgenhostid, including in the ZFS dracut
module, is running it as:

  zgenhostid $(hostid)

However, zgenhostid only accepted hostid arguments greater than 0, which
meant that, when the output of hostid(1) was "00000000", zgenhostid
would error out, even though 0 is a possible return value for the
gethostid(3) function used by hostid(1):

- On current musl libc, gethostid(3) is a stub that always returns 0.
- On glibc, gethostid(3) will return 0 if /etc/hostid exists but is
  smaller than 4 bytes.

In these cases, it makes more sense for zgenhostid to treat a value of 0
as other parts of the zfs codebase do, meaning that a hostid value
couldn't be determined; therefore, it should attempt to generate a
random value to write into /etc/hostid.

The manpage and usage output have been updated to reflect this.

Whitespace has also been fixed in the usage output.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Georgy Yakovlev <[email protected]>
Reviewed-by: Andrew J. Hesford <[email protected]>
Signed-off-by: Érico Rolim <[email protected]>
Closes openzfs#11174
Closes openzfs#11189
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
On systems with musl libc, hostid(1) always prints "00000000", which
will cause improper behavior when the 90zfs module is configured in a
dracut initramfs. Work around this by copying the host /etc/hostid if
the file exists, and otherwise only write /etc/hostid if hostid(1)
returns something meaningful. This avoids zgenhostid creating a random
/etc/hostid for the initramfs, which could lead to errors when trying to
import the pool if spl_hostid isn't defined in the kernel command line.

Furthermore, tag the /etc/hostid file as hostonly, since it is system
specific and shouldn't be taken into account when trying to use an
initramfs generated in one system to boot into a different system.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Georgy Yakovlev <[email protected]>
Co-authored-by: Andrew J. Hesford <[email protected]>
Signed-off-by: Érico Rolim <[email protected]>
Closes openzfs#11174
Closes openzfs#11189
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
A common usage pattern for zgenhostid, including in the ZFS dracut
module, is running it as:

  zgenhostid $(hostid)

However, zgenhostid only accepted hostid arguments greater than 0, which
meant that, when the output of hostid(1) was "00000000", zgenhostid
would error out, even though 0 is a possible return value for the
gethostid(3) function used by hostid(1):

- On current musl libc, gethostid(3) is a stub that always returns 0.
- On glibc, gethostid(3) will return 0 if /etc/hostid exists but is
  smaller than 4 bytes.

In these cases, it makes more sense for zgenhostid to treat a value of 0
as other parts of the zfs codebase do, meaning that a hostid value
couldn't be determined; therefore, it should attempt to generate a
random value to write into /etc/hostid.

The manpage and usage output have been updated to reflect this.

Whitespace has also been fixed in the usage output.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Georgy Yakovlev <[email protected]>
Reviewed-by: Andrew J. Hesford <[email protected]>
Signed-off-by: Érico Rolim <[email protected]>
Closes openzfs#11174
Closes openzfs#11189
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
On systems with musl libc, hostid(1) always prints "00000000", which
will cause improper behavior when the 90zfs module is configured in a
dracut initramfs. Work around this by copying the host /etc/hostid if
the file exists, and otherwise only write /etc/hostid if hostid(1)
returns something meaningful. This avoids zgenhostid creating a random
/etc/hostid for the initramfs, which could lead to errors when trying to
import the pool if spl_hostid isn't defined in the kernel command line.

Furthermore, tag the /etc/hostid file as hostonly, since it is system
specific and shouldn't be taken into account when trying to use an
initramfs generated in one system to boot into a different system.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Georgy Yakovlev <[email protected]>
Co-authored-by: Andrew J. Hesford <[email protected]>
Signed-off-by: Érico Rolim <[email protected]>
Closes openzfs#11174
Closes openzfs#11189
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New zgenhostid breaks dracut initramfs generation under musl libc
4 participants