This repository has been archived by the owner on Feb 26, 2020. It is now read-only.

Simplify hostid logic #224

Closed
wants to merge 1 commit into from

Conversation

ryao
Contributor

@ryao ryao commented Mar 12, 2013

There is plenty of compatibility code for a hw_hostid
that isn't used by anything. At the same time, there are apparently
issues with the current hostid logic. coredumb in #zfsonlinux on
freenode reported that Fedora 17 changes its hostid on every boot, which
required force importing his pool. A suggestion by wca was to adopt
FreeBSD's behavior, where it treats hostid as zero if /etc/hostid does
not exist.

Adopting FreeBSD's behavior permits us to eliminate plenty of code,
including a userland helper that invokes the system's hostid as a
fallback.

Signed-off-by: Richard Yao [email protected]
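
The proposed fallback can be sketched in user space as follows. This is a minimal illustration, not the code from the patch: the function name `hostid_from_file_or_zero` is hypothetical, and the assumed file layout (4 raw bytes in host byte order) reflects how glibc is understood to store /etc/hostid.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Return the 32-bit hostid stored in the file at `path`, or 0 if the
 * file is missing or shorter than 4 bytes. The on-disk layout (4 raw
 * bytes in host byte order) is an assumption of this sketch.
 */
uint32_t hostid_from_file_or_zero(const char *path)
{
    uint32_t hostid = 0;
    FILE *fp;

    assert(path != NULL);

    fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;               /* no file: behave as if hostid == 0 */

    if (fread(&hostid, sizeof(hostid), 1, fp) != 1)
        hostid = 0;             /* truncated file: also fall back to 0 */

    fclose(fp);
    return hostid;
}
```

With this shape there is no fallback that depends on the network configuration, so a host without /etc/hostid would report the same value (zero) on every boot.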

@dajhorn
Contributor

dajhorn commented Mar 12, 2013

> There is plenty of compatibility code for a hw_hostid that isn't used by anything.

The hostid call was added a long time ago at SVN commit f23e92f. Does anybody know if there was a reason for doing it this way over any other?

> A suggestion by wca was to adopt FreeBSD's behavior, where it treats hostid as zero if /etc/hostid does not exist

I recommend leaving the HW_HOSTID_MASK at the glibc value so that /bin/hostid always returns the same value that the operator will see in the system log. I use this kind of thing for doing sanity checks on support calls.

> coredumb in #zfsonlinux on freenode reported that Fedora 17 changes its hostid on every boot, which required force importing his pool.

Fedora 17 plumbs lo later than other interfaces, which means that its generated hostid can be changed by dhcp leases or perhaps brctl assignments.

@dajhorn
Contributor

dajhorn commented Mar 12, 2013

The purpose of the hostid check is to prevent conflicting access into shared storage by cluster nodes, which is an uncommon configuration, or to prevent things like accidental SAN imports. This pull request got me thinking that:

  • Anybody doing such work is already super careful about shared storage by necessity, so the hostid check provides only marginal safety, and it integrates poorly with native HA solutions like heartbeat.
  • Most ZoL installations default to 0x007f0101 anyways, so we know that the scope of protection is small.
  • But the hostid check causes a disproportionate amount of user frustration and consequent support load in the general case.

Would either of these two things be acceptable alternatives?

  1. Use the hostname instead, which is more likely to be unique and stable on a Linux system than the hostid.
  2. Or disable the hostid check entirely and punt all import policy decisions up into a regular configuration file.

@behlendorf
Contributor

> The hostid call was added a long time ago at SVN commit f23e92f. Does anybody know if there was a reason for doing it this way over any other?

Two reasons:

  • At the time I wanted to do the same thing as OpenSolaris and expected to revisit this later (which is now).
  • The trip through user space was unpleasant but required because the hostid isn't available in the kernel.

However, these days I'm much less attached to doing things the OpenSolaris way, particularly if there's a more Linux-friendly approach which achieves the same thing.

> Use the hostname instead, which is more likely to be unique and stable on a Linux system than the hostid.

Perhaps. The only things required of this value are that it be 1) persistent, 2) unique (usually), 3) available early in the boot process, and 4) accessible from within the kernel. The hostname should cover 1) and 2) as long as we avoid the per-process namespaces. But I don't believe we're guaranteed the hostname is set early in the boot process. And as for 4), init_uts_ns is marked GPL-only in the kernel (for god knows what reason) and that may cause us some trouble down the road.

Perhaps there's something else we could use?

> Or disable the hostid check entirely and punt all import policy decisions up into a regular configuration file.

This I find more appealing. As you mentioned above, in practice the amount of protection we're getting from this is minimal for many end user configurations. We could disable it by default and force installations which are running in a shared storage environment to manually enable it. By doing so we'd be no worse off than all of the other existing Linux filesystems which have no such protection. Longer term we'll be replacing the hostid functionality with proper MMP protection anyway, which is described in #745.

@hvenzke

hvenzke commented May 9, 2013

Decades ago, on Solaris SPARC with Solaris 9 and earlier,
the hostid was the MAC address of the first Ethernet card
built into the Solaris host system, e.g. hme0.
When a system was dying, a Sun on-site field engineer would move this MAC chip.
Having been such staff for many years, I know this.

The Ethernet MAC is used the same way on almost all Unix hardware and Unix-like software systems.
For example, 100% of FlexNet-based software uses the MAC of the first Ethernet interface.
If you changed that, your licenses were trash.

MAC addresses can be changed on modern cards today using firmware tools.
On VMware or other systems, for example, you can define a MAC.

So my suggestion is to use the MAC of, e.g., eth0 as the host address, to have a unique ID for the zpool/ZIL.

Even though some Linux distributions (e.g. Fedora Core) implement networking
in a way that changes the UUID on every boot due to privacy concerns,
ZFS requires a unique identifier, since ZFS is not a cluster filesystem today.

Hope this helps.
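
As a rough sketch of the MAC-based idea, the helper below parses a MAC address string and packs it into a 32-bit id. The name `hostid_from_mac` is hypothetical, and because a 48-bit MAC does not fit in 32 bits, keeping only the last four octets is an arbitrary choice made for this illustration. As noted in the comment above, MAC addresses can be reassigned, so the result is only as stable as the interface configuration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a MAC address of the form "aa:bb:cc:dd:ee:ff" and pack its
 * last four octets into a 32-bit id. Returns 0 if six colon-separated
 * hex octets cannot be parsed from `mac`.
 */
uint32_t hostid_from_mac(const char *mac)
{
    unsigned int o[6];

    assert(mac != NULL);

    if (sscanf(mac, "%2x:%2x:%2x:%2x:%2x:%2x",
               &o[0], &o[1], &o[2], &o[3], &o[4], &o[5]) != 6)
        return 0;

    return ((uint32_t)o[2] << 24) | ((uint32_t)o[3] << 16) |
           ((uint32_t)o[4] << 8) | (uint32_t)o[5];
}
```

On Linux the input string could come from a file such as /sys/class/net/eth0/address, assuming an interface with that name exists.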

@ryao
Contributor Author

ryao commented Apr 11, 2014

I have refreshed the patch based on feedback over the past year. It is almost exactly what I had been using in Gentoo, except I no longer mask off the top 4 bits to match Solaris.

@behlendorf
Contributor

@dajhorn The amount of suffering caused by this hostid nonsense has me leaning towards picking up this change. Or if not exactly this change then just making the proposed behavior in the patch the default with a module option to re-enable the previous behavior. But before doing that I wanted to check if that will cause you issues.

@dajhorn
Contributor

dajhorn commented Apr 12, 2014

@behlendorf, nope. Moving towards hostid retirement is good for ZoL.
