pgwire: tolerate unknown HBA configs from future versions #43717

knz · 2020-01-03T16:27:41Z

If a new feature is added in the next release, it can cause
the HBA cluster setting to contain values not recognized
by the current release.

Rules that the current release does not recognize are to be ignored.

Release note (bug fix): Using the 'gss' option in an HBA configuration
under a CCL license no longer causes the cluster to stop
accepting client connections when the nodes are restarted
with a non-CCL (pure BSL) binary.

Release note (bug fix): Using a new HBA feature from a new version of
CockroachDB during an upgrade no longer causes all the previous
version nodes to stop accepting connections.


knz · 2020-01-03T16:28:16Z

I am preparing a test for this change, but I cannot include it in this PR because I want this PR to be back-ported to 19.2 and the test needs new infrastructure which we can't backport.


maddyblue

I'm trying to think through the security implications of this. This could cause the authentication method to be valid but differ between two different cockroach versions. That is, the 20.1 and (backported) 19.2 nodes could both accept an incoming connection, but for different reasons, with the 19.2 node accepting the connection due to a later matching rule. Could an attacker use this knowledge to gain unauthorized entry into the DB? Is it possible that the DB admin wrote the later HBA rules under the assumption that earlier rules matched based on user/IP matching, and that the later rules can thus assume those users/IPs are no longer in scope? Consider:

host all root all blah
host all all all trust

This should force all connections of the root user to use the blah auth method, and allow everyone else through. If a DB admin is doing a cluster upgrade and adds a rule like this (or fat-fingers and uses an auth method like cret) then with this patch the root user is no longer protected as assumed by the DB admin.
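To make the concern concrete, here is a minimal Go sketch of first-match rule evaluation. It is not CockroachDB's actual pgwire code: the `rule` type, the `known` set, and the `pick` function are hypothetical, and rules are reduced to a user and a method. It only illustrates how skipping an unrecognized method lets the `root` user fall through to the later `trust` rule, whereas failing closed does not.

```go
package main

import "fmt"

// rule is a simplified, hypothetical HBA entry: only user and method.
type rule struct {
	user   string // "all" matches any user
	method string
}

// known lists the methods this hypothetical older binary understands.
var known = map[string]bool{"cert": true, "password": true, "trust": true}

// pick returns the method of the first rule that applies to user.
// With skipUnknown set, rules with an unrecognized method are ignored
// (the behavior proposed in this PR); otherwise they deny the connection.
func pick(rules []rule, user string, skipUnknown bool) string {
	for _, r := range rules {
		if r.user != "all" && r.user != user {
			continue // rule does not apply to this user
		}
		if !known[r.method] {
			if skipUnknown {
				continue // fall through to a later rule
			}
			return "reject" // fail closed
		}
		return r.method
	}
	return "reject" // no rule matched
}

func main() {
	// The two rules from the example above.
	rules := []rule{{"root", "blah"}, {"all", "trust"}}
	fmt.Println(pick(rules, "root", true))   // trust: root is no longer protected
	fmt.Println(pick(rules, "root", false))  // reject
	fmt.Println(pick(rules, "alice", false)) // trust
}
```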

knz · 2020-01-03T20:22:41Z

I have two different ideas.

1) I could implement the "reject" method in the same PR so it can also be backported. This way, if an admin introduces a rule using a new auth method, they can immediately follow the new rule with one that has the same condition and the "reject" method.

2) We could add a "method option" in the optional last column to indicate what to do if the method is not recognized, e.g. "newmethod fallback=reject" (other possible value: "next"), and have the old version use the fallback if it doesn't know the main method. This way the operator can control precisely what happens during the upgrade.

What do you think?

maddyblue · 2020-01-03T20:39:47Z

I think that our defaults should be safe, and users should have to opt out of them. I like your second idea because it explicitly requires admins to say what to do (and if omitted, we can do the safest thing, which is to fail fast). I'm ok with the fallback=method|next option. I'm also ok with an option like fallthrough, which would be equivalent to fallback=next but lose the ability to specify exactly the fallback auth method. One benefit of the fallthrough option is that it could apply to more things than just the auth method. This PR has stuff for hostnames which doesn't fit well into the fallback=method option, but would be fine with fallthrough.
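The fallback idea discussed here could look roughly like the following Go sketch. Nothing in it exists in CockroachDB: the `fallback=` option was only proposed in this thread, and `resolve`, the `known` set, and the treatment of "reject" as an already-known method are assumptions made for illustration. An unknown method with no usable fallback is denied, which matches the "fail fast by default" position above.

```go
package main

import (
	"fmt"
	"strings"
)

// known lists the methods this hypothetical older binary understands.
// "reject" is assumed to be available, as in idea 1 above.
var known = map[string]bool{"cert": true, "password": true, "trust": true, "reject": true}

// resolve decides how to treat a rule given its method column and its
// option column (space-separated key=value pairs). It returns the method
// to apply, or skip=true when evaluation should move on to the next rule.
func resolve(method, opts string) (effective string, skip bool) {
	if known[method] {
		return method, false
	}
	for _, opt := range strings.Fields(opts) {
		if !strings.HasPrefix(opt, "fallback=") {
			continue
		}
		v := strings.TrimPrefix(opt, "fallback=")
		if v == "next" {
			return "", true // operator opted in to falling through
		}
		if known[v] {
			return v, false // operator named an explicit fallback method
		}
	}
	return "reject", false // safe default: fail closed
}

func main() {
	fmt.Println(resolve("newmethod", "fallback=reject")) // reject false
	fmt.Println(resolve("newmethod", "fallback=next"))   //  true
	fmt.Println(resolve("newmethod", ""))                // reject false
	fmt.Println(resolve("cert", "fallback=next"))        // cert false
}
```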

knz · 2020-01-03T21:21:51Z

There is also option 3) keep the current behavior but be careful to gate new HBA features behind a cluster setting. This is more complex and more error prone, with the risk of hosing a cluster as described in the issue linked above.


maddyblue · 2020-01-03T21:36:19Z

Option 3 is also nice. What complexity is there? Doesn't seem too complex to have the HBA validator have certain auth methods gated behind cluster versions and disallow them until the version is bumped. Or is there something else I'm missing?

knz · 2020-01-03T22:29:59Z

The complexity is that nothing prevents a future implementer from forgetting to gate the new setting.

I'll sleep on this.

I think it would also help if @aaron-crl would give us some informed opinion about best practices here.

maddyblue · 2020-01-03T22:34:07Z

You could do things like make the auth methods a data structure where a required parameter is the minimum cluster version. That might fix this problem, and make it very obvious exactly how to add new auth methods in the future.
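A rough Go sketch of that suggestion follows. It is not the code that later landed in #43731: the `version` type, the `register` and `validate` functions, and the version numbers are all hypothetical. The point is only that when registration is the sole way to add a method and it takes the minimum version as an argument, the gate cannot be omitted by accident.

```go
package main

import "fmt"

// version is a hypothetical stand-in for a cluster version.
type version struct{ major, minor int }

// less reports whether v is older than o.
func (v version) less(o version) bool {
	return v.major < o.major || (v.major == o.major && v.minor < o.minor)
}

// methods maps each auth method name to the minimum cluster version that
// must be active before the method may appear in the HBA configuration.
var methods = map[string]version{}

// register is the only way to add a method, so the minimum version cannot
// be forgotten. The versions used below are illustrative, not real ones.
func register(name string, minVersion version) {
	methods[name] = minVersion
}

func init() {
	register("cert", version{19, 1})
	register("newmethod", version{20, 1})
}

// validate is meant to run when the HBA setting is changed, not at
// connection time: it refuses unknown methods and methods that are newer
// than the active cluster version.
func validate(name string, active version) error {
	required, ok := methods[name]
	if !ok {
		return fmt.Errorf("unknown auth method %q", name)
	}
	if active.less(required) {
		return fmt.Errorf("auth method %q requires cluster version %d.%d or later",
			name, required.major, required.minor)
	}
	return nil
}

func main() {
	fmt.Println(validate("newmethod", version{19, 2})) // error: cluster version too old
	fmt.Println(validate("newmethod", version{20, 1})) // <nil>
	fmt.Println(validate("blah", version{20, 1}))      // error: unknown method
}
```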

knz · 2020-01-03T22:36:02Z

> You could do things like make the auth methods a data structure where a required parameter is the minimum cluster version.

That's one way to do this. I am still not fully happy about this, because of the other case explained in #43716: if a user uses a CCL method in a rule then tries to start their cluster without support for it (e.g. a pure BSL binary) everything will break.

Maybe we could do both though; combine option 3 (on the entry side, to prevent dubious configurations to start with) with option 2 (as a fallback if a faulty config gets into the cluster setting).

maddyblue · 2020-01-04T00:04:16Z

I like that. Helps both us and users.

knz · 2020-01-06T07:11:19Z

> You could do things like make the auth methods a data structure where a required parameter is the minimum cluster version.

I like this idea - see #43731

aaron-crl · 2020-01-06T18:52:37Z

Undefined behavior in the security state machine should result in a secure system state (in this case I feel this means closed, and a loggable error).

(i.) I like putting a minimum supported version in the auth method too (and a maximum) as that will also allow controls to be aged out (not just added).

(ii.) Creating hba.conf configurations that interpret authentication rules differently on different hosts can result in broken auth as @mjibson mentioned and should be avoided or strongly discouraged.

How common are mixed-version clusters? Would it be acceptable to expect the administrator to initially craft an hba.conf that meets the lowest common denominator for authentication configurations? Once all nodes support the "new" authentication, the hba.conf can be updated to reflect "new" authentication configuration.

As an additional thought, if an administrator wants to ignore unsupported auth rules, this could be added as a startup flag with a big "WARNING! This can result in broken authentication and may render the cluster nonfunctional or allow unauthenticated access! Don't use this unless you know what you are doing." message.

43731: pgwire,hba: introduce the 'trust' and 'reject' auth methods r=knz a=knz

First commits from #43734 and #43726.

The 'trust' and 'reject' methods, as their names imply, unconditionally allow and deny authentication of matching connections.

This patch introduces them as a prerequisite to later work on Unix socket authentication, but also to introduce a bit of infrastructure that binds new auth methods to a minimum required cluster version. This new infrastructure ensures that future new auth methods do not risk hosing a mixed-version cluster. (See discussion on #43717.)

The patch also introduces support for the 'local' rule prefix, which is still unused.

Co-authored-by: Raphael 'kena' Poss <[email protected]>

knz · 2020-09-11T10:04:16Z

Closing this - we're operating with version checks now (i.e. can't use a new feature until all nodes recognize it).

If a cluster uses a CCL feature (e.g. gss) and the binary is "downgraded" to OSS-only, they get a clean error. From that point they can get back to CCL, change the setting, then come back to OSS-only.

knz requested a review from maddyblue January 3, 2020 16:27

knz force-pushed the 20200103-hba-upgrade branch from c18c2b4 to 8b18b09 Compare January 3, 2020 16:29

knz force-pushed the 20200103-hba-upgrade branch from 8b18b09 to c2116cb Compare January 3, 2020 16:31

maddyblue reviewed Jan 3, 2020


knz mentioned this pull request Jan 6, 2020

pgwire,hba: introduce the 'trust' and 'reject' auth methods #43731

Merged

knz closed this Sep 11, 2020
