-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pgwire: tolerate unknown HBA configs from future versions #43717
Conversation
I am preparing a test for this change, but I cannot include it in this PR because I want this PR to be back-ported to 19.2 and the test needs new infrastructure which we can't backport. |
c18c2b4
to
8b18b09
Compare
If a new feature is added in the next release, it can cause the HBA cluster setting to contain values not recognized by the current release. Rules that don't match are to be ignored. Release note (bug fix): Using the 'gss' option in a HBA configuration using an CCL license will not any more cause the cluster to stop accepting client connections when the nodes are restarted with a non-CCL (pure BSL) binary. Release note (bug fix): Using a new HBA feature from a new version of CokroachDB during an upgrade will not any more cause all the previous version nodes to stop accepting connections.
8b18b09
to
c2116cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to think through the security implications of this. This could would cause the authentication method to be valid but differ between two different cockroach versions. That is, the 20.1 and (backported) 19.2 nodes could both accept an incoming connection, but for different reasons, with the 19.2 node accepting the connection due to a later matching rule. Could an attacker use this knowledge to gain unauthorized entry into the DB? Is it possible that the DB admin wrote the later HBA rules under the assumption that earlier rules matched based on user/IP matching, and the later rules can thus assume those users/IPs are no longer possibly in scope? Consider:
host all root all blah
host all all all trust
This should force all connections of the root user to use the blah
auth method, and allow everyone else through. If a DB admin is doing a cluster upgrade and adds a rule like this (or fat-fingers and uses an auth method like cret
) then with this patch the root user is no longer protected as assumed by the DB admin.
I have two different ideas.
1)I could implement the "reject" method in the same PR so it can also be backported.
This way if an admin introduces a rule using a new auth method, they can immediately follow the new rule with the same condition and the "reject" method.
2) we could add a "method option" in the optional last column to indicate what to do if the method is not recognized, e.g. "newmethod fallback=reject" (other poasible value: "next") and have the old version use the fallback if it doesn't know the main method.
This way the operator can control precisely what happens during the upgrade.
What do you think?
--
Verstuurd vanaf mijn Android apparaat met K-9 Mail. Excuseer mijn beknoptheid.
|
I think that our defaults should be safe, and users should have to opt-out of them. I like your second idea because it explicitly requires admins to say what to do (and if omitted, we can do the safest thing which is to fail fast). I'm ok with the |
There is also option 3) keep the current behavior but be careful to gate new HBA features behind a cluster setting. This is more co.Plex and more error prone, with the risk of hosing a cluster as described in the issue linked above.
…--
Verstuurd vanaf mijn Android apparaat met K-9 Mail. Excuseer mijn beknoptheid.
|
Option 3 is also nice. What complexity is there? Doesn't seem too complex to have the HBA validator have certain auth methods gated behind cluster versions and disallow them until the version is bumped. Or is there something else I'm missing? |
The complexity is that there is no obstacle for the future implementer from forgetting to gate the new setting. I'll sleep over this. I think it would also help if @aaron-crl would give us some informed opinion about best practices here. |
You could do things like make the auth methods a data structure where a required parameter is the minimum cluster version. That might fix this problem, and make it very obvious exactly how to add new auth methods in the future. |
That's one way to do this. I am still not fully happy about this, because of the other case explained in #43716: if a user uses a CCL method in a rule then tries to start their cluster without support for it (e.g. a pure BSL binary) everything will break. Maybe we could do both though; combine option 3 (on the entry side, to prevent dubious configurations to start with) with option 2 (as a fallback if a faulty config gets into the cluster setting). |
I like that. Helps both us and users. |
I like this idea - see #43731 |
Undefined behavior in the security state machine should result in a secure system state (in this case I feel this means closed, and a loggable error). (i.) I like putting a minimum supported version in the auth method too (and a maximum) as that will also allow controls to be aged out (not just added). (ii.) Creating hba.conf configurations that interpret authentication rules differently on different hosts can result in broken auth as @mjibson mentioned and should be avoided or strongly discouraged. How common are mixed-version clusters? Would it be acceptable to expect the administrator to initially craft an hba.conf that meets the lowest common denominator for authentication configurations? Once all nodes support the "new" authentication, the hba.conf can be updated to reflect "new" authentication configuration. As an additional thought, if an administrator wants to ignore unsupported auth rules, this could be added as a startup flag with a big "WARNING! This can result in broken authentication and may render the cluster nonfunctional or allow unauthenticated access! Don't use this unless you know what you are doing." message. |
43731: pgwire,hba: introduce the 'trust' and 'reject' auth methods r=knz a=knz First commits from #43734 and #43726. The 'trust' and 'reject' methods, as their name implies, unconditionally allow and deny authentication of matching connections. This patch introduces them as a prerequisite to later work on Unix socket authentication, but also to introduce a bit of infrastructure that binds new auth methods to a minimum required cluster version. This new infrastructure ensures that future new auth methods do not risk hosting a mixed-version cluster. (See discussion on #43717.) The patch also introduces support for the 'local' rule prefix, which is still unused. Co-authored-by: Raphael 'kena' Poss <[email protected]>
Closing this - we're operating with version checks now (i.e. can't use a new feature until all nodes recognize it). If a cluster uses a CCL feature (e.g. |
Fixes #43716.
If a new feature is added in the next release, it can cause
the HBA cluster setting to contain values not recognized
by the current release.
Rules that don't match are to be ignored.
Release note (bug fix): Using the 'gss' option in a HBA configuration
using an CCL license will not any more cause the cluster to stop
accepting client connections when the nodes are restarted
with a non-CCL (pure BSL) binary.
Release note (bug fix): Using a new HBA feature from a new version of
CokroachDB during an upgrade will not any more cause all the previous
version nodes to stop accepting connections.