Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pgwire: tolerate unknown HBA configs from future versions #43717

Closed
wants to merge 1 commit into from

Conversation

knz
Copy link
Contributor

@knz knz commented Jan 3, 2020

Fixes #43716.

If a new feature is added in the next release, it can cause
the HBA cluster setting to contain values not recognized
by the current release.

Rules that don't match are to be ignored.

Release note (bug fix): Using the 'gss' option in a HBA configuration
using an CCL license will not any more cause the cluster to stop
accepting client connections when the nodes are restarted
with a non-CCL (pure BSL) binary.

Release note (bug fix): Using a new HBA feature from a new version of
CokroachDB during an upgrade will not any more cause all the previous
version nodes to stop accepting connections.

@knz knz requested a review from maddyblue January 3, 2020 16:27
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz
Copy link
Contributor Author

knz commented Jan 3, 2020

I am preparing a test for this change, but I cannot include it in this PR because I want this PR to be back-ported to 19.2 and the test needs new infrastructure which we can't backport.

@knz knz force-pushed the 20200103-hba-upgrade branch from c18c2b4 to 8b18b09 Compare January 3, 2020 16:29
If a new feature is added in the next release, it can cause
the HBA cluster setting to contain values not recognized
by the current release.

Rules that don't match are to be ignored.

Release note (bug fix): Using the 'gss' option in a HBA configuration
using an CCL license will not any more cause the cluster to stop
accepting client connections when the nodes are restarted
with a non-CCL (pure BSL) binary.

Release note (bug fix): Using a new HBA feature from a new version of
CokroachDB during an upgrade will not any more cause all the previous
version nodes to stop accepting connections.
@knz knz force-pushed the 20200103-hba-upgrade branch from 8b18b09 to c2116cb Compare January 3, 2020 16:31
Copy link
Contributor

@maddyblue maddyblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to think through the security implications of this. This could would cause the authentication method to be valid but differ between two different cockroach versions. That is, the 20.1 and (backported) 19.2 nodes could both accept an incoming connection, but for different reasons, with the 19.2 node accepting the connection due to a later matching rule. Could an attacker use this knowledge to gain unauthorized entry into the DB? Is it possible that the DB admin wrote the later HBA rules under the assumption that earlier rules matched based on user/IP matching, and the later rules can thus assume those users/IPs are no longer possibly in scope? Consider:

host all root all blah
host all all all trust

This should force all connections of the root user to use the blah auth method, and allow everyone else through. If a DB admin is doing a cluster upgrade and adds a rule like this (or fat-fingers and uses an auth method like cret) then with this patch the root user is no longer protected as assumed by the DB admin.

@knz
Copy link
Contributor Author

knz commented Jan 3, 2020 via email

@maddyblue
Copy link
Contributor

I think that our defaults should be safe, and users should have to opt-out of them. I like your second idea because it explicitly requires admins to say what to do (and if omitted, we can do the safest thing which is to fail fast). I'm ok with the fallback=method|next option. I'm also ok with an option that's like fallthrough or something which would be equivalent to fallback=next and lose the ability to specific exactly the fallback auth method. One benefit of the fallthrough option is that it could apply to more things than just auth method. This PR has stuff for hostnames which don't fit well into the fallback=method option, but would be fine with fallthrough.

@knz
Copy link
Contributor Author

knz commented Jan 3, 2020 via email

@maddyblue
Copy link
Contributor

Option 3 is also nice. What complexity is there? Doesn't seem too complex to have the HBA validator have certain auth methods gated behind cluster versions and disallow them until the version is bumped. Or is there something else I'm missing?

@knz
Copy link
Contributor Author

knz commented Jan 3, 2020

The complexity is that there is no obstacle for the future implementer from forgetting to gate the new setting.

I'll sleep over this.

I think it would also help if @aaron-crl would give us some informed opinion about best practices here.

@maddyblue
Copy link
Contributor

You could do things like make the auth methods a data structure where a required parameter is the minimum cluster version. That might fix this problem, and make it very obvious exactly how to add new auth methods in the future.

@knz
Copy link
Contributor Author

knz commented Jan 3, 2020

You could do things like make the auth methods a data structure where a required parameter is the minimum cluster version.

That's one way to do this. I am still not fully happy about this, because of the other case explained in #43716: if a user uses a CCL method in a rule then tries to start their cluster without support for it (e.g. a pure BSL binary) everything will break.

Maybe we could do both though; combine option 3 (on the entry side, to prevent dubious configurations to start with) with option 2 (as a fallback if a faulty config gets into the cluster setting).

@maddyblue
Copy link
Contributor

I like that. Helps both us and users.

@knz
Copy link
Contributor Author

knz commented Jan 6, 2020

You could do things like make the auth methods a data structure where a required parameter is the minimum cluster version.

I like this idea - see #43731

@aaron-crl
Copy link

Undefined behavior in the security state machine should result in a secure system state (in this case I feel this means closed, and a loggable error).

(i.) I like putting a minimum supported version in the auth method too (and a maximum) as that will also allow controls to be aged out (not just added).

(ii.) Creating hba.conf configurations that interpret authentication rules differently on different hosts can result in broken auth as @mjibson mentioned and should be avoided or strongly discouraged.

How common are mixed-version clusters? Would it be acceptable to expect the administrator to initially craft an hba.conf that meets the lowest common denominator for authentication configurations? Once all nodes support the "new" authentication, the hba.conf can be updated to reflect "new" authentication configuration.

As an additional thought, if an administrator wants to ignore unsupported auth rules, this could be added as a startup flag with a big "WARNING! This can result in broken authentication and may render the cluster nonfunctional or allow unauthenticated access! Don't use this unless you know what you are doing." message.

craig bot pushed a commit that referenced this pull request Jan 8, 2020
43731: pgwire,hba: introduce the 'trust' and 'reject' auth methods r=knz a=knz

First commits from #43734 and #43726.

The 'trust' and 'reject' methods, as their name implies,
unconditionally allow and deny authentication of matching connections.

This patch introduces them as a prerequisite to later work on Unix
socket authentication, but also to introduce a bit of infrastructure
that binds new auth methods to a minimum required cluster
version. This new infrastructure ensures that future new auth methods
do not risk hosting a mixed-version cluster. (See discussion on #43717.)

The patch also introduces support for the 'local' rule prefix, which
is still unused.


Co-authored-by: Raphael 'kena' Poss <[email protected]>
@knz
Copy link
Contributor Author

knz commented Sep 11, 2020

Closing this - we're operating with version checks now (i.e. can't use a new feature until all nodes recognize it).

If a cluster uses a CCL feature (e.g. gss) and the binary is "downgraded" to OSS-only, they get a clean error. From that point they can get back to CCL, change the setting, then come back to OSS-only.

@knz knz closed this Sep 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pgwire: new methods in HBA config break mixed-version clusters
4 participants