Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: maybe surface unix socket #31113

Closed
mberhault opened this issue Oct 8, 2018 · 12 comments · Fixed by #43848
Closed

server: maybe surface unix socket #31113

mberhault opened this issue Oct 8, 2018 · 12 comments · Fixed by #43848
Assignees
Labels
A-kv-server Relating to the KV-level RPC server C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-community Originated from the community

Comments

@mberhault
Copy link
Contributor

Requested on gitter.

The unix socket can be a reasonably way to communicate with a CockroachDB node when running on the same machine. We added it a while back for easier testing (easier than dynamic port allocation) but have never advertised or documented it. The flag (--socket) is currently hidden.

We should consider surfacing the flag in the CLI and documenting it. We first need to add some additional testing for it, including exercising in test clusters.

CC: @bdarnell for thoughts.

@mberhault mberhault added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) E-easy Easy issue to tackle, requires little or no CockroachDB experience O-community Originated from the community A-kv-server Relating to the KV-level RPC server S-3 Medium-low impact: incurs increased costs for some users (incl lower avail, recoverable bad data) labels Oct 8, 2018
@tbg tbg removed the S-3 Medium-low impact: incurs increased costs for some users (incl lower avail, recoverable bad data) label Oct 9, 2018
@bdarnell
Copy link
Contributor

bdarnell commented Oct 9, 2018

I'm generally in favor of exposing/encouraging unix sockets as an option. I think the biggest remaining work for this flag has more to do with documentation (especially how to get various clients to connect via the unix socket, which tends to involve a lot of magic) than with testing.

Also, we should think about the big picture of how the flag would be used. Unix sockets are great for single-node clusters, in which case we might want to be able to disable the TCP ports completely when they are used. Does it ever really make sense to use unix sockets in a multi-node cluster (where the TCP ports are necessary for inter-node traffic and therefore you're probably not using dynamic ports)?

@knz
Copy link
Contributor

knz commented Oct 9, 2018

it's possible unix sockets are faster to communicate with. so a local client for dump/restore eg would get more throughput using the unix socket than over TCP.

@inieves
Copy link

inieves commented Oct 16, 2018

Please don't make usage of unix domain sockets and TCP/IP sockets mutually exclusive. They have different use cases that might co-exist.

To add some more detail:

nginx -> {insert your favorite cockroachdb compatible language} -> cockroachdb
all on a single machine/container/vm/whatHaveYou, efficiently over unix domain sockets

Then, regardless of how many instances of the above are running, could be 1 node or multinode, the cockroachdb should be able to take incoming TCP/IP for admin. And also for multi-node, they should be able to talk with each other over TCP/IP at least... possibly even unix domain sockets (for some super freaky use cases)

@bdarnell
Copy link
Contributor

What specific use case do you have in mind for using both unix sockets and TCP?

@inieves
Copy link

inieves commented Oct 16, 2018

heh... i was working on those, sorry, i realize I was to brief... updated above.

@inieves
Copy link

inieves commented Oct 16, 2018

Worth mentioning, im actually doing this right now, literally coding it up right now, to make use of it... not he super freaky part, but the unix domain sockets mixed with TCP/IP.

Alot of devs use TCP/IP sockets just because that's all they know. But that is completely not the right way to glue processes together if they are guaranteed to exist on the same operating system. That would be like using UPS to send a note to a colleague down the hall.

Also worth mentioning, if latency is at all important, or high number of concurrent connections, than it is absolutely imperative to not push packets around TCP/IP stack if it can at all be avoided.

@inieves
Copy link

inieves commented Oct 18, 2018

Heads up...

Unlike mysql (whereby the user can specify the full path of a socket file), postgres only allows the user to specify the directory in which the socket file will end up... And then in the case of postgres, the actual name of the socket file is chosen (through some heuristic) by postgres, not the user. This affects how the user will later connect to the db... the user will either supply the full path of the socket file in the case of mysql... or the path of the directory where the socket file is, in the case of postgres.

cockroach is attempting to be line compatible with postgres... that means that the user should not tell cockroach the full path of a socket file, but rather the path of the directory in which the socket file resides. But this is not the case... The user is currently required to give cockroach the full path of a socket file.

This makes connecting to cockroach via driver (such as PHP PDO) over socket nearly impossible...

e.g. I have told cockroach that I would like a socket to be created at: /var/run/cockroach/roach.sock
When I later try to connect to cockroach via PHP PDO, I get barfed on with the following message... notice the socket path that PDO goes looking for...

Failed Connection: SQLSTATE[08006] [7] could not connect to server: Not a directory Is the server running locally and accepting connections on Unix domain socket "/var/run/cockroach/roach.sock/.s.PGSQL.5432"?

By nearly, I mean that if I tell cockroach to create a socket for me with the path: /var/run/cockroach/.s.PGSQL.5432
And then I tell the PDO driver to look for a socket in /var/run/cockroach, then I might get lucky and it might all work... through sneaky naming.

You probably want to consider using the postgres socket naming semantics, not the mysql socket naming semantics.

@inieves
Copy link

inieves commented Oct 18, 2018

I was able to work-around the socket naming issue.
I ran into a subsequent issue. Because I am running cockroach in secure mode, it is expecting connections to be SSL connections. But socket connections coming from PHP PDO are not SSL...

The socket connections should probably not require SSL, regardless of secure/insecure mode of cockroach.

@inieves
Copy link

inieves commented Oct 19, 2018

I can add more detail, based on libpq official documentation:

I found this: "sslmode is ignored for Unix domain socket communication." at https://www.postgresql.org/docs/10/static/libpq-connect.html.

I am following up on Postgres slack channel about this... But actually it is somewhat reasonable... unix domain sockets are usually trusted, so SSL encryption would seem to be unnecessary security with a negative performance overhead...

To be more compatible with Postgres, I would think that socket connections to cockroach must not have SSL, even if operating in secure mode.

@Fornax96
Copy link

I would like to use this too. The lack of TCP/IP and TLS overhead sounds like it could improve performance a lot.

@jasobrown
Copy link
Contributor

Another use for the unix domain socket is when fronting CockroachDB with an on-instance TLS terminator (think envoy or haproxy) that processes custom x509 certs for specialized authN. Instead of attempting to put the custom cert handling into cockroach, we can have the proxy do that via a custom plugin, then pass all traffic through to cockroach over the domain socket on the same instance.

@knz knz added the A-cli label Sep 24, 2019
craig bot pushed a commit that referenced this issue Jan 3, 2020
43701: sqlsmith: add support for interleaved tables r=mjibson a=mjibson

This commit adds interleaved table support to sqlsmith. When running
with the rand-tables configuration, there's a 50% chance of all tables
but the first one to get interleaved into a random other table.

Release note: None

43709: pgwire: use datadriven-based testing for HBA configs r=knz a=knz

(I'm currently working on #31113 and updating this test makes my life easier.)

This patch introduces a datadriven test runner for HBA config tests.
It also replaces the previous `TestHBA` by more exhaustive datadriven
input files, with comments that better explain the narrative of the
test.

Release note: None

43710: pgwire: improve some HBA error messages r=knz a=knz

First commit from #43709.

Before:

```
> set cluster setting server.host_based_authentication.configuration = 'host db all 0.0.0.0/32 cert';
ERROR: database must be specified as all
```

```
> set cluster setting server.host_based_authentication.configuration = 'host all all myhost cert';
ERROR: host addresses not supported
```

```
> set cluster setting server.host_based_authentication.configuration = 'host all all 0.0.0.0/32 sdfsf';
ERROR: unknown auth method "sdfsdf"
```

After:
```
> set cluster setting server.host_based_authentication.configuration = 'host db all 0.0.0.0/32 cert';
ERROR: unimplemented: per-database HBA rules are not supported
SQLSTATE: 0A000
HINT: You have attempted to use a feature that is not yet implemented.
--
Use the special value 'all' (without quotes) to match all databases.
```

```
> set cluster setting server.host_based_authentication.configuration = 'host all all myhost cert';
ERROR: unimplemented: hostname-based HBA rules are not supported
SQLSTATE: 0A000
HINT: You have attempted to use a feature that is not yet implemented.
--
List the numeric CIDR notation instead, for example: 127.0.0.1/8.
```

```
> set cluster setting server.host_based_authentication.configuration = 'host all all 0.0.0.0/32 sdfsdf'
ERROR: unimplemented: unknown auth method "sdfsdf"
SQLSTATE: 0A000
HINT: You have attempted to use a feature that is not yet implemented.
--
Supported methods: cert, cert-password, password
```

Release note (sql change): CockroachDB will now provide more
descriptive error messages and a error hint when an unsupported rule
is provided via `server.host_based_authentication.configuration`.

43711: pgwire: split the authentication code in its own files r=knz a=knz

First two commits from #43709 and #43710.

This patch splits the pgwire authentication in its own files
and adds missing explanatory comments.

No functional changes.

43713: pgwire/hba: fix a bug in the parsing logic r=knz a=knz

Release note (bug fix): There was a bug in the parsing logic for
server.host_based_authentication.configuration, where both
single-character strings, and quoted strings containing spaces and
separated by commas were not properly parsed. This would cause
e.g. rules for usernames consisting of a single characters, or
usernames containing spaces, to apply improperly.

Co-authored-by: Jordan Lewis <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
craig bot pushed a commit that referenced this issue Jan 7, 2020
43726: pgwire: clarify the special cases as pre-defined HBA rules r=knz a=knz

First commit from #43779.
This PR is needed ahead of fixing #31113, which will require supporting `local` HBA rules.

Prior to this patch, the authentication code was relatively complex to
understand:

- the code path to use the HBA configuration would only be activated
  if the cluster setting was set. There was a special case if it
  wasn't. The special case was in the per-conn auth logic.
- the special case for the root login escape was also present in the
  per-conn auth logic.
- every time the cluster setting would be updated, the result of
  parsing the setting was cached as-is, and the textual strings
  re-interpreted over and over upon every connection.

This complexity can be interpreted as a defect, for the main reason
that **the per-conn auth code should be as simple as possible** to
facilitate security audits and future maintainance. Additionally, an
argument could be made that the per-conn re-interpretation of the
result was a performance mishap.

This patch aims to simplify the per-conn auth logic by folding all the
special casing as pre-defined rules in the HBA configuration:

- the default logic (when the cluster setting is empty or invalid)
  becomes a predefined HBA configuration with just two rules:

         host all root all cert
         host all all  all cert-password

- each time the config is loaded from a cluster setting, the
  root escape is implemented by force-inserting `host all root all
  cert` at the start of the configuration.

With this in place, the auth logic can be simplified to always
and exclusively use the HBA rules.

This special casing can also be inspected in the output of
`/debug/hba_conf`.

Additionally, this patch optimizes the code by pre-normalizing upfront
when the setting is updated. Normalizing includes:

- unicode-normalizing and case-folding usernames, since the username
  upon new connection is also normalized and case-folded.
- expanding lists of multiple usernames into multiple rules, so
  that the checking code can be simplified to only check one username
  per rule.

Release note (security): The authentication code for new SQL
connections has been simplified to always use the HBA configuration
defined per `server.host_based_authentication.configuration`.

The format of this file generally follows that of `pg_hba.conf` as
defined here:
https://www.postgresql.org/docs/current/auth-pg-hba-conf.html.

Upon each configuration change, CockroachDB auto-magically inserts the
entry `host all root all cert` as a first rule, to ensure the root
user can always log in with a valid client certificate.

If the configuration is set to empty, or found to be invalid in the
cluster setting, the following default configuration is automatically
used:

         host all root all cert
         host all all  all cert-password

At any moment the current configuration on each node can be inspected
using the `/debug/hba_conf` URL on the HTTP endpoint.

The list of valid authentication methods is currently:

- `cert`, for certificate-based authentication over a SSL connection
  exclusively;
- `cert-password`, which allows either cert-based or password-based
  authentication over a SSL connection;
- `password` for password-based authentication over a SSL connection;
- `gss` for Kerberos-based authentication over a SSL connection,
  enabled when running a CCL binary and an Enterprise license.

In effect CockroachDB treats all the `host` rules as `hostssl`, and
behaves as per a default of `hostnossl all all all reject`.

It is not currently possible to define authentication rules over
non-SSL connections: as of this writing, non-SSL connections are only
possible when running with `--insecure`, and on insecure nodes all the
authentication logic is entirely disabled.

This behavior remains equivalent to previous CockroachDB versions,
and this change is only discussed here for clarity.


Co-authored-by: Raphael 'kena' Poss <[email protected]>
craig bot pushed a commit that referenced this issue Jan 9, 2020
43843: pgwire/hba: parse connection type as bit field, not string r=knz a=knz

Ahead of #31113.

This makes the in-memory data more compact and introduces proper conn
type matching code.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
@knz knz removed the E-easy Easy issue to tackle, requires little or no CockroachDB experience label Jan 9, 2020
@knz knz self-assigned this Jan 9, 2020
craig bot pushed a commit that referenced this issue Jan 9, 2020
43837: pgwire: refactor+simplify the connection set-up code  r=knz a=knz

Ahead of handling #31113.
 
The early SQL connection start-up sequence was a bit too deep in
nested conditionals.

The conditionals historically grew as special cases on top of special
cases; however taking a step back it turns out the overall flow of the
start-up is quite simple and the code should reflect that. (This makes
it both more readable and more easy to troubleshoot and extend.)

No functional change.

Co-authored-by: Raphael 'kena' Poss <[email protected]>
@craig craig bot closed this as completed in b1b3674 Jan 9, 2020
@inieves
Copy link

inieves commented Jan 9, 2020

bravo!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-server Relating to the KV-level RPC server C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-community Originated from the community
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants