connector: use tcp socket for communicating with connectors #682

mdibaiee · 2022-09-20T11:53:10Z

Description:

use a tcp socket for communicating with connectors
I could not test UDS due to Support for sharing unix sockets docker/for-mac#483, but adding it should not be too hard if necessary
We need a way to migrate our existing deployments to this new method without disruption in service, so I may have to change this code so that it supports both methods somehow, open to suggestions on that
At the moment this pull-request means network-tunnel doesn't work anymore since I'm bypassing connector-proxy completely. Moving network-tunnel to an independent binary should also be worked before we can fully migrate.

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

(anything that might help someone review this PR)

This change is

williamhbaker · 2022-09-21T16:22:04Z

go/connector/run.go

+	if err != nil {
+		return fmt.Errorf("splitting socket address: %w", err)
+	}
+	cmdStdin.Write([]byte(fmt.Sprintf("tcp %s:%s\n", DockerHostInternal, port)))


Would it be possible to provide this as an argument to the connector rather than sending it over stdin? I'm not sure if it's needed to be used as a "ready" signal here with the section above waiting to receive a connection on the socket before doing anything with it. Using an argument instead of stdin seems simpler to me, but not an absolute must-do (especially if it doesn't work).

williamhbaker · 2022-09-21T17:55:29Z

A couple of general thoughts/questions:

I would tend to think supporting connectors running both ways is going to be necessary (Do materialization connectors use this same code path? I am assuming they do). Maybe we could have the runtime query the container somehow before deciding how to interact with it - something simple like a docker run --entrypoint … on the image to check for the existence of a sentinel file that only connectors supporting sockets have - then branching off to the appropriate method to handle the interactions.
I’m curious if you considered having the connector start the TCP socket and the runtime connect to that? That feels like it would be more intuitive to me, but I haven’t thought it through extensively and there might be all kinds of tricky problems it would create.

willdonnelly · 2022-09-21T18:04:48Z

[...] having the connector start the TCP socket and the runtime connect to that

Combined with Johnny's point last week about having a "Ready" message on connector stdout, it feels like there could be a coherent migration plan along the lines of:

Modify Flow to ignore a READY ...\n message from connector stdout.
Modify connectors to emit READY stdio\n at startup.
Add logic to Flow to wait for and process the READY message, such that READY tcp://1.2.3.4:5678\n causes it to use gRPC over the specified socket instead of stdin/stdout.
Modify connectors to open a TCP socket and emit READY tcp://[...]\n to signal Flow to connect over TCP. Potentially also support for Unix sockets?
Optionally, remove support for the gRPC-over-stdio transport after it's not being used anywhere.

jgraettinger · 2022-09-21T18:14:42Z

An issue with READY is that docker run should be in on the game: it has to be told explicitly to publish the port in question.

Another option, slightly less elegant but simpler (and more transferable to firecracker?), would be to use an image label which marks that the container supports TCP transport and identifies the specific port on which it'll be listening. The runtime looks for this label, port-forwards it in the docker run invocation and dials it with a retry backoff and reasonable timeout.

mdibaiee · 2022-09-23T10:37:35Z

I'm not sure if I yet see the supporting reason for having the connector listen on a port rather than the runtime (other than it being more intuitive, which I personally didn't have the same sense).

williamhbaker · 2022-09-23T12:54:33Z

I'm not sure if I yet see the supporting reason for having the connector listen on a port rather than the runtime (other than it being more intuitive, which I personally didn't have the same sense).

One thought: I think it would make connector development & testing simpler. It would be nice to be able to start up a connector container and interact with it via its TCP socket, as opposed to needing to start a TCP server prior to starting the container, and then interacting with the connector through that TCP server.

jgraettinger · 2022-09-23T14:06:32Z

Another reason is security model: we don't want to give untrusted code a capability to dial out to trusted code. The trusted code should dial into the untrusted sandbox.

mdibaiee · 2022-09-26T13:33:40Z

Okay, now:

If a docker container includes the label FLOW_TCP_PORT, then communication is done through that port, with runtime port-forwarding that port, and dialing to it. One note here is: at the moment I'm port-forwarding to the same port on the runtime side, however I'm not sure if this is okay. Will there potentially be multiple connector runs in the same environment, in which case I'll port-forward to a random port, or is it safe to port-forward as-is?
If there is no such label, we continue using stdio for communication (hence all end-to-end tests passing)

There is still no network-tunnel binary, but with this we can start migrating simple airbyte connectors, and our own connectors by updating our libraries. We can meanwhile work on the network tunnel binary, and once that's in, migrate the advanced connectors which need the tunnel.

go/connector/run.go

willdonnelly

LGTM

go/connector/run.go

jgraettinger · 2022-09-26T17:54:42Z

If a docker container includes the label FLOW_TCP_PORT, then communication is done through that port, with runtime port-forwarding that port, and dialing to it. One note here is: at the moment I'm port-forwarding to the same port on the runtime side, however I'm not sure if this is okay. Will there potentially be multiple connector runs in the same environment, in which case I'll port-forward to a random port, or is it safe to port-forward as-is?

🤔 there would be multiple connectors running, so we'll have to account for this.

Is it possible for us to not port-forward, but instead to have prior knowledge of the container name (with DNS resolution provided by docker) or of the container IP, which can be directly dialed at the advertised port?

mdibaiee · 2022-09-26T18:03:21Z

🤔 there would be multiple connectors running, so we'll have to account for this.

Is it possible for us to not port-forward, but instead to have prior knowledge of the container name (with DNS resolution provided by docker) or of the container IP, which can be directly dialed at the advertised port?

I will look into this to see if it's possible 🤔

go/connector/run.go

mdibaiee · 2022-09-27T10:47:49Z

Thank you all for your comments,

I checked, it's not possible to connect to the container using its IP, the network is shared between containers, but not with the host (unless you do --network=host which is another story): https://stackoverflow.com/questions/38872328/connecting-to-docker-container-using-container-ip
So now, I request a random port to be assigned for port-forwarding, and then find that port after the container is run, this introduces another layer of retry mechanism where we wait for the container to run and tell us about its ContainerID, so we can ask for its port.
I address some of the other comments in cleaning up the code and addressing some edge cases, thanks for those 👍🏽

mdibaiee force-pushed the mahdi/connector-socket branch from 67e17a3 to 18ea9fe Compare September 21, 2022 10:32

connector: use tcp socket for communicating with connectors

808a659

mdibaiee force-pushed the mahdi/connector-socket branch from 18ea9fe to 808a659 Compare September 21, 2022 11:06

mdibaiee changed the title ~~connector: use uds socket for communicating with connectors~~ connector: use tcp socket for communicating with connectors Sep 21, 2022

mdibaiee marked this pull request as ready for review September 21, 2022 12:51

mdibaiee requested review from jgraettinger, psFried, willdonnelly and williamhbaker and removed request for jgraettinger September 21, 2022 13:35

williamhbaker reviewed Sep 21, 2022

View reviewed changes

connector: support both tcp and stdio

0a18b43

mdibaiee force-pushed the mahdi/connector-socket branch from a3e771d to 0a18b43 Compare September 26, 2022 12:31

williamhbaker reviewed Sep 26, 2022

View reviewed changes

go/connector/run.go Outdated Show resolved Hide resolved

willdonnelly approved these changes Sep 26, 2022

View reviewed changes

go/connector/run.go Outdated Show resolved Hide resolved

go/connector/run.go Outdated Show resolved Hide resolved

go/connector/run.go Outdated Show resolved Hide resolved

go/connector/run.go Outdated Show resolved Hide resolved

williamhbaker approved these changes Sep 26, 2022

View reviewed changes

go/connector/run.go Show resolved Hide resolved

connector: port-forward to random port on host and inspect it after

bc0fe0f

connector/run: add --interactive for stdio connectors

d862dd2

mdibaiee force-pushed the mahdi/connector-socket branch from 1a2c77f to d862dd2 Compare September 27, 2022 14:15

This was referenced Sep 27, 2022

connector-proxy: make stdin/stdout streams independent to avoid deadlocks #628

Closed

move connector-proxy to airbyte-to-flow, use socket instead of stdio #685

Closed

mdibaiee merged commit 4f52237 into master Sep 28, 2022

mdibaiee deleted the mahdi/connector-socket branch September 28, 2022 07:16

oliviamiannone added the docs complete / NA No (more) doc work related to this PR label Oct 11, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

connector: use tcp socket for communicating with connectors #682

connector: use tcp socket for communicating with connectors #682

mdibaiee commented Sep 20, 2022 •

edited

Loading

williamhbaker Sep 21, 2022

williamhbaker commented Sep 21, 2022

willdonnelly commented Sep 21, 2022 •

edited

Loading

jgraettinger commented Sep 21, 2022

mdibaiee commented Sep 23, 2022

williamhbaker commented Sep 23, 2022

jgraettinger commented Sep 23, 2022

mdibaiee commented Sep 26, 2022

willdonnelly left a comment

jgraettinger commented Sep 26, 2022

mdibaiee commented Sep 26, 2022

mdibaiee commented Sep 27, 2022

connector: use tcp socket for communicating with connectors #682

connector: use tcp socket for communicating with connectors #682

Conversation

mdibaiee commented Sep 20, 2022 • edited Loading

williamhbaker Sep 21, 2022

Choose a reason for hiding this comment

williamhbaker commented Sep 21, 2022

willdonnelly commented Sep 21, 2022 • edited Loading

jgraettinger commented Sep 21, 2022

mdibaiee commented Sep 23, 2022

williamhbaker commented Sep 23, 2022

jgraettinger commented Sep 23, 2022

mdibaiee commented Sep 26, 2022

willdonnelly left a comment

Choose a reason for hiding this comment

jgraettinger commented Sep 26, 2022

mdibaiee commented Sep 26, 2022

mdibaiee commented Sep 27, 2022

mdibaiee commented Sep 20, 2022 •

edited

Loading

willdonnelly commented Sep 21, 2022 •

edited

Loading