Handling TCP traffic restricted by a firewall #10637
Comments
Just to follow up on the various options you posed. I'm not sure how you avoid exposing this to the client (and by client, I mean the DAOS storage stack client, not the application). First of all, our protocol has the client send the write to a server target that it has algorithmically identified as the "leader" based on what it is writing. The client can't send the same request to another server, as that would break this expectation.

The client could potentially contact all server targets to initiate connections, but this is unreliable. The server could restart, for example, without the client knowing it happened, leaving the connection in a broken state. Older clients will not be able to handle any new error (and if they are working, there is no reason to do anything, because the firewall will have already been correctly configured). A new client, however, that receives some error ("can't connect" or some such) can ping the relevant server targets before retrying.

As for the server not forwarding requests to other servers not already connected to the client: how would the server know whether the client has already talked to the other servers? Also, the restriction is potentially per client. One could envision a scenario where one set of VMs has the firewall set up to allow connections and another set does not. It just seems simpler to me if the client can say, "the server can/can't connect to me".
Another key point, and I'm not sure I've made this clear: we control the server only. We always release new clients, but the older ones the user already has are expected to just work, so we can't introduce new errors on the server side. A user might, at any time, create a new instance and will expect their existing clients to just work.
I don't know the application; I'm only brainstorming ideas. As far as I can tell, the most efficient solution requires handling this at the application level. By the time libfabric can fail a request, a bunch of work has already been done. This is the problem description as I currently understand it:
I was assuming existing client SW would need to just work, but server SW could change. Unless the client is already communicating a flag which means, "Hey, server that I'm talking to: if you forward this to another server for processing, which will directly send me the response, know that the other server must already have open communication with me for this to work," then I don't see how it works. In any case, such a flag is something encoded into the protocol above libfabric. Server A doesn't need to know if Server B is communicating with Client 1. A just needs to tell B to check and, if not, A and/or B do something else.

Basically, the only thing libfabric can do is fail some operation with EUNREACHABLE or some similar error. It's up to the server SW to figure out how to handle the client's request. And it seems that Server B can determine whether it knows about Client 1 prior to posting a data transfer. If the client's address is already in its AV, it should be good to go. If the client address is not, then B should only insert it if it is dealing with a message received directly from the client. If you have a magic flag from the client which is unset (meaning it's okay to connect), then B ignores these checks and just sends.

Ideally, any solution must work with other endpoint implementations, such as UDP/IP. If the provider has a firewall config file, it can be checked when addresses are inserted into the AV. This could result in AV insert failing. Another option is to pass the 'no-connect' flag with AV insert. That too could result in insert failing if a connection didn't already exist. In either case, this is a per peer address setting, not a per data transfer setting.
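The per-peer option described above (a 'no-connect' flag supplied at AV insert, with the insert failing when no connection already exists) can be sketched as a small self-contained C model. Every name here (`model_av_insert`, `MODEL_NO_CONNECT`, and so on) is invented for illustration; this is not the libfabric API:

```c
/* Hypothetical model of a per-peer "no-connect" flag applied at
 * AV-insert time.  Not libfabric code; all names are invented. */
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MODEL_NO_CONNECT (1u << 0)  /* we may not dial out to this peer */
#define MODEL_MAX_PEERS  16

struct model_av {
    const char *addr[MODEL_MAX_PEERS];
    int         connected[MODEL_MAX_PEERS]; /* 1 if a connection exists */
    unsigned    flags[MODEL_MAX_PEERS];
    int         count;
};

/* Linear lookup of a peer address; returns index or -1. */
static int model_find(const struct model_av *av, const char *addr)
{
    for (int i = 0; i < av->count; i++)
        if (strcmp(av->addr[i], addr) == 0)
            return i;
    return -1;
}

/* Record an established connection (e.g. the peer contacted us first),
 * inserting the address if it is unknown. */
int model_conn_established(struct model_av *av, const char *addr)
{
    int i = model_find(av, addr);
    if (i < 0) {
        if (av->count == MODEL_MAX_PEERS)
            return -ENOMEM;
        i = av->count++;
        av->addr[i] = addr;
        av->flags[i] = 0;
    }
    av->connected[i] = 1;
    return 0;
}

/* AV insert with an optional no-connect flag.  With the flag set, the
 * insert itself fails unless a connection already exists, mirroring the
 * "insert failing if a connection didn't already exist" option. */
int model_av_insert(struct model_av *av, const char *addr, unsigned flags)
{
    int i = model_find(av, addr);
    if (i < 0) {
        if (flags & MODEL_NO_CONNECT)
            return -ENOTCONN;   /* no established connection: refuse */
        if (av->count == MODEL_MAX_PEERS)
            return -ENOMEM;
        i = av->count++;
        av->addr[i] = addr;
        av->connected[i] = 0;
    }
    av->flags[i] = flags;
    return 0;
}

/* A transfer is allowed if the peer is inserted and either a connection
 * exists or dialing out is permitted. */
int model_can_send(const struct model_av *av, const char *addr)
{
    int i = model_find(av, addr);
    if (i < 0)
        return 0;
    return av->connected[i] || !(av->flags[i] & MODEL_NO_CONNECT);
}
```

In this model the check happens once, at address insertion, which matches the "per peer address setting, not a per data transfer setting" framing.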
Right, existing clients should just work. To be absolutely clear, my application is just DAOS. The two-phase commit protocol in DAOS will send an update request to the transaction leader for the redundancy group (e.g. the parity target for an erasure-coded object), and that server target forwards the request to the other targets in the group. Each target will do a bulk_get to transfer the data from the original client.

A per peer address setting would be fine, I think. It seems reasonable that a client endpoint will know at startup whether or not it can accept server-initiated connections. In fact, it would be preferable if we could somehow configure things so that the client endpoint is "unreachable" from a given set of endpoints (DAOS server targets in this case) at client startup. What I initially proposed here was to have the server fail any operation that requires it to connect to such a client, in which case DAOS will send a ping RPC to the non-leader targets before retrying the update.
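The retry scheme proposed here, fail the server-side operation and have the client ping the other targets before retrying, might look like the following self-contained sketch. All functions, types, and error codes are hypothetical stand-ins, not DAOS or libfabric code:

```c
/* Model of the client-side "ping non-leader targets, then retry" flow.
 * The "transport" rule is that a target can reach the client only after
 * the client has pinged it (poking a hole through the firewall). */
#include <assert.h>
#include <stddef.h>

#define MODEL_OK            0
#define MODEL_EUNREACHABLE -1  /* a follower could not connect back */

struct model_target { int pinged; };

/* Leader-side stand-in: the update fails if any target in the
 * redundancy group has no path back to the client. */
static int model_send_update(const struct model_target *grp, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!grp[i].pinged)
            return MODEL_EUNREACHABLE;
    return MODEL_OK;
}

/* Client pings a target, opening the connection from its side. */
static void model_ping(struct model_target *t) { t->pinged = 1; }

/* Client-side update: try once; on EUNREACHABLE, ping every target in
 * the redundancy group to open connections, then retry. */
int model_update(struct model_target *grp, size_t n)
{
    int rc = model_send_update(grp, n);
    if (rc == MODEL_EUNREACHABLE) {
        for (size_t i = 0; i < n; i++)
            model_ping(&grp[i]);
        rc = model_send_update(grp, n);
    }
    return rc;
}
```

Note that only new clients run the retry path; an old client never sees the error because, as stated above, its firewall is presumed already configured.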
Comments copied from #10534:
There are some specific use cases where we may not want one side of communication to initiate connections, namely when we know that one side of our configuration is being heavily restricted by a firewall. To prevent indefinite hangs with certain operations, such as RMA reads and writes, introduce a provider specific flag to trigger an error if there is not already an established connection. In this case, the application can force the connection from the other direction.
shefty reviewed 2 weeks ago
man/fi_tcp.7.md
FI_TCP_NO_CONNECT
: This flag indicates that operations should fail if there is no
existing connection to the remote peer. In such a case, an FI_ENOTCONN
error should be expected.
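As a rough illustration of the semantics this text proposes, the following self-contained C model fails a posted write with an `ENOTCONN`-style error when the flag is set and no connection exists. The types and names are invented; real code would post the operation (e.g. via fi_writemsg) and observe the error through the completion queue:

```c
/* Minimal model of a per-operation "no-connect" flag.  Invented names;
 * not the tcp provider's implementation. */
#include <assert.h>
#include <errno.h>

struct model_ep { int connected_to_peer; };

/* Post a write; with no_connect set, fail fast instead of dialing. */
int model_post_write(struct model_ep *ep, int no_connect)
{
    if (!ep->connected_to_peer) {
        if (no_connect)
            return -ENOTCONN;      /* the FI_ENOTCONN case above */
        ep->connected_to_peer = 1; /* implicit connect-on-demand */
    }
    return 0;
}
```

The review comments that follow argue against exactly this per-operation granularity, preferring the restriction per peer or per endpoint.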
@shefty shefty 2 weeks ago
I would not make this a flag that's checked on every operation (i.e. it's okay to connect if it's a send, but not a write?). It makes more sense either applied to the entire rdm endpoint or to a specific peer.
@ooststep ooststep 2 days ago
In the problematic scenario being targeted, the rdm endpoint is used both for peers where this flag is needed and for peers where it is not, so applying it to the entire rdm endpoint was undesirable.
@shefty shefty 2 days ago
This doesn't make sense. An RDM endpoint is unconnected. Exposing low-level connection implementation details is not desirable. There could be multiple connections to the same peer. The connection might be in another process (e.g. Pony Express or SNAP or whatever it's called). It could be in the kernel (RDS).
This still isn't a per operation flag. At best it's per peer, but even that use case seems questionable. It's like putting half of an RDM endpoint behind a firewall, but the other half ignores it. Apply firewall semantics to the entire RDM endpoint. If some peers are outside the firewall, but some are inside, require 2 endpoints with some sort of per EP configuration.
@soumagne soumagne 2 days ago
I think I also agree with the first part (i.e. exposing low-level connections is a bad idea), though providing two types of endpoints is also difficult for the user. To provide some context on the problem we were trying to solve: in a client-server configuration where the client is behind a firewall, we have cases where, for example, client A sends a message to server A, and server A then attempts an emulated tcp RMA to client B through an fi_write() call (without a prior connection established from client B to server A). In that case, it seems that with the tcp provider, server A remains stuck attempting to establish a connection, and doing an fi_cancel on the RMA does not seem to complete, as it's not currently supported by the tcp provider. So what we wanted was a way of having some completion and error returned when server A is not able to reach client B. I'm open to other means, but having to manage two endpoints also seems cumbersome; I would be happy if we don't have to expose any connection logic.
@jolivier23 jolivier23 2 days ago
@shefty Just to add a bit of context beyond what Jerome said: this use case came from Parallelstore (Google's DAOS service). We control only the server side of the equation and rely on the user to do client configuration (as we don't control their VMs, network configuration, or processes). Opening the firewall to server-to-client connections is not common for services and requires users to do it explicitly, or their writes will simply hang. We want to remove this requirement, as it is becoming a major scaling issue for onboarding new customers, because users don't read documentation.
@shefty shefty 2 days ago
Client A is telling a server to issue an RMA write to client B? A knows the memory information on client B, and B is expecting this transfer? Is this the model used by the app? Because if so, weird.
In any case, the entire RDM ep is behind a firewall, but connections can only go one direction? If this is the case, then I think some generic EP setting might be able to convey this, such that the implementation can adjust. The server could probably just fail the operation (asynchronously makes the most sense) in this situation.
I assume A is somehow notified what happened between the server and B?
@soumagne soumagne 2 days ago
Oh yes, sorry, I was trying to keep it simple, and in the end that didn't make much sense. The real model is: client A tells server A, which tells server B about its memory information; server B then issues an RMA to client A.
@soumagne soumagne 2 days ago
> In any case, the entire RDM ep is behind a firewall, but connections can only go one direction? If this is the case, then I think some generic EP setting might be able to convey this, such that the implementation can adjust. The server could probably just fail the operation (asynchronously makes the most sense) in this situation.
Yeah, I think adding a setting to the server EP would probably be fine in this case, although we'd still want servers to connect to each other using the same endpoint and not use a separate endpoint for intra-server communication.
@shefty shefty 2 days ago
I think we need to figure out the model for how firewalls should be handled, ignoring the implementation. Maybe this is related to NAT as well.
An EP is assigned a single address. Can traffic to that address go through the firewall? Can traffic from that address go through the firewall?
Assume the implementation is using UDP (or some other connectionless transport). Does that change anything about how the firewall must be configured?
If I'm understanding this now, what's being requested from the API is: Traffic is allowed to reach an EP (through a firewall), but traffic from that EP cannot reach the target ... unless it's already received some message from the target. This assumes the EP is responsible for remembering which target addresses it has received data from, and that those addresses still reference the correct target process.
From the viewpoint of the API, this still seems weird. I think the weirdness comes from the firewall acting on the lower-level connections, which the EP hides.
This is some per peer setting. But if the application is responsible for providing the value of this setting, then it has an idea how to manage this. In this example, server B could see if it knows A. If not, fail sending to it.
One alternative is to configure the EP with firewall data only, and let the provider determine what to do with it. In this case, a request to send to X checks the firewall config to see if it's possible. Maybe X is another server within the firewall. If so, connect and send. If X is outside the firewall, fail the send. I don't know what the firewall data would look like but providing it via some configuration file seems reasonable. I.e. don't make this an application problem.
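That firewall-configuration alternative could be modeled roughly as follows: the provider loads a set of "inside the firewall" address prefixes and consults it whenever a send would require a new outbound connection. The config format, names, and error code here are all invented for illustration:

```c
/* Model of provider-side firewall awareness: peers matching an
 * "inside" prefix may be dialed; others fail unless already connected.
 * Invented names; not the tcp provider's implementation. */
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MODEL_MAX_RULES 8

struct model_fw {
    const char *inside[MODEL_MAX_RULES]; /* address prefixes we may dial */
    int count;
};

/* Add one rule, e.g. a line parsed from a provider config file. */
int model_fw_add(struct model_fw *fw, const char *prefix)
{
    if (fw->count == MODEL_MAX_RULES)
        return -ENOMEM;
    fw->inside[fw->count++] = prefix;
    return 0;
}

/* 1 if 'addr' matches an inside prefix, i.e. connecting out is allowed. */
int model_fw_may_connect(const struct model_fw *fw, const char *addr)
{
    for (int i = 0; i < fw->count; i++)
        if (strncmp(addr, fw->inside[i], strlen(fw->inside[i])) == 0)
            return 1;
    return 0;
}

/* Send-time check: peers inside the firewall get a fresh connection;
 * peers outside fail unless a connection already exists. */
int model_fw_send(const struct model_fw *fw, const char *addr, int connected)
{
    if (connected || model_fw_may_connect(fw, addr))
        return 0;
    return -EHOSTUNREACH;
}
```

As the comment above suggests, the same check could equally run at AV-insert time instead of send time, keeping it a per-peer rather than per-operation decision.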
@jolivier23 jolivier23 2 days ago
That alternative is a bit tricky. The client may be older than the server, and we want old clients to continue behaving as they do now (e.g. if the user has already configured their firewall correctly and the client doesn't support getting back an error and retrying, we want it to just continue working).
@shefty shefty yesterday
For the alternative, I would only configure the EPs at the server, as that's where the actual problem occurs. The client behavior is unchanged.
@jolivier23 jolivier23 11 hours ago
I think the problem is really on the client. The client endpoint is the one behind the firewall and can't be reached (unless the server has already connected to it). Can we encode something into the client-side URI that would indicate it can't be reached? The advantage of that approach, from our end, is that the client could then have complete control over telling the server whether or not it can handle the error (e.g. whether it supports getting back an error indicating that the server couldn't connect and handling it appropriately).
@shefty shefty 1 hour ago
It's the server's behavior that should change.
From the viewpoint of the client, everything works. It wants to talk to server A and can do so. That server A wants to pass off the response to some other system that the client doesn't know about is related to the storage architecture, not the client SW.
I don't think pushing this detail into the apps is the best option. But you can work around this in the client by having the client send some sort of 'hello' message to every storage server during initialization -- to poke holes through the firewall. That pushes the burden onto every client app that might want to use DAOS.
Alternatively, you can configure the server SW to be firewall aware, so that it avoids forwarding requests to servers not already communicating with the client.
Or, change the protocol around handling firewalls. Have server A tell the client to retry its request with server B, rather than forwarding it internally.
There are likely other options for this. But I would avoid picking one which encoded these details in the SW API.
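The third option above, having server A tell the client to retry with server B rather than forwarding internally, can be sketched as a tiny state model (everything here is hypothetical):

```c
/* Model of a redirect-based protocol: if the follower server does not
 * already know the client, server A answers with a redirect, and the
 * client (which can always dial out through its firewall) retries
 * directly with that server.  Invented names throughout. */
#include <assert.h>

enum model_reply { MODEL_DONE, MODEL_REDIRECT };

struct model_server { int id; int knows_client; };

/* Server A's decision: forward internally only if B already talks to
 * the client; otherwise redirect so the client opens the connection. */
enum model_reply model_handle(const struct model_server *b, int *redirect_to)
{
    if (b->knows_client)
        return MODEL_DONE;
    *redirect_to = b->id;
    return MODEL_REDIRECT;
}

/* Client side: on redirect, contact the named server directly, which
 * opens the hole; returns the id retried against, or 0 if done. */
int model_client_request(struct model_server *b)
{
    int target;
    if (model_handle(b, &target) == MODEL_REDIRECT) {
        b->knows_client = 1;  /* client dialed B; hole is now open */
        return target;        /* request re-sent to server 'target' */
    }
    return 0;
}
```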
@shefty shefty 1 hour ago
It would probably be better if this thread were copied into an issue for continued discussion, rather than attaching it to this PR.