-
Notifications
You must be signed in to change notification settings - Fork 389
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prov/tcp: introduce TCP_NO_CONNECT flag #10534
base: v1.22.x
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -98,6 +98,9 @@ typedef void xnet_profile_t; | |
#define XNET_MIN_MULTI_RECV 16384 | ||
#define XNET_PORT_MAX_RANGE (USHRT_MAX) | ||
|
||
/* provider specific op flags */ | ||
#define TCP_NO_CONNECT FI_TCP_NO_CONNECT | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Any reason not to use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The provider specific flags are typically defined within the provider header files without any FI prefixing and used internally. I wanted to have a definition here as a placeholder so that it's obvious there are existing provider op flags that need to be accounted for. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @ooststep This is true when the API flag differs from the provider flag, or the scope of the flags are different. E.g. the API flags cover a 64-bit range, but the provider flag maps to some 16-bit range in the protocol. |
||
|
||
extern struct fi_provider xnet_prov; | ||
extern struct util_prov xnet_util_prov; | ||
void xnet_init_infos(void); | ||
|
@@ -304,8 +307,8 @@ struct xnet_rdm { | |
|
||
int xnet_rdm_ep(struct fid_domain *domain, struct fi_info *info, | ||
struct fid_ep **ep_fid, void *context); | ||
ssize_t xnet_get_conn(struct xnet_rdm *rdm, fi_addr_t dest_addr, | ||
struct xnet_conn **conn); | ||
ssize_t xnet_get_conn(struct xnet_rdm *rdm, fi_addr_t addr, | ||
struct xnet_conn **conn, uint64_t flags); | ||
struct xnet_ep *xnet_get_rx_ep(struct xnet_rdm *rdm, fi_addr_t addr); | ||
void xnet_freeall_conns(struct xnet_rdm *rdm); | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not make this a flag that's checked on every operation (i.e. it's okay to connect if it's a send, but not a write?). It makes more sense either applied to the entire rdm endpoint or to a specific peer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the target problematic scenario, the rdm is used for some peers where this flag is needed and some peers where it would not be, so applying to the entire rdm was undesirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't make sense. An RDM endpoint is unconnected. Exposing low-level connection implementation details is not desirable. There could be multiple connections to the same peer. The connection might be in another process (e.g. Pony Express or SNAP or whatever it's called). It could be in the kernel (RDS).
This still isn't a per operation flag. At best it's per peer, but even that use case seems questionable. It's like putting half of an RDM endpoint behind a firewall, but the other half ignores it. Apply firewall semantics to the entire RDM endpoint. If some peers are outside the firewall, but some are inside, require 2 endpoints with some sort of per EP configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I also agree with the first part (ie. exposing low-level connection is a bad idea) though providing 2 types of endpoints is also difficult for the user. To provide some context as to what problem we were trying to solve, in a client-server configuration where the client is behind a firewall, we have cases where for example client A would send a message to server A, and server A would then attempt to do an emulated tcp RMA to client B through an fi_write() call (without prior connection established from client B to server A). In that case, it seems that with the tcp provider server A remains stuck attempting to establish a connection and doing an fi_cancel on the RMA does not seem to be able to complete, as it's not supported currently by the tcp provider. So what we wanted was a way of having some completion and error being returned when server A is not able to reach client B. I'm opened to other means but having to manage 2 endpoints seems also cumbersome, I would also be happy though if we don't have to expose any connection logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shefty Just to add a bit of context beyond what Jerome said, this use case came from Parallelstore (Google's DAOS service). We control only the server side of the equation and rely on the user to do client configuration (as we don't control their VMs, network configuration, and processes). Opening the firewall to server to client connections is not common for services and requires them to do it explicitly or their writes will simply hang. We want to remove this requirement as it is becoming a major scale issue for onboarding new customers because users don't read documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the alternative, I would only configure the EPs at the server, as that's where the actual problem occurs. The client behavior is unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem is really on the client. The client endpoint is the one behind the firewall and can't be reached (unless the server has already connected to it). Can we encode something into the client side URI that would indicate it can't be reached? The advantage to that approach from our end is that then the client could have complete control over telling the server whether or not it can handle the error (e.g. can support getting back an error indicating that the server couldn't connect and handle it appropriately)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the server's behavior that should change.
From the viewpoint of the client, everything works. It wants to talk to server A and can do so. That server A wants to pass off the response to some other system that the client doesn't know about is related to the storage architecture, not the client SW.
I don't think pushing this detail into the apps is the best option. But you can work-around this in the client by having the client send some sort of 'hello' message to every storage server during initialization -- to poke holes through the firewall. That pushes the burden onto every client app that might want to use DAOS.
Alternatively, you can configure the server SW to be firewall aware, so that it avoids forwarding requests to servers not already communicating with the client.
Or, change the protocol around handling firewalls. Have server A tell the client to retry its request with server B, rather than forwarding it internally.
There are likely other options for this. But I would avoid picking one which encoded these details in the SW API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be better if this thread were copied into an issue for continued discussion, rather than attaching it to this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#10637