doc: Add concept document for Bidirectional Data Transfer #1398
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
# Overview: Endpoint Topologies | ||
|
||
Bidirectional data transfers involve transmissions that can be sent by either the provider or consumer during the | ||
transfer's lifetime. The provider sends data over a forward channel, while the consumer uses a response channel to send | ||
data related to the forward transmission. For example, a provider sends parts data over the forward channel, while the | ||
consumer sends data related to errors in the forward transmission via the response channel. | ||
|
||
Bidirectional data transfers should be modeled using a single Dataspace Protocol *offer* and *contract agreement*. In | ||
other words, a single offer represents the ability to send both forward and response messages, while an active contract | ||
agreement can be used to initiate the transfer. | ||
|
||
Bidirectional flows can be implemented using a variety of wire protocols, for example, HTTP or a messaging layer. | ||
However, all scenarios correspond to one of two endpoint topologies: | ||
|
||
- The consumer offers the forward channel endpoint, and the provider offers the response channel endpoint. | ||
- The provider offers both the forward and response channel endpoints. | ||
|
||
The Dataspace Protocol (DSP) defines two categories of data transfer: *push* and *pull*. The endpoint topologies | ||
correlate to these categories as follows: | ||
|
||
| Provider Push | Consumer Pull | | ||
|---------------------------------------------------------------------------------------------|------------------------------------------------------------| | ||
| Consumer offers the forward channel endpoint; provider offers the response channel endpoint | Provider offers the forward and response channel endpoints | | ||
|
||
**In both cases, the provider offers the response channel.** | ||
|
||
## The Data Plane | ||
|
||
The Data Plane establishes data transfer communication channels and endpoints using a *wire protocol*. There are many | ||
ways to do this, two of which are described below. | ||
|
||
**HTTP Endpoints** | ||
|
||
The forward and response channels are separate endpoints. The endpoints may be static, where all messages in a | ||
particular direction are sent to the same endpoint, which then uses a correlation mechanism to process them, for | ||
example, `https://test.com/forwardChannel` and `https//test.com/responseChannel`. Or, the endpoints may be dynamic, | ||
where a path part contains a correlation ID, for example, `https://test.com/transferId/forwardChannel` | ||
and `https://test.com/transferId/responseChannel`. | ||
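
The difference between the static and dynamic variants can be sketched as follows. This is a minimal illustration only: `ChannelEndpoints`, its `Channel` enum, and both methods are hypothetical names that are not part of EDC, and the base URL is taken from the examples above.

```java
import java.net.URI;

/** Hypothetical helper for illustration; not part of EDC. */
final class ChannelEndpoints {

    enum Channel {
        FORWARD("forwardChannel"),
        RESPONSE("responseChannel");

        final String pathSegment;

        Channel(String pathSegment) {
            this.pathSegment = pathSegment;
        }
    }

    private ChannelEndpoints() {
    }

    /** Static variant: one shared endpoint per direction; correlation must travel with the message. */
    static URI staticEndpoint(URI base, Channel channel) {
        return URI.create(stripTrailingSlash(base) + "/" + channel.pathSegment);
    }

    /** Dynamic variant: the transfer ID is a path part and serves as the correlation ID. */
    static URI dynamicEndpoint(URI base, String transferId, Channel channel) {
        return URI.create(stripTrailingSlash(base) + "/" + transferId + "/" + channel.pathSegment);
    }

    private static String stripTrailingSlash(URI base) {
        String value = base.toString();
        return value.endsWith("/") ? value.substring(0, value.length() - 1) : value;
    }
}
```

With the static variant, the receiving endpoint needs a separate correlation mechanism (for example, a header or message field); with the dynamic variant, the path itself identifies the transfer.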
|
||
**Queues and Pub/Sub** | ||
|
||
In this scenario, the forward channel is a *queue* or a pub/sub *topic* while the response channel is a *queue*. This is | ||
a typical architecture used when designing systems with message-oriented middleware. | ||
|
||
### Required Changes to the Data Plane Framework | ||
|
||
The required changes to the Data Plane Framework to support bidirectional data transfers are minimal. | ||
|
||
#### Response Endpoint `DataAddress` | ||
|
||
The `DataAddress` in the `DataFlowResponseMessage` must contain a `https://w3id.org/edc/v0.0.1/ns/responseChannel` | ||
property of type `DataAddress`. This `DataAddress` follows the same format as the outer `DataAddress` and represents the | ||
response channel endpoint. For example, it may contain authorization data the consumer uses to access the response | ||
channel endpoint. | ||
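
As a rough sketch of the nesting described above, the following models a `DataAddress` as a plain map. This is an assumption made for illustration: it does not use the EDC `DataAddress` class, and the `endpoint` and `authorization` keys are placeholder property names. Only the `responseChannel` property name comes from this document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: models a DataAddress as a plain map instead of the EDC type. */
final class ResponseChannelExample {

    static final String RESPONSE_CHANNEL = "https://w3id.org/edc/v0.0.1/ns/responseChannel";

    private ResponseChannelExample() {
    }

    /**
     * Returns a copy of the outer (forward) address with the response channel
     * address nested under the responseChannel property.
     */
    static Map<String, Object> withResponseChannel(Map<String, Object> forward, Map<String, Object> response) {
        Map<String, Object> result = new LinkedHashMap<>(forward);
        result.put(RESPONSE_CHANNEL, response);
        return result;
    }
}
```

The nested address has the same shape as the outer one, so a consumer could process it with the same code it uses for the forward channel address.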
|
||
#### The DataPlaneManager | ||
|
||
The `DataPlaneManagerImpl` and its collaborators will need to be refactored to generate response | ||
channel `DataAddresses`: | ||
|
||
- `DataPlaneManagerImpl` must be modified to return an EDR in the case of a provider PUSH. This EDR will only contain | ||
a `https://w3id.org/edc/v0.0.1/ns/responseChannel` entry. The manager will delegate to `DataPlaneAuthorizationService` | ||
to generate the response. | ||
- `DataPlaneAuthorizationServiceImpl` must be enhanced to support `responseChannel` generation. This should be keyed off | ||
the transfer type. As part of this process, `DataPlaneAuthorizationServiceImpl.createEndpointDataReference` must | ||
generate a `responseChannel` endpoint by delegating to a new | ||
method, `PublicEndpointGeneratorService.generateCallbackFor(sourceDataAddress)`. Access tokens can be generated | ||
from `DataPlaneAccessTokenService`. | ||
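
One possible reading of the two points above is sketched below. It is not the EDC implementation: `EdrSketch`, `FlowType`, and the map-based address model are hypothetical stand-ins, the two generator functions merely play the role of the existing forward endpoint generation and the proposed `generateCallbackFor` method, and the sketch assumes the transfer type has already been identified as bidirectional.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch; the real services and their signatures may differ. */
final class EdrSketch {

    static final String RESPONSE_CHANNEL = "https://w3id.org/edc/v0.0.1/ns/responseChannel";

    enum FlowType { PUSH, PULL }

    private EdrSketch() {
    }

    /**
     * Builds an EDR-like map for a bidirectional transfer.
     * PULL: forward endpoint properties plus a nested responseChannel entry.
     * PUSH: only the responseChannel entry, since the consumer offers the forward endpoint.
     */
    static Map<String, Object> createEndpointDataReference(
            FlowType flowType,
            String sourceDataAddress,
            Function<String, Map<String, Object>> generateFor,
            Function<String, Map<String, Object>> generateCallbackFor) {
        Map<String, Object> edr = new LinkedHashMap<>();
        if (flowType == FlowType.PULL) {
            edr.putAll(generateFor.apply(sourceDataAddress));
        }
        edr.put(RESPONSE_CHANNEL, generateCallbackFor.apply(sourceDataAddress));
        return edr;
    }
}
```

In both branches the provider supplies the response channel, which matches the topology table earlier in the document.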
|
||
#### Technical Considerations | ||
|
||
|
||
The above changes can work with both DSP pull and push scenarios. However, it is important to note a potential race | ||
condition that could be introduced in PUSH transfers. Namely, provider-pushed data could arrive before the consumer | ||
receives the DSP start message containing the response channel `DataAddress`. This is due to the nature of | ||
asynchronous communications. In this case, the consumer would either need to skip sending a response or store the | ||
response messages and send them once it receives the response channel `DataAddress`. | ||
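
The second option, storing responses until the response channel `DataAddress` arrives, could look roughly like the following consumer-side sketch. `BufferingResponseSender` and its methods are hypothetical names, and the `transport` callback stands in for whatever wire protocol delivers the response messages.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.BiConsumer;

/** Hypothetical consumer-side helper; not part of EDC. */
final class BufferingResponseSender {

    // Called with (responseEndpoint, message) to deliver a response message.
    private final BiConsumer<String, String> transport;
    private final Queue<String> pending = new ArrayDeque<>();
    // Stays null until the DSP start message with the response channel address arrives.
    private String responseEndpoint;

    BufferingResponseSender(BiConsumer<String, String> transport) {
        this.transport = transport;
    }

    /** Sends immediately if the response channel is known; otherwise buffers the message. */
    synchronized void send(String message) {
        if (responseEndpoint == null) {
            pending.add(message);
        } else {
            transport.accept(responseEndpoint, message);
        }
    }

    /** Records the response channel endpoint and flushes buffered messages in arrival order. */
    synchronized void onResponseChannelReceived(String endpoint) {
        responseEndpoint = endpoint;
        String message;
        while ((message = pending.poll()) != null) {
            transport.accept(endpoint, message);
        }
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

The buffer in this sketch is unbounded and in-memory; a real consumer would likely need a size limit or persistence, which is outside the scope of this concept.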
|
||
Do I understand correctly that a data transfer with a response channel would require a new transfer type, i.e., it doubles the number of transfer types?

As far as I understood the original requirement, the channel is about giving feedback on the received data, e.g., to indicate that the data quality is poor. Is this really related to the data transfer? What is actually observed is a mismatch on the consumer side between the expectations based on the offer and the concrete data received. Wouldn't that be a concept on the DSP level, as the feedback is about contract fulfillment? If the data is broken or incomplete, the consumer could simply reinitiate the data transfer, so that is not really a reason to use the response channel, right? So it is really about a higher-level concept on the received data, imho. On the other hand, there could be many data transfers on the same contract, so if one transfer leads to poor quality, there is reason not to mark the whole contract with the feedback issue.

Still, the concept only describes a form of sending data back to the provider, but the intention of the requirement was to give feedback on the received data. In my opinion, this still requires a reaction to the data on the provider side, something like a label on the data transfer or a special state. Even if the message is not formalized at all, an indicator that there is feedback on the data transfer should be part of the concept. In the current state, the relation between the sent data and the metadata on the feedback channel gets lost after it is received.

No, the concept is about how to represent a bidirectional data transfer. It does not involve qualities of service such as reliability, which involve retransmission (for example, all reliable messaging protocols require idempotency). Qualities of service are implemented by the underlying wire protocol used for the forward and response channels, for example, AMQP. The response channel would never be used to send quality of service information back to the provider. Rather, one use could be to send information about errors in the data sent via the forward channel.

The scope of this concept should be only to describe how forward and back channels are established between a consumer and producer. It should not discuss what purposes clients and producers use those channels for. That is the job of the particular transfer protocol that would use this feature.
||
The response channel lifetime is tied to the forward channel. For example, when the forward channel is closed, the | ||
response channel will also be closed. | ||
|
||
## Catena-X Standardization and Tractus-X Support | ||
|
||
To achieve interoperability, Catena-X would need to standardize a bidirectional transfer type similar to its support of | ||
HTTP push/pull and S3 types. This could then be implemented in Tractus-X EDC. | ||
|
What exactly is `sourceDataAddress` in this case? I struggle with the synchronicity between the push and pull scenarios here.

To understand this requires specialized EDC knowledge. The `sourceDataAddress` is the reference to the backend asset being transferred. This address is internal to the EDC deployment and not available externally (e.g., to a consumer). The endpoint generation service is responsible for interpreting the address and mapping it to a publicly available endpoint that is associated with retrieving the data.