docs: improved filter specification #562
Changes from 8 commits
9dca1ca
d2d80b6
4f43a25
29ae394
cd3e880
7c47eb0
12793c5
0f02d70
7f5cc79
d24b589
2c74eed
cb35b95
@@ -16,7 +16,9 @@ contributors: | |||||
|
||||||
# Content filtering | ||||||
|
||||||
**Protocol identifier***: `/vac/waku/filter/2.0.0-beta1` | ||||||
**Protocol identifiers**: | ||||||
- _filter-subscribe_: `/vac/waku/filter-subscribe/2.0.0-beta1` | ||||||
- _filter-push_: `/vac/waku/filter-push/2.0.0-beta1` | ||||||
|
||||||
Content filtering is a way to do [message-based | ||||||
filtering](https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern#Message_filtering). | ||||||
|
@@ -61,77 +63,161 @@ The following are not considered as part of the adversarial model: | |||||
## Protobuf | ||||||
I prefer the term "Wire format". It is more generic.
I agree! Note that this PR specifically avoids format/heading/layout changes so that only the suggested wire format change can be reviewed. Once done, next step would be to adhere to the template and add theory, implementation, etc. sections.
||||||
|
||||||
```protobuf | ||||||
message FilterRequest { | ||||||
bool subscribe = 1; | ||||||
string topic = 2; | ||||||
repeated ContentFilter contentFilters = 3; | ||||||
|
||||||
message ContentFilter { | ||||||
string contentTopic = 1; | ||||||
syntax = "proto3"; | ||||||
|
||||||
// 12/WAKU2-FILTER rfc: https://rfc.vac.dev/spec/12/ | ||||||
package waku.filter.v2; | ||||||
|
||||||
// Protocol identifier: /vac/waku/filter-subscribe/2.0.0-beta1 | ||||||
message FilterSubscribeRequest { | ||||||
enum FilterSubscribeType { | ||||||
SUBSCRIBER_PING = 0; | ||||||
|
||||||
SUBSCRIBE = 1; | ||||||
UNSUBSCRIBE = 2; | ||||||
UNSUBSCRIBE_ALL = 3; | ||||||
} | ||||||
|
||||||
string request_id = 1; | ||||||
FilterSubscribeType filter_subscribe_type = 2; | ||||||
|
||||||
// Filter criteria | ||||||
optional string pubsub_topic = 10; | ||||||
Does it mean we can only have subscriptions for one pubsub topic for a given remote node?
Answered elsewhere, but adding here for completeness:
Shouldn't this NOT be optional?
Note that this is only optional within the context of protobuf field design: some types of FilterSubscribeRequests (e.g. SUBSCRIBER_PING, UNSUBSCRIBE_ALL) SHOULD NOT have a populated pubsub_topic. Within the context of the protocol itself this field is conditionally mandatory (which would require optional in the protobuf), e.g. the pubsub_topic MUST be populated, if the filter_subscribe_type is SUBSCRIBE or UNSUBSCRIBE.
||||||
repeated string content_topics = 11; | ||||||
We should write a recommended maximum number of content topics per subscription request.
Agree and good point! Will add that to the implementation suggestions in a follow-up PR.
||||||
} | ||||||
|
||||||
message MessagePush { | ||||||
repeated WakuMessage messages = 1; | ||||||
message FilterSubscribeResponse { | ||||||
string request_id = 1; | ||||||
uint32 status_code = 10; | ||||||
|
||||||
optional string status_desc = 11; | ||||||
} | ||||||
|
||||||
message FilterRPC { | ||||||
string requestId = 1; | ||||||
FilterRequest request = 2; | ||||||
MessagePush push = 3; | ||||||
// Protocol identifier: /vac/waku/filter-push/2.0.0-beta1 | ||||||
message MessagePush { | ||||||
WakuMessage waku_message = 1; | ||||||
|
||||||
optional string pubsub_topic = 2; | ||||||
} | ||||||
``` | ||||||
|
||||||
#### FilterRPC | ||||||
|
||||||
A node MUST send all Filter messages (`FilterRequest`, `MessagePush`) wrapped inside a | ||||||
`FilterRPC` this allows the node handler to determine how to handle a message as the Waku | ||||||
Filter protocol is not a request response based protocol but instead a push based system. | ||||||
|
||||||
The `requestId` MUST be a uniquely generated string. When a `MessagePush` is sent | ||||||
the `requestId` MUST match the `requestId` of the subscribing `FilterRequest` whose filters | ||||||
matched the message causing it to be pushed. | ||||||
|
||||||
#### FilterRequest | ||||||
|
||||||
A `FilterRequest` contains an optional topic, zero or more content filters and | ||||||
a boolean signifying whether to subscribe or unsubscribe to the given filters. | ||||||
True signifies 'subscribe' and false signifies 'unsubscribe'. | ||||||
|
||||||
A node that sends the RPC with a filter request and `subscribe` set to 'true' | ||||||
requests that the filter node SHOULD notify the light requesting node of messages | ||||||
matching this filter. | ||||||
|
||||||
A node that sends the RPC with a filter request and `subscribe` set to 'false' | ||||||
requests that the filter node SHOULD stop notifying the light requesting node | ||||||
of messages matching this filter if it is currently doing so. | ||||||
|
||||||
The filter matches when content filter and, optionally, a topic is matched. | ||||||
Content filter is matched when a `WakuMessage` `contentTopic` field is the same. | ||||||
|
||||||
A filter node SHOULD honor this request, though it MAY choose not to do so. If | ||||||
it chooses not to do so it MAY tell the light why. The mechanism for doing this | ||||||
is currently not specified. For notifying the light node a filter node sends a | ||||||
MessagePush message. | ||||||
|
||||||
Since such a filter node is doing extra work for a light node, it MAY also | ||||||
account for usage and be selective in how much service it provides. This | ||||||
mechanism is currently planned but underspecified. | ||||||
|
||||||
#### MessagePush | ||||||
|
||||||
A filter node that has received a filter request SHOULD push all messages that | ||||||
match this filter to a light node. These [`WakuMessage`'s](./waku-message.md) are likely to come from the | ||||||
`relay` protocol and be kept at the Node, but there MAY be other sources or | ||||||
protocols where this comes from. This is up to the consumer of the protocol. | ||||||
|
||||||
A filter node MUST NOT send a push message for messages that have not been | ||||||
requested via a FilterRequest. | ||||||
|
||||||
If a specific light node isn't connected to a filter node for some specific | ||||||
period of time (e.g. a TTL), then the filter node MAY choose to not push these | ||||||
messages to the node. This period is up to the consumer of the protocol and node | ||||||
implementation, though a reasonable default is one minute. | ||||||
## Filter-Subscribe | ||||||
|
||||||
A filter service node MUST support the _filter-subscribe_ protocol | ||||||
to allow filter clients to subscribe, modify, refresh and unsubscribe a desired set of filter criteria. | ||||||
The combination of different filter criteria for a specific filter client node is termed a "subscription". | ||||||
A filter client is interested in receiving messages matching the filter criteria in its registered subscriptions. | ||||||
|
||||||
Since a filter service node is consuming resources to provide this service, | ||||||
it MAY account for usage and adapt its service provision to certain clients. | ||||||
An incentive mechanism is currently planned but underspecified. | ||||||
|
||||||
### Filter Subscribe Request | ||||||
|
||||||
A client node MUST send all filter requests in a `FilterSubscribeRequest` message. | ||||||
This request MUST contain a `request_id`. | ||||||
The `request_id` MUST be a uniquely generated string. | ||||||
What happens if an user sends two
The latest request received would
Ah, qualifying here: the latest request received would be considered in addition to the parameters of the first. It's possible to build a subscription by sending multiple
This would be undefined behaviour as the client would not follow the spec in this case. It is a MUST for the
The IETF says: "Be strict in what you send but tolerant in what you accept" (I'm paraphrazing, the original was better 😅 ), but here, accepting the second filter would contradict the spec. We could add a line making this explicit: When receiving a second request with the same
(oh, I wrote this in a not up-to-date browser window where your comments were not there yet @jm-clius 😅 )
True, but we've now removed
||||||
Each request MUST include a `filter_subscribe_type`, indicating the type of request. | ||||||
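As a rough illustration of the two requirements above (a unique `request_id` and an explicit `filter_subscribe_type`), the following Python sketch mirrors the protobuf message with a plain dataclass. The dataclass, the `new_request` helper, the use of UUID4 for uniqueness, and the example topic strings are all assumptions made for illustration; they are not part of the specification.

```python
import uuid
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class FilterSubscribeType(IntEnum):
    # Values mirror the enum in the protobuf definition.
    SUBSCRIBER_PING = 0
    SUBSCRIBE = 1
    UNSUBSCRIBE = 2
    UNSUBSCRIBE_ALL = 3


@dataclass
class FilterSubscribeRequest:
    # Hypothetical in-memory mirror of the protobuf message.
    request_id: str
    filter_subscribe_type: FilterSubscribeType
    pubsub_topic: Optional[str] = None
    content_topics: List[str] = field(default_factory=list)


def new_request(kind, pubsub_topic=None, content_topics=None):
    """Build a request with a freshly generated request_id.

    UUID4 is an assumption; the text only requires a uniquely generated string.
    """
    return FilterSubscribeRequest(
        request_id=str(uuid.uuid4()),
        filter_subscribe_type=kind,
        pubsub_topic=pubsub_topic,
        content_topics=list(content_topics or []),
    )


# Illustrative topic names only.
ping = new_request(FilterSubscribeType.SUBSCRIBER_PING)
sub = new_request(
    FilterSubscribeType.SUBSCRIBE,
    "/waku/2/default-waku/proto",
    ["/app/1/chat/proto"],
)
```

The ping carries no filter criteria, while the subscribe request carries both a pubsub topic and a non-empty content topic list.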
|
||||||
### Filter Subscribe Response | ||||||
|
||||||
In response to any `FilterSubscribeRequest`,
a filter service node SHOULD respond with a `FilterSubscribeResponse` with a `request_id` matching that of the request.
This response MUST contain a `status_code` indicating if the request was successful or not. | ||||||
Successful status codes are in the `2xx` range. | ||||||
Client nodes SHOULD consider all other status codes as error codes and assume that the requested operation had failed. | ||||||
In addition, the filter service node MAY choose to provide a more detailed status description in the `status_desc` field. | ||||||
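A client-side check of a response could look roughly like the sketch below. It treats only the `2xx` range as success and also verifies that the response echoes the `request_id` that was sent. The dataclass and both helper names are hypothetical, and the specific numeric codes in the usage lines are examples, since the text does not fix individual values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilterSubscribeResponse:
    # Hypothetical in-memory mirror of the protobuf message.
    request_id: str
    status_code: int
    status_desc: Optional[str] = None


def is_success(status_code: int) -> bool:
    """Only codes in the 2xx range count as success."""
    return 200 <= status_code <= 299


def request_succeeded(sent_request_id: str, response: FilterSubscribeResponse) -> bool:
    """True if the response belongs to our request and reports success."""
    return response.request_id == sent_request_id and is_success(response.status_code)


ok = FilterSubscribeResponse(request_id="abc", status_code=200)
failed = FilterSubscribeResponse(request_id="abc", status_code=400, status_desc="bad request")
```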
|
||||||
### Filter matching | ||||||
|
||||||
In the description of each request type below, | ||||||
the term "filter criteria" refers to the combination of `pubsub_topic` and a set of `content_topics`. | ||||||
The request MAY include filter criteria, conditional to the selected `filter_subscribe_type`. | ||||||
If the request contains filter criteria, | ||||||
it MUST contain a `pubsub_topic` | ||||||
and the `content_topics` set MUST NOT be empty. | ||||||
A `WakuMessage` matches filter criteria when its `content_topic` is in the `content_topics` set | ||||||
and it was published on a matching `pubsub_topic`. | ||||||
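The matching rule can be sketched as two small predicates: one that checks whether a request carries valid filter criteria, and one that checks a message against them. The `FilterCriteria` type and function names are hypothetical, and reading "a matching `pubsub_topic`" as exact string equality is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class FilterCriteria:
    # Hypothetical container for one pubsub topic plus its content topics.
    pubsub_topic: str
    content_topics: FrozenSet[str]


def has_valid_criteria(pubsub_topic: Optional[str], content_topics: Iterable[str]) -> bool:
    """Criteria need a pubsub topic and a non-empty content topic set."""
    return bool(pubsub_topic) and len(list(content_topics)) > 0


def matches(criteria: FilterCriteria, msg_pubsub_topic: str, msg_content_topic: str) -> bool:
    """A message matches when both its pubsub topic and content topic match.

    Exact equality on the pubsub topic is an assumption.
    """
    return (
        msg_pubsub_topic == criteria.pubsub_topic
        and msg_content_topic in criteria.content_topics
    )


criteria = FilterCriteria("/waku/2/default-waku/proto", frozenset({"/app/1/chat/proto"}))
```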
|
||||||
### Filter Subscribe Types | ||||||
|
||||||
The following filter subscribe types are defined: | ||||||
|
||||||
#### SUBSCRIBER_PING | ||||||
I still need to finish seeing the point of this message 🤔 In my head: If a client sends a subscribe message, with the server's response, it will be sure that there is a subscription. And additionally, we are already sending a "subscription refresh" request from the client. On the other hand, I see the value in updating and pruning the subscriptions if the subscriber does not respond to a server-sent ping. But this will overlap with the peer management and the connection's keep-alive mechanisms (e.g., libp2p's ping).
Yeah, I was also unsure how useful this would be, so wanted to get the most common clients' perspective to understand if this has any value. (See e.g. point 4 in the description)
But the server provides no guarantees to maintain this subscription over time. The client's responsibility is to have some redundancy and a ping mechanism to ensure that its subscriptions remain active.
Indeed why I wasn't sure if the lightweight ping is necessary, but note that refreshing a complex subscription with many content topics will be expensive, whereas this envisions a much more frequent ping mechanism. It could be a special case of subscription refresh with empty filter criteria, but IMO that would be abusing the mechanism and I'd rather have something explicit.
This functions on a different level - it's not about whether there is an active connection. For filter we assume that connections may drop from time to time. For whichever reason, the service node may remove your subscription (e.g. due to a restart, reaching capacity, etc.) and the client needs a lightweight way to ping its subscription - not strictly to keep it alive, but to maintain a minimum set of subscriptions across multiple service nodes.
||||||
|
||||||
A filter client that sends a `FilterSubscribeRequest` with `filter_subscribe_type` set to `SUBSCRIBER_PING` | ||||||
requests that the service node SHOULD indicate if it has any active subscriptions for this client. | ||||||
The filter client SHOULD exclude any filter criteria from the request. | ||||||
What happens if a client sends the ping together with a subscription list? Should the node respond with a failure? Should it just ignore it?
Good point! Have specified in cd3e880.
||||||
The filter service node SHOULD respond with a success code if it has any active subscriptions for this client | ||||||
or an error code if not. | ||||||
The filter service node SHOULD ignore any filter criteria in the request. | ||||||
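On the service node side, ping handling reduces to a lookup. The sketch below assumes a hypothetical in-memory store keyed by peer id; the `404` error code is an illustrative choice, since the text only distinguishes `2xx` success codes from everything else.

```python
from typing import Dict, Set, Tuple

# Hypothetical store: peer id -> set of (pubsub_topic, content_topic) pairs.
Subscriptions = Dict[str, Set[Tuple[str, str]]]

OK = 200         # any 2xx value signals success
NOT_FOUND = 404  # assumed error code; the text does not fix specific values


def handle_subscriber_ping(subs: Subscriptions, peer_id: str) -> int:
    """Report whether peer_id has any active subscription.

    Filter criteria carried by the request are deliberately not consulted.
    """
    return OK if subs.get(peer_id) else NOT_FOUND


subs: Subscriptions = {
    "peerA": {("/waku/2/default-waku/proto", "/app/1/chat/proto")},
}
```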
|
||||||
#### SUBSCRIBE | ||||||
What happens if a node is not subscribed to a network (a.k.a.
I think we should specify the "common" errors. I do not know where, but it is necessary so all the implementations use the same error codes. Something like IETF's RFC 9110: https://www.rfc-editor.org/rfc/rfc9110.html#name-status-codes
I agree re specifying the common errors. I think this should likely be in a separate BCP RFC with a separate lifecycle (can be
There are some failure scenarios that I find problematic:
In my opinion, these cases will influence the "subscription expiration timeout" in the server and the "subscription refresh cadence" in the client.
Indeed! Both valid scenarios which will happen. Within the filter protocol there is no mandated guarantee for filter service nodes to maintain, recreate or even just honour existing subscriptions from subscribers. Of course good filter service nodes will be implemented to at least honour existing subscriptions for as long as they can. This is why in the implementation section (follow-up PR) I will describe suggested methods for the client to ensure it maintains reliable filter subscriptions and most tools in this version of the protocol is designed to allow the client to do just that.
This of course still does not lead to a "fault-tolerant" or "reliable" filter protocol, which IMO should be a next step and could combine concepts from
||||||
|
||||||
A filter client that sends a `FilterSubscribeRequest` with `filter_subscribe_type` set to `SUBSCRIBE` | ||||||
requests that the service node SHOULD push messages matching this filter to the client. | ||||||
If a subscriber only specifies the
See my comment on the
||||||
The filter client MUST include the desired filter criteria in the request. | ||||||
A client MAY use this request type to _modify_ an existing subscription | ||||||
by providing _additional_ filter criteria in a new request. | ||||||
Comment on lines +162 to +163
Maybe it is a bit nitpicking, but:
Suggested change
I would model every "pubsub_topic+content_topic" as an individual subscription. So a node, identified by its PeerId/NodeId, can have multiple transactions. You either add a new subscription or remove an existing subscription. There is no updating of the subscription. A subscription request with multiple
IMO This simplifies the implementation a lot.
Yes! This is the plan in terms of how it will be implemented. After trying to settle on terminology I came up with the earlier stated:
Each filter criterion/transaction will indeed be modelled as a "pubsub_topic+content_topic" which can only be added or removed. I found that people most often spoke about that action as "updating my subscription", but am not set on the terminology.
||||||
A client MAY use this request type to _refresh_ an existing subscription | ||||||
I understand that subscriptions have an expiration time in the server that should remove a subscription if the node has not refreshed it via
Is that correct?
No, within the filter service node the protocol does not mandate any management of subscriptions. It would be up to the client to ensure it maintains (redundant) subscriptions across filter service nodes. The implementation section (next) will recommend a maximum interval between ping for clients. Filter service nodes are specified so that they in general "keep as many subscriptions as they can, for as they long as their resources allow". The point being that a client should not expect its subscriptions to be reliable (we want to build a network where it is, of course, this is about specification).
||||||
by providing _the same_ filter criteria in a new request. | ||||||
The filter service node SHOULD respond with a success code if it successfully honored this request | ||||||
or an error code if not. | ||||||
The filter service node SHOULD respond with an error code and discard the request | ||||||
if the subscribe request does not contain valid filter criteria, | ||||||
i.e. both a `pubsub_topic` _and_ a non-empty `content_topics` set. | ||||||
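The subscribe semantics (create, modify by adding, refresh by repeating) can be sketched with a set of `(pubsub_topic, content_topic)` pairs per client, which is the model discussed in the review thread. The store layout, function name, and the `400` error code are assumptions of this sketch, not values taken from the specification.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical store: peer id -> set of (pubsub_topic, content_topic) pairs.
Subscriptions = Dict[str, Set[Tuple[str, str]]]

OK = 200           # any 2xx value signals success
BAD_REQUEST = 400  # assumed error code for invalid filter criteria


def handle_subscribe(
    subs: Subscriptions,
    peer_id: str,
    pubsub_topic: Optional[str],
    content_topics: List[str],
) -> int:
    """Add filter criteria for peer_id, or reject the request if they are invalid."""
    if not pubsub_topic or not content_topics:
        # Invalid criteria: discard the request without touching the store.
        return BAD_REQUEST
    entries = subs.setdefault(peer_id, set())
    # Set semantics make a repeated request a no-op (refresh) and a request
    # with new content topics an addition (modify).
    entries.update((pubsub_topic, ct) for ct in content_topics)
    return OK


subs: Subscriptions = {}
first = handle_subscribe(subs, "peerA", "/waku/2/default-waku/proto", ["/app/1/a/proto"])
again = handle_subscribe(subs, "peerA", "/waku/2/default-waku/proto", ["/app/1/a/proto", "/app/1/b/proto"])
bad = handle_subscribe(subs, "peerA", None, ["/app/1/c/proto"])
```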
|
||||||
#### UNSUBSCRIBE | ||||||
|
||||||
A filter client that sends a `FilterSubscribeRequest` with `filter_subscribe_type` set to `UNSUBSCRIBE` | ||||||
requests that the service node SHOULD _stop_ pushing messages matching this filter to the client. | ||||||
The filter client MUST include the filter criteria it desires to unsubscribe from in the request. | ||||||
A client MAY use this request type to _modify_ an existing subscription | ||||||
by providing _a subset of_ the original filter criteria to unsubscribe from in a new request. | ||||||
The filter service node SHOULD respond with a success code if it successfully honored this request | ||||||
or an error code if not. | ||||||
Comment on lines +172 to +180
What happens if I a client sends an
Ah, thanks for picking this up! It should definitely be clarified in the specs. My view is that
Of course, LMK if you believe from client perspective if a different semantic would work better, but the main idea of Filter is to operate on content topics.
||||||
The filter service node SHOULD respond with an error code and discard the request | ||||||
if the unsubscribe request does not contain valid filter criteria, | ||||||
i.e. both a `pubsub_topic` _and_ a non-empty `content_topics` set. | ||||||
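Unsubscribing a subset of criteria can be sketched as a set difference over the same hypothetical per-client store of `(pubsub_topic, content_topic)` pairs. Several choices here are assumptions rather than specified behaviour: the numeric error codes, returning an error when the client has no subscription at all, and silently ignoring pairs that were never subscribed.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical store: peer id -> set of (pubsub_topic, content_topic) pairs.
Subscriptions = Dict[str, Set[Tuple[str, str]]]

OK = 200
BAD_REQUEST = 400  # assumed code for invalid filter criteria
NOT_FOUND = 404    # assumed code when the client has no subscription


def handle_unsubscribe(
    subs: Subscriptions,
    peer_id: str,
    pubsub_topic: Optional[str],
    content_topics: List[str],
) -> int:
    """Remove the given criteria from peer_id's subscription."""
    if not pubsub_topic or not content_topics:
        return BAD_REQUEST
    entries = subs.get(peer_id)
    if not entries:
        return NOT_FOUND
    entries.difference_update((pubsub_topic, ct) for ct in content_topics)
    if not entries:
        # Nothing left for this client: drop the empty subscription.
        del subs[peer_id]
    return OK


subs: Subscriptions = {"peerA": {("t", "a"), ("t", "b")}}
```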
|
||||||
#### UNSUBSCRIBE_ALL | ||||||
|
||||||
A filter client that sends a `FilterSubscribeRequest` with `filter_subscribe_type` set to `UNSUBSCRIBE_ALL` | ||||||
requests that the service node SHOULD _stop_ pushing messages matching _any_ filter to the client. | ||||||
The filter client SHOULD exclude any filter criteria from the request. | ||||||
The filter service node SHOULD remove any existing subscriptions for this client. | ||||||
It SHOULD respond with a success code if it successfully honored this request | ||||||
or an error code if not. | ||||||
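With a per-client store, `UNSUBSCRIBE_ALL` amounts to dropping the client's entry. In this sketch the store layout and the `404` response for a client without subscriptions are assumptions; the text only asks for a success or error code.

```python
from typing import Dict, Set, Tuple

# Hypothetical store: peer id -> set of (pubsub_topic, content_topic) pairs.
Subscriptions = Dict[str, Set[Tuple[str, str]]]

OK = 200
NOT_FOUND = 404  # assumed code when there was nothing to remove


def handle_unsubscribe_all(subs: Subscriptions, peer_id: str) -> int:
    """Remove every subscription held for peer_id, ignoring any filter criteria."""
    removed = subs.pop(peer_id, None)
    return OK if removed is not None else NOT_FOUND


subs: Subscriptions = {"peerA": {("t", "a"), ("t", "b")}, "peerB": {("t", "a")}}
```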
|
||||||
## Filter-Push | ||||||
|
||||||
A filter client node MUST support the _filter-push_ protocol | ||||||
to allow filter service nodes to push messages matching registered subscriptions to this client. | ||||||
|
||||||
A filter service node SHOULD push all messages | ||||||
matching the filter criteria in a registered subscription | ||||||
to the subscribed filter client. | ||||||
Comment on lines +199 to +201
Should a client worry about service nodes that push messages when not subscribed to? The only way a client can protect itself would be to track the peer id of nodes it subscribed too, and ensure push messages are coming from those nodes. I think we should mention this possible attack vector in the spec.
Indeed! Very good point. Thanks. In the next PR I want to adhere to the RFC template, which includes a section on security, attack vectors, etc.
||||||
These [`WakuMessage`s](./waku-message.md) are likely to come from [`11/WAKU2-RELAY`](https://rfc.vac.dev/spec/11/), | ||||||
but there MAY be other sources or protocols where this comes from. | ||||||
This is up to the consumer of the protocol. | ||||||
|
||||||
If a message push fails, | ||||||
the filter service node MAY consider the client node to be unreachable. | ||||||
If a specific filter client node is not reachable from the service node for a period of time, | ||||||
the filter service node MAY choose to stop pushing messages to the client and remove its subscription. | ||||||
This period is up to the service node implementation. | ||||||
We consider `1 minute` to be a reasonable default. | ||||||
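One possible way for a service node to apply this timeout is to remember when pushes to a client first started failing and to prune clients that have stayed unreachable for the whole period. The bookkeeping below is a sketch under that assumption; the function names and the choice to reset the timer on any successful push are hypothetical.

```python
from typing import Dict, List

UNREACHABLE_TIMEOUT_S = 60.0  # the "1 minute" default suggested in the text


def record_push_result(first_failure: Dict[str, float], peer_id: str, ok: bool, now: float) -> None:
    """Track when pushes to peer_id first started failing."""
    if ok:
        # A successful push means the client is reachable again.
        first_failure.pop(peer_id, None)
    else:
        # Keep the timestamp of the earliest failure in the current streak.
        first_failure.setdefault(peer_id, now)


def peers_to_prune(
    first_failure: Dict[str, float],
    now: float,
    timeout: float = UNREACHABLE_TIMEOUT_S,
) -> List[str]:
    """Clients that have been unreachable for at least `timeout` seconds."""
    return [peer for peer, since in first_failure.items() if now - since >= timeout]


failures: Dict[str, float] = {}
record_push_result(failures, "peerA", ok=False, now=0.0)
record_push_result(failures, "peerA", ok=False, now=30.0)
record_push_result(failures, "peerB", ok=False, now=50.0)
```

Timestamps are passed in explicitly so the logic can be exercised without a real clock.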
|
||||||
### Message Push | ||||||
|
||||||
Each message MUST be pushed in a `MessagePush` message. | ||||||
Each `MessagePush` MUST contain one (and only one) `waku_message`. | ||||||
If this message was received on a specific `pubsub_topic`, | ||||||
it SHOULD be included in the `MessagePush`. | ||||||
A filter client SHOULD NOT respond to a `MessagePush`. | ||||||
Since the filter protocol does not include caching or fault-tolerance, | ||||||
this is a best effort push service with no bundling | ||||||
or guaranteed retransmission of messages. | ||||||
A filter client SHOULD verify that each `MessagePush` it receives | ||||||
originated from a service node where the client has an active subscription | ||||||
and that it matches filter criteria belonging to that subscription. | ||||||
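The client-side verification in the last sentence can be sketched as a lookup keyed by the sender's peer id. The store layout and function name are hypothetical. Because `pubsub_topic` is optional in `MessagePush`, this sketch assumes that a push without it is checked on the content topic alone; the text does not spell out that case.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical client-side record: service node peer id ->
# set of (pubsub_topic, content_topic) pairs the client subscribed to there.
ActiveSubscriptions = Dict[str, Set[Tuple[str, str]]]


def accept_push(
    active: ActiveSubscriptions,
    sender_peer_id: str,
    content_topic: str,
    pubsub_topic: Optional[str] = None,
) -> bool:
    """Accept a push only from a node we subscribed to, and only if it matches."""
    criteria = active.get(sender_peer_id)
    if not criteria:
        # No active subscription with this service node: reject the push.
        return False
    if pubsub_topic is None:
        # Assumption: without a pubsub topic, match on content topic alone.
        return any(ct == content_topic for _, ct in criteria)
    return (pubsub_topic, content_topic) in criteria


active: ActiveSubscriptions = {
    "serviceNode1": {("/waku/2/default-waku/proto", "/app/1/chat/proto")},
}
```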
|
||||||
--- | ||||||
# Future Work | ||||||
|
I don't finish to see why we need to split the protocol ID into two different protocols 🤔 In my head:
Based on that, it makes sense to announce (via libp2p's identify protocol) the filter protocol ID. A subscriber's peer managers will use the protocol ID to list this node as capable of managing filter subscriptions and pushing the subscriptions.
But what is the point of having a separate "receiver" protocol ID?

If I understand you correctly, this is because the "data path" requires a connection/stream being negotiated from the filter service node to the client who requested the subscription. This requires the client to have some protocol mounted which can be negotiated for this stream (where it is acting like a "service"). If this is simply the same as the subscribe protocol ID, the subscriber's peer manager will incorrectly list other subscribers (not capable of managing subscriptions/pushing) as possible service nodes. Not sure if this is what you meant?
I agree re similarities to gossipsub, but note that this is not currently a true "bidirectional" protocol. It does negotiate streams/connections in two directions, but other than gossipsub the capabilities of each "side" is not the same. For gossipsub each node acts as both a server and a client, whereas filter still has a server-side and client-side which needs to be differentiated.