This repository has been archived by the owner on Jun 14, 2024. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
docs: improved filter specification #562
docs: improved filter specification #562
Changes from 1 commit
9dca1ca
d2d80b6
4f43a25
29ae394
cd3e880
7c47eb0
12793c5
0f02d70
7f5cc79
d24b589
2c74eed
cb35b95
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the term "Wire format". It is more generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree! Note that this PR specifically avoids format/heading/layout changes so that only the suggested wire format change can be reviewed. Once done, next step would be to adhere to the template and add theory, implementation, etc. sections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean we can only have subscriptions for one pubsub topic for a given remote node?
I'd assume with topic sharding, a light client may need to subscribe to several pubsub topics (community topic + 1:1 topic at least) and if I understand correctly, any subscription request replaces the previous one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answered elsewhere, but adding here for completeness:
No, a client can subscribe to content topics within as many pubsub topics as it likes. The limitation here is only that each request can only subscribe to content topics belonging to the same, single pubsub topic. A client can simply send a subsequent request with content topics belonging to another pubsub topic if it's interested in those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this NOT be optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this is only optional within the context of protobuf field design: some types of FilterSubscribeRequests (e.g. SUBSCRIBER_PING, UNSUBSCRIBE_ALL) SHOULD NOT have a populated pubsub_topic. Within the context of the protocol itself this field is conditionally mandatory (which would require optional in the protobuf), e.g. the pubsub_topic MUST be populated, if the filter_subscribe_type is SUBSCRIBE or UNSUBSCRIBE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should write a recommended maximum number of content topics per subscription request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree and good point! Will add that to the implementation suggestions in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this recommendation may lead to issue:
/content/topic/1
to Bob/content/topic/2
to BobIf only one subscription is to be maintain, what should Alice do?
A. unsubscribe and re-subscribe for both content topic?
B. update the subscription to add the new content topic?
A may lead to missed messages
B does not seem possible from the current spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see the confusion that the current phrasing may lead to. B is indeed possible and the recommended way of doing it.
Note that:
And this can be modified as you go along by subscribing and unsubscribing from content topics.
The idea is that a service node should not maintain multiple sets of such filter criteria for the same client. This used to be the case with the previous spec, where the same client could register the same filter criteria several times at the same service node, each time using a different request ID. I think I should remove this confusing wording.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this line in 29ae394 as it didn't add to the functionality and would just confuse readers. The point should be that any client can modify their filter criteria as many times as they want, adding and removing content topics on the fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry my initial comment was not valid, I intended to delete it as I read the rest of the specs. I thin it's clear that Alice should update her subscription as I finished to read the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if an user sends two
FilterSubscribeRequest
with the samerequest_id
? Would that generate an error? or would the latest request received overwrite the subscription parameters of the first one?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest request received would
overwrite the subscription parameters of the firstbe considered in addition to the previous one - if itSUBSCRIBE
s to a different set of content topics as the first, the client will now be subscribed to the content topics of both requests. We suggest now that service nodes only keep one subscription per peer (i.e. perpeerId
) and simply service requests as they come in (without keeping anyrequestId
state - just making sure responses use the samerequestId
).Reusing a
request_id
is now a spec violation on the side of the client, so the client would have to deal with the ambiguity of two responses on the samerequest_id
(what if one fail and one succeeds?). It could then use the refresh workflow to ensure the desired filter is installed (by just repeating its SubscribeRequest).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, qualifying here: the latest request received would be considered in addition to the parameters of the first. It's possible to build a subscription by sending multiple
FilterSubscribeRequest
s variously adding and removingcontent_topic
s. This is an NB change - namely,SUBSCRIBE
no longer replaces the entire subscription, it simply subscribes a client to the included filter criteria in addition to what it's already subscribed to.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be undefined behaviour as the client would not follow the spec in this case. It is a MUST for the
request_id
to be unique.Imo, it would be most consistent if the implementation simply dropped the second request with the same id.
The IETF says: "Be strict in what you send but tolerant in what you accept" (I'm paraphrazing, the original was better 😅 ), but here, accepting the second filter would contradict the spec.
We could add a line making this explicit:
When receiving a second request with the same
request_id
, the request is silently dropped.(oh, I wrote this in a not up-to-date browser window where your comments were not there yet @jm-clius 😅 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but we've now removed
requestId
as some identifier for the subscription and we don't (want to) keep track ofrequestId
s in the service node.SUBSCRIBE
now simply means subscribing to the included filter criteria (not replacing/removing any existing subscription - just extending it). TherequestId
, from the service node perspective, now simply ties the response to the request and is discarded directly afterwards by the service node. If it receives a subsequentSUBSCRIBE
with the samerequestId
it will simply try to honor it with no reference to previousrequestId
s.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should be more restrictive in the spec and make the
pubsub_topic
mandatory. So, there should be always apubsub_topic
and a list ofcontent_topic
.// Filter criteria string pubsub_topic = 10; repeated string content_topics = 11;
In a scenario where we apply sharding to the Waku relay networks, a node will be subscribed to "several" pub-sub topics. If the gateway node subscribes to a network with the same content topic as our network of interest, the user may receive mixed or duplicated messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree with you.
This spec was written when there was still ambiguity on whether a content topic strictly match to only one pubsub topic or could span several shards (in which case the mixed/duplicate messages would be desired).
I think we're luckily now moving in a direction where we can see a content topic as existing only within the context of a specific pubsub topic (and in future may even abstract the underlying pubsub topic away). In other words, we can see the
(pubsubTopic, contentTopic)
vector as the minimal unique filter criterion. Pinging @kaiserd to double check direction here and will then update spec accordingly (needs changes in a couple of places).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need to finish seeing the point of this message 🤔
In my head: If a client sends a subscribe message, with the server's response, it will be sure that there is a subscription. And additionally, we are already sending a "subscription refresh" request from the client.
On the other hand, I see the value in updating and pruning the subscriptions if the subscriber does not respond to a server-sent ping. But this will overlap with the peer management and the connection's keep-alive mechanisms (e.g., libp2p's ping).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was also unsure how useful this would be, so wanted to get the most common clients' perspective to understand if this has any value. (See e.g. point 4 in the description)
But the server provides no guarantees to maintain this subscription over time. The client's responsibility is to have some redundancy and a ping mechanism to ensure that its subscriptions remain active.
Indeed why I wasn't sure if the lightweight ping is necessary, but note that refreshing a complex subscription with many content topics will be expensive, whereas this envisions a much more frequent ping mechanism. It could be a special case of subscription refresh with empty filter criteria, but IMO that would be abusing the mechanism and I'd rather have something explicit.
This functions on a different level - it's not about whether there is an active connection. For filter we assume that connections may drop from time to time. For whichever reason, the service node may remove your subscription (e.g. due to a restart, reaching capacity, etc.) and the client needs a lightweight way to ping its subscription - not strictly to keep it alive, but to maintain a minimum set of subscriptions across multiple service nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if a client sends the ping together with a subscription list? Should the node respond with a failure? Should it just ignore it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Have specified in cd3e880.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if a node is not subscribed to a network (a.k.a.
pubsub_topic
)? Does the node receive an error? Which error code?I think we should specify the "common" errors. I do not know where, but it is necessary so all the implementations use the same error codes. Something like IETF's RFC 9110: https://www.rfc-editor.org/rfc/rfc9110.html#name-status-codes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree re specifying the common errors. I think this should likely be in a separate BCP RFC with a separate lifecycle (can be
draft
and extended while this one isstable
. This RFC aims to be as minimal as possible in order to define the smallest set of rules that allow for stable operation.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some failure scenarios that I find problematic:
In my opinion, these cases will influence the "subscription expiration timeout" in the server and the "subscription refresh cadence" in the client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed! Both valid scenarios which will happen.
Within the filter protocol there is no mandated guarantee for filter service nodes to maintain, recreate or even just honour existing subscriptions from subscribers. Of course good filter service nodes will be implemented to at least honour existing subscriptions for as long as they can. This is why in the implementation section (follow-up PR) I will describe suggested methods for the client to ensure it maintains reliable filter subscriptions and most tools in this version of the protocol is designed to allow the client to do just that.
e.g.
n
degree of redundancy (tradeoff bandwidth vs reliability)This of course still does not lead to a "fault-tolerant" or "reliable" filter protocol, which IMO should be a next step and could combine concepts from
store
and acknowledgements of message UID etc.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a subscriber only specifies the
pubsub_topic
and nocontent_topic
, would they receive all messages that match the pubsub topic regardless of the content type?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment on the
UNSUBSCRIBE
above. In short, pubsub topic is only seen as a qualifier for the content topic. In other words, having no content topics SHOULD be treated as a an empty SUBSCRIBE and SHOULD be ignored. This indeed needs to be clarified.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is a bit nitpicking, but:
I would model every "pubsub_topic+content_topic" as an individual subscription. So a node, identified by its PeerId/NodeId, can have multiple transactions. You either add a new subscription or remove an existing subscription. There is no updating of the subscription.
A subscription request with multiple
content_topics
is a convenient way of batching multiple subscriptions in a single request without sending multiple requests, one percontent_topic
.IMO This simplifies the implementation a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! This is the plan in terms of how it will be implemented.
After trying to settle on terminology I came up with the earlier stated:
Each filter criterion/transaction will indeed be modelled as a "pubsub_topic+content_topic" which can only be added or removed. I found that people most often spoke about that action as "updating my subscription", but am not set on the terminology.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that subscriptions have an expiration time in the server that should remove a subscription if the node has not refreshed it via
SUBSCRIBE
within the time.Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, within the filter service node the protocol does not mandate any management of subscriptions. It would be up to the client to ensure it maintains (redundant) subscriptions across filter service nodes. The implementation section (next) will recommend a maximum interval between ping for clients. Filter service nodes are specified so that they in general "keep as many subscriptions as they can, for as they long as their resources allow". The point being that a client should not expect its subscriptions to be reliable (we want to build a network where it is, of course, this is about specification).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if I a client sends an
UNSUBSCRIBE
request containing apubsub_topic
and nocontent_topics
? (since arepeated
field can contain 0 items)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for picking this up! It should definitely be clarified in the specs. My view is that
SUBSCRIBE
andUNSUBSCRIBE
(in fact the idea of subscriptions) should be related to content topics, qualified by a (perhaps now mandatory) pubsub topic. That means that only content topics will be considered in subscribe and unsubscribe requests. Not having any content topics will be seen as an empty unsubscribe request and SHOULD be ignored.Of course, LMK if you believe from client perspective if a different semantic would work better, but the main idea of Filter is to operate on content topics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should a client worry about service nodes that push messages when not subscribed to?
This could be the case if for example a content or pubsub topic is RLN protected because processing of messages is costly.
To DDOS a client, a node could unsolicited messages, even tho the client never subscribed to this service node.
The only way a client can protect itself would be to track the peer id of nodes it subscribed too, and ensure push messages are coming from those nodes.
This may also be needed done the line for some scoring, to keep track of the reliability of a node.
I think we should mention this possible attack vector in the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed! Very good point. Thanks. In the next PR I want to adhere to the RFC template, which includes a section on security, attack vectors, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should an implementation manage delivery failures? Should it drop the subscription? Should it retry to push every message?
I think that defining a push failure strategy is a complex scenario that the spec cannot leave open to the implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I've added this in 7c47eb0.
The idea is to specify this is in such a way that it does not preclude service nodes from attempting retransmissions, but does not build a dependency on this mechanism into the minimal protocol definition either.
In addition we should provide a BCP where we're opinionated on implementation. Good catch!