Calculating ID of the messages for status-go and deterministic WakuMessage
bytes
#563
Comments
This looks fragile since it's user-set. How do you handle (malicious) duplicates, for example? A UUID that anyone can set to anything becomes a bit meaningless: you can't make any real decision based on its uniqueness, and if you do, timing attacks become possible. 3 sounds like the safest solution. 1 is also a solution that has definitely been used before, and if we don't expect the protocol to change much, it's most likely safe.
…Message protobuffer we are sending See vacp2p/rfc#563
We ran into one of the issues described above. In status-go, when I build the message, it has this format (I'm using JSON just so it's easy to read):
{
"payload":"some_payload",
"contentTopic":"90b65333bbb741b063375c346d27b4cc67ca7006",
"timestamp":1671558324044743455,
"version":2
}
However, when I attempted to retrieve the message from the store node, the WakuMessage arrived with this format:
{
"payload":"some_payload",
"contentTopic":"90b65333bbb741b063375c346d27b4cc67ca7006",
"timestamp":1671558324044743455,
"rate_limit_proof":{} <--
}
Notice how there's a
Thanks, @richard-ramos, for such an elaborate problem description and solutions 💯 Let me list here the action items that I extract:
Once the message ID RFC work starts, I think that we should move the discussion to the WAKU-MESSAGE-ID RFC PR.
Sorry for replying so late. I'll try to elaborate a bit more on 2. The idea is to rename The need for a random
The idea is to identify messages with a key derived as
Note that one of the benefits of this solution is that it is already implemented in Noise (requires some minor tweaks). Indeed, there, the field In the unauthenticated case, instead, an attacker-in-the-middle could change the tag and the payload to whatever it wants, but this is expected when cryptography is not employed. Collisions in the
@fryorcraken I'd suggest we close this issue as part of the 10K epic, given that deterministic message hashing solved the immediate requirement. Future plans to include a Message UID in messages, also for distributed store sync, are tracked in waku-org/pm#9
Let's close. MUID logic has been defined and implemented and used in store implementation to remove dupe messages.
In status-go we need a unique identifier for each WakuMessage we receive, so that we can determine whether we have seen the message before and decide whether to attempt to decrypt it and store it in the DB. Currently it's being calculated like this:
This ends up generating a 32-byte hash that could be used as an ID. However, @cammellos pointed out a fundamental flaw with this approach: protobuffer serialization is not deterministic (it's not part of the protocol):
In WakuV1 envelopes (aka the protocol messages), this is not a problem since the IDs are calculated only once after the message is dispatched, and if we need to retrieve this message from a mailserver, we can be sure that the ID will be the same because the mailserver returns the raw bytes of the envelope it received, so calculating the sha256 over these raw bytes will generate the same hash obtained when sending the messages.
In WakuV2 this can be a problem, since the hash of the protobuffer at the moment of sending the message might not match the hash of the protobuffer returned by the storenodes. This is a real possibility, as we have already seen differences between the Nim / JS and Go protobuffer encodings.
To solve this issue I see some approaches we could follow, and would like your feedback on these and additional ideas or solutions
1. Calculate the ID based on the content of the message:
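A minimal sketch of this option, assuming the ID is a SHA-256 over the payload, contentTopic, timestamp and version values rather than over the serialized protobuffer. The Message struct, the MessageID function, the field selection and the length-prefixed layout are all hypothetical choices made for illustration, not an agreed format.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// Message is a hypothetical stand-in holding the WakuMessage fields used for the ID.
type Message struct {
	Payload      []byte
	ContentTopic string
	Timestamp    int64
	Version      uint32
}

// MessageID hashes the field values in a fixed order, so the result does not
// depend on how any particular protobuf library serializes the message.
func MessageID(m Message) [32]byte {
	h := sha256.New()

	// Variable-length fields are length-prefixed so that field boundaries
	// cannot shift (("ab","c") must not hash like ("a","bc")).
	writeLenPrefixed := func(b []byte) {
		var n [8]byte
		binary.BigEndian.PutUint64(n[:], uint64(len(b)))
		h.Write(n[:])
		h.Write(b)
	}
	writeLenPrefixed(m.Payload)
	writeLenPrefixed([]byte(m.ContentTopic))

	// Fixed-width fields are written big-endian.
	var fixed [12]byte
	binary.BigEndian.PutUint64(fixed[:8], uint64(m.Timestamp))
	binary.BigEndian.PutUint32(fixed[8:], m.Version)
	h.Write(fixed[:])

	var id [32]byte
	copy(id[:], h.Sum(nil))
	return id
}

func main() {
	m := Message{
		Payload:      []byte("some_payload"),
		ContentTopic: "90b65333bbb741b063375c346d27b4cc67ca7006",
		Timestamp:    1671558324044743455,
		Version:      2,
	}
	id := MessageID(m)
	fmt.Println(hex.EncodeToString(id[:]))
}
```

Fields left out of the hash (for example a proof attached later by another node) do not affect the ID, which is also why the set of hashed fields is hard to change afterwards.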
This is the easiest solution. It could make the protocol not upgradable, in case a field is added or removed. However, this might not be a big problem, as I wouldn't expect the WakuMessage protobuffer to get rid of these fields anytime soon, but it is still a possibility to take into consideration.
2. Generate a UUID.
In the WakuMessage we could add an optional UUID attribute. This is generated by the author of the WakuMessage and should be inserted in the storenode database along with the other fields. An advantage of having the UUID field defined (for waku users in general, not only for status) could be that if no Timestamp is defined, we could require a UUID value to be defined, and that way we ensure that there are no duplicated messages stored in the database. @s1fr0 can expand on this idea as he is the original author.
3. Have the store, lightpush and filter protocol use raw message bytes instead
Currently our protocols handle WakuMessages like this:
… WakuMessage object, and then inserts the attributes of this message into the database. When a client does a history request, the storenode executes a DB query, creates WakuMessage instances for each of the records returned by the query, and sends them back as a response.
… WakuMessage, then create a MessagePush with this WakuMessage before sending it to the subscribers
… WakuMessage before broadcasting it via relay.
Since in all these protocols we are unmarshalling the serialized WakuMessage, we are potentially creating different results when we marshal it again, either due to differences between protobuffer encoding across programming languages, or because the WakuMessage protobuffer version might be different across nodes. This happens with RLN: we are not storing the fields related to the RLN proof in the database, and filter full nodes or lightpush nodes that do not have RLN enabled will not forward the proof of the WakuMessages to the subscribers, or broadcast it via relay. This is a real problem we saw while working with js-rln when some nodes were running zerokit and others were running kilic/rln (@fryorcraken and @s1fr0 probably remember this case).
I believe that nodes should not modify the WakuMessages: the bytes of a message should be the same across all peers, and the way the protocols work right now allows the messages to be modified, either by ignoring attributes due to differences in version / features, or due to marshalling. With this goal in mind, I propose the following solution:
In Store protocol:
… payload column that stores only the payload attribute of the WakuMessage, to instead have a BLOB data column which would store the full raw bytes of the WakuMessages as they are received in the storenode without them being decoded. This has a cost in space of course.
… WakuMessages, we return bytes, which would be obtained from the new data column.
This also has the advantage of moving the marshalling responsibility from the storenodes to the clients. This has been identified as a bottleneck in WakuV1 before (See here), and I'd expect the same to happen with V2, so with this change there should be a performance improvement.
In Filter protocol:
… waku_message in the MessagePush protobuffer also to bytes. This will contain the raw waku message as it is received in relay, without decoding it.
In Lightpush protocol:
… message in the PushRequest protobuffer also to bytes. This will contain the raw waku message as it is received in the lightpush request, without decoding it.
Any of these solutions could work for the problem we have in status-go, as it would be able to deterministically calculate the message ID. However, I believe that solution 3 should be implemented, especially now that newer versions of Store waku-org/waku-proto#10 and Filter protocol #562 are being worked on. With these changes we shouldn't have to worry about nodes having different versions of the WakuMessage protobuffer that could introduce or remove attributes, since we'd be handling the raw bytes of the message.
cc: @fryorcraken, @jm-clius, @LNSD, @cammellos