Split sid into two fields, use map for vector clock #206

richardhuaaa · 2024-09-10T19:06:06Z

It's slowly becoming more apparent that the sid concept isn't very useful compared to having separate node_id and sequence_id fields:

- We've removed gateway sequence IDs, so there's much less potential for confusion between local and remote sequence IDs.
- We're already using a separate uint32 node ID field in various places, including target_originator, publisher_node_id, and inside EnvelopesQuery.
- Updating and manipulating vector clocks is now a lot more important on both server and client, and a map<node_id, sequence_id> type is more ergonomic than a list of sids.
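To illustrate the ergonomics point, here is a minimal Go sketch of a vector clock kept as a map from node ID to sequence ID. The `VectorClock` type, its `Observe` and `Merge` methods, and the uint64 width of the sequence ID are assumptions made for this example, not code from this PR.

```go
package main

import "fmt"

// VectorClock maps a node ID to the highest sequence ID seen from that node.
// Hypothetical type: the uint32/uint64 widths are assumptions for this sketch.
type VectorClock map[uint32]uint64

// Observe records a sequence ID from nodeID, only ever moving the clock forward.
func (vc VectorClock) Observe(nodeID uint32, sequenceID uint64) {
	if sequenceID > vc[nodeID] {
		vc[nodeID] = sequenceID
	}
}

// Merge folds another clock into this one, keeping the per-node maximum.
func (vc VectorClock) Merge(other VectorClock) {
	for nodeID, sequenceID := range other {
		vc.Observe(nodeID, sequenceID)
	}
}

func main() {
	local := VectorClock{100: 7, 200: 3}
	remote := VectorClock{200: 9, 300: 1}
	local.Merge(remote)
	fmt.Println(local[100], local[200], local[300]) // prints "7 9 1"
}
```

With a list of sids, the same update would need a scan (or bit unpacking) to find the entry for a given node; with a map it is a single keyed lookup.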

Protobuf encodes integers as varints under the hood, so there shouldn't be much size difference, as @neekolas originally pointed out. The split should also make the spec easier to understand and explain, and it gives us more bits to work with for both node IDs and sequence IDs.
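The size claim can be checked with a small Go sketch. Go's `encoding/binary` unsigned varint uses the same base-128 encoding as protobuf, so `binary.PutUvarint` gives the payload size of a field value. The packed layout in `packSID` (node ID in the top 16 bits, sequence ID in the low 48) is an assumption for illustration; the real sid layout may have differed.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// varintLen returns how many bytes v occupies as a base-128 varint.
func varintLen(v uint64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, v)
}

// packSID is a hypothetical packed layout: node ID in the top 16 bits,
// sequence ID in the low 48 bits. Assumed for this sketch only.
func packSID(nodeID uint16, sequenceID uint64) uint64 {
	return uint64(nodeID)<<48 | (sequenceID & (1<<48 - 1))
}

func main() {
	nodeID, sequenceID := uint16(100), uint64(5000)

	packed := varintLen(packSID(nodeID, sequenceID))
	split := varintLen(uint64(nodeID)) + varintLen(sequenceID)

	fmt.Println("packed sid bytes:", packed)
	fmt.Println("split fields bytes:", split)
}
```

These counts cover only the integer payloads. Each protobuf field also carries a tag byte, and map entries add their own framing, so the totals on the wire will be somewhat higher in both cases.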

One con:

With the current server implementation using postgres, which only understands int32 and int64, there is some potential for error when converting between signed and unsigned types. I'll make sure to add server code to check for overflows, with the idea that we would migrate long before we use up the entire bit-space.
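A minimal sketch of what such an overflow check could look like in Go, assuming uint32 node IDs and uint64 sequence IDs stored in signed columns. The function names and the `ErrOverflow` error are hypothetical and do not come from the server code.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// ErrOverflow is a hypothetical error for values that exceed the signed range.
var ErrOverflow = errors.New("value does not fit in a signed column")

// nodeIDToInt32 converts a node ID for storage in a signed 32-bit column.
func nodeIDToInt32(nodeID uint32) (int32, error) {
	if nodeID > math.MaxInt32 {
		return 0, ErrOverflow
	}
	return int32(nodeID), nil
}

// sequenceIDToInt64 converts a sequence ID for storage in a signed 64-bit column.
func sequenceIDToInt64(sequenceID uint64) (int64, error) {
	if sequenceID > math.MaxInt64 {
		return 0, ErrOverflow
	}
	return int64(sequenceID), nil
}

func main() {
	if v, err := nodeIDToInt32(42); err == nil {
		fmt.Println("stored node ID:", v)
	}
	if _, err := sequenceIDToInt64(1 << 63); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Rejecting values above the signed maximum halves the usable range, which fits the stated plan of migrating long before the bit-space runs out.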

neekolas

It makes sense. The bitmasks were neat, but if we don't need them life is simpler.

github-actions · 2024-09-10T19:44:28Z

🎉 This PR is included in version 3.69.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

- Modify server logic for queries and subscriptions to support vector clock cursors
- Implement the changes from xmtp/proto#206, including splitting SID into two fields, using a dict for the vector clock, and changing node ID from uint16 to uint32

I was worried about type conversions between int32 and uint32 and about overflow issues, but it looks like with two's complement, [overflowed int32s behave the same as uint32s after conversion](https://go.dev/play/p/pQmLPP56nzx): the increment operation does exactly the same thing to the underlying byte representation either way. So although our postgres schema can only support signed int32s, it seems like we can just pretend they are uint32s and have the same effect.

We do still need to worry about the postgres sequence overflowing, as well as the max value of uint32/uint64. When implementing the node-to-node syncing logic, my plan is to regularly log the node's vector clock, so that we can see it coming from a long way away.
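The two's complement observation can be reproduced with a short Go sketch. This is not the code from the linked playground; `nextAsUint32` is a hypothetical helper written for this example. It relies on Go defining signed integer overflow as wraparound.

```go
package main

import (
	"fmt"
	"math"
)

// nextAsUint32 increments a signed 32-bit value and reinterprets the result
// as unsigned. Go defines signed overflow as wraparound, so this never panics.
func nextAsUint32(stored int32) uint32 {
	stored++
	return uint32(stored)
}

func main() {
	stored := int32(math.MaxInt32)

	viaSigned := nextAsUint32(stored) // increment as int32, then convert
	viaUnsigned := uint32(stored) + 1 // convert first, then increment

	fmt.Println(viaSigned, viaUnsigned, viaSigned == viaUnsigned)
	// prints "2147483648 2147483648 true"
}
```

This only shows that arithmetic in Go produces the same bit pattern either way. It says nothing about how a postgres sequence behaves at its maximum, which is the separate concern raised above.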

feat: split sid into two fields, use map for vector clock

db28f6d

richardhuaaa requested review from neekolas and mkysel September 10, 2024 19:06

richardhuaaa requested a review from a team as a code owner September 10, 2024 19:06

neekolas approved these changes Sep 10, 2024

richardhuaaa merged commit 9fffa2b into main Sep 10, 2024
4 checks passed

richardhuaaa deleted the rich/vector-clock-map branch September 10, 2024 19:42

github-actions bot added the released label Sep 10, 2024

richardhuaaa mentioned this pull request Sep 10, 2024

Make server logic vector clock-aware xmtp/xmtpd#153

Merged
