Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split sid into two fields, use map for vector clock #206

Merged
merged 1 commit into from
Sep 10, 2024

Conversation

richardhuaaa
Copy link
Contributor

It's slowly becoming more apparent that the sid concept isn't very useful compared to having separate node_id and sequence_id fields:

  1. We've removed gateway sequence ID's, so there's much less potential for confusion between local and remote sequence ID's
  2. We're using a separate uint32 node ID field in various places already, including target_originator, publisher_node_id and inside EnvelopesQuery
  3. Updating and manipulating vector clocks are now a lot more important on both server and client, and using a map<node_id, sequence_id> type is more ergonomic compared to a list of sids.

Protobufs use a varint under the hood, so there shouldn't be much size difference, exactly as @neekolas pointed out originally. This should also make the spec easier to understand and explain, and gives us more bits to work with for both node ID's and sequence ID's.

One con:

  • With the current server implementation using postgres, which only understands int32 and int64, there is some potential for error converting between signed and unsigned types. I'll make sure to add server code to check for overflows, with the idea that we would be migrating long before we use up the entire bit-space.

@richardhuaaa richardhuaaa requested a review from a team as a code owner September 10, 2024 19:06
Copy link
Collaborator

@neekolas neekolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense. The bitmasks were neat, but if we don't need them life is simpler.

@richardhuaaa richardhuaaa merged commit 9fffa2b into main Sep 10, 2024
4 checks passed
@richardhuaaa richardhuaaa deleted the rich/vector-clock-map branch September 10, 2024 19:42
Copy link

🎉 This PR is included in version 3.69.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

richardhuaaa added a commit to xmtp/xmtpd that referenced this pull request Sep 11, 2024
- Modify server logic for queries and subscriptions to support vector
clock cursors
- Implement the changes from xmtp/proto#206,
including splitting SID into two fields, using a dict for the vector
clock, and modifying node ID from uint16 to uint32

I was worried about type-conversions between int32 and uint32 and
overflow issues, but it looks like with two's complement, [overflowed
int32's behave the same as uint32's after
conversion](https://go.dev/play/p/pQmLPP56nzx) (or in other words, the
increment operation does exactly the same thing to the underlying byte
representation either way). In other words, our postgres schema can only
support signed int32's, but it seems like we can just pretend they are
uint32's and have the same effect.

We do still need to worry about the postgres sequence overflowing, as
well as the max value of uint32/uint64 - when implementing the
node-to-node syncing logic, my plan is to regularly log the node's
vector clock, so that we can see it coming from a long way away.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants