
Validate node ID, sequence ID and timestamp on synced payloads #311

Merged — 7 commits, Dec 19, 2024

Conversation

richardhuaaa (Contributor) commented Dec 6, 2024

This adds initial validation on synced payloads.

Still to be done:

  • Misbehavior reporting system
  • Payload-specific validation

#302

Summary by CodeRabbit

  • New Features

    • Enhanced query logic to ensure unique results for originator nodes.
    • Added a method to access OriginatorNs value in the OriginatorEnvelope struct.
    • Improved stream management and envelope processing in the sync worker.
  • Bug Fixes

    • Refined error handling for stream connections and envelope validation.
  • Refactor

    • Restructured stream and envelope handling to improve data encapsulation and processing logic.

coderabbitai bot commented Dec 6, 2024

Walkthrough

The pull request introduces modifications to SQL queries and their corresponding Go functions, primarily focusing on the SelectVectorClock query in pkg/db/queries.sql and pkg/db/queries/queries.sql.go. Key changes include the addition of a DISTINCT ON clause to ensure unique results based on originator_node_id and updates to the SelectVectorClockRow struct to accommodate new fields. Additionally, a new method is added to the OriginatorEnvelope struct, and significant restructuring occurs in the syncWorker to improve stream handling and envelope processing.

Changes

File Path — Change Summary

  • pkg/db/queries.sql — Modified SelectVectorClock query to use DISTINCT ON for unique originator_node_id values and added ordering; updated SelectStagedOriginatorEnvelopes to include an ORDER BY clause.
  • pkg/db/queries/queries.sql.go — Updated SelectVectorClock function to include OriginatorEnvelope in results; adjusted the SelectVectorClockRow struct to include the new field.
  • pkg/envelopes/originator.go — Added method OriginatorNs() to the OriginatorEnvelope struct for accessing the OriginatorNs value.
  • pkg/sync/syncWorker.go — Introduced originatorStream type for stream connection details; updated methods to utilize originatorStream, enhancing envelope validation and insertion logic.

Possibly related PRs

  • Send envelopes in batches #253: The changes in the main PR regarding the SelectVectorClock query and its handling of OriginatorEnvelope are related to the modifications in the subscribeWorker.go file, which also deals with the OriginatorEnvelope type and its processing in the context of sending envelopes.

Suggested reviewers

  • mkysel
  • neekolas



richardhuaaa commented Dec 6, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

@richardhuaaa richardhuaaa marked this pull request as ready for review December 6, 2024 17:24
@richardhuaaa richardhuaaa requested a review from a team as a code owner December 6, 2024 17:24
Comment on lines +359 to +372

```go
var lastSequenceID uint64 = 0
var lastNs int64 = 0
if stream.lastEnvelope != nil {
	lastSequenceID = stream.lastEnvelope.OriginatorSequenceID()
	lastNs = stream.lastEnvelope.OriginatorNs()
}
if env.OriginatorSequenceID() != lastSequenceID+1 || env.OriginatorNs() < lastNs {
	// TODO(rich) Submit misbehavior report and continue
	s.log.Error("Received out of order envelope")
}

if env.OriginatorSequenceID() > lastSequenceID {
	stream.lastEnvelope = env
}
```
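The ordering check above can be isolated as a pure function for testing; here is a minimal sketch, with a hypothetical `envelope` struct standing in for the project's `OriginatorEnvelope` type:

```go
package main

import "fmt"

// envelope is a hypothetical stand-in for the project's OriginatorEnvelope.
type envelope struct {
	sequenceID uint64
	ns         int64
}

// outOfOrder reports whether env violates the stream's ordering invariants:
// the sequence ID must increment by exactly one, and the timestamp must not
// move backwards. last may be nil at stream startup.
func outOfOrder(last *envelope, env envelope) bool {
	var lastSequenceID uint64
	var lastNs int64
	if last != nil {
		lastSequenceID = last.sequenceID
		lastNs = last.ns
	}
	return env.sequenceID != lastSequenceID+1 || env.ns < lastNs
}

func main() {
	last := &envelope{sequenceID: 5, ns: 100}
	fmt.Println(outOfOrder(last, envelope{sequenceID: 6, ns: 101})) // false: in order
	fmt.Println(outOfOrder(last, envelope{sequenceID: 8, ns: 101})) // true: sequence gap
	fmt.Println(outOfOrder(last, envelope{sequenceID: 6, ns: 99}))  // true: timestamp regression
	fmt.Println(outOfOrder(nil, envelope{sequenceID: 1, ns: 0}))    // false: first envelope
}
```

Note that a nil `last` (no prior envelope in the DB) accepts only sequence ID 1, which matches the discussion below about initializing `lastEnvelope` at stream construction.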
Collaborator commented:

do we check this anywhere at stream startup? What if we rebuild the stream after network failure?

richardhuaaa (Contributor, Author) replied:

If you look at where the originatorStream is constructed, we are fetching the last payload out of the database before stream construction

coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (6)
pkg/sync/syncWorker.go (3)

36-41: Consider adding documentation comments for the new originatorStream struct

The newly introduced originatorStream struct encapsulates stream-related data. Adding a comment explaining its purpose and usage would improve code readability and maintainability.


Line range hint 280-320: Handle potential nil pointer dereference in setupStream

In the setupStream function, when iterating over result to assign lastEnvelope, there is no check for a nil lastEnvelope. If envUtils.NewOriginatorEnvelopeFromBytes returns an error, lastEnvelope could be nil, potentially leading to a nil pointer dereference later.

Consider adding error handling to address this issue:

```diff
 originatorStream := &originatorStream{nodeID: nodeID, stream: stream}
 for _, row := range result {
     if uint32(row.OriginatorNodeID) == nodeID {
         lastEnvelope, err := envUtils.NewOriginatorEnvelopeFromBytes(row.OriginatorEnvelope)
         if err != nil {
             return nil, err
         }
+        if lastEnvelope == nil {
+            s.log.Error("Last envelope is nil")
+            continue
+        }
         originatorStream.lastEnvelope = lastEnvelope
     }
 }
```

Line range hint 324-339: Handle errors returned by validateAndInsertEnvelope

In the listenToStream function, the validateAndInsertEnvelope method is called, but any errors it returns are not handled. This could lead to silent failures during envelope validation and insertion.

Consider modifying the code to handle errors:

```diff
 for _, env := range envs.Envelopes {
-    s.validateAndInsertEnvelope(stream, env)
+    err := s.validateAndInsertEnvelope(stream, env)
+    if err != nil {
+        s.log.Error("Failed to validate and insert envelope", zap.Error(err))
+        // Decide whether to continue or return the error
+    }
 }
```
pkg/envelopes/originator.go (1)

61-64: Add documentation for the new OriginatorNs method

The OriginatorNs method provides access to the originator's timestamp but lacks a comment explaining its purpose. Adding documentation would enhance code readability.

Consider adding a comment:

```go
// OriginatorNs returns the timestamp of the originator envelope in nanoseconds.
func (o *OriginatorEnvelope) OriginatorNs() int64 {
	return o.UnsignedOriginatorEnvelope.OriginatorNs()
}
```
pkg/db/queries/queries.sql.go (2)

342-356: Update struct tags for SelectVectorClockRow

The SelectVectorClockRow struct now includes the OriginatorEnvelope field but lacks struct tags that might be necessary for JSON marshalling or other operations. Ensure consistency with other structs.

Consider adding struct tags if applicable:

```go
type SelectVectorClockRow struct {
	OriginatorNodeID     int32  `json:"originator_node_id"`
	OriginatorSequenceID int64  `json:"originator_sequence_id"`
	OriginatorEnvelope   []byte `json:"originator_envelope"`
}
```

Line range hint 368-374: Handle potential errors when scanning rows

In the SelectVectorClock method, ensure that all errors during row scanning are handled. Currently, if rows.Scan returns an error, it immediately returns, but there might be resources to clean up or additional logging needed.

Consider enhancing error handling:

```go
for rows.Next() {
	var i SelectVectorClockRow
	if err := rows.Scan(&i.OriginatorNodeID, &i.OriginatorSequenceID, &i.OriginatorEnvelope); err != nil {
		rows.Close()
		s.log.Error("Failed to scan row", zap.Error(err))
		return nil, err
	}
	items = append(items, i)
}
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 82f38f2 and e832958.

⛔ Files ignored due to path filters (1)
  • pkg/proto/mls/message_contents/content.pb.go is excluded by !**/*.pb.go, !pkg/proto/**
📒 Files selected for processing (4)
  • pkg/db/queries.sql (1 hunks)
  • pkg/db/queries/queries.sql.go (2 hunks)
  • pkg/envelopes/originator.go (1 hunks)
  • pkg/sync/syncWorker.go (8 hunks)
🔇 Additional comments (1)
pkg/db/queries.sql (1)

49-57: Verify the impact of removing the GROUP BY clause

The previous version of the SelectVectorClock query used GROUP BY to aggregate sequence IDs. Changing to DISTINCT ON affects how duplicates are handled. Ensure that this modification does not introduce unintended side effects.

Run the following script to compare results between the old and new queries:

✅ Verification successful

Let me analyze the SQL queries statically since we can't execute them in the sandbox environment.


Based on the code context found, I can now provide a definitive response about the query change.

The change from GROUP BY to DISTINCT ON is safe and functionally equivalent

The new query using DISTINCT ON with ORDER BY originator_sequence_id DESC is actually more precise than the old version as it:

  • Maintains the same grouping by originator_node_id
  • Ensures we get the highest sequence ID per node (via ORDER BY ... DESC)
  • Additionally preserves the originator_envelope data which is used in ToVectorClock conversion

Evidence from the codebase:

  • The query results are used in ToVectorClock (pkg/db/types.go) which only needs the latest sequence ID per node
  • The syncWorker.setupStream and subscribeWorker implementations rely on this vector clock for synchronization
  • The ON CONFLICT DO NOTHING in envelope insertion (pkg/db/queries/queries.sql.go) ensures data consistency
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

```bash
#!/bin/bash
# Description: Compare results of the old and new SelectVectorClock queries.

# Test: Execute both queries and diff the results. Expect: Results should match in terms of latest sequence IDs per node.

# Old query
psql -c "
SELECT
    originator_node_id,
    max(originator_sequence_id)::BIGINT AS originator_sequence_id
FROM
    gateway_envelopes
GROUP BY
    originator_node_id;
" > old_query_results.txt

# New query
psql -c "
SELECT DISTINCT ON (originator_node_id)
    originator_node_id,
    originator_sequence_id
FROM
    gateway_envelopes
ORDER BY
    originator_node_id,
    originator_sequence_id DESC;
" > new_query_results.txt

# Compare results
diff old_query_results.txt new_query_results.txt
```

Length of output: 610


Script:

```bash
#!/bin/bash
# Let's check if there are any tests or documentation that might help us understand the behavior
rg -g '!*.sql' "SelectVectorClock|gateway_envelopes" -A 5 -B 5
```

Length of output: 5578

```diff
@@ -46,13 +46,15 @@ DELETE FROM staged_originator_envelopes
 WHERE id = @id;
 
 -- name: SelectVectorClock :many
-SELECT
+SELECT DISTINCT ON (originator_node_id)
```
Collaborator commented:

can you help me understand this change?

richardhuaaa (Contributor, Author) replied:

We are now returning the last envelope as well, so using GROUP BY is not going to work. What we want is one row per originator_node_id, not just the largest sequence ID value.

There's a good section on DISTINCT ON in https://www.postgresql.org/docs/current/sql-select.html if that helps!
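The semantics of `DISTINCT ON (originator_node_id)` combined with `ORDER BY originator_sequence_id DESC` — one row per node, the one with the highest sequence ID — can be sketched in Go; the `row` type and field names here are illustrative, not the project's generated sqlc types:

```go
package main

import "fmt"

// row mirrors the shape of a gateway_envelopes result; names are illustrative.
type row struct {
	nodeID     int32
	sequenceID int64
	envelope   []byte
}

// latestPerNode emulates DISTINCT ON (originator_node_id) with
// ORDER BY originator_sequence_id DESC: it keeps exactly one row per
// node, the one with the highest sequence ID, envelope included.
func latestPerNode(rows []row) map[int32]row {
	latest := make(map[int32]row)
	for _, r := range rows {
		if cur, ok := latest[r.nodeID]; !ok || r.sequenceID > cur.sequenceID {
			latest[r.nodeID] = r
		}
	}
	return latest
}

func main() {
	rows := []row{
		{nodeID: 1, sequenceID: 3},
		{nodeID: 1, sequenceID: 7},
		{nodeID: 2, sequenceID: 2},
	}
	for nodeID, r := range latestPerNode(rows) {
		fmt.Printf("node %d -> seq %d\n", nodeID, r.sequenceID)
	}
}
```

This is why GROUP BY with max() no longer fits: the aggregate can return the largest sequence ID, but not the other columns (here, the envelope bytes) of the row it came from.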

neekolas (Contributor) left a comment

I’m going to suggest we merge this after we turn off auto-deploy to the testnet, given how big the change is.

richardhuaaa (Contributor, Author) commented:

The golang race test seems to be failing on main (unrelated to this PR); we can investigate that separately.

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/sync/syncWorker.go (1)

358-371: Enhance validation error handling and logging.

The sequence ID and timestamp validation is correct, but the error handling could be more informative.

Consider this improvement:

```diff
 var lastSequenceID uint64 = 0
 var lastNs int64 = 0
 if stream.lastEnvelope != nil {
     lastSequenceID = stream.lastEnvelope.OriginatorSequenceID()
     lastNs = stream.lastEnvelope.OriginatorNs()
 }
-if env.OriginatorSequenceID() != lastSequenceID+1 || env.OriginatorNs() < lastNs {
-    // TODO(rich) Submit misbehavior report and continue
-    s.log.Error("Received out of order envelope")
+if env.OriginatorSequenceID() != lastSequenceID+1 {
+    s.log.Error("Sequence ID mismatch",
+        zap.Uint64("expected", lastSequenceID+1),
+        zap.Uint64("received", env.OriginatorSequenceID()))
+    // TODO(rich) Submit misbehavior report and continue
+} else if env.OriginatorNs() < lastNs {
+    s.log.Error("Timestamp regression",
+        zap.Int64("lastNs", lastNs),
+        zap.Int64("receivedNs", env.OriginatorNs()))
+    // TODO(rich) Submit misbehavior report and continue
 }
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e832958 and 4ff914e.

📒 Files selected for processing (1)
  • pkg/sync/syncWorker.go (7 hunks)
🧰 Additional context used
📓 Learnings (1)
pkg/sync/syncWorker.go (1)
Learnt from: richardhuaaa
PR: xmtp/xmtpd#311
File: pkg/sync/syncWorker.go:379-394
Timestamp: 2024-12-06T17:51:56.809Z
Learning: In `pkg/sync/syncWorker.go`, the `insertEnvelope` function's database insert operation is intended to be idempotent; it should do nothing if a duplicate already exists.
🔇 Additional comments (3)
pkg/sync/syncWorker.go (3)

36-40: LGTM! Well-structured stream encapsulation.

The originatorStream struct effectively encapsulates stream-related data and state, providing a clean abstraction for managing node-specific stream information.


Line range hint 279-319: LGTM! Robust stream initialization with proper state management.

The method effectively:

  • Initializes the vector clock from the database
  • Sets up the subscription with correct node filtering
  • Properly initializes the lastEnvelope state for sequence validation

Line range hint 378-401: LGTM! Well-implemented idempotent envelope insertion.

The method properly handles:

  • Envelope serialization
  • Database insertion
  • Error cases
  • Duplicate detection (intentionally idempotent as per requirements)
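The idempotent-insert behavior noted above — a duplicate insert does nothing — can be sketched with a hypothetical in-memory store keyed on (node ID, sequence ID); this is an illustration of the contract, not the project's actual DB layer, which relies on ON CONFLICT DO NOTHING in SQL:

```go
package main

import "fmt"

type key struct {
	nodeID     uint32
	sequenceID uint64
}

// store is a hypothetical in-memory stand-in for the gateway_envelopes table.
type store struct {
	envelopes map[key][]byte
}

// insert mimics ON CONFLICT DO NOTHING: a duplicate (nodeID, sequenceID)
// is silently skipped. It reports whether the envelope was newly inserted.
func (s *store) insert(nodeID uint32, sequenceID uint64, payload []byte) bool {
	k := key{nodeID, sequenceID}
	if _, exists := s.envelopes[k]; exists {
		return false
	}
	s.envelopes[k] = payload
	return true
}

func main() {
	s := &store{envelopes: make(map[key][]byte)}
	fmt.Println(s.insert(1, 1, []byte("a"))) // true: first insert
	fmt.Println(s.insert(1, 1, []byte("a"))) // false: duplicate, no-op
}
```

Idempotency here means a sync worker can safely retry after a network failure without double-writing envelopes it already stored.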

```go
for _, row := range result {
	if uint32(row.OriginatorNodeID) == nodeID {
		lastEnvelope, err := envUtils.NewOriginatorEnvelopeFromBytes(row.OriginatorEnvelope)
		if err != nil {
```
neekolas (Contributor) commented Dec 18, 2024:

Are there cases where we would actually want to continue here? I worry about an invalid ClientEnvelope blocking replication forever because it errors.

Maybe it's the originator's fault for not validating properly before putting it in the DB and we should stop? Would only be an issue if the validation rules changed.

Just want to make sure we've considered all the failure modes.

richardhuaaa (Contributor, Author) replied:

Yes, it's a tricky call. I think it's similar to the general programming question of whether we should assert things that should always be true. I'm a bit more in the camp of a simpler programming model where things fail if our assumptions are violated, and we fix them - there are also other places in the codebase that rely on the DB deserialization working (e.g. when a query is made) and it would be preferable not to mask them in our responses to clients.

Maybe it's the originator's fault for not validating properly before putting it in the DB and we should stop? Would only be an issue if the validation rules changed.

Yes, I think the assumption is that the originator should have serialized it properly before writing to the DB. Note we're not doing any validation on the envelope here - we're just checking that it can be deserialized. So it shouldn't be a problem if the validation rules changed, but it could happen if we make a breaking proto change without preparing it properly.

Contributor replied:

Maybe the safest thing is to leave it as-is and if we have nodes that get jammed up we know we have a problem to fix.

Note we're not doing any validation on the envelope here - we're just checking that it can be deserialized

FWIW, we do a little bit of validation on the ClientEnvelope to make sure the topic matches the payload type. But that all should be checked before it ever goes in the DB.

@richardhuaaa richardhuaaa merged commit f223c36 into main Dec 19, 2024
7 of 8 checks passed
@richardhuaaa richardhuaaa deleted the rich/sync-validation branch December 19, 2024 18:44