
Bug: Hub/Shuttle is streaming delete events, but when the hubs are queried for the latest state they say it's in merge state #2025

Open
chetankashetti opened this issue Jun 4, 2024 · 7 comments
Labels
s-triage Needs to be reviewed, designed and prioritized

Comments

@chetankashetti

chetankashetti commented Jun 4, 2024

What is the bug?
We are receiving delete events from Shuttle, but when we query the hub using getAllVerificationMessagesByFid it says the verification is still in the added state. We are assuming that whenever MESSAGE_TYPE_VERIFICATION_ADD_ETH_ADDRESS appears with:
merge state -> we add to our system
delete state -> we remove from our system
prune state -> we remove from our system
revoke state -> we remove from our system
Correct us if we're wrong.
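
Roughly, a minimal sketch of how we apply this mapping on our side (illustrative pseudocode only; the store and helper names here are hypothetical, not our actual consumer code):

// Hypothetical mapping from the Shuttle event operation to our system's action.
type Operation = "merge" | "delete" | "prune" | "revoke";

function applyVerificationEvent(op: Operation, fid: number, address: string, verified: Set<string>) {
  const key = `${fid}:${address}`;
  if (op === "merge") {
    verified.add(key);    // merge -> add to our system
  } else {
    verified.delete(key); // delete / prune / revoke -> remove from our system
  }
}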
For example, FID 284564: we record a timestamp called event_timestamp when we receive each event, and the delete event below arrives without any remove message.

{
	"data": {
		"type": "MESSAGE_TYPE_VERIFICATION_ADD_ETH_ADDRESS",
		"fid": 284564,
		"timestamp": 1717338302000,
		"network": "FARCASTER_NETWORK_MAINNET",
		"verificationAddAddressBody": {
			"address": "Kjn9lOE5lSw5WOaRcrqRjY9TNXE=",
			"claimSignature": "jTZWmE8MJlONRBzItIqo2ahEHucShtDpSVEKKME++y5xb+a26BDFfhdInlrPBSrxqjevwlc3zFub7bpWE00+zxw=",
			"blockHash": "lGlqnrWQvyrLS6y4EUhjd3GwoqTPbRGYFrOWuc1E3Og=",
			"verificationType": 0,
			"chainId": 0,
			"protocol": "PROTOCOL_ETHEREUM"
		}
	},
	"hash": "DyGKwXxmmpWnJnJXq9MFnV64Ty8=",
	"hashScheme": "HASH_SCHEME_BLAKE3",
	"signature": "sDemPPBPGKUYAZly17eHknYCMLCyzUUVEPJN8zYCQI95v8JiUbPQs8ycw3oU8WxLjDT0uv//j/QQhgv4zjqbDw==",
	"signatureScheme": "SIGNATURE_SCHEME_ED25519",
	"signer": "9YfDCLVjOCC1TSedks0ovIhQ0QxSp9jadZq7OEtSexk=",
	"eventTimestamp": 1717338302000,
	"operation": "merge"
}
{
	"data": {
		"type": "MESSAGE_TYPE_VERIFICATION_ADD_ETH_ADDRESS",
		"fid": 284564,
		"timestamp": 1716794998000,
		"network": "FARCASTER_NETWORK_MAINNET",
		"verificationAddAddressBody": {
			"address": "Kjn9lOE5lSw5WOaRcrqRjY9TNXE=",
			"claimSignature": "tJnBlHSeoxYxsE29j4P/GvWXQpNOTlB5MbGdTpiHqEk3pfek71oIjhvrNa0WnAN2BRcTWZRyrvrzwE5hLY19Vhw=",
			"blockHash": "moftXqHGrPPyGWmBv523i7EA4ME7urpLZqyvJZ9INEU=",
			"verificationType": 0,
			"chainId": 0,
			"protocol": "PROTOCOL_ETHEREUM"
		}
	},
	"hash": "TsGP1MUa9rTjaNY42whm4ztP0KI=",
	"hashScheme": "HASH_SCHEME_BLAKE3",
	"signature": "AjgNy7sXqXN8lYKesgKzRM5soF0rNL1kXlIOoIA6Zhu3rEpkUSpyfmzxJfku0RRERvCHh4qKZ0T4Rrfjw6/zCQ==",
	"signatureScheme": "SIGNATURE_SCHEME_ED25519",
	"signer": "9YfDCLVjOCC1TSedks0ovIhQ0QxSp9jadZq7OEtSexk=",
	"eventTimestamp": 1717338306784,
	"operation": "delete"
}

There are thousands of FIDs affected by this issue. Please investigate; it's our top priority.
Feel free to reach out if more details are needed. farcaster-dc: chetansk

We are live-streaming all the events to Kafka, which is where we noticed this; the events are streamed using Shuttle.

This is happening for a lot of FIDs: delete events are coming through, but it seems no actual deletion happened.

@github-actions github-actions bot added the s-triage Needs to be reviewed, designed and prioritized label Jun 4, 2024
@sanjayprabhu
Contributor

sanjayprabhu commented Jun 4, 2024

Hmm, so what seems to be happening here is that there are two verification add messages for the same address and FID:

one from May 27, with message hash "TsGP1MUa9rTjaNY42whm4ztP0KI=", and
one from June 2, with message hash "DyGKwXxmmpWnJnJXq9MFnV64Ty8="

When the June 2 message is merged, it correctly overwrites and deletes the May 27 one, because hubs have a unique constraint on the address. There is no remove message. A remove message is not required to cause a delete, because a newer message will overwrite an older one (especially for verifications and UserDataAdd).

It looks like the events have made it into your Kafka stream out of order. There's not enough information here to say what caused this. If you could provide the created_at and deleted_at timestamps for these two messages from Shuttle's messages table, it will help narrow down whether this is happening in the hub, in Shuttle, or at some point after.

Note that the message timestamps are different. Even though the delete event's event timestamp is newer, the message timestamp inside the delete event is older than the message timestamp in the merge event. So a workaround would be to check for this.

To summarize,

  1. delete events don't require a Remove message, and can happen when a newer message overwrites an older one
  2. the events are appearing out of order in your queue; I'm not sure what's causing this. I'm fairly sure the hub and Shuttle emit them in the right order, but we'd need data from the messages table to confirm
  3. checking the message timestamp can be a potential workaround while the root cause for 2) is determined; a sketch of this follows below
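
For illustration, a minimal sketch of that timestamp check (this is only an assumption about how a consumer could track state; the map and function names are hypothetical and not part of Shuttle):

// Hypothetical consumer-side guard: skip a delete whose message timestamp is
// older than the newest verification we have already merged for that fid+address.
const newestMergedTimestamp = new Map<string, number>(); // key: `${fid}:${address}`

function recordMerge(fid: number, address: string, messageTimestamp: number): void {
  const key = `${fid}:${address}`;
  const prev = newestMergedTimestamp.get(key) ?? 0;
  newestMergedTimestamp.set(key, Math.max(prev, messageTimestamp));
}

function shouldApplyDelete(fid: number, address: string, messageTimestamp: number): boolean {
  const key = `${fid}:${address}`;
  const merged = newestMergedTimestamp.get(key);
  // Apply the delete only if no newer message has been merged for this address.
  return merged === undefined || messageTimestamp >= merged;
}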

@chetankashetti
Author

chetankashetti commented Jun 5, 2024

Thanks for the response @sanjayprabhu.
Having only a delete message is totally fine, but there shouldn't be multiple deletes/merges, right?
We have these records in our messages table:

[
  {
    "id": "018f2e40-a33f-8cbc-27ca-81dab82afe3d",
    "created_at": "2024-04-30 09:05:39.124098+00",
    "updated_at": "2024-04-30 09:05:39.124098+00",
    "timestamp": "2024-04-16 23:07:19+00",
    "deleted_at": "2024-05-27 07:30:01.195+00",
    "pruned_at": null,
    "revoked_at": null,
    "fid": 284564,
    "type": 7,
    "hash_scheme": 1,
    "signature_scheme": 1,
    "hash": "20 bytes",
    "signer": "32 bytes",
    "body": "{\"address\":\"0x2a39fd94e139952c3958e69172ba918d8f533571\",\"claimSignature\":\"0x6d3460cbbc770e42913d6ae8942f47d96c9ee57e79ba62c9aaf6061c522329f906f37b1f6449730bca0567b90987d865efbc21423202304063fe3c77597248141c\",\"blockHash\":\"0x97c6ab49b01c3af5347959ae5456d8d95e7731a4f4330800b11dcd380cae1502\",\"protocol\":\"ethereum\"}",
    "raw": "267 bytes"
  },
  {
    "id": "018fb8f4-c995-709e-8540-0be48b4d8ca3",
    "created_at": "2024-05-27 07:30:01.178892+00",
    "updated_at": "2024-05-27 07:30:01.178892+00",
    "timestamp": "2024-05-27 07:29:58+00",
    "deleted_at": "2024-06-02 14:25:06.781+00",
    "pruned_at": null,
    "revoked_at": null,
    "fid": 284564,
    "type": 7,
    "hash_scheme": 1,
    "signature_scheme": 1,
    "hash": "20 bytes",
    "signer": "32 bytes",
    "body": "{\"address\":\"0x2a39fd94e139952c3958e69172ba918d8f533571\",\"claimSignature\":\"0xb499c194749ea31631b04dbd8f83ff1af59742934e4e507931b19d4e9887a84937a5f7a4ef5a088e1beb35ad169c0376051713599472aefaf3c04e612d8d7d561c\",\"blockHash\":\"0x9a87ed5ea1c6acf3f2196981bf9db78bb100e0c13bbaba4b66acaf259f483445\",\"protocol\":\"ethereum\"}",
    "raw": "267 bytes"
  },
  {
    "id": "018fd956-f8f5-4435-0ccc-53b232703349",
    "created_at": "2024-06-02 14:25:06.780531+00",
    "updated_at": "2024-06-02 14:25:06.780531+00",
    "timestamp": "2024-06-02 14:25:02+00",
    "deleted_at": null,
    "pruned_at": null,
    "revoked_at": null,
    "fid": 284564,
    "type": 7,
    "hash_scheme": 1,
    "signature_scheme": 1,
    "hash": "20 bytes",
    "signer": "32 bytes",
    "body": "{\"address\":\"0x2a39fd94e139952c3958e69172ba918d8f533571\",\"claimSignature\":\"0x8d3656984f0c26538d441cc8b48aa8d9a8441ee71286d0e949510a28c13efb2e716fe6b6e810c57e17489e5acf052af1aa37afc25737cc5b9bedba56134d3ecf1c\",\"blockHash\":\"0x94696a9eb590bf2acb4bacb81148637771b0a2a4cf6d119816b396b9cd44dce8\",\"protocol\":\"ethereum\"}",
    "raw": "267 bytes"
  }
]

@sanjayprabhu
Contributor

sanjayprabhu commented Jun 5, 2024

Having only a delete message is totally fine, but there shouldn't be multiple deletes/merges, right?

It's valid at the protocol level to keep sending messages. Newer messages will always overwrite older messages. You can keep changing your UserData.DISPLAY_NAME 50 times, and it will generate 50 merges and 49 deletes. Only the message with the latest timestamp is valid, and in the messages table there will be only one that's not marked deleted.

The above example seems to be for a different FID, but we can walk through it:

018f2e40-a33f-8cbc-27ca-81dab82afe3d was created on April 30.
018fb8f4-c995-709e-8540-0be48b4d8ca3 came in on May 27 and deleted the message above (the message above was deleted_at a few ms after this one was created_at).
018fd956-f8f5-4435-0ccc-53b232703349 came in on June 2 and in turn deleted the message above (in this case, it looks like the delete event is about 0.5 ms after the create event).

I'm not sure why the delete event comes after (even if by less than 1 ms). I checked the code, and the callbacks should be called in order (delete first, then merge):

https://github.com/farcasterxyz/hub-monorepo/blob/main/packages/shuttle/src/shuttle/hubEventProcessor.ts#L52-L62

One potential reason is that they are not part of the same event; that's unlikely, but I'll double-check to make sure.

You can add some logs in your MessageHandler to confirm, but the delete callback should be happening before the merge.
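
For example, a minimal sketch of that logging (the handleMessageMerge parameters follow the Shuttle snippet quoted later in this thread; the concrete types are simplified to any here as an assumption):

// Hypothetical logging inside your MessageHandler implementation, just to
// confirm that the "delete" callback fires before the "merge" callback.
async function handleMessageMerge(
  message: any,      // Shuttle Message (simplified to any for this sketch)
  trx: any,          // DB transaction handle
  operation: string, // "merge" | "delete" | "prune" | "revoke"
  state: string,     // "created" | "deleted"
  isNew: boolean,
  wasMissed: boolean,
): Promise<void> {
  console.log(
    `${new Date().toISOString()} op=${operation} state=${state} ` +
      `fid=${message.data?.fid} hash=${Buffer.from(message.hash).toString("hex")}`,
  );
  // ...your existing processing here...
}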

@sanjayprabhu
Contributor

Note, these are all different message hashes. The user is submitting multiple verification messages for the same eth address. The hub is doing the right thing by deleting the old message, merging the latest message, and generating an event for each.

The events are for unique messages, they are not duplicates

@chetankashetti
Author

3. checking for the message timestamp can be a potential workaround while the root cause is determined for 2) above.

The problem with relying on the timestamp is that the merge and delete events for the same message hash will have the same timestamp. This can cause a problem in our system, because if they arrive in a different order we won't know which is the latest. So I think we can't rely on the timestamp. I agree that different message hashes for the same address will have different timestamps.

In the above example as you rightly said,
For the FID: 284564
first message: 018f2e40-a33f-8cbc-27ca-81dab82afe3d
created_at -> 2024-04-30 09:05:39.124098+00
deleted_at -> 2024-05-27 07:30:01.195+00

second message: 018fb8f4-c995-709e-8540-0be48b4d8ca3
created_at -> 2024-05-27 07:29:58+00
deleted_at -> 2024-06-02 14:25:06.781+00

third message -> 018fd956-f8f5-4435-0ccc-53b232703349
created_at -> 2024-06-02 14:25:02+00

We are not modifying anything in our handler; it processes events in the order we receive them. As you pointed out, the deleted_at of 018fb8f4-c995-709e-8540-0be48b4d8ca3 has a higher timestamp than the creation of 018fd956-f8f5-4435-0ccc-53b232703349.
So the order of events streamed from the hub seems to be wrong. If we consume them in this order, we delete in our system instead of adding, which results in inconsistent data.

@sds
Member

sds commented Jul 9, 2024

Hey @chetankashetti, sorry for the delay.

EDIT: See the comment below instead.

@sds
Member

sds commented Jul 9, 2024

One other point worth mentioning: if you look at how Shuttle is implemented (make sure you're running the latest version, as some code may have changed), you can see that Shuttle always processes the deleted messages before the merged message:

if (deletedMessages.length > 0) {
  await Promise.all(
    deletedMessages.map(async (deletedMessage) => {
      const isNew = await MessageProcessor.storeMessage(deletedMessage, trx, "delete", log);
      const state = this.getMessageState(deletedMessage, "delete");
      await handler.handleMessageMerge(deletedMessage, trx, "delete", state, isNew, wasMissed);
    }),
  );
}
const isNew = await MessageProcessor.storeMessage(message, trx, operation, log);
const state = this.getMessageState(message, operation);
await handler.handleMessageMerge(message, trx, operation, state, isNew, wasMissed);

Given this, you can always treat a handleMessageMerge call with a state of deleted as safe to delete, and one with a state of created as safe to create. This is much simpler than what I was previously discussing above, so I've updated my prior comment to avoid further confusion.
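
To make that concrete, here is a minimal sketch of a handler that branches only on the state parameter (the signature follows the snippet above; the upsertVerification / removeVerification helpers are hypothetical placeholders for your own storage logic):

// Hypothetical handler relying on Shuttle's delete-before-merge ordering:
// the state parameter tells you everything needed for add/remove decisions.
async function handleMessageMerge(
  message: any,                  // Shuttle Message (simplified for this sketch)
  trx: any,                      // DB transaction handle
  operation: string,             // "merge" | "delete" | "prune" | "revoke"
  state: "created" | "deleted",
  isNew: boolean,
  wasMissed: boolean,
): Promise<void> {
  if (!isNew) return; // message was already stored and handled previously
  if (state === "created") {
    await upsertVerification(trx, message);  // hypothetical: add to your system
  } else {
    await removeVerification(trx, message);  // hypothetical: remove from your system
  }
}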

Just know that if you create your own handlers (I know you're not doing this—I'm just mentioning for completeness) and you don't use the state / operation parameters, you will have to reinvent the CRDT logic locally.
