
feat: add new message_hash column #2127

Closed
ABresting wants to merge 1 commit

Conversation

ABresting (Contributor)

Description

Adding the message_hash (messageHash) as a new column in the Waku archive protocol. To compute the messageHash, we refer to rfc_guide. The computeDigest function has been modified to also take the pubSubTopic as input without breaking the existing test cases. In future PRs, the existing id attribute/column in the databases (SQLite and PostgreSQL) will be removed, with due testing and pagination support.

The change provides a message-index-like attribute that will be used by sync-related protocols for archive Waku nodes.
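To make the description concrete, here is a minimal sketch (not the actual nwaku code) of what the extended proc could look like, assuming the digest is a nimcrypto sha256 over the pubsub topic, content topic and payload; the exact field set and ordering mandated by the referenced rfc_guide may differ, and the stand-in types below replace the real waku_core definitions.

import nimcrypto
import stew/byteutils

# Minimal stand-ins for the real nwaku types/constants, so the sketch compiles
# on its own; in nwaku these come from the waku_core / archive modules.
type
  WakuMessage = object
    payload: seq[byte]
    contentTopic: string

  MessageDigest = MDigest[256]

const DefaultPubsubTopic = "/waku/2/default-waku/proto"

proc computeDigest(msg: WakuMessage, pubSubTopic: string = DefaultPubsubTopic): MessageDigest =
  var ctx: sha256
  ctx.init()
  defer: ctx.clear()

  ctx.update(pubSubTopic.toBytes())       # new input: the pubsub topic
  ctx.update(msg.contentTopic.toBytes())
  ctx.update(msg.payload)

  return ctx.finish()                     # 256-bit digest stored as message_hash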

Changes

  • The message_hash attribute should be present in SQLite.
  • The message_hash attribute should be present in Postgres (a hypothetical migration sketch follows below).
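A hypothetical migration sketch (not nwaku's driver code) illustrating the kind of schema change these two items imply; the table name messages, the column name messageHash and the column types are assumptions, and Nim's stdlib db_sqlite is used purely for illustration.

import std/db_sqlite            # db_connector/db_sqlite on Nim 2.x

let db = open("waku_archive.sqlite3", "", "", "")

# Store the 256-bit digest as a raw blob on the SQLite side; a comparable
# Postgres statement could be: ALTER TABLE messages ADD COLUMN messageHash VARCHAR NULL;
db.exec(sql"ALTER TABLE messages ADD COLUMN messageHash BLOB")

db.close()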

Issue

#2112

@github-actions

This PR may contain changes to the database schema of one of the drivers.

If you are introducing any changes to the schema, make sure the upgrade from the latest release to this change passes without any errors/issues.

Please make sure the release-notes label is added so that the upgrade instructions properly highlight this change.

@github-actions

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:2127

Built from 7d9408e

ABresting self-assigned this on Oct 13, 2023
alrevuelta (Contributor) left a comment

Approach looks good. Can we add a simple unit test?

Also not sure on the migration strategy, since it modifies the schema.

@@ -18,14 +19,16 @@ import

 type MessageDigest* = MDigest[256]

-proc computeDigest*(msg: WakuMessage): MessageDigest =
+proc computeDigest*(msg: WakuMessage, pubSubTopic: string = DefaultPubsubTopic): MessageDigest =
Member

Since every message would come through a pubSub topic, what is the point of the default here? (Feels like it would only lead to issues in the future if we forget to pass in a pubSub topic and then get inconsistent behaviour)

Member

Ah, that is for the test cases...would it make sense to rather fix the tests?

ABresting (Contributor Author)

Yes, but the pubSub topic needs to be passed to the computeDigest function, since it is not a WakuMessage attribute.

The default is used because the existing function is already used in so many test cases; one way to handle that (without changing all the other test cases) is to:

  • extend the function with a default (what I've used), or
  • add another function with a similar name and redirect the queue driver and Waku archive code to it for the production workflow

The better of the two seems to be the former. WDYT @vpavlin?
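For comparison, a rough sketch of the second option (the proc name is made up, and the same nimcrypto/stew imports and WakuMessage/MessageDigest stand-ins from the earlier sketch are assumed): keep the one-argument computeDigest untouched for the existing tests, and redirect the queue driver and Waku archive code to a topic-aware proc in production.

proc computeMessageHash(pubsubTopic: string, msg: WakuMessage): MessageDigest =
  var ctx: sha256
  ctx.init()
  defer: ctx.clear()
  ctx.update(pubsubTopic.toBytes())       # mandatory: no DefaultPubsubTopic fallback
  ctx.update(msg.contentTopic.toBytes())
  ctx.update(msg.payload)
  return ctx.finish()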

Member

Why is "update the tests" not an option?:)

ABresting (Contributor Author)

Approach looks good. Can we add a simple unit test?

I am wondering what a simple unit test for messageHash would look like. Just like id, it is computationally generated, and there is no existing test case for id either.

If I let my imagination run: a hash implies something unique, so maybe a hash-collision test, to make sure that the computed hash is indeed unique? IDK if this is overkill... WDYT @alrevuelta?

vpavlin (Member) commented Oct 16, 2023

I am wondering what a simple unit test for messageHash would look like. Just like id, it is computationally generated, and there is no existing test case for id either.

You could just test that if a different pubsub topic is used, or a metadata field has a different value, the hashes are different.

I'd say it is less about testing the hash function output and more about making sure the digest proc's inputs actually influence the output.

I.e. if someone later changes computeDigest back to the previous version by removing the pubSub topic from it, your test should catch the regression.

Testing whether the hash collides does not make sense, unless we are actually implementing the hash algorithm ourselves.
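A minimal sketch of the regression test described above, assuming the extended computeDigest(msg, pubSubTopic) signature from this PR; the import path, message fields and topic strings are illustrative rather than the exact nwaku ones.

import std/unittest
import ../../waku/waku_archive/common   # assumed location of computeDigest / WakuMessage

suite "Waku archive - message digest":
  test "digest changes when the pubsub topic changes":
    let msg = WakuMessage(payload: @[byte 1, 2, 3],
                          contentTopic: "/waku/2/default-content/proto")

    # Hash the same message under two different pubsub topics.
    let digestA = computeDigest(msg, "/waku/2/rs/0/1")
    let digestB = computeDigest(msg, "/waku/2/rs/0/2")

    # If someone removes the pubsub topic from the digest inputs again,
    # these two values become equal and this check catches the regression.
    check digestA != digestB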

Ivansete-status (Collaborator) left a comment

Thanks for the PR! I think it is a good step forward!

Nevertheless, would it be possible to split it into three PRs?

  1. Changing the Postgres schema.
  2. Changing the SQLite schema.
  3. Changing the queue schema.

With respect to Postgres, we will need to coordinate the change with the infra team so that the shards.test fleet is properly updated beforehand.

Changing the SQLite schema will require adding a new "migration" script where we actually perform the schema change, and we will need to test it manually and check that the schema is properly updated.
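As a minimal sketch of the manual check described here (again using Nim's stdlib db_sqlite purely for illustration, and reusing the assumed messages table and messageHash column names from the migration sketch above): after running the migration, read the table info and confirm the new column exists.

import std/db_sqlite            # db_connector/db_sqlite on Nim 2.x
import std/[sequtils, strutils]

let db = open("waku_archive.sqlite3", "", "", "")

# PRAGMA table_info returns one row per column; field 1 holds the column name.
let columns = db.getAllRows(sql"PRAGMA table_info(messages)").mapIt(it[1])

doAssert "messageHash" in columns, "migration did not add the messageHash column"
echo "schema looks good, columns: ", columns.join(", ")

db.close()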

ABresting (Contributor Author)

Closing this PR so that new PRs can be split and targeted towards the specific components such as Postgres, SQLite, the queue schema, etc. Thanks @Ivansete-status for the suggestions, and thanks for the feedback/review from Alvaro and Vaclav!

ABresting closed this on Oct 16, 2023