Document how to use k-anonymity with content topic #146

Closed
fryorcraken opened this issue Dec 7, 2023 · 4 comments
Labels
documentation Improvements or additions to documentation

Comments

@fryorcraken
Contributor

Discourage developers from using PII (such as a public key) in the content topic; instead, they should create buckets (such as the first 4 bytes of the hash of the public key).

Attach some recommended practices to that. E.g. for 1,000 users, use one content topic; for 10,000, use 10; etc.
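
A minimal sketch of that rule of thumb, assuming roughly 1,000 users per content topic and Node's built-in `crypto` module. The names `USERS_PER_TOPIC`, `topicCount` and `bucketIndex` are hypothetical, and mapping the first 4 bytes of the hash onto a bucket with a modulo is one possible choice, not something Waku prescribes.

```typescript
import { createHash } from "node:crypto";

// Assumption: about 1,000 users per content topic, per the rule of thumb above.
const USERS_PER_TOPIC = 1000;

// Number of content topics (buckets) for a given user count.
function topicCount(userCount: number): number {
  return Math.max(1, Math.ceil(userCount / USERS_PER_TOPIC));
}

// Map a public key to a bucket using the first 4 bytes of its SHA-256 hash.
function bucketIndex(publicKey: string, buckets: number): number {
  const digest = createHash("sha256").update(publicKey).digest();
  return digest.readUInt32BE(0) % buckets;
}

const buckets = topicCount(10000);
// "0xabc123" is a made-up stand-in for a real public key.
const index = bucketIndex("0xabc123", buckets);
console.log(`/my-app/0/${index}/proto`);
```

The public key itself never appears in the content topic, only the bucket it falls into.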

@LordGhostX
Contributor

@fryorcraken I've opened #148 to tackle this issue.

I changed the structure of the content topic naming considerations section and included the creation of buckets you mentioned above.

I need some clarity on the recommended practice. Why should developers use 1 for 1k and 10 for 10k?

Also, are there any more recommended practices?

@fryorcraken
Contributor Author

If an app uses a single content topic, then all users' traffic will be in this topic.
So any user using req-res protocols (filter, store) will receive all the app messages.

This can be fine if the traffic is low. Do note that RLN applies, so there are limits on traffic.

However, if this is too much traffic to handle on mobile or in the app, then developers can create buckets.

Start with a unique identifier; depending on the app, it could be:

  • id / public key of recipient
  • id / public key of sender
  • id of a room or group or other sort of app domain topic.

One can then hash this id:

import { sha256 } from "@noble/hashes/sha256";
import { bytesToHex } from "@waku/utils/bytes";

// `id` is the unique identifier chosen above, as bytes or a string
const hash = sha256
  .create()
  .update(/* id */)
  .digest();
// e.g. a56145270ce6b3bebd1dd012b73948677dd618d496488bc608a3cb43ce3547dd
console.log(bytesToHex(hash));

By including the first char of the hash in the content topic, e.g. /my-app/0/a/proto, the user traffic gets divided by 16, as a hex char has 16 possible values.

This means that, statistically, users will only download a 16th of what they would download if a single content topic was used.
Note: this is just a statistic, as the distribution may not be even.
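
To make the bucketing concrete, here is a rough sketch that builds the content topic from an id and counts how ids spread over the 16 buckets. It uses Node's built-in `crypto` module instead of `@noble/hashes` to stay dependency-free; `bucketedContentTopic`, `bucketSizes` and the `user-N` ids are hypothetical names made up for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a bucketed content topic from an identifier.
// `prefixLength` is the number of leading hex chars of the hash to keep.
function bucketedContentTopic(
  id: string,
  prefixLength = 1,
  appName = "my-app",
  version = "0",
  encoding = "proto",
): string {
  const hex = createHash("sha256").update(id).digest("hex");
  const bucket = hex.slice(0, prefixLength);
  return `/${appName}/${version}/${bucket}/${encoding}`;
}

// Count how many ids land in each content topic.
function bucketSizes(ids: string[], prefixLength = 1): Map<string, number> {
  const sizes = new Map<string, number>();
  for (const id of ids) {
    const topic = bucketedContentTopic(id, prefixLength);
    sizes.set(topic, (sizes.get(topic) || 0) + 1);
  }
  return sizes;
}

// Made-up ids standing in for real public keys or room ids.
const ids = Array.from({ length: 10000 }, (_, i) => `user-${i}`);
const sizes = bucketSizes(ids);
const counts = Array.from(sizes.values());

console.log(sizes.size); // number of distinct content topics in use
console.log(Math.min(...counts), Math.max(...counts)); // smallest and largest bucket
```

The gap between the smallest and largest bucket shows how uneven the split is for a given set of ids.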

Depending on the expected traffic and the actual number of users or ids, the developer can decide to include several leading chars of the hash in the content topic, or even increase the number over time.

The size of a bucket is the k value of k-anonymity.

Here, k is equal to the number of ids for which the first char of the hash is "a".

If an application has 10,000 users, then using a single content topic will give k = 10,000.

Using the first char of the hash of the id will give k = 10,000/16 = 625.
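
The same arithmetic as a small sketch, assuming ids spread evenly over the 16^n buckets produced by an n-char hex prefix (in practice they will not be perfectly even). `expectedK` and `maxPrefixLength` are hypothetical helper names.

```typescript
// Expected k when ids are spread over 16^prefixLength buckets.
function expectedK(idCount: number, prefixLength: number): number {
  return idCount / Math.pow(16, prefixLength);
}

// Longest hex prefix that still keeps the expected k at or above `minK`.
function maxPrefixLength(idCount: number, minK: number): number {
  if (minK <= 0) {
    throw new RangeError("minK must be positive");
  }
  let prefixLength = 0;
  while (expectedK(idCount, prefixLength + 1) >= minK) {
    prefixLength += 1;
  }
  return prefixLength;
}

console.log(expectedK(10000, 0)); // 10000
console.log(expectedK(10000, 1)); // 625
console.log(maxPrefixLength(10000, 100)); // 1
```

With 10,000 ids, a second hex char would drop the expected k to about 39, so an app that wants k of at least 100 would stay with a single char.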

@LordGhostX
Contributor

Weekly Update

@chair28980 chair28980 added the documentation Improvements or additions to documentation label Dec 12, 2023
@LordGhostX
Contributor

Weekly Update

@github-project-automation github-project-automation bot moved this to Done in Waku Jan 22, 2024