[pkg/ottl] Add murmur3 #34077

Open
kaisecheng opened this issue Jul 15, 2024 · 9 comments

@kaisecheng
Contributor

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

OTTL doesn't have a murmur3 hash function. Murmur3 is widely used for non-cryptographic hashing and has a low collision rate.

Describe the solution you'd like

Use Sum128 from spaolacci/murmur3 to hash the input string and return the fingerprint as a hex string.
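For illustration, here is a rough, standard-library-only sketch of what such a fingerprint could look like. It reimplements the 128-bit x64 variant of MurmurHash3 with seed 0 so that it runs without the dependency; the actual proposal would call Sum128 from spaolacci/murmur3 instead. The names sum128 and fingerprint are hypothetical, and the hex layout (first 64-bit half, then second half, each zero-padded to 16 digits) is an assumption, since the issue does not specify an encoding.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// Multiplication constants of the MurmurHash3 x64 128-bit variant.
const (
	c1 = 0x87c37b91114253d5
	c2 = 0x4cf5ad432745937f
)

// fmix64 is the 64-bit finalization mix of MurmurHash3.
func fmix64(k uint64) uint64 {
	k ^= k >> 33
	k *= 0xff51afd7ed558ccd
	k ^= k >> 33
	k *= 0xc4ceb9fe1a85ec53
	k ^= k >> 33
	return k
}

// sum128 (hypothetical name) computes MurmurHash3 x64 128-bit with seed 0.
func sum128(data []byte) (h1, h2 uint64) {
	n := len(data)
	// Body: consume the input in 16-byte little-endian blocks.
	for len(data) >= 16 {
		k1 := binary.LittleEndian.Uint64(data[0:8])
		k2 := binary.LittleEndian.Uint64(data[8:16])
		k1 *= c1
		k1 = bits.RotateLeft64(k1, 31)
		k1 *= c2
		h1 ^= k1
		h1 = bits.RotateLeft64(h1, 27)
		h1 += h2
		h1 = h1*5 + 0x52dce729
		k2 *= c2
		k2 = bits.RotateLeft64(k2, 33)
		k2 *= c1
		h2 ^= k2
		h2 = bits.RotateLeft64(h2, 31)
		h2 += h1
		h2 = h2*5 + 0x38495ab5
		data = data[16:]
	}
	// Tail: the remaining 0..15 bytes are packed into k1 (bytes 0-7) and k2 (bytes 8-14).
	var k1, k2 uint64
	for i := len(data) - 1; i >= 0; i-- {
		if i >= 8 {
			k2 ^= uint64(data[i]) << (8 * uint(i-8))
		} else {
			k1 ^= uint64(data[i]) << (8 * uint(i))
		}
	}
	if len(data) > 8 {
		k2 *= c2
		k2 = bits.RotateLeft64(k2, 33)
		k2 *= c1
		h2 ^= k2
	}
	if len(data) > 0 {
		k1 *= c1
		k1 = bits.RotateLeft64(k1, 31)
		k1 *= c2
		h1 ^= k1
	}
	// Finalization: mix in the input length, then avalanche both halves.
	h1 ^= uint64(n)
	h2 ^= uint64(n)
	h1 += h2
	h2 += h1
	h1 = fmix64(h1)
	h2 = fmix64(h2)
	h1 += h2
	h2 += h1
	return h1, h2
}

// fingerprint (hypothetical name) renders the 128-bit hash as 32 hex digits.
// Assumed layout: h1 followed by h2, each zero-padded to 16 digits.
func fingerprint(s string) string {
	h1, h2 := sum128([]byte(s))
	return fmt.Sprintf("%016x%016x", h1, h2)
}

func main() {
	fmt.Println(fingerprint("hello"))
}
```

A fixed-width hex string keeps the fingerprint easy to store as an attribute value and to compare across systems, provided every system agrees on the seed and byte order.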

Describe alternatives you've considered

No response

Additional context

No response

@kaisecheng added the enhancement and needs triage labels on Jul 15, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

kaisecheng added a commit to kaisecheng/opentelemetry-collector-contrib that referenced this issue Jul 18, 2024
@evan-bradley removed the needs triage label on Jul 18, 2024
@evan-bradley
Contributor

@kaisecheng Thanks for opening this and being willing to take on the implementation. I'm not familiar with MurmurHash, could you offer more details about what use cases you have in mind? I see that a handful of applications use it for non-cryptographic hashing, but I'd like to have a clear use-case in mind for including it in OTTL. We currently support 64-bit FNV-1a hashes through the FNV function, so I think we at least already support a non-cryptographic hash.

@kaisecheng
Contributor Author

@evan-bradley Thanks for looking into this issue.
While FNV is useful, MurmurHash3 offers distinct advantages that make it essential for many use cases. It's generally faster than FNV, especially for longer inputs, and provides a more uniform distribution. MurmurHash3 is widely used for data deduplication, consistent hashing, and as a hash function in data pipelines.
I believe supporting MurmurHash3 in OTTL would align with industry practices and significantly ease migration for users with existing systems that rely on it.

@TylerHelmuth
Member

@kaisecheng can you provide some examples of other industry tools or data pipelines that utilize this hash function?

@kaisecheng
Contributor Author

@TylerHelmuth Major data processing frameworks like Spark, Flink, and Apache Beam use MurmurHash3 for fingerprinting in deduplication operations, stream processing, and consistent hashing.
MurmurHash3 in OTTL could be used to generate fingerprints of telemetry data for routing, deduplication, and determining which shard a piece of data belongs to. This is particularly useful for customers who manage database sharding themselves, and would facilitate seamless integration with existing data pipelines that rely on MurmurHash3.
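To make the sharding use case concrete, here is a minimal sketch of hash-based shard selection. It uses FNV-1a from Go's standard library as a stand-in, since that is the hash OTTL already exposes; a MurmurHash3 function would take its place for pipelines that need to match existing shard assignments. The function name shardFor and the sample keys are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor (hypothetical name) maps a key to one of numShards shards by
// hashing the key and taking the remainder. numShards must be greater than 0.
func shardFor(key string, numShards uint64) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return h.Sum64() % numShards
}

func main() {
	// The same key always lands on the same shard, so records that share a
	// key are routed together.
	for _, id := range []string{"trace-a", "trace-b", "trace-c"} {
		fmt.Printf("%s -> shard %d\n", id, shardFor(id, 4))
	}
}
```

Plain modulo placement only agrees across systems when they use the same hash function, seed, and shard count, which is why matching the hash used by an existing pipeline matters here.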

@kaisecheng
Contributor Author

@TylerHelmuth ☝️ Any thoughts on adding MurmurHash3? Adding a faster hash function with a more uniform distribution would be a good enhancement.

Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions bot added the Stale label on Oct 15, 2024
@TylerHelmuth
Member

@evan-bradley and I discussed this during KubeCon, and we are OK with adding this function.

@github-actions bot added the Stale label on Jan 15, 2025