Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add murmur2_random support for message partitioning. #884

Merged
merged 10 commits into from
Feb 3, 2021
Merged

Conversation

divo
Copy link
Contributor

@divo divo commented Jan 29, 2021

Add murmur2_random support for message partitioning compatibility with Java producers.

I'm not 100% certain on the new configuration parameter. The caller shouldn't need to know how to instantiate the partitioner, so this is what I came up with. Open to suggestions.

The test failures seem to be unrelated to the changes. Not sure what to do there.

@dasch

@divo divo requested a review from dasch January 29, 2021 05:54
Object.const_get(partitioner_klass).new
else
Partitioner.new
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should introduce a new concept next to Partitioner. Instead, Partitioner should be configured with the hash/digest algorithm, with the default being crc32 but murmur2 being an option. The "random" part would just always be the behavior when no key is present.

So: can we introduce a new parameter to Partitioner#initialize called e.g. hash_function, which allows either crc32 (default) or murmur2. Clients that wish to use murmur2 would need to instantiate Partitioner with that value and pass the instance to producer.

In Racecar and other high-level frameworks, we can add configuration keys for controlling this declaratively.

Copy link
Contributor Author

@divo divo Feb 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. I was taking inspiration from librdkafka but your idea will also work well with the opt-in behaviour below.

Just to be clear, you would like the Partitioner instance to be passed in, ala sasl_oauth_token_provider, and not the hash_function as a symbol ala compression_codec? I'm only asking because the latter seems like a better fit here, but I don't have much experience writing gems.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer the former, since we're talking about varying the hash algo used by Partitioner, but not changing its fundamental strategy. Also, I prefer a minimal change in API – the high level libraries can easily make this a config key without any extra APIs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for breaking it down for me. I've removed the API change and moved it inside the Partitioner

@@ -28,6 +28,7 @@ Gem::Specification.new do |spec|
spec.require_paths = ["lib"]

spec.add_dependency 'digest-crc'
spec.add_dependency "digest-murmurhash"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the dependencies of this gem? The dependency policy for ruby-kafka is pretty strict, so for opt-in behavior the default would be to give an error message when the library cannot be detected and ask the user to install it themselves (would love optional dependencies in gems, but 🤷)

If the gem is really trivial and doesn't include C code then we can add it as a hard dep, but only then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see how the various compression codecs are used. Would a similar setup to that be ok here?

The gem is fairly trivial but it's basically just C code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's C code then I think it should be opt-in, so that would be a good change 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

Copy link
Contributor

@dasch dasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Can you also 1) update the README to document this, and perhaps add docs to Partitioner describing the setup, and 2) update the CHANGELOG?

@divo
Copy link
Contributor Author

divo commented Feb 1, 2021

@dasch Thank you, and yes of course. I'm still thinking about the naming of the Digest module but if your happy with it that's fine.

@divo
Copy link
Contributor Author

divo commented Feb 2, 2021

@dasch I've updated the documentation, let me know what you think.

Copy link
Contributor

@dasch dasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants