Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to bind KafkaClientMetrics for each decaton consumer #132

Merged
merged 7 commits into from
Dec 8, 2021

Conversation

mauhiz
Copy link
Contributor

@mauhiz mauhiz commented Nov 22, 2021

Motivation

  • micrometer 1.4+ has deprecated the JMX-based based metrics gathering ( KafkaConsumerMetrics ) in favour of converting Consumer exposed Metrics. See this PR.
  • decaton metrics don't cover some of the generic consumer metrics like records lag

Changes

Always bind KafkaClientMetrics, tagged with the right subscription id.
This should not be a breaking change since there is no reasonable way to have already bound KafkaClientMetrics for these consumers elsewhere.
Note : not doing this for KafkaProducerSupplier which is overridable / customizable.

(I'd like advice on how to deal with the test failure - I don't understand why we have to manage micrometer registry cleanup manually)

@kawamuray
Copy link
Contributor

Thanks for PR!

I think one of the downsides of enabling KafkaClientMetrics by default is it would explode number of metrics in registry provided by decaton?
Decaton provides its own registry to let users easy control what to do with it, like exposing directly, or combining it with their own registry to mix the resulting metrics set.
While I think most of users will need kafka client's metrics in addition to decaton's metrics, I thought most of them using metrics.reporters to bypass decaton layers and inject client directly their own reporter, which sounds more suitable way than decaton to take care of it, but micormeter apparently doesn't provide out-of-the-box support for using metrics.reporters?
In case, I think supporting KafkaClientMetrics in decaton is good, but would it be better to expose ConsumerSupplier interface instead? or at least a flag to enable (disable) KafkaClientMetrics binding is needed IMO.

(I'd like advice on how to deal with the test failure - I don't understand why we have to manage micrometer registry cleanup manually)

That's because each decaton subscription creates metrics with its own label sets, so unless we clean these metrics up at the end of subscription's lifetime, there will be a garbage created which will keep accumulating. It wouldn't become a significant problem in most of decaton use-cases where decaton subscription dies along with application instance, but still should be cleaned up as possible as we can as a manner.

@mauhiz
Copy link
Contributor Author

mauhiz commented Nov 24, 2021

is it would explode number of metrics in registry provided by decaton?

yes, assuming people don't already monitor their Kafka client via JMX adapted (namely, micrometer's deprecated KafkaConsumerMetrics ).

micrometer apparently doesn't provide out-of-the-box support for using metrics.reporters

If I understand what you mean correctly, isn't that exactly the KafkaClientMetrics this PR is introducing?

expose ConsumerSupplier

My first thought, but it was not convenient since it doesn't have the subscriptionId (and not sure it should).

flag to enable/disable KafkaClientMetrics

Sounds good. Will add this flag.

clean these metrics up at the end of subscription's lifetime

I see, if we go the way I suggest in this PR I will fix to cleanup the metrics
What do other maintainers think?

@mauhiz mauhiz force-pushed the feature-poc-KafkaClientMetrics branch from 8741661 to cefcab2 Compare November 24, 2021 02:10
@mauhiz
Copy link
Contributor Author

mauhiz commented Nov 24, 2021

I couldn't find a way to cleanup KafkaClientMetrics on subscription close. So, making it a more free option by exposing a builder for it, and leaving the binding up to the user.

@mauhiz mauhiz changed the title Bind micrometer KafkaClientMetrics to each decaton consumer Allow to bind KafkaClientMetrics for each decaton consumer Nov 24, 2021
@kawamuray
Copy link
Contributor

If I understand what you mean correctly, isn't that exactly the KafkaClientMetrics this PR is introducing?

No, they favored the way using #metrics() method of Consumer to obtain metrics collection and wrap it over. What I said is to use metric.reporters property, which you specify the class name that implements MetricsReporter interface, so you can give it through properties you give to SubscriptionBuilder#consumerConfig().

@kawamuray
Copy link
Contributor

I couldn't find a way to cleanup KafkaClientMetrics on subscription close.

Indeed with the current version, but looks like it has added in latest?

https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/kafka/KafkaMetrics.java#L295

@mauhiz
Copy link
Contributor Author

mauhiz commented Nov 25, 2021

metric.reporters property

Okay, I didn't know there was such a thing. I don't know why would they use or not use it :D

added in latest

a-ha, nice. Let me try to integrate it.

@mauhiz mauhiz force-pushed the feature-poc-KafkaClientMetrics branch from cefcab2 to 7036afb Compare November 25, 2021 08:40
@mauhiz
Copy link
Contributor Author

mauhiz commented Nov 25, 2021

PTAL.
Note that these new metrics are prefixed with decaton. but that is fine, since they have an extra dimension (subscriptionId) they are not mergeable with generic consumer metrics

@mauhiz
Copy link
Contributor Author

mauhiz commented Nov 26, 2021

wow, many tests started timing out... why..

@kawamuray
Copy link
Contributor

Just guessing, by increased number of metrics ...? We might need profiling to be sure.

@mauhiz
Copy link
Contributor Author

mauhiz commented Dec 1, 2021

It turns out that calling properties in TestUtils would override the properties defined right before in builderConfigurer.accept(builder).

I added commits to

  • avoid overwriting properties in SubscriptionBuilder - I can't see any good reason to do it
  • add timeouts of 20s in some places of the tests that were blocking forever, for easier debugging


public void bindClientMetrics(Consumer<String, byte[]> consumer) {
if (kafkaClientMetrics != null) {
closeClientMetrics();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which case would this be effective?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, none. Indeed SubscriptionMetrics are only instantiated in ProcessorSubscription ctor. Removing

@@ -59,8 +59,8 @@
presetRetryProducerConfig.put(ProducerConfig.LINGER_MS_CONFIG, "100");
}

@Setter(AccessLevel.NONE)
private ProcessorProperties.Builder<ProcessorProperties> propertiesBuilder;
private final ProcessorProperties.Builder<ProcessorProperties> propertiesBuilder =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid overwriting properties in SubscriptionBuilder - I can't see any good reason to do it

The intention is to provide a way to "clear" (reset) something set previously. So basically these methods taking collection of items replaces existing set rather than adding.
To keep this behavior consistent to other methods like consumerConfig and backward, I think we shouldn't change this.
I'm happy to add another kind of method like addProperties() (addProperty()) which supports it and being more explicit tho.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's what one would expect of a builder, but happy to oblige with your design constraints

@@ -62,6 +72,7 @@ private static Properties defaultProducerProps(String bootstrapServers) {
}

public static final String DEFAULT_GROUP_ID = "test-group";
public static final Duration DEFINITELY_TOO_SLOW = Duration.ofSeconds(20);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add timeouts of 20s in some places of the tests that were blocking forever, for easier debugging

curious why backtrace provided when test itself times out (@Test(timeout = xxx)) isn't sufficient to know at where it stuck?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it was not exactly stuck. Producer and Consumer were running and doing nothing.

@mauhiz mauhiz requested a review from kawamuray December 6, 2021 04:31
Copy link
Contributor

@kawamuray kawamuray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thread.currentThread().interrupt();
throw new RuntimeException(e);
}
LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(100));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the intention of this change? Seems like waiting just 100ms here isn't an actual problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just avoid having to deal with InterruptedException. No strong preference anyways

Copy link
Contributor

@ocadaruma ocadaruma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

LGTM except kawamuray's point.

@kawamuray kawamuray merged commit 75d8840 into line:master Dec 8, 2021
@mauhiz mauhiz deleted the feature-poc-KafkaClientMetrics branch December 8, 2021 08:37
@mauhiz
Copy link
Contributor Author

mauhiz commented Dec 9, 2021

Thanks for the reviews! Is the next release already scheduled?

@kawamuray
Copy link
Contributor

It should be in next few weeks, but if you're in hurry, I can make a release now.

@mauhiz
Copy link
Contributor Author

mauhiz commented Dec 13, 2021

I'm not exactly in a hurry but finer-grained releases usually make happier users!

@kawamuray kawamuray added the new feature Add a new feature label Dec 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature Add a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants