Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metrics #98

Closed
NiteshKant opened this issue Apr 21, 2014 · 25 comments
Closed

Metrics #98

NiteshKant opened this issue Apr 21, 2014 · 25 comments
Milestone

Comments

@NiteshKant
Copy link
Member

In order to use RxNetty effectively in production, we would require to have some metrics providing insights into the clients and servers created by RxNetty. Currently, there isn't any metrics published by RxNetty.

Pluggability

Since, RxNetty strives to be not opinionated on many aspects but for the fact that it uses Netty for the network stack and RxJava for async interfaces. It would not be prudent for us to bind to a particular metrics framework like servo, yammer metrics, etc.
Instead we should provide hooks into the system for anyone to be able to provide a plugin for various metrics framework.

Hooks

There are a few ways of providing hooks for the metrics system.

Event based

This essentially means that we publish events for any interesting state change inside RxNetty via Observables, from where people can infer these metrics.
Current RxClient connection pool implementation provides such a hook here

Advantages

The advantage of this approach is that we do not have to maintain any state inside RxNetty and would not face problems around holding this metrics around overflow which without a metrics framework would create more work for us.
One subtle advantage is that we do not have to upfront decide for everyone, which metrics are useful, people can infer newer metrics from the available events.

Disadvantages

The disadvantage is that for every plugin, people have to re-invent the wheel from the point of view of understanding these events and correlating them to infer these metrics. This gets complex when the metrics are to be inferred from a set of events instead of one.

Data based

We can also provide a data based plugin mechanism, wherein we store all the metrics of interest and provide a hook to transform and send these metrics to the metric framework of choice.

Advantages

  • Lesser things to do for every plugin provider.
  • No one has to understand the infrastructure to create the metrics.

Disadvantages

  • We would have to create basic metrics constructs like rolling counters, percentiles, etc. which comes for free with
    any metrics framework.

Required metrics

Client

All required client side metrics are defined in the issue #96

Server

All required server side metrics are defined in the issue #97

@NiteshKant NiteshKant self-assigned this Apr 21, 2014
@benjchristensen
Copy link
Member

The disadvantage is that for every plugin
We would have to create basic metrics constructs like rolling counters

For both of these we can have contrib modules that provide all of this by default.

For example, a default metrics implementation can use Numerus for counters, percentiles etc (https://github.com/Netflix-Skunkworks/Numerus) and then we can have publishers for Servo and Yammer Metrics.

@NiteshKant
Copy link
Member Author

@benjchristensen sounds good. So your preference is the event based mechanism?

@allenxwang
Copy link

It seems to me that the event based solution is better, provided that we provide default implementations based on a metrics system like Servo in an extension. Then there is no need to re-invent the wheel.

For example, TrackableStateChangeListener can be easily extended into a class for pool stats once we add Servo annotations and we can put such class under rxnetty-servo.

@benjchristensen
Copy link
Member

I think the event based one makes sense. Keep in mind too that I found Servo to not have all of the types needed for things like RollingPercentile, so Numerus was created with these types.

It was done separate from Servo instead of adding to Servo so they could be used with or without the library for publishing, which is what we use Servo and Yammer Metrics for on Hystrix.

Mantis is also using Numerus for metrics, then allows publishing of metrics via Servo and Yammer, same as we're discussing here.

@NiteshKant
Copy link
Member Author

Awesome. I will work on the event based system then.

@NiteshKant NiteshKant added this to the 0.3.4 milestone Apr 28, 2014
@NiteshKant NiteshKant modified the milestones: 0.3.5, 0.3.4 May 19, 2014
@NiteshKant
Copy link
Member Author

Moving this to milestone 0.3.5.
Had to invest time on issue #117 which requires a release now.

@NiteshKant
Copy link
Member Author

This design has been updated a few times based on the discussion/feedback in the following comments

Design Proposal

After a few preliminary discussions on the design in issue #96 I am proposing the following design.
(The code for the proposal can be found here)

Metric Events

Code

Any metric can be derived from these events. These events should always be referred to as constants (enum pattern) to avoid creation of new objects per event callback.

public interface MetricsEvent<T extends Enum> {

    T getType();

    boolean isTimed();

    boolean isError();
}

Metric Event Listeners

Code

A listener that can be added to a Client/Server to listen for metrics related events.

public interface MetricEventsListener<E extends MetricsEvent<?>> {

    int NO_DURATION = -1;
    Object NO_VALUE = null;
    Throwable NO_ERROR = null;
    TimeUnit NO_TIME_UNIT = null;

    /**
     * Event callback for any {@link MetricsEvent}. The parameters passed are all the contextual information possible for
     * any event. There presence or absence will depend on the type of event.
     *
     * @param event Event for which this callback has been invoked. This will never be {@code null}
     * @param duration If the passed event is {@link MetricsEvent#isTimed()} then the actual duration, else
     * {@link #NO_DURATION}
     * @param timeUnit The time unit for the duration, if exists, else {@link #NO_TIME_UNIT}
     * @param throwable If the passed event is {@link MetricsEvent#isError()} then the cause of the error, else
     * {@link #NO_ERROR}
     * @param value If the passed event requires custom object to be passed, then that object, else {@link #NO_VALUE}
     */
    void onEvent(E event, long duration, TimeUnit timeUnit, Throwable throwable, Object value);

    /**
     * Marks the end of all event callbacks. No methods on this listener will ever be called once this method is called.
     */
    void onCompleted();
}

Metric Event Publishers

Code

Source of Metric Events that can have zero or more listeners registered with them.

public interface MetricEventsPublisher<E extends MetricsEvent<?>> {

    Subscription subscribe(MetricEventsListener<? extends E> listener);

}

Registering Metric Event Listeners

There are two primary ways of registering Metric Event Listeners with the targeted Clients and Servers.

Global Associations

One can register a MetricEventsListenerFactory with RxNetty to associate a new listener created from this factory for every client and server created by RxNetty.
This would usually be done at the application level (for instance any application created using karyon) to provide a metrics framework like servo, yammer metrics

The Factory definition looks like this:

public interface MetricEventsListenerFactory {

    MetricEventsListener<ClientMetricsEvent<ClientMetricsEvent.EventType>> forClient(@SuppressWarnings("rawtypes")RxClient client);

    MetricEventsListener<HttpClientMetricsEvent<HttpClientMetricsEvent.HttpEventType>> forHttpClient(@SuppressWarnings("rawtypes")HttpClient client);

    MetricEventsListener<UdpClientMetricsEvent<UdpClientMetricsEvent.UdpEventType>> forUdpClient(@SuppressWarnings("rawtypes")UdpClient client);

    MetricEventsListener<ServerMetricsEvent<ServerMetricsEvent.EventType>> forServer(@SuppressWarnings("rawtypes")RxServer server);

    MetricEventsListener<HttpServerMetricsEvent<HttpClientMetricsEvent.HttpEventType>> forHttpServer(@SuppressWarnings("rawtypes")HttpServer server);

    MetricEventsListener<UdpServerMetricsEvent<UdpServerMetricsEvent.UdpEventType>> forUdpServer(@SuppressWarnings("rawtypes")UdpServer server);
}
Instance level associations

In addition to global associations one can also register a metric events listener per client/server as all the Clients and Servers contracts will also implement the Metric Event Publishers interface defined above.

@allenxwang
Copy link

I am wondering if it is necessary to have boolean isTimed(); and boolean isError(); as part of the MetricsEvent interface, as MetricEventsListener will have different callbacks for timed events and errors.

@benjchristensen
Copy link
Member

I think MetricEventsListener is generally the right pattern. I like that is lets the values stay as primitives.

A few comments:

  1. Duplication

There appear to be 3 event types and then duplicate overloads of each with long duration:

    void onEvent(E event, Object value);

    void onEvent(E event, Object value, long duration);

Can we eliminate the one without duration so implementers don't have to deal with the either/or invocation of these?

  1. Object

What is the reason for this event type? I'd like to know the use case that requires sending the Object and seeming to confuse with onEvent(E event, long duration);

void onEvent(E event, Object value, long duration);
  1. Events

Are they going to be defined as an enum somewhere or left completely dynamic? I imagine that these won't change much so should be defined.

@NiteshKant
Copy link
Member Author

Thanks @allenxwang for the comment.

I am wondering if it is necessary to have boolean isTimed(); and boolean isError(); as part of the MetricsEvent interface, as MetricEventsListener will have different callbacks for timed events and errors.

Yeah its more like a metadata for human consumption (to know what data will be accompanied with that event) and not for RxNetty to make any decisions on those attributes.

In the below example looking at the different event, it is quiet evident what accompanying data one can expect while listening for these events:

public class ClientMetricsEvent extends AbstractMetricsEvent {

    public static final ClientMetricsEvent NEW_REQUEST = new ClientMetricsEvent("new-request", false, false);
    public static final ClientMetricsEvent REQUEST_SUCCEEDED = new ClientMetricsEvent("request-succeeded", true, false);
    public static final ClientMetricsEvent REQUEST_FAILED = new ClientMetricsEvent("request-failed", true, true);

    /*Always refer to as constants*/protected ClientMetricsEvent(String name, boolean isTimed, boolean isError) {
        super(name, isTimed, isError);
    }
}

OTOH, if this metadata is absent, someone has to look for code that publishes these events to know what data accompanies these events.

@NiteshKant
Copy link
Member Author

Thanks @benjchristensen for your comments.

Let me take this example to explain the reasoning for the design.

public class ClientMetricsEvent extends AbstractMetricsEvent {

    public static final ClientMetricsEvent NEW_REQUEST = new ClientMetricsEvent("new-request", false, false);
    public static final ClientMetricsEvent REQUEST_SUCCEEDED = new ClientMetricsEvent("request-succeeded", true, false);
    public static final ClientMetricsEvent REQUEST_FAILED = new ClientMetricsEvent("request-failed", true, true);

    /*Always refer to as constants*/protected ClientMetricsEvent(String name, boolean isTimed, boolean isError) {
        super(name, isTimed, isError);
    }
}
  1. Duplication

For NEW_REQUEST there would be only the event but for REQUEST_SUCCEEDED there will be event and duration.
Similarly, for REQUEST_SUCCEEDED there would just be duration but for REQUEST_FAILED there would be duration and exception.
These usecases were the reason for the methods in the listener.

  1. Object

Now, if we plan to also pass Http Status code with REQUEST_SUCCEEDED then we would need this method:

void onEvent(E event, Object value, long duration);

and pass the Status code for the value parameter. Thinking about this now, I think if the Event itself defines the type of value as a generic parameter, it will be more intutive.

  1. Events

Yes we will have classes like the above example which will define all events published by RxNetty.

@headinthebox
Copy link

No way to remove listeners?

@NiteshKant
Copy link
Member Author

@headinthebox Good point :) Added removeListener() to MetricEventsPublisher and updated my previous comment.

@NiteshKant
Copy link
Member Author

Added removeListener() to MetricEventsPublisher and updated my previous comment.

On thinking more about it , this feels better:

public interface MetricEventsPublisher<E extends MetricsEvent> {

    Subscription subscribe(MetricEventsListener<? extends E> listener);

}

returned Subscription can be unsubscribed to remove the listener. Thoughts?

@NiteshKant
Copy link
Member Author

Updated the MetricEventsPublisher interface in the original design to use Subscription as proposed in my previous comment.

@benjchristensen @allenxwang @headinthebox I am starting work on this now, so let me know if you have any more comments about the design.

@NiteshKant
Copy link
Member Author

As I started implementing this design I found that there were multiple things that were wrong :). I am stating them below:

  • The time unit for the duration parameter in MetricEventsListener interface is missing.
  • The lack of an enum in the event is quiet painful as listeners can not do a switch-case while handling these events. So, instead of MetricsEvent.name() I propose Metrics.getType() which returns an enum. So, the MetricEvent interface looks like:
public interface MetricsEvent<T extends Enum> {

    T getType();

    boolean isTimed();

    boolean isError();
}

Since, the Enum is different for different events, the MetricsEvent class is parameterized.

  • There is no onComplete() on MetricEventsListener so listeners can not do any cleanup when the associated client/server shutdown. It is more appropriate for client listeners as they can be large in number and have a lifecycle.
  • The multiple methods in MetricEventsListener are quiet error prone for event consumers. The biggest issue being that if the consumer started listening to an event in the wrong method, he will never get the event. The penalty of this inadvertent mistake is quiet high and tough to debug.
    I think @benjchristensen you were referring to this in your feedback.
    In order to eliminate this issue, I can think of the approach where we fold all these variations into a single method and make all other arguments but for the Event as optional.
    Based on the above, I can think of the following listener interface.
public interface MetricEventsListener<E extends MetricsEvent<?>> {

    void onEvent(E event, long duration, TimeUnit timeUnit, Throwable throwable, Object value);

    void onComplete();
}

All arguments to onEvent() above are optional but for event. duration will be -1 if the event is not timed. All other parameters will be null when not applicable.

Thoughts?

NiteshKant pushed a commit to NiteshKant/RxNetty that referenced this issue Jun 9, 2014
Did a few changes in the design as discussed in the github issue.

This code change suffices for the current Pool State change events. I will be adding the remaining events and migrate the tests to use the new model.
@NiteshKant
Copy link
Member Author

Updated the original design above with the proposed changes here

@allenxwang
Copy link

It seems that Event isTimed() and isError() API is somewhat redundant now that duration and Throwable is available in the onEvent() method. It could lead to ambiguity which one should take precedence for event consumer.

@NiteshKant
Copy link
Member Author

@allenxwang as mentioned before, the isTimed() and isError() APIs are for human consumption and not really to make a code decision. If a consumer is consuming a particular event type, the data published with the event must be known to the consume before hand and RxNetty will make sure that the metadata for the event type (isTimed(), isError()) is consistent with what is published.

The documentation in the MetricEventsListener.onEvent() is also pretty clear in suggesting when should the parameters be available.

So, if a consumer is consuming an event publishing duration and Throwable, the code should be able to directly access duration and Throwable without doing existence checks.

@NiteshKant NiteshKant modified the milestones: 0.3.7, 0.3.6 Jun 24, 2014
NiteshKant pushed a commit to NiteshKant/RxNetty that referenced this issue Jun 24, 2014
NiteshKant added a commit that referenced this issue Jun 25, 2014
@NiteshKant NiteshKant removed their assignment Aug 19, 2014
@mikeweisskopf
Copy link

At one point you mentioned you were adding removeListener() to MetricEventsPublisher, but later changed to use Subscription instead. The rational was "returned Subscription can be unsubscribed to remove the listener". However, it seems to me thats not what actually happens.....instead calling unsubscribe only changes the state of the subscription (for example BooleanSubscription.isUnsubscribed() will return true). However, the listener (MetricsEventsPublisher) remains unchanged....it still has the listener and the listener still receives events. Am I missing something here or is there actually no way to remove / stop a listener?

@NiteshKant
Copy link
Member Author

@mikeweisskopf that sounds like a bug, this piece of code is expected to remove the listener, but there is mismatch between the type of item removed and added. Can you file a bug or if you are interested in sending a PR, you are welcome too :)

@mikeweisskopf
Copy link

Sure, I will file a bug and followup with a PR. Thanks for the quick reply :-)

Pull request is (here) [https://github.com//pull/428]

@mikeweisskopf
Copy link

Wondering about another perspective on metrics. The approach here generally aggregates metrics on a per client basis. But what if one needs to track metrics on a per request basis? For example, suppose one needs to know how long it took for a response to a specific request? It seems like we'd need to be able to pass some sort of "Request Context" into the MetricEventListeners to make that kind of granularity possible. In a way we have a little of that already .... for example StartTime is request specific. Do you have any thoughts on how something like this might be done, or am I missing something that is already present?

@NiteshKant
Copy link
Member Author

@mikeweisskopf I have tried to address "general" requirements around what timings one would need to capture for HTTP/TCP interactions in 0.5.x. There is also a facility to pass custom events to the listeners.

There is an open request around what you are asking and one way of doing it would be to have an ability to attach a listener per request.

@mikeweisskopf
Copy link

Thank very much @NiteshKant , we'll take a look at listener per request as well as request context. Any thoughts on when there will be a release version of 0.5.x?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants