[Bug] HStore JRaft Timer Metrics Error #2601

Closed
1 task done
JackyYangPassion opened this issue Jul 29, 2024 · 1 comment · Fixed by #2602
Labels: bug, raft

Comments

@JackyYangPassion
Contributor

Bug Type

None

Before submit

  • I have searched the existing Issues and FAQ and confirmed that there are no similar or duplicate problems.

Environment

  • Server Version: master
  • Backend: HStore

Expected & Actual behavior

Current problem

There is a bug in the metrics handling. In the code below, registerTimer does not export all of a Timer's metrics: every gauge, including the four rate gauges (1m, 5m, 15m, mean), is bound to Timer::getCount.

private static void registerTimer(String group, String name, com.codahale.metrics.Timer timer) {
    List<Tag> tags = new LinkedList<>();
    tags.add(handleDataTag);
    tags.add(Tag.of("group", group));

    name = refineMetrics(name, tags);

    String baseName = PREFIX + "." + name.toLowerCase();

    Gauge.builder(baseName + ".count", timer, Timer::getCount)
         .tags(tags).register(registry);

    Gauge.builder(baseName + ".timer", timer, Timer::getCount)
         .tags(tags).tag("rate", "1m").register(registry);
    Gauge.builder(baseName + ".timer", timer, Timer::getCount)
         .tags(tags).tag("rate", "5m").register(registry);
    Gauge.builder(baseName + ".timer", timer, Timer::getCount)
         .tags(tags).tag("rate", "15m").register(registry);
    Gauge.builder(baseName + ".timer", timer, Timer::getCount)
         .tags(tags).tag("rate", "mean").register(registry);
}
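
One possible direction for a fix is to bind each gauge to its own getter. The sketch below is a minimal, library-free illustration of that mapping, not necessarily the change made in #2602. TimerView is a hypothetical stand-in for the getters of com.codahale.metrics.Timer (getCount, getOneMinuteRate, getFiveMinuteRate, getFifteenMinuteRate, getMeanRate), and the "{rate=...}" suffix in the map keys only mimics the rate tag. The base name "raft.append-logs" is likewise made up for the demo.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

final class TimerGaugeSketch {

    /** Hypothetical stand-in for the getters used from com.codahale.metrics.Timer. */
    interface TimerView {
        long getCount();
        double getOneMinuteRate();
        double getFiveMinuteRate();
        double getFifteenMinuteRate();
        double getMeanRate();
    }

    /**
     * Maps each gauge key to the getter that should back it.
     * The "{rate=...}" suffix is an illustrative way to show the tag,
     * not a real naming scheme.
     */
    static Map<String, ToDoubleFunction<TimerView>> timerGauges(String baseName) {
        Map<String, ToDoubleFunction<TimerView>> gauges = new LinkedHashMap<>();
        gauges.put(baseName + ".count", TimerView::getCount);
        // Each rate gauge reads its own rate instead of the count.
        gauges.put(baseName + ".timer{rate=1m}", TimerView::getOneMinuteRate);
        gauges.put(baseName + ".timer{rate=5m}", TimerView::getFiveMinuteRate);
        gauges.put(baseName + ".timer{rate=15m}", TimerView::getFifteenMinuteRate);
        gauges.put(baseName + ".timer{rate=mean}", TimerView::getMeanRate);
        return gauges;
    }

    public static void main(String[] args) {
        // Fixed sample values, rounded from the append-logs line in this issue.
        TimerView sample = new TimerView() {
            public long getCount() { return 17291L; }
            public double getOneMinuteRate() { return 0.9356; }
            public double getFiveMinuteRate() { return 0.9330; }
            public double getFifteenMinuteRate() { return 0.9351; }
            public double getMeanRate() { return 0.9295; }
        };
        timerGauges("raft.append-logs").forEach((key, getter) ->
                System.out.println(key + " = " + getter.applyAsDouble(sample)));
    }
}
```

In the real method, the same getters would replace Timer::getCount in the four rate gauges. Snapshot statistics such as min, max, mean and the percentiles could be exported in the same way through timer.getSnapshot(), if they are wanted.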

Expected result

The correct JRaft monitoring metrics can be retrieved through the Spring Actuator endpoint:

curl http://ip:8620/actuator/prometheus

Metric details

2024-07-29 11:47:35 [metrics-logger-reporter-1-thread-1] [INFO] c.j.metrics - type=TIMER, name=append-logs, count=17291, min=3.0, max=253.0, mean=28.157759851792466, stddev=17.330493557378528, p50=25.0, p75=30.0, p95=51.0, p98=69.0, p99=81.0, p999=253.0, m1_rate=0.9356150842263176, m5_rate=0.9329823706620638, m15_rate=0.9350678338706933, mean_rate=0.9294925546811751, rate_unit=events/second, duration_unit=milliseconds
2024-07-29 11:47:35 [metrics-logger-reporter-1-thread-1] [INFO] c.j.metrics - type=TIMER, name=fsm-apply-tasks, count=17287, min=0.0, max=3.0, mean=0.13447834828994729, stddev=0.34185352532926017, p50=0.0, p75=0.0, p95=1.0, p98=1.0, p99=1.0, p999=1.0, m1_rate=0.9310589317856405, m5_rate=0.9324733392682882, m15_rate=0.9349448148157692, mean_rate=0.9292836834823113, rate_unit=events/second, duration_unit=milliseconds
2024-07-29 11:47:35 [metrics-logger-reporter-1-thread-1] [INFO] c.j.metrics - type=TIMER, name=fsm-commit, count=17291, min=0.0, max=23.0, mean=0.24613836494239846, stddev=0.9677265117138243, p50=0.0, p75=0.0, p95=1.0, p98=1.0, p99=1.0, p999=23.0, m1_rate=0.9309011983782098, m5_rate=0.9323032983295501, m15_rate=0.9348626133983298, mean_rate=0.9295033886824194, rate_unit=events/second, duration_unit=milliseconds
2024-07-29 11:47:35 [metrics-logger-reporter-1-thread-1] [INFO] c.j.metrics - type=TIMER, name=fsm-leader-stop, count=2, min=8.0, max=52.0, mean=52.0, stddev=1.75850743252206E-62, p50=52.0, p75=52.0, p95=52.0, p98=52.0, p99=52.0, p999=52.0, m1_rate=1.3291750626407833E-69, m5_rate=4.552447297073617E-15, m15_rate=5.664206814717154E-6, mean_rate=2.1188385465332752E-4, rate_unit=events/second, duration_unit=milliseconds
2024-07-29 11:47:35 [metrics-logger-reporter-1-thread-1] [INFO] c.j.metrics - type=TIMER, name=fsm-snapshot-load, count=1, min=0.0, max=0.0, mean=0.0, stddev=0.0, p50=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0, m1_rate=4.674558570015141E-136, m5_rate=2.3701297284679825E-28, m15_rate=2.116463109431812E-10, mean_rate=5.3743698206652295E-5, rate_unit=events/second, duration_unit=milliseconds
2024-07-29 11:47:35 [metrics-logger-reporter-1-thread-1] [INFO] c.j.metrics - type=TIMER, name=fsm-snapshot-save, count=10, min=10.0, max=23.0, mean=NaN, stddev=NaN, p50=23.0, p75=23.0, p95=23.0, p98=23.0, p99=23.0, p999=23.0, m1_rate=7.65193724310238E-8, m5_rate=2.8597035543981796E-4, m15_rate=5.662721104444384E-4, mean_rate=5.903183960163894E-4, rate_unit=events/second, duration_unit=milliseconds
2024-07-29 11:47:35 [metrics-logger-reporter-1-thread-1] [INFO] c.j.metrics - type=TIMER, name=fsm-start-following, count=7, min=0.0, max=167.0, mean=7.0, stddev=7.0, p50=1.0, p75=14.0, p95=14.0, p98=14.0, p99=14.0, p999=14.0, m1_rate=2.5289317851795146E-69, m5_rate=4.992277828710806E-15, m15_rate=5.8293921710028665E-6, mean_rate=7.416504939112436E-4, rate_unit=events/second, duration_unit=milliseconds
2024-07-29 11:47:35 [metrics-logger-reporter-1-thread-1] [INFO] c.j.metrics - type=TIMER, name=fsm-stop-following, count=7, min=0.0, max=220.0, mean=4.0, stddev=2.0315526340900756E-10, p50=4.0, p75=4.0, p95=4.0, p98=4.0, p99=4.0, p999=4.0, m1_rate=4.712091790377789E-59, m5_rate=2.6241937206161542E-14, m15_rate=1.1837520635530499E-5, mean_rate=7.432915370594506E-4, rate_unit=events/second, duration_unit=milliseconds
2024-07-29 11:47:35 [metrics-logger-reporter-1-thread-1] [INFO] c.j.metrics - type=TIMER, name=handle-append-entries, count=1393, min=0.0, max=21.0, mean=0.14409513888717168, stddev=0.7486983736914141, p50=0.0, p75=0.0, p95=1.0, p98=1.0, p99=1.0, p999=11.0, m1_rate=2.591725754569959E-57, m5_rate=4.4535965642666085E-12, m15_rate=1.3201492356068484E-4, mean_rate=0.147588298537147, rate_unit=events/second, duration_unit=milliseconds
2024-07-29 11:47:35 [metrics-logger-reporter-1-thread-1] [INFO] c.j.metrics - type=TIMER, name=handle-heartbeat-requests, count=4970, min=0.0, max=332.0, mean=0.6737667642622672, stddev=8.160082225646834, p50=0.0, p75=0.0, p95=1.0, p98=1.0, p99=2.0, p999=106.0, m1_rate=9.041425626333392E-57, m5_rate=1.554471755910028E-11, m15_rate=5.380218377610177E-4, mean_rate=0.5265705794620201, rate_unit=events/second, duration_unit=milliseconds
2024-07-29 11:47:35 [metrics-logger-reporter-1-thread-1] [INFO] c.j.metrics - type=TIMER, name=pre-vote, count=9, min=51.0, max=471.0, mean=188.0, stddev=137.0, p50=308.0, p75=325.0, p95=325.0, p98=325.0, p99=325.0, p999=325.0, m1_rate=1.675883309145768E-69, m5_rate=6.885617979638117E-16, m15_rate=2.6270002966360165E-7, mean_rate=4.8379603190225974E-4, rate_unit=events/second, duration_unit=milliseconds
2024-07-29 11:47:35 [metrics-logger-reporter-1-thread-1] [INFO] c.j.metrics - type=TIMER, name=replicate-entries, count=31796, min=14.0, max=315.0, mean=NaN, stddev=NaN, p50=41.0, p75=41.0, p95=41.0, p98=41.0, p99=41.0, p999=41.0, m1_rate=1.8685176970118065, m5_rate=1.865627062523206, m15_rate=1.869773748582935, mean_rate=1.7092266657792805, rate_unit=events/second, duration_unit=milliseconds
2024-07-29 11:47:35 [metrics-logger-reporter-1-thread-1] [INFO] c.j.metrics - type=TIMER, name=request-vote, count=9, min=26.0, max=542.0, mean=NaN, stddev=NaN, p50=249.0, p75=249.0, p95=249.0, p98=249.0, p99=249.0, p999=249.0, m1_rate=9.424183580057186E-59, m5_rate=3.319368129423501E-14, m15_rate=5.752918976724958E-7, mean_rate=4.8380027618448364E-4, rate_unit=events/second, duration_unit=milliseconds
2024-07-29 11:47:35 [metrics-logger-reporter-1-thread-1] [INFO] c.j.metrics - type=TIMER, name=save-raft-meta, count=9, min=43.0, max=436.0, mean=NaN, stddev=NaN, p50=43.0, p75=43.0, p95=43.0, p98=43.0, p99=43.0, p999=43.0, m1_rate=4.712091790089162E-59, m5_rate=1.69192150106503E-14, m15_rate=4.161259144260586E-7, mean_rate=4.837980838895543E-4, rate_unit=events/second, duration_unit=milliseconds
2024-07-29 11:47:35 [metrics-logger-reporter-1-thread-1] [INFO] c.j.metrics - type=TIMER, name=truncate-log-prefix, count=11, min=20.0, max=29.0, mean=NaN, stddev=NaN, p50=20.0, p75=20.0, p95=20.0, p98=20.0, p99=20.0, p999=20.0, m1_rate=7.65193724310238E-8, m5_rate=2.859703555118053E-4, m15_rate=5.662787599808003E-4, mean_rate=5.911815157146595E-4, rate_unit=events/second, duration_unit=milliseconds
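
To cross-check the values exposed by the endpoint against these reporter lines, it can help to split a line into its key=value fields. This is a minimal sketch, assuming the layout shown above (a " - " separator followed by comma-separated pairs). TimerLogLine and parse are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TimerLogLine {

    /** Splits the part after " - " into key=value fields, keeping their order. */
    static Map<String, String> parse(String line) {
        int sep = line.indexOf(" - ");
        String body = sep >= 0 ? line.substring(sep + 3) : line;
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : body.split(",\\s*")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened copy of the append-logs line from this issue.
        String line = "2024-07-29 11:47:35 [metrics-logger-reporter-1-thread-1] [INFO] "
                + "c.j.metrics - type=TIMER, name=append-logs, count=17291, "
                + "m1_rate=0.9356150842263176, rate_unit=events/second";
        Map<String, String> fields = parse(line);
        System.out.println(fields.get("name") + " count=" + fields.get("count")
                + " m1_rate=" + Double.parseDouble(fields.get("m1_rate")));
    }
}
```

Values are kept as strings so that entries such as mean=NaN survive unchanged. Double.parseDouble accepts "NaN" if a numeric comparison is needed.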

Specific metrics

append-logs
fsm-apply-tasks
fsm-commit
fsm-leader-stop
fsm-snapshot-load
fsm-snapshot-save
fsm-start-following
fsm-stop-following
handle-append-entries
handle-heartbeat-requests
pre-vote
request-vote
replicate-entries
save-raft-meta
truncate-log-prefix

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

@JackyYangPassion JackyYangPassion added the bug label Jul 29, 2024
@dosubot dosubot bot added the raft label Jul 29, 2024

dosubot bot commented Jul 29, 2024

To address the issue of not serializing all Timer metrics correctly, you need to ensure that the TimerSerializer class is used to serialize Timer metrics. This class handles the serialization of various statistical measures and the duration unit.

Here is the TimerSerializer class for reference:

private static class TimerSerializer extends StdSerializer<Timer> {

    private static final long serialVersionUID = 6283520188524929099L;

    private final String rateUnit;
    private final double rateFactor;
    private final String durationUnit;
    private final double durationFactor;
    private final boolean showSamples;

    private TimerSerializer(TimeUnit rateUnit, TimeUnit durationUnit,
                            boolean showSamples) {
        super(Timer.class);
        this.rateUnit = calculateRateUnit(rateUnit, "calls");
        this.rateFactor = rateUnit.toSeconds(1);
        this.durationUnit = durationUnit.toString().toLowerCase(Locale.US);
        this.durationFactor = 1.0 / durationUnit.toNanos(1);
        this.showSamples = showSamples;
    }

    @Override
    public void serialize(Timer timer, JsonGenerator json,
                          SerializerProvider provider) throws IOException {
        json.writeStartObject();
        final Snapshot snapshot = timer.getSnapshot();
        json.writeNumberField("count", timer.getCount());
        json.writeNumberField("min", snapshot.getMin() * this.durationFactor);
        json.writeNumberField("mean", snapshot.getMean() * this.durationFactor);
        json.writeNumberField("max", snapshot.getMax() * this.durationFactor);
        json.writeNumberField("stddev", snapshot.getStdDev() * this.durationFactor);

        json.writeNumberField("p50", snapshot.getMedian() * this.durationFactor);
        json.writeNumberField("p75", snapshot.get75thPercentile() * this.durationFactor);
        json.writeNumberField("p95", snapshot.get95thPercentile() * this.durationFactor);
        json.writeNumberField("p98", snapshot.get98thPercentile() * this.durationFactor);
        json.writeNumberField("p99", snapshot.get99thPercentile() * this.durationFactor);
        json.writeNumberField("p999", snapshot.get999thPercentile() * this.durationFactor);
        json.writeStringField("duration_unit", this.durationUnit);

        if (this.showSamples) {
            final long[] values = snapshot.getValues();
            final double[] scaledValues = new double[values.length];
            for (int i = 0; i < values.length; i++) {
                scaledValues[i] = values[i] * this.durationFactor;
            }
            json.writeObjectField("values", scaledValues);
        }

        json.writeNumberField("mean_rate", timer.getMeanRate() * this.rateFactor);
        json.writeNumberField("m15_rate", timer.getFifteenMinuteRate() * this.rateFactor);
        json.writeNumberField("m5_rate", timer.getFiveMinuteRate() * this.rateFactor);
        json.writeNumberField("m1_rate", timer.getOneMinuteRate() * this.rateFactor);
        json.writeStringField("rate_unit", this.rateUnit);
        json.writeEndObject();
    }
}
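
The two factors in the constructor are plain unit conversions. Dropwizard timers record durations in nanoseconds, so durationFactor turns nanoseconds into the chosen duration unit, and rateFactor turns events per second into events per rate unit. The sketch below shows that arithmetic on its own. DurationScaling and its methods are hypothetical helpers, not part of the serializer.

```java
import java.util.concurrent.TimeUnit;

final class DurationScaling {

    /** Multiplier that converts a value in nanoseconds to the given unit. */
    static double durationFactor(TimeUnit durationUnit) {
        return 1.0 / durationUnit.toNanos(1);
    }

    /** Multiplier that converts events per second to events per given unit. */
    static double rateFactor(TimeUnit rateUnit) {
        return rateUnit.toSeconds(1);
    }

    /** Scales a raw nanosecond reading, as the serializer does per snapshot field. */
    static double scaleDuration(long nanos, TimeUnit durationUnit) {
        return nanos * durationFactor(durationUnit);
    }

    public static void main(String[] args) {
        // A raw reading of about 28 ms, expressed in nanoseconds.
        long rawNanos = 28_157_760L;
        System.out.println("millis = " + scaleDuration(rawNanos, TimeUnit.MILLISECONDS));
        System.out.println("per-minute rate = " + 0.93 * rateFactor(TimeUnit.MINUTES));
    }
}
```

The same scaling would apply to any snapshot value exported as a gauge, since an unscaled reading would be reported in nanoseconds.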

To integrate this with Spring Actuator, ensure that your Spring configuration includes this custom serializer for Timer metrics. This can be done by registering the MetricsModule with the appropriate configuration in your Spring application.

Additionally, make sure your application.yml file has the necessary configurations to expose the metrics:

management:
  metrics:
    export:
      prometheus:
        enabled: true
  endpoints:
    web:
      exposure:
        include: "*"

This setup will ensure that all Timer metrics are serialized correctly and can be retrieved via the Spring Actuator interface [1][2].
