feat(metrics): add metrics integration with prometheus #3339

raymondfeng · 2019-07-11T18:13:29Z

PoC for https://prometheus.io/ integration

Checklist

👉 Read and sign the CLA (Contributor License Agreement) 👈

npm test passes on your machine
New tests added or existing tests modified to cover all changes
Code conforms with the style guide
API Documentation in code was updated
Documentation in /docs/site was updated
Affected artifact templates in packages/cli were updated
Affected example projects in examples/* were updated

hacksparrow · 2019-07-12T07:30:36Z

packages/metrics/README.md

+this.component(MetricsComponent);
+```
+
+By default, Metrics route is mounted at `/metrics`. This path can be customized


How about access control?

It's similar to how we expose OpenAPI specs. Let's worry about that later.

There are a few things to consider:

- Pull vs. push: Prometheus prefers pull
- Use a different REST endpoint (host/port)
- ACL

raymondfeng · 2019-07-17T20:13:27Z

FYI: I refactored the module to extensions/metrics - to be consistent with #3360

I had to include c74b870 so that CI can run.

bajtos · 2019-07-30T11:31:08Z

> I had to include c74b870 so that CI can run.

As I commented in the other PR, can you please open a new PR to make the necessary changes to allow extensions to be hosted in the extensions/ directory?

bajtos · 2019-07-30T11:36:23Z

I am not familiar with Prometheus. What kind of metrics does it collect? What kind of metrics does it make sense to provide from a LoopBack application? What can be exposed out of the box, and what requires explicit configuration from users?

For example:

- Request latency - how long does it take to handle a request.
  - From the time we receive request headers until we send the response headers (not waiting for the response body to be sent entirely)
  - From the time we receive request body until we send the response headers
  - From the time we receive request headers until the last byte of the response was sent
- DataSource latency - how long does it take to execute a call to a datasource (make a DB query, call a backend web-service)
- Memory usage
- CPU usage
- Event loop latency
- Anything else?
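
To make a few of the candidate metrics above concrete, here is a minimal sketch of how memory usage, CPU usage, and a simple latency measurement can be read with Node.js built-ins only. The names `RuntimeSnapshot`, `takeRuntimeSnapshot`, and `timeSync` are hypothetical and are not part of this PR or of any Prometheus client; event loop latency is left out because it needs a sampling window (for example via `perf_hooks`).

```typescript
// Hypothetical shape for a one-off reading of process-level metrics.
interface RuntimeSnapshot {
  heapUsedBytes: number;
  rssBytes: number;
  cpuUserMicros: number;
  cpuSystemMicros: number;
}

// Read memory and CPU usage from Node.js built-ins.
function takeRuntimeSnapshot(): RuntimeSnapshot {
  const mem = process.memoryUsage();
  const cpu = process.cpuUsage(); // microseconds since process start
  return {
    heapUsedBytes: mem.heapUsed,
    rssBytes: mem.rss,
    cpuUserMicros: cpu.user,
    cpuSystemMicros: cpu.system,
  };
}

// Measure how long a synchronous call takes, in seconds.
// A request-latency metric would wrap the request handler the same way.
function timeSync<T>(fn: () => T): {result: T; seconds: number} {
  const start = process.hrtime();
  const result = fn();
  const [sec, nanos] = process.hrtime(start);
  return {result, seconds: sec + nanos / 1e9};
}
```

Which start and end points to use for request latency (headers received, body received, last byte sent) is exactly the open question in the list above; the sketch only shows the timing mechanics.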

I'd like the documentation for the extension (the README?) to better describe these aspects and educate LB4 users who are new to Prometheus.

raymondfeng · 2019-07-30T20:02:31Z

@bajtos I have updated README to include more information.

raymondfeng · 2019-08-23T16:15:48Z

@bajtos PTAL.

bajtos

The proposal looks reasonable. I'd like to discuss a few aspects and design decisions you have made.

extensions/metrics/README.md

extensions/metrics/src/__examples__/demo.sh

bajtos · 2019-08-27T12:12:22Z

extensions/metrics/src/__tests__/acceptance/metrics-push.acceptance.ts

+// Only run the test on Travis with Linux
+const verb =
+  process.env.TRAVIS && os.platform() === 'linux' ? describe : describe.skip;
+verb('Metrics (with push gateway)', function() {


Please use skipIf or skipOnTravis from @loopback/testlab.

Example usage:

https://github.com/strongloop/loopback-next/blob/6ad0bb59c71ea2356cb951da786bdbff246b47e7/packages/repository-tests/src/crud/freeform-properties.suite.ts#L26-L30

skipIf(describe, '...', () => {}); does not compile.

> skipIf(describe, '...', () => {}); does not compile.

That's a known limitation of the current version and/or TypeScript.

Did you try the slightly longer form I showed in my comment?

skipIf<[(this: Suite) => void], void>(
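
For readers unfamiliar with the helper, the following is a simplified local stand-in that illustrates why explicit type arguments can help. It is not the `@loopback/testlab` source; the parameter order (`skip`, `verb`, `name`, rest arguments), the `TestVerb` type, and `fakeDescribe` are assumptions made for this sketch.

```typescript
// Simplified shape of a mocha-style verb such as `describe` or `it`.
type TestVerb<ARGS extends unknown[], RETVAL> = (
  name: string,
  ...args: ARGS
) => RETVAL;

// Call `verb.skip` instead of `verb` when `skip` is true.
function skipIf<ARGS extends unknown[], RETVAL>(
  skip: boolean,
  verb: TestVerb<ARGS, RETVAL> & {skip: TestVerb<ARGS, RETVAL>},
  name: string,
  ...args: ARGS
): RETVAL {
  return skip ? verb.skip(name, ...args) : verb(name, ...args);
}

// Stand-in for mocha's `describe`, recording what was run or skipped.
type SuiteFn = () => void;
const calls: string[] = [];
const fakeDescribe = Object.assign(
  (name: string, fn: SuiteFn): void => {
    calls.push(`run:${name}`);
    fn();
  },
  {
    skip: (name: string, _fn: SuiteFn): void => {
      calls.push(`skip:${name}`);
    },
  },
);

// Same condition as in the diff: only run on Travis with Linux.
const onLinuxCI = Boolean(process.env.TRAVIS) && process.platform === 'linux';

// Spelling out the type arguments avoids relying on inference of ARGS.
skipIf<[SuiteFn], void>(
  !onLinuxCI,
  fakeDescribe,
  'Metrics (with push gateway)',
  () => {},
);
```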

bajtos · 2019-08-27T12:13:40Z

extensions/metrics/src/__tests__/acceptance/metrics-push.acceptance.ts

+  process.env.TRAVIS && os.platform() === 'linux' ? describe : describe.skip;
+verb('Metrics (with push gateway)', function() {
+  // eslint-disable-next-line no-invalid-this
+  this.timeout(30000);


Is this going to increase the duration of npm test by another 10-30 seconds? I am concerned that npm test is already taking too long to finish to make TDD practical, and I am reluctant to make it even worse.

I just set the timeout conservatively to allow the prom/pushgateway docker container to be up and running. We may have to define a nightly build to run the costly integration tests. What do you think?

So how long does it usually take to get the prom/pushgateway docker container up and running? When I run npm test on my local machine for the second time, how long a delay will waiting for docker introduce?

> We may have to define a nightly build to run the costly integration tests. What do you think?

I feel it's not enough to run these tests nightly. If a pull request breaks one of these tests, then we will discover the problem too late.

Can we use the approach I have in place for running repository-test tests against real databases? Here is the gist:

- These tests ARE NOT run as part of npm test.
- There are clear instructions on how to run the tests locally - see e.g. MySQL instructions.
- The tests expect external services like databases to be already running and available. This way we pay the cost of starting the services only once, not for every test run.
- There is a single Travis CI job for each test suite (MongoDB, MySQL, etc.), and these jobs are executed in parallel with other jobs like npm test, code linting, commit linting, etc. Example job config: .travis.yml#L53-L69 - it does not use Docker, because native MySQL is faster to set up, but Travis CI does support docker.

Can you write a mock-up pushgateway that will call our metrics endpoint the same way as the real gateway does? That way we can verify push functionality from extensions/metrics tests.

Then we can add a new package to the acceptance directory, e.g. acceptance/push-metrics, where we will use a real push gateway running in a docker container, to ensure our push implementation works with real gateways too and to catch any discrepancies between our mock gateway and real gateways.
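
As a rough illustration of the mock gateway idea, the sketch below accepts pushes on `/metrics/job/<job>` and keeps the last body per job so a test can inspect it. It is not the mock that landed in this PR; `PushStore`, `recordPush`, and `createMockPushgateway` are hypothetical names, and the status codes are illustrative rather than a faithful copy of the real Pushgateway.

```typescript
import {createServer, IncomingMessage, Server, ServerResponse} from 'http';

// Last metrics body received for each job name.
type PushStore = Map<string, string>;

// Pure request logic, kept separate from the HTTP server so it is easy to test.
// Returns the HTTP status code to send back.
function recordPush(
  store: PushStore,
  method: string,
  path: string,
  body: string,
): number {
  const match = /^\/metrics\/job\/([^/]+)/.exec(path);
  if (!match) return 404;
  const job = decodeURIComponent(match[1] ?? '');
  if (method === 'PUT' || method === 'POST') {
    store.set(job, body);
    return 200;
  }
  if (method === 'DELETE') {
    store.delete(job);
    return 200;
  }
  return 405;
}

// Wrap the logic in a Node.js HTTP server; the caller decides when to listen.
function createMockPushgateway(store: PushStore): Server {
  return createServer((req: IncomingMessage, res: ServerResponse) => {
    const chunks: Buffer[] = [];
    req.on('data', (chunk: Buffer) => chunks.push(chunk));
    req.on('end', () => {
      const body = Buffer.concat(chunks).toString('utf8');
      res.statusCode = recordPush(store, req.method ?? '', req.url ?? '', body);
      res.end();
    });
  });
}
```

A test could call `createMockPushgateway(store).listen(0)`, point the component at the assigned port, and then assert on `store.get('<job name>')`. Whether such a mock matches the behavior of a real gateway is the concern that the docker-based acceptance package is meant to cover.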

Can we remove this 30 second timeout now, or at least reduce it to something like 5-10s?

Also I see that you introduced a mock push gateway, which is great! But how can we be sure that it's accurately simulating the behavior of a real gateway? Shouldn't we have an acceptance test using the docker-based gateway as I proposed in my comment above?

I think the mock-up push gateway gives us enough confidence, as we just have to make sure this component is pushing metrics to the gateway (the correctness should have been covered by the prom-client).

extensions/metrics/src/interceptors/metrics.interceptor.ts

extensions/metrics/src/metrics.component.ts

package-lock.json

bajtos · 2019-08-27T12:22:14Z

Would you like to expose the README of this new component in https://loopback.io/doc/en/lb4/Using-components.html?

raymondfeng · 2019-09-16T16:44:25Z

@bajtos PTAL

extensions/metrics/README.md

bajtos

The patch looks much better now! I'd like to discuss a few points before approving it.

extensions/metrics/README.md

extensions/metrics/src/interceptors/metrics.interceptor.ts

bajtos · 2019-09-19T14:44:52Z

extensions/metrics/src/__tests__/acceptance/mock-pushgateway.ts

raymondfeng · 2019-10-01T20:13:27Z

@bajtos PTAL

bajtos

I quickly skimmed through the changes and have a few more comments.

Please get at least one more person from the team (@strongloop/loopback-maintainers) to review the changes too.

extensions/metrics/src/__tests__/acceptance/metrics-push.acceptance.ts

extensions/metrics/src/observers/pushgateway.observer.ts

bajtos

I don't have any more comments.

Please get approval from at least one more person from @strongloop/loopback-maintainers before landing.

hacksparrow · 2019-11-01T12:30:26Z

As a POC, this looks good.

However, I am not sure if hardwiring a core extension to a particular service is a good idea. Ideally, its interface should be an adapter - users should be able to use Prometheus alternatives, if they want to.

extensions/metrics/README.md

jannyHou · 2019-11-01T16:34:22Z

extensions/metrics/src/interceptors/metrics.interceptor.ts

+      targetName: invocationCtx.targetName,
+    });
+    try {
+      this.counter.inc();


A question about the counter: I ran the demo and noticed that the method invocation count is 105:

# HELP loopback_invocation_total method invocation counts
# TYPE loopback_invocation_total counter
loopback_invocation_total 105

The demo app doesn't have any controller/endpoints, so I am wondering... what are the methods being invoked?

Good question. We have a built-in controller in extensions/metrics.

So whenever the /metrics is scraped, our metrics interceptor is triggered.
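
The effect described here can be reproduced with a small self-contained sketch: a counter that is incremented by an interceptor-like wrapper, and a scrape function that itself goes through that wrapper. The `Counter` class, `intercepted`, and `scrape` are hypothetical stand-ins, not the code in extensions/metrics and not the prom-client API.

```typescript
// Minimal counter that renders itself in the Prometheus text format.
class Counter {
  private value = 0;
  readonly name: string;
  readonly help: string;

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  inc(): void {
    this.value += 1;
  }

  render(): string {
    return (
      [
        `# HELP ${this.name} ${this.help}`,
        `# TYPE ${this.name} counter`,
        `${this.name} ${this.value}`,
      ].join('\n') + '\n'
    );
  }
}

const invocations = new Counter(
  'loopback_invocation_total',
  'method invocation counts',
);

// Interceptor-like wrapper: count the invocation, then run the target method.
function intercepted<T>(next: () => T): T {
  invocations.inc();
  return next();
}

// Stand-in for the built-in metrics controller method behind /metrics.
// Because it is intercepted too, every scrape increments the counter.
function scrape(): string {
  return intercepted(() => invocations.render());
}
```

Calling `scrape()` repeatedly raises the reported total by one each time, which is consistent with a non-zero count in an application that has no controllers of its own.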

ah, that makes sense 👍

jannyHou

LGTM 🚢 I ran the demo and tried the /metrics endpoint; the report looks reasonable.
I have a general question for prometheus: is it also aimed to monitor particular endpoints or it's more for monitoring the health of an app?

My understanding of @loopback/extension-metrics is that people can use it to monitor their app or project, like the demo. Do we plan to add a new package under /metrics to monitor our project with this extension?

raymondfeng · 2019-11-01T18:38:50Z

> I have a general question for prometheus: is it also aimed to monitor particular endpoints or it's more for monitoring the health of an app?

We already have @loopback/extension-health for health checks. The metrics extension is to enable metrics reporting for Prometheus. The metrics include the Node.js runtime, LoopBack framework code (TBA), and application logic.

raymondfeng requested a review from bajtos as a code owner July 11, 2019 18:13

raymondfeng changed the title ~~feat(metrics): add metrics integration with prometheus~~ [RFC WIP] feat(metrics): add metrics integration with prometheus Jul 11, 2019

raymondfeng force-pushed the prometheus branch 2 times, most recently from b401dbb to 8bec17b Compare July 12, 2019 05:01

hacksparrow reviewed Jul 12, 2019

raymondfeng force-pushed the prometheus branch 5 times, most recently from c65f811 to b42e83c Compare July 15, 2019 16:10

raymondfeng mentioned this pull request Jul 16, 2019

Export a method to get prometheus metrics without mounting to /metrics CloudNativeJS/appmetrics-prometheus#29

raymondfeng force-pushed the prometheus branch from 3c53f41 to a24a8fb Compare July 17, 2019 19:57

raymondfeng added CloudNative Cloud native enablement Observability labels Jul 19, 2019

raymondfeng force-pushed the prometheus branch from a24a8fb to 25de7e2 Compare July 29, 2019 17:55

raymondfeng force-pushed the prometheus branch 2 times, most recently from 96d7916 to 0964afd Compare July 30, 2019 17:57

raymondfeng force-pushed the prometheus branch from 0964afd to fd08597 Compare July 30, 2019 21:58

raymondfeng force-pushed the prometheus branch 2 times, most recently from bd3e6d9 to fbb7f30 Compare August 19, 2019 17:06

raymondfeng force-pushed the prometheus branch 2 times, most recently from 0ba3204 to 9639c29 Compare August 23, 2019 02:16

raymondfeng changed the title ~~[RFC WIP] feat(metrics): add metrics integration with prometheus~~ feat(metrics): add metrics integration with prometheus Aug 26, 2019

bajtos reviewed Aug 27, 2019

raymondfeng force-pushed the prometheus branch from 91ffe33 to 75ad58e Compare September 13, 2019 17:51

raymondfeng force-pushed the prometheus branch from 75ad58e to b65f338 Compare September 16, 2019 19:51

bajtos reviewed Sep 19, 2019

bajtos reviewed Sep 19, 2019

raymondfeng force-pushed the prometheus branch from b65f338 to 2c8cfcb Compare September 19, 2019 22:48

raymondfeng requested a review from bajtos September 20, 2019 14:40

raymondfeng force-pushed the prometheus branch 2 times, most recently from 33233ad to e3b60b9 Compare September 20, 2019 15:57

raymondfeng force-pushed the prometheus branch from e3b60b9 to 08cdb79 Compare October 1, 2019 17:01

bajtos reviewed Oct 17, 2019

raymondfeng force-pushed the prometheus branch 2 times, most recently from ba183e4 to cb0aa16 Compare October 21, 2019 22:09

raymondfeng force-pushed the prometheus branch 2 times, most recently from 94ce1b5 to dfb854c Compare November 1, 2019 03:21

bajtos approved these changes Nov 1, 2019

jannyHou reviewed Nov 1, 2019

raymondfeng force-pushed the prometheus branch from dfb854c to 02ec7fa Compare November 1, 2019 16:44

jannyHou approved these changes Nov 1, 2019

raymondfeng force-pushed the prometheus branch from 99d97cb to bea38cc Compare November 1, 2019 18:27

raymondfeng force-pushed the prometheus branch 2 times, most recently from 67cb790 to aa9c952 Compare November 1, 2019 20:23

raymondfeng added 2 commits November 1, 2019 13:59

feat(extension-metrics): add metrics extension for prometheus

e6f7eaf

feat(example-metrics-prometheus): add an example for prometheus metrics

3202f7d

raymondfeng force-pushed the prometheus branch from aa9c952 to 3202f7d Compare November 1, 2019 20:59

raymondfeng merged commit 2c11c6d into master Nov 1, 2019

raymondfeng deleted the prometheus branch November 1, 2019 21:53
