feat(metrics): add metrics integration with prometheus #3339
Conversation
Force-pushed from b401dbb to 8bec17b
packages/metrics/README.md (outdated)
```ts
this.component(MetricsComponent);
```

By default, the metrics route is mounted at `/metrics`. This path can be customized.
How about access control?
It's similar to how we expose OpenAPI specs. Let's worry about that later.
There are a few things to consider:
- pull vs push: Prometheus prefers pull
- use a different REST endpoint (host/port)
- ACL
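For context on the pull model: Prometheus is configured (on the Prometheus side) to scrape the application's metrics endpoint on an interval. A minimal scrape configuration might look like this; the hostname and port are illustrative, not from this PR:

```
# Illustrative Prometheus scrape configuration (pull model).
scrape_configs:
  - job_name: 'loopback-app'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:3000']  # app exposing /metrics
```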
Force-pushed from c65f811 to b42e83c
As I commented in the other PR, can you please open a new PR to make the necessary changes to allow extensions to be hosted in
I am not familiar with Prometheus. What kind of metrics is it collecting? What kinds of metrics does it make sense to provide from a LoopBack application? What can be exposed out of the box, and what requires users to provide explicit configuration? For example:
I'd like the documentation for the extension (the README?) to better describe these aspects and educate LB4 users who are new to Prometheus.
Force-pushed from 96d7916 to 0964afd
@bajtos I have updated the README to include more information.
Force-pushed from bd3e6d9 to fbb7f30
Force-pushed from 0ba3204 to 9639c29
@bajtos PTAL.
The proposal looks reasonable. I'd like to discuss a few aspects and design decisions you have made.
```ts
// Only run the test on Travis with Linux
const verb =
  process.env.TRAVIS && os.platform() === 'linux' ? describe : describe.skip;
verb('Metrics (with push gateway)', function() {
```
Please use `skipIf` or `skipOnTravis` from `@loopback/testlab`. Example usage:
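The original example did not survive the page capture. As a stand-in, here is a simplified sketch of how such a helper dispatches between a test verb and its `.skip` variant; this is a hypothetical reimplementation for illustration, not the actual `@loopback/testlab` code, and `fakeDescribe` is a stand-in for mocha's `describe`:

```typescript
// Simplified stand-in for a skipIf-style helper (illustrative only).
type TestFn = (name: string, fn: () => void) => void;
type Verb = TestFn & {skip: TestFn};

function skipIf(skip: boolean, verb: Verb, name: string, fn: () => void): void {
  if (skip) verb.skip(name, fn);
  else verb(name, fn);
}

// A fake `describe` that records which variant was chosen.
const calls: string[] = [];
const fakeDescribe = Object.assign(
  (name: string, _fn: () => void) => {
    calls.push(`run:${name}`);
  },
  {
    skip: (name: string, _fn: () => void) => {
      calls.push(`skip:${name}`);
    },
  },
);

// Equivalent of the Travis/Linux guard from the diff above:
const onTravisLinux = false; // stand-in for process.env.TRAVIS && os.platform() === 'linux'
skipIf(!onTravisLinux, fakeDescribe, 'Metrics (with push gateway)', () => {});
console.log(calls[0]); // → skip:Metrics (with push gateway)
```

The real helper keeps the suite visible as "pending" in the test report instead of silently dropping it, which is why it is preferable to swapping `describe` for `describe.skip` by hand.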
`skipIf(describe, '...', () => {});` does not compile.
> `skipIf(describe, '...', () => {});` does not compile.

That's a known limitation of the current version and/or TypeScript. Did you try the slightly longer form I showed in my comment?
`skipIf<[(this: Suite) => void], void>(`
```ts
  process.env.TRAVIS && os.platform() === 'linux' ? describe : describe.skip;
verb('Metrics (with push gateway)', function() {
  // eslint-disable-next-line no-invalid-this
  this.timeout(30000);
```
Is this going to increase the duration of `npm test` by another 10-30 seconds? I am concerned that `npm test` is already taking too long to finish to make TDD practical; I am reluctant to make it even worse.
I just set the timeout conservatively to allow the `prom/pushgateway` docker container to be up and running. We may have to define a nightly build to run the costly integration tests. What do you think?
So how long does it usually take to get the `prom/pushgateway` docker container up and running? When I run `npm test` on my local machine for the second time, how long a delay will be introduced because of waiting for docker?
> We may have to define a nightly build to run the costly integration tests. What do you think?
I feel it's not enough to run these tests nightly. If a pull request breaks one of these tests, then we will discover the problem too late.
Can we use the approach I have in place for running `repository-test` tests against real databases? Here is the gist:
- These tests ARE NOT run as part of `npm test`.
- There are clear instructions on how to run the tests locally - see e.g. the MySQL instructions.
- The tests expect external services like databases to be already running and available. This way we pay the cost of starting the services only once, not for every test run.
- There is a single Travis CI job for each test suite (MongoDB, MySQL, etc.); these jobs are executed in parallel with other jobs like `npm test`, code linting, commit linting, etc. Example job config: .travis.yml#L53-L69 - it does not use Docker, because native MySQL is faster to set up, but Travis CI does support docker.
Can you write a mock-up pushgateway that will call our metrics endpoint the same way as the real gateway does? That way we can verify push functionality from `extensions/metrics` tests.

Then we can add a new package to the `acceptance` directory, e.g. `acceptance/push-metrics`, where we will use a real push gateway running in a docker container, to ensure our push implementation works with real gateways too and to catch any discrepancies between our mock gateway and real gateways.
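To make the mock-gateway idea concrete, here is a hedged sketch of the core contract such a mock must honor; it is a hypothetical helper, not the actual test code from this PR. Per the Pushgateway HTTP API, prom-client PUTs/POSTs the Prometheus text format to `/metrics/job/<job>[/<label>/<value>...]`, so the mock only needs to accept those paths and record the body for assertions:

```typescript
// Hypothetical mock Pushgateway core (HTTP plumbing omitted for brevity).
interface PushedMetrics {
  path: string;
  body: string;
}

class MockPushgateway {
  readonly received: PushedMetrics[] = [];

  // What the mock's HTTP handler does with each incoming push request.
  handlePush(path: string, body: string): number {
    if (!path.startsWith('/metrics/job/')) return 404; // not a push route
    this.received.push({path, body});
    return 200; // success, like the real gateway
  }
}

// Simulate the request prom-client's push would issue:
const gw = new MockPushgateway();
const status = gw.handlePush(
  '/metrics/job/loopback',
  'loopback_invocation_total 1\n',
);
console.log(status, gw.received[0].path); // → 200 /metrics/job/loopback
```

A test can then assert on `gw.received` to verify the component pushed the expected metrics, while the docker-based acceptance suite guards against drift between this mock and the real gateway.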
Can we remove this 30 second timeout now, or at least reduce it to something like 5-10s?
Also I see that you introduced a mock push gateway, which is great! But how can we be sure that it's accurately simulating the behavior of a real gateway? Shouldn't we have an acceptance test using the docker-based gateway, as I proposed in my comment above?
I think the mock-up push gateway gives us enough confidence, as we just have to make sure this component is pushing metrics to the gateway (the correctness should have been covered by prom-client).
Would you like to expose the README of this new component in https://loopback.io/doc/en/lb4/Using-components.html?
Force-pushed from 91ffe33 to 75ad58e
@bajtos PTAL
Force-pushed from 75ad58e to b65f338
The patch looks much better now! I'd like to discuss a few points before approving it.
extensions/metrics/src/__tests__/acceptance/mock-pushgateway.ts (outdated, resolved)
Force-pushed from b65f338 to 2c8cfcb
Force-pushed from 33233ad to e3b60b9
Force-pushed from e3b60b9 to 08cdb79
@bajtos PTAL
I quickly skimmed through the changes and have a few more comments.
Please get at least one more person from the team (@strongloop/loopback-maintainers) to review the changes too.
extensions/metrics/src/__tests__/acceptance/metrics-push.acceptance.ts (outdated, resolved)
Force-pushed from ba183e4 to cb0aa16
Force-pushed from 94ce1b5 to dfb854c
I don't have any more comments.
Please get approval from at least one more person from @strongloop/loopback-maintainers before landing.
As a PoC, this looks good. However, I am not sure if hardwiring a core extension to a particular service is a good idea. Ideally, its interface should be an adapter: users should be able to use Prometheus alternatives if they want to.
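To illustrate the adapter suggestion, here is a hypothetical interface sketched from this comment; it is not part of the actual extension, and the names (`MetricsAdapter`, `incrementCounter`, `report`) are invented for illustration. The component would program against the interface, and the Prometheus/prom-client backend would be just one pluggable provider:

```typescript
// Hypothetical adapter interface (names invented for illustration).
interface MetricsAdapter {
  incrementCounter(name: string): void;
  report(): string; // text exposition for pull-based scraping
}

// A trivial in-memory backend showing an alternative implementation.
class InMemoryAdapter implements MetricsAdapter {
  private counters: Record<string, number> = {};

  incrementCounter(name: string): void {
    this.counters[name] = (this.counters[name] ?? 0) + 1;
  }

  report(): string {
    return Object.entries(this.counters)
      .map(([n, v]) => `${n} ${v}`)
      .join('\n');
  }
}

const adapter: MetricsAdapter = new InMemoryAdapter();
adapter.incrementCounter('loopback_invocation_total');
adapter.incrementCounter('loopback_invocation_total');
console.log(adapter.report()); // → loopback_invocation_total 2
```

With such a seam in place, swapping Prometheus for an alternative backend would mean binding a different `MetricsAdapter` provider rather than forking the component.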
```ts
  targetName: invocationCtx.targetName,
});
try {
  this.counter.inc();
```
A question for the counter: I ran the demo and noticed the method invocation count is 105:

```
# HELP loopback_invocation_total method invocation counts
# TYPE loopback_invocation_total counter
loopback_invocation_total 105
```

The demo app doesn't have any controllers/endpoints, so I am wondering... what are the methods being invoked?
Good question. We have a built-in controller in `extensions/metrics`. So whenever `/metrics` is scraped, our metrics interceptor is triggered.
ah, that makes sense 👍
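The self-counting effect described above can be sketched without prom-client; the names here are simplified stand-ins, not the extension's actual code. The built-in metrics controller goes through the same global interceptor as every other controller method, so each scrape is itself counted:

```typescript
// Simplified sketch: why scraping /metrics bumps the invocation counter.
class Counter {
  value = 0;
  inc(): void {
    this.value++;
  }
}

const invocationTotal = new Counter();

// Stand-in for the metrics interceptor wrapping any controller invocation.
function intercept<T>(target: () => T): T {
  invocationTotal.inc();
  return target();
}

// The metrics endpoint is a controller too, so it is intercepted as well.
function scrapeMetrics(): string {
  return intercept(
    () =>
      `# TYPE loopback_invocation_total counter\n` +
      `loopback_invocation_total ${invocationTotal.value}\n`,
  );
}

console.log(scrapeMetrics()); // first scrape reports 1 - it counted itself
console.log(scrapeMetrics()); // second scrape reports 2
```

This explains why a demo app with no controllers of its own still shows a growing `loopback_invocation_total`: each Prometheus scrape of `/metrics` is an invocation.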
Force-pushed from dfb854c to 02ec7fa
LGTM 🚢 I ran the demo and tried the /metrics endpoint; the report looks reasonable.
I have a general question about Prometheus: is it also aimed at monitoring particular endpoints, or is it more for monitoring the health of an app?
My understanding of `@loopback/extension-metrics` is that people can use it to monitor their app or project, like the demo. Do we plan to add a new package under `/metrics` to monitor our project with this extension?
Force-pushed from 99d97cb to bea38cc
We already have
Force-pushed from 67cb790 to aa9c952
Force-pushed from aa9c952 to 3202f7d
PoC for https://prometheus.io/ integration

Checklist

👉 Read and sign the CLA (Contributor License Agreement) 👈
- `npm test` passes on your machine
- `packages/cli` were updated
- `examples/*` were updated

👉 Check out how to submit a PR 👈