Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CON-181 - joaquin - Add PrometheusClient to ContentNode #2888

Closed
wants to merge 38 commits into from

Conversation

joaquincasares
Copy link
Contributor

Description

Add PrometheusClient to ContentNode with a simple test metric to graph track.js:upload() times.

Tests

Tested by uploading a track to a local audius-client w/ a remote-dev backend. This populated some tracks sample metrics while also testing the importer.

$ curl http://localhost:4001/prometheus_metrics
...
# HELP audius_cn_routes_tracks_runtime_seconds Runtimes for src.routes.tracks:/tracks
# TYPE audius_cn_routes_tracks_runtime_seconds histogram
audius_cn_routes_tracks_runtime_seconds_bucket{le="0.005",scope="full"} 0
audius_cn_routes_tracks_runtime_seconds_bucket{le="0.01",scope="full"} 0
audius_cn_routes_tracks_runtime_seconds_bucket{le="0.025",scope="full"} 0
audius_cn_routes_tracks_runtime_seconds_bucket{le="0.05",scope="full"} 0
audius_cn_routes_tracks_runtime_seconds_bucket{le="0.1",scope="full"} 0
audius_cn_routes_tracks_runtime_seconds_bucket{le="0.25",scope="full"} 0
audius_cn_routes_tracks_runtime_seconds_bucket{le="0.5",scope="full"} 0
audius_cn_routes_tracks_runtime_seconds_bucket{le="1",scope="full"} 0
audius_cn_routes_tracks_runtime_seconds_bucket{le="2.5",scope="full"} 0
audius_cn_routes_tracks_runtime_seconds_bucket{le="5",scope="full"} 0
audius_cn_routes_tracks_runtime_seconds_bucket{le="10",scope="full"} 0
audius_cn_routes_tracks_runtime_seconds_bucket{le="+Inf",scope="full"} 1
audius_cn_routes_tracks_runtime_seconds_sum{scope="full"} 598
audius_cn_routes_tracks_runtime_seconds_count{scope="full"} 1

How will this change be monitored? Are there sufficient logs?

Once deployed to staging, I'll upload a track and ensure metrics appear.

Once soaking in prod, I'll ensure metrics appear without issues.

@joaquincasares joaquincasares requested a review from SidSethi April 8, 2022 23:02
@joaquincasares joaquincasares changed the title Jc asi 983 Add PrometheusClient to ContentNode Apr 8, 2022
Copy link
Contributor

@SidSethi SidSethi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think i should be solo/primary reviewer here as i have zero context into prometheus. should be @dmanjunath

also would be good to clarify the purpose of these metrics vs eg our local redis metrics. are we planning on moving all of those to prometheus?

given this is a new pattern which no one else has context on, might be good to add a README in CN for our prometheus usage pattern

also the Discprov code changes should be separated into a separate PR, however minor. and someone with Discprov codebase familiarity should review those

@@ -58,16 +58,14 @@ def save_time(self, labels=None, start_time=None):
self.save(self.elapsed(start_time), labels)

def save(self, value, labels=None):
this_metric = self.metric
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this actually work? wouldn't the below assignments just modify the value of this_metric as opposed to updating self.metric?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, it does!

self.metric.labels() returns a new Metric object which we will then use to call .set() or value() with.

We don't seek to change self.metric since we might want to call self.metric.set() later without including any labels, but sometimes we do want to call self.metric.labels(labels).set().

New PR here: #2901

Copy link
Contributor

@dmanjunath dmanjunath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to @SidSethi's comments and left a few comments of my own

Copy link
Contributor

@SidSethi SidSethi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aside from several implementation related comments, i think it would be good to move the prometheus code into src/services/metrics/* folder and add a README for common usage patterns bc i found it pretty difficult to follow what was happening


module.exports = function (app) {
app.get(
'/prometheus_metrics',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we document what this route is doing
iiuc:

  1. have prometheus populate all metrics into a "collector"
  2. pull info from the collectors via ".metrics()"

would be nice to rewrite this a bit as:

try {
  // TODO document - what do these do
  PrometheusMetric.populateCollectors()
  const data = await Prometheus.register.metrics()

  // TODO - what/why
  res.set('Content-Type', Prometheus.register.contentType)

  return successResponse(data)
  
} catch (e) {
  console.error('Prometheus Metrics Exporter error:', e)
  return errorResponseServerError(e.message)
}```

take a look at how other routes do response handling e.g. https://github.com/AudiusProject/audius-protocol/blob/jc-asi-983/creator-node/src/routes/tracks.js#L595

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth rate limiting or caching response from this route?
not sure how expensive the processing inside this function is

cc @dmanjunath

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nah probably not worth doing either tbh. i think prometheus also clears a bunch of metrics automatically once it's collected

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SidSethi could you look at the new layout and confirm that is clearer?

I couldn't use successResponse since that only returns json objects, not raw text.

re: rate limiting, that's something that we discussed looking into if it became a problem by using a "secret api key" specifically for the /prometheus_metrics endpoint.

// set metric prefix of audius_project_
name = `audius_cn_${name}`

this.metricType = metric_type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we instead just define the constructor param as

constructor(name, description, labelNames, metric_type = PrometheusType.HISTOGRAM)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually even better - lets not allow a default and force each caller to explicitly specify the metric type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the constructor to use named parameters, so I believe I've lost the ability to set default parameters, right?

For almost all cases, we'll want histograms since those "time things", but sometimes we'll want gauges to see what the current stat is.

this.metricType = PrometheusType.HISTOGRAM
}

// CollectorRegistries must be uniquely named
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats a CollectorRegistry?

Copy link
Contributor Author

@joaquincasares joaquincasares Apr 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not allowed to call new PrometheusMetricClass({name}) twice with the same name.

Instead, we will use something like a Python defaultdict to create and return new PrometheusMetricClass({name}) if we haven't seen it before, or the previously created instance if we have seen it before.

A good hierarchical overview is:

  • Prometheus-Client Registries hold Collectors.
  • Collectors hold registers of the metrics that they are holding.
  • Exporters export all metrics held within registries for each collector held within the main registry.
  • Prometheus-Client provides both Metric Registries as well as an Exporter for the Registries.
  • Prometheus scrapes Exporters to import all metrics.

*/
saveTime(labels, startTime) {
this.save(this.elapsed(startTime), labels)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this nested flow is confusing
saveTime calls never specify startTime, so we should define the constructor here as
saveTime(labels, startTime = Date.now())
instead of nesting that default in elapsed()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

saveTime is saving the time it took between when the constructor was called (first two lines of the constructor) and when saveTime() is called.

However, you can choose to overload saveTime for different reasons while still relying on the same metric name.

What startTime can't be is Date.now() since we'll want Date.now() - startTime as seen in elapsed().

handleResponse(async (req, res) => {
try {
PrometheusMetric.populateCollectors()
// res.set('Content-Type', Prometheus.register.contentType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this line?

const { getApp } = require('./lib/app')
const { getLibsMock } = require('./lib/libsMock')

describe('test Users', async function () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test Prometheus?


module.exports = function (app) {
app.get(
'/prometheus_metrics',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nah probably not worth doing either tbh. i think prometheus also clears a bunch of metrics automatically once it's collected

@SidSethi SidSethi changed the title Add PrometheusClient to ContentNode CON-181 Add PrometheusClient to ContentNode May 31, 2022
@SidSethi SidSethi changed the title CON-181 Add PrometheusClient to ContentNode CON-181 Add PrometheusClient to ContentNode (from joaquin) May 31, 2022
@SidSethi SidSethi changed the title CON-181 Add PrometheusClient to ContentNode (from joaquin) CON-181 v1 Add PrometheusClient to ContentNode (from joaquin) Jun 1, 2022
@SidSethi SidSethi changed the title CON-181 v1 Add PrometheusClient to ContentNode (from joaquin) CON-181 - joaquin - Add PrometheusClient to ContentNode Jun 3, 2022
@SidSethi
Copy link
Contributor

closing in favor of #3166 🙂 thx joaquin for all the help

@SidSethi SidSethi closed this Jun 14, 2022
@SidSethi SidSethi deleted the jc-asi-983 branch June 14, 2022 16:44
@AudiusProject AudiusProject deleted a comment from linear bot Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants