Multiple Outputs in Collector Breaks Exposition Format #80

Open
jhwbarlow opened this issue Apr 16, 2024 · 1 comment

@jhwbarlow

Because the Collector API specifies the metric type and the help text as part of the newCollector() call, a collector cannot make multiple output() calls for different metrics without breaking the exposition format standard.

This is because the help text and type are printed only once for the entire collector, not once per metric. The exposition standard also says that different time series (label combinations) of the same metric should be grouped together under a single help text and type:

    # HELP http_requests_total The total number of HTTP requests.
    # TYPE http_requests_total counter
    http_requests_total{method="post",code="200"} 1027 1395066363000
    http_requests_total{method="post",code="400"}    3 1395066363000

Multiple output() calls tend to interleave the metrics when, for example, a collector loops over a set of similar resources and reports several different metrics about the same resource in each iteration. One could argue this is a programming error in the collector itself: it should loop over each metric and output a time series for every resource, rather than loop over each resource and output every metric for that resource.
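The metric-major ordering described above can be sketched as follows. This is only an illustration in plain Nim with no dependency on nim-metrics: `SocketStat`, `MetricDef`, `lenOf`, `limitOf`, `socketMetrics`, `renderGrouped`, and the help strings are all hypothetical names and values, and label-value escaping is left out for brevity.

```nim
import std/strutils

type
  SocketStat = object      # hypothetical stand-in for one row of `ss` output
    localAddr: string
    sendQueueLen: int
    maxSendQueueLen: int

  MetricDef = object       # hypothetical per-metric metadata (name and help per gauge)
    name, help: string
    value: proc (s: SocketStat): float64

proc lenOf(s: SocketStat): float64 = float64(s.sendQueueLen)
proc limitOf(s: SocketStat): float64 = float64(s.maxSendQueueLen)

# Assumed metric definitions; the help texts are placeholders.
let socketMetrics = @[
  MetricDef(name: "unix_socket_send_queue_len",
            help: "UNIX socket send queue length", value: lenOf),
  MetricDef(name: "unix_socket_send_queue_limit",
            help: "UNIX socket send queue limit", value: limitOf)
]

proc renderGrouped(sockets: seq[SocketStat], metrics: seq[MetricDef]): string =
  ## Outer loop over metrics, inner loop over resources, so each metric
  ## family stays contiguous under its own HELP/TYPE header.
  var lines: seq[string]
  for m in metrics:
    lines.add "# HELP " & m.name & " " & m.help
    lines.add "# TYPE " & m.name & " gauge"
    let getValue = m.value
    for s in sockets:
      # Label-value escaping is omitted here for brevity.
      lines.add m.name & "{local_addr=\"" & s.localAddr & "\"} " & $getValue(s)
  result = lines.join("\n") & "\n"
```

Calling `renderGrouped` with a list of sockets yields one HELP/TYPE pair per metric, followed by all of that metric's samples. The data is still gathered once, and only the order of emission changes.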

As an example, I have been playing around with an exporter to export Unix Socket metrics:

    import metrics, metrics/chronos_httpserver, posix

    const unixSocketSendQueueLenCollectorName = "unix_socket_send_queue_len"
    const unixSocketSendQueueLimitCollectorName = "unix_socket_send_queue_limit"
    const unixSocketCommonCollectorLabels = ["local_addr", "local_port", "peer_addr", "peer_port"]
    type UnixSocketSendQueueLenCollector = ref object of Collector

    method collect(self: UnixSocketSendQueueLenCollector, output: MetricHandler) =
      let timestamp = self.now()
      let mockSSDatasource = MockSocketStatsDatasource(data: DATA) # Just some mock data of `ss` output
      let ssLister = SocketStatsLister[MockSocketStatsDatasource](datasource: mockSSDatasource) # use SS for now to avoid netlink
      
      try:
        for socket in ssLister.list():        
            output(
              name = unixSocketSendQueueLenCollectorName,
              value = float64(socket.sendQueueLen),
              labels = unixSocketCommonCollectorLabels,
              labelValues = [socket.localAddr, $socket.localPort, socket.peerAddr, $socket.peerPort],
              timestamp = timestamp
            )
            output(
              name = unixSocketSendQueueLimitCollectorName,
              value = float64(socket.maxSendQueueLen),
              labels = unixSocketCommonCollectorLabels,
              labelValues = [socket.localAddr, $socket.localPort, socket.peerAddr, $socket.peerPort],
              timestamp = timestamp
            )
      except:
        # TODO
        discard

    discard UnixSocketSendQueueLenCollector.newCollector(
      name=unixSocketSendQueueLenCollectorName, # But this should not be per-collector, it should be per-gauge
      help="UNIX Socket send queue metrics", # But this should not be per-collector, it should be per-gauge
      labels=unixSocketCommonCollectorLabels # But what if I need different labels for each gauge? I don't, but some might.
    )

    startMetricsHttpServer()
    discard pause()

I want to loop through all the UNIX sockets in the system and report a couple of metrics (current send queue length and the send queue limit) on them (ignoring label cardinality explosions for now 😄 ).

But because the name and help text are defined at the collector level, this leads to invalid output (according to the exposition standard):

    # HELP unix_socket_send_queue_len UNIX Socket send queue length
    # TYPE unix_socket_send_queue_len gauge
    unix_socket_send_queue_len{local_addr="/run/dbus/system_bus_socket",local_port="33440",peer_addr="*",peer_port="35386"} 0.0
    unix_socket_send_queue_limit{local_addr="/run/dbus/system_bus_socket",local_port="33440",peer_addr="*",peer_port="35386"} 212992.0
    unix_socket_send_queue_len{local_addr="/run/systemd/journal/stdout",local_port="34516",peer_addr="*",peer_port="31117"} 0.0
    unix_socket_send_queue_limit{local_addr="/run/systemd/journal/stdout",local_port="34516",peer_addr="*",peer_port="31117"} 212992.0
    unix_socket_send_queue_len{local_addr="@/tmp/dbus-iLrUs0Z7H5",local_port="87027",peer_addr="*",peer_port="94610"} 0.0
    unix_socket_send_queue_limit{local_addr="@/tmp/dbus-iLrUs0Z7H5",local_port="87027",peer_addr="*",peer_port="94610"} 212992.0

Looking at the built-in metrics, they appear to suffer from the same issue: the help text and name are defined generically instead of specifically for each metric, which, as far as I read the spec, is incorrect.

Thanks!

@jhwbarlow
Author

Of course, multiple collectors (one per metric) could be used to work around this, but if the call to gather the data is expensive (for example, gathering all the UNIX sockets), that gathering would be repeated redundantly, once per collector. It also means the metrics would not reflect exactly the same snapshot in time, but that is minor.
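One possible way to soften the cost of the one-collector-per-metric workaround is to share a short-lived snapshot between the collectors. The sketch below is only an illustration of that idea: `SnapshotCache`, `SocketStat`, and `get` are hypothetical names that are not part of nim-metrics, the TTL is an arbitrary choice, and the code is not thread-safe as written.

```nim
import std/[monotimes, times]

type
  SocketStat = object            # hypothetical stand-in for one row of `ss` output
    localAddr: string
    sendQueueLen: int
    maxSendQueueLen: int

  SnapshotCache = object         # hypothetical helper, not part of nim-metrics
    ttl: Duration                # how long a gathered snapshot is reused
    fetchedAt: MonoTime
    valid: bool
    data: seq[SocketStat]

proc get(cache: var SnapshotCache,
         gather: proc (): seq[SocketStat]): seq[SocketStat] =
  ## Re-run the expensive gather only when no snapshot exists yet or the
  ## cached one is older than `ttl`; otherwise return the cached data.
  ## Not thread-safe: callers on several threads would need a lock.
  let now = getMonoTime()
  if not cache.valid or now - cache.fetchedAt > cache.ttl:
    cache.data = gather()
    cache.fetchedAt = now
    cache.valid = true
  result = cache.data
```

Each per-metric collector would call `get` on the same cache inside its `collect` method, so collectors scraped within the TTL see the same snapshot and the gather runs once. The trade-off is that samples can be up to one TTL stale.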
