JSON marshalling of GoCollector metrics in v1.12.1 results in NaN error #981

Closed
tsandall opened this issue Feb 1, 2022 · 7 comments · Fixed by #1049

tsandall (Contributor) commented Feb 1, 2022

We recently upgraded from v1.12.0 to v1.12.1 and noticed that the GoCollector metrics are generating JSON marshalling errors because of NaN values. The metrics in question are go_gc_pauses_seconds_total and go_sched_latencies_seconds. This is on Go 1.17 if that helps.

Here's a small program that reproduces the issue.

$ go run x.go
err: json: unsupported value: NaN

x.go:

package main

import (
	"encoding/json"
	"fmt"
	"log"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/collectors"
)

func main() {
	registry := prometheus.NewRegistry()
	registry.MustRegister(collectors.NewGoCollector())
	result, err := registry.Gather()
	if err != nil {
		log.Fatal(err)
	}

	_, err = json.Marshal(result)
	fmt.Println("err:", err)
}

go.mod:

module github.com/tsandall/prometheus-go-collector-bug

go 1.17

require github.com/prometheus/client_golang v1.12.1

require (
	github.com/beorn7/perks v1.0.1 // indirect
	github.com/cespare/xxhash/v2 v2.1.2 // indirect
	github.com/golang/protobuf v1.5.2 // indirect
	github.com/matttproud/golang_protobuf_extensions v1.0.1 // indirect
	github.com/prometheus/client_model v0.2.0 // indirect
	github.com/prometheus/common v0.32.1 // indirect
	github.com/prometheus/procfs v0.7.3 // indirect
	golang.org/x/sys v0.0.0-20220114195835-da31bd327af9 // indirect
	google.golang.org/protobuf v1.26.0 // indirect
)
beorn7 (Member) commented Feb 2, 2022

NaN (and +Inf and -Inf) are valid metric values in Prometheus. I'm not sure if they make sense in this particular case, but for a robust solution, the JSON marshalling needs to be fixed so that it can deal with those special floating point values (e.g. by rendering Prometheus metric values as strings, as it is done in the Prometheus HTTP query API).

In different news, I wasn't aware that anyone tries to JSON-marshal the protobuf DTOs. Not sure if that's a good idea in general…

fabxc (Contributor) commented Feb 12, 2022

I just ran into a similar issue due to the histogram sum being NaN. The spec is at least not very explicit about this field being optional, and certainly not about it possibly being NaN. Is this worth clarifying? These metrics might be the first to do this.

srenatus (Contributor) commented Mar 1, 2022

> In different news, I wasn't aware that anyone tries to JSON-marshal the protobuf DTOs. Not sure if that's a good idea in general…

Thanks for pointing that out. We've taken some steps to avoid using anything but jsonpb on the relevant objects here, resolving the issue @tsandall reported initially. (We collaborate on OPA, which is where this came up recently.)

bwplotka (Member) commented

Thanks all. Sounds like this issue can be closed. Let us know if it is not in fact resolved and we will reopen. 👍🏽

andig commented Apr 21, 2022

This is a real problem: popular monitoring solutions like InfluxDB currently choke on NaN values (see linked issue). It is very unfortunate to exploit a so-far-unused property of the spec in a bugfix release.

Would it be reasonable to make this behaviour opt-in, to allow for a gradual upgrade path? Otherwise we'll be stuck on 1.12.0 until downstream tooling is able to cope with NaNs.

andig commented Apr 30, 2022

@bwplotka imho this is not resolved. It breaks current InfluxDB scraper. Could you kindly reopen?

bwplotka reopened this Apr 30, 2022
bwplotka (Member) commented
Ack. In the new release (to be done) we reverted the new metrics by default, so it should be fixed for you.

Still, the underlying problem remains, so let's keep this open.


7 participants