The distributed coordinate subsystem can be infected by floating point NaN values. #3023
Comments
Thanks for the report @rboyer. We will take a look through the math again and try to mitigate any other NaN-producing spots, and will likely do the first two suggestions. Will think about the third.

As we do not currently use coordinates for anything explicitly, my immediate plans are to reconfigure all of our clusters to set

I poked around but, without something akin to the 3rd point above, I can't figure out a good way to purge all of the

More debugging breadcrumbs: We scraped any

Sorry for the hassle on this one. If you stopped your cluster and deleted all those lines from the

@rboyer if you could share any of your

FYI /ui/#/dc/services/ uses coordinates and breaks completely on
consul version for Server
Server: 0.7.1

consul info for Server
Server:
Operating system and Environment details
Description of the Issue (and unexpected/desired result)
Somehow NaN values worked their way through the distributed coordinate subsystem in one of our Consul clusters. It was first detected as a failure to JSON-serialize some *coordinate.Coordinate structs for the following endpoints:

- /v1/agent/self
- /v1/coordinate/nodes
In an effort to see which fields in the server's representation of the coordinates were NaN, I recompiled Consul with the following patch and then hit the /v1/coordinate/nodes endpoint to get the Consul server to dump the coordinate database to stdout:

This yielded the amusing results below:
It's everywhere! Our current working theory is that NaN is infectious: once any of the vector math, scaling, or the rest of the Vivaldi computation processes even one NaN value, every arithmetic result derived from it becomes NaN as well. This is pretty awful.
Amusingly, you can serialize a NaN in msgpack but not in JSON.

Initial ideas for how to prevent this from happening in the future:

1. NaN (and possibly Inf) and reject the coordinate before it gets internalized
2. NaN values and reset the coordinates to shake out bad floating point numbers

Reproduction steps
It's unclear where the original NaN value came from, given how many obvious places in the code already safeguard against this.