inputs.gnmi: Authentication Broken starting in 1.29.2 #15236
FYI, this is NOT my production docker-compose.yml file. This is only for debug / release testing. I use the |
Hi, Here is the diff between those versions. There were no changes made to GNMI code, only a single spelling update to a comment.
This can be due to a number of things, even outside of Telegraf: for example, the date and time not being set correctly in your container, or a TLS protocol mismatch between the client and server. Of the 44 commits in that version of Telegraf, 6 stand out to me: 1 dependency update and 5 linter changes. None of the 5 linter PRs changed anything in GNMI or in anything it imports or uses. That leaves one dependency:
Please try to reproduce this outside of your container first. We want to eliminate any changes to the container or Docker environment as a possibility. If you are still able to reproduce it, then we can start bisecting between the versions. Thanks! |
The other potential place is the @whizkidTRW it would also be useful to know which TLS version the device is speaking... |
@powersj, the containers were recreated within minutes of each other and I just confirmed the date/time is correct. I did download the 1.29.1, 1.29.2, & 1.30.2 binaries and the behavior is the same. @srebhan, I confirmed the Ciena box is running TLS v1.2. I also went on to test against our Cisco IOS-XR boxes, and those are fine, so this is limited to the Ciena's, which doesn't really surprise me . . . Bear in mind, we have little control over the certificates these boxes are using due to the NMS putting them on the box for its own management purposes and they are self-signed, hence
|
If I have counted correctly these are the first few commits we will want to try out:
If that works, then we need to go later
If that fails, then we need to go earlier
I have put up #15240, which includes everything up to the 22nd commit. Could you please try that and let us know the result? Artifacts will be attached in ~25 mins. If you are comfortable with git and building Telegraf, you could run git bisect yourself, building and testing each version as you go, which would be a bit faster, but this shouldn't be too bad. The 22nd commit omits the protobuf update.
Thank you for taking the time to confirm. FWIW, when the containers are created may have no bearing on whether the time is set correctly. Keep in mind that we make changes to the underlying containers between versions, and DockerHub, because these are official images, also updates the images with underlying security fixes. So actually verifying is still important. Thanks again! |
Yes sir, I'm happy to test. Understood on the containers, will keep running it directly outside of docker for now as this is all just local to my machine right now, so easily managed. I've never done a git bisect, but I can follow any instructions if you want to provide them. I'll keep an eye out for the artifacts on your PR and download as soon as they're available. Thanks for jumping on this! |
Looks like the first artifacts are up: Those are from the 22nd commit. I have some other branches building now as well. |
Done. That worked perfectly fine:
|
Not what I expected at all 😱 I was already preparing some other PRs that dealt with the earlier dependency updates, but I'll go close those now... That result narrows it down to the 22 commits after that. 3 of those commits are related to the release, like build #, change log, so really 19. Of the 19, there are 5 dependabot dependency updates, and then I went through the remaining 14 and found 01f12c2 also updates the version of gRPC! Next steps: First, let's have you try #15246, which is from the 33rd commit. It is right before the gRPC update. If that works, then I think the next commit probably breaks you, as all the other commits after this are either to individual plugins or unrelated doc updates. If that fails, then we will need to start going through those final 5 dependabot updates. Expect a new artifact in 30 mins, assuming tests pass. |
Sounds good! I'm heading out of town for the weekend but can likely do some tests remotely late tonight. I'll post back an update as soon as I can. |
Feel free to wait until Monday and enjoy your weekend! I'm calling it shortly as well :) |
Had a few free minutes and grabbed the artifacts, still good at this point:
|
ok! That probably means the grpc library was the cause. For Monday, here is another PR that adds that commit: If that fails (and I sort of hope it does) then we need to look into what changed with the grpc library, could be and probably is an upstream issue. If that works, then I'll be very, very confused as to what is left to check. Thanks! |
Yep, that's it! Fails with that version. Went back one step to the previous version immediately after with the exact same config file just to verify and it was still good at that point, so yes, it must be the grpc library:
|
@powersj and @whizkidTRW: The GRPC library disables insecure ciphers by default starting from v1.60.0. The device reports
see here which is insecure as per In PR #15256 I allow passing the accepted ciphers via tls_cipher_suites = ["TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA"] Please let me know if this fixes the issue! |
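For readers landing here with the same symptom, the workaround above might look roughly like the following in telegraf.conf. This is only a sketch, assuming a Telegraf build that includes PR #15256: the address, credentials, and subscription values are placeholders rather than the reporter's actual settings, and the `tls_enable` and `insecure_skip_verify` lines are assumptions based on the self-signed certificates mentioned earlier in the thread. Only `tls_cipher_suites` and its value come from the comment above.

```toml
[[inputs.gnmi]]
  ## Placeholder target and credentials (hypothetical values)
  addresses = ["192.0.2.10:57400"]
  username = "gnmi-user"
  password = "changeme"

  ## Assumed option name for enabling client-side TLS; check the plugin
  ## README for the spelling used by your Telegraf version
  tls_enable = true

  ## Assumption: the device presents a self-signed certificate
  insecure_skip_verify = true

  ## From the comment above: explicitly accept the cipher suite the device
  ## offers, which gRPC >= v1.60.0 no longer enables by default
  tls_cipher_suites = ["TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA"]

  [[inputs.gnmi.subscription]]
    ## Placeholder subscription (hypothetical path and interval)
    name = "ifcounters"
    path = "/interfaces/interface/state/counters"
    subscription_mode = "sample"
    sample_interval = "10s"
```

Listing the suite explicitly only re-enables it for this plugin instance, so devices that already negotiate a stronger suite should not need the setting.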
@srebhan, that worked perfectly!!! And I did confirm it works for both Ciena that was broken and continues to work for Cisco.
|
Thanks a lot for testing @whizkidTRW! |
Thanks to both of you for addressing it so quickly! |
Relevant telegraf.conf
Logs from Telegraf
System info
Telegraf 1.29.2, macOS 14.4.1, Docker 26.0.0
Docker
Steps to reproduce
Expected behavior
Proper subscription is authenticated and able to subscribe to data
Actual behavior
Authentication handshake fails
Additional info
Exact same config was working in 1.29.1 and prior. I was attempting to upgrade my system from 1.28.2 to the latest, 1.30.2, when I experienced this behavior. Backing down in versions, I identified it works in 1.29.1 and fails starting in 1.29.2. No configuration was changed between 1.29.1 working and >1.29.2 failing.