-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(inputs.tacacs): Add tacacs plugin for simple tacacs auth response time monitoring #12747
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for contributing this plugin @Hr0bar! I have some comments, mostly the same as for the radius one. Can you please also add a unit-test?!
Ill try to check on this likely over the coming weeks, likely most stuff that was done for the radius plugin will be applicable here as well. Good thing the tacacs GO package seem to have some basic testing server implementation for tests and https://hub.docker.com/r/dchidell/docker-tacacs seem to be hopefully usable for integration tests. |
Hi @Hr0bar, Any further progress on this PR recently? Thanks! |
Not yet, I shall have more time after April 22 (getting married, so not
much free time because of preparations)
…On Tue, Apr 4, 2023 at 4:16 PM Joshua Powers ***@***.***> wrote:
Hi @Hr0bar <https://github.com/Hr0bar>,
Any further progress on this PR recently?
Thanks!
—
Reply to this email directly, view it on GitHub
<#12747 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABTNGZQTLMJ62UBTPX75XEDW7QULLANCNFSM6AAAAAAVJNNOIE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Thanks
Congrats! I know how busy that time can be :) edit: I am going to mark this as a draft until you do another push and ping us to review again. |
Applied most of the comments. However: Tried using the status code returned, but it was very useless as the status depends on the current tacacs auth sequence. On initial start packet, return code X may mean something completely different than return code X for send username packet, then X may mean something else on send password packet etc. So without the secondary info at what stage the status code was returned, its useless. Anyway, its meant to monitor SUCCESSFUL authentications response time, so decided to simply return an error in all other cases. Let me know if that makes sense ? Deleted responsetime, kept only responsetime_ms, same as in the radius plugin. "Maybe also check for other empty required fields?" - mandatory fields are uncommented in the sample conf. If not set, I think these should catch it:
The other non mandatory are now defaults:
I could not move the client initialization to Init(), as the client itself contains the secret, it would be unnecessary stored in memory without defer releasesecret, or it gets cleaned from memory and secret is then broken for connections. Maybe there is a solution, but seems like unnecessary struggle (tried once, definitely can be avoided, just did not spent time on it, let me know if its a must to have it in Init). |
BTW the "Git repo dirty. Please run "make docs" and push the updated README. Failing CI." in "ci/circleci: test-go-linux" is likely caused by me changing README.MD -> README.md filename in GIT some time ago in this PR. Ran make docs in the past after that, but still pops up |
Hey @Hr0bar, regarding
running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Hr0bar thanks for the update. Just a few minor comments...
Thanks, Ill check these comments when Ill be back from travelling ~mid
June. Or feel free to adjust it until then.
…On Wed, May 17, 2023 at 3:44 PM Sven Rebhan ***@***.***> wrote:
***@***.**** commented on this pull request.
@Hr0bar <https://github.com/Hr0bar> thanks for the update. Just a few
minor comments...
------------------------------
In plugins/inputs/tacacs/README.md
<#12747 (comment)>
:
> @@ -0,0 +1,49 @@
+# Tacacs Input Plugin
+
+The Tacacs plugin collects successful tacacs authentication response times.
Is there an implementation/software we can or should reference?
------------------------------
In plugins/inputs/tacacs/tacacs.go
<#12747 (comment)>
:
> + "sync"
+ "time"
+
+ "github.com/nwaples/tacplus"
+
+ "github.com/influxdata/telegraf"
+ "github.com/influxdata/telegraf/config"
+ "github.com/influxdata/telegraf/plugins/inputs"
+)
+
+type Tacacs struct {
+ Servers []string `toml:"servers"`
+ Username config.Secret `toml:"username"`
+ Password config.Secret `toml:"password"`
+ Secret config.Secret `toml:"secret"`
+ RemAddr string `toml:"request_ip"`
Can we please use a more speaking name for the variable!?
⬇️ Suggested change
- RemAddr string `toml:"request_ip"`
+ RequestAddr string `toml:"request_ip"`
------------------------------
In plugins/inputs/tacacs/tacacs.go
<#12747 (comment)>
:
> +
+ "github.com/influxdata/telegraf"
+ "github.com/influxdata/telegraf/config"
+ "github.com/influxdata/telegraf/plugins/inputs"
+)
+
+type Tacacs struct {
+ Servers []string `toml:"servers"`
+ Username config.Secret `toml:"username"`
+ Password config.Secret `toml:"password"`
+ Secret config.Secret `toml:"secret"`
+ RemAddr string `toml:"request_ip"`
+ ResponseTimeout config.Duration `toml:"response_timeout"`
+ Log telegraf.Logger `toml:"-"`
+ clients []tacplus.Client
+ testAuthStart tacplus.AuthenStart
Why does it have a test prefix? Can we simply name it authStart or auth
or similar?
------------------------------
In plugins/inputs/tacacs/tacacs.go
<#12747 (comment)>
:
> + if net.ParseIP(t.RemAddr) == nil {
+ return fmt.Errorf("Invalid ip address provided for request_ip: %s", t.RemAddr)
+ }
Your sample.conf says the default is 127.0.0.1 and I think that should be
set here (instead of in init()) if empty. Furthermore, errors should
start with lowercase letters in Golang...
------------------------------
In plugins/inputs/tacacs/tacacs.go
<#12747 (comment)>
:
> + for index, serverValue := range t.Servers {
+ t.clients = append(t.clients, tacplus.Client{})
+ t.clients[index].Addr = serverValue
+ t.clients[index].ConnConfig = tacplus.ConnConfig{}
+ }
Maybe
⬇️ Suggested change
- for index, serverValue := range t.Servers {
- t.clients = append(t.clients, tacplus.Client{})
- t.clients[index].Addr = serverValue
- t.clients[index].ConnConfig = tacplus.ConnConfig{}
- }
+ t.clients = make([]tacplug.Client, 0, len(t.Servers))
+ for _, server := range t.Servers {
+ t.clients = append(t.clients, tacplus.Client{
+ Addr: server,
+ ConnConfig: tacplus.ConnConfig{},
+ })
+ }
------------------------------
In plugins/inputs/tacacs/tacacs.go
<#12747 (comment)>
:
> + // Check the returned status
+ if reply.Status != tacplus.AuthenStatusGetUser {
+ return fmt.Errorf("error on new tacacs authentication start request to %s : Unexpected response code %d", client.Addr, reply.Status)
+ }
I don't remember what we did on radius, but it might make sense to have
this in the metric to make it discoverable to monitoring....
------------------------------
In plugins/inputs/tacacs/tacacs_test.go
<#12747 (comment)>
:
> + "github.com/influxdata/telegraf/config"
+ "github.com/influxdata/telegraf/testutil"
+ "github.com/nwaples/tacplus"
+ "github.com/stretchr/testify/require"
+ "github.com/testcontainers/testcontainers-go/wait"
⬇️ Suggested change
- "github.com/influxdata/telegraf/config"
- "github.com/influxdata/telegraf/testutil"
- "github.com/nwaples/tacplus"
- "github.com/stretchr/testify/require"
- "github.com/testcontainers/testcontainers-go/wait"
+ "github.com/nwaples/tacplus"
+ "github.com/stretchr/testify/require"
+ "github.com/testcontainers/testcontainers-go/wait"
+
+ "github.com/influxdata/telegraf/config"
+ "github.com/influxdata/telegraf/testutil"
------------------------------
In plugins/inputs/tacacs/tacacs.go
<#12747 (comment)>
:
> + ResponseTimeout config.Duration `toml:"response_timeout"`
+ Log telegraf.Logger `toml:"-"`
+}
+
+//go:embed sample.conf
+var sampleConfig string
+
+func (t *Tacacs) SampleConfig() string {
+ return sampleConfig
+}
+
+func (t *Tacacs) Init() error {
+ if len(t.Servers) == 0 {
+ t.Servers = []string{"127.0.0.1"}
+ }
+ return nil
I do agree, it might be worth checking the options in Init() to fail
early and loudly!
—
Reply to this email directly, view it on GitHub
<#12747 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABTNGZU6KUCQPZGW4YF7TYTXGTI5JANCNFSM6AAAAAAVJNNOIE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Ok. It's a pitty as we will likely miss v1.27.0 which is planned for June 12th... :-( |
@Hr0bar any chance you work on this PR? |
I added all the changes, BUT the one regarding returning the return status in case of failures likely needs more thought. I added the response_code tag, and returning timeout value for response time, same as in the other plugin. That is all OK, but it works for wrong password, wrong username etc, but it doesnt cover for example wrong tacacs secret. In that case client.SendAuthenStart returns hard error, and no response code is returned. I think it should be consistent for the user, so as I commented on that suggestion already, what about creating the response codes completely custom and human readable for telegraf purposes? Instead of the hard errors there were previously. |
So either:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Hr0bar for the update! The code looks good to me and to me the current response-code strategy also seems good. IMO we error out if the sending reports an error. If the sending succeeds but the server reports some unexpected state we pass that information on to the user...
The only comment I have is that currently we do not report the (context/connection) timeout to the user. I think you should make an exception there when checking the error codes and also generate a metric for this case. This way an operator can see that Telegraf cannot reach the server and check what is going on...
Let me know, if the tests are readable/by telegraf standards enough, or if additional refactoring is needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, looks good. Just a few minor things..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks for contributing this plugin @Hr0bar!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question inline to help me understand the reported value for response time.
plugins/inputs/tacacs/tacacs.go
Outdated
if !errors.Is(err, context.DeadlineExceeded) { | ||
return fmt.Errorf("error on tacacs authentication continue username request to %s : %w", client.Addr, err) | ||
} | ||
fields["responsetime_ms"] = time.Duration(t.ResponseTimeout).Milliseconds() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize this code path means there was a timeout, but why is the response time set to the timeout value and not the real time?
Same question below, where it would be even more important to see how long the cumulative request time took.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is a valid point, especially for users who would set the timeout to zero. In other cases, it will be close to the configured timeout value anyway. With zero timeout configured it can theoretically also timeout from other sources than the configured context deadline (from some network functions called down the line). Maybe in addition to context.DealineExceeded error we should also allow errors.Is(err, os.ErrDeadlineExceeded) for these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the used tacacs package seems to be correctly setting the timeout to the deeper used network calls, I could not deterministically say that it will always return only context.DeadlineExceeded error on timeout.
It seemd os.IsTimeout() would be better here, as it returns True for both context.DeadlineExceeded and os.ErrDeadlineExceeded and possibly other timeout time errors, but after testing it it was too lenient/broad and it was returning True even for immediate dial tcp or dns errors etc so will keep
if !errors.Is(err, context.DeadlineExceeded) && !errors.Is(err, os.ErrDeadlineExceeded)
The timeout time returned is now real value.
Download PR build artifacts for linux_amd64.tar.gz, darwin_amd64.tar.gz, and windows_amd64.zip. 📦 Click here to get additional PR build artifactsArtifact URLs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the effort and work on this!
Opening PR after discussion in this Issue: #12573 and separating the tacacs plugin into a separate PR as discussed int #12736