-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Wildcard Refresh Interval #276
Conversation
Adds new field wildcard_refresh_interval to the logging files receiver in the ops-agent config.
9325aa4
to
b95a9dd
Compare
@@ -0,0 +1 @@ | |||
cannot unmarshal uint64 into Go struct field UnifiedConfig.Logging of type time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can reuse some similar validation / error message customization that collection_interval
uses? In theory, the same type of validations should apply to both, except the minimum is different.
References:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been trying to figure this out but haven't come up with a solution. That spot in config.go
only seems to be called if it fails a validation, but this error is failing during the unmarshalling step. It looks like we also don't handle it for invalid port numbers, since those fail at the unmarshalling step as well. Reference: https://github.com/GoogleCloudPlatform/ops-agent/blob/master/confgenerator/testdata/invalid/linux/logging-receiver_tcp_type_invalid_parameter_listen_port_is_not_an_int/golden_error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about something like
ops-agent/confgenerator/config.go
Line 359 in 8677126
CollectionInterval string `yaml:"collection_interval" validate:"required,duration=10s"` // time.Duration format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was only done as a band-aid fix for the fact that the go-yaml
library couldn't unmarshal the time.Duration
type. Now that it can (see goccy/go-yaml#246) we'd rather use that than a string with a custom validator.
The reason we didn't run into this problem with CollectionInterval
is because it will unmarshal into a string, but fails on our duration
validator. (FWIW this also resulted in an unhelpful error message of "did not meet minimum duration").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the duration
validator was written before the YAML parser could unmarshal directly into a time.Duration
. I think it's totally fine to have the normal unmarshal error if there is garbage in the input config, so don't worry too much about what error you get.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. Do I get it right that we are basically choosing between two options:
Option 1. Use a custom validator like duration
.
Pros:
- Better user experience as the error message is much clearer (Sample errors in https://github.com/GoogleCloudPlatform/ops-agent/blob/master/confgenerator/testdata/invalid/linux/metrics-receiver_invalid_parameter_collection_interval_below_minimum/golden_error and https://github.com/GoogleCloudPlatform/ops-agent/blob/master/confgenerator/testdata/invalid/linux/metrics-receiver_invalid_parameter_malformed_collection_interval/golden_error)
Cons
- We need to maintain a custom validator
duration
.
Option 2. Use the upstream time.Duration
validator
Pros
- No need to maintain a custom validator for
duration
parsing.
Cons
- Degraded user experience as the error message does not point out which line of the configuration is problematic with a concise reason. Users need to guess based on the internal config struct names.
Any other tradeoffs I missed?
If this is the main pros and cons, option 2 makes sense if we do get the pros (aka we can get rid of the maintenance of a custom validator). Otherwise it feels like we get degraded user experience but not much gains. Also for consistency and long-term maintenance, we should keep things consistent for similar types (aka between collection_interval
and wildcard_refresh_interval
.)
BTW we do eventually want to fix all the raw unmarshal error messages to make them more actionable and self descriptive. We haven't tackled them because the cost is relatively high, but it's not the ideal state that we'd like to stick with in the long term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't about validation; it's about whether the config struct contains a time.Duration
or a string
.
Actually I don't know why it does't have a better error message in option 2; the YAML parser is the thing generating the error message, so it certainly has enough info to emit the line and column number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get terribly far when I researched this, but what I managed to find was that if the library fails to assert a string on it, it returns the typeError
which is the less helpful message we see here. If we put a different type of garbage input that does assert a string, we get the marginally more helpful time: invalid duration "asdfjk34802934809"
which is still not ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also for consistency and long-term maintenance, we should keep things consistent for similar types (aka between collection_interval and wildcard_refresh_interval.)
FWIW I have a branch off this one that changes CollectionInterval
to a time.Duration. I was planning to open a bug and PR for it depending what we decided on this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. A better unmarshal error message that emits the line and column number probably requires some upstream go-yaml
changes. Sounds like it's a limitation at that level.
A follow up PR to change CollectionInterval
sounds perfect. I'm ready to LGTM this one.
Note that we are now in a rare state of code chill / freeze, so feature PRs like this one needs to be held from merging for the time being. More details in the internal team email alias conversation.
@@ -0,0 +1 @@ | |||
cannot unmarshal uint64 into Go struct field UnifiedConfig.Logging of type time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the duration
validator was written before the YAML parser could unmarshal directly into a time.Duration
. I think it's totally fine to have the normal unmarshal error if there is garbage in the input config, so don't worry too much about what error you get.
Use a different strategy for the validator that is more scalable than the original `whole_time_denomination` implementation; take a time.Duration and see if it is a multiple of some other duration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, modulo a follow up PR to change collection_interval
so that the two are consistent for long-term maintenance.
Adds new field wildcard_refresh_interval to the logging files receiver
in the ops-agent config.