-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[heartbeat] Set IDs explicitly #9697
Conversation
Pinging @elastic/uptime |
@ruflin @justinkambic @dov0211 This PR is very WIP, but this is the place to redefine the ==== `id`
A unique identifier for this configuration. This should not change with edits to the monitor configuration
regardless of changes to any config fields. Examples: `uploader-service`, `http://example.net`, `us-west-loadbalancer`.
When querying against indexed monitor data this is the field you will be aggregating with. Appears in the
<<exported-fields,exported fields>> as `monitor.id`.
If you do not set this explicitly the monitor's config will be hashed and a generated value used. This value will
change with any options change to this monitor making aggregations over time between changes impossible. For this reason
it is recommended that you set this manually.
[float]
[[monitor-name]]
==== `name`
Optional human readable name for this monitor. This value appears in the <<exported-fields,exported fields>>
as `monitor.name`. |
@andrewvc You mentioned the wrong nick name |
sorry @jk thanks for the note. I meant to notify @justinkambic |
1841e4f
to
cc9cc65
Compare
heartbeat/hbtest/hbtestutil.go
Outdated
@@ -93,16 +94,23 @@ func TLSChecks(chainIndex, certIndex int, certificate *x509.Certificate) mapval. | |||
}) | |||
} | |||
|
|||
func HashMonitorIDCheck() mapval.Validator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exported function HashMonitorIDCheck should have comment or be unexported
832eb8c
to
15635d0
Compare
73056e6
to
01cdabd
Compare
Will we also have a UUID for each monitor? This UUID would change on any config change and also be unique per Heartbeat instance. Even if 5 heartbeat instances have the same I wonder if what we have above as |
@ruflin I don't think we really need a UUID, because if we have a unique ID the UUID is just the tuple The monitor ID should only be optionallybe human readable, because anything inside of it could change. For instance, if a GUI is powering the monitor, even the domain name could change. I'm picturing how these fields would work if a GUI was powering their changes. Usually, in web apps you have a synthetic PK for the The reason we shouldn't generate the UUID in the beat is that you might have multiple geo locations where each geo location runs the same |
return []monitors.Job{monitors.WithJobId(jobID, monitors.WithFields(fields, job))}, nil | ||
} | ||
|
||
func jobID(typ, jobType, host string, ports []uint16) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see there was some value in the old id. I wonder if this is still something we can reconstruct on the UI side as all the data should be available there if needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly. I don't think we're missing it inside of heartbeat itself.
heartbeat/monitors/job.go
Outdated
@@ -51,6 +58,11 @@ func (f *NamedJob) Run(event *beat.Event) ([]Job, error) { | |||
// AnonJob represents a job with no assigned name, backed by just a function. | |||
type AnonJob func(event *beat.Event) ([]Job, error) | |||
|
|||
// ID returns a unique ID for this Job. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is not really accurate ;-)
// Set the correct monitor.id for all jobs | ||
var idWrapper func(job Job) Job | ||
if m.id != "" { | ||
// Ensure we don't have duplicate IDs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the users tries to have the same config twice, he gets an error? In MB / FB if we have the exact same config twice I think we just ignore it (not 100% sure).
Functionality wise I was thinking internally we always use the hash
as the job id for the registry and only enrich it / overwrite it with the user generated id when the event is sent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if a user is doing this it is a bug they'd want to know about. There's no legitimate reason to do this no? Given the convo we had about the ID being unique to a given config on a hb instance does this comment still stand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right that there is no reason a user wants to have it twice. We should probably also fail in the other Beats if this happens.
One reason to skip the error but still report is to make sure all other configs still get started even if one is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I agree with you that this shouldn't be fatal. Rather than fix this here, I think the more general issue is that errors during monitor creation should not be fatal to heartbeat, rather just to the monitor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this is fixed in b158297
So, the error is still here, but if you use this in a dynamically loaded monitor it is not fatal.
It is still fatal if you define the conflicting monitor in heartbeat.yml
but I think that's correct, since that file cannot be fixed without a restart.
if err != nil { | ||
return 0, err | ||
} | ||
hash, err := hashstructure.Hash(unpacked, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure to compare this code with the hashing code in libbeat reloading. I remember there were some problems in the beginning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm, we use this same methodology for autodiscover, so I think it's correct. I did verify it myself testing it against different configs and watching the values change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect. Just wanted to double check.
OK, so the great thing about this change, and about moving both name and ID up to the monitor level is that it dramatically simplifies a ton of the code. It's actually good size refactor, but well worth it. We just need to pass around functions instead of This is going to make the code much more readable! |
78c6eae
to
0d07370
Compare
OK, I actually did a first pass of the big refactor in 0d07370 |
I think that in other competitive product, Monitor Name is a mandatory and unique field which powering the UI, Monitor ID generated automatically. Not saying we should change this design |
@dov0211 If |
I agree that we can enforce name uniqueness on the UI side. |
@dov0211 and @ruflin are there any blockers here? As far as I'm concerned most of those constraints can and should happen in the Kibana Uptime app. The beats code can only enforce constraints local to itself, so we'll need to figure out a solution there anyway. I think the current level of constraint (id unique on the local beat) is fine, but, as I mentioned the ultimate source needs to be in the datastore which has a global view. |
func (aj AnonJob) Run(event *beat.Event) ([]Job, error) { | ||
return aj(event) | ||
} | ||
type Job func(event *beat.Event) ([]Job, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 for getting rid of the many helpers.
03e67a4
to
b158297
Compare
@ruflin with the latest push I think all conflicts are resolved. |
This patch lets you set the `monitor.id` field from a monitor's config. It also redefines the specific meaning of heartbeat's `monitor.id` field. It also tightens up the definition of the `monitor.name` field in events. Fixes elastic#8862
b158297
to
680ea77
Compare
@ruflin just added a changelog and addressed all existing review comments. |
OK all tests looking good again. |
@@ -205,6 +206,26 @@ var IsNonEmptyString = Is("is a non-empty string", func(path path, v interface{} | |||
return ValidResult(path) | |||
}) | |||
|
|||
// IsStringMatching checks whether a value matches the given regexp. | |||
func IsStringMatching(regexp *regexp.Regexp) IsDef { | |||
return Is("is string matching regexp", func(path path, v interface{}) *Results { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about printing the regex in the message as well. Like is string matching '%v'
.
Have you considered to pass the regex as string to IsStringMatching
and compile it internally?
For matching we also have the package github.com/elastic/beats/libbeat/common/match
. This one runs some optimizations on the regex speeding up a few use-cases. As this function is mostly used for testing I do not favor one over the other tough.
We could also change the signature to func IsStringMatching(matcher interface{ MatchString(string) bool }) IsDef
(although this is valid syntax, better give the type a name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the string here is actually not the main one you read in the output, the one on line 221 just a bit below here is the main one a user would see. That one does include the regexp source. We don't interpolate those descriptor strings for the other Is
definitions in this file, so it'd be weird to do it here.
I'm not super opinionated on it, but it from a UX perspective it's a bit redundant.
Cool stuff about common/match
. I agree that speed her isn't important and keeping the code simple is probably better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a side note, I've considered that maybe we should just not even have this initial string description since we have a better one that's emitted with the actual error. I think that's out of scope for this PR however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a side note, I've considered that maybe we should just not even have this initial string description since we have a better one that's emitted with the actual error. I think that's out of scope for this PR however.
SGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few nit picks. Ready to be merged, all the nits can be addressed in a follow up PR.
} | ||
|
||
func checkMonitorConfig(config *common.Config, registrar *pluginsReg, allowWatches bool) error { | ||
_, err := newMonitor(config, registrar, nil, nil, allowWatches) | ||
m, err := newMonitor(config, registrar, nil, nil, allowWatches) | ||
m.Stop() // Stop the monitor to free up the ID from uniqueness checks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to see a comment on this line as I was confused at first why Stop
is called here.
@@ -246,6 +308,9 @@ func (m *Monitor) Stop() { | |||
m.internalsMtx.Lock() | |||
defer m.internalsMtx.Unlock() | |||
|
|||
// Free up the monitor ID for reuse | |||
uniqueMonitorIDs.Delete(m.id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it only be freed up after all jobs and tasks are stopped? Perhaps use defer?
"monitor": common.MapStr{ | ||
"id": id, | ||
"name": name, | ||
"type": typ}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: }
should probably be on a new line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find , weird go fmt didn't catch that.
@@ -27,6 +27,7 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d | |||
*Heartbeat* | |||
|
|||
- Remove monitor generator script that was rarely used. {pull}9648[9648] | |||
- monitor IDs are now configurable. Auto generated monitor IDs now use a different formula based on a hash of their config values. If you wish to have continuity with the old format of monitor IDs you'll need to set the `id` property explicitly. {pull}9697[9697] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also the change to the name
field be mentioned here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need, its behavior hasn't really changed. What has changed is our semantic interpretation of it for the UI, which no one is using yet, so I think that's not worth discussing.
This cleans up a couple small points missed in elastic#9697
This cleans up a couple small points missed in #9697
Timing broke in 6.x in #9697 when mode: all is used. Some events will wind up with an empty timestamp, which in go is the year 0001. This isn't slated for release till 6.7 so there's no huge rush here. This PR seeks to fix not only the bug, but the structural problems in the code that led to the bug. The key issue here is that the timing code should have been centralized for all beats, and invoked in one spot, vs. needing to be invoked in each different dialer type. That is not a small refactor, but this PR has been able to achieve this and as a result make the code much easier to reason about, and thus safer. A key thing here was handling the recursive nature of heartbeat tasks in a more uniform way. Previously some functions that modified jobs were recursive, other times they were not. Now, recursive application of modifications only happens in one spot the wrapper.WrapAll and wrapper.Wrap functions (the first one takes a list of jobs, while the second takes a single Job). A new type JobWrapper, can be passed into these. These Job wrapping functions don't handle any recursive logic themselves. This PR also breaks up some of this utility code, moving it into separate packages. The monitor package became too unwieldy with the additions here. More tests have been added. There is more work to do to clean stuff up, but there's more than enough here for a single PR.
This patch lets you set the
monitor.id
field from a monitor's config.It also redefines the specific meaning of heartbeat's
monitor.id
field. It also tightens up the definition of themonitor.name
field in events.Fixes #8862