Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[heartbeat] Set IDs explicitly #9697

Merged
merged 8 commits into from
Jan 7, 2019

Conversation

andrewvc
Copy link
Contributor

@andrewvc andrewvc commented Dec 20, 2018

This patch lets you set the monitor.id field from a monitor's config.

It also redefines the specific meaning of heartbeat's monitor.id field. It also tightens up the definition of the monitor.name field in events.

Fixes #8862

@andrewvc andrewvc added in progress Pull request is currently in progress. Heartbeat Team:obs-ds-hosted-services Label for the Observability Hosted Services team labels Dec 20, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/uptime

heartbeat/monitors/util.go Outdated Show resolved Hide resolved
heartbeat/monitors/job.go Outdated Show resolved Hide resolved
heartbeat/monitors/job.go Outdated Show resolved Hide resolved
@andrewvc andrewvc changed the title IDs not can be set explicitly [heartbeat] Set IDs explicitly Dec 20, 2018
@andrewvc
Copy link
Contributor Author

andrewvc commented Dec 20, 2018

@ruflin @justinkambic @dov0211

This PR is very WIP, but this is the place to redefine the id field as we've discussed in #8862 . Working on it made me realize we needed to tighten up the definition and behavior of the name field as well. I've modified the docs as below. Do you think the changes articulated there are good? The code is very rough (and has broken tests) but I'd like feedback on the functionality changes here:

==== `id`

A unique identifier for this configuration. This should not change with edits to the monitor configuration
regardless of changes to any config fields. Examples: `uploader-service`, `http://example.net`, `us-west-loadbalancer`.

When querying against indexed monitor data this is the field you will be aggregating with. Appears in the
<<exported-fields,exported fields>> as `monitor.id`.

If you do not set this explicitly the monitor's config will be hashed and a generated value used. This value will
change with any options change to this monitor making aggregations over time between changes impossible. For this reason
it is recommended that you set this manually.

[float]
[[monitor-name]]
==== `name`

Optional human readable name for this monitor. This value appears in the <<exported-fields,exported fields>>
as `monitor.name`.

@jk
Copy link

jk commented Dec 20, 2018

@andrewvc You mentioned the wrong nick name

@andrewvc
Copy link
Contributor Author

sorry @jk thanks for the note. I meant to notify @justinkambic

@andrewvc andrewvc force-pushed the improved-monitor-ids branch from 1841e4f to cc9cc65 Compare December 20, 2018 20:26
heartbeat/monitors/monitor.go Show resolved Hide resolved
@@ -93,16 +94,23 @@ func TLSChecks(chainIndex, certIndex int, certificate *x509.Certificate) mapval.
})
}

func HashMonitorIDCheck() mapval.Validator {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported function HashMonitorIDCheck should have comment or be unexported

@andrewvc andrewvc force-pushed the improved-monitor-ids branch 2 times, most recently from 832eb8c to 15635d0 Compare December 20, 2018 23:17
@andrewvc andrewvc requested a review from a team as a code owner December 21, 2018 01:59
@andrewvc andrewvc force-pushed the improved-monitor-ids branch 4 times, most recently from 73056e6 to 01cdabd Compare December 21, 2018 02:11
@ruflin
Copy link
Contributor

ruflin commented Dec 21, 2018

Will we also have a UUID for each monitor? This UUID would change on any config change and also be unique per Heartbeat instance. Even if 5 heartbeat instances have the same monitor.id, we can still tell the events come from 5 different monitors, even if the same monitor is defined 5 times in the same heartbeat instance.

I wonder if what we have above as monitor.id should be monitor.name. AFAIK that would be closer to what APM agents do with service.name which is used for the grouping. Do we really need monitor.name as described above or if user defined monitor.id is already human readable? Perhaps this should be monitor.description?

@andrewvc andrewvc removed the in progress Pull request is currently in progress. label Dec 21, 2018
@andrewvc
Copy link
Contributor Author

@ruflin I don't think we really need a UUID, because if we have a unique ID the UUID is just the tuple {beat.id,monitor.id}.

The monitor ID should only be optionallybe human readable, because anything inside of it could change. For instance, if a GUI is powering the monitor, even the domain name could change. I'm picturing how these fields would work if a GUI was powering their changes. Usually, in web apps you have a synthetic PK for the id, and then all the other fields are changeable. So, in that form you'd never see the ID, we'd just generate a UUID for it.

The reason we shouldn't generate the UUID in the beat is that you might have multiple geo locations where each geo location runs the same monitor.id, just from a different area. In this case you'd want to identify all the locations in which that monitor is running. To do so you'd aggregate on {monitor.id, host.geo.name}.

heartbeat/docs/heartbeat-options.asciidoc Outdated Show resolved Hide resolved
heartbeat/docs/heartbeat-options.asciidoc Show resolved Hide resolved
return []monitors.Job{monitors.WithJobId(jobID, monitors.WithFields(fields, job))}, nil
}

func jobID(typ, jobType, host string, ports []uint16) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see there was some value in the old id. I wonder if this is still something we can reconstruct on the UI side as all the data should be available there if needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. I don't think we're missing it inside of heartbeat itself.

@@ -51,6 +58,11 @@ func (f *NamedJob) Run(event *beat.Event) ([]Job, error) {
// AnonJob represents a job with no assigned name, backed by just a function.
type AnonJob func(event *beat.Event) ([]Job, error)

// ID returns a unique ID for this Job.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is not really accurate ;-)

// Set the correct monitor.id for all jobs
var idWrapper func(job Job) Job
if m.id != "" {
// Ensure we don't have duplicate IDs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the users tries to have the same config twice, he gets an error? In MB / FB if we have the exact same config twice I think we just ignore it (not 100% sure).

Functionality wise I was thinking internally we always use the hash as the job id for the registry and only enrich it / overwrite it with the user generated id when the event is sent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if a user is doing this it is a bug they'd want to know about. There's no legitimate reason to do this no? Given the convo we had about the ID being unique to a given config on a hb instance does this comment still stand?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right that there is no reason a user wants to have it twice. We should probably also fail in the other Beats if this happens.

One reason to skip the error but still report is to make sure all other configs still get started even if one is wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I agree with you that this shouldn't be fatal. Rather than fix this here, I think the more general issue is that errors during monitor creation should not be fatal to heartbeat, rather just to the monitor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is fixed in b158297

So, the error is still here, but if you use this in a dynamically loaded monitor it is not fatal.

It is still fatal if you define the conflicting monitor in heartbeat.yml but I think that's correct, since that file cannot be fixed without a restart.

if err != nil {
return 0, err
}
hash, err := hashstructure.Hash(unpacked, nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to compare this code with the hashing code in libbeat reloading. I remember there were some problems in the beginning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, we use this same methodology for autodiscover, so I think it's correct. I did verify it myself testing it against different configs and watching the values change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect. Just wanted to double check.

heartbeat/monitors/pluginconf.go Show resolved Hide resolved
@andrewvc
Copy link
Contributor Author

OK, so the great thing about this change, and about moving both name and ID up to the monitor level is that it dramatically simplifies a ton of the code. It's actually good size refactor, but well worth it. We just need to pass around functions instead of Job objects, and continues the work in #9258 to make things even simpler than before. I'm not going to finish this refactor today however, but it'll be a half day's work once I get back after vacation next week.

This is going to make the code much more readable!

@andrewvc andrewvc force-pushed the improved-monitor-ids branch from 78c6eae to 0d07370 Compare December 22, 2018 01:46
@andrewvc
Copy link
Contributor Author

OK, I actually did a first pass of the big refactor in 0d07370

@dov0211
Copy link

dov0211 commented Dec 23, 2018

I think that in other competitive product, Monitor Name is a mandatory and unique field which powering the UI, Monitor ID generated automatically. Not saying we should change this design
but we should understand what will the outcome on the UI looks like, would we want the UI to reflect the monitor name (i think yes) having said that, how will it look like if this field is missing? (maybe talking the URL of the monitor instead)?

@ruflin
Copy link
Contributor

ruflin commented Dec 24, 2018

@dov0211 If name is missing, I think we should show either the id or reconstruct what we had before as name, see https://github.com/elastic/beats/pull/9697/files#r243614588. All the data for it should be available and it can happen on the UI side.

@andrewvc
Copy link
Contributor Author

I agree that we can enforce name uniqueness on the UI side.

@andrewvc
Copy link
Contributor Author

andrewvc commented Jan 2, 2019

@dov0211 and @ruflin are there any blockers here? As far as I'm concerned most of those constraints can and should happen in the Kibana Uptime app.

The beats code can only enforce constraints local to itself, so we'll need to figure out a solution there anyway.

I think the current level of constraint (id unique on the local beat) is fine, but, as I mentioned the ultimate source needs to be in the datastore which has a global view.

@andrewvc andrewvc mentioned this pull request Jan 2, 2019
7 tasks
func (aj AnonJob) Run(event *beat.Event) ([]Job, error) {
return aj(event)
}
type Job func(event *beat.Event) ([]Job, error)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 for getting rid of the many helpers.

@andrewvc andrewvc force-pushed the improved-monitor-ids branch from 03e67a4 to b158297 Compare January 3, 2019 01:39
@andrewvc
Copy link
Contributor Author

andrewvc commented Jan 3, 2019

@ruflin with the latest push I think all conflicts are resolved.

This patch lets you set the `monitor.id` field from a monitor's config.

It also redefines the specific meaning of heartbeat's `monitor.id` field. It also tightens up the definition of the `monitor.name` field in events.

Fixes elastic#8862
@andrewvc andrewvc force-pushed the improved-monitor-ids branch from b158297 to 680ea77 Compare January 3, 2019 02:36
@andrewvc
Copy link
Contributor Author

andrewvc commented Jan 3, 2019

@ruflin just added a changelog and addressed all existing review comments.

@andrewvc andrewvc requested review from a team as code owners January 3, 2019 19:18
@andrewvc
Copy link
Contributor Author

andrewvc commented Jan 3, 2019

OK all tests looking good again.

@@ -205,6 +206,26 @@ var IsNonEmptyString = Is("is a non-empty string", func(path path, v interface{}
return ValidResult(path)
})

// IsStringMatching checks whether a value matches the given regexp.
func IsStringMatching(regexp *regexp.Regexp) IsDef {
return Is("is string matching regexp", func(path path, v interface{}) *Results {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about printing the regex in the message as well. Like is string matching '%v'.

Have you considered to pass the regex as string to IsStringMatching and compile it internally?

For matching we also have the package github.com/elastic/beats/libbeat/common/match. This one runs some optimizations on the regex speeding up a few use-cases. As this function is mostly used for testing I do not favor one over the other tough.

We could also change the signature to func IsStringMatching(matcher interface{ MatchString(string) bool }) IsDef (although this is valid syntax, better give the type a name).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the string here is actually not the main one you read in the output, the one on line 221 just a bit below here is the main one a user would see. That one does include the regexp source. We don't interpolate those descriptor strings for the other Is definitions in this file, so it'd be weird to do it here.

I'm not super opinionated on it, but it from a UX perspective it's a bit redundant.

Cool stuff about common/match. I agree that speed her isn't important and keeping the code simple is probably better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side note, I've considered that maybe we should just not even have this initial string description since we have a better one that's emitted with the actual error. I think that's out of scope for this PR however.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side note, I've considered that maybe we should just not even have this initial string description since we have a better one that's emitted with the actual error. I think that's out of scope for this PR however.

SGTM.

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few nit picks. Ready to be merged, all the nits can be addressed in a follow up PR.

}

func checkMonitorConfig(config *common.Config, registrar *pluginsReg, allowWatches bool) error {
_, err := newMonitor(config, registrar, nil, nil, allowWatches)
m, err := newMonitor(config, registrar, nil, nil, allowWatches)
m.Stop() // Stop the monitor to free up the ID from uniqueness checks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see a comment on this line as I was confused at first why Stop is called here.

@@ -246,6 +308,9 @@ func (m *Monitor) Stop() {
m.internalsMtx.Lock()
defer m.internalsMtx.Unlock()

// Free up the monitor ID for reuse
uniqueMonitorIDs.Delete(m.id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it only be freed up after all jobs and tasks are stopped? Perhaps use defer?

"monitor": common.MapStr{
"id": id,
"name": name,
"type": typ},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: } should probably be on a new line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find , weird go fmt didn't catch that.

@@ -27,6 +27,7 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d
*Heartbeat*

- Remove monitor generator script that was rarely used. {pull}9648[9648]
- monitor IDs are now configurable. Auto generated monitor IDs now use a different formula based on a hash of their config values. If you wish to have continuity with the old format of monitor IDs you'll need to set the `id` property explicitly. {pull}9697[9697]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also the change to the name field be mentioned here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need, its behavior hasn't really changed. What has changed is our semantic interpretation of it for the UI, which no one is using yet, so I think that's not worth discussing.

@andrewvc andrewvc merged commit 6dd4508 into elastic:master Jan 7, 2019
andrewvc added a commit to andrewvc/beats that referenced this pull request Jan 7, 2019
This cleans up a couple small points missed in elastic#9697
andrewvc added a commit that referenced this pull request Jan 7, 2019
This cleans up a couple small points missed in #9697
andrewvc added a commit that referenced this pull request Jan 18, 2019
Timing broke in 6.x in #9697 when mode: all is used. Some events will wind up with an empty timestamp, which in go is the year 0001. This isn't slated for release till 6.7 so there's no huge rush here.

This PR seeks to fix not only the bug, but the structural problems in the code that led to the bug. The key issue here is that the timing code should have been centralized for all beats, and invoked in one spot, vs. needing to be invoked in each different dialer type.

That is not a small refactor, but this PR has been able to achieve this and as a result make the code much easier to reason about, and thus safer.

A key thing here was handling the recursive nature of heartbeat tasks in a more uniform way. Previously some functions that modified jobs were recursive, other times they were not. Now, recursive application of modifications only happens in one spot the wrapper.WrapAll and wrapper.Wrap functions (the first one takes a list of jobs, while the second takes a single Job). A new type JobWrapper, can be passed into these. These Job wrapping functions don't handle any recursive logic themselves.

This PR also breaks up some of this utility code, moving it into separate packages. The monitor package became too unwieldy with the additions here.

More tests have been added. There is more work to do to clean stuff up, but there's more than enough here for a single PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Heartbeat review Team:obs-ds-hosted-services Label for the Observability Hosted Services team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants