-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CC-7044] Start HCP manager as part of link creation #20312
Conversation
f8d0b98
to
c71d5a8
Compare
c71d5a8
to
048440b
Compare
36b4dab
to
2a656de
Compare
b166213
to
8f9f407
Compare
agent/hcp/manager.go
Outdated
if m.isRunning() { | ||
m.logger.Trace("HCP manager already started") | ||
return nil | ||
} | ||
m.setRunning(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a possibility of a race here, even with the lock. Two goroutines can call m.isRunning()
sequentially and both can see false
, then both would proceed to call the rest of the code. I think we'd have to lock the lock, then do compare and set in one atomic operation like:
func (m *Manager) setRunningIfNotRunning() bool {
m.runLock.Lock()
defer m.runLock.Unlock()
if m.running { return false }
m.running = true
return true
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, great point, will make the change!
8f9f407
to
2912114
Compare
existingCfg := r.hcpManager.GetCloudConfig() | ||
newCfg := config.CloudConfig{ | ||
ResourceID: link.ResourceId, | ||
ClientID: link.ClientId, | ||
ClientSecret: link.ClientSecret, | ||
} | ||
cfg := config.Merge(existingCfg, newCfg) | ||
hcpClient, err := r.hcpClientFn(cfg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense, but just FYI, we'll have a merge conflict with Nick E's change: https://github.com/hashicorp/consul/pull/20257/files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the heads up, will keep this in mind!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still looking through this but have to stop now for a meeting. Will resume later
agent/consul/server.go
Outdated
StatusFn: s.hcpServerStatus(flat), | ||
Logger: logger.Named("hcp_manager"), | ||
SCADAProvider: flat.HCP.Provider, | ||
TelemetryProvider: flat.HCP.TelemetryProvider, | ||
ManagementTokenUpserterFn: func(name, secretId string) error { | ||
if s.IsLeader() { | ||
if s.config.ACLsEnabled && s.IsLeader() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I've seen the reasoning behind this change, can we clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was missed by me when I originally added this — if ACLs are not enabled, upsertManagementToken
will error, so we'd end up with unwanted error logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
existingCfg := r.hcpManager.GetCloudConfig() | ||
newCfg := config.CloudConfig{ | ||
ResourceID: link.ResourceId, | ||
ClientID: link.ClientId, | ||
ClientSecret: link.ClientSecret, | ||
} | ||
cfg := config.Merge(existingCfg, newCfg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we clarify why we need to merge instead of changing the existingCfg
values directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment about this! The reasoning is:
- Some values can be set at startup like what HCP API endpoint to use. We don't currently have a way in the linking API to set this, but support for this will be added later, so I thought a merge made more sense in the long term.
- The NodeID and the NodeName of the cluster is set on the cloud config when Consul starts, so we want to continue to use these values. I considered some different strategies around this like passing these two values as dependencies to the controller, but I figured I'll want to implement a merge anyway because of (1).
Link eventually will be creating a token, so require acl:write.
Start as part of link creation rather than always starting. Update the HCP manager with values from the link before starting as well.
The HCP metrics sink will always be enabled, so the length of sinks will always be greater than zero. This also means that we will also always default to prefixing metrics with the hostname, which is what our documentation states is the expected behavior anyway.
7ec7b41
to
5719b08
Compare
This all makes sense and the code looks good. Just to clarify the breaking change, this is for all metrics or just HCP related? |
@loshz It'll be for all metrics (of a specific type) and not just ones that are exported to HCP. For example, this table of the metrics that Consul collects has a column |
Makes sense! I'm wondering if we need to make this change more widely known in the org, just to get another perspective. Wondering if it's also worth us fixing this in a separate PR? Not sure who to tag, but it might worth us bringing it up in Slack? WDYT? |
b177812
to
8660a03
Compare
Description
Depends on #20306
This PR starts the HCP manger when an HCP link resource is created instead of when Consul is started. This allows the linking process to be initiated via the HCP link API.
These changes are best viewed commit-by-commit. A summary of the changes are:
acl:write
is also now required in addition tooperator:write
This PR also introduces a breaking change, though that change actually fixes Consul's behavior to match what we have documented. The agent telemetry docs and the agent configuration docs for
telemetry.disable_hostname
both state that by default, the hostname of the Consul agent should prefix gauge-type metrics. However, before this PR, if there were no additional metrics sinks enabled, the hostname prefixing was disabled. Now that we're always enabling the HCP metrics sink, we will now always prefix the gauge-metrics by default.Testing & Reproduction steps
Linking via API:
cloud
configurationOther variations tested:
Links
PR Checklist