-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[aggregator] Do not require aggregator ID to be concat with port and add debug logs #2012
Conversation
…t and add error logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm w nits
const ( | ||
// HostIDInstanceIDType specifies to just use the host ID | ||
// as the instance ID for lookup in the placement. | ||
HostIDInstanceIDType InstanceIDType = "host_id" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: better to use iota
here?
|
||
// InstanceIDConfiguration is the instance ID configuration. | ||
type InstanceIDConfiguration struct { | ||
// Type specifies how to construct the instance ID that is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe best not to use Type
as the name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, will use InstanceIDType.
// in the placement (legacy, how the instance ID used to be constructed | ||
// which imposed the strange requirement that the instance ID | ||
// in the topology used to require the port concat'd with the | ||
// host ID). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: may read better as
// in the placement.
// NB: this is a legacy instance ID type...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG, updated.
for _, instance := range placementInstances { | ||
placementInstanceIDs = append(placementInstanceIDs, instance.ID()) | ||
} | ||
agg.logger.Error("could not find aggregator instance ID in placement", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meganit: not sure if you're adding newlines after each }
as above, if so should add one here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
for _, instance := range placementInstances { | ||
placementInstanceIDs = append(placementInstanceIDs, instance.ID()) | ||
} | ||
agg.logger.Error("could not find aggregator instance ID in placement", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe also to explicitly call out that instanceId in placement has to match actual instance/pod hostname?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good I'll say "aggregator instance ID must be in placement: missing from placement". I can't really say pod hostname here since some operators are not on kubernetes or using an environment variable with an arbitrary string for the instance ID.
Codecov Report
@@ Coverage Diff @@
## master #2012 +/- ##
=========================================
- Coverage 63.6% 56.7% -6.9%
=========================================
Files 1124 1094 -30
Lines 106753 103658 -3095
=========================================
- Hits 67917 58861 -9056
- Misses 34488 40704 +6216
+ Partials 4348 4093 -255
Continue to review full report at Codecov.
|
@@ -122,6 +125,13 @@ func (mgr *placementManager) Open() error { | |||
return nil | |||
} | |||
|
|||
func (mgr *placementManager) InstanceID() string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: may need a comment here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is an unexported type (placementManager
) it's not required thankfully (only comments on the exported interface definitions is required).
*t = defaultInstanceIDType | ||
return nil | ||
} | ||
strs := make([]string, 0, len(validInstanceIDTypes)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe better to make validInstanceIDTypes a []string{HostIDInstanceIDType.String(), HostIDPortInstanceIDType.String()}
to avoid allocing a new array here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's during config time I think this isn't necessary, I don't necessarily want to change how this is done everywhere else right at this time if that's ok, heh.
zap.String("currInstanceID", agg.placementManager.InstanceID()), | ||
zap.Strings("placementInstanceIDs", placementInstanceIDs)) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metagnit: is it possible to replace line 327 with this and remove the bool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible, I just wanted to only emit it though if we didn't hit some other error condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had another look, I'll move it and remove the bool, I'm adding some context to the message so it makes sense to emit the log from that location "no shards assigned since not found with current instance ID"
What this PR does / why we need it:
This fixes a common issue users experience setting up the aggregator, firstly that it requires the port to be concat'd with the host ID of the aggregator, and secondly that when it is not found in the placement there is no feedback to the operator.
For the first, a backwards compatible changes is made to allow just using the instance ID as the host ID instead of always concat'ing the port.
For the second, an error log is emitted every time a placement is processed that does not contain the local aggregator ID, since that is indeed likely an operational misconfiguration. This error log also contains the current instance IDs in the placement and the locally configured instance ID to help debug the problem.
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
Does this PR require updating code package or user-facing documentation?: