Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[aggregator] Do not require aggregator ID to be concat with port and add debug logs #2012

Merged
merged 4 commits into from
Oct 21, 2019

Conversation

robskillington
Copy link
Collaborator

What this PR does / why we need it:

This fixes a common issue users experience setting up the aggregator, firstly that it requires the port to be concat'd with the host ID of the aggregator, and secondly that when it is not found in the placement there is no feedback to the operator.

For the first, a backwards compatible changes is made to allow just using the instance ID as the host ID instead of always concat'ing the port.

For the second, an error log is emitted every time a placement is processed that does not contain the local aggregator ID, since that is indeed likely an operational misconfiguration. This error log also contains the current instance IDs in the placement and the locally configured instance ID to help debug the problem.

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

Copy link
Collaborator

@schallert schallert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@arnikola arnikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm w nits

const (
// HostIDInstanceIDType specifies to just use the host ID
// as the instance ID for lookup in the placement.
HostIDInstanceIDType InstanceIDType = "host_id"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: better to use iota here?


// InstanceIDConfiguration is the instance ID configuration.
type InstanceIDConfiguration struct {
// Type specifies how to construct the instance ID that is
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe best not to use Type as the name?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, will use InstanceIDType.

Comment on lines 70 to 73
// in the placement (legacy, how the instance ID used to be constructed
// which imposed the strange requirement that the instance ID
// in the topology used to require the port concat'd with the
// host ID).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: may read better as

// in the placement.
// NB: this is a legacy instance ID type...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG, updated.

for _, instance := range placementInstances {
placementInstanceIDs = append(placementInstanceIDs, instance.ID())
}
agg.logger.Error("could not find aggregator instance ID in placement",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meganit: not sure if you're adding newlines after each } as above, if so should add one here too

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

for _, instance := range placementInstances {
placementInstanceIDs = append(placementInstanceIDs, instance.ID())
}
agg.logger.Error("could not find aggregator instance ID in placement",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe also to explicitly call out that instanceId in placement has to match actual instance/pod hostname?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good I'll say "aggregator instance ID must be in placement: missing from placement". I can't really say pod hostname here since some operators are not on kubernetes or using an environment variable with an arbitrary string for the instance ID.

@codecov
Copy link

codecov bot commented Oct 21, 2019

Codecov Report

Merging #2012 into master will decrease coverage by 6.8%.
The diff coverage is 18.1%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #2012     +/-   ##
=========================================
- Coverage    63.6%    56.7%   -6.9%     
=========================================
  Files        1124     1094     -30     
  Lines      106753   103658   -3095     
=========================================
- Hits        67917    58861   -9056     
- Misses      34488    40704   +6216     
+ Partials     4348     4093    -255
Flag Coverage Δ
#aggregator 68.8% <11.4%> (-11%) ⬇️
#cluster 44.7% <ø> (-11.7%) ⬇️
#collector 47.9% <ø> (-15.8%) ⬇️
#dbnode 64.6% <100%> (-0.4%) ⬇️
#m3em 59.6% <ø> (ø) ⬆️
#m3ninx 58.7% <ø> (-2.4%) ⬇️
#m3nsch 51.1% <ø> (ø) ⬆️
#metrics 17.7% <ø> (ø) ⬆️
#msg 74.9% <ø> (+0.1%) ⬆️
#query 45.9% <ø> (-22.7%) ⬇️
#x 68% <100%> (-7.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1eb7e3f...63ba988. Read the comment docs.

@@ -122,6 +125,13 @@ func (mgr *placementManager) Open() error {
return nil
}

func (mgr *placementManager) InstanceID() string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: may need a comment here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is an unexported type (placementManager) it's not required thankfully (only comments on the exported interface definitions is required).

*t = defaultInstanceIDType
return nil
}
strs := make([]string, 0, len(validInstanceIDTypes))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better to make validInstanceIDTypes a []string{HostIDInstanceIDType.String(), HostIDPortInstanceIDType.String()} to avoid allocing a new array here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's during config time I think this isn't necessary, I don't necessarily want to change how this is done everywhere else right at this time if that's ok, heh.

zap.String("currInstanceID", agg.placementManager.InstanceID()),
zap.Strings("placementInstanceIDs", placementInstanceIDs))
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metagnit: is it possible to replace line 327 with this and remove the bool?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible, I just wanted to only emit it though if we didn't hit some other error condition.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had another look, I'll move it and remove the bool, I'm adding some context to the message so it makes sense to emit the log from that location "no shards assigned since not found with current instance ID"

@robskillington robskillington merged commit 71f7318 into master Oct 21, 2019
@justinjc justinjc deleted the r/simplify-aggregator-id-config branch November 27, 2019 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants