-
Notifications
You must be signed in to change notification settings - Fork 582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sync dependencies to Redis #10290
base: master
Are you sure you want to change the base?
Sync dependencies to Redis #10290
Conversation
95a27d3
to
17ba7c9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm somewhat confused by the DependencyGroup
class as it doesn't really map to the mental model I had from our discussions on that topic.
So in my understanding, a DependencyGroup
would represent a set a set of checkables that are used as parents in dependency config objects combined with the attributes from that dependency object that affect how the availability of that dependency is determined, i.e. ignore_soft_states
, period
, and states
. For dependencies without a redundancy group, that information is all that's needed to determine if all dependency objects that share the parent and these attribute mark all their children as unreachable. With a redundancy group, you have to look at all the parents from the redundancy group with the three aforementioned additional attributes. So that would be how to determine what creates a DependencyGroup
object for redundancy groups.
For dependencies without a redundancy group, this grouping provides no value in itself, the dependency objects can be considered individually. There are two reasons why we might instantiate such trivial groups explicitly nonetheless: for one, it may allow simpler code by being able to treat both cases consistently, but more importantly, there was a request from Johannes that if two children depend on the same parent in such a way that the state of these dependencies is always the same (i.e. the three aforementioned attributes are identical), then the different graph edges should refer to the same shared state. These groups may be used for this deduplication as well.
Consider the following example (P = parent checkable, RG = redundancy group as represented in the generated graph, C = child checkable):
graph BT;
p1((P1));
p2((P2));
p3((P3));
c1((C1));
c2((C2));
c3((C3));
c4((C4));
c5((C5));
rg1(RG1);
c1-->rg1;
c2-->rg1;
c3-->rg1;
rg1-->p1;
rg1-->p2;
c4-->p3;
c5-->p3;
Here I'd expect the creation of the following two DependencyGroups
(...
refers to the three magic attributes attached to the parent in the corresponding dependency objects):
{(P1, ...), (P2, ...)}
: This basically represents RG1{(P3, ...)}
: This is a if there was an imaginary second redundancy with only one parent, P3.
17ba7c9
to
a1175d1
Compare
763d77c
to
bc82c04
Compare
This does not work in this state! Trying to refresh Dependency if a Host or Service being member of this Dependency has a state change.
bc82c04
to
2889822
Compare
11c6498
to
78a0a29
Compare
The previous limit (32) doesn't seem to make sense, and appears to be some random number. So, this limit is set to 256 to match the limit in IsReachable().
4e5e2c8
to
9c678be
Compare
lib/icinga/checkable-dependency.cpp
Outdated
* Note: Re-registering the very same dependency groups to global registry will crash the process with superior error | ||
* messages that aren't easy to debug, so make sure to never call this method more than once for the same Checkable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what that "superior" (that has to be sarcasm) error message is. I'd expect it to end up there:
icinga2/lib/icinga/dependency-group.cpp
Line 214 in 9c678be
VERIFY(this != dest); // Prevent from doing something stupid, i.e. deadlocking ourselves. |
Should anything else actually use m_DependencyGroups
before PushDependencyGroupsToRegistry()
was called? Like I'm wondering whether there is a particular reason to use m_DependencyGroups
to store both registered and pending/not-yet-registered groups? Or is this just an optimization to avoid adding another class member (that will then remain unused after the Checkable
was fully initialized)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what that "superior" (that has to be sarcasm) error message is. I'd expect it to end up there:
Yes, it will end up there and crashes with something like this:
[2025-01-31 16:38:01 +0100] information/ConfigItem: Triggering Start signal for config items
/Users/yhabteab/Workspace/icinga2/lib/icinga/dependency-group.cpp:214: assertion failed: this != dest
Caught SIGABRT.
Current time: 2025-01-31 16:38:02 +0100
How can you easily trace this back to that being called multiple times? I just wanted to add some hints so that you won't be surprised if it mysteriously crashes, but if you're just concerned about this comment, I can remove it.
Should anything else actually use
m_DependencyGroups
beforePushDependencyGroupsToRegistry()
was called?
If by anything else you mean the AddDependency()
method, then yes, because nobody else has to deal with the checkable dependency groups before it is actually activated.
Like I'm wondering whether there is a particular reason to use
m_DependencyGroups
to store both registered and pending/not-yet-registered groups?
As I said above, nobody ever needs to access the checkable dependency groups before the PushDependencyGroupsToRegistry()
method is called apart from the AddDependency()
method, so I don't need to keep a separate list of pending/non-pending groups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now added a simple boolean flag to eliminate such an unnecessary crash, i.e. the method can now be called multiple times either intentionally or not and will not terminate the process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but if you're just concerned about this comment, I can remove it.
I don't have a problem with a comment stating that this should be called only once per object. I just think that "superior error messages that aren't easy to debug" is a somewhat strange wording to convey this. I'm not sure, maybe there's something lost in translation?
I have now added a simple boolean flag to eliminate such an unnecessary crash, i.e. the method can now be called multiple times either intentionally or not and will not terminate the process.
Is that the only purpose of that flag? If so, you could also consider doing something like if (this == dest) { return; /* moving members to ourselves is a no-op */ }
instead of VERIFY(this != dest);
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, maybe there's something lost in translation?
Translation 😅? No, I didn't translate that! I intentionally used those phrases because the unittests were failing with some strange error messages, that I didn't initially know why. So I added this little hint, but it turned out that the tests were failing due to the now non-existent uninitialised std::unordered_set
hash and equal callbacks (which were used as function pointers and not functors).
Is that the only purpose of that flag?
No. It’s also used within the AddDependency()
method.
If so, you could also consider doing something like
if (this == dest) { return; /* moving members to ourselves is a no-op */ }
instead ofVERIFY(this != dest);
.
That's the thing, I don't want to silently ignore such stupid usages here and like to fail hard all the time, as such things should only happen due to unnoticed bugs and never intentionally.
lib/icinga/checkable-dependency.cpp
Outdated
* | ||
* @return DependencyGroup::Ptr The dependency group that has been modified. | ||
*/ | ||
DependencyGroup::Ptr Checkable::AddDependency(const Dependency::Ptr& dependency, bool refreshGlobalRegistry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand this correctly, refreshGlobalRegistry = true
means that everything is done immediately and refreshGlobalRegistry = false
means to delay parts of the registration until PushDependencyGroupsToRegistry()
is called. Shouldn't the Checkable
class be able to determine internally what's necessary? It's the class that calls PushDependencyGroupsToRegistry()
in the end and in one place, refreshGlobalRegistry
is set based on what a method of Checkable
calls already:
icinga2/lib/icinga/dependency.cpp
Line 209 in 9c678be
auto modifiedGroup(m_Child->AddDependency(this, m_Child->IsActive())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the
Checkable
class be able to determine internally what's necessary?
I have completely missed that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
lib/icinga/checkable-dependency.cpp
Outdated
for (const Checkable::Ptr& checkable : children) { | ||
std::set<Checkable::Ptr> cChildren = checkable->GetChildren(); | ||
|
||
if (!cChildren.empty()) { | ||
if (auto cChildren(checkable->GetChildren()); !cChildren.empty()) { | ||
GetAllChildrenInternal(cChildren, level + 1); | ||
localChildren.insert(cChildren.begin(), cChildren.end()); | ||
} | ||
|
||
localChildren.insert(checkable); | ||
if (level != 0) { // Recursion level 0 is the initiator, so checkable is already in the set. | ||
localChildren.insert(checkable); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about these changes. They look like they don't really change anything. The first change just uses C++17 syntax and the second change only skips the insert of an element that should already be in the set, so the insert does nothing already?
If changing something here, I'd rather give the function a complete makeover: currently it seems like it would visit checkables multiple times which could lead to its runtime complexity exploding for adverse dependency graphs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
9c678be
to
e083b31
Compare
Otherwise, it would require too much code changes to properly handle redundancy group runtime modification in Icinga DB for no real benefit.
Co-Authored-By: Julian Brost <[email protected]> Since @jbrost suggested simplifying the code for (un)registering the dependencies and even created a prototype of how he envisioned it (see 20faf1c), I have now redone everthing on that basis. Even if the end result doesn't quite match Julian's commit, I think it makes sense not to implement it 1-1 as Julian suggested for a variaty reasons. Most of the code where I personally think a clarifying comment is needed, I have provided detailed ones justifying why something was done that way.
The previous wasn't per-se wrong, but it was way too inefficient. With this commit each and every Checkable is going to be visited only once, and we won't traverse the same Checkable's children multiple times somewhere in the dependency chain.
dc5da1b
to
215057f
Compare
DependencyGroup::Ptr Checkable::AddDependency(const Dependency::Ptr& dependency) | ||
{ | ||
std::lock_guard lock(m_DependencyMutex); | ||
DependencyGroup::Ptr newGroup(new DependencyGroup(dependency->GetRedundancyGroup(), dependency)); | ||
if (auto it(m_DependencyGroups.find(newGroup)); it == m_DependencyGroups.end()) { | ||
// If the current Checkable is already started (all local dependency groups are already pushed | ||
// to the global registry), we need to directly forward newGroup to the global register. | ||
m_DependencyGroups.emplace(m_DependencyGroupsPushedToRegistry ? DependencyGroup::Register(newGroup) : newGroup); | ||
} else if (!m_DependencyGroupsPushedToRegistry) { | ||
// If we're not going to refresh the global registry, we just need to add the dependency to the existing group. | ||
// Meaning, the dependency group itself isn't registered globally yet, so we don't need to re-register it. | ||
(*it)->AddMember(dependency); | ||
return *it; | ||
} else { | ||
if (auto existingGroup(*it); existingGroup->HasIdenticalMember(dependency)) { | ||
// There's already an identical member in the group and this is likely an exact duplicate of it, | ||
// so it won't change the identity of the group after registration, i.e. regardless whether we're | ||
// supposed to refresh the global registry or not it's identity will remain the same. | ||
existingGroup->AddMember(dependency); | ||
} else { | ||
// We're going to either replace the existing group with "newGroup" or merge the two groups together. | ||
// Either way, it's identity will change, so we need to decouple it from the current Checkable. | ||
m_DependencyGroups.erase(it); | ||
|
||
// We need to unregister the existing group from the global registry if we're the only member of it, | ||
// as it's hash might change after adding the new dependency to it down below, and we want to re-register | ||
// it afterwards. This way, we'll also be able to eliminate the possibility of having two identical groups | ||
// in the registry that might occur due to the registration of the new dependency object below. | ||
if (DependencyGroup::Unregister(existingGroup, this)) { | ||
// The current Checkable is the only member of the group, so nothing to move around, just | ||
// add the _duplicate_ dependency to the existing group. Duplicate in the sense that it's | ||
// not identical to any of the existing members but similar enough to be part of the same | ||
// group, i.e. same parent, maybe different period, state filter, etc. | ||
existingGroup->AddMember(dependency); | ||
m_DependencyGroups.emplace(DependencyGroup::Register(existingGroup)); | ||
} else { | ||
// Obviously, the current Checkable is not the only member of the existing group, and it's going to | ||
// have more members than the other child Checkables in that group after adding the new dependency | ||
// to it. So, we need to move all the members this Checkable depends on to newGroup and leave the | ||
// existing group as-is, i.e. it's identity won't change afterwards. | ||
for (auto& member : existingGroup->GetMembers(this)) { | ||
existingGroup->RemoveMember(member); | ||
newGroup->AddMember(member); | ||
} | ||
m_DependencyGroups.emplace(DependencyGroup::Register(newGroup)); | ||
} | ||
|
||
// In both of the above cases, the identity of the existing group is going to probably change, | ||
// so we'll need to clean up all the database entries/relations referencing its old identity. | ||
return existingGroup; | ||
} | ||
} | ||
|
||
return nullptr; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That method has quite some complexity in it and some it feels it would better fit into the DependencyGroup
class rather than Checkable
. I'm not yet entirely sure what would be the best way here, hence I'm thinking out loud, let me know what you think: I'd consider adding a method soemthing like this to DependencyGroup
:
DependencyGroup::Ptr Extend(const Dependency::Ptr&);
That would do one of two things:
- If the dependency group is not (yet) shared between multiple child checkables, it extends the group, changing its parent set and updating the registry accordingly (which might merge the group into another one).
- Otherwise, i.e. if the dependency group is already shared, it takes all
Dependency
objects belonging to the child plus the new one and moves them to another group, either a newly created one or an already existing one for the new set of parents.
In both cases, it returns a pointer to the group the child is now registered to. In case this is a different pointer than the old group, this means the child no longer belongs to that group and should update its references to it, similar to what you already do in PushDependencyGroupsToRegistry()
.
That might even allow to remove the delayed registration again, as that's pretty much the same operation as Checkable::AddDependency()
currently does when m_DependencyGroupsPushedToRegistry = false
, but with the registry already being aware of what happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case this is a different pointer than the old group, this means the child no longer belongs to that group and should update its references to it
Wouldn't that be exactly the same behaviour as the previous implementation before I changed all that to match your prototype?
That might even allow to remove the delayed registration again, as that's pretty much the same operation as
Checkable::AddDependency()
currently does whenm_DependencyGroupsPushedToRegistry = false
, but with the registry already being aware of what happens.
No, that's not the same thing! Wasn't all this (un)register reimplementation done so that we don't have to deal with shared and unshared dependency groups at startup and instead populate the groups only once per checkable? If we pass every single AddDependency()
call to the global registry, it will pretty much end up having the same state as before (remember the white board discussion we had last time about this), so I don't quite see the benefit of this. Apart from that, is this...
I'd consider adding a method soemthing like this to
DependencyGroup
:DependencyGroup::Ptr Extend(const Dependency::Ptr&);
... supposed to be implemented as a static member? Because if not, you probably won't want to touch anything related to the global registry within that method, as they are cross-referenced (the existing static members calls some non-static member methods of a particular group), we'll end up with some mysterious deadlocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case this is a different pointer than the old group, this means the child no longer belongs to that group and should update its references to it
Wouldn't that be exactly the same behaviour as the previous implementation before I changed all that to match your prototype?
Maybe, I'm not sure (might also be because I didn't fully understand the old code). I wouldn't rule out that we might even end up with something similar to what that old RefreshRegistry()
function did, but then hopefully structured in way that makes it easier to understand. On the other hand, you said that changing all of this gave you a big performance boost and I wouldn't expect much of a slowdown from what I'm currently suggesting.
My prototype had a significant difference to the current state of the PR: the checkable still tracked all its dependencies individually which allowed it to regroup everything by redundancy group as needed. The code currently in the PR moves all these references into a DependencyGroup
object, so some kind of interaction with that class is necessary to retrieve the other members of a redundancy group that gets an additional member.
Another random though would be a unregister method that unregisters a specific child from a dependency group, returning the corresponding dependency objects. The newly added dependency objects can than be added to the dependencies, forming a new group that will then be registered.
That might even allow to remove the delayed registration again, as that's pretty much the same operation as
Checkable::AddDependency()
currently does whenm_DependencyGroupsPushedToRegistry = false
, but with the registry already being aware of what happens.No, that's not the same thing! Wasn't all this (un)register reimplementation done so that we don't have to deal with shared and unshared dependency groups at startup and instead populate the groups only once per checkable? If we pass every single
AddDependency()
call to the global registry, it will pretty much end up having the same state as before (remember the white board discussion we had last time about this), so I don't quite see the benefit of this.
That boils down to the question how much overhead the incremental registration operations introduce. In case of my prototype, loading each dependency object regrouped all the dependencies of that checkable by the redundancy group attribute and updated all its dependency group objects with the registry. That would have hugely benefited from doing that only once per checkable. When there's a more efficient incremental operation, that might not be necessary, but locking could even make enough of a difference on its own (i.e. that you'd only have to lock the registry once per checkable, not once per dependency).
Apart from that, is this...
I'd consider adding a method soemthing like this to
DependencyGroup
:DependencyGroup::Ptr Extend(const Dependency::Ptr&);... supposed to be implemented as a static member? Because if not, you probably won't want to touch anything related to the global registry within that method, as they are cross-referenced (the existing static members calls some non-static member methods of a particular group), we'll end up with some mysterious deadlocks.
I though of whether this should be a static method or not, but did not explicitly write something about it in my comment as my thinking was that it could be either, whatever works better.
if (seenChildren.find(checkable) == seenChildren.end()) { | ||
seenChildren.emplace(checkable); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
insert()
already returns whether it actually inserted a new value, so both of this can be done with a single call.
SSIA 🤪! Just kidding! This PR synchronises all the required points from Icinga/icingadb#347 (comment) to Redis. However, I'm not going to explain every implementation detail of the PR here, but it can be roughly understood as follows.
host/service.affected_children
andstate.affects_children
attributes to Redis as described in Track effect of an object on dependent children #10158.no_user_modify
flag to theredundancy_group
attribute in thedependency.ti
file to prevent any runtime alteration of its value, as there is now too much logic and functionality depending on this value and changing it at runtime would have a disastrous end.DependencyGroup
to easily group and manage identical dependencies of any checkable in one place. Yes, this is also used to group non-redundant dependencies, but such a group is entirely unique for each checkable and is never referenced by other checkables.DependencyGroup
at any given time as described in Let redundancy groups not just fail #10190.failedDependency
parameter of theCheckable::IsReachable()
method. It is obsolete because none of the callers make use of it, it just adds unnecessary complexity to the method for no reason.Checkable::IsReachable()
method and utilises theDependencyGroup::GetState()
method introduced above.activation_priority
of theDependency
object is set to-10
(just like for downtime objects). This way, the dependency objects will always get activated before their child/parent Checkables.fixes #10158
fixes #10190
fixes #10227
fixes #10014
fixes #10143