Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync dependencies to Redis #10290

Open
wants to merge 25 commits into
base: master
Choose a base branch
from
Open

Sync dependencies to Redis #10290

wants to merge 25 commits into from

Conversation

yhabteab
Copy link
Member

@yhabteab yhabteab commented Jan 13, 2025

SSIA 🤪! Just kidding! This PR synchronises all the required points from Icinga/icingadb#347 (comment) to Redis. However, I'm not going to explain every implementation detail of the PR here, but it can be roughly understood as follows.

  • 0af4679 and fe8169a syncs the host/service.affected_children and state.affects_children attributes to Redis as described in Track effect of an object on dependent children #10158.
  • 19862bd Adds the no_user_modify flag to the redundancy_group attribute in the dependency.ti file to prevent any runtime alteration of its value, as there is now too much logic and functionality depending on this value and changing it at runtime would have a disastrous end.
  • 9f4655b Introduces a new auxiliary class DependencyGroup to easily group and manage identical dependencies of any checkable in one place. Yes, this is also used to group non-redundant dependencies, but such a group is entirely unique for each checkable and is never referenced by other checkables.
  • 6638f36 Introduces a global shared registry for the dependency groups. In addition, here is also where all the dependency deduplication magic happens.
  • f478e5b Introduces an auxiliary method for determining the state of any DependencyGroup at any given time as described in Let redundancy groups not just fail #10190.
  • 49e6ae2 and 2ea7e59 Dumps all dependency-related information to Redis at Icinga 2 startup/reload.
  • 1fa8840 Same as the above two commits but only for dependencies runtime state updates.
  • e75d884 Processes runtime removed/created dependency objects.
  • ae269a8 Removes the obsolete failedDependency parameter of the Checkable::IsReachable() method. It is obsolete because none of the callers make use of it, it just adds unnecessary complexity to the method for no reason.
  • 90f20b8 Changes the implementation of the Checkable::IsReachable() method and utilises the DependencyGroup::GetState() method introduced above.
  • 6e7157e In order to handle runtime removed/created dependencies without additional overhead, the activation_priority of the Dependency object is set to -10 (just like for downtime objects). This way, the dependency objects will always get activated before their child/parent Checkables.

fixes #10158
fixes #10190
fixes #10227
fixes #10014
fixes #10143

Copy link
Contributor

@julianbrost julianbrost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm somewhat confused by the DependencyGroup class as it doesn't really map to the mental model I had from our discussions on that topic.

So in my understanding, a DependencyGroup would represent a set a set of checkables that are used as parents in dependency config objects combined with the attributes from that dependency object that affect how the availability of that dependency is determined, i.e. ignore_soft_states, period, and states. For dependencies without a redundancy group, that information is all that's needed to determine if all dependency objects that share the parent and these attribute mark all their children as unreachable. With a redundancy group, you have to look at all the parents from the redundancy group with the three aforementioned additional attributes. So that would be how to determine what creates a DependencyGroup object for redundancy groups.

For dependencies without a redundancy group, this grouping provides no value in itself, the dependency objects can be considered individually. There are two reasons why we might instantiate such trivial groups explicitly nonetheless: for one, it may allow simpler code by being able to treat both cases consistently, but more importantly, there was a request from Johannes that if two children depend on the same parent in such a way that the state of these dependencies is always the same (i.e. the three aforementioned attributes are identical), then the different graph edges should refer to the same shared state. These groups may be used for this deduplication as well.

Consider the following example (P = parent checkable, RG = redundancy group as represented in the generated graph, C = child checkable):

graph BT;
p1((P1));
p2((P2));
p3((P3));
c1((C1));
c2((C2));
c3((C3));
c4((C4));
c5((C5));
rg1(RG1);
c1-->rg1;
c2-->rg1;
c3-->rg1;
rg1-->p1;
rg1-->p2;
c4-->p3;
c5-->p3;
Loading

Here I'd expect the creation of the following two DependencyGroups (... refers to the three magic attributes attached to the parent in the corresponding dependency objects):

  • {(P1, ...), (P2, ...)}: This basically represents RG1
  • {(P3, ...)}: This is a if there was an imaginary second redundancy with only one parent, P3.

lib/icingadb/icingadb-objects.cpp Show resolved Hide resolved
lib/icinga/dependency.cpp Outdated Show resolved Hide resolved
lib/icinga/dependency.hpp Outdated Show resolved Hide resolved
lib/icinga/dependency.hpp Outdated Show resolved Hide resolved
lib/icinga/dependency.hpp Outdated Show resolved Hide resolved
lib/icinga/dependency.cpp Outdated Show resolved Hide resolved
@yhabteab yhabteab force-pushed the icingadb-dependencies-sync branch from 17ba7c9 to a1175d1 Compare January 20, 2025 07:17
This does not work in this state!
Trying to refresh Dependency if a Host or Service being member
of this Dependency has a state change.
@yhabteab yhabteab force-pushed the icingadb-dependencies-sync branch from bc82c04 to 2889822 Compare January 27, 2025 07:56
@yhabteab yhabteab added this to the 2.15.0 milestone Jan 27, 2025
@yhabteab yhabteab force-pushed the icingadb-dependencies-sync branch 2 times, most recently from 11c6498 to 78a0a29 Compare January 29, 2025 10:09
lib/icinga/checkable-dependency.cpp Outdated Show resolved Hide resolved
lib/icinga/checkable-dependency.cpp Outdated Show resolved Hide resolved
lib/icinga/checkable-dependency.cpp Outdated Show resolved Hide resolved
lib/icinga/checkable-dependency.cpp Outdated Show resolved Hide resolved
lib/icinga/dependency.hpp Outdated Show resolved Hide resolved
lib/icinga/checkable-dependency.cpp Outdated Show resolved Hide resolved
lib/icinga/checkable-dependency.cpp Outdated Show resolved Hide resolved
The previous limit (32) doesn't seem to make sense, and appears to be some random number.
So, this limit is set to 256 to match the limit in IsReachable().
@yhabteab yhabteab force-pushed the icingadb-dependencies-sync branch from 4e5e2c8 to 9c678be Compare January 30, 2025 15:18
@yhabteab yhabteab requested a review from julianbrost January 30, 2025 15:24
Comment on lines 26 to 27
* Note: Re-registering the very same dependency groups to global registry will crash the process with superior error
* messages that aren't easy to debug, so make sure to never call this method more than once for the same Checkable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what that "superior" (that has to be sarcasm) error message is. I'd expect it to end up there:

VERIFY(this != dest); // Prevent from doing something stupid, i.e. deadlocking ourselves.

Should anything else actually use m_DependencyGroups before PushDependencyGroupsToRegistry() was called? Like I'm wondering whether there is a particular reason to use m_DependencyGroups to store both registered and pending/not-yet-registered groups? Or is this just an optimization to avoid adding another class member (that will then remain unused after the Checkable was fully initialized)?

Copy link
Member Author

@yhabteab yhabteab Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what that "superior" (that has to be sarcasm) error message is. I'd expect it to end up there:

Yes, it will end up there and crashes with something like this:

[2025-01-31 16:38:01 +0100] information/ConfigItem: Triggering Start signal for config items
/Users/yhabteab/Workspace/icinga2/lib/icinga/dependency-group.cpp:214: assertion failed: this != dest
Caught SIGABRT.
Current time: 2025-01-31 16:38:02 +0100

How can you easily trace this back to that being called multiple times? I just wanted to add some hints so that you won't be surprised if it mysteriously crashes, but if you're just concerned about this comment, I can remove it.

Should anything else actually use m_DependencyGroups before PushDependencyGroupsToRegistry() was called?

If by anything else you mean the AddDependency() method, then yes, because nobody else has to deal with the checkable dependency groups before it is actually activated.

Like I'm wondering whether there is a particular reason to use m_DependencyGroups to store both registered and pending/not-yet-registered groups?

As I said above, nobody ever needs to access the checkable dependency groups before the PushDependencyGroupsToRegistry() method is called apart from the AddDependency() method, so I don't need to keep a separate list of pending/non-pending groups.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now added a simple boolean flag to eliminate such an unnecessary crash, i.e. the method can now be called multiple times either intentionally or not and will not terminate the process.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but if you're just concerned about this comment, I can remove it.

I don't have a problem with a comment stating that this should be called only once per object. I just think that "superior error messages that aren't easy to debug" is a somewhat strange wording to convey this. I'm not sure, maybe there's something lost in translation?

I have now added a simple boolean flag to eliminate such an unnecessary crash, i.e. the method can now be called multiple times either intentionally or not and will not terminate the process.

Is that the only purpose of that flag? If so, you could also consider doing something like if (this == dest) { return; /* moving members to ourselves is a no-op */ } instead of VERIFY(this != dest);.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, maybe there's something lost in translation?

Translation 😅? No, I didn't translate that! I intentionally used those phrases because the unittests were failing with some strange error messages, that I didn't initially know why. So I added this little hint, but it turned out that the tests were failing due to the now non-existent uninitialised std::unordered_set hash and equal callbacks (which were used as function pointers and not functors).

Is that the only purpose of that flag?

No. It’s also used within the AddDependency() method.

If so, you could also consider doing something like if (this == dest) { return; /* moving members to ourselves is a no-op */ } instead of VERIFY(this != dest);.

That's the thing, I don't want to silently ignore such stupid usages here and like to fail hard all the time, as such things should only happen due to unnoticed bugs and never intentionally.

*
* @return DependencyGroup::Ptr The dependency group that has been modified.
*/
DependencyGroup::Ptr Checkable::AddDependency(const Dependency::Ptr& dependency, bool refreshGlobalRegistry)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, refreshGlobalRegistry = true means that everything is done immediately and refreshGlobalRegistry = false means to delay parts of the registration until PushDependencyGroupsToRegistry() is called. Shouldn't the Checkable class be able to determine internally what's necessary? It's the class that calls PushDependencyGroupsToRegistry() in the end and in one place, refreshGlobalRegistry is set based on what a method of Checkable calls already:

auto modifiedGroup(m_Child->AddDependency(this, m_Child->IsActive()));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the Checkable class be able to determine internally what's necessary?

I have completely missed that!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Comment on lines 342 to 347
for (const Checkable::Ptr& checkable : children) {
std::set<Checkable::Ptr> cChildren = checkable->GetChildren();

if (!cChildren.empty()) {
if (auto cChildren(checkable->GetChildren()); !cChildren.empty()) {
GetAllChildrenInternal(cChildren, level + 1);
localChildren.insert(cChildren.begin(), cChildren.end());
}

localChildren.insert(checkable);
if (level != 0) { // Recursion level 0 is the initiator, so checkable is already in the set.
localChildren.insert(checkable);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about these changes. They look like they don't really change anything. The first change just uses C++17 syntax and the second change only skips the insert of an element that should already be in the set, so the insert does nothing already?

If changing something here, I'd rather give the function a complete makeover: currently it seems like it would visit checkables multiple times which could lead to its runtime complexity exploding for adverse dependency graphs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@yhabteab yhabteab force-pushed the icingadb-dependencies-sync branch from 9c678be to e083b31 Compare January 31, 2025 15:51
Co-Authored-By: Julian Brost <[email protected]>

Since @jbrost suggested simplifying the code for (un)registering the
dependencies and even created a prototype of how he envisioned it (see
20faf1c), I have now redone everthing on that basis. Even if the end
result doesn't quite match Julian's commit, I think it makes sense not
to implement it 1-1 as Julian suggested for a variaty reasons. Most of
the code where I personally think a clarifying comment is needed, I have
provided detailed ones justifying why something was done that way.
The previous wasn't per-se wrong, but it was way too inefficient. With
this commit each and every Checkable is going to be visited only once,
and we won't traverse the same Checkable's children multiple times
somewhere in the dependency chain.
@yhabteab yhabteab force-pushed the icingadb-dependencies-sync branch from dc5da1b to 215057f Compare February 3, 2025 07:35
Comment on lines +59 to +113
DependencyGroup::Ptr Checkable::AddDependency(const Dependency::Ptr& dependency)
{
std::lock_guard lock(m_DependencyMutex);
DependencyGroup::Ptr newGroup(new DependencyGroup(dependency->GetRedundancyGroup(), dependency));
if (auto it(m_DependencyGroups.find(newGroup)); it == m_DependencyGroups.end()) {
// If the current Checkable is already started (all local dependency groups are already pushed
// to the global registry), we need to directly forward newGroup to the global register.
m_DependencyGroups.emplace(m_DependencyGroupsPushedToRegistry ? DependencyGroup::Register(newGroup) : newGroup);
} else if (!m_DependencyGroupsPushedToRegistry) {
// If we're not going to refresh the global registry, we just need to add the dependency to the existing group.
// Meaning, the dependency group itself isn't registered globally yet, so we don't need to re-register it.
(*it)->AddMember(dependency);
return *it;
} else {
if (auto existingGroup(*it); existingGroup->HasIdenticalMember(dependency)) {
// There's already an identical member in the group and this is likely an exact duplicate of it,
// so it won't change the identity of the group after registration, i.e. regardless whether we're
// supposed to refresh the global registry or not it's identity will remain the same.
existingGroup->AddMember(dependency);
} else {
// We're going to either replace the existing group with "newGroup" or merge the two groups together.
// Either way, it's identity will change, so we need to decouple it from the current Checkable.
m_DependencyGroups.erase(it);

// We need to unregister the existing group from the global registry if we're the only member of it,
// as it's hash might change after adding the new dependency to it down below, and we want to re-register
// it afterwards. This way, we'll also be able to eliminate the possibility of having two identical groups
// in the registry that might occur due to the registration of the new dependency object below.
if (DependencyGroup::Unregister(existingGroup, this)) {
// The current Checkable is the only member of the group, so nothing to move around, just
// add the _duplicate_ dependency to the existing group. Duplicate in the sense that it's
// not identical to any of the existing members but similar enough to be part of the same
// group, i.e. same parent, maybe different period, state filter, etc.
existingGroup->AddMember(dependency);
m_DependencyGroups.emplace(DependencyGroup::Register(existingGroup));
} else {
// Obviously, the current Checkable is not the only member of the existing group, and it's going to
// have more members than the other child Checkables in that group after adding the new dependency
// to it. So, we need to move all the members this Checkable depends on to newGroup and leave the
// existing group as-is, i.e. it's identity won't change afterwards.
for (auto& member : existingGroup->GetMembers(this)) {
existingGroup->RemoveMember(member);
newGroup->AddMember(member);
}
m_DependencyGroups.emplace(DependencyGroup::Register(newGroup));
}

// In both of the above cases, the identity of the existing group is going to probably change,
// so we'll need to clean up all the database entries/relations referencing its old identity.
return existingGroup;
}
}

return nullptr;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That method has quite some complexity in it and some it feels it would better fit into the DependencyGroup class rather than Checkable. I'm not yet entirely sure what would be the best way here, hence I'm thinking out loud, let me know what you think: I'd consider adding a method soemthing like this to DependencyGroup:

DependencyGroup::Ptr Extend(const Dependency::Ptr&);

That would do one of two things:

  1. If the dependency group is not (yet) shared between multiple child checkables, it extends the group, changing its parent set and updating the registry accordingly (which might merge the group into another one).
  2. Otherwise, i.e. if the dependency group is already shared, it takes all Dependency objects belonging to the child plus the new one and moves them to another group, either a newly created one or an already existing one for the new set of parents.

In both cases, it returns a pointer to the group the child is now registered to. In case this is a different pointer than the old group, this means the child no longer belongs to that group and should update its references to it, similar to what you already do in PushDependencyGroupsToRegistry().

That might even allow to remove the delayed registration again, as that's pretty much the same operation as Checkable::AddDependency() currently does when m_DependencyGroupsPushedToRegistry = false, but with the registry already being aware of what happens.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case this is a different pointer than the old group, this means the child no longer belongs to that group and should update its references to it

Wouldn't that be exactly the same behaviour as the previous implementation before I changed all that to match your prototype?

That might even allow to remove the delayed registration again, as that's pretty much the same operation as Checkable::AddDependency() currently does when m_DependencyGroupsPushedToRegistry = false, but with the registry already being aware of what happens.

No, that's not the same thing! Wasn't all this (un)register reimplementation done so that we don't have to deal with shared and unshared dependency groups at startup and instead populate the groups only once per checkable? If we pass every single AddDependency() call to the global registry, it will pretty much end up having the same state as before (remember the white board discussion we had last time about this), so I don't quite see the benefit of this. Apart from that, is this...

I'd consider adding a method soemthing like this to DependencyGroup:

DependencyGroup::Ptr Extend(const Dependency::Ptr&);

... supposed to be implemented as a static member? Because if not, you probably won't want to touch anything related to the global registry within that method, as they are cross-referenced (the existing static members calls some non-static member methods of a particular group), we'll end up with some mysterious deadlocks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case this is a different pointer than the old group, this means the child no longer belongs to that group and should update its references to it

Wouldn't that be exactly the same behaviour as the previous implementation before I changed all that to match your prototype?

Maybe, I'm not sure (might also be because I didn't fully understand the old code). I wouldn't rule out that we might even end up with something similar to what that old RefreshRegistry() function did, but then hopefully structured in way that makes it easier to understand. On the other hand, you said that changing all of this gave you a big performance boost and I wouldn't expect much of a slowdown from what I'm currently suggesting.

My prototype had a significant difference to the current state of the PR: the checkable still tracked all its dependencies individually which allowed it to regroup everything by redundancy group as needed. The code currently in the PR moves all these references into a DependencyGroup object, so some kind of interaction with that class is necessary to retrieve the other members of a redundancy group that gets an additional member.

Another random though would be a unregister method that unregisters a specific child from a dependency group, returning the corresponding dependency objects. The newly added dependency objects can than be added to the dependencies, forming a new group that will then be registered.

That might even allow to remove the delayed registration again, as that's pretty much the same operation as Checkable::AddDependency() currently does when m_DependencyGroupsPushedToRegistry = false, but with the registry already being aware of what happens.

No, that's not the same thing! Wasn't all this (un)register reimplementation done so that we don't have to deal with shared and unshared dependency groups at startup and instead populate the groups only once per checkable? If we pass every single AddDependency() call to the global registry, it will pretty much end up having the same state as before (remember the white board discussion we had last time about this), so I don't quite see the benefit of this.

That boils down to the question how much overhead the incremental registration operations introduce. In case of my prototype, loading each dependency object regrouped all the dependencies of that checkable by the redundancy group attribute and updated all its dependency group objects with the registry. That would have hugely benefited from doing that only once per checkable. When there's a more efficient incremental operation, that might not be necessary, but locking could even make enough of a difference on its own (i.e. that you'd only have to lock the registry once per checkable, not once per dependency).

Apart from that, is this...

I'd consider adding a method soemthing like this to DependencyGroup:

DependencyGroup::Ptr Extend(const Dependency::Ptr&);

... supposed to be implemented as a static member? Because if not, you probably won't want to touch anything related to the global registry within that method, as they are cross-referenced (the existing static members calls some non-static member methods of a particular group), we'll end up with some mysterious deadlocks.

I though of whether this should be a static method or not, but did not explicitly write something about it in my comment as my thinking was that it could be either, whatever works better.

Comment on lines +343 to +344
if (seenChildren.find(checkable) == seenChildren.end()) {
seenChildren.emplace(checkable);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

insert() already returns whether it actually inserted a new value, so both of this can be done with a single call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants