Sync dependencies to Redis #10290

yhabteab · 2025-01-13T08:31:02Z

SSIA 🤪! Just kidding! This PR synchronises all the required points from Icinga/icingadb#347 (comment) to Redis. However, I'm not going to explain every implementation detail of the PR here, but it can be roughly understood as follows.

0af4679 and fe8169a sync the host/service.affected_children and state.affects_children attributes to Redis as described in Track effect of an object on dependent children #10158.
19862bd Adds the no_user_modify flag to the redundancy_group attribute in the dependency.ti file to prevent any runtime alteration of its value, as there is now too much logic and functionality depending on this value and changing it at runtime would have a disastrous end.
9f4655b Introduces a new auxiliary class DependencyGroup to easily group and manage identical dependencies of any checkable in one place. Yes, this is also used to group non-redundant dependencies, but such a group is entirely unique for each checkable and is never referenced by other checkables.
6638f36 Introduces a global shared registry for the dependency groups. This is also where all the dependency deduplication magic happens.
f478e5b Introduces an auxiliary method for determining the state of any DependencyGroup at any given time as described in Let redundancy groups not just fail #10190.
49e6ae2 and 2ea7e59 dump all dependency-related information to Redis at Icinga 2 startup/reload.
1fa8840 Same as the above two commits, but only for runtime state updates of dependencies.
e75d884 Processes runtime removed/created dependency objects.
ae269a8 Removes the obsolete failedDependency parameter of the Checkable::IsReachable() method. It is obsolete because none of the callers make use of it; it just adds unnecessary complexity to the method.
90f20b8 Changes the implementation of the Checkable::IsReachable() method and utilises the DependencyGroup::GetState() method introduced above.
6e7157e In order to handle runtime removed/created dependencies without additional overhead, the activation_priority of the Dependency object is set to -10 (just like for downtime objects). This way, the dependency objects will always get activated before their child/parent Checkables.

fixes #10158
fixes #10190
fixes #10227
fixes #10014
fixes #10143

julianbrost

I'm somewhat confused by the DependencyGroup class as it doesn't really map to the mental model I had from our discussions on that topic.

So in my understanding, a DependencyGroup would represent a set of checkables that are used as parents in dependency config objects, combined with the attributes from that dependency object that affect how the availability of that dependency is determined, i.e. ignore_soft_states, period, and states. For dependencies without a redundancy group, that information is all that's needed to determine if all dependency objects that share the parent and these attributes mark all their children as unreachable. With a redundancy group, you have to look at all the parents from the redundancy group with the three aforementioned additional attributes. So that would be how to determine what creates a DependencyGroup object for redundancy groups.

For dependencies without a redundancy group, this grouping provides no value in itself, the dependency objects can be considered individually. There are two reasons why we might instantiate such trivial groups explicitly nonetheless: for one, it may allow simpler code by being able to treat both cases consistently, but more importantly, there was a request from Johannes that if two children depend on the same parent in such a way that the state of these dependencies is always the same (i.e. the three aforementioned attributes are identical), then the different graph edges should refer to the same shared state. These groups may be used for this deduplication as well.

Consider the following example (P = parent checkable, RG = redundancy group as represented in the generated graph, C = child checkable):

graph BT;
p1((P1));
p2((P2));
p3((P3));
c1((C1));
c2((C2));
c3((C3));
c4((C4));
c5((C5));
rg1(RG1);
c1-->rg1;
c2-->rg1;
c3-->rg1;
rg1-->p1;
rg1-->p2;
c4-->p3;
c5-->p3;

Here I'd expect the creation of the following two DependencyGroups (... refers to the three magic attributes attached to the parent in the corresponding dependency objects):

{(P1, ...), (P2, ...)}: This basically represents RG1
{(P3, ...)}: This is as if there was an imaginary second redundancy group with only one parent, P3.
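
The grouping described above can be sketched roughly as follows. This is only an illustration of the mental model, not the implementation in the PR: `ParentSpec`, `GroupKey`, `Group` and `Registry` are hypothetical names, and plain strings and an integer bitmask stand in for the real Checkable, TimePeriod and state filter types.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <tuple>

// Hypothetical stand-in for one parent edge: the parent checkable plus the three
// dependency attributes that decide how its availability is evaluated.
struct ParentSpec {
	std::string parent;
	bool ignoreSoftStates;
	std::string period;
	int states; // state filter bitmask

	bool operator<(const ParentSpec& rhs) const
	{
		return std::tie(parent, ignoreSoftStates, period, states)
			< std::tie(rhs.parent, rhs.ignoreSoftStates, rhs.period, rhs.states);
	}
};

// A group is identified by the complete set of its parent specs.
using GroupKey = std::set<ParentSpec>;

struct Group {
	GroupKey key;
	std::set<std::string> children; // child checkables sharing this group
};

// Hypothetical registry: children with an identical key share one Group object.
class Registry {
public:
	std::shared_ptr<Group> Register(const std::string& child, const GroupKey& key)
	{
		auto& slot = m_Groups[key];
		if (!slot) {
			slot = std::make_shared<Group>();
			slot->key = key;
		}
		slot->children.insert(child);
		return slot;
	}

	std::size_t Size() const { return m_Groups.size(); }

private:
	std::map<GroupKey, std::shared_ptr<Group>> m_Groups;
};
```

Registering C1, C2 and C3 with the key {(P1, ...), (P2, ...)} and C4 and C5 with the key {(P3, ...)} would leave exactly two groups in such a registry, matching the two groups listed above.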


julianbrost · 2025-01-31T13:25:38Z

lib/icinga/checkable-dependency.cpp

+ * Note: Re-registering the very same dependency groups to global registry will crash the process with superior error
+ * messages that aren't easy to debug, so make sure to never call this method more than once for the same Checkable.


I wonder what that "superior" (that has to be sarcasm) error message is. I'd expect it to end up there:

icinga2/lib/icinga/dependency-group.cpp

Line 214 in 9c678be

VERIFY(this != dest); // Prevent from doing something stupid, i.e. deadlocking ourselves.

Should anything else actually use m_DependencyGroups before PushDependencyGroupsToRegistry() was called? Like I'm wondering whether there is a particular reason to use m_DependencyGroups to store both registered and pending/not-yet-registered groups? Or is this just an optimization to avoid adding another class member (that will then remain unused after the Checkable was fully initialized)?

> I wonder what that "superior" (that has to be sarcasm) error message is. I'd expect it to end up there:

Yes, it will end up there and crashes with something like this:

[2025-01-31 16:38:01 +0100] information/ConfigItem: Triggering Start signal for config items
/Users/yhabteab/Workspace/icinga2/lib/icinga/dependency-group.cpp:214: assertion failed: this != dest
Caught SIGABRT.
Current time: 2025-01-31 16:38:02 +0100

How can you easily trace this back to that being called multiple times? I just wanted to add some hints so that you won't be surprised if it mysteriously crashes, but if you're just concerned about this comment, I can remove it.

> Should anything else actually use m_DependencyGroups before PushDependencyGroupsToRegistry() was called?

If by anything else you mean the AddDependency() method, then yes, because nobody else has to deal with the checkable dependency groups before it is actually activated.

> Like I'm wondering whether there is a particular reason to use m_DependencyGroups to store both registered and pending/not-yet-registered groups?

As I said above, nobody ever needs to access the checkable dependency groups before the PushDependencyGroupsToRegistry() method is called apart from the AddDependency() method, so I don't need to keep a separate list of pending/non-pending groups.

I have now added a simple boolean flag to eliminate such an unnecessary crash, i.e. the method can now be called multiple times either intentionally or not and will not terminate the process.

> but if you're just concerned about this comment, I can remove it.

I don't have a problem with a comment stating that this should be called only once per object. I just think that "superior error messages that aren't easy to debug" is a somewhat strange wording to convey this. I'm not sure, maybe there's something lost in translation?

> I have now added a simple boolean flag to eliminate such an unnecessary crash, i.e. the method can now be called multiple times either intentionally or not and will not terminate the process.

Is that the only purpose of that flag? If so, you could also consider doing something like if (this == dest) { return; /* moving members to ourselves is a no-op */ } instead of VERIFY(this != dest);.

> I'm not sure, maybe there's something lost in translation?

Translation 😅? No, I didn't translate that! I intentionally used those phrases because the unit tests were failing with some strange error messages and I initially didn't know why. So I added this little hint, but it turned out that the tests were failing due to the now non-existent uninitialised std::unordered_set hash and equal callbacks (which were used as function pointers and not functors).

> Is that the only purpose of that flag?

No. It’s also used within the AddDependency() method.

> If so, you could also consider doing something like if (this == dest) { return; /* moving members to ourselves is a no-op */ } instead of VERIFY(this != dest);.

That's the thing, I don't want to silently ignore such stupid usages here and like to fail hard all the time, as such things should only happen due to unnoticed bugs and never intentionally.
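
The two positions in this exchange can be put side by side in a small sketch. It is not the code from the PR: `Group` and both method names are hypothetical, and the strict variant throws an exception instead of aborting through `VERIFY()` purely so that the sketch stays testable.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a dependency group that can hand its members to another group.
struct Group {
	std::set<std::string> members;

	// Lenient variant: moving members to ourselves is silently treated as a no-op.
	void MoveMembersToLenient(Group& dest)
	{
		if (this == &dest) {
			return;
		}
		dest.members.insert(members.begin(), members.end());
		members.clear();
	}

	// Strict variant: a self-move is considered a bug and is reported loudly.
	// The message names the likely cause to make the failure easier to trace.
	void MoveMembersToStrict(Group& dest)
	{
		if (this == &dest) {
			throw std::logic_error("group moved to itself: was the registration called twice?");
		}
		dest.members.insert(members.begin(), members.end());
		members.clear();
	}
};
```

The lenient variant hides a double registration, while the strict one surfaces it immediately, at the cost of terminating the process in the real code.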

julianbrost · 2025-01-31T13:33:30Z

lib/icinga/checkable-dependency.cpp

+ *
+ * @return DependencyGroup::Ptr The dependency group that has been modified.
+ */
+DependencyGroup::Ptr Checkable::AddDependency(const Dependency::Ptr& dependency, bool refreshGlobalRegistry)


If I understand this correctly, refreshGlobalRegistry = true means that everything is done immediately and refreshGlobalRegistry = false means to delay parts of the registration until PushDependencyGroupsToRegistry() is called. Shouldn't the Checkable class be able to determine internally what's necessary? It's the class that calls PushDependencyGroupsToRegistry() in the end and in one place, refreshGlobalRegistry is set based on what a method of Checkable calls already:

icinga2/lib/icinga/dependency.cpp

Line 209 in 9c678be

auto modifiedGroup(m_Child->AddDependency(this, m_Child->IsActive()));

> Shouldn't the Checkable class be able to determine internally what's necessary?

I have completely missed that!

julianbrost · 2025-01-31T14:16:22Z

lib/icinga/checkable-dependency.cpp

 	for (const Checkable::Ptr& checkable : children) {
-		std::set<Checkable::Ptr> cChildren = checkable->GetChildren();
-
-		if (!cChildren.empty()) {
+		if (auto cChildren(checkable->GetChildren()); !cChildren.empty()) {
 			GetAllChildrenInternal(cChildren, level + 1);
 			localChildren.insert(cChildren.begin(), cChildren.end());
 		}

-		localChildren.insert(checkable);
+		if (level != 0) { // Recursion level 0 is the initiator, so checkable is already in the set.
+			localChildren.insert(checkable);
+		}
 	}


I'm not sure about these changes. They look like they don't really change anything. The first change just uses C++17 syntax and the second change only skips the insert of an element that should already be in the set, so the insert does nothing already?

If changing something here, I'd rather give the function a complete makeover: currently it seems like it would visit checkables multiple times which could lead to its runtime complexity exploding for adverse dependency graphs.
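
A makeover along those lines could look roughly like the sketch below, which visits every node at most once by tracking what has already been seen. The `Graph` alias and the string node names are assumptions made for illustration; the real method works on `Checkable::Ptr` and has a recursion depth limit that is not modelled here.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical adjacency list: parent name -> names of its direct children.
using Graph = std::map<std::string, std::vector<std::string>>;

// Collects all direct and indirect children of root, expanding each node only once.
std::set<std::string> GetAllChildren(const Graph& graph, const std::string& root)
{
	std::set<std::string> seen;
	std::vector<std::string> stack{root};

	while (!stack.empty()) {
		std::string current = std::move(stack.back());
		stack.pop_back();

		auto it = graph.find(current);
		if (it == graph.end()) {
			continue; // current has no children
		}

		for (const auto& child : it->second) {
			// insert().second is true only the first time a child is encountered,
			// so a node reachable via several paths is pushed exactly once.
			if (child != root && seen.insert(child).second) {
				stack.push_back(child);
			}
		}
	}

	return seen;
}
```

With a diamond-shaped graph (A has children B and C, both of which have child D), D and everything below it is expanded once instead of once per path.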


Since @jbrost suggested simplifying the code for (un)registering the dependencies and even created a prototype of how he envisioned it (see 20faf1c), I have now redone everything on that basis. Even if the end result doesn't quite match Julian's commit, I think it makes sense not to implement it 1:1 as Julian suggested, for a variety of reasons. Wherever I personally think a clarifying comment is needed, I have provided a detailed one justifying why something was done that way.

Co-Authored-By: Julian Brost <[email protected]>

The previous implementation wasn't wrong per se, but it was way too inefficient. With this commit, each Checkable is visited only once, and we won't traverse the same Checkable's children multiple times somewhere in the dependency chain.

julianbrost · 2025-02-03T11:11:07Z

lib/icinga/checkable-dependency.cpp

+DependencyGroup::Ptr Checkable::AddDependency(const Dependency::Ptr& dependency)
+{
+	std::lock_guard lock(m_DependencyMutex);
+	DependencyGroup::Ptr newGroup(new DependencyGroup(dependency->GetRedundancyGroup(), dependency));
+	if (auto it(m_DependencyGroups.find(newGroup)); it == m_DependencyGroups.end()) {
+		// If the current Checkable is already started (all local dependency groups are already pushed
+		// to the global registry), we need to directly forward newGroup to the global register.
+		m_DependencyGroups.emplace(m_DependencyGroupsPushedToRegistry ? DependencyGroup::Register(newGroup) : newGroup);
+	} else if (!m_DependencyGroupsPushedToRegistry) {
+		// If we're not going to refresh the global registry, we just need to add the dependency to the existing group.
+		// Meaning, the dependency group itself isn't registered globally yet, so we don't need to re-register it.
+		(*it)->AddMember(dependency);
+		return *it;
+	} else {
+		if (auto existingGroup(*it); existingGroup->HasIdenticalMember(dependency)) {
+			// There's already an identical member in the group and this is likely an exact duplicate of it,
+			// so it won't change the identity of the group after registration, i.e. regardless whether we're
+			// supposed to refresh the global registry or not its identity will remain the same.
+			existingGroup->AddMember(dependency);
+		} else {
+			// We're going to either replace the existing group with "newGroup" or merge the two groups together.
+			// Either way, its identity will change, so we need to decouple it from the current Checkable.
+			m_DependencyGroups.erase(it);
+
+			// We need to unregister the existing group from the global registry if we're the only member of it,
+			// as its hash might change after adding the new dependency to it down below, and we want to re-register
+			// it afterwards. This way, we'll also be able to eliminate the possibility of having two identical groups
+			// in the registry that might occur due to the registration of the new dependency object below.
+			if (DependencyGroup::Unregister(existingGroup, this)) {
+				// The current Checkable is the only member of the group, so nothing to move around, just
+				// add the _duplicate_ dependency to the existing group. Duplicate in the sense that it's
+				// not identical to any of the existing members but similar enough to be part of the same
+				// group, i.e. same parent, maybe different period, state filter, etc.
+				existingGroup->AddMember(dependency);
+				m_DependencyGroups.emplace(DependencyGroup::Register(existingGroup));
+			} else {
+				// Obviously, the current Checkable is not the only member of the existing group, and it's going to
+				// have more members than the other child Checkables in that group after adding the new dependency
+				// to it. So, we need to move all the members this Checkable depends on to newGroup and leave the
+				// existing group as-is, i.e. its identity won't change afterwards.
+				for (auto& member : existingGroup->GetMembers(this)) {
+					existingGroup->RemoveMember(member);
+					newGroup->AddMember(member);
+				}
+				m_DependencyGroups.emplace(DependencyGroup::Register(newGroup));
+			}
+
+			// In both of the above cases, the identity of the existing group is going to probably change,
+			// so we'll need to clean up all the database entries/relations referencing its old identity.
+			return existingGroup;
+		}
+	}
+
+	return nullptr;
+}


That method has quite some complexity in it, and some of it feels like it would fit better into the DependencyGroup class rather than Checkable. I'm not yet entirely sure what would be the best way here, hence I'm thinking out loud; let me know what you think. I'd consider adding a method something like this to DependencyGroup:

DependencyGroup::Ptr Extend(const Dependency::Ptr&);

That would do one of two things:

If the dependency group is not (yet) shared between multiple child checkables, it extends the group, changing its parent set and updating the registry accordingly (which might merge the group into another one).

Otherwise, i.e. if the dependency group is already shared, it takes all Dependency objects belonging to the child plus the new one and moves them to another group, either a newly created one or an already existing one for the new set of parents.

In both cases, it returns a pointer to the group the child is now registered to. In case this is a different pointer than the old group, this means the child no longer belongs to that group and should update its references to it, similar to what you already do in PushDependencyGroupsToRegistry().
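
To make the two cases more concrete, here is a heavily simplified sketch of how such an operation might behave. Everything in it is an assumption for illustration: groups are keyed only by their set of parent names, `Extend()` lives on a hypothetical `Registry` class rather than on `DependencyGroup`, and individual Dependency objects, locking and the extra attributes are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>

using ParentSet = std::set<std::string>;

// Hypothetical simplified group: all children in it depend on the same parent set.
struct Group {
	ParentSet parents;
	std::set<std::string> children;
};

using GroupPtr = std::shared_ptr<Group>;

class Registry {
public:
	// Registers child under the given parent set, reusing an existing group if there is one.
	GroupPtr Register(const std::string& child, const ParentSet& parents)
	{
		auto& slot = m_Groups[parents];
		if (!slot) {
			slot = std::make_shared<Group>();
			slot->parents = parents;
		}
		slot->children.insert(child);
		return slot;
	}

	// Adds newParent to the parents of child, which currently belongs to group.
	// Returns the group the child belongs to afterwards.
	GroupPtr Extend(GroupPtr group, const std::string& child, const std::string& newParent)
	{
		ParentSet extended = group->parents;
		if (!extended.insert(newParent).second) {
			return group; // parent already present, nothing changes
		}

		bool shared = group->children.size() > 1;

		if (!shared && m_Groups.find(extended) == m_Groups.end()) {
			// Unshared group and no other group with the new parent set:
			// extend it in place and re-register it under its new key.
			m_Groups.erase(group->parents);
			group->parents = extended;
			m_Groups[extended] = group;
			return group;
		}

		// Shared group, or the extended set already exists elsewhere: move the
		// child out and attach it to the (new or existing) group for that set.
		group->children.erase(child);
		if (group->children.empty()) {
			m_Groups.erase(group->parents);
		}
		return Register(child, extended);
	}

	std::size_t Size() const { return m_Groups.size(); }

private:
	std::map<ParentSet, GroupPtr> m_Groups;
};
```

A caller would compare the returned pointer with the one it passed in; if they differ, the child has left the old group and needs to update its reference.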

That might even allow to remove the delayed registration again, as that's pretty much the same operation as Checkable::AddDependency() currently does when m_DependencyGroupsPushedToRegistry = false, but with the registry already being aware of what happens.

> In case this is a different pointer than the old group, this means the child no longer belongs to that group and should update its references to it

Wouldn't that be exactly the same behaviour as the previous implementation before I changed all that to match your prototype?

> That might even allow to remove the delayed registration again, as that's pretty much the same operation as Checkable::AddDependency() currently does when m_DependencyGroupsPushedToRegistry = false, but with the registry already being aware of what happens.

No, that's not the same thing! Wasn't all this (un)register reimplementation done so that we don't have to deal with shared and unshared dependency groups at startup and instead populate the groups only once per checkable? If we pass every single AddDependency() call to the global registry, it will pretty much end up having the same state as before (remember the white board discussion we had last time about this), so I don't quite see the benefit of this. Apart from that, is this...

> I'd consider adding a method something like this to DependencyGroup:
>
> DependencyGroup::Ptr Extend(const Dependency::Ptr&);

... supposed to be implemented as a static member? Because if not, you probably won't want to touch anything related to the global registry within that method: as they are cross-referenced (the existing static members call some non-static member methods of a particular group), we'll end up with some mysterious deadlocks.

> > In case this is a different pointer than the old group, this means the child no longer belongs to that group and should update its references to it
>
> Wouldn't that be exactly the same behaviour as the previous implementation before I changed all that to match your prototype?

Maybe, I'm not sure (might also be because I didn't fully understand the old code). I wouldn't rule out that we might even end up with something similar to what that old RefreshRegistry() function did, but then hopefully structured in a way that makes it easier to understand. On the other hand, you said that changing all of this gave you a big performance boost and I wouldn't expect much of a slowdown from what I'm currently suggesting.

My prototype had a significant difference to the current state of the PR: the checkable still tracked all its dependencies individually which allowed it to regroup everything by redundancy group as needed. The code currently in the PR moves all these references into a DependencyGroup object, so some kind of interaction with that class is necessary to retrieve the other members of a redundancy group that gets an additional member.

Another random thought would be an unregister method that unregisters a specific child from a dependency group, returning the corresponding dependency objects. The newly added dependency objects can then be added to these dependencies, forming a new group that will then be registered.

> > That might even allow to remove the delayed registration again, as that's pretty much the same operation as Checkable::AddDependency() currently does when m_DependencyGroupsPushedToRegistry = false, but with the registry already being aware of what happens.
>
> No, that's not the same thing! Wasn't all this (un)register reimplementation done so that we don't have to deal with shared and unshared dependency groups at startup and instead populate the groups only once per checkable? If we pass every single AddDependency() call to the global registry, it will pretty much end up having the same state as before (remember the white board discussion we had last time about this), so I don't quite see the benefit of this.

That boils down to the question how much overhead the incremental registration operations introduce. In case of my prototype, loading each dependency object regrouped all the dependencies of that checkable by the redundancy group attribute and updated all its dependency group objects with the registry. That would have hugely benefited from doing that only once per checkable. When there's a more efficient incremental operation, that might not be necessary, but locking could even make enough of a difference on its own (i.e. that you'd only have to lock the registry once per checkable, not once per dependency).
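
The locking aspect mentioned at the end can be illustrated with a toy registry that counts how often its mutex is taken. This is only a sketch of the trade-off under simplified assumptions: `Registry`, `Register()`, `RegisterAll()` and the string-based group identifiers are hypothetical and do not mirror the PR's API.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>
#include <vector>

class Registry {
public:
	// Incremental variant: the lock is taken once per dependency group.
	void Register(const std::string& group)
	{
		std::lock_guard<std::mutex> lock(m_Mutex);
		++m_LockCount;
		m_Groups.insert(group);
	}

	// Bulk variant: the lock is taken once per checkable, however many groups it has.
	void RegisterAll(const std::vector<std::string>& groups)
	{
		std::lock_guard<std::mutex> lock(m_Mutex);
		++m_LockCount;
		m_Groups.insert(groups.begin(), groups.end());
	}

	// Number of registering calls that acquired the lock so far.
	std::size_t LockCount() const
	{
		std::lock_guard<std::mutex> lock(m_Mutex);
		return m_LockCount;
	}

	std::size_t Size() const
	{
		std::lock_guard<std::mutex> lock(m_Mutex);
		return m_Groups.size();
	}

private:
	mutable std::mutex m_Mutex;
	std::size_t m_LockCount = 0;
	std::set<std::string> m_Groups;
};
```

For a checkable with three groups, the incremental variant acquires the lock three times and the bulk variant once; whether that difference matters in practice depends on contention and would have to be measured.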

> Apart from that, is this...
>
> > I'd consider adding a method something like this to DependencyGroup:
> >
> > DependencyGroup::Ptr Extend(const Dependency::Ptr&);
>
> ... supposed to be implemented as a static member? Because if not, you probably won't want to touch anything related to the global registry within that method: as they are cross-referenced (the existing static members call some non-static member methods of a particular group), we'll end up with some mysterious deadlocks.

I thought about whether this should be a static method or not, but did not explicitly write anything about it in my comment, as my thinking was that it could be either, whatever works better.

julianbrost · 2025-02-03T11:18:56Z

lib/icinga/checkable-dependency.cpp

+		if (seenChildren.find(checkable) == seenChildren.end()) {
+			seenChildren.emplace(checkable);


insert() already returns whether it actually inserted a new value, so both of these can be done with a single call.
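
As a minimal sketch of that suggestion (the helper name `MarkSeen` and the use of strings instead of `Checkable::Ptr` are assumptions for illustration):

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Returns true if value is seen for the first time, false if it was already known.
bool MarkSeen(std::unordered_set<std::string>& seen, const std::string& value)
{
	// Two lookups:  if (seen.find(value) == seen.end()) { seen.emplace(value); ... }
	// One lookup:   insert() returns a pair of iterator and bool, and the bool is
	//               true only if the element was actually inserted.
	return seen.insert(value).second;
}
```

The same works with emplace(), which also returns an iterator/bool pair.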

cla-bot bot added the cla/signed label Jan 13, 2025

yhabteab mentioned this pull request Jan 13, 2025

Checkable: Don't skip redundancy group checks for parent dependencies #10228

Closed

yhabteab force-pushed the icingadb-dependencies-sync branch from 95a27d3 to 17ba7c9 Compare January 15, 2025 16:28

julianbrost reviewed Jan 17, 2025


yhabteab force-pushed the icingadb-dependencies-sync branch from 17ba7c9 to a1175d1 Compare January 20, 2025 07:17

julianbrost reviewed Jan 20, 2025


yhabteab force-pushed the icingadb-dependencies-sync branch 2 times, most recently from 763d77c to bc82c04 Compare January 22, 2025 12:05

yhabteab mentioned this pull request Jan 24, 2025

Icinga DB state_type and is_acknowledged keys in redis don't match with what's inserted into the DB #9427

Open

IcingaDB: Start keeping track of Host/Service to Dependency relationship

9e0f750

This does not work in this state! Trying to refresh Dependency if a Host or Service being member of this Dependency has a state change.

yhabteab force-pushed the icingadb-dependencies-sync branch from bc82c04 to 2889822 Compare January 27, 2025 07:56

yhabteab added this to the 2.15.0 milestone Jan 27, 2025

yhabteab force-pushed the icingadb-dependencies-sync branch 2 times, most recently from 11c6498 to 78a0a29 Compare January 29, 2025 10:09

julianbrost reviewed Jan 30, 2025


yhabteab added 2 commits January 30, 2025 15:11

Checkable: Introduce GetAllChildrenCount() method

5fc78da

The previous limit (32) doesn't seem to make sense, and appears to be some random number. So, this limit is set to 256 to match the limit in IsReachable().

IcingaDB: Add affected_children to Host/Service Redis updates

66c6291

yhabteab force-pushed the icingadb-dependencies-sync branch from 4e5e2c8 to 9c678be Compare January 30, 2025 15:18

yhabteab requested a review from julianbrost January 30, 2025 15:24

julianbrost reviewed Jan 31, 2025


yhabteab force-pushed the icingadb-dependencies-sync branch from 9c678be to e083b31 Compare January 31, 2025 15:51

yhabteab added 9 commits February 3, 2025 08:21

IcingaDB: Sync affects_children as part of runtime state updates

b126721

Dependency: Don't allow to change redundancy_group at runtime

0cd7083

Otherwise, it would require too much code changes to properly handle redundancy group runtime modification in Icinga DB for no real benefit.

Introduce DependencyGroup helper class

584cd66

DependencyGroup: Add a global registry & deduplication logic

a973877

Checkable: Drop unused failedDependency argument from IsReachable()

f2f8359

Add DependencyGroup::GetState() helper method

94f12dd

Checkable: Store dependencies grouped by their redundancy group

b16aee1

IcingaDB: Dump checkables dependencies config to redis correctly

2946f55

IcingaDB: Sync dependencies states to Redis

7c6b871

yhabteab added 13 commits February 3, 2025 08:21

IcingaDB: Sync dependencies initial states on config dump

1efa838

IcingaDB: Handle runtime removed dependencies correctly

bb25825

Checkable: Use redundancy groups state in IsReachable

572bc14

tests: Add unittests for the redundancy groups registry

2424e07

IcingaDB: Bump expected redis version to 6

767afaa

Activate Dependency objects before their parent objects

643bfa9

tests: Add the new unittests to the CMakefile.txt

dc95bad

IcingaDB: Send reachablity state updates for all children recursively

f39a96e

Fix & adjust dependencies unittests

ad58c3a

Add basic unittests for bulk group registration

7e14d17

IcingaDB: Fix dependency runtime deletion & creation

31c1591

yhabteab force-pushed the icingadb-dependencies-sync branch from dc5da1b to 215057f Compare February 3, 2025 07:35

julianbrost reviewed Feb 3, 2025
