WARN: "update about identity with same prefix as ours, declaring it down" #30
Yeah, foca does gossip when leaving the cluster, but consuming the instance on `leave_cluster` […]. I think we should make […].

As for the traces: ack. I'm quite unhappy with their state atm. I think everything debug level and lower is ok, but the other ones have a tendency of getting in the way, and going down the route of filtering traces via subscribers and whatnot is annoying af. I'll try to come up with something less unpleasant in the future. For now, I'll lower the level of this one to Debug. I'll try to ship these along with something for #28 early this w/e.
I've shipped v0.14.0 with a rework on traces. Now foca never emits anything higher than DEBUG: at that level, only high-level traces are emitted (membership changes, probes, etc.), and the TRACE level exposes the innards (messages being sent, timer events, etc.). And as I write this I realize I forgot about changing […].
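(For anyone tuning this from the consuming side: a minimal sketch, assuming the application already uses `tracing-subscriber` with its env-filter feature, of scoping how much of foca's output gets through.)

```rust
use tracing_subscriber::EnvFilter;

fn init_tracing() {
    tracing_subscriber::fmt()
        // Keep the application at INFO; switch the foca directive to
        // "foca=debug" for the high-level membership traces, or "foca=trace"
        // to also see messages being sent and timer events.
        .with_env_filter(EnvFilter::new("info,foca=info"))
        .init();
}
```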
I won't see the noise anymore; however, I do notice double the cluster size for a while when I restart my cluster. I'm sort of attributing that to the unclean leave.
hmm... The leave is definitely not unclean, and since the higher counts go down after a while it suggests that the knowledge persists in the cluster. I've released v0.15.0 right now so you can see if gossiping a bunch after […] helps.

I wonder if persisting recent exits would help here: idk how you're doing the store/load of the state during restarts, but if you have a global storage you could also save "node X decided to leave at time T" and then feed the recent ones to `apply_many`.
The simplest way to explain how we do it is: before leaving the cluster, we iterate all members and store their serialized state per identity in sqlite. On start we only pick the Alive or Down states and `apply_many` them. We're replacing them entirely within a transaction, so there shouldn't be any stray members.
What do you mean by this? I can add a timestamp to the persisted states for sure. Or are you saying something else?
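(A rough sketch of the restore-on-start step described above. `MyId`, `load_members_from_sqlite`, `foca` and `runtime` are hypothetical stand-ins for the application's own types and plumbing; only `apply_many` and the `Member`/`State` types come from foca, and the `state()` accessor name is an approximation.)

```rust
use foca::{Member, State};

// Rows written to sqlite before the restart, decoded back into foca members.
let saved: Vec<Member<MyId>> = load_members_from_sqlite()?;

// Feed back only the Alive and Down entries, as described above.
foca.apply_many(
    saved
        .into_iter()
        .filter(|member| matches!(member.state(), State::Alive | State::Down)),
    &mut runtime,
)?;
```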
Heh, sorry, it got confusing since I jumped straight to an attempt at solving the problem.

Let's say you're taking members A and B out of the cluster for a restart and doing the state save + load that you describe. By the time they come back, B is now B' and A is now A'; however B' thinks that A is still alive (and A' thinks that B is). So they feed this information back to the cluster.

^ That's where I believe your double counting is coming from. And then, when A' learns that the cluster thinks A is still alive, the noise starts. Thankfully the cluster recovers nicely after a while, since we learned our lesson with previous bugs 😀

So one solution to minimize this would be to record the leave in a table that everyone can access: say, you write {timestamp, identity} to the table when you're leaving, and any instance can use this information during its restart. It's ok if there's some replication delay, because of said self-correcting behaviour.
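(A minimal sketch of that shared leave-log idea. `leave_log`, `unix_now`, `LEAVE_WINDOW_SECS`, `my_identity`, `foca` and `runtime` are hypothetical application-side pieces; the foca calls mirror the `apply_many` usage discussed later in this thread.)

```rust
use foca::{Incarnation, Member, State};

// On the way out, before leaving the cluster: record who left and when.
leave_log.insert(unix_now(), my_identity.clone())?;

// On any restart, after building the fresh Foca instance: replay recent
// leavers as Down so they don't get resurrected by stale knowledge.
let recently_left = leave_log.identities_since(unix_now() - LEAVE_WINDOW_SECS)?;
foca.apply_many(
    recently_left
        .into_iter()
        .map(|id| Member::new(id, Incarnation::default(), State::Down)),
    &mut runtime,
)?;
```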
Since we're talking about persisting the Down state, what I think would be best is: […]

- less moving pieces on your side, and pretty trivial to expose in foca
- forgetting the down members during a restart re-enables all the issues we had with double counting again
Turns out the thing that uses […].

What's odd to me is: when I'm restarting the cluster, every currently known identity is going down, assuredly. The order is random though, and which members see a new state for a left member is also random. So I don't really know what the solution is, except that maybe it could be a new type of message: when leaving, send the new down state to every other member, not just a few random members. I'm now using QUIC instead of UDP / TCP, so updates are pretty reliable (pretty much as reliable as TCP). I figure all nodes would get the "leave" message. Ultimately, I wouldn't have to use […].
I'm pretty sure the knowledge is getting disseminated correctly and fast enough. The problem is not that they never learn that the member went down, it's that they forget :)

When you're restarting the cluster, the scenario from a couple of comments ago repeats itself multiple times (but instead of 2 members, it's whatever your rolling restart batch size is). It doesn't look like a problem when you think about the first time this happens, but in the second batch of nodes it becomes evident. Let's say you're restarting your cluster in 3 batches. B1, the first one, just completed and now you're doing B2. At this point in time you have: […]
So when B2 starts coming back online, if a node from B1 talks to B2 and still has updates in its backlog, B2 will think that an identity that just declared itself as down is actually alive. The larger the batch size and the number of batches, the higher the likelihood that it reintroduces down nodes as alive. If you persist the down members the problem mostly disappears; only the nodes within a batch will (possibly) have outdated knowledge.

To be clear: the problem here is asymmetric knowledge of the terminal (Down) state of identities. It's the same problem we had before with forgetting down members too early due to configuration.

So, getting back to the meat of it: you're trying to speed up discovery of cluster members. The reason foca doesn't do this very well is because it's limited to a maximum packet size. You, however, aren't. I think you should consider the approach that memberlist uses: periodically, a member connects to another member and they do a full sync (i.e.: member A sends its full list of members, including down ones; member B applies these to its own state THEN sends its full list back to A). It's very similar to what foca does with announce, but having a proper connection between members enables giving the whole state and ensuring a reply.

Whether you stick to the current approach or try a different one, foca can facilitate this by exposing the full state directly (think foca.iter_members(), but without filtering for liveness); I'll get this done.
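(A rough sketch of that push/pull sync, written against the full-state iterator that ends up shipping as `iter_membership_state`. The connection, the `encode`/`decode` helpers and `MyId` are hypothetical; only `iter_membership_state` and `apply_many` are the foca calls from this thread.)

```rust
use foca::Member;

// Initiator: push our complete membership, down members included...
let ours: Vec<Member<MyId>> = foca.iter_membership_state().cloned().collect();
peer_conn.send(encode(&ours)?)?;

// ...then pull the peer's complete membership and merge it into ours.
let theirs: Vec<Member<MyId>> = decode(&peer_conn.recv()?)?;
foca.apply_many(theirs.into_iter(), &mut runtime)?;

// The peer does the mirror image (apply what it received, then reply with its
// own full state), so both ends converge to the union of what they knew.
```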
Would this help on batch restarts? I suppose it could self-correct by merging states in a way that dismisses a lot of "bad" identities?
Huh, I thought I'd released v0.16.0 and replied to this before. Apologies.

Doing this sync between live members pretty much eliminates any knowledge disparity because it guarantees that the nodes will have the exact same state. I understand the need to converge to the full cluster size as fast as possible, so you'll have to address the problem somehow. Possible approaches, useful for any scenario related to converging: […]
My tiny cluster uses an identity with a timestamp (I golfed it and it actually has a bug if I restart during a year change, but I digress 😅), so I go for the last approach, as I'd rather not introduce TCP or something similar here.
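(A self-contained sketch of the identity-with-a-timestamp idea: the address is the stable prefix, the timestamp makes every restart a distinct identity, so the previous incarnation gets declared down instead of colliding with the new one. The trait methods shown, `renew` and `has_same_prefix`, match the foca version discussed in this thread; later releases reshaped the `Identity` trait, so treat this as a sketch.)

```rust
use std::net::SocketAddr;
use std::time::{SystemTime, UNIX_EPOCH};

use foca::Identity;

#[derive(Clone, Debug, PartialEq, Eq)]
struct TimestampedId {
    addr: SocketAddr,
    // Refreshed on every (re)start, so a restarted node never reuses the
    // identity it had before going down.
    started_at: u64,
}

impl TimestampedId {
    fn new(addr: SocketAddr) -> Self {
        let started_at = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("clock is set before the unix epoch")
            .as_secs();
        Self { addr, started_at }
    }
}

impl Identity for TimestampedId {
    // Rejoining is simply "same address, fresher timestamp".
    fn renew(&self) -> Option<Self> {
        Some(Self::new(self.addr))
    }

    // Identities sharing an address belong to the same node; foca uses this to
    // declare the stale one down when a fresher one shows up.
    fn has_same_prefix(&self, other: &Self) -> bool {
        self.addr == other.addr
    }
}
```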
Thanks, that's helpful! I'm tempted to try the timestamp technique. That should work fine for us.
Do you mean declare it down with Foca, or declare it down internally (like the […])?
With foca. The idea here is that someone is spreading this knowledge to the cluster, some might have learned it already, and you want it to stop; so you teach foca the correct state and it disseminates. (As soon as you do the `apply_many` […])
I think the timestamp change helped! The cluster seems to eventually coalesce to the same number of members for each node. I've also started using the new `iter_membership_state` […].

How can I tell foca […]?
You've just done it by feeding the output of iter_membership_state to apply_many :) Any member you insert using this becomes part of the distributed state and Down is a state members can't transition out of so: foca.apply_many(core::iter::once(
Member::new(identity_to_kill, Incarnation::default(), State::Down)
), &mut runtime) Makes foca declare |
closing stale issues that seem resolved. feel free to reopen
In our setup, I'm cleanly leaving the cluster by using `leave_cluster` and waiting 2 seconds w/ the hope that the update propagates to as many nodes as possible. Since `leave_cluster` moves the `Foca` instance, we can't call `handle_data` and such on it anymore. We've lost control of it. Does foca still handle dispatching the leave message thoroughly?

We're often restarting the cluster w/ a concurrency of 6 (or more) nodes at a time. I figure it's possible for nodes to not receive the leave / down message and consider this node as up. So when they start again, they might `apply_many` an up state for the node, and it might be outdated. For example, there's no way to store the current state of the cluster past the `leave_cluster` call, therefore as other nodes are also leaving at the same time, we'll have stored the wrong identity for them.

When there's a deploy (and therefore a restart), we keep getting these log lines: […]
I know these are mostly harmless, but I wonder if there's a way to either avoid them or to reduce their log level.
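(For context, the leave-and-linger flow described above, as a hedged sketch: it assumes the foca version from this thread, where `leave_cluster` consumes the instance and hands the goodbye packets to the `Runtime`; `flush_outgoing`, `runtime` and `socket` are hypothetical application-side pieces.)

```rust
// `foca` is moved here; no more handle_data / handle_timer afterwards.
foca.leave_cluster(&mut runtime)?;
// Actually put the queued "I'm down" packets on the wire.
flush_outgoing(&mut runtime, &socket).await?;
// Linger so the update has a chance to propagate before the process exits.
tokio::time::sleep(std::time::Duration::from_secs(2)).await;
```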