Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: dont gossip to disconnected nodes if none are connected #1042

Merged
merged 1 commit into from
Nov 30, 2023

Conversation

njgheorghita
Copy link
Collaborator

@njgheorghita njgheorghita commented Nov 27, 2023

What was wrong?

From a conversation on discord, disconnected nodes should only be used within the routing table, and shouldn't be exposed in exterior processes. Responsibility for pinging disconnected nodes falls within the scope of the routing table, and nowhere else.

How was it fixed?

Removed usage of disconnected nodes from our gossip / rfc processes.

To-Do

@njgheorghita njgheorghita changed the title chore: dont gossip to disconnected nodes if there are no connected no… chore: dont gossip to disconnected nodes if none are connected Nov 27, 2023
@njgheorghita njgheorghita force-pushed the dont-gossip-disconnected branch from 1047c77 to e3adcda Compare November 27, 2023 17:04
@njgheorghita njgheorghita marked this pull request as ready for review November 27, 2023 17:26
@njgheorghita njgheorghita requested review from carver, mrferris, ogenev and KolbyML and removed request for carver and mrferris November 27, 2023 17:27
@njgheorghita njgheorghita self-assigned this Nov 27, 2023
KolbyML

This comment was marked as duplicate.

Copy link
Member

@KolbyML KolbyML left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one more case missing

    fn init_find_nodes_query(
        &mut self,
        target: &NodeId,
        callback: Option<oneshot::Sender<Vec<Enr>>>,
    ) -> Option<QueryId> {
        let target_key = Key::from(*target);


        let mut closest_enrs = self.closest_connected_nodes(&target_key, self.query_num_results);


        // All nodes may be disconnected at boot, before they respond to pings.
        // If no connected nodes, use the closest nodes regardless of connectivity.
        if closest_enrs.is_empty() {
            closest_enrs = self
                .kbuckets
                .write()
                .closest_values(&target_key)
                .map(|closest| closest.value.enr)
                .take(self.query_num_results)
                .collect();
            // `closest_values` return is empty if querying our own Node ID
            if closest_enrs.is_empty() {
                closest_enrs = self.table_entries_enr();
            }
        }

Should be changed to

    fn init_find_nodes_query(
        &mut self,
        target: &NodeId,
        callback: Option<oneshot::Sender<Vec<Enr>>>,
    ) -> Option<QueryId> {
        let target_key = Key::from(*target);

        let closest_enrs = self.closest_connected_nodes(&target_key, self.query_num_results);

        if closest_enrs.is_empty() {
            // If there are no nodes whatsoever in the routing table the query cannot proceed.
            warn!("No nodes in routing table, find nodes query cannot proceed.");
            if let Some(callback) = callback {
                let _ = callback.send(vec![]);
            }
            return None;
        }

Or something like that

Copy link
Member

@ogenev ogenev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@njgheorghita njgheorghita force-pushed the dont-gossip-disconnected branch 2 times, most recently from 6b0e6df to c815599 Compare November 28, 2023 17:28
@njgheorghita
Copy link
Collaborator Author

@KolbyML Well, I agree with your suggestion, especially if we're following the convention that "disconnected" nodes should not be a concept outside of the routing table. However, making your suggested change does seem to have broken a few of our tests. At first glance, it seems like on startup, our routing table is designed to expect to operate w/ disconnected nodes. And the assumption is that as the initial find nodes queries process, they'll become connected.

On one hand, we have this convention that we're trying to enforce "no disconnected nodes outside of the routing table" - on the other hand, we have broken tests and possibly significant amount of work to strictly enforce this convention.

Imo, we should revert your suggested change, and track it in an issue so that it doesn't get dropped, and we can merge the other fixes asap. Thoughts?

@KolbyML
Copy link
Member

KolbyML commented Nov 28, 2023

@njgheorghita the failing test appears to be for the same reason hive tests will fail, just merge the PR without the change and I will make a new one with the fix for the test.

My priority is to move fast and fix bugs, so it is fine to merge and create an issue for this kind of thing. It might be a good idea to mark these kind of issues with a shelf life and low priority tag as well

@njgheorghita
Copy link
Collaborator Author

@KolbyML I strongly disagree that we should be merging broken tests into master

@KolbyML
Copy link
Member

KolbyML commented Nov 29, 2023

@KolbyML I strongly disagree that we should be merging broken tests into master

I am not saying to merge with broken tests. I said the failing test appears to be for the same reason hive tests will fail, just merge the PR without the change

i.e. remove the change which broke the tests then merge this PR. Then I will make a new PR with the changes "which broke the test" and fix the tests. I said nothing about merging failing CI code into master.

This isn't a issue with this PR this is a issue with how the tests were written is it not? and being written under the assumption of "gossip to disconnected nodes if none are connected".

we have broken tests and possibly significant amount of work to strictly enforce this convention. ... Imo, we should revert your suggested change, and track it in an issue so that it doesn't get dropped, and we can merge the other fixes asap. Thoughts?

I said that is fine. I will just fix the miswritten tests in a follow up PR. I agreed with this and said I would fix it after this is merged.

@KolbyML I strongly disagree that we should be merging broken tests into master

I am not sure where this is coming from, maybe confusion over text which happens. But if something is unclear saying strongly disaggree for something I didn't say doesn't feel right.

@njgheorghita njgheorghita force-pushed the dont-gossip-disconnected branch from c815599 to 0f0a3fd Compare November 30, 2023 15:27
@njgheorghita
Copy link
Collaborator Author

njgheorghita commented Nov 30, 2023

@KolbyML My bad, I just misunderstood your comment, sorry if it came across a bit harsh. When I read just merge the PR without the change and I will make a new one with the fix for the test. I interpreted it as saying don't make any changes to the pr as is (eg. merge it with failing tests) and you'll correct the failing tests in a following pr. Now I understand that the change you were referencing was actually the change that broke the tests. Again sorry for the misunderstanding, communication over github can be tricky at times.

@KolbyML
Copy link
Member

KolbyML commented Nov 30, 2023 via email

@njgheorghita njgheorghita merged commit 01168cf into ethereum:master Nov 30, 2023
@njgheorghita njgheorghita deleted the dont-gossip-disconnected branch November 30, 2023 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants