Telemetry refactor #4026
0ae5ba8 to 2650cc0
# Conflicts:
#   nano/lib/stats.cpp
#   nano/lib/stats.hpp
#   nano/node/node.cpp
#   nano/node/telemetry.cpp
#   nano/slow_test/node.cpp

# Conflicts:
#   nano/node/node.cpp

# Conflicts:
#   nano/lib/stats.cpp
#   nano/lib/stats.hpp
#   nano/node/telemetry.cpp
#   nano/node/telemetry.hpp
This is not easy to review, mainly because the previous design was messy and it is hard to tell what changes with the new design. The new design seems simple and beautiful. It is hard to say whether any problems will arise from the change in behaviour, but the changes seem compatible. I wonder why we broadcast telemetry at all, since we regularly request it anyway. The RPC apparently used to wait for a telemetry to arrive, whereas now it returns immediately. The frequency of telemetry requests and broadcasts seems excessive.
ASSERT_TIMELY (10s, 1 == node_server->stats.count (nano::stat::type::message, nano::stat::detail::telemetry_ack, nano::stat::dir::in));
}

namespace nano
Why is this test removed?
It looks like I forgot to do that check. Ideally the genesis block should be checked during the handshake, preventing connections to mismatched peers altogether, but for now doing it in telemetry seems like a must-have workaround.
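To make the workaround concrete, here is a rough sketch of a genesis check applied to incoming telemetry. It is only an illustration: `block_hash`, `telemetry_data_sketch`, `telemetry_verdict` and `check_genesis` are hypothetical stand-ins, not the node's actual types or the code in this PR.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-ins for illustration only; not the node's real types.
using block_hash = std::array<std::uint8_t, 32>;

struct telemetry_data_sketch
{
	block_hash genesis_block{}; // genesis hash reported by the peer
};

enum class telemetry_verdict
{
	accept,
	genesis_mismatch
};

// Compare the genesis hash a peer reports against the local one.
// A mismatch means the peer is on a different network, so the caller
// can ignore the telemetry (and possibly drop the peer).
inline telemetry_verdict check_genesis (telemetry_data_sketch const & received, block_hash const & local_genesis)
{
	return received.genesis_block == local_genesis ? telemetry_verdict::accept : telemetry_verdict::genesis_mismatch;
}
```

What to do on a mismatch (ignore the message, count a stat, disconnect) is left to the caller in this sketch.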
# Conflicts:
#   nano/core_test/network.cpp
2468f30 to 8a24a49
You can ask the same question in reverse: why request telemetry if it is broadcast regularly anyway? Handling telemetry requests introduces complications (caching local telemetry, rate limiting) that are not present if we simply broadcast telemetry to all peers periodically. For now we need to do both because this is a transition period, but in the long term moving to a broadcast-only mode will simplify things.
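As an illustration of the extra bookkeeping that request handling needs, here is a minimal sketch of a per-peer rate limiter. The class name, the string peer key and the cooldown value are all assumptions made for the example; none of this is taken from the node's code.

```cpp
#include <chrono>
#include <string>
#include <unordered_map>

// Hypothetical per-peer limiter for telemetry requests (illustration only).
class request_limiter_sketch
{
public:
	using clock = std::chrono::steady_clock;

	explicit request_limiter_sketch (std::chrono::milliseconds cooldown_a) :
		cooldown{ cooldown_a }
	{
	}

	// Returns true if a request from `peer` arriving at `now` should be answered.
	bool allow (std::string const & peer, clock::time_point now)
	{
		auto it = last_reply.find (peer);
		if (it != last_reply.end () && now - it->second < cooldown)
		{
			return false; // this peer asked again too soon
		}
		last_reply[peer] = now; // remember when we last replied
		return true;
	}

private:
	std::chrono::milliseconds cooldown;
	std::unordered_map<std::string, clock::time_point> last_reply;
};
```

A real handler would also have to expire entries for disconnected peers and cache the local telemetry it replies with. A broadcast-only design needs none of this state.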
OK, broadcast-only makes sense.
This PR significantly simplifies the telemetry class. The previous implementation was quite complex, with deeply nested callbacks, which led to some subtle and hard-to-debug bugs. Hopefully this implementation will be less susceptible to such behavior.
One major behavior change is that instead of only replying to telemetry requests, we now also broadcast our own telemetry periodically. This can be taken further in subsequent node versions: removing the need for active telemetry requests would remove additional complexity from the message handlers, which currently have to do extra checks just to handle telemetry replies.
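The periodic broadcast can be pictured roughly as below. This is a simplified sketch rather than the code in this PR: the class, the `send` callback, the string peer identifiers and the interval are all hypothetical, and the real implementation runs on the node's own threads and channel types.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical broadcaster, driven by a caller that ticks it periodically.
class telemetry_broadcaster_sketch
{
public:
	using clock = std::chrono::steady_clock;
	using send_fn = std::function<void (std::string const & peer)>;

	telemetry_broadcaster_sketch (std::chrono::seconds interval_a, send_fn send_a) :
		interval{ interval_a },
		send{ std::move (send_a) }
	{
	}

	// Sends local telemetry to every peer if the interval has elapsed.
	// Returns the number of peers the telemetry was sent to.
	std::size_t tick (clock::time_point now, std::vector<std::string> const & peers)
	{
		if (has_broadcast && now - last_broadcast < interval)
		{
			return 0; // not due yet
		}
		for (auto const & peer : peers)
		{
			send (peer);
		}
		last_broadcast = now;
		has_broadcast = true;
		return peers.size ();
	}

private:
	std::chrono::seconds interval;
	send_fn send;
	clock::time_point last_broadcast{};
	bool has_broadcast{ false };
};
```

Passing the current time into `tick` keeps the sketch deterministic and easy to test. There is no per-peer state here, because every peer receives the same message on the same schedule.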
For reviewing, it's probably easiest to read the resulting code rather than the diff, as the majority of the class has changed.