Add ping/pong heartbeating to WSClient, and fix concurrent map on config #420
Signed-off-by: Peter Broadhurst <[email protected]>
Codecov Report
@@ Coverage Diff @@
## main #420 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 277 277
Lines 14858 14939 +81
=========================================
+ Hits 14858 14939 +81
E2E failed due to #423
I added a fix for #298 to this PR, as I saw it pop up in an E2E test. I introduced a separate ...
defer w.heartbeatMux.Unlock()

if isPong && w.activePingSent != nil {
	log.L(w.ctx).Debugf("WS %s heartbeat completed (pong) after %.2fms", w.url, float64(time.Since(*w.activePingSent))/float64(time.Millisecond))
Seems like this will get very chatty... But I guess it's at debug level.
The primary fix in this PR is to add a heartbeat to the `WSClient` connections, to allow quicker detection of a case where the websocket between FireFly Core and a connector (EthConnect, FabConnect, DX) fails silently.

The default is 30 seconds between sending `ping` packets, with a 30 second timeout.

The feature can be disabled by setting the `heartbeatInterval` to zero on a given websocket client configuration.

Example log output showing operation of the heartbeating:
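For reviewers who want a mental model of the heartbeat logic described above, here is a minimal, self-contained sketch. It is not the code in this PR and it produces no log output: the `heartbeat` type, its `tick` and `pong` methods, and the `simulate` helper are hypothetical names, and it assumes the timeout equals the interval, so a ping still unanswered at the next tick counts as a failure. Only `heartbeatMux`, `activePingSent`, and `heartbeatInterval` echo names that appear in the PR.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// heartbeat tracks at most one outstanding ping for a single connection.
// Hypothetical illustration only; not the WSClient implementation.
type heartbeat struct {
	heartbeatMux      sync.Mutex
	heartbeatInterval time.Duration // zero disables heartbeating
	activePingSent    *time.Time    // non-nil while a ping awaits its pong
}

// tick is called once per interval. sendPing reports that a new ping should
// be written; failed reports that the previous ping was never answered.
func (h *heartbeat) tick(now time.Time) (sendPing, failed bool) {
	h.heartbeatMux.Lock()
	defer h.heartbeatMux.Unlock()
	if h.heartbeatInterval == 0 {
		return false, false // feature disabled
	}
	if h.activePingSent != nil {
		return false, true // assumed rule: timeout == interval
	}
	h.activePingSent = &now
	return true, false
}

// pong clears the outstanding ping and returns the round-trip time.
// The boolean is false if no ping was outstanding.
func (h *heartbeat) pong(now time.Time) (time.Duration, bool) {
	h.heartbeatMux.Lock()
	defer h.heartbeatMux.Unlock()
	if h.activePingSent == nil {
		return 0, false
	}
	rtt := now.Sub(*h.activePingSent)
	h.activePingSent = nil
	return rtt, true
}

// simulate drives the heartbeat with a fake clock. pongAnswered[i] says
// whether the ping sent on tick i receives a pong. It returns the index of
// the tick that detects a failure, or -1 if none is detected.
func simulate(intervalSeconds int, pongAnswered []bool) int {
	h := &heartbeat{heartbeatInterval: time.Duration(intervalSeconds) * time.Second}
	now := time.Unix(0, 0)
	for i := 0; i <= len(pongAnswered); i++ {
		sendPing, failed := h.tick(now)
		if failed {
			return i
		}
		if sendPing && i < len(pongAnswered) && pongAnswered[i] {
			h.pong(now.Add(5 * time.Millisecond))
		}
		now = now.Add(h.heartbeatInterval)
	}
	return -1
}

func main() {
	fmt.Println(simulate(30, []bool{true, true, false})) // prints 3
	fmt.Println(simulate(0, []bool{false, false}))       // prints -1 (disabled)
}
```

In this model a silent failure is noticed one interval after the unanswered ping, which is the property the PR description is after.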
Also added a fix for this intermittent UT failure, which could hit real environments, by changing the locking in `config` to cover all cases of get/set concurrency.
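The general shape of a get/set locking fix like this can be sketched as follows. This is an assumption about the pattern, not the actual change to `config`: the `store` type, `newStore`, and `hammer` are hypothetical, and the sketch simply guards a plain map with a `sync.RWMutex` so that every read and every write goes through the lock.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for shared configuration state.
type store struct {
	mux    sync.RWMutex
	values map[string]interface{}
}

func newStore() *store {
	return &store{values: make(map[string]interface{})}
}

// Set takes the write lock, so it excludes all readers and other writers.
func (s *store) Set(key string, value interface{}) {
	s.mux.Lock()
	defer s.mux.Unlock()
	s.values[key] = value
}

// Get takes the read lock, so concurrent reads are allowed but a read can
// never overlap a write to the underlying map.
func (s *store) Get(key string) (interface{}, bool) {
	s.mux.RLock()
	defer s.mux.RUnlock()
	v, ok := s.values[key]
	return v, ok
}

// hammer runs several goroutines that set and get the same keys at once,
// then returns the number of distinct keys stored.
func hammer(workers, iterations int) int {
	s := newStore()
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				key := fmt.Sprintf("key%d", i)
				s.Set(key, id)
				s.Get(key)
			}
		}(w)
	}
	wg.Wait()
	s.mux.RLock()
	defer s.mux.RUnlock()
	return len(s.values)
}

func main() {
	fmt.Println(hammer(8, 100)) // prints 100
}
```

Without the lock, the same workload on a bare map can abort with Go's "concurrent map read and map write" or "concurrent map writes" fatal error, and `go test -race` would flag it.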